Numerators of harmonic numbers

Harmonic numbers

The nth harmonic number, H_n, is the sum of the reciprocals of the integers up to and including n. For example,

H_4 = 1 + 1/2 + 1/3 + 1/4 = 25/12.

Here’s a curious fact about harmonic numbers, known as Wolstenholme’s theorem:

For a prime p > 3, the numerator of H_{p-1} is divisible by p^2.

The example above shows this for p = 5. In that case, the numerator is not just divisible by p^2, it is p^2, though this doesn’t hold in general. For example, H_10 = 7381/2520. The numerator 7381 is divisible by 11^2 = 121, but it’s not equal to 121.

Generalized harmonic numbers

The generalized harmonic numbers H_{n,m} are the sums of the reciprocals of the first n positive integers, each raised to the power m. Wolstenholme’s theorem also says something about these numbers:

For a prime p > 3, the numerator of H_{p-1,2} is divisible by p.

For example, H_{4,2} = 205/144, and the numerator is clearly divisible by 5.

Computing with Python

You can play with harmonic numbers and generalized harmonic numbers in Python using SymPy. Start with the import statement

from sympy.functions.combinatorial.numbers import harmonic

Then you can get the nth harmonic number with harmonic(n) and generalized harmonic numbers with harmonic(n, m).

To extract the numerators, you can use the method as_numer_denom to turn the fractions into (numerator, denominator) pairs. For example, you can create a list of the numerators of the harmonic numbers H_0 through H_9 with

[harmonic(n).as_numer_denom()[0] for n in range(10)]
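Here’s a quick sanity check of both parts of Wolstenholme’s theorem, using the harmonic function above together with SymPy’s primerange:

from sympy import primerange
from sympy.functions.combinatorial.numbers import harmonic

# Check that p^2 divides the numerator of H_{p-1} and that
# p divides the numerator of H_{p-1,2}, for primes 3 < p < 40.
for p in primerange(5, 40):
    n1 = harmonic(p - 1).as_numer_denom()[0]
    n2 = harmonic(p - 1, 2).as_numer_denom()[0]
    print(p, n1 % p**2 == 0, n2 % p == 0)

Each line of output should be a prime followed by True, True.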

What about 0?

You might notice that harmonic(0) returns 0, as it should. The sum defining the harmonic numbers is empty in this case, and empty sums are defined to be zero.

High-dimensional integration

Numerically integrating functions of many variables almost always requires Monte Carlo integration or some variation. Numerical analysis textbooks often emphasize one-dimensional integration and, almost as a footnote, say that you can use a product scheme to bootstrap one-dimensional methods to higher dimensions. True, but impractical.

Traditional numerical integration routines work well in low dimensions. (“Low” depends on context, but let’s say less than 10 dimensions.) When you have the choice of a well-understood, deterministic method, and it works well, by all means use it. I’ve seen people who are so used to problems requiring Monte Carlo methods that they use these methods on low-dimensional problems where they’re not the most efficient (or robust) alternative.

Monte Carlo

Except for very special integrands, high-dimensional integration requires either Monte Carlo integration or some variation, such as quasi-Monte Carlo methods.

As I wrote about here, naive Monte Carlo integration isn’t practical for high dimensions. It’s unlikely that any of your integration points will fall in the region of interest without some sort of importance sampler. Constructing a good importance sampler requires some understanding of your particular problem. (Sorry. No one-size-fits-all black-box methods are available.)
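Here’s a toy sketch of the difference an importance sampler can make. Everything in it — the bump integrand, the box, the Gaussian proposal — is an illustrative assumption of mine, not a recipe from the post linked above:

import numpy as np

rng = np.random.default_rng(42)
d, n = 10, 100_000   # dimension, number of samples

# Toy integrand: a narrow Gaussian bump centered at (3, ..., 3).
# Its integral over R^d is (pi/8)^(d/2), about 0.0093 for d = 10.
def f(x):
    return np.exp(-8.0 * np.sum((x - 3.0) ** 2, axis=-1))

exact = (np.pi / 8.0) ** (d / 2)

# Naive Monte Carlo over the box [0, 6]^d: almost no uniform sample
# lands near the bump, so the estimate is wildly noisy.
x = rng.uniform(0.0, 6.0, size=(n, d))
naive = 6.0 ** d * f(x).mean()

# Importance sampling: draw from a Gaussian proposal q centered on the
# peak, a bit wider than the bump, and average the weights f(x) / q(x).
sigma = 0.3
x = rng.normal(3.0, sigma, size=(n, d))
q = (2 * np.pi * sigma**2) ** (-d / 2) * np.exp(
    -0.5 * np.sum((x - 3.0) ** 2, axis=1) / sigma**2
)
importance = np.mean(f(x) / q)

print(f"exact:      {exact:.6f}")
print(f"naive:      {naive:.6f}")
print(f"importance: {importance:.6f}")

The proposal here was hand-tuned to the known location of the peak; in real problems, finding such a proposal is exactly the hard, problem-specific part.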

Quasi-Monte Carlo (QMC)

Quasi-Monte Carlo methods (QMC) are interesting because they rely on something like a random sequence, but “more random” in a sense, i.e. more effective at exploring a high-dimensional space. These methods work well when an integrand has low effective dimension. The effective dimension is low when nearly all the action takes place on a lower dimensional manifold. For example, a function may ostensibly depend on 10,000 variables, while for practical purposes 12 of these are by far the most important. There are some papers by Art Owen that say, roughly, that Quasi-Monte Carlo methods work well if and only if the effective dimension is much smaller than the nominal dimension. (QMC methods are especially popular in financial applications.)

While QMC methods can be more efficient than Monte Carlo methods, it’s hard to get bounds on the integration error with these methods. Hybrid approaches, such as randomized QMC, can perform remarkably well and give theoretical error bounds.
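Here’s a small illustration of the idea, comparing plain Monte Carlo to a Halton sequence, a classic low-discrepancy sequence. The integrand is a toy example of mine:

import numpy as np

def halton(n, dim):
    """First n points of the Halton sequence in [0, 1]^dim (dim <= 10)."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    points = np.empty((n, dim))
    for j in range(dim):
        b = primes[j]
        for i in range(n):
            # Radical inverse of i+1 in base b.
            x, f, k = 0.0, 1.0 / b, i + 1
            while k > 0:
                x += f * (k % b)
                k //= b
                f /= b
            points[i, j] = x
    return points

# Integrate f(x) = cos(x_1) ... cos(x_5) over the unit cube.
# The exact value is sin(1)^5, about 0.4219.
dim, n = 5, 4096
def f(x):
    return np.prod(np.cos(x), axis=1)

qmc = f(halton(n, dim)).mean()
mc = f(np.random.default_rng(0).uniform(size=(n, dim))).mean()
print(qmc, mc, np.sin(1.0) ** 5)

Typically the Halton estimate lands noticeably closer to the exact value than plain Monte Carlo at the same sample size, which is the point of low-discrepancy sequences.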

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) is a common variation on Monte Carlo integration that uses dependent random samples. In many applications, such as Bayesian statistics, you’d like to be able to create independent random samples from some probability distribution, but this is not practical. MCMC is a compromise. It produces a Markov chain of dependent samples, and asymptotically these samples have the desired distribution. The dependent nature of the samples makes it difficult to estimate error and to determine how well the integration estimates using the Markov chain have converged.

(Some people speak of the Markov chain itself converging. It doesn’t. The estimates using the Markov chain may converge. Speaking about the chain converging may just be using informal language, or it may betray a lack of understanding.)

There is little theory regarding the convergence of estimates from MCMC, aside from toy problems. An estimate can appear to have converged, while an important region of the integration problem remains unexplored. Despite its drawbacks, MCMC is the only game in town for many applications.
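For concreteness, here’s a minimal random-walk Metropolis sketch, a toy example with a Gaussian target rather than anything from a real application:

import numpy as np

rng = np.random.default_rng(1)

# Log of an unnormalized target density: a standard Gaussian in 2D.
def log_p(x):
    return -0.5 * np.dot(x, x)

# Random-walk Metropolis: propose x + noise, accept with
# probability min(1, p(proposal) / p(current)).
def metropolis(n_steps, step=0.5, dim=2):
    x = np.zeros(dim)
    chain = np.empty((n_steps, dim))
    for i in range(n_steps):
        proposal = x + step * rng.normal(size=dim)
        if np.log(rng.uniform()) < log_p(proposal) - log_p(x):
            x = proposal
        chain[i] = x
    return chain

chain = metropolis(50_000)
# The samples are dependent, but asymptotically they have the target
# distribution, so chain averages estimate expectations.
print(np.mean(chain[:, 0] ** 2))   # exact value is 1

Note that successive points in the chain are correlated, which is why estimating the error of such averages is harder than with independent samples.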

Consulting

If you’d like help with high dimensional integration, or any other kind of numerical integration, please contact me to discuss how we might work together.

Scientific computing in Python

Scientific computing in Python is expanding and maturing rapidly. Last week at the SciPy 2015 conference there were about twice as many people as when I’d last gone to the conference in 2013.

You can get some idea of the rapid development of the scientific Python stack and its future direction by watching the final keynote of the conference by Jake VanderPlas.

I used Python for a while before I discovered that there were so many Python libraries for scientific computing. At the time I was considering learning Ruby or some other scripting language, but I committed to Python when I found out that Python has far more libraries for the kind of work I do than other languages do. It felt like I’d discovered a secret hoard of code. I expect it would be easier today to discover the scientific Python stack. (It really is becoming more of an integrated stack, not just a collection of isolated libraries. This is one of the themes in the keynote above.)

When people ask me why I use Python, rather than languages like Matlab or R, my response is that I do a mixture of mathematical programming and general programming. I’d rather do mathematics in a general programming language than do general programming in a mathematical language.

One of the drawbacks of Python, relative to C++ and related languages, is speed. This is a problem in languages like R as well. However, with Python there are ways to speed up code without completely rewriting it, such as Cython and Numba. The only reliable way I’ve found to make R much faster is to rewrite it in another language.
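For example, here’s the kind of loop-heavy numerical code that Numba can compile to machine code with a one-line decorator. The function is a toy example, not a benchmark:

import numpy as np
from numba import jit

# A pure Python triple loop like this is slow in CPython, but Numba
# compiles it to fast machine code via the decorator below.
@jit(nopython=True)
def total_distance(points):
    n, d = points.shape
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s = 0.0
            for k in range(d):
                s += (points[i, k] - points[j, k]) ** 2
            total += s ** 0.5
    return total

points = np.random.rand(2000, 3)
print(total_distance(points))  # the first call compiles; later calls run fast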

Another drawback of Python until recently was that data manipulation and exploration were not as convenient as one would hope. However, that has changed due to developments such as Pandas, initiated by Wes McKinney. For more on how that came to be and where it’s going, see his keynote from the second day of SciPy 2015.

It’s not clear why Python has become the language of choice for so many people in scientific computing. Maybe if people like Travis Oliphant had decided to use some other language for scientific programming years ago, we’d all be using that language now. Python wasn’t intended to be a scientific programming language. And as Jake VanderPlas points out in his keynote, Python still is not a scientific programming language, but the foundation for a scientific programming stack. Maybe Python’s strength is that it’s not a scientific language. It has drawn more computer scientists to contribute to the core language than it would have if it had been more of a domain-specific language.

* * *

If you’d like help moving to the Python stack, please let me know.

Contact info diagram

My contact information arranged into a diagram:

Yesterday at SciPy 2015 Allen Downey did something similar for his contact info and gave me the idea for the image above.

LinkedIn doesn’t quite fit; you have to know that LinkedIn profile URLs stick linkedin.com/in/ before the user name.

When the last digits of powers don’t change

If you raise any integer to the fifth power, its last digit doesn’t change. For example, 2^5 = 32.

It’s easy to prove this assertion by brute force. Since the last digit of b^n only depends on the last digit of b, it’s enough to verify that the statement above holds for 0, 1, 2, …, 9.
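In Python, that brute-force check is a one-liner:

# Verify that d^5 ends in d for each possible last digit d.
print(all(pow(d, 5, 10) == d for d in range(10)))   # prints True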

Although that’s a valid proof, it’s not very satisfying. Why fifth powers? We’re implicitly working in base 10, as humans usually do, so what is the relation between 5 and 10 in this context? How might we generalize it to other bases?

The key is that 5 = φ(10) + 1 where φ(n) is the number of positive integers less than n and relatively prime to n.

Euler discovered the φ function and proved that if a and m are relatively prime, then

a^φ(m) = 1 (mod m)

This means that a^φ(m) – 1 is divisible by m. (Technically I should use the symbol ≡ (U+2261) rather than = above since the statement is a congruence, not an equation. But since Unicode font support on various devices is disappointing, not everyone could see the proper symbol.)

Euler’s theorem tells us that if a is relatively prime to 10 then a^4 ends in 1, so a^5 ends in the same digit as a. That proves our original statement for numbers ending in 1, 3, 7, and 9. But what about the rest, numbers that are divisible by 2 or 5? We’ll answer that question and a slightly more general one at the same time. Let m = αβ where α and β are distinct primes. In particular, we could choose α = 2 and β = 5. We will show that

a^(φ(m)+1) = a (mod m)

for all a, whether relatively prime to m or not. This would show in addition, for example, that in base 15, every number keeps the same last “digit” when raised to the 9th power since φ(15) = 8.

We only need to be concerned with the case of a being a multiple of α or β, since Euler’s theorem proves our theorem for the remaining cases. If a is a multiple of αβ our theorem obviously holds, so assume a is some multiple of α, say a = kα with k less than β (and hence relatively prime to β, because β is prime).

We need to show that αβ divides (kα)^(φ(αβ)+1) – kα. This expression is clearly divisible by α, so the remaining task is to show that it is divisible by β. We’ll show that (kα)^φ(αβ) – 1 is divisible by β.

Since α and k are relatively prime to β, Euler’s theorem tells us that α^φ(β) and k^φ(β) are congruent to 1 (mod β). This implies that (kα)^φ(β) is congruent to 1, and so (kα)^(φ(α)φ(β)) is also congruent to 1 (mod β). One of the basic properties of φ is that for relatively prime arguments α and β, φ(αβ) = φ(α)φ(β), and so we’re done.
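Here’s a quick numerical confirmation of the theorem, using SymPy’s totient function:

from sympy import totient

# Check a^(phi(m)+1) = a (mod m) for every residue a,
# with m = 10 (fifth powers) and m = 15 (ninth powers).
for m in [10, 15]:
    e = int(totient(m)) + 1
    assert all(pow(a, e, m) == a % m for a in range(m))
    print(f"a^{e} = a (mod {m}) for all a")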

Exercise: How much more can you generalize this? Does the base have to be the product of two distinct primes?

No, I’m not a bot.

Periodically someone on Twitter will suggest that one of my Twitter accounts is a bot. Others will reply in the second person plural, suggesting that there’s a group of people behind one of the accounts. These accounts aren’t run by a bot or a committee, just me.

I do use software to schedule my tweets in advance. Most of the tweets from my personal account are live. Most of the tweets from my topic accounts are scheduled, though some are live. All replies are manual, not automated, and I don’t scrape content from anywhere.

Occasionally I read the responses to these accounts and sometimes I reply. But with over half a million followers (total, not unique) I don’t try to keep up with all the responses. If you’d like to contact me, you can do so here. That I do keep up with.


The name we give to bright ideas

From The Book of Strange New Things:

… I said that if science could come up with something like the Jump it could surely solve a problem like that. Severin seized hold of that word, “science.” Science, he said, is not some mysterious larger-than-life force, it’s just the name we give to bright ideas that individual guys have when they’re lying in bed at night, and that if the fuel thing bothered me so much, there was nothing stopping me from having a bright idea to solve it …

Algorithmic wizardry

Last week I wrote a short commentary on James Hague’s blog post Organization skills beat algorithmic wizardry. This week that post got more traffic than my server could handle. I believe it struck a chord with experienced software developers who know that the challenges they face now are not like the challenges they prepared for in school.

Although I completely agree that “algorithmic wizardry” is over-rated in general, my personal experience has been a little different. My role on projects has frequently been to supply a little bit of algorithmic wizardry. I’ve often been asked to look into a program that is taking too long to run and been able to speed it up by an order of magnitude or two by improving a numerical algorithm. (See an example here.)

James Hague says that “rarely is there some … algorithm that casts a looming shadow over everything else.” I believe he is right, though I’ve been called into projects precisely on those rare occasions when an algorithm does cast a shadow over everything else.

The Nickel Tour

If you’re new to this blog, welcome! Let me show you around.

Here are some of the most popular posts on this site and some other things I’ve written.

If you’d like to subscribe to this site you can do so by RSS or email. I also have a monthly newsletter.

You can find out more about me and my background here.

You can also find me on Twitter and Google+.

If you have any questions or comments, here’s my contact info.

The most important skill in software development

Here’s an insightful paragraph from James Hague’s blog post Organization skills beat algorithmic wizardry:

When it comes to writing code, the number one most important skill is how to keep a tangle of features from collapsing under the weight of its own complexity. I’ve worked on large telecommunications systems, console games, blogging software, a bunch of personal tools, and very rarely is there some tricky data structure or algorithm that casts a looming shadow over everything else. But there’s always lots of state to keep track of, rearranging of values, handling special cases, and carefully working out how all the pieces of a system interact. To a great extent the act of coding is one of organization. Refactoring. Simplifying. Figuring out how to remove extraneous manipulations here and there.

Algorithmic wizardry is easier to teach and easier to blog about than organizational skill, so we teach and blog about it instead. A one-hour class, or a blog post, can showcase a clever algorithm. But how do you present a clever bit of organization? If you jump to the solution, it’s unimpressive. “Here’s something simple I came up with. It may not look like much, but trust me, it was really hard to realize this was all I needed to do.” Or worse, “Here’s a moderately complicated pile of code, but you should have seen how much more complicated it was before. At least now someone stands a shot of understanding it.” Ho hum. I guess you had to be there.

You can’t appreciate a feat of organization until you experience the disorganization. But it’s hard to have the patience to wrap your head around a disorganized mess that you don’t care about. Only if the disorganized mess is your responsibility, something that means more to you than a case study, can you wrap your head around it and appreciate improvements. This means that while you can learn algorithmic wizardry through homework assignments, you’re unlikely to learn organization skills unless you work on a large project you care about, most likely because you’re paid to care about it.


AI Spring

Artificial intelligence, or at least the perception of artificial intelligence, has gone from disappointing to frightening in the blink of an eye. As Marc Andreessen said on Twitter this morning:

AI: From “It’s so horrible how little progress has been made” to “It’s so horrible how much progress has been made” in one step.

When I read this I thought of Pandora (the mythical figure, not the music service).

“Are you still working on opening that box? Any progress?”

“No, the lid just … won’t … budge … Oh wait, I think I got it.”

 

Related post: Why the robots aren’t coming in the way you expect by Mark Burgess

Ursula K. Le Guin has it backward

Ursula K. Le Guin is asking people to not buy books from Amazon because they market bestsellers, the literary equivalent of junk food. She said last week

I believe that reading only packaged microwavable fiction ruins the taste, destabilizes the moral blood pressure, and makes the mind obese.

I agree with that. That’s why I shop at Amazon.

If I liked to read best-selling junk food, I could find it at any bookstore. But I like to read less popular books, books I can only find from online retailers like Amazon. In fact, most of Amazon’s revenue comes from obscure books, not bestsellers.

Suppose I want to read something by, I don’t know, say, Ursula K. Le Guin. I doubt I could find a copy of any of her books, certainly not her less popular books, within 20 miles of my house, and I live in the 4th largest city in the US. There’s nothing by her in the closest Barnes and Noble. But I could easily find anything she’s ever written on Amazon.

If you’d like to support Amazon so they can continue to bring us fine authors like Ursula K. Le Guin, authors you can’t find in stores that mostly sell packaged microwavable fiction, you can buy one of the books mentioned on this blog from Amazon.

Reading equations forward and backward

There is no logical difference between writing A = B and writing B = A, but there is a psychological difference.

Equations are typically applied left to right. When you write A = B you imply that it may be useful to replace A with B. This is helpful to keep in mind when learning something new: the order in which an equation is written gives a hint as to how it may be applied. However, this way of thinking can also be a limitation. Clever applications often come from realizing that you can apply an equation in the opposite of the usual direction.

For example, Euler’s reflection formula says

Γ(z) Γ(1-z) = π / sin(πz).

Reading from left to right, this says that two unfamiliar/difficult things, values of the Gamma function, are related to a more familiar/simple thing, the sine function. It would be odd to look at this formula and say “Great! Now I can compute sines if I just know values of the Gamma function.” Instead, the usual reaction would be “Great! Now I can relate the value of Gamma at two different places by using sines.”
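Here’s a quick numerical check of the identity, using mpmath:

from mpmath import gamma, pi, sin, mpf

# Both sides of the reflection formula, evaluated at z = 0.3,
# should agree to full working precision.
z = mpf("0.3")
print(gamma(z) * gamma(1 - z))
print(pi / sin(pi * z))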

When we see Einstein’s equation

E = mc^2

the first time, we think about creating energy from matter, such as the mass lost in nuclear fission. This applies the formula from left to right, relating what we want to know, an amount of energy, to what we do know, an amount of mass. But you could also read the equation from right to left, calculating the amount of energy, say in an accelerator, necessary to create a particle of a given mass.

Calculus textbooks typically have a list of equations, either inside the covers or in an appendix, that relate an integral on the left to a function or number on the right. This makes sense because calculus students compute integrals. But mathematicians often apply these equations in the opposite direction, replacing a number or function with an integral. To a calculus student this is madness: why replace a familiar thing with a scary thing? But integrals aren’t scary to mathematicians. Expressing a function as an integral is often progress. Properties of a function may be easier to see in integral form. Also, the integral may lend itself to some computational technique, such as reversing the order of integration in a double integral, or reversing the order of taking a limit and an integral.

Calculus textbooks also have lists of equations involving infinite sums, the summation always being on the left. Calculus students want to replace the scary thing, the infinite sum, with the familiar thing, the expression on the right. Generating functions turn this around, wanting to replace things with infinite sums. Again this would seem crazy to a calculus student, but it’s a powerful problem solving technique.

Differential equation students solve differential equations. They want to replace what they find scary, a differential equation, with something more familiar, a function that satisfies the differential equation. But mathematicians sometimes want to replace a function with a differential equation that it satisfies. This is common, for example, in studying special functions. Classical orthogonal polynomials satisfy 2nd order differential equations, and the differential equation takes a different form for different families of orthogonal polynomials. Why would you want to take something as tangible and familiar as a polynomial, something you might study as a sophomore in high school, and replace it with something as abstract and mysterious as a differential equation, something you might study as a sophomore in college? Because some properties, properties that you would not have cared about in high school, are more clearly seen via the differential equations.