Finding 2016 in pi

2016 appears in π starting at the 7173rd decimal place:

You can confirm this with Mathematica or Wolfram Alpha:

        Mod[ Floor[10^7177 pi] , 10000]

I found it using the following Python code:

        >>> from sympy import pi
        >>> digits = str(pi.evalf(10000))[2:]
        >>> digits.find('2016')
        7173

By the way, it’s also true that 2016 = 1 + 2 + 3 + … + 63.

Let go of something every Thursday

I heard of someone [1] who had a commitment to let go of something every Thursday. I don’t know anything about how they carried that out. It could mean throwing out or donating to charity a physical object each Thursday. Or maybe it could be handing over a responsibility or letting go of an ambition. It could be a combination, such as getting rid of an object that is a reminder of something intangible that you want to let go of.

This may mean reducing your total inventory of objects or obligations, or it could be simply turnover, making room for new things.

Getting rid of an obligation is not necessarily irresponsible, nor is letting go of an ambition necessarily lazy. Letting go of one obligation to take on another could be very responsible. Letting go of one ambition to pursue another could be a lot of work.

Related posts

[1] When I wrote this post I couldn’t remember who this was. I recalled later that it was Bob Goff.

Automate to save mental energy, not time

Automation doesn’t always save as much time or effort as we expect.

The xkcd cartoon above is looking at automation as an investment. Does the work I put in now eventually save more work than I put into it? Automation may be well worth it even if the answer is “no.”

Automation can be like a battery as well as an investment. Putting energy into batteries is a bad investment; you’ll never get out as much energy as you put in. But that’s not why you put energy into batteries. You put energy in while you can so you can use some of that energy later when you need it.

Write automation scripts when you have the time, energy, and motivation to do so and when nothing else is more important. (Or nothing is more interesting, if you’re looking for a way to procrastinate without feeling too guilty. This is called “moral compensation.”) You may indeed save more work than you put into writing the scripts. But you also may save mental energy just when you need it.

Suppose it takes you an hour to write a script that only saves you two minutes later. If that two minutes would have derailed your concentration at a critical moment, but it didn’t because you had the script, writing the script may have paid for itself, even though you invested 60 minutes to save 2 minutes.

If your goal is to save mental energy, not time, you have a different strategy for automation. If a script executes faster than a manual process, but it takes a long time to remember where to find the script and how to run it, it may be a net loss. The less often you run a script, the chattier the interface should be.

The same considerations apply to learning third party software. I suspect the time I’ve put into learning some features of Emacs, for example, will not pay for itself in terms of time invested versus time saved. But I’ve invested leisure time to save time when I’m working hard, not to save keystrokes but to save mental energy for the project at hand.

Related post: How much does typing speed matter?

The Dirac comb or Sha function

shaThe sha function, also known as the Dirac comb, is denoted with the Cyrillic letter sha (Ш, U+0428). This letter was chosen because it looks like how people visualize the function, a long series of vertical spikes. The function is called the Dirac comb for the same reason. This function is very important in Fourier analysis because it relates Fourier series and Fourier transforms. It relates sampling and periodization.  It’s its own Fourier transform, and with a few qualifiers discussed later, the only such function.

The Ш function, really the Ш distribution, is defined as

sha(x) = \sum_{n=-\infty}^\infty \delta(x-n)

Here δ(xn) is the Dirac delta distribution centered at n. The action of δ(xn) on a test function is to evaluate that function at n. You can envision Ш as an infinite sequence of spikes, one at each integer. The action of Ш on a test function is to add up its values at every integer.

Sampling

The product of Ш with a function f is a new distribution whose action on a test function φ is the sum of f φ over all integers. Or you could think of the distribution as a sort of clothesline on which to hang the sampled values of f, much the way a generating function works.

Periodizing

Next let’s look at a function f that lives on [0, 1], i.e. is zero everywhere outside the unit interval. The convolution of f with δ(xn) is f(xn), i.e. a copy of f shifted over to live on the interval [n, n+1]. So by taking the convolution with Ш, we create copies of f all over the real line. We’ve made f into a periodic function. So instead of saying “the function f extended to create a periodic function” you can simply say f*Ш.

Fourier transform

Now let’s think about the Fourier transform of Ш. The Fourier transform of δ(x) is 1, i.e. the function equal to 1 everywhere [1].  (The more concentrated a function is, the more spread out its Fourier transform. So if you have an infinitely concentrated function δ, its Fourier transform is perfectly flat, 1. You can calculate the transform rigorously, this is the intuition.) If you shift a function by n, you rotate its Fourier transform by exp(-2πinω). So we can compute the transform of Ш:

Fourier transform of sha = \sum_{n=-\infty}^\infty \exp(-2\pi i n \omega)

This equation only makes sense in terms of distributions. The right hand side does not converge in the classical sense; the individual terms don’t even go to zero, since each term has magnitude 1. So what kind of distribution is this thing on the right side? It is in fact the Ш function again, though this is not obvious.

To see that the exponential sum is actually the Ш function, i.e. that Ш is its own Fourier transform, we need to back up a little bit and define Fourier transform of a distribution. As usual with distributions, we take a classical theorem and turn it into a definition in a broader context.

For absolutely integrable functions, we have

\int_{-\infty}^ \infty \hat{f}(x) \, \varphi(x) \, dx = \int_{-\infty}^ \infty f(x) \, \hat{ \varphi }(x) \, dx

where the hat on top of a function indicates its Fourier transform. Inspired by the theorem above, we define the Fourier transform of a distribution f to be the functional whose action on a test function φ is given below.

 \hat{f} : \varphi \mapsto \int_{-\infty}^ \infty f(x) \, \hat{ \varphi }(x) \, dx

As we noted in a previous post, the integral above can be taken literally if f is a distribution associated with an ordinary function, but in general it means the application of the linear functional to the test function.

As a distribution, exp(-2πinω) acts on a test function φ by integrating against it. From the definition of a (classical) Fourier transform, this gives the Fourier transform of φ evaluated at n. So the Fourier transform of Ш acts on φ by summing the values of φ’s Fourier transform over all integers. By the Poisson summation formula, this is the same as summing the values of φ itself over all integers. Which is the same as applying Ш. So the Fourier transform of Ш has the same effect on test functions as Ш. In other words, Ш is its own Fourier transform.

Uniqueness

We haven’t been explicit about where our test functions come from. We require that xn φ(x) goes to zero as x goes to ±∞ for any positive integer n. These are called functions of rapid decay. And the distributions we define as linear functionals on such test functions are called tempered distributions.

The Ш distribution is essentially unique. Any tempered distribution with period 1 that equals its own Fourier transform must be a multiple of Ш.

***

[1] All Fourier transform calculations here use the convention I call (-1, τ, 1) in these notes on various definitions. This may be the most common definition, though there are several minor variations in common use.

The longer it has taken, the longer it will take

Suppose project completion time follows a Pareto (power law) distribution with parameter α. That is, for t > 1, the probability that completion time is bigger than t is t−α. (We start out time at t = 1 because that makes the calculations a little simpler.)

Now suppose we know that a project has lasted until t0 so far. Then the expected finish time is αt0/(α − 1) and so the expected additional time is t0/(α − 1). Note that both are proportional to t0. So the longer it has taken, the longer it will take. If the project is running late, you can expect the time remaining to be even more than the expected time before the project started. The finish line is moving away from you!

For example, suppose α = 2 (in applications of power laws, α is often between 1 and 3) and you’re measuring time in years. When the project starts at t = 1, it is expected to take one year, until t = 2. Now suppose you’re starting the second year and the project isn’t done. Now it’s expected to finish at t = 4, two more years. When you started, the project was supposed to take a year. One year later, it has taken a year, and should be expected to take two more years. I said “should be expected” rather than “is expected” because no one would believe such an estimate. (Ever heard of the Big Dig? Or other megaprojects?)

Note that we have computed the conditional probability given only the time it has taken so far, and no other information. If you know more, for example maybe you know that some specific pieces have been completed, then you should use that information.

This is related to the Lindy effect. The longer a cultural artifact has been around, the longer it is expected to last into the future.

Two meanings of distribution

There are a couple common uses of the term distribution in math. The most familiar is probability distribution, such as a beta distribution, a Poisson distribution, etc. Less familiar but still common is distributions in the sense of generalized functions, like the Dirac delta distribution. Anybody with much exposure to math will have heard of a probability distribution. Generalized functions are common knowledge in some areas of math such as differential equations or harmonic analysis, but mathematicians in other areas, say graph theory, may not have heard of them.

This post briefly answers two questions:

  1. What is a distribution as in a generalized function?
  2. What does it have to do with a probability distribution?

Most of this post will deal with the first question, but we’ll circle back to the second question by the end.

You may have heard that a Dirac delta function δ(x) is an “infinitely concentrated” function or a point mass. Or you may have heard some of the rules for working with it, such as that it is infinite at the origin, zero everywhere else, and integrates to 1. But no function can actually do what the delta function is said to do. Measure theory will let functions take on actual infinite values, but the value of a function at a single point, even if that value is infinite, cannot matter to its integral. Even putting that aside, if you say δ(x) is infinite at 0 and integrates to 1, then how do you make sense of expressions like 2 δ? Is it twice as infinite at 0, whatever that means? Is it twice as zero everywhere else? And what on earth could it mean to take a derivative or Fourier transform of δ(x)?

Generalized functions are a way to define things like the δ distribution rigorously. They let you preserve some of the intuitive/magical properties you want while also giving rules to keep you from getting into trouble. Regarding the paragraph above, the theory will let you integrate, differentiate, and take the Fourier transform of δ(x) but it won’t let you do things like say that 2δ = δ since 2×0 = 0 and 2×∞ = ∞.

Generalized functions are just functions, but not functions of real numbers. They are linear functions that take other functions [1] and return real numbers. The functions they act on are typically called test functions. To reduce the confusion of having different kinds of functions under discussion, linear functions that act on other functions are usually called functionals. A functional is just a function, a linear function from test functions to real numbers, but it helps to give it a different name.

You can write the action of a distribution f on a test function φ as if it were an integral:

f : \varphi \mapsto \int f(x)\, \varphi(x) \, dx

If f is a function, you can take the integral literally. Distributions generalize functions by associating a function with the linear operator that acts on a test function by multiplying by it and integrating.

But distributions include other kinds of linear functionals, in which case the integral expression is not literal. The δ distribution, for example, acts on a test function φ by returning φ(0). And here’s the connection to the intuitive idea of a function infinitely concentrated at 0. If a function integrates to 1, and is very concentrated near 0, then its integral when multiplied by φ is approximately φ(0).  You could make this rigorous and define generalized functions as limits of functions, but that approach is something of a dead end. The theory is much simpler using the linear functional definition.

How does this let you differentiate things like the Dirac delta? In a nutshell, you take what is a theorem for ordinary functions and turn it into a definition for generalized functions. I explain this in more detail here.

So the theory of distributions lets you use your intuition regarding “infinitely concentrated” functions and such. It also lets you carry out formal calculations, such as differentiating or taking the Fourier transform of distributions. But it also keeps you out of trouble. Back to our example above, what does 2δ mean? It’s simply the linear functional that takes a test function φ and returns 2 φ(0).

Now what does all this have to do with probability distributions? You can think of a probability density function as something that exists to be integrated. You find the probability of some event (set) by integrating a probability density over it. You find the expected value of some function by multiplying that function by a probability distribution and integrating. Likewise you could think of distributions in the sense of generalized functions as things that exist to be integrated. They act on test functions by being integrated against them, or by doing things analogous to integration that are more general.

People sometimes get confused because they look at probability densities outside of integrals and try to think of them is probabilities. They’re not. They are things you integrate to get probabilities. A probability density can, for example, be larger than 1, but a probability cannot. Likewise people sometimes get confused when they think of generalized functions on their own. If you give the generalized function something to act on, you’re more likely to be guided into doing the right thing. Distributions, whether probability distributions or generalized functions, act on other things.

***

[1] The space of test functions can vary. The most common choice is infinitely differentiable functions with compact support. But for Fourier analysis, the natural space of test functions consists of infinitely differentiable functions of rapid decay, i.e. functions φ such that xn φ(x) goes to zero as x goes to ±∞ for any positive integer n.

The reason is that the Fourier transform of such a function is another function of the same kind. Test functions of compact support aren’t suited for Fourier analysis because a function with compact support cannot have a Fourier transform with compact support. It’s related to the Heisenberg uncertainty principle: the more concentrated something is in the time domain, the less concentrated it is in the frequency domain. A signal can’t be time-limited and bandlimited.

Restarting @DSP_fact, ending @PerlRegex

I’m making a couple changes to my Twitter accounts.

First, I’m winding down @PerlRegex. I’ll stop tweeting there when my scheduled tweets run out. I suggest that everyone who has been following @PerlRegex start following @RegexTip instead. The latter is more general, but is mostly compatible with Perl.

Second, I’m reviving my @DSP_Fact. I stopped tweeting there a couple years ago, but I’d like to start posting there again. This time it’s going to be a little broader. I intend to include some material on acoustics, Fourier analysis (continuous and discrete), and maybe some other related material.

DSP_fact logo

Retooling

I was listening to a classic music station yesterday, and I heard the story of a professional pianist whose hand was injured in an accident. He then started learning trumpet and two years later he was a professional trumpeter. I didn’t catch the musician’s name.

I was not surprised that a professional in one instrument could become a professional in another, but I was surprised that he did it in only two years. It probably helped that he was no longer able to play piano; I imagine if he wanted to learn trumpet in addition to piano he would not have become so proficient so quickly.

If you go by the rule of thumb that it takes about 10 years to master anything, this professional pianist was 80% of the way to becoming a professional trumpeter before he touched a trumpet. Or to put it another way, 80% of being a professional musician who plays trumpet is becoming a professional musician.

Transferable skills are more difficult to acquire and more valuable than the context in which they’re exercised.

Related posts

Sinc and Jinc sums

In the previous post, we looked at an elegant equation involving integrals of the sinc function and computed the corresponding integrals for the jinc function.

\int_{-\infty}^\infty \mbox{sinc}(x) \, dx = \int_{-\infty}^\infty \mbox{sinc}^2(x) \, dx = \pi

It turns out the analogous equation holds for sums as well:

\sum_{n=-\infty}^\infty \mbox{sinc}(n) = \sum_{n=-\infty}^\infty \mbox{sinc}^2(n) = \pi

As before, we’d like to compute these two sums and see whether we can compute the corresponding sums for the jinc function.

The Poisson summation formula says that a function and its Fourier transform produce the same sums over the integers:

\sum_{n=-\infty}^\infty f(n) = \sum_{n=-\infty}^\infty \hat{f}(n)

Recall from the previous post that the Fourier transform of sinc is the function π box(π x) where the box function is 1 on [−1/2, 1/2] and zero elsewhere. The only integer n with πn inside [−1/2, 1/2] is 0, so the sum of sinc(n) over the integers equals π. A very similar argument shows that the sum of jinc(n) over the integers equals its Fourier transform at 0, which equals 2.

Let tri(x) be the triangle function, defined to be 1 − |x| for −1 < x < 1 and 0 otherwise. Then the Fourier transform of tri(x) is sinc2(π ω) and so π tri(π x) and sinc2 are Fourier transform pairs. The Poisson summation formula says the sum of sinc2 over the integers is the sum of π tri(π x) over the integers, which is π.

I don’t know the Fourier transform of jinc2 and doubt it’s easy to compute. Maybe the sum could be computed more easily without Fourier transforms, e.g. using contour integration.