# Fibonacci numbers and orthogonal polynomials

The famous Fibonacci numbers are defined by the initial conditions

F0 = 0,  F1 = 1

and the recurrence relation

Fn = Fn-1 + Fn-2

for n > 1.

The Fibonacci polynomials are defined similarly. The have the same initial conditions

F0(x) = 0, F1(x) = 1

but a slightly different recurrence relation

Fn(x) = x Fn-1(x) + Fn-2(x)

for n > 1.

Several families of orthogonal polynomials satisfy a similar recurrence relationship

Fn(x) = p(x) Fn-1(x) + c(n) Fn-2(x).

The table below gives the values of p(x) and c(n) for a few families of polynomials.

 Family p(x) c(n) P0 P1(x) Fibonacci x 1 0 1 Chebyshev T 2x -1 1 x Chebyshev U 2x -1 1 2x Hermite 2x 2 – 2n 1 2x

There are two common definitions of the Hermite polynomials, sometimes called the physicist convention and the probabilist convention. The table above gives the physicist convention. For the probabilist definition, change the 2’s to 1’s in the last row and leave the 1 alone.

(If you haven’t heard of orthogonal polynomials, you might be wondering “orthogonal to what?”. These polynomials are orthogonal in the sense of an inner product formed by multiplying two polynomials and a weight, then integrating. Orthogonal polynomials differ by the the weight and the limits of integration. For example, the (physicist) Hermite polynomials are orthogonal with weight function exp(-x2) and integrating over the entire real line. The probabilist Hermite polynomials are very similar, but use the standard normal distribution density exp(-x2/2)/√(2π) as the weight instead of exp(-x2).)

* * *

# NYT Book of Physics and Astronomy

I’ve enjoyed reading The New York Times Book of Physics and Astronomy, ISBN 1402793200, a collection of 129 articles written between 1888 and 2012. Its been much more interesting than its mathematical predecessor. I’m not objective — I have more to learn from a book on physics and astronomy than a book on math — but I think other readers might also find this new book more interesting.

I was surprised by the articles on the bombing of Hiroshima and Nagasaki. New York Times reporter William Lawrence was allowed to go on the mission over Nagasaki. He was not on the plane that dropped the bomb, but was in one of the other B-29 Superfortresses that were part of the mission. Lawrence’s story was published September 9, 1945, exactly one month later. Lawrence was also allowed to tour the ruins of Hiroshima. His article on the experience was published September 5, 1945. I was surprised how candid these articles were and how quickly they were published. Apparently military secrecy evaporated rapidly once WWII was over.

Another thing that surprised me was that some stories were newsworthy more recently than I would have thought. I suppose I underestimated how long it took to work out the consequences of a major discovery. I think we’re also biased to think that whatever we learned as children must have been known for generations, even though the dust may have only settled shortly before we were born.

# Concepts, explosions, and developments

Paul Halmos divided progress in math into three categories: concepts, explosions, and developments. This was in his 1990 article “Has progress in mathematics slowed down?”. (His conclusion was no.) This three-part classification not limited to math and could be useful in other areas.

Concepts are organizational ideas, frameworks, new vocabulary. Some of his examples were category theory and distributions (generalized functions).

Explosions solve old problems and generate a lot of attention among mathematicians and in the popular press. As Halmos puts is, “hot news not only for the Transactions, but also for the Times for a day, for Time for a week, and for student mathematics clubs for many months.” He cites the solution to the Four Color Theorem as an example. He no doubt would have cited Fermat’s Last Theorem had he written his article five years later.

Developments are “deep and in some cases even breathtaking developments (but not explosions) of the kind that might not make the Times, but could possibly get Fields medals for their discoverers.” One example he gives is the Atiyah-Singer index theorem.

The popular impression of math and science is that progress is all about explosions though it’s more about concepts and developments.

Related: Birds and Frogs by Freeman Dyson [pdf]

# Tragedies and messes

Dorothy Parker said “It’s not the tragedies that kill us; it’s the messes.”

Sometime that’s how I feel about computing. I think of messes such as having to remember that arc tangent is atan in R and Python, but arctan in NumPy and a in bc. Or that C, Python, and Perl use else if, elif, and elsif respectively. Or did I switch those last two?

These trivial but innumerable messes keep us from devoting our full energy to bigger problems.

One way to reduce these messes is to use fewer tools. Then you know less to be confused about. If you only use Python, for example, then elif is just how it is. But knowing more tools is worth the added mess, up to a point. Past some point, however, new tools add more mental burden than utility. You have to find the optimal combination of tools for yourself, and that combination will change over time.

To use fewer tools, you may need to use more complex tools. Maybe you can replace a list of moderately complex but inconsistent tools with one tool that is more complex but internally consistent.

# Standing desks considered harmful

Several studies have suggested that sitting all day is bad for you. So a growing number of people have decided to stand all day instead. This may not be any better, and may be worse. The Healthy Programmer quotes Dr. Alan Hedge saying that standing desks increase your risks of developing carotid atherosclerosis and varicose veins, and decrease your ability to carry out fine motor tasks.

I don’t know how much to make of Dr. Hedge’s remarks. I take most fitness studies with a grain of salt (which, by the way, some studies are saying isn’t bad for you after all). But it makes sense that it’s probably best not to sit all day or stand all day but to alternate your time sitting, standing, walking, and even spending some time in a hammock.

One simple way to move around more is to not sit in front of a computer when you’re not using a computer.

Related posts:

# Comparing Windows and Linux stability

In my experience, Ubuntu 12.04 is less stable than Windows 8. Ubuntu is more likely to freeze, crash, or otherwise act badly than Windows.

When I say this, people point out that Ubuntu is not the Linux kernel and that the problems I see come from the GUI or from Ubuntu itself. I can believe that.

So Linux is really stable when you don’t run it on a desktop. But the same is true of Windows. If you set up a Windows server and treat it like a server — don’t install random consumer software, don’t browse the web on it, don’t play games on it, etc. — then Windows is really stable too.

When people compare the stability of Linux and Windows, they may be biased a couple ways. First, Linux is more often deployed on servers and Windows more often on desktops. So they may unintentionally be comparing Linux servers to Windows desktops. Second, they may be thinking that Linux users’ computers are more stable than Windows users’ computers, which is probably true. Linux users typically know more about computers than Windows users. They are more able to avoid problems and more able to fix them when they occur.

I suspect the most important factors for a computer’s stability are how it is being used and who is using it, not what OS it is running.

# Stewart’s cube

This is a drawing of Stewart’s cube.  At each corner, the sum of the weights of the incoming edges is 83. Each edge has a prime weight, and all the weights are distinct.

Source

Update: See Austin Buchanan’s post. He found a similar cube with smaller primes.

# Iterating sines

Pick a starting point x0 and define x1 = f(x0), x2 = f(x1), x3 = f(x2) etc. For which functions and which starting points does the sequence converge? If the sequence converges, how quickly does it converge? In general these are hard questions, but this post will answer a couple special cases.

If f(x) = sin(0.9x) and x0 = 0.5 + i, here’s what the iterations look like:

If the Taylor series for f(x) looks like a1x + a2x2 + a3x3 + … the iterates will converge if |a1| < 1 and your starting point is any complex number in a sufficiently small disk around the origin. The sequence might converge for starting points outside that disk, but there’s no guarantee.

The reason I chose sin(0.9x) for the example above rather than sin(x) is that the former has a1 = 0.9 and satisfies the theorem. The series for sin(x) has leading coefficient 1 and so the theorem doesn’t apply. For sin(x), the sequence of iterates does in fact converge starting at x0 = 0.5 + i but it diverges for other points, such as any purely imaginary starting point.

How fast do the iterates of f(x) converge if |a1| < 1? There is some constant b with |a1| < b < 1 such that | xn | < bn |x0 |. In other words, the sequence converges geometrically.

If we look at iterates of sin(x), the sequence converges for any starting value on the real line. The convergence is slower, on the order of 1/√n. And for points off the real line, the sequence may or may not converge.

Source: Asymptotic Methods in Analysis (Dover)

Update: See Mike Croucher’s blog post for a plot of the starting points where the sine iterates converge, repleat with Mathematica code.

# Perl as a better …

Today I ran across Minimal Perl: For UNIX and Linux People. The book was published a few years ago but I hadn’t heard of it because I haven’t kept up with the Perl world. The following chapters from the table of contents jumped out at me because I’ve been doing a fair amount of awk and sed lately.:

3. Perl as a (better) grep command
4. Perl as a (better) sed command
5. Perl as a (better) awk command
6. Perl as a (better) find command

These chapters can be read a couple ways. The most obvious reading would be “Learn a few features of Perl and use it as a replacement for a handful of separate tools.”

But if you find these tools familiar and are not looking to replace them, you could read the book as saying “Here’s an introduction to Perl that teaches you the language by comparing it to things you already know well.”

The book suggests learning one tool instead of several, and in the bargain getting more powerful features, such as more expressive pattern matching. It also suggests not necessarily committing to learn the entire enormous Perl language, and not necessarily committing to use Perl for every programming task.

Regarding Perl’s pattern matching, I could relate to the following quip from the book.

What the only thing worse than not having a particular metacharacter … in a pattern-matching utility? Thinking you do, when you don’t! Unfortunately, that’s a common problem when using Unix utilities for pattern matching.

That was my experience just yesterday. I wrote a regular expression containing \d for a digit and couldn’t understand why it wasn’t matching.

Most of the examples rely on giving Perl command line options such as -e so that it acts more like command line utility. The book gives numerous examples carrying out common tasks in grep etc. and with Perl one-liners. The latter tend to be a little more verbose. If a task falls in the sweet spot of a common tool, that tool’s syntax will be more succinct. But when a task falls outside that sweet spot, such as matching a pattern that cannot be easily expressed with traditional regular expressions, the Perl solution will be shorter.

Related posts:

# Why are differentiable complex functions infinitely differentiable?

Complex analysis is filled with theorems that seem too good to be true. One is that if a complex function is once differentiable, it’s infinitely differentiable. How can that be? Someone asked this on math.stackexchange and this was my answer.

The existence of a complex derivative means that locally a function can only rotate and expand. That is, in the limit, disks are mapped to disks. This rigidity is what makes a complex differentiable function infinitely differentiable, and even more, analytic.

For a complex derivative to exist, it has to exist and have the same value for all ways the “h” term can go to zero in (f(z+h) – f(z))/h. In particular, h could approach 0 along any radial path, and that’s why an analytic function must map disks to disks in the limits.

By contrast, an infinitely differentiable function of two real variables could map a disk to an ellipse, stretching more in one direction than another. An analytic function can’t do that.

A smooth function of two variables could also flip a disk over, such as f(x, y) = (x, -y). An analytic function can’t do that either. That’s why complex conjugation is not an analytic function.

You might think that if complex differentiability is so restrictive, there must not be many complex differentiable functions. And yet nearly every function you’ve run across in school — trig functions, logs, polynomials, classical probability distributions, etc. — are all differentiable when extended to functions of a complex variable. According to this quote, “95 percent, if not 100 percent, of the functions studied by physics, engineering, and even mathematics students” are hypergeometric, a very special case of complex differentiable functions.

Contact me if you’d like help with complex analysis.

# How to avoid shell scripting

Suppose you know a scripting language (Perl, Python, Ruby, etc) and you’d rather not learn shell scripting (bash, PowerShell, batch, etc.). Or maybe you know shell scripting on one platform and don’t want to take the time right now to learn shell scripting on another platform. For example, maybe you know bash on Linux but don’t want to learn PowerShell on Windows, or vice versa.

One strategy would be to use your preferred language to generate shell scripts. Shell scripts are trivial when they’re just a list of commands: do this, do this, do this, etc. Where shell scripting gets more complicated is when you have variables, branching logic, library calls, all the stuff you already know how to do in another language. Maybe you could do all the complicated logic in your “native language” and just generate a shell script that’s simply as list of instructions with no other logic.

Another strategy is to make system calls from your preferred language. Most scripting languages have a system() function that takes a string and executes it as a system command. The advantage of this approach is that it could have conditional logic that the code generation approach could not handle. The disadvantage is that you have to sort out what process the system() call is running under etc.

Maybe you want to learn shell scripting, but you need to get work done now that you don’t yet know how to do. One of these strategies could buy you some time. You might transition, for example, from Python to PowerShell by generating more sophisticated shell scripts over time and writing simpler generator code until you just write scripts directly.

* * *

# Calculating entropy

For a set of positive probabilities pi summing to 1, their entropy is defined as

(For this post, log will mean log base 2, not natural log.)

This post looks at a couple questions about computing entropy. First, are there any numerical problems computing entropy directly from the equation above?

Second, imagine you don’t have the pi values directly but rather counts ni that sum to N. Then pi = ni/N. To apply the equation directly, you’d first need to compute N, then go back through the data again to compute entropy. If you have a large amount of data, could you compute the entropy in one pass?

To address the second question, note that

so you can sum ni and ni log ni in the same pass.

One of the things you learn in numerical analysis is to look carefully at subtractions. Subtracting two nearly equal numbers can result in a loss of precision. Could the numbers above be nearly equal? Maybe if the ni are ridiculously large. Not just astronomically large — astronomically large numbers like the number of particles in the universe are fine — but ridiculously large, numbers whose logarithms approach the limits of machine-representable numbers. (If we’re only talking about numbers as big as the number of particles in the universe, their logs will be at most three-digit numbers).

Now to the problem of computing the sum of ni log ni. Could the order of the terms matter? This also applies to the first question of the post if we look at summing the pi log pi. In general, you’ll get better accuracy summing a lot positive numbers by sorting them and adding from smallest to largest and worse accuracy by summing largest to smallest. If adding a sorted list gives essentially the same result when summed in either direction, summing the list in any other order should too.

To test the methods discussed here, I used two sets of count data, one on the order of a million counts and the other on the order of a billion counts. Both data sets had approximately a power law distribution with counts varying over seven or eight orders of magnitude. For each data set I computed the entropy four ways: two equations times two orders. I convert the counts to probabilities and use the counts directly, and I sum smallest to largest and largest to smallest.

For the smaller data set, all four methods produced the same answer to nine significant figures. For the larger data set, all four methods produced the same answer to seven significant figures. So at least for the kind of data I’m looking at, it doesn’t matter how you calculate entropy, and you might as well use the one-pass algorithm to get the result faster.

# New Twitter account for networks

As I mentioned a few days ago, I’m winding down three of my Twitter accounts.

But I have started a new one: NetworkFact. This account has tweets about graph theory, analysis of large networks, etc.

List of daily tip accounts