Distance to Mars

The distance between the Earth and Mars depends on their relative positions in their orbits and varies quite a bit over time. This post will show how to compute the approximate distance over time. We’re primarily interested in Earth and Mars, though this shows how to calculate the distance between any two planets.

The planets have elliptical orbits with the sun at one focus, but these ellipses are nearly circles centered at the sun. We’ll assume the orbits are perfectly circular and lie in the same plane. (Now that Pluto is not classified as a planet, we can say without qualification that the planets have nearly circular orbits. Pluto’s orbit is much more elliptical than any of the planets.)

We can work in astronomical units (AUs) so that the distance from the Earth to the sun is 1. We can also work in units of years so that the period is also 1. Then we could describe the position of the Earth at time t as exp(2πit).

Mars has a larger orbit and a longer period. By Kepler’s third law, the size of the orbit and the period are related: the square of the period is proportional to the cube of the radius. Because we’re working in AUs and years, the proportionality constant is 1. If we denote the radius of Mars’ orbit by r, then its orbit can be described by

r exp(2πi (r-3/2 t ))

Here we pick our initial time so that at t = 0 the two planets are aligned.

The distance between the planets is just the absolute value of the difference between their positions:

| exp(2πit) – r exp(2πi (r-3/2 t)) |

The following code computes and plots the distance from Earth to Mars over time.

from scipy import exp, pi, absolute, linspace
import matplotlib.pyplot as plt

    def earth(t):
        return exp(2*pi*1j*t)

    def mars(t):
        r = 1.524 # semi-major axis of Mars orbit in AU
        return r*exp(2*pi*1j*(r**-1.5*t))

    def distance(t):
        return absolute(earth(t) - mars(t))

    x = linspace(0, 20, 1000)
    plt.plot(x, distance(x))
    plt.xlabel("Time in years")
    plt.ylabel("Distance in AU")
    plt.ylim(0, 3)

And the output looks like this:

Notice that the distance varies from about 0.5 to about 2.5. That’s because the radius of Mars’ orbit is about 1.5 AU. So when the planets are exactly in phase, they are 0.5 AU apart and when they’re exactly out of phase they are 2.5 AU apart. In other words the distance ranges from 1.5 – 1 to 1.5 + 1.

The distance function seems to be periodic with period about 2 years. We can do a little calculation by hand to show that is the case and find the period exactly.

The distance squared is the distance times its complex conjugate. If we let ω = -3/2 then the distance squared is

d2(t) = (exp(2πit) – r exp(2πiωt)) (exp(-2πit) – r exp(-2πiωt))

which simplifies to

1 + r2 – 2r cos(2π(1 – ω)t)

and so the (squared) distance is periodic with period 1/(1 – ω) = 2.13.

Notice that the plot of distance looks more angular at the minima and more rounded near the maxima. Said another way, the distance changes more rapidly when the planets leave their nearest approach than their furthest approach. You can prove this by taking square root of d2(t) and computing its derivative.

Let f(t) = 1 + r2 – 2r cos(2π(1 – ω)t). By the chain rule, the derivative of the square root of  f(t) is 1/2  f(t)-1/2 f‘(t). Near a maximum or a minimum, f‘(t) takes on the same values. But the term f(t)-1/2 is largest when f(t) is smallest and vice versa because of the negative exponent.

Impulse response

You may expect that a burst of input will cause a burst of output. Sometimes that’s the case, but often a burst of input results in a long, smoothly decreasing succession of output. You may not get immediate results, but long-term results. This is true of life in general, but it’s also true in a precise sense of differential equations.

One of the surprises from differential equations is that an infinitely concentrated input usually results in a diffuse output. A fundamental solution to a differential equation is a solution to the equation with a Dirac delta as the forcing function. In a sense, your input is so concentrated that it’s not actually a function. And yet the output may be a nice continuous function, and not one that is not particularly concentrated.

The situation is analogous to striking a bell. The input, the hammer blow to the bell, is extremely short, but the response of the bell is long and smooth. Solving a differential equation with a delta function as input is like learning about a bell by listening to how it rings when you strike it. A better analogy would be striking the bell in many places; a fundamental solution actually solves for a delta function with a position argument, not just a single delta function.

If you’re curious how this informal talk of “infinitely concentrated” input and delta “functions” can be made rigorous, start by reading this post.

Related post: Life lessons from differential equations

Permutations and tests

Suppose a test asks you to place 10 events in chronological order. Label these events A through J so that chronological order is also alphabetical order.

If a student answers BACDEFGHIJ, then did they make two mistakes or just one? Two events are in the wrong position, but they made one transposition error. The simplest way to grade such a test would be to count the number of events that are in the correct position. Is this the most fair way to grade?

If you decide to count how many transpositions are needed to correct a student’s answer, do you count any transposition or only adjacent transpositions? For example, if someone answered JBCDEFGHIA, then transposing the A and the J is enough to put the results in order. But reversing the first and last event seems like a bigger mistake than reversing the first two events. Counting only adjacent transpositions would penalize this mistake more. You would have to swap the J with each of the eight letters between J and A. But it hardly seems that answering JBCDEFGHIA is eight times worse than answering BACDEFGHIJ.

Maybe counting transpositions is too much work. So we just go back to counting how many events are in the right place. But then suppose someone answers JABCDEFGHI. This is completely wrong since every event is in the wrong position. But the student obviously knows something, since the relative order of nearly all of the events is correct. From one perspective there was only one mistake: J comes last, not first.

What is the worst possible answer? Maybe getting the order exactly backward? If you have an odd number of events, then getting the order backward means one event is in the right place, and so that doesn’t receive the lowest possible score.

This is an interesting problem beyond grading exams. (As for grading exams, I’d suggest simply not using questions of this type on an exam.) In manufacturing, how serious a mistake is it to reverse two consecutive components versus two distant components? You could also ask the same question when comparing DNA sequences or other digital signals. The best way to assign a distance between the actual and desired sequence would depend entirely on context.

How did our ancestors sleep?

Electric lighting has changed the way we sleep, encouraging us to lose sleep by staying awake much longer after dark than we otherwise would.

Or maybe not. A new study of three contemporary hunter-gatherer tribes found that they stay awake long after dark and sleep an average of 6.5 hours a night. They also don’t nap much [1]. This suggests the way we sleep may not be that different from our ancient forebears.

Historian A. Roger Ekirch suggested that before electric lighting it was common to sleep in two four-hour segments with an hour or so of wakefulness in between. His theory was based primarily on medieval English texts that refer to “first sleep” and “second sleep” and has other literary support as well. A small study found that subjects settled into the sleep pattern Ekirch predicted when they were in a dark room for 14 hours each night for a month. But the hunter-gatherers don’t sleep this way.

Maybe latitude is an important factor. The hunter-gatherers mentioned above live between 2 and 20 degrees south of the equator whereas England is 52 degrees north of the equator. Maybe two-phase sleep was more common at high latitudes with long winter nights. Of course there are many differences between modern/ancient [2] hunter-gatherers and medieval Western Europeans besides latitude.

Two studies have found two patterns of how people sleep without electric lights. Maybe electric lights don’t have as much impact on how people sleep as other factors.

Related post: Paleolithic nonsense

* * *

[1] The study participants were given something like a Fitbit to wear. The article said that naps less than 15 minutes would be below the resolution of the monitors, so we don’t know how often the participants took cat naps. We only know that they rarely took longer naps.

[2] There is an implicit assumption that the contemporary hunter-gatherers live and, in particular, sleep like their ancient ancestors. This seems reasonable, though we can’t be certain. There is also the bigger assumption that the tribesmen represent not only their ancestors but all paleolithic humans. Maybe they do, and we don’t have much else to go on, but we don’t know. I suspect there was more diversity in the paleolithic era than we assume.

Fibonacci formula for pi

Here’s an unusual formula for pi based on the product and least common multiple of the first m Fibonacci numbers.


\pi = \lim_{m\to\infty} \sqrt{\frac{6 \log F_1 \cdots F_m}{\log \mbox{lcm}( F_1, \ldots, F_m )}}

Unlike the formula I wrote about a few days ago relating Fibonacci numbers and pi, this one is not as simple to prove. The numerator inside the root is easy enough to estimate asymptotically, but estimating the denominator depends on the distribution of primes.

Source: Yuri V. Matiyasevich and Richard K. Guy, A new formula for π, American Mathematical Monthly, Vol 93, No. 8 (October 1986), pp. 631-635.


PACE: Property Assessed Clean Energy

Energy efficiency improvements can pay for themselves in the long run. Financing can make the improvements immediately cash-flow positive, but only if the loan tenor can match the useful life of the equipment. This enables the payments to be low enough that the projected energy savings exceeds the payments.

PACE, which stands for Property Assessed Clean Energy, is a nation-wide program that makes long-term financing available for energy upgrades, repaid through an annual assessment added to your property tax bill. Though it is a national initiative, each state must create its own PACE program, and so there is some variety in the forms PACE can take. Texas passed legislation in June 2013 authorizing local governments to implement PACE programs. Yesterday the Houston City Council unanimously passed a Resolution of Intent to adopt PACE.

I have been working with PACE Houston, a private company that develops PACE projects by providing strategic advice to property owners. If you’re in the Houston area and are interested in help with PACE financing, you could contact me, or go to the contact page on the PACE Houston web site.

A rose by any other name: Data science etc.

I help people make decisions in the face of uncertainty. Sounds interesting.

I’m a data scientist. Not sure what that means, but it sounds cool.

I study machine learning. Hmm. Maybe interesting, maybe a little ominous.

I’m into big data. Exciting or passé, depending on how many times you’ve heard the term.

Even though each of these descriptions makes a different impression, they’re all essentially the same thing. You could throw in a few more terms too, like artificial intelligence, inferential science, decision theory, or inverse probability.

There are distinctions. These terms don’t entirely overlap, but the overlap is huge. They all have to do with taking data and making an inference.

“Decision-making under uncertainty” emphasizes that you never have complete data, and yet you need to make decisions anyway. “Decision theory” emphasizes that the whole point of analyzing data is to do something as a result, and suggests that focusing directly on the decision itself, rather than proxies along the way, is the best way to do this.

“Data science” stresses that there is more to the process of making inferences than what falls under the traditional heading of “statistics.” Statistics has never been only about “the grotesque phenomenon generally known as mathematical statistics,” as Francis Anscombe described it. Things like data cleaning and visualization have always been part of the practice of statistics, though not the theory of statistics. Data science also emphasizes the role of computation. Some say a data scientist is a statistician who can program. Some say data science is statistics on a Mac.

Despite the hype around the term data science, it’s growing on me. It has its drawbacks, but so does every other name.

Machine learning, like decision theory, emphasizes the ultimate goal of doing something with data rather than creating an accurate model of the process that generates the data. If you can create such a model, so much the better. But it may not be necessary to have a great model in order to accomplish what you originally set out to do. “Naive Bayes,” for example, is a classification algorithm that is admittedly naive. It knowingly makes a gross simplification, assuming events are independent that we know are certainly not independent, and yet it often works well enough.

“Big data” is a big can of worms. It is often concerned with data sets that are indeed big, but it also implies other things, such as the way the data become available, as a real time stream rather than as a complete static set. See Erik Meijer’s Big data cube. And that’s just when the term “big data” is used in some fairly meaningful way. It’s also used so broadly as to be meaningless.

The term “statistics” literally means the mathematics of the interests of states, as in governments, because these were the first applications of statistics. So while “statistics” may be the most established and perhaps most respectable term discussed here, it’s not great. As I remarked here, “The term statistics would be equivalent to governmentistics, a historically accurate but otherwise useless term.” Statistics emphasizes probability models and mathematical rigor more than other variations on data analysis do. Statisticians criticize machine learning folks for being sloppy. Machine learning folks criticize statisticians for being too conservative, or for being too focused on description and not focused enough on prediction.

Bayesian statistics is much older than what is now sometimes called “classical” statistics. It was essential dormant during the first half of the 20th century before experiencing a renaissance in the second half of the century. Bayesian statistics was originally called “inverse probability” for good reason. Probability theory takes the probabilities of events as given and makes inferences about possible outcomes. Bayesian statistics does the inverse, taking data as given and inferring the probabilities that lead to the data. All statistics does something like this, but Bayesian statistics is consistent in forming all inference directly as probabilities. Frequetist (“classical”) statistics also infers probabilities, but the results, things like p-values and confidence intervals, are not the probabilities of what most people think they are. See Anthony O’Hagan’s description here.

Data analysis has gone by many names over time, sometimes with meaningful distinctions and sometimes not. Often people make a distinction without a difference.

Fibonacci numbers, arctangents, and pi

Here’s an unusual formula for π. Let Fn be the nth Fibonacci number. Then

\pi = 4 \sum_{n=1}^\infty \arctan\left( \frac{1}{F_{2n+1}} \right)

As mysterious as this equation may seem, it’s not hard to prove. The arctangent identity

\arctan\left(\frac{1}{F_{2n+1}}\right) = \arctan\left(\frac{1}{F_{2n}}\right) - \arctan\left(\frac{1}{F_{2n+2}}\right)

shows that the sum telescopes, leaving only the first term, arctan(1) = π/4. To prove the arctangent identity, take the tangent of both sides, use the addition law for tangents, and use the Fibonacci identity

F_{n+1} F_{n-1} - F_n^2 = (-1)^n

See this post for an even more remarkable formula relating Fibonacci numbers and π.

Second languages and selection bias

When I was growing up, I was told that you could never become fluent in a second language, and I believed it. I had no reason not to. I didn’t know anybody who had become fluent at a second language, and I could think of plenty of people who had learned English as a second language but who weren’t fluent.

But how would I know if someone had learned English fluently? If they were fluent, I’d assume they were native speakers. I knew people had learned English as a second language because they weren’t fluent. This is selection bias, where the selection of data you see is influenced by the very thing you’re interested in.

A famous example of selection bias is that of British bombers returning from missions over Germany in World War II. Statistician Abraham Wald advised the RAF to add armor precisely where these bombers were not shot, reasoning that he was only able to inspect bombers that survived their missions. More on this story here.

When I was in college, I had a roommate who had learned Spanish fluently as a second language. I thought “That’s not possible!” though of course it is possible. I immediately began to wonder what else that I thought was impossible was merely difficulty.

Number of digits in n!

The other day I ran across the fact that 23! has 23 digits. That made me wonder how often n! has n digits.

There can only be a finite number of cases, because n! grows faster than 10n for n > 10, and it’s reasonable to guess that 23 might be the largest case. Turns out it’s not, but it’s close. The only cases where n! has n digits are 1, 22, 23, and 24. Once you’ve found these by brute force, it’s not hard to show that they must be the only ones because of the growth rate of n!.

Is there a convenient way to find the number of digits in n! without having to compute n! itself? Sure. For starters, the number of digits in the base 10 representation of a number x is

⌊ log10 x ⌋ + 1.

where ⌊ z ⌋ is the floor of z, the largest integer less than or equal to z. The log of the factorial function is easier to compute than the factorial itself because it won’t overflow. You’re more likely to find a function to compute the log of the gamma function than the log of factorial, and more likely to find software that uses natural logs than logs base 10. So in Python, for example, you could compute the number of digits with this:

from scipy.special import gammaln
from math import log, floor

def digits_in_factorial(n):
    return floor( gammaln(n+1)/log(10.0) ) + 1

What about a more elementary formula, one that doesn’t use the gamma function? If you use Stirling’s approximation for factorial and take log of that you should at least get a good approximation. Here it is again in Python:

from math import log, floor, pi

def stirling(n):
    return floor( ((n+0.5)*log(n) - n + 0.5*log(2*pi))/log(10) ) + 1

The code above is exact for every n > 2 as far as I’ve tested, up to n = 1,000,000. (Note that one million factorial is an extremely large number. It has 5,565,709 digits. And yet we can easily say something about this number, namely how many digits it has!)

The code may break down somewhere because the error in Stirling’s approximation or the limitations of floating point arithmetic. Stirling’s approximation gets more accurate as n increases, but it’s conceivable that a factorial value could be so close to a power of 10 that the approximation error pushes it from one side of the power of 10 to the other. Maybe that’s not possible and someone could prove that it’s not possible.

You could extend the code above to optionally take another base besides 10.

def digits_in_factorial(n, b=10):
    return floor( gammaln(n+1)/log(b) ) + 1

def stirling(n, b=10):
    return floor( ((n+0.5)*log(n) - n + 0.5*log(2*pi))/log(b) ) + 1

The code using Stirling’s approximation still works for all n > 2, even for b as small as 2. This is slightly surprising since the number of bits in a number is more detailed information than the number of decimal digits.

Data analysis vs statistics

John Tukey preferred the term “data analysis” over “statistics.” In his paper Data Anaysis, Computation and Mathematics, he explains why.

My title speaks of “data analysis” not “statistics”, and of “computation” not “computing science”; it does not speak of “mathematics”, but only last. Why? …

My brother-in-squared-law, Francis J. Anscombe has commented on my use of “data analysis” in the following words:

Whereas the content of Tukey’s remarks is always worth pondering, some of his terminology is hard to take. He seems to identify “statistics” with the grotesque phenomenon generally known as “mathematical statistics”, and finds it necessary to replace “statistical analysis” with “data analysis.”

(Tukey calls Anscombe his “brother-in-squared-law” because Anscombe was a fellow statistician as well as his brother-in-law. At first I thought Tukey had said “brother-in-law-squared”, which could mean his brother-in-law’s brother-in-law, but I suppose it was a pun on the role of least-square methods in statistics.)

Tukey later says

I … shall stick to this attitude today, and shall continue to use the words “data analysis”, in part to indicate that we can take probability seriously, or leave it alone, as may from time to time be appropriate or necessary.

It seems Tukey was reserving the term “statistics” for that portion of data analysis which is rigorously based on probability.

Technical arbitrage

There are huge opportunities to take technology that is well-known and undervalued in one context and apply it in another where it is unknown but valuable. You could call this technical arbitrage, analogous to financial arbitrage, taking advantage of the price difference of something in two markets.

As with financial arbitrage, the hard part is spotting opportunities, not necessarily acting on them. If you want to be a hero with regular expressions, as in the xkcd cartoon below, the key isn’t knowing regular expressions well. The key is knowing about regular expressions in a context where no one else does.

To spot a technical arbitrage opportunity, you need to know what that technology can and cannot (easily) do. You also need to recognize situations where the technology can help. You don’t need to be able to stand up to a quiz show style oral exam.


Related post: You can be a hero with a simple idea

The academic cocoon

In the novel Enchantment, the main character, Ivan, gives a bitter assessment of his choice of an academic career, saying it was for “men who hadn’t yet grown up.”

The life he had chosen was a cocoon. Surrounded by a web of old manuscripts and scholarly papers, he would achieve tenure, publish frequently, teach a group of carefully selected graduate students, be treated like a celebrity by the handful of people who had the faintest idea who he was, and go to his grave deluded into thinking he had achieved greatness when in fact he stayed in school all his life. Where was the plunge into the unknown?

I don’t believe the author meant to imply that an academic career is necessarily so insular. In the context of the quote, Ivan says his father, also a scholar, “hadn’t stayed in the cocoon” but had taken great risks. But there is certainly the danger of living in a tiny world and believing that you’re living in a much larger world. Others may face the same danger, though it seems particularly acute for academics.

It’s interesting that Ivan criticizes scholars for avoiding the unknown. Scholars would say they’re all about pursuing the unknown. But scholars pursue known unknowns, questions that can be precisely formulated within the narrow bounds of their discipline. The “plunge into the unknown” here refers to unknown unknowns, messier situations where not only are the outcomes unknown, even the risks themselves are unknown.

Doubly and triply periodic functions

A function f is periodic if there exists a constant period ω such that f(x) = f(x + ω) for all x. For example, sine and cosine are periodic with period 2π.

There’s only one way a function on the real line can be periodic. But if you think of functions of a complex variable, it makes sense to look at functions that are periodic in two different directions. Sine and cosine are periodic as you move horizontally across the complex plane, but not if you move in any other direction. But you could imagine a function that’s periodic vertically as well as horizontally.

A doubly periodic function satisfies f(x) = f(x + ω1) and f(x) = f(x + ω2) for all x and for two different fixed complex periods, ω1 and ω2, with different angular components, i.e. the two periods are not real multiples of each other. For example, the two periods could be 1 and i.

How many doubly periodic functions are there? The answer depends on how much regularity you require. If you ask that the functions be differentiable everywhere as functions of a complex variable (i.e. entire), the only doubly periodic functions are constant functions [1]. But if you relax your requirements to allow functions to have singularities, there’s a wide variety of functions that are doubly periodic. These are the elliptic functions. They’re periodic in two independent directions, and meromorphic (i.e. analytic except at isolated poles). [2]

What about triply periodic functions? If you require them to be meromorphic, then the only triply periodic functions are constant functions. To put it another way, if a meromorphic function is periodic in three directions, it’s periodic in every direction for every period, i.e. constant. If a function has three independent periods, you can construct a sequence with a limit point where the function is constant, and so it’s constant everywhere.

* * *

[1] Another way to put this is to say that elliptic functions must have at least one pole inside the parallelogram determined by the lines from the origin to ω1 and ω2. A doubly periodic function’s values everywhere are repeats of its values on this parallelogram. If the function were continuous over this parallelogram (i.e. with no poles) then it would be bounded over the parallelogram and hence bounded everywhere. But Liovuille’s theorem says a bounded entire function must be constant.

[2] We don’t consider arbitrary singularities, only isolated poles. There are doubly periodic functions with essential singularities, but these are outside the definition of elliptic functions.

Taking away a damaging tool

This is a post about letting go of something you think you need. It starts with an illustration from programming, but it’s not about programming.

Bob Martin published a dialog yesterday about the origin of structured programming, the idea that programs should not be written with goto statements but should use less powerful, more specialized ways to transfer control. Edsgar Dijkstra championed this idea, most famously in his letter Go-to statement considered harmful. Since then there have been countless “considered harmful” articles that humorously allude to Dijkstra’s letter.

Toward the end of the dialog, Uncle Bob’s interlocutor says “Hurray for Dijkstra” for inventing the new technology of structured programming. Uncle Bob corrects him

New Technology? No, no, you misunderstand. … He didn’t invent anything. What he did was to identify something we shouldn’t do. That’s not a technology. That’s a discipline.

        Huh? I thought Structured Programming made things better.

Oh, it did. But not by giving us some new tools or technologies. It made things better by taking away a damaging tool.

The money quote is the last line above: It made things better by taking away a damaging tool.

Creating new tools gets far more attention than removing old tools. How might we be better off by letting go of a tool? When our first impulse is that we need a new technology, might we need a new discipline instead?

Few people have ever been able to convince an entire profession to let go of a tool they assumed was essential. If we’re to have any impact, most of us will need to aim much, much lower. It’s enough to improve our personal productivity and possibly that of a few peers. Maybe you personally would be better off without something that is beneficial to most people.

What are some technologies you’ve found that you’re better off not using?

Related posts: