Today’s an international prime day

Posted on 29 November 2013 by John

Today’s a prime day. Whether you write the date in American (MMDDYY), European (DDMMYY), or ISO 8601 (YYYYMMDD) format, you get a prime. That is, 112913 and 291113 and 20131129 are all prime numbers.

We’ll call a date an American prime date if MMDDYY is prime, a European prime date if DDMMYY is prime, and an ISO prime date if YYYYMMDD is prime. (Single-digit days and months are padded with a zero.) If a date is prime by all three criteria, we’ll call it an international prime date. Today is an international prime date, and there won’t be another one until August 11, 2019.

If a date is an American prime date and a European prime date, we’ll call it a transatlantic prime date. After today, the next transatlantic prime dates are December 4 and 13 this year. There will be no transatlantic prime dates in 2014, 2015, and 2016 since these dates correspond to numbers that are divisible by either 2 or 5. The first transatlantic prime date of 2017 will be January 16.

We got the definition wrong

Posted on 26 November 2013 by John

When I was in grad school, I had a course in Banach spaces with Haskell Rosenthal. One day he said “We got the definition wrong.” It took a while to understand what he meant.

There’s nothing logically inconsistent about the definition of Banach spaces. What I believe he meant is that the definition is too broad to permit nice classification theorems.

I had intended to specialize in functional analysis in grad school, but my impression after taking that course was that researchers in the field, at least locally, were only interested in questions of the form “Does every Banach space have the property …” In my mind, this translated to “Can you construct a space so pathological that it lacks a property enjoyed by every space that anyone cares about?” This was not for me.

I ended up studying differential equations. I found it more interesting to use Banach spaces to prove theorems about PDEs than to study them for their own sake. From my perspective there was nothing wrong with their definition.

Beneficial but not sufficient

Posted on 26 November 2013 by John

The phrase necessary but not sufficient refers to something that you’ve got to have, but it isn’t enough. For example, being divisible by 2 is a necessary but not sufficient condition for being divisible by 6. Odd numbers are not divisible by 6, so being even is necessary. But evenness is not sufficient because, for example, 8 is an even number not divisible by 6.

Wrongly believing that nice theoretical properties are sufficient for a good model is known as a reification error. I don’t know of a name for wrongly believing theoretical properties are necessary. Believing theoretical criteria are sufficient when they’re not is a sophomoric error. Believing theoretical criteria are necessary when they’re not is a more subtle error.

Maybe it would be helpful to use a phrase like “beneficial but not sufficient” to indicate that some property increases our confidence in a model, though it may not be necessary.

Book review: Practical Data Analysis

Posted on 25 November 2013 by John

Many people have drawn Venn diagrams to locate machine learning and related ideas in the intellectual landscape. Drew Conway’s diagram may have been the first. It has at least been frequently referenced.

By this classification, Hector Cuesta’s new book Practical Data Analysis is located toward the “hacking skills” corner of the diagram. No single book can cover everything, and this one emphasizes practical software knowledge more than mathematical theory or details of a particular problem domain.

The biggest strength of the book may be that it brings together in one place information on tools that are used together but whose documentation is scattered. The book is great source for sample code. The source code is available on GitHub, though it’s more understandable in the context of the book.

Much of the book uses Python and related modules and tools including:

NumPy
mlpy
PIL
twython
Pandas
NLTK
IPython
Wakari

It also uses D3.js (with JSON, CSS, HTML, …), MongoDB (with MapReduce, Mongo Shell, PyMongo, …), and miscellaneous other tools and APIs.

There’s a lot of material here in 360 pages, making it a useful reference.

Applicable math

Posted on 22 November 2013 by John

When I was in college, my advisor and I published a paper in a journal called “Applicable Analysis.” At the time, I thought that was a good name for a journal. It suggested research that was toward the applied end of the spectrum but not tied to a specific application.

Now when I hear “applicable analysis” I wonder what inapplicable analysis or inapplicable math in general would be. I’d hesitate to call any area of math inapplicable. Certainly some areas of math are applied more frequently and more directly than others, but I’ve been repeatedly surprised by useful applications of areas of math not traditionally classified as “applied.”

Convenient and innocuous priors

Posted on 21 November 2013 by John

Andrew Gelman has some interesting comments on non-informative priors this morning. Rather than thinking of the prior as a static thing, think of it as a way to prime the pump.

… a non-informative prior is a placeholder: you can use the non-informative prior to get the analysis started, then if your posterior distribution is less informative than you would like, or if it does not make sense, you can go back and add prior information. …

At first this may sound like tweaking your analysis until you get the conclusion you want. It’s like the old joke about consultants: the client asks what 2+2 equals and the consultant counters by asking the client what he wants it to equal. But that’s not what Andrew is recommending.

A prior distribution cannot strictly be non-informative, but there are common intuitive notions of what it means to be non-informative. It may be helpful to substitute “convenient” or “innocuous” for “non-informative.” My take on Andrew’s advice is something like this.

Start with a prior distribution that’s easy to use and that nobody is going to give you grief for using. Maybe the prior doesn’t make much difference. But if your convenient/innocuous prior leads to too vague a conclusion, go back and use a more realistic prior, one that requires more effort or risks more criticism.

It’s odd that realistic priors can be more controversial than unrealistic priors, but that’s been my experience. It’s OK to be unrealistic as long as you’re conventional.

10 software tool virtues

Posted on 19 November 2013 by John

Diomidis Spinellis gives a list of 10 software tool sins in The Tools at Hand episode of his Tools of the Trade podcast. Here are his points, but turned around. For each sin he lists, I give the opposite as a virtue.

10. Maintain API documentation with the source code.

9. Integrate unit testing in development.

8. Track bugs electronically.

7. Let the compiler do what it can do better than you.

6. Learn how to script your tools to work together.

5. Pay attention to compiler warnings and fix them.

4. Use a version control system.

3. Use tools to find definitions rather than scanning for them.

2. Use a debugger.

1. Use tools that eliminate repetitive manual editing.

I turned the original list around because I believe it’s easier to agree that the things above are good than it is to see that their lack is bad. Some items are opposites, like #5: you either pay attention to warnings or you ignore them. But some are not, like #8. Tracking bugs electronically is a good idea, but I wouldn’t call tracking bugs on paper a “sin.”

Related post: Reducing development friction comments on another podcast from Diomidis Spinellis.

Why can you multiply different units but not add them?

Posted on 15 November 2013 by John

It’s usually an error to add quantities that have different units. If you have an equation that adds meters and kilograms, for example, something is almost certainly wrong. A formula that multiples meters and kilograms might be correct, but not one that adds them.

Why is this? It’s because the laws of nature don’t depend on our choice of units. For example, F = ma whether you use English, metric, or any other system of units.

Changing units introduces a multiplicative factor. You convert inches to centimeters, for example, by multiplying by 2.54.

If you were to add a length to a mass, and say you cut your unit of length in half but left your unit of mass the same, this wouldn’t be a multiplicative change (unless the mass were zero). But when you multiply or divide quantities with different units, the result does scale multiplicatively.

* * *

For daily tidbits about units, follow @UnitFact on Twitter.

Sensitive dependence on initial conditions

Posted on 12 November 2013 by John

The following problem illustrates how the smallest changes to a problem can have large consequences. As explained at the end of the post, this problem is a little artificial, but it illustrates difficulties that come up in realistic problems.

Suppose you have a simple differential equation,

y'' - y = 0

where y(t) gives displacement in meters at a time measured in seconds. Assume initial conditions y(0) = 1 and y'(0) = -1.

Then the unique solution is y(t) = exp(-t). The solution is decreasing and positive. But a numerical solution to the equation might turn around and start increasing, or it might go negative.

Suppose the initial conditions are y(0) = 1 as before but now y'(0) = -1 ± ε. You could think of the ε as a tiny measurement error in the initial velocity, or a limitation in representing the initial velocity as a number in a computer.

The following graph shows the solutions corresponding to ε = 10^-6, 0, and -10^-6.

plot of 3 solutions, epsilon positive, negative, zero

The exact solution is

$y(t) = \left(1 \mp \frac{\varepsilon}{2}\right)\exp(-t) \pm \frac{\varepsilon}{2}\exp(t)$

If the initial velocity is exactly -1, then ε = 0 and we only have the solution exp(-t). But if the initial velocity is a little more than -1, there is an exponentially increasing component of the solution that will eventually overtake the exponentially decreasing component. And if the initial velocity is a little less than -1, there is a negative component increasing exponentially in magnitude. Eventually it will overtake the positive component and cause the solution to become negative.

The solution starts to misbehave (i.e. change direction or go negative) at

$t = \frac{1}{2} \log\left( \frac{2}{\varepsilon} \mp 1 \right)$

If ε is small, 2/ε is much larger than 1, and so the final 1 above makes little difference. So we could say the solution goes bad at approximately

$t = \frac{1}{2} \log\left( \frac{2}{\varepsilon} \right)$

whether the error is positive or negative. If ε is ±0.000001, for example, the solution goes bad around t = 7.25 seconds. (The term we dropped only effects the 6th decimal place.)

On a small time scale, less than 7.25 seconds, we would not notice the effect of the (initially) small exponentially growing component. But eventually it matters, a lot. We can delay the point where the solution goes bad by controlling the initial velocity more carefully, but this doesn’t buy us much time. If we make ε ten times smaller, we postpone the time when the solution goes bad to 8.41, only a little over a second later.

[Looking at the plot, the solution does not obviously go wrong at t = 7.25. But it does go qualitatively wrong at that point: a solution that should be positive and decreasing becomes either negative or increasing.]

Trying to extend the quality of the solution by making ε smaller is a fool’s errand. Every order of magnitude decrease in ε only prolongs the inevitable by an extra second or so.

This is a somewhat artificial example. The equation and initial conditions were selected to make the sensitive dependence obvious. The sensitive dependence is itself sensitive: small changes to the problem, such as adding a small first derivative term, make it better behaved.

However, sensitive dependence is a practical phenomena. Numerical methods can, for example, pick up an initially small component of an increasing solution while trying to compute a decreasing solution. Sometimes the numerical sensitivity is a reflection of a sensitive physical system. But if the physical system is stable and the numerical solution is not, the problem may not be numerical. The problem may be a bad model. The numerical difficulties may be trying to tell you something, in which case increasing the accuracy of the numerical method may hide a more basic problem.

Elusive statistics