Today’s an international prime day

Today’s a prime day. Whether you write the date in American (MMDDYY), European (DDMMYY), or ISO (YYYYMMDD) format, you get a prime. That is, 112913 and 291113 and 20131129 are all prime numbers.

We’ll call a date an American prime date if MMDDYY is prime, a European prime date if DDMMYY is prime, and an ISO prime date if YYYYMMDD is prime. (Single-digit days and months are padded with a zero.) If a date is prime by all three criteria, we’ll call it an international prime date. Today is an international prime date, and there won’t be another one until August 11, 2019.

If a date is an American prime date and a European prime date, we’ll call it a transatlantic prime date. After today, the next transatlantic prime dates are December 4 and 13 this year. There will be no transatlantic prime dates in 2014, 2015, and 2016 since these dates correspond to numbers that are divisible by either 2 or 5. The first transatlantic prime date of 2017 will be January 16.

We got the definition wrong

When I was in grad school, I had a course in Banach spaces with Haskell Rosenthal. One day he said “We got the definition wrong.” It took a while to understand what he meant.

There’s nothing logically inconsistent about the definition of Banach spaces. What I believe he meant is that the definition is too broad to permit nice classification theorems.

I had intended to specialize in functional analysis in grad school, but my impression after taking that course was that researchers in the field, at least locally, were only interested in questions of the form “Does every Banach space have the property …” In my mind, this translated to “Can you construct a space so pathological that it lacks a property enjoyed by every space that anyone cares about?” This was not for me.

I ended up studying differential equations. I found it more interesting to use Banach spaces to prove theorems about PDEs than to study them for their own sake. From my perspective there was nothing wrong with their definition.

Related post: Remembering Ted Odell

Beneficial but not sufficient

The phrase necessary but not sufficient refers to something that you’ve got to have, but it isn’t enough. For example, being divisible by 2 is a necessary but not sufficient condition for being divisible by 6. Odd numbers are not divisible by 6, so being even is necessary. But evenness is not sufficient because, for example, 8 is an even number not divisible by 6.

Wrongly believing that nice theoretical properties are sufficient for a good model is known as a reification error. I don’t know of a name for wrongly believing theoretical properties are necessary. Believing theoretical criteria are sufficient when they’re not is a sophomoric error. Believing theoretical criteria are necessary when they’re not is a more subtle error.

Maybe it would be helpful to use a phrase like “beneficial but not sufficient” to indicate that some property increases our confidence in a model, though it may not be necessary.

Book review: Practical Data Analysis

Many people have drawn Venn diagrams to locate machine learning and related ideas in the intellectual landscape. Drew Conway’s diagram may have been the first. It has at least been frequently referenced.

By this classification, Hector Cuesta’s new book Practical Data Anaysis is located toward the “hacking skills” corner of the diagram. No single book can cover everything, and this one emphasizes practical software knowledge more than mathematical theory or details of a particular problem domain.

The biggest strength of the book may be that it brings together in one place information on tools that are used together but whose documentation is scattered. The book is great source for sample code. The source code  is available on GitHub, though it’s more understandable in the context of the book.

Much of the book uses Python and related modules and tools including:

  • NumPy
  • mlpy
  • PIL
  • twython
  • Pandas
  • NLTK
  • IPython
  • Wakari

It also uses D3.js (with JSON, CSS, HTML, …), MongoDB (with MapReduce, Mongo Shell, PyMongo, …), and miscellaneous other tools and APIs.

There’s a lot of material here in 360 pages, making it a useful reference.

Applicable math

When I was in college, my advisor and I published a paper in a journal called “Applicable Analysis.” At the time, I thought that was a good name for a journal. It suggested research that was toward the applied end of the spectrum but not tied to a specific application.

Now when I hear “applicable analysis” I wonder what inapplicable analysis or inapplicable math in general would be. I’d hesitate to call any area of math inapplicable. Certainly some areas of math are applied more frequently and more directly than others, but I’ve been repeatedly surprised by useful applications of areas of math not traditionally classified as “applied.”

 

 

Convenient and innocuous priors

Andrew Gelman has some interesting comments on non-informative priors this morning. Rather than thinking of the prior as a static thing, think of it as a way to prime the pump.

… a non-informative prior is a placeholder: you can use the non-informative prior to get the analysis started, then if your posterior distribution is less informative than you would like, or if it does not make sense, you can go back and add prior information. …

At first this may sound like tweaking your analysis until you get the conclusion you want. It’s like the old joke about consultants: the client asks what 2+2 equals and the consultant counters by asking the client what he wants it to equal. But that’s not what Andrew is recommending.

A prior distribution cannot strictly be non-informative, but there are common intuitive notions of what it means to be non-informative. It may be helpful to substitute “convenient” or “innocuous” for “non-informative.” My take on Andrew’s advice is something like this.

Start with a prior distribution that’s easy to use and that nobody is going to give you grief for using. Maybe the prior doesn’t make much difference. But if your convenient/innocuous prior leads to too vague a conclusion, go back and use a more realistic prior, one that requires more effort or risks more criticism.

It’s odd that realistic priors can be more controversial than unrealistic priors, but that’s been my experience. It’s OK to be unrealistic as long as you’re conventional.

10 software tool virtues

Diomidis Spinellis gives a list of 10 software tool sins in The Tools at Hand episode of his Tools of the Trade podcast. Here are his points, but turned around. For each sin he lists, I give the opposite as a virtue.

10. Maintain API documentation with the source code.

9. Integrate unit testing in development.

8. Track bugs electronically.

7. Let the compiler do what it can do better than you.

6. Learn how to script your tools to work together.

5. Pay attention to compiler warnings and fix them.

4. Use a version control system.

3. Use tools to find definitions rather than scanning for them.

2. Use a debugger.

1. Use tools that eliminate repetitive manual editing.

I turned the original list around because I believe it’s easier to agree that the things above are good than it is to see that their lack is bad. Some items are opposites, like #5: you either pay attention to warnings or you ignore them. But some are not, like #8. Tracking bugs electronically is a good idea, but I wouldn’t call tracking bugs on paper a “sin.”

Related post: Reducing development friction comments on another podcast from Diomidis Spinellis.

Why can you multiply different units but not add them?

It’s usually an error to add quantities that have different units. If you have an equation that adds meters and kilograms, for example, something is almost certainly wrong. A formula that multiples meters and kilograms might be correct, but not one that adds them.

Why is this? It’s because the laws of nature don’t depend on our choice of units. For example, F = ma whether you use English, metric, or any other system of units.

Changing units introduces a multiplicative factor. You convert inches to centimeters, for example, by multiplying by 2.54.

If you were to add a length to a mass, and say you cut your unit of length in half but left your unit of mass the same, this wouldn’t be a multiplicative change (unless the mass were zero). But when you multiply or divide quantities with different units, the result does scale multiplicatively.

 

Sensitive dependence on initial conditions

The following problem illustrates how the smallest changes to a problem can have large consequences. As explained at the end of the post, this problem is a little artificial, but it illustrates difficulties that come up in realistic problems.

Suppose you have a simple differential equation,

y'' - y = 0

where y(t) gives displacement in meters at a time measured in seconds. Assume initial conditions y(0) = 1 and y'(0) = -1.

Then the unique solution is y(t) = exp(-t). The solution is a decreasing, positive solution. But a numerical solution to the equation might turn around and start increasing, or it might go negative.

Suppose the initial conditions are y(0) = 1 as before but now y'(0) = -1 ± ε. You could think of the ε as a tiny measurement error in the initial velocity, or a limitation in representing the initial velocity as a number in a computer.

The following graph shows the solutions corresponding to ε = 10-6, 0, and -10-6.

plot of 3 solutions, epsilon positive, negative, zero

The exact solution is

 y(t) = \left(1 \mp \frac{\varepsilon}{2}\right)\exp(-t) \pm \frac{\varepsilon}{2}\exp(t)

If the initial velocity is exactly -1, then ε = 0 and we only have the solution exp(-t). But if the initial velocity is a little more than -1, there is an exponentially increasing component of the solution that will eventually overtake the exponentially decreasing component. And if the initial velocity is a little less than -1, there is a negative component increasing exponentially in magnitude. Eventually it will overtake the positive component and cause the solution to become negative.

The solution starts to misbehave (i.e. change direction or go negative) at

t = \frac{1}{2} \log\left( \frac{2}{\varepsilon} \mp 1 \right)

If ε is small, 2/ε is much larger than 1, and so the final 1 above makes little difference. So we could say the solution goes bad at approximately

t = \frac{1}{2} \log\left( \frac{2}{\varepsilon} \right)

whether the error is positive or negative. If ε is ±0.000001, for example, the solution goes bad around t = 7.25 seconds. (The term we dropped only effects the 6th decimal place.)

On a small time scale, less than 7.25 seconds, we would not notice the effect of the (initially) small exponentially growing component. But eventually it matters, a lot. We can delay the point where the solution goes bad by controlling the initial velocity more carefully, but this doesn’t buy us much time. If we make ε ten times smaller, we postpone the time when the solution goes bad to 8.41, only a little over a second later.

[Looking at the plot, the solution does not obviously go wrong at t = 7.25. But it does go qualitatively wrong at that point: a solution that should be positive and decreasing becomes either negative or increasing.]

Trying to extend the quality of the solution by making ε smaller is a fool’s errand. Every order of magnitude decrease in ε only prolongs the inevitable by an extra second or so.

This is a somewhat artificial example. The equation and initial conditions were selected to make the sensitive dependence obvious. The sensitive dependence is itself sensitive: small changes to the problem, such as adding a small first derivative term, make it better behaved.

However, sensitive dependence is a practical phenomena. Numerical methods can, for example, pick up an initially small component of an increasing solution while trying to compute a decreasing solution. Sometimes the numerical sensitivity is a reflection of a sensitive physical system. But if the physical system is stable and the numerical solution is not, the problem may not be numerical. The problem may be a bad model. The numerical difficulties may be trying to tell you something, in which case increasing the accuracy of the numerical method may hide a more basic problem.

Related post: Approximating a solution that doesn’t exist

Elusive statistics

From Controversies in the Foundations of Statistics by Bradley Efron:

Statistics seems to be a difficult subject for mathematicians, perhaps because its elusive and wide-ranging character mitigates against the traditional theorem-proof method of presentation. It may come as some comfort then that statistics is also a difficult subject for statisticians.

Related posts:

Number theory determinant and SymPy

Let σ(n) be the sum of the positive divisors of n and let gcd(a, b) be the greatest common divisor of a and b.

Form an n by n matrix M whose (i, j) entry is σ(gcd(i, j)). Then the determinant of M is n!.

The following code shows that the theorem is true for a few values of n and shows how to do some common number theory calculations in SymPy.

from sympy import gcd, divisors, Matrix, factorial

def f(i, j):
    return sum( divisors( gcd(i, j) ) )

def test(n):
    r = range(1, n+1)
    M = Matrix( [ [f(i, j) for j in r] for i in r] )
    return M.det() - factorial(n)

for n in range(1, 11):
    print test(n)

As expected, the test function returns zeros.

If we replace the function σ above by τ where τ(n) is the number of positive divisors of n, the corresponding determinant is 1. To test this, replace sum by len in the definition of f and replace factorial(n) by 1.

In case you’re curious, both results are special cases of the following more general theorem. I don’t know whose theorem it is. I found it here.

For any arithmetic function f(m), let g(m) be defined for all positive integers m by

g(m) = \sum_{d \,\mid \,m} \mu(d) f\left(\frac{m}{d}\right)

Let M be the square matrix of order n with ij element f(gcd(i, j)). Then

\det M = \prod_i^n g(j)

Here μ is the Möbius function. The two special cases above correspond to g(m) = m and g(m) = 1.

I before E?

How well does the spelling rule “i before e except after c” hold? I searched the 5,000 most common English words (from here) to see.

70% of the words containing ‘ie” or “ei” follow the rule.

If you weigh the word counts by word frequency, the rule only holds 54% of the time.

There’s a longer version of the rule that adds “or when sounding as ‘a’ as in neighbor or weigh.”  This version holds for 79% of the words in my list. And when weighted by frequency, the rule holds 85% of the time.

Update: Here’s an even more accurate version from Merriam-Webster:

i before e,
except after c,
or when sounded as a,
as in ‘neighbor’ and ‘weigh’,
or when it appears in comparatives and superlatives like ‘fancier’,
or when the c sounds as sh as in ‘glacier’,
or when the vowel sounds like ee as in ‘seize’,
or i as in ‘height’,
or when it shows up in compound words such as ‘albeit’,
or when it shows up in -ing inflections of verbs that end in e, like queueing,
or occasionally in technical words that have a strong etymological link to their parent languages such as ‘cuneiform’ and ‘caffeine’,
and in numerous other random exceptions such as ‘science’, ‘forfeit’, and ‘weird.’

 

“I got the easy ones wrong”

This morning my daughter told me that she did well on a spelling test, but she got the easiest words wrong. Of course that’s not exactly true. The words that are hardest for her to spell are the ones she in fact did not spell correctly. She probably meant that she missed the words she felt should have been easy. Maybe they were short words. Children can be intimidated by long words, even though long words tend to be more regular and thus easier to spell.

Our perceptions of what is easy are often upside-down. We feel that some things should be easy even though our experience tells us otherwise.

Sometimes the trickiest parts of a subject come first, but we think that because they come first they should be easy. For example, force-body diagrams come at the beginning of an introductory physics class, but they can be hard to get right. Newton didn’t always get them right. More advanced physics, say celestial mechanics, is in some ways easier, or at least less error-prone.

“Elementary” and “easy” are not the same. Sometimes they’re opposites. Getting off the ground, so to speak, may be a lot harder than flying.

Mathematics and one-handed pottery

From Michael Atiyah’s essay Trends in Pure Mathematics.

In any given field of mathematics there are always some very fine points which present great technical challenges to the specialist but not are usually of interest to the general mathematician. To make an analogy, if you want to buy some hand-made pottery you are usually not interested in whether the potter used two hands or only one, though the potter would no doubt take great pride in his single-handed effort. In general the mathematical results that have the widest impact are not technically the most difficult.

This essay appears in the first volume of Atiyah’s collected works.

See his comments on this passage in this recent interview.

Hard-working lazy people

“It was a favorite theme of C. S. Lewis that only lazy people work hard. By lazily abdicating the essential work of deciding and directing, establishing values and setting goals, other people do it for us; then we find ourselves frantically, at the last minute, trying to satisfy a half dozen demands on our time, none of which is essential to our vocation, to stave off the disaster of disappointing someone.” — Eugene Peterson