As much as we admire simplicity and strive for simplicity, something in us isn’t happy when we achieve it.
Sometimes we’re disappointed with a simple solution because, although we don’t realize it yet, we didn’t properly frame the problem it solves.
I’ve been in numerous conversations where someone says, in effect, “I understand that 2+3 = 5, but what if we made it 5.1?” They really want an answer of 5.1, or maybe larger, for reasons they can’t articulate. They formulated a problem whose solution is to add 2 and 3, but that formulation left out something they care about. In this situation, the easy response is to say “No, 2+3 = 5. There’s nothing we can do about that.” The more difficult response is to find out why “5” is an unsatisfactory result.
Sometimes we’re uncomfortable with a simple solution even though it does solve the right problem.
If you work hard and come up with a simple solution, it may look like you didn’t put in much effort. And if someone else comes up with the simple solution, you may look foolish.
Sometimes simplicity is disturbing. Maybe it has implications we have to get used to.
Update: A couple people have replied via Twitter saying that we resist simplicity because it’s boring. I think beneath that is that we’re not ready to move on to a new problem.
When you’re invested in a problem, it can be hard to see it solved. If the solution is complicated, you can keep working for a simpler solution. But once someone finds a really simple solution, it’s hard to justify continuing work in that direction.
A simple solution is not something to dwell on but to build on. We want some things to be boringly simple so we can do exciting things with them. But it’s hard to shift from producer to consumer: Now that I’ve produced this simple solution, and still a little sad that it’s wrapped up, how can I use it to solve something else?
A Menger sponge is created by starting with a cube and recursively removing chunks of it. Draw a 3×3 grid on one face of the cube, then remove the middle square, all the way through the cube. Then do the same for each of the eight remaining squares. Repeat this process over and over, and do it for each face.
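The construction can be sketched in a few lines of Python. This is only an illustration, not how the video was made: instead of literally carving the cube, it uses the standard base-3 characterization of the sponge, in which a small cell is removed if, at any level of the recursion, at least two of its three coordinates have base-3 digit 1. The function names `in_sponge` and `count_cells` are made up for this example.

```python
from itertools import product

def in_sponge(x, y, z, level):
    """Return True if cell (x, y, z) of a 3^level grid survives the carving."""
    for _ in range(level):
        # A cell lies in a removed shaft when at least two of its
        # coordinates have base-3 digit 1 at the current scale.
        if (x % 3 == 1) + (y % 3 == 1) + (z % 3 == 1) >= 2:
            return False
        x, y, z = x // 3, y // 3, z // 3
    return True

def count_cells(level):
    """Count the cells that remain after `level` rounds of removal."""
    n = 3 ** level
    return sum(in_sponge(x, y, z, level)
               for x, y, z in product(range(n), repeat=3))

for level in range(3):
    print(level, count_cells(level))
```

Each round keeps 20 of every 27 sub-cubes, so the count after k rounds should be 20^k.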
The holes are all rectangular, so it’s surprising that the geometry is so varied when you slice open a Menger sponge. For example, when you cut it on the diagonal, you can see stars! (I wrote about this here.)
I mentioned this blog post to a friend at Go 3D Now, a company that does 3D scanning and animation, and he created the video below. The video starts out by taking you through the sponge, then at about the halfway point the sponge splits apart.
Harmonic numbers are sort of a discrete analog of logarithms since

$$H_n = \sum_{k=1}^n \frac{1}{k} \quad\text{and}\quad \log n = \int_1^n \frac{dx}{x}.$$

As n goes to infinity, the difference between $H_n$ and log n converges to Euler’s constant γ = 0.57721…
How would you compute $H_n$? For small n, simply use the definition. But if n is very large, there’s a way to approximate $H_n$ without having to do a large sum.
Since in the limit $H_n$ – log n goes to γ, a crude approximation would be

$$H_n \approx \log n + \gamma.$$

But we could do much better by adding a couple terms to the approximation above. That is,

$$H_n \approx \log n + \gamma + \frac{1}{2n} - \frac{1}{12n^2}.$$

The error in the approximation above is between 0 and $1/(120n^4)$.
So if you used this to compute the 1000th harmonic number, the error would be less than one part in 120,000,000,000,000. Said another way, for n = 1000 the approximation differs from the exact value in the 15th significant digit, approximately the resolution of floating point numbers (i.e. IEEE 754 double precision).
And the formula is even more accurate for larger n. If we wanted to compute the millionth harmonic number, the error in our approximation would be somewhere around the 26th decimal place.
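Here is a small Python sketch comparing the definition with the approximation log n + γ + 1/(2n) − 1/(12n²), the two-term correction whose error bound of 1/(120n⁴) is quoted above. The function names and the hard-coded value of γ are choices made for this example. For large n the difference falls below what double precision can resolve, so the comparison is only meaningful for modest n.

```python
import math

# Euler's constant, rounded to double precision
EULER_GAMMA = 0.5772156649015329

def harmonic_exact(n):
    """H_n computed from the definition, using an accurate float sum."""
    return math.fsum(1.0 / k for k in range(1, n + 1))

def harmonic_approx(n):
    """log n + gamma + 1/(2n) - 1/(12 n^2)."""
    return math.log(n) + EULER_GAMMA + 1 / (2 * n) - 1 / (12 * n**2)

for n in (5, 10, 20):
    error = harmonic_exact(n) - harmonic_approx(n)
    bound = 1 / (120 * n**4)
    print(f"n = {n:3d}  error = {error:.3e}  bound = {bound:.3e}")
```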
I’ve written quite a few pages that are separate from the timeline of the blog. These are a little hidden, not because I want to hide them, but because you can’t make everything equally easy to find. These notes cover a variety of topics:
The study of the planet Mercury provides two examples of the bandwagon effect. In her new book Worlds Fantastic, Worlds Familiar, planetary astronomer Bonnie Buratti writes
The study of Mercury … illustrates one of the most confounding bugaboos of the scientific method: the bandwagon effect. Scientists are only human, and they impose their own prejudices and foregone conclusions on their experiments.
Around 1800, Johann Schroeter determined that Mercury had a rotational period of 24 hours. This view held for eight decades.
In the 1880s, Giovanni Schiaparelli determined that Mercury was tidally locked, making one rotation on its axis for every orbit around the sun. This view also held for eight decades.
In 1965, radar measurements of Mercury showed that Mercury completes 3 rotations in every 2 orbits around the sun.
Studying Mercury is difficult since it is only visible near the horizon and around sunrise and sunset, i.e. when the sun’s light interferes. And it is understandable that someone would confuse a 3:2 resonance with tidal locking. Still, for two periods of eight decades each, astronomers looked at Mercury and concluded what they expected.
The difficulty of seeing Mercury objectively was compounded by two incorrect but satisfying metaphors. First that Mercury was like Earth, rotating every 24 hours, then that Mercury was like the moon, orbiting the sun the same way the moon orbits Earth.
Buratti mentions the famous Millikan oil drop experiment as another example of the bandwagon effect.
… Millikan’s value for the electron’s charge was slightly in error—he had used a wrong value for the viscosity of air. But future experimenters all seemed to get Millikan’s number. Having done the experiment myself I can see that they just picked those values that agreed with previous results.
Buratti explains that Millikan’s experiment is hard to do and “it is impossible to successfully do it without abandoning most data.” This is what I like to call acceptance-rejection modeling.
Acceptance-rejection modeling: Throw out data that don’t fit with your model, and what’s left will.
The name comes from the acceptance-rejection method of random number generation. For example, the obvious way to generate truncated normal random values is to generate (unrestricted) normal random values and simply throw out the ones that lie outside the interval we’d like to truncate to. This is inefficient if we’re truncating to a small interval, but it always works. We’re conforming our samples to a pre-determined distribution, which is OK when we do it intentionally. The problem comes when we do it unintentionally.
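A minimal sketch of the intentional version, using only Python's standard library: draw unrestricted normal values and keep the ones that land in the interval. The function name, its parameters, and the seeds are all made up for this example. It also counts how many draws were needed, which shows the inefficiency for narrow intervals.

```python
import random

def sample_truncated_normal(n, a, b, mu=0.0, sigma=1.0, seed=None):
    """Return n normal samples restricted to [a, b], plus the number of draws used."""
    rng = random.Random(seed)
    samples, draws = [], 0
    while len(samples) < n:
        x = rng.gauss(mu, sigma)   # unrestricted normal proposal
        draws += 1
        if a <= x <= b:            # accept only values inside the interval
            samples.append(x)
    return samples, draws

wide, wide_draws = sample_truncated_normal(1000, -1.0, 2.0, seed=42)
narrow, narrow_draws = sample_truncated_normal(100, 2.5, 3.0, seed=1)
print(f"wide interval:   kept {len(wide)} of {wide_draws} draws")
print(f"narrow interval: kept {len(narrow)} of {narrow_draws} draws")
```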
This post serves two purposes. It will empirically explore a question in number theory raised in the previous post. And if you’re not familiar with quantile-quantile (q-q) plots, it will serve as an introduction to such plots.
The previous post said that for almost all x > 1, the fractional parts of the powers of x are uniformly distributed. Although this is true for almost all x, it can be hard to establish for any particular x. The previous post ended with the question of whether the fractional parts of the powers of 3/2 are uniformly distributed.
First, let’s just plot the sequence $(3/2)^n$ mod 1.
Looks kinda random. But is it uniformly distributed? One way to tell would be to look at the empirical cumulative distribution function (ECDF) and see how it compares to a uniform cumulative distribution function. This is what a quantile-quantile plot does. In our case we’re looking to see whether something has a uniform distribution, but you could use a q-q plot for any distribution. It may be most often used to test normality by looking at whether the ECDF looks like a normal CDF.
If a sequence is uniformly distributed, we would expect 10% of the values to be less than 0.1. We would expect 20% of the values to be less than 0.2. Etc. In other words, we’d expect the quantiles to line up with their theoretical values, hence the name “quantile-quantile” plot. On the horizontal axis we plot uniform values between 0 and 1. On the vertical axis we plot the sorted values of $(3/2)^n$ mod 1.
A q-q plot indicates a good fit when values line up near the diagonal, as they do here.
For contrast, let’s look at a q-q plot for the powers of the plastic constant mod 1.
Here we get something very far from the diagonal line. The plot is flat on the left because many of the values are near 0, and it’s flat on the right because many values are near 1.
Incidentally, the Kolmogorov-Smirnov goodness of fit test is basically an attempt to quantify the impression you get from looking at a q-q plot. It’s based on a statistic that measures how far apart the empirical CDF and theoretical CDF are.
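The following Python sketch shows one way to produce the data behind such a q-q plot and the corresponding Kolmogorov-Smirnov statistic for powers of 3/2. It is not the code used to make the plots in this post, and the function names are invented for the example. The fractional part of (3/2)^n is computed exactly with integer arithmetic as (3^n mod 2^n) / 2^n, which avoids the floating point trouble that large powers would otherwise cause.

```python
def frac_three_halves(n):
    """Fractional part of (3/2)^n, computed exactly before converting to float."""
    return (3**n % 2**n) / 2**n

def qq_points(values):
    """Pair uniform quantiles (i - 0.5)/N with the sorted sample values."""
    xs = sorted(values)
    N = len(xs)
    return [((i + 0.5) / N, x) for i, x in enumerate(xs)]

def ks_statistic_uniform(values):
    """Largest gap between the ECDF of the sample and the uniform CDF on [0, 1]."""
    xs = sorted(values)
    N = len(xs)
    return max(max((i + 1) / N - x, x - i / N) for i, x in enumerate(xs))

N = 1000
values = [frac_three_halves(n) for n in range(1, N + 1)]
points = qq_points(values)           # (theoretical, observed) pairs to plot
print("K-S statistic:", ks_statistic_uniform(values))
```

Plotting `points` with any plotting library and adding the line y = x gives the q-q plot. A small K-S statistic corresponds to points hugging the diagonal.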
A few days ago I wrote about how powers of the golden ratio are nearly integers but powers of π are not. This post is similar but takes a little different perspective. Instead of looking at how close powers are to the nearest integers, we’ll look at how close they are to their floor, the largest integer below. Put another way, we’ll throw away the integer parts and look at the decimal parts.
First a theorem:
For almost all x > 1, the sequence ($x^n$) for n = 1, 2, 3, … is u.d. mod 1.
Here “almost all” is a technical term meaning that the set of x’s for which the statement above does not hold has Lebesgue measure zero. The abbreviation “u.d.” stands for “uniformly distributed.” A sequence is uniformly distributed mod 1 if the fractional parts of the sequence are distributed like uniform random variables.
Even though the statement holds for almost all x, it’s hard to prove for particular values of x. And it’s easy to find particular values of x for which the theorem does not hold.
… it is interesting to note that one does not know whether sequences such as ($e^n$), ($\pi^n$), or even ($(3/2)^n$) are u.d. mod 1 or not.
The curious part about the quote above is that it’s not clear whether powers of 3/2 are uniformly distributed mod 1. I wouldn’t expect powers of any rational number to be u.d. mod 1. Either my intuition was wrong, or it’s right but hasn’t been proved, at least not when the passage quoted above was written.
The next post will look at powers of 3/2 mod 1 and whether they appear to be uniformly distributed.
One of the case studies in Michael Bierut’s book How to is the graphic design for the planned community Celebration, Florida. The logo for the town’s golf course is an illustration of the bike shed principle.
C. Northcote Parkinson observed that it is easier for a committee to approve a nuclear power plant than a bicycle shed. Nuclear power plants are complex, and no one on a committee presumes to understand every detail. Committee members must rely on the judgment of others. But everyone understands bicycle sheds. Also, questions such as what color to paint the bike shed don’t have objective answers. And so bike sheds provoke long discussions.
People argue about bike sheds because they understand bike sheds. Bierut said something similar about the Celebration Golf Club logo, which features a silhouette of a golfer.
Designing the graphics for Celebration’s public golf club was much harder than designing the town seal. It took me some time to realize why: none of our clients were Schwinn-riding, ponytailed girls [as in the town seal], but most of them were enthusiastic golfers. The silhouette on the golf club design was refined endlessly as various executives demonstrated their swings in client meetings.
Image credit: By Source, Fair use, https://en.wikipedia.org/w/index.php?curid=37643922
Last week I wrote a blog post showing that powers of the golden ratio are nearly integers. Specifically, the distance from $\varphi^n$ to the nearest integer decreases exponentially as n increases. Several people pointed out that the golden ratio is a Pisot number, the general class of numbers whose powers are exponentially close to integers.
The so-called plastic constant P is another Pisot number, in fact the smallest Pisot number. P is the real root of $x^3 - x - 1 = 0$.
Because P is a Pisot number, we know that its powers will be close to integers, just like powers of the golden ratio, but the way they approach integers is more interesting. The convergence is slower and less regular.
We will plot the first few powers of P, first looking at the distance to the nearest integer on a linear scale, then looking at the absolute value of the distance on a logarithmic scale.
As a reminder, here’s what the corresponding plots looked like for the golden ratio.
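For anyone who wants to reproduce the numbers behind these plots, here is a rough Python sketch. It is not the code used for the plots above. It finds P by applying Newton's method to x³ − x − 1 using the standard library's `decimal` module at 100 digits of precision, then prints the signed distance from each power to the nearest integer. The starting guess, the precision, and the function names are arbitrary choices for this example.

```python
from decimal import Decimal, getcontext

getcontext().prec = 100  # extended precision, well beyond double

def plastic_constant(iterations=20):
    """Real root of x^3 - x - 1 = 0, found by Newton's method from x = 1.3."""
    x = Decimal("1.3")
    for _ in range(iterations):
        x = x - (x**3 - x - 1) / (3 * x**2 - 1)
    return x

def distance_to_nearest_integer(x, n):
    """Signed difference between x^n and the integer nearest to it."""
    power = x ** n
    return power - power.to_integral_value()

P = plastic_constant()
for n in range(1, 31):
    print(n, float(distance_to_nearest_integer(P, n)))
```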
When I first saw ring theory, my impression was that there were dozens of kinds of rings with dozens of special relations between them, more than I could keep up with. In reality, there are just a few basic kinds of rings, and the relations between them are simple.
Here’s a diagram that shows the basic kinds of rings and the relations between them. (I’m only looking at commutative rings, and I assume every ring has a multiplicative identity.)
The solid lines are unconditional implications. The dashed line is a conditional implication.
Every field is a Euclidean domain.
Every Euclidean domain is a principal ideal domain (PID).
Every principal ideal domain is a unique factorization domain (UFD).
Every unique factorization domain is an integral domain.
A finite integral domain is a field.
Incidentally, the diagram has a sort of embedded pun: the implications form a circle, i.e. a ring.
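The conditional implication, that a finite integral domain is a field, can be checked by brute force in a small family of examples. The Python sketch below looks only at the rings Z/nZ, the integers mod n, which is a restriction chosen just to keep the example short; the function names are made up. For each n it tests whether the ring has no zero divisors and whether every nonzero element has a multiplicative inverse, and confirms that the two properties always occur together.

```python
def is_integral_domain(n):
    """True if Z/nZ has no zero divisors: a*b = 0 (mod n) only when a or b is 0."""
    return all((a * b) % n != 0
               for a in range(1, n) for b in range(1, n))

def is_field(n):
    """True if every nonzero element of Z/nZ has a multiplicative inverse."""
    return all(any((a * b) % n == 1 for b in range(1, n))
               for a in range(1, n))

for n in range(2, 20):
    print(n, is_integral_domain(n), is_field(n))
```

In this family both tests come out true exactly when n is prime.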
In his paper Mindless statistics, Gerd Gigerenzer uses a Freudian analogy to describe the mental conflict researchers experience over statistical hypothesis testing. He says that the “statistical ritual” of NHST (null hypothesis significance testing) “is a form of conflict resolution, like compulsive hand washing.”
In Gigerenzer’s analogy, the id represents Bayesian analysis. Deep down, a researcher wants to know the probabilities of hypotheses being true. This is something that Bayesian statistics makes possible, but more conventional frequentist statistics does not.
The ego represents R. A. Fisher’s significance testing: specify a null hypothesis only, not an alternative, and report a p-value. Significance is calculated after collecting the data. This makes it easy to publish papers. The researcher never clearly states his hypothesis, and yet takes credit for having established it after rejecting the null. This leads to feelings of guilt and shame.
The superego represents the Neyman-Pearson version of hypothesis testing: pre-specified alternative hypotheses, power and sample size calculations, etc. Neyman and Pearson insist that hypothesis testing is about what to do, not what to believe. 
* * *
I assume Gigerenzer doesn’t take this analogy too seriously. In context, it’s a humorous interlude in his polemic against rote statistical ritual.
But there really is a conflict in hypothesis testing. Researchers naturally think in Bayesian terms, and interpret frequentist results as if they were Bayesian. They really do want probabilities associated with hypotheses, and will imagine they have them even though frequentist theory explicitly forbids this. The rest of the analogy, comparing the ego and superego to Fisher and Neyman-Pearson respectively, seems weaker to me. But I suppose you could imagine Neyman and Pearson playing the role of your conscience, making you feel guilty about the pragmatic but unprincipled use of p-values.
* * *
 “No test based upon a theory of probability can by itself provide any valuable evidence of the truth or falsehood of a hypothesis. But we may look at the purpose of tests from another viewpoint. Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern behaviour in regard to them, in following which we insure that, in the long run of experience, we shall not often be wrong.”
Neyman J, Pearson E. On the problem of the most efficient tests of statistical hypotheses. Philos Trans Roy Soc A, 1933;231:289–337.
This morning I was reading Terry Tao’s overview of the work of Yves Meyer and ran across this line:
The powers $\varphi, \varphi^2, \varphi^3, \ldots$ of the golden ratio lie unexpectedly close to integers: for instance, $\varphi^{11} = 199.005\ldots$ is unusually close to 199.
I’d never heard that before, so I wrote a little code to see just how close golden powers are to integers.
Here’s a plot of the difference between $\varphi^n$ and the nearest integer:
(Note that if you want to try this yourself, you need extended precision. Otherwise you’ll get strange numerical artifacts once $\varphi^n$ is too large to represent exactly.)
By contrast, if we make the analogous plot replacing φ with π we see that the distance to the nearest integer looks like a uniform random variable:
The distance from powers of φ to the nearest integer decreases so fast that we cannot see it in the graph for moderate sized n, which suggests we plot the difference on the log scale. (In fact we plot the log of the absolute value of the difference since the difference could be negative and the log undefined.) Here’s what we get:
After an initial rise, the curve is apparently a straight line on a log scale, i.e. the absolute distance to the nearest integer decreases almost exactly exponentially.
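Here is a sketch of how one might compute these distances in Python. It is not the code used for the plots above. It uses the standard library's `decimal` module with 100 digits of precision to avoid the numerical artifacts mentioned earlier; the precision and the function name are arbitrary choices for this example.

```python
from decimal import Decimal, getcontext

getcontext().prec = 100  # extended precision, well beyond double

PHI = (1 + Decimal(5).sqrt()) / 2  # golden ratio

def distance_to_nearest_integer(x, n):
    """Signed difference between x^n and the integer nearest to it."""
    power = x ** n
    return power - power.to_integral_value()

for n in range(1, 41):
    d = distance_to_nearest_integer(PHI, n)
    # The log of |d| is what the second plot shows on its vertical axis.
    print(n, float(d), float(abs(d).ln()))
```

If the distances really do decrease exponentially, the ratio of successive absolute distances should be constant, which is what a straight line on the log plot means.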
In a recent interview, Tyler Cowen discusses complacency, (neuro-)diversity, etc.
Let me give you a time machine and send you back to Vincent van Gogh, and you have some antidepressants to make him better. What actually would you do, should you do, could you do? We really don’t know. Maybe he would have had a much longer life and produced more wonderful paintings. But I worry about the answer to that question.
And I think in general, for all the talk about diversity, we’re grossly undervaluing actual human diversity and actual diversity of opinion. Ways in which people (they can be racial or ethnic but they don’t have to be at all), ways in which people are actually diverse, and obliterating them somewhat. This is my Tocquevillian worry and I think we’ve engaged in the massive social experiment of a lot more anti-depressants and I think we don’t know what the consequences are. I’m not saying people shouldn’t do it. I’m not trying to offer any kind of advice or lecture.
I don’t share Cowen’s concern regarding antidepressants. I haven’t thought about it before. But I am concerned with how much we drug restless boys into submission. (Girls too, of course, but it’s usually boys.)