Student’s future, teacher’s past

“Teachers should prepare the student for the student’s future, not for the teacher’s past.” — Richard Hamming

I ran across the above quote from Hamming this morning. It made me wonder whether I tried to prepare students for my past when I used to teach college students.

How do you prepare a student for the future? Mostly by focusing on skills that will always be useful, even as times change: logic, clear communication, diligence, etc.

Negative forecasting is more reliable here than positive forecasting. It’s hard to predict what’s going to be in demand in the future (besides timeless skills), but it’s easier to predict what’s probably not going to be in demand. The latter aligns with Hamming’s exhortation not to prepare students for your past.

Changing your mind

From Dorothy Sayers’ essay Why Work?

It is always strange and painful to have to change a habit of mind; though, when we have made the effort, we may find a great relief, even a sense of adventure and delight, in getting rid of the false and returning to the true.

Cauchy, Benford, and a problem with NHST

Introduction

Samples from a Cauchy distribution nearly follow Benford’s law. I’ll demonstrate this below. The more data you see, the more confident you should be of this. But with a typical statistical approach, crudely applied NHST (null hypothesis significance testing), the more data you see, the less convinced you are.

This post assumes you’ve read the previous post that explains what Benford’s law is and looks at how well samples from a Weibull distribution follow that law.

This post has two purposes. First, we show that samples from a Cauchy distribution approximately follow Benford’s law. Second, we look at problems with testing goodness of fit with NHST.

Cauchy data

We can reuse the code from the previous post to test Cauchy samples, with one modification. Cauchy samples can be negative, so we have to modify our leading_digit function to take an absolute value.

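      from math import floor, log10

      def leading_digit(x):
          # The fractional part of log10|x| fixes the leading digit; abs handles negative samples.
          y = log10(abs(x)) % 1
          return int(floor(10**y))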

We’ll also need to import cauchy from scipy.stats and change where we draw samples to use this distribution.

      samples = cauchy.rvs(0, 1, N)

Here’s how a sample of 1000 Cauchy values compared to the prediction of Benford’s law:

|---------------+----------+-----------|
| Leading digit | Observed | Predicted |
|---------------+----------+-----------|
|             1 |      313 |       301 |
|             2 |      163 |       176 |
|             3 |      119 |       125 |
|             4 |       90 |        97 |
|             5 |       69 |        79 |
|             6 |       74 |        67 |
|             7 |       63 |        58 |
|             8 |       52 |        51 |
|             9 |       57 |        46 |
|---------------+----------+-----------|

Here’s a bar graph of the same data.

[Figure: bar graph of Cauchy leading digits compared to Benford’s law]
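For readers without the previous post’s code at hand, here is a minimal self-contained sketch of the experiment. It is not the original script: the names `cauchy_samples` and `digit_table` are made up for illustration, and it draws standard Cauchy values with the standard library (inverse-CDF sampling) rather than `cauchy.rvs`, so the counts will differ from the table above.

```python
import random
from collections import Counter
from math import floor, log10, pi, tan

def leading_digit(x):
    # The fractional part of log10|x| determines the leading digit.
    y = log10(abs(x)) % 1
    return int(floor(10**y))

def cauchy_samples(n, seed=0):
    # Inverse-CDF sampling for a standard Cauchy: tan(pi*(U - 1/2)), U uniform.
    # (A draw of exactly U = 0.5 would give x = 0; that is vanishingly unlikely.)
    rng = random.Random(seed)
    return [tan(pi * (rng.random() - 0.5)) for _ in range(n)]

def digit_table(samples):
    # Rows of (digit, observed count, count predicted by Benford's law).
    n = len(samples)
    observed = Counter(leading_digit(x) for x in samples)
    return [(d, observed[d], n * log10(1 + 1 / d)) for d in range(1, 10)]

for d, obs, pred in digit_table(cauchy_samples(1000)):
    print(f"{d} {obs:5d} {pred:7.1f}")
```

Changing `seed` gives a different sample, which is a quick way to see how much the observed column moves around at this sample size.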

Problems with NHST

A common way to measure goodness of fit is to use a chi-square test. The null hypothesis would be that the data follow a Benford distribution. We look at the chi-square statistic for the observed data, based on a chi-square distribution with 8 degrees of freedom (one less than the number of categories, which is 9 because of the nine digits). We compute the p-value, the probability of seeing a chi-square statistic this large or larger, and reject our null hypothesis if this p-value is too small.
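As a rough illustration of the test just described, the sketch below computes the chi-square statistic and p-value for the 1000-sample table above. The helper names are hypothetical. To stay dependency-free it uses the closed-form tail probability that holds when the degrees of freedom are even (8 here); in practice something like `scipy.stats.chisquare` would do the same job.

```python
from math import exp, factorial, log10

def chi_square_stat(observed):
    # Pearson statistic against Benford's expected counts for digits 1..9.
    n = sum(observed)
    expected = [n * log10(1 + 1 / d) for d in range(1, 10)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_even(x, dof):
    # Upper tail of a chi-square distribution with an even number of degrees
    # of freedom: exp(-x/2) * sum_{k < dof/2} (x/2)^k / k!
    if dof % 2:
        raise ValueError("closed form only holds for even dof")
    h = x / 2
    return exp(-h) * sum(h**k / factorial(k) for k in range(dof // 2))

observed = [313, 163, 119, 90, 69, 74, 63, 52, 57]  # counts from the table above
stat = chi_square_stat(observed)
p_value = chi2_sf_even(stat, dof=8)
print(f"chi-square = {stat:.3f}, p-value = {p_value:.4f}")
```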

Here’s how our chi-square values and p-values vary with sample size.

|-------------+------------+---------|
| Sample size | chi-square | p-value |
|-------------+------------+---------|
|          64 |     13.542 |  0.0945 |
|         128 |     10.438 |  0.2356 |
|         256 |     13.002 |  0.1118 |
|         512 |      8.213 |  0.4129 |
|        1024 |     10.434 |  0.2358 |
|        2048 |      6.652 |  0.5745 |
|        4096 |     15.966 |  0.0429 |
|        8192 |     20.181 |  0.0097 |
|       16384 |     31.855 | 9.9e-05 |
|       32768 |     45.336 | 3.2e-07 |
|-------------+------------+---------|

The p-values eventually get very small, but they don’t decrease monotonically with sample size. This is to be expected. If the data came from a Benford distribution, i.e. if the null hypothesis were true, we’d expect the p-values to be uniformly distributed, i.e. they’d be equally likely to take on any value between 0 and 1. And not until the two largest samples do we see values that don’t look consistent with uniform samples from [0, 1].

In one sense NHST has done its job. Cauchy samples do not exactly follow Benford’s law, and with enough data we can show this. But we’re rejecting a null hypothesis that isn’t that interesting. We’re showing that the data don’t exactly follow Benford’s law rather than showing that they do approximately follow Benford’s law.
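Here is a sketch of why the statistic has to grow with sample size. The notation is introduced only for illustration: N is the sample size, b_d is Benford’s probability for digit d, p_d is the (unknown) true probability that a Cauchy sample has leading digit d, and O_d is the observed count, assumed binomial with parameters N and p_d.

```latex
\begin{align*}
X^2 &= \sum_{d=1}^{9} \frac{(O_d - N b_d)^2}{N b_d},
  \qquad b_d = \log_{10}\!\left(1 + \frac{1}{d}\right) \\
\mathbb{E}\!\left[(O_d - N b_d)^2\right]
  &= \operatorname{Var}(O_d) + \left(\mathbb{E}[O_d] - N b_d\right)^2
   = N p_d (1 - p_d) + N^2 (p_d - b_d)^2 \\
\mathbb{E}\!\left[X^2\right]
  &= \sum_{d=1}^{9} \frac{p_d (1 - p_d)}{b_d}
   + N \sum_{d=1}^{9} \frac{(p_d - b_d)^2}{b_d}
\end{align*}
```

If p_d = b_d for every digit, the second sum vanishes and the first equals 9 − 1 = 8, the degrees of freedom. If any p_d differs from b_d, however slightly, the second term grows linearly in N, so the p-value is eventually driven toward zero even when the fit is close.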

What personality classifications have in common

There are many ways to divide people into four personality types, from the classical—sanguine, choleric, melancholic, and phlegmatic—to contemporary systems such as the DISC profile. The Myers-Briggs system divides people into sixteen personality types. I just recently ran across the “enneagram,” an ancient system for dividing people into nine categories.

There’s one thing advocates of all the aforementioned systems agree on: the number of basic personality types is a perfect square.

Resisting simplicity

As much as we admire simplicity and strive for simplicity, something in us isn’t happy when we achieve it.

Sometimes we’re disappointed with a simple solution because, although we don’t realize it yet, we didn’t properly frame the problem it solves.

I’ve been in numerous conversations where someone says effectively, “I understand that 2+3 = 5, but what if we made it 5.1?” They really want an answer of 5.1, or maybe larger, for reasons they can’t articulate. They formulated a problem whose solution is to add 2 and 3, but that formulation left out something they care about. In this situation, the easy response is to say “No, 2+3 = 5. There’s nothing we can do about that.” The more difficult response is to find out why “5” is an unsatisfactory result.

Sometimes we’re uncomfortable with a simple solution even though it does solve the right problem.

If you work hard and come up with a simple solution, it may look like you didn’t put in much effort. And if someone else comes up with the simple solution, you may look foolish.

Sometimes simplicity is disturbing. Maybe it has implications we have to get used to.

Update: A couple people have replied via Twitter saying that we resist simplicity because it’s boring. I think beneath that is that we’re not ready to move on to a new problem.

When you’re invested in a problem, it can be hard to see it solved. If the solution is complicated, you can keep working for a simpler solution. But once someone finds a really simple solution, it’s hard to justify continuing work in that direction.

A simple solution is not something to dwell on but to build on. We want some things to be boringly simple so we can do exciting things with them. But it’s hard to shift from producer to consumer: Now that I’ve produced this simple solution, and still a little sad that it’s wrapped up, how can I use it to solve something else?

Technical notes and other relatively hidden content

I’ve written quite a few pages that are separate from the timeline of the blog. These are a little hidden, not because I want to hide them, but because you can’t make everything equally easy to find. These notes cover a variety of topics:

You can find an index of all these notes here.

Some of the most popular notes:

And here is some more relatively hidden content:

Assignment complete, twenty years later

In one section of his book The Great Good Thing, novelist Andrew Klavan describes how he bluffed his way through high school and college, not reading anything he was assigned. He doesn’t say what he majored in, but apparently he got an English degree without reading a book. He only tells of one occasion where a professor called his bluff.

Even though he saw no value in the books he was assigned, he bought and saved every one of them. Then sometime near the end of college he began to read and enjoy the books he hadn’t touched.

I wanted to read their works now, all of them, and so I began. After I graduated, after Ellen and I moved together to New York, I piled the books I had bought in college in a little forest of stacks around my tattered wing chair. And I read them. Slowly, because I read slowly, but every day, for hours, in great chunks. I pledged to myself I would never again pretend to have read a book I hadn’t or fake my way through a literary conversation or make learned reference on the page to something I didn’t really know. I made reading part of my daily discipline, part of my workday, no matter what. Sometimes, when I had to put in long hours to make a living, it was a real slog. …

It took me twenty years. In twenty years, I cleared those stacks of books away. I read every book I had bought in college, cover to cover. I read many of the other books by the authors of those books and many of the books those authors read and many of the books by the authors of those books too.

There came a day when I was in my early forties … when it occurred to me that I had done what I set out to do. …

Against all odds, I had managed to get an education.

Microresumés

I posted a couple things on Twitter today about micro-resumés. First, here’s how I’d summarize my work in a tweet.

(The formatting is a little off above. It’s leaving out a couple line breaks at the end that were in the original tweet.)

That’s not a bad summary. I’ve worked in applied math, software development, and statistics. Now I consult in those areas.

Next, I did the same for Frank Sinatra.

This one’s kinda obscure. It’s a reference to the title cut from his album That’s Life.

I’ve been a puppet, a pauper, a pirate
A poet, a pawn and a king.
I’ve been up and down and over and out
And I know one thing.
Each time I find myself flat on my face
I pick myself up and get back in the race.

How efficient is Morse code?

Morse code was designed so that the most frequently used letters have the shortest codes. In general, code length increases as frequency decreases.

How efficient is Morse code? We’ll compare letter frequencies based on Google’s research with the length of each code, and make the standard assumption that a dash is three times as long as a dot.

|--------+------+--------+-----------|
| Letter | Code | Length | Frequency |
|--------+------+--------+-----------|
| E      | .    |      1 |    12.49% |
| T      | -    |      3 |     9.28% |
| A      | .-   |      4 |     8.04% |
| O      | ---  |      9 |     7.64% |
| I      | ..   |      2 |     7.57% |
| N      | -.   |      4 |     7.23% |
| S      | ...  |      3 |     6.51% |
| R      | .-.  |      5 |     6.28% |
| H      | .... |      4 |     5.05% |
| L      | .-.. |      6 |     4.07% |
| D      | -..  |      5 |     3.82% |
| C      | -.-. |      8 |     3.34% |
| U      | ..-  |      5 |     2.73% |
| M      | --   |      6 |     2.51% |
| F      | ..-. |      6 |     2.40% |
| P      | .--. |      8 |     2.14% |
| G      | --.  |      7 |     1.87% |
| W      | .--  |      7 |     1.68% |
| Y      | -.-- |     10 |     1.66% |
| B      | -... |      6 |     1.48% |
| V      | ...- |      6 |     1.05% |
| K      | -.-  |      7 |     0.54% |
| X      | -..- |      8 |     0.23% |
| J      | .--- |     10 |     0.16% |
| Q      | --.- |     10 |     0.12% |
| Z      | --.. |      8 |     0.09% |
|--------+------+--------+-----------|

There’s room for improvement. Assigning the letter O such a long code, for example, was clearly not optimal.

But how much difference does it make? If we were to rearrange the codes so that they corresponded to letter frequency, how much shorter would a typical text transmission be?

Multiplying the code lengths by their frequency, we find that an average letter, weighted by frequency, has code length 4.5268.

What if we rearranged the codes? Then we would get 4.1257 which would be about 9% more efficient. To put it another way, Morse code achieved 91% of the efficiency that it could have achieved with the same codes. This is relative to Google’s English corpus. A different corpus would give slightly different results.

Toward the bottom of the table above, letter frequencies correspond poorly to code lengths, though this hardly matters for efficiency. But some of the choices near the top of the table are puzzling. The relative frequency of the first few letters has remained stable over time and was well known long before Google. (See ETAOIN SHRDLU.) Maybe there were factors other than efficiency that influenced how the most frequently used characters were encoded.

Update: Some sources I looked at said that a dash is three times as long as a dot, including the space between dots or dashes. Others said there is a pause as long as a dot between elements. If you use the latter timing, it takes an average time equal to 6.0554 dots to transmit an English letter, and this could be improved to 5.6616. By that measure Morse code is about 93.5% efficient. (I only added time for space inside the code for a letter because the space between letters is the same no matter how they are coded.)
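The averages in this post can be reproduced with a short script along the following lines. This is a sketch, not the code used for the post: the function names and the `gap` parameter are invented here, with `gap=0` for the timing in the table (dot = 1, dash = 3) and `gap=1` for a dot-length pause between elements. The best rearrangement pairs the most frequent letters with the shortest codes.

```python
# Morse codes and letter frequencies (percent), copied from the table above.
MORSE = {
    "E": (".", 12.49), "T": ("-", 9.28), "A": (".-", 8.04), "O": ("---", 7.64),
    "I": ("..", 7.57), "N": ("-.", 7.23), "S": ("...", 6.51), "R": (".-.", 6.28),
    "H": ("....", 5.05), "L": (".-..", 4.07), "D": ("-..", 3.82),
    "C": ("-.-.", 3.34), "U": ("..-", 2.73), "M": ("--", 2.51),
    "F": ("..-.", 2.40), "P": (".--.", 2.14), "G": ("--.", 1.87),
    "W": (".--", 1.68), "Y": ("-.--", 1.66), "B": ("-...", 1.48),
    "V": ("...-", 1.05), "K": ("-.-", 0.54), "X": ("-..-", 0.23),
    "J": (".---", 0.16), "Q": ("--.-", 0.12), "Z": ("--..", 0.09),
}

def duration(code, gap=0):
    # Length in dot units: dot = 1, dash = 3, plus `gap` between elements.
    marks = sum(1 if c == "." else 3 for c in code)
    return marks + gap * (len(code) - 1)

def average_length(gap=0):
    # Frequency-weighted length. The percentages are divided by 100 as-is,
    # without renormalizing (they sum to 99.98 after rounding).
    return sum(f * duration(code, gap) for code, f in MORSE.values()) / 100

def best_average_length(gap=0):
    # Reassign the same codes: shortest code to the most frequent letter.
    lengths = sorted(duration(code, gap) for code, _ in MORSE.values())
    freqs = sorted((f for _, f in MORSE.values()), reverse=True)
    return sum(f * n for f, n in zip(freqs, lengths)) / 100

for gap in (0, 1):
    actual, best = average_length(gap), best_average_length(gap)
    print(f"gap={gap}: actual {actual:.4f}, rearranged {best:.4f}, "
          f"efficiency {best / actual:.3f}")
```

Deriving the lengths from the code strings, rather than typing in the Length column, also serves as a check on that column.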

Data-driven charity

In this post I interview GiveDirectly co-founder Paul Niehaus about charitable direct cash transfers and their empirical approach to charity.

Paul Niehaus of GiveDirectly

JC: Can you start off by telling us a little bit about GiveDirectly, and what you do?

PN: GiveDirectly is the first nonprofit that lets individual donors like you and me send money directly to the extreme poor. And that’s it—we don’t buy them things we think they need, or tell them what they should be doing, or how they should be doing it. Michael Faye and I co-founded GD, along with Jeremy Shapiro and Rohit Wanchoo, because on net we felt (and still feel) the poor have a stronger track record putting money to use than most of the intermediaries and experts who want to spend it for them.

JC: What are common objections you brush up against, and how do you respond?

PN: We’ve all heard and to some extent internalized a lot of negative stereotypes about the extreme poor: you can’t just give them money, they’ll blow it on alcohol, they won’t work as hard, etc. And it’s only in the last decade or so with the advent of experimental testing that we’ve built a broad evidence base showing that in fact quite the opposite is the case: in study after study the poor have used money sensibly, and if anything drank less and worked more. So to us it’s simply a question of catching folks up on the data.

JC: Why do you think randomized controlled trials are emerging in development economics just in the past decade or so when they have been the gold standard in other areas for much longer?

PN: I agree that experimental testing in development is long overdue. And to be blunt, I think it came late because we worry more about getting real results when we’re helping ourselves than we do when we’re helping others. When it comes to helping others, we get our serotonin from believing we’re making a difference, not the actual difference we make (which we may never find out, for example when we give to charities overseas). And so it’s tempting to succumb to wishful thinking rather than rigorous testing.

JC: What considerations went into the design of your pending basic income trial? What would you have loved to do differently methodologically if you had 10X the budget? 100X?

PN: This experiment is all about scale, in a couple of ways. First, there have been some great basic income pilots in the past, but they haven’t committed to supporting people for more than a few years. That’s important because a big argument the “pro” camp makes is that guaranteeing long-term economic security will free people up to take risks, be more creative, etc.—and a big worry the “con” camp raises is that it will cause people to stop trying. So it was important to commit to support over a long period. We’re doing over a decade—12 years—and with more funding we’d go even longer.

Second, it’s important to test this by randomizing at the community level, not just the individual level. That’s because a lot of the debate over basic income is about how community interactions will change (vs purely individual behavior). So we’re enrolling entire villages, and with more funding we could make that entire counties, etc. That lets you start to understand impacts on community cohesion, social capital, the macroeconomy, etc.

JC: In what ways do you think math has served as a good or poor guide for development economics over the years?

PN: I think the far more important question is why has math—and in particular statistics—played such a small role in development decision-making, while “success stories” and “theories of change” have played such large ones.

JC: Can you say something about the efficiency of GiveDirectly?

PN: What we’ve tried to do at GD is, first, be very clear about our marginal cost structure—typically around 90% in the hands of the poor, 10% on costs of enrolling them and delivering funds; and second, provide evidence on how these transfers affect a wide range of outcomes and let donors judge for themselves how valuable those outcomes are.

JC: What is your vision for a methodologically sound poverty reduction research program? What are the main pitfalls and challenges you see?

PN: First, we need to run experiments at larger scales. Testing new ideas in a few villages, run by an NGO, is a great start, but it’s not always an accurate guide to how an intervention will perform when a government tries to deliver it nation-wide, or how doing something at that scale will affect the broader economy (what we call “general equilibrium effects”). I’ve written about this recently with Karthik Muralidharan based on some of our recent experiences running large-scale evaluations in India.

Second, we need to measure value created for the poor. RCTs tell us how an intervention changes “outcomes,” but not how valuable those outcomes are. That’s fine if you want to assign your own values to outcomes: I could be an education guy, say, and care only about years of formal schooling. But if we care at all about the values and priorities of the poor themselves, we need a different approach. One simple step is to ask people how much money an intervention is worth to them, what economists call their “willingness to pay.” If we’re spending $100 on a program, we’d hope it’s worth at least that much to the beneficiary. If not, it begs the question of why we don’t just give them the money.

JC: What can people do to help?

PN: Lots of things. Here are a few:

  1. Set up a recurring donation, preferably to the basic income project. Worst case scenario your money will make life much better for someone in extreme poverty; best case, it will also generate evidence that redefines anti-poverty policy.
  2. Follow ten recipients on GDLive. Share things they say that you find interesting. Give us feedback on the experience (which is very beta).
  3. Ask five friends whether they give money to poor people. Find out what they think and why. Share the evidence and information we’ve published and then give us feedback—what was helpful? What was missing?
  4. Ask other charities to publish the experimental evidence on their interventions prominently on their websites, and to explain why they are confident that they can add more value for the poor by spending money on their behalf than the poor could create for themselves if they had the money. Some do! But we need to create a world where simply publishing a few “success stories” doesn’t cut it any more.

Related post: Interview with Food for the Hungry CIO

Monthly highlights

If you enjoy reading the articles here, you might like a monthly review of the most popular posts.

I send out a newsletter at the end of each month. I’ve sent out around 20 so far. They all have two parts:

  1. a review of the most popular posts of the month, and
  2. a few words about what I’ve been up to.

That’s it. Short and sweet. I might send out more mail than this someday, but I’ve been doing this for nearly two years and I’ve never sent more than one email a month.

If you’d like to subscribe, just enter your email address in the box on the side of the page labeled “Subscribe to my newsletter.” If you’re not reading this directly on the site, say you’re reading it in an RSS reader, then you can follow this link.

Changing names

I’ve just started reading Laurus, an English translation of a contemporary Russian novel. The book opens with this paragraph.

He had four names at various times. A person’s life is heterogeneous, so this could be seen as an advantage. Life’s parts sometimes have little in common, so little that it might appear that various people lived them. When this happens, it is difficult not to feel surprised that all these people carry the same name.

This reminded me of the section of James Scott’s Seeing Like a State that explains how names used to be more variable.

Among some peoples, it is not uncommon for individuals to have different names during different stages of life (infancy, childhood, adulthood) and in some cases after death; added to these are names used for joking, rituals, and mourning and names used for interactions with same-sex friends or with in-laws. Each name is specific to a certain phase of life, social setting, or interlocutor.

If someone’s name had more than one component, the final component might come from their profession (which could change) rather than their ancestry. Scott goes on to say

The invention of permanent, inherited patronyms was … the last step in establishing the necessary preconditions of modern statecraft. In almost every case it was a state project, designed to allow officials to identify, unambiguously, the majority of its citizens.

In short, governments insisted people adopt fixed names to make them easier to tax and to conscript. Before fixed names, governments would ask towns to provide so much tax money or so many soldiers because they could not tax or conscript citizens directly. For a famous example, see Luke’s account of the birth of Jesus: all went to be registered, each to his own town.

It’s hard to imagine people not needing fixed names. But when people lived on a smaller scale, interacting with a number of people closer to Dunbar’s number, there was no danger of ambiguity because there was more context.

Some frequently asked questions

I don’t have an FAQ page per se, but I’ve written a few blog posts where I answer some questions, and here I’ll answer a few more.

Should I get a PhD?

See my answer here and take a look at some of the other answers on the same site.

Do you have any advice for people going out on their own?

Yes. See my post Advice for going solo.

Shortly after I went out on my own, I wrote this post responding to questions people had about my particular situation. My answers there remain valid, except one. I said that I planned to do anything I can do well that also pays well. That was true at the time, but I’ve gotten a little more selective since then.

Can you say more about the work you’ve been doing?

Only in general terms. For example, I did some work with psychoacoustics earlier this year, and lately I’ve been working with medical device startups and giving expert testimony.

Nearly all the work I do is covered under NDA (non-disclosure agreement). Occasionally a project will be public, such as the white paper I wrote for Hitachi Data Systems comparing replication and erasure coding. But usually a project is confidential, though I hope to be able to say more about some projects after they come to market.

Miscellaneous other questions

I wrote an FAQ post of sorts a few years ago. Here are the questions from that post that people still ask fairly often.

Any more questions?

You can use this page to send me a question and see my various contact information. The page also has a link to a vCard you could import into your contact manager.