
Atavachron

In the Star Trek episode “All Our Yesterdays” the people of the planet Sarpeidon have escaped into their past because their sun is about to become a supernova. They did this via a time machine called the Atavachron.

One detail of the episode has stuck with me since I first saw it many years ago: although people can go back to any period in history, they have to be prepared somehow, and once prepared they cannot go back. Kirk, Spock, and McCoy only have hours to live because they traveled back in time via the Atavachron without being properly prepared. (Kirk is in a period analogous to Renaissance England while Spock and McCoy are in an ice age.)

If such time travel were possible, I expect you would indeed need to be prepared. Life in Renaissance England or the last ice age would be miserable for someone with contemporary expectations, habits, fitness, etc., though things weren’t as bad for the people at the time. Neither would life be entirely pleasant for someone thrust into our time from the past. Cultures work out their own solutions to life’s problems, and these solutions form a package. It may not be possible to swap components in and out à la carte and maintain a working solution.

Why isn’t everything normally distributed?

Adult heights follow a Gaussian, a.k.a. normal, distribution [1]. The usual explanation is that many factors go into determining one’s height, and the net effect of many separate causes is approximately normal because of the central limit theorem.

If that’s the case, why aren’t more phenomena normally distributed? Someone asked me this morning specifically about phenotypes with many genetic inputs.

The central limit theorem says that the sum of many independent, additive effects is approximately normally distributed [2]. Genes are more digital than analog, and do not produce independent, additive effects. For example, the effects of dominant and recessive genes act more like max and min than addition. Genes do not appear independently—if you have some genes, you’re more likely to have certain other genes—nor do they act independently—some genes determine how other genes are expressed.

Height is influenced by environmental effects, such as nutrition, as well as genetic effects, and these environmental effects may be more additive or independent than the genetic ones.

Incidentally, if effects are independent but multiplicative rather than additive, the result may be approximately log-normal rather than normal.
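
For a quick numerical check of this distinction, here is a minimal simulation sketch (assuming NumPy; the choice of 50 uniform effects is arbitrary). The sum of many independent effects comes out roughly symmetric, like a normal, while their product is strongly right-skewed, and the log of the product is roughly symmetric again.

    # Minimal sketch (assumes NumPy): compare the sum and the product of
    # many small independent effects. Skewness near zero suggests a
    # symmetric, roughly normal shape; large positive skewness does not.
    import numpy as np

    rng = np.random.default_rng(0)
    effects = rng.uniform(0.5, 1.5, size=(100_000, 50))  # 50 independent effects per "individual"

    def skewness(x):
        return float(((x - x.mean())**3).mean() / x.std()**3)

    sums = effects.sum(axis=1)       # additive model: approximately normal
    products = effects.prod(axis=1)  # multiplicative model: approximately log-normal

    print("skewness of sums:           ", round(skewness(sums), 3))
    print("skewness of products:       ", round(skewness(products), 3))
    print("skewness of log of products:", round(skewness(np.log(products)), 3))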

* * *

Fine print:

[1] Men’s heights follow a normal distribution, and so do women’s. Adults not sorted by sex follow a mixture distribution as described here, and so the distribution is flatter on top than a normal. It gets even more complicated when you consider that there are slightly more women than men in the world. And as with many phenomena, the normal distribution is a better description near the middle than at the extremes.

[2] There are many variations on the central limit theorem. The classical CLT requires that the random variables in the sum be identically distributed as well, though that isn’t so important here.

Quaternions in Paradise Lost

Last night I checked a few books out from a library. One was Milton’s Paradise Lost and another was Kuipers’ Quaternions and Rotation Sequences. I didn’t expect any connection between these two books, but there is one.

[Photo of the two books mentioned above]

The following lines from Book V of Paradise Lost, starting at line 180, are quoted in Kuipers’ book:

Air and ye elements, the eldest birth
Of nature’s womb, that in quaternion run
Perpetual circle, multiform, and mix
And nourish all things, let your ceaseless change
Vary to our great maker still new praise.

When I see quaternion I naturally think of Hamilton’s extension of the complex numbers, discovered in 1843. Paradise Lost, however, was published in 1667.

Milton uses quaternion to refer to the four elements of antiquity: air, earth, water, and fire. The last three are “the eldest birth of nature’s womb” because they are mentioned in Genesis before air is mentioned.

 

Too easy

When people sneer at a technology for being too easy to use, it’s worth trying out.

If the only criticism is that something is too easy or “OK for beginners” then maybe it’s a threat to people who invested a lot of work learning to do things the old way.

The problem with the “OK for beginners” put-down is that everyone is a beginner sometimes. Professionals are often beginners because they’re routinely trying out new things. And being easier for beginners doesn’t exclude the possibility of being easier for professionals too.

Sometimes we assume that harder must be better. I know I do. For example, when I first used Windows, it was so much easier than Unix that I assumed Unix must be better for reasons I couldn’t articulate. I had invested so much work learning to use the Unix command line, it must have been worth it. (There are indeed advantages to doing some things from the command line, but not the work I was doing at the time.)

There often are advantages to doing things the hard way, but something isn’t necessarily better just because it’s hard. The easiest tool to pick up may not be the best tool for long-term use, but then again it might be.

Most of the time you want to add the easy tool to your toolbox, not take the old one out. Just because you can use specialized professional tools doesn’t mean that you always have to.

Related post: Don’t be a technical masochist

The opposite of an idiot

The origin of the word idiot is “one’s own,” the same root as idiom. So originally an idiot was someone in his own world, someone who takes no outside input. The historical meaning carries over to some degree: When you see a smart person do something idiotic, it’s usually because he’s acting alone.

The opposite of an idiot would not be someone who is wise, but someone who takes too much outside input, someone who passively lets others make all his decisions or who puts every decision to a vote.

An idiot lives only in his own world; the opposite of an idiot has no world of his own. Both are foolish, but I think the Internet encourages more of the latter kind of foolishness. It’s not turning people into idiots, it’s turning them into the opposite.

Stand-alone code for numerical computing

For this week’s resource post, see the page Stand-alone code for numerical computing. It points to small, self-contained bits of code for special functions (log gamma, erf, etc.) and for random number generation (normal, Poisson, gamma, etc.).

The code is available in Python, C++, and C# versions. It could easily be translated into other languages since it hardly uses any language-specific features.

I wrote these functions for projects where you don’t have a numerical library available or would like to minimize dependencies. If you have access to a numerical library, such as SciPy in Python, then by all means use it (although SciPy is missing some of the random number generators provided here). In C++ and especially C#, it’s harder to find some of this functionality.
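
As an illustration of the spirit of the thing, here is a sketch of a dependency-free normal random sample in Python, using only the standard library. (This is just an illustration of what “stand-alone” means, not the code from the page itself.)

    # A normal random sample via the Box-Muller transform, standard library only,
    # so there is nothing to install and no dependency to manage.
    # (Illustrative sketch; not the code from the stand-alone code page.)
    import math
    import random

    def random_normal(mean=0.0, stdev=1.0):
        """Return one sample from a normal(mean, stdev) distribution."""
        u1 = random.random()
        while u1 == 0.0:             # avoid log(0)
            u1 = random.random()
        u2 = random.random()
        z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
        return mean + stdev * z

    print([round(random_normal(100, 15), 2) for _ in range(5)])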

Last week: Code Project articles

Next week: Clinical trial software

Decide what to abandon

Sometimes it’s rational to walk away from something you’ve invested a great deal in.

It’s hard to imagine how investors could abandon something as large and expensive as a shopping mall. And yet it must have been a sensible decision. If anyone disagreed, they could buy the abandoned mall on the belief that they could make a profit.

The idea that you should stick to something just because you’ve invested in it goes by many names: sunk cost fallacy, escalation of commitment, throwing good money after bad, etc. If further investment will simply lose more money, there’s no economic reason to continue investing, regardless of how much money you’ve spent. (There may be non-economic reasons. You may have a moral obligation to fulfill a commitment or to clean up a mess you’ve made.)

Most of us have not faced the temptation to keep investing in an unprofitable shopping mall, but everyone is tempted by the sunk cost fallacy in other forms: finishing a novel you don’t enjoy reading, holding on to hopelessly tangled software, trying to earn a living with a skill people no longer wish to pay for, etc.

According to Peter Drucker, “It cannot be said often enough that one should not postpone; one abandons.”

The first step in a growth policy is not to decide where and how to grow. It is to decide what to abandon. In order to grow, a business must have a systematic policy to get rid of the outgrown, the obsolete, the unproductive.

It’s usually more obvious what someone else should abandon than what we should abandon. Smart businesses turn to outside consultants for such advice. Smart individuals turn to trusted friends. An objective observer without our emotional investment can see things more clearly than we can.

* * *

This post started out as a shorter post on Google+.

Photo above by Steve Petrucelli via flickr

Code Project articles

This week’s resource post lists some articles along with source code I’ve posted on CodeProject.

Probability

Pitfalls in Random Number Generation includes several lessons learned the hard way.

Simple Random Number Generation is a random number generator written in C# based on George Marsaglia’s MWC (multiply-with-carry) algorithm.

Finding probability distribution parameters from percentiles

Numerical computing

Avoiding Overflow, Underflow, and Loss of Precision explains why the most obvious method for evaluating mathematical functions may not work. The article includes C++ source code for evaluating some functions that come up in statistics (particularly logistic regression) that could have problems if naïvely implemented. A small example of the kind of problem it addresses appears after this list.

An introduction to numerical programming in C#

Five tips for floating point programming gives five of the most important things someone needs to know when working with floating point numbers.

Optimizing a function of one variable with Brent’s method.

Fast numerical integration using the double-exponential transform method. Optimally efficient numerical integration for analytic functions over a finite interval.

Three methods for root-finding in C#

Getting started with the SciPy (Scientific Python) library

Numerical integration with Simpson’s rule

Filling in the gaps: simple interpolation discusses linear interpolation and inverse interpolation, and gives some suggestions for what to do next if linear interpolation isn’t adequate.
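
To make the overflow issue mentioned above concrete, here is a generic illustration (not code from the article): log(1 + exp(x)) comes up in logistic regression, and the obvious implementation overflows for large x even though the true value is essentially x.

    import math

    def log_one_plus_exp_naive(x):
        # Obvious implementation: math.exp overflows for x around 710 and above.
        return math.log(1.0 + math.exp(x))

    def log_one_plus_exp(x):
        # For large x, log(1 + exp(x)) = x + log(1 + exp(-x)), which is safe to
        # evaluate, and log1p is accurate when its argument is tiny.
        if x > 0:
            return x + math.log1p(math.exp(-x))
        return math.log1p(math.exp(x))

    print(log_one_plus_exp(1000.0))     # 1000.0
    # log_one_plus_exp_naive(1000.0)    # raises OverflowError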

PowerShell

Automated Extract and Build from Team System using PowerShell explains a PowerShell script to automatically extract and build Visual Studio projects from Visual Studio Team System (VSTS) version control.

PowerShell Script for Reviewing Text Shown to Users is a tool for finding errors in prose displayed to users that might not be exposed during testing.

Monitoring unreliable scheduled tasks describes a simple program for monitoring legacy processes.

C++

Calculating percentiles in memory-bound applications gives an algorithm and C++ code for calculating percentiles of a list too large to fit into memory.

Quick Start for C++ TR1 Regular Expressions answers 10 of the first questions that are likely to come to mind when someone wants to use the new regular expression support in C++.

Resource series

Last week: Miscellaneous math notes

Next week: Stand-alone numerical code

Rare letter combinations and key chords

A bigram is a pair of letters. For various reasons—word games, cryptography, user interface development, etc.—people are interested in knowing which bigrams occur most often, and so such information is easy to find. But sometimes you might want to know which bigrams occur least often, and that’s harder to find. My interest is finding safe key-chord combinations for Emacs.

Peter Norvig calculated frequencies for all pairs of letters based on the corpus Google extracted from millions of books. He gives a table that will show you the frequency of a bigram when you mouse over it. I scraped his HTML page to create a simple CSV version of the data. My file lists bigrams, frequencies to three decimal places, and the raw counts: bigram_frequencies.csv. The file is sorted in decreasing order of frequency.

The Emacs key-chord module lets you bind pairs of letters to Emacs commands. For example, if you map a command to jk, that command will execute whenever you type j and k in quick succession. In that case if you want the literal sequence “jk” to appear in a file, pause between typing the j and the k. This may sound like a bad idea, but I haven’t run into any problems using it. It allows you to execute your most frequently used commands very quickly. Also, there’s no danger of conflict since neither basic Emacs nor any of its common packages use key chords.

The safest choices for key chords are the letter pairs whose percentage frequency rounds to zero at three decimal places. See the data file for the full list, or extract it with a short script like the one below.
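
This is a sketch, assuming bigram_frequencies.csv has no header row and lists bigram, percentage, and count in that order:

    # List the bigrams whose percentage frequency rounds to 0.000.
    # Assumes rows of the form bigram, percentage, count with no header line.
    import csv

    rare = []
    with open("bigram_frequencies.csv") as f:
        for bigram, pct, count in csv.reader(f):
            if round(float(pct), 3) == 0.0:
                rare.append(bigram)

    print(len(rare), "bigrams round to zero:")
    print(" ".join(sorted(rare)))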

Since Q is always followed by U in native English words, it’s safe to combine Q with any other letter. (If you need to type Qatar, just pause a little after typing the Q.) It’s also safe to use any consonant after J and most consonants after Z. (It’s rare for a consonant to follow Z, but not quite rare enough to round to zero. ZH and ZL occur with 0.001% frequency, ZY 0.002% and ZZ 0.003%.)

Double letters make especially convenient key chords since they’re easy to type quickly. JJ, KK, QQ, VV, WW, and YY all have frequency rounding to zero. HH and UU have frequency 0.001% and AA, XX, and ZZ have frequency 0.003%.

Note that the discussion above does not distinguish upper and lower case letters in counting frequencies, but Emacs key chords are case-sensitive. You could make a key chord out of any pair of capital letters unless you like to shout in online discussions, use a lot of acronyms, or write old-school FORTRAN.

Update (2 Feb 2015):

This post only considered ordered bigrams. But Emacs key chords are unordered, combinations of keys pressed at or near the same time. This means, for example, that qe would not be a good keychord because although QE is a rare bigram, EQ is not (0.057%). The file unordered_bigram_frequencies.csv gives the combined frequencies of bigrams and their reverse (except for double letters, in which case it simply gives the frequency).
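
Deriving the unordered frequencies from the ordered file is straightforward. A sketch, under the same assumptions about the CSV layout as above:

    # Combine each bigram with its reverse to get unordered frequencies.
    import csv

    freq = {}
    with open("bigram_frequencies.csv") as f:
        for bigram, pct, count in csv.reader(f):
            freq[bigram] = float(pct)

    unordered = {}
    for bigram, pct in freq.items():
        if bigram[0] == bigram[1]:
            unordered[bigram] = pct                    # double letter: nothing to combine
        else:
            key = "".join(sorted(bigram))              # both orders map to the same key
            unordered[key] = pct + freq.get(bigram[::-1], 0.0)

    # The ten rarest unordered pairs of distinct letters:
    for p, k in sorted((p, k) for k, p in unordered.items() if k[0] != k[1])[:10]:
        print(k, round(p, 3))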

Combinations of J and a consonant are still mostly good key chords except for JB (0.023%), JN (0.011%), and JD (0.005%).

Combinations of Q and a consonant are also good key chords except for QS (0.007%), QN (0.006%), and QC (0.005%). And although O is a vowel, QO still makes a good key chord (0.001%).

Numerical computing resources

This week’s resource post: some numerical computing pages on this site.

See also the Twitter account SciPyTip and numerical programming articles I’ve written for Code Project.

Last week: Regular expressions

Next week: Probability approximations

Blog seventh anniversary

Seven years

This blog is seven years old today. I’ve written 2,273 posts so far, a little less than one per day.

Over the holidays I combed through older posts looking for typos, broken links, etc. I fixed a lot of little things, but I’m sure I missed a few. If you find any problems, please let me know.

Most popular posts

Here are some of the most popular posts for the last three years.

In 2011 I split my most popular post list into three parts:

In 2010 I split the list into two parts:

I didn’t post a list of most popular posts for 2009, but the most popular post that year was Why programmers are not paid in proportion to their productivity.

Finally, my most popular posts for 2008.

Other writing

Here are a few other places you can find things I write:

Probability resources

Each Wednesday I post a list of notes on some topic. This week it’s probability.

See also posts tagged probability and statistics and the Twitter account ProbFact.

Last week: Python resources

Next week: Regular expression resources

Photo review

Here are some of the photos I took on my travels last year.

Bicycles on the Google campus in Mountain View, California:

Sunrise at Isla Vista, California:

View from University of California Santa Barbara:

Reflection of the Space Needle in the EMP Museum in Seattle, Washington:

Paradise Falls, Thousand Oaks, California:

Bed and breakfast in Geldermalsen, The Netherlands:

Amsterdam, The Netherlands:

Chinatown in San Francisco, California:

Heidelberg, Germany: