Data is code and code is data. The distinction between software (“code”) and input (“data”) is blurry at best, arbitrary at worst. And this distinction, or lack thereof, has interesting implications for regulation.
In some contexts software is regulated but data is not, or at least software comes under different regulations than data. For example, maybe you have to maintain test records for software but not for data.
Suppose as part of some project you need to search for files containing the word “apple” and you use the command line utility grep. The text “apple” is data, input to the grep program. Since grep is a widely used third-party tool, it doesn’t have to be validated, and you haven’t written any code.
Next you need to search for “apple” and “Apple” and so you search on the regular expression “[aA]pple” rather than a plain string. Now is the regular expression “[aA]pple” code? It’s at least a tiny step in the direction of code.
What about more complicated regular expressions? Regular expressions are equivalent to deterministic finite automata, which sure seem like code. And that’s only regular expressions as originally defined. The term “regular expression” has come to mean more expressive patterns. Perl regular expressions can even contain arbitrary Perl code.
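To make the gray area concrete, here is a small sketch in Python (the sample lines are made up for illustration). The pattern “[aA]pple” is data handed to a library, yet the compiled pattern object executes a matching procedure against its input, which is to say it behaves very much like code:

```python
import re

# The string "[aA]pple" is data passed to re.compile, but the compiled
# object encodes a matching procedure: it runs against input text and
# decides acceptance, much like a small program.
pattern = re.compile(r"[aA]pple")

lines = ["I ate an apple.", "Apple makes phones.", "No fruit here."]
matches = [line for line in lines if pattern.search(line)]
print(matches)  # ['I ate an apple.', 'Apple makes phones.']
```

Whether that one-line pattern counts as “code” is exactly the question; the same mechanism scales up to patterns complicated enough that no one would hesitate to call them programs.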
In practice we can agree that certain things are “code” and others are “data,” but there are gray areas where people could sincerely disagree. And someone wanting to be argumentative could stretch this gray zone to include everything. One could argue, for example, that all software is data because it’s input to a compiler or interpreter.
You might say “data is what goes into a database and code is what goes into a compiler.” That’s a reasonable rule of thumb, but databases can store code and programs can store data. Programmers routinely have long discussions about what belongs in a database and what belongs in source code. Throw regulatory considerations into the mix and there could be incentives to push more code into the database or more data into the source code.
* * *
See Slava Akhmechet’s essay The Nature of Lisp for a longer discussion of the duality between code and data.
Thank you for reading my blog. I’m starting a new email newsletter to address two things that readers have mentioned.
Some say they enjoy the blog, but I post more often than they care to keep up with, particularly if they’re only interested in the non-technical posts.
Others have said they’d like to know more about my consulting business. There are some interesting things going on there, but I’d rather not write about them on the blog.
The newsletter will address both of these groups. I’ll highlight a few posts from the blog, some technical and some not, and I’ll say a little about what I’ve been up to.
If you’d like to receive the newsletter, you can sign up here.
I won’t share your email address with anyone and you can unsubscribe at any time.
Twitter once provided RSS feeds for all Twitter accounts. They no longer provide this service. However, third parties can create RSS feeds from the content of Twitter accounts. BazQux has done this for my daily tip accounts, so you can subscribe to any of my accounts via RSS using the feeds linked below.
If you would like to subscribe to more Twitter accounts via RSS, you could subscribe to the BazQux service and create a custom RSS feed for whatever Twitter, Google+, or Facebook accounts you’d like to follow.
From The World Beyond Your Head:
The appeal of magic is that it promises to render objects plastic to the will without one’s getting too entangled with them. Treated at arm’s length, the object can issue no challenge to the self. … The clearest contrast … that I can think of is the repairman, who must submit himself to the broken washing machine, listen to it with patience, notice its symptoms, and then act accordingly. He cannot treat it abstractly; the kind of agency he exhibits is not at all magical.
Related post: Programming languages and magic
From JPL scientist Rich Terrile:
In everyone’s pocket right now is a computer far more powerful than the one we flew on Voyager, and I don’t mean your cell phone—I mean the key fob that unlocks your car.
These days technology is equated with computer technology. For example, the other day I heard someone talk about bringing chemical engineering and technology together, as if chemical engineering isn’t technology. If technology only means computer technology, then the Voyager probes are very low-tech.
And yet Voyager 1 has left the solar system! (Depending on how you define the solar system.*) It’s the most distant man-made object, about 20 billion kilometers away. It’s still sending back data 38 years after it launched, and is expected to keep doing so for a few more years before its power supply runs too low. Voyager 2 is doing fine as well, though it’s taking longer to leave the solar system. Surely this is a far greater technological achievement than a key fob.
* * *
* Voyager 1 has left the heliosphere, far beyond Pluto, and is said to be in the “interstellar medium.” But it won’t reach the Oort cloud for another 300 years and won’t leave the Oort cloud for 30,000 years.
Source: The Interstellar Age: Inside the Forty-Year Voyager Mission
In Book VIII of Paradise Lost, the angel Raphael tells Adam what difficulties men will have with astronomy:
Hereafter, when they come to model heaven
And calculate the stars: how they will wield
The mighty frame, how build, unbuild, contrive
To save appearances, how gird the sphere
With centric and eccentric scribbled o’er,
Cycle and epicycle, orb in orb.
Related post: Quaternions in Paradise Lost
Every positive integer is either part of the sequence ⌊nπ⌋ or the sequence ⌊nπ/(π − 1)⌋ where n ranges over positive integers, and no positive integer is in both sequences.
This is a special case of Beatty’s theorem.
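A quick numerical check, sketched in Python (the bound of 1000 is arbitrary), verifies both the disjointness and the coverage for the first thousand integers:

```python
import math

# Beatty's theorem: for irrational r > 1 and s = r/(r - 1), so that
# 1/r + 1/s = 1, the sequences floor(n*r) and floor(n*s) partition
# the positive integers. Here r = pi and s = pi/(pi - 1).
r = math.pi
s = math.pi / (math.pi - 1)

N = 1000
seq_r = {math.floor(n * r) for n in range(1, N + 1)}
seq_s = {math.floor(n * s) for n in range(1, N + 1)}

integers = set(range(1, N + 1))
print(seq_r & seq_s)                 # empty: no integer is in both sequences
print(integers <= (seq_r | seq_s))   # True: every integer up to N appears
```

For n this small, floating-point π is accurate enough that the floors agree with the exact values, so the check is trustworthy in this range.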
In the Star Trek episode “All Our Yesterdays” the people of the planet Sarpeidon have escaped into their past because their sun is about to become a supernova. They did this via a time machine called the Atavachron.
One detail of the episode has stuck with me since I first saw it many years ago: although people can go back to any period in history, they have to be prepared somehow, and once prepared they cannot go back. Kirk, Spock, and McCoy only have hours to live because they traveled back in time via the Atavachron without being properly prepared. (Kirk is in a period analogous to Renaissance England while Spock and McCoy are in an ice age.)
If such time travel were possible, I expect you would indeed need to be prepared. Life in Renaissance England or the last ice age would be miserable for someone with contemporary expectations, habits, fitness, etc., though things weren’t as bad for the people at the time. Neither would life be entirely pleasant for someone thrust into our time from the past. Cultures work out their own solutions to life’s problems, and these solutions form a package. It may not be possible to swap components in and out à la carte and maintain a working solution.
Adult heights follow a Gaussian, a.k.a. normal, distribution. The usual explanation is that many factors go into determining one’s height, and the net effect of many separate causes is approximately normal because of the central limit theorem.
If that’s the case, why aren’t more phenomena normally distributed? Someone asked me this morning specifically about phenotypes with many genetic inputs.
The central limit theorem says that the sum of many independent, additive effects is approximately normally distributed. Genes are more digital than analog, and do not produce independent, additive effects. For example, the effects of dominant and recessive genes act more like max and min than addition. Genes do not appear independently—if you have some genes, you’re more likely to have certain other genes—nor do they act independently—some genes determine how other genes are expressed.
Height is influenced by environmental effects, such as nutrition, as well as genetic effects, and these environmental effects may be more additive or independent than the genetic ones.
Incidentally, if effects are independent but multiplicative rather than additive, the result may be approximately log-normal rather than normal.
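A small simulation, sketched in Python with made-up effect sizes, illustrates the contrast: summing many small independent effects gives a roughly symmetric, normal-looking distribution, while multiplying many independent positive factors gives a right-skewed distribution whose logarithm is symmetric, i.e. approximately log-normal:

```python
import math
import random
import statistics

random.seed(42)

def additive_sample(k=100):
    # Sum of many small independent effects: approximately normal by the CLT.
    return sum(random.uniform(0, 1) for _ in range(k))

def multiplicative_sample(k=100):
    # Product of many independent positive factors: approximately log-normal,
    # since the log of the product is a sum of logs, and the CLT applies there.
    prod = 1.0
    for _ in range(k):
        prod *= random.uniform(0.9, 1.1)
    return prod

sums = [additive_sample() for _ in range(10_000)]
prods = [multiplicative_sample() for _ in range(10_000)]
logs = [math.log(p) for p in prods]

# Additive sums center near k/2 = 50 and are symmetric: mean ~ median.
print(statistics.mean(sums), statistics.median(sums))

# Multiplicative results are skewed right, so the mean exceeds the median,
# while the logs of the products are symmetric again.
print(statistics.mean(prods) > statistics.median(prods))  # True
print(statistics.mean(logs), statistics.median(logs))
```

The mean-versus-median comparison is a crude but serviceable skewness check: for the symmetric sums the two nearly coincide, while for the products they visibly separate.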
* * *
Men’s heights follow a normal distribution, and so do women’s. Adults not sorted by sex follow a mixture distribution as described here, and so the distribution is flatter on top than a normal. It gets even more complicated when you consider that there are slightly more women than men in the world. And as with many phenomena, the normal distribution is a better description near the middle than at the extremes.
There are many variations on the central limit theorem. The classical CLT requires that the random variables in the sum be identically distributed as well, though that isn’t so important here.
Last night I checked a few books out from a library. One was Milton’s Paradise Lost and another was Kuipers’ Quaternions and Rotation Sequences. I didn’t expect any connection between these two books, but there is one.
The following lines from Book V of Paradise Lost, starting at line 180, are quoted in Kuipers’ book:
Air and ye elements, the eldest birth
Of nature’s womb, that in quaternion run
Perpetual circle, multiform, and mix
And nourish all things, let your ceaseless change
Vary to our great maker still new praise.
When I see quaternion I naturally think of Hamilton’s extension of the complex numbers, discovered in 1843. Paradise Lost, however, was published in 1667.
Milton uses quaternion to refer to the four elements of antiquity: air, earth, water, and fire. The last three are “the eldest birth of nature’s womb” because they are mentioned in Genesis before air is mentioned.
For the last fifteen Wednesdays I’ve been posting links to technical notes. This is the end of the series.
You can find most of the links from previous Wednesday posts on one page by going to technical notes from the navigation menu at the top of the site.
When people sneer at a technology for being too easy to use, it’s worth trying out.
If the only criticism is that something is too easy or “OK for beginners” then maybe it’s a threat to people who invested a lot of work learning to do things the old way.
The problem with the “OK for beginners” put-down is that everyone is a beginner sometimes. Professionals are often beginners because they’re routinely trying out new things. And being easier for beginners doesn’t exclude the possibility of being easier for professionals too.
Sometimes we assume that harder must be better. I know I do. For example, when I first used Windows, it was so much easier than Unix that I assumed Unix must be better for reasons I couldn’t articulate. I had invested so much work learning to use the Unix command line, it must have been worth it. (There are indeed advantages to doing some things from the command line, but not the work I was doing at the time.)
There often are advantages to doing things the hard way, but something isn’t necessarily better because it’s hard. The easiest tool to pick up may not be the best tool for long-term use, but then again it might be.
Most of the time you want to add the easy tool to your toolbox, not take the old one out. Just because you can use specialized professional tools doesn’t mean that you always have to.
Related post: Don’t be a technical masochist
The origin of the word idiot is “one’s own,” the same root as idiom. So originally an idiot was someone in his own world, someone who takes no outside input. The historical meaning carries over to some degree: When you see a smart person do something idiotic, it’s usually because he’s acting alone.
The opposite of an idiot would not be someone who is wise, but someone who takes too much outside input, someone who passively lets others make all his decisions or who puts every decision to a vote.
An idiot lives only in his own world; the opposite of an idiot has no world of his own. Both are foolish, but I think the Internet encourages more of the latter kind of foolishness. It’s not turning people into idiots, it’s turning them into the opposite.
For this week’s resource post, see the page Stand-alone code for numerical computing. It points to small, self-contained bits of code for special functions (log gamma, erf, etc.) and for random number generation (normal, Poisson, gamma, etc.).
The code is available in Python, C++, and C# versions. It could easily be translated into other languages since it hardly uses any language-specific features.
I wrote these functions for projects where you don’t have a numerical library available or would like to minimize dependencies. If you have access to a numerical library, such as SciPy in Python, then by all means use it (although SciPy is missing some of the random number generators provided here). In C++ and especially C#, it’s harder to find some of this functionality.
Last week: Code Project articles
Next week: Clinical trial software