… I said that if science could come up with something like the Jump it could surely solve a problem like that. Severin seized hold of that word, “science.” Science, he said, is not some mysterious larger-than-life force, it’s just the name we give to bright ideas that individual guys have when they’re lying in bed at night, and that if the fuel thing bothered me so much, there was nothing stopping me from having a bright idea to solve it …
Last week I wrote a short commentary on James Hague’s blog post Organization skills beat algorithmic wizardry. This week that post got more traffic than my server could handle. I believe it struck a chord with experienced software developers who know that the challenges they face now are not like the challenges they prepared for in school.
Although I completely agree that “algorithmic wizardry” is overrated in general, my personal experience has been a little different. My role on projects has frequently been to supply a little bit of algorithmic wizardry. I’ve often been asked to look into a program that is taking too long to run and been able to speed it up by an order of magnitude or two by improving a numerical algorithm. (See an example here.)
James Hague says that “rarely is there some … algorithm that casts a looming shadow over everything else.” I believe he is right, though I’ve been called into projects precisely on those rare occasions when an algorithm does cast a shadow over everything else.
If you’re new to this blog, welcome! Let me show you around.
You can find out more about me and my background here.
If you have any questions or comments, here’s my contact info.
Here’s an insightful paragraph from James Hague’s blog post Organization skills beat algorithmic wizardry:
When it comes to writing code, the number one most important skill is how to keep a tangle of features from collapsing under the weight of its own complexity. I’ve worked on large telecommunications systems, console games, blogging software, a bunch of personal tools, and very rarely is there some tricky data structure or algorithm that casts a looming shadow over everything else. But there’s always lots of state to keep track of, rearranging of values, handling special cases, and carefully working out how all the pieces of a system interact. To a great extent the act of coding is one of organization. Refactoring. Simplifying. Figuring out how to remove extraneous manipulations here and there.
Algorithmic wizardry is easier to teach and easier to blog about than organizational skill, so we teach and blog about it instead. A one-hour class, or a blog post, can showcase a clever algorithm. But how do you present a clever bit of organization? If you jump to the solution, it’s unimpressive. “Here’s something simple I came up with. It may not look like much, but trust me, it was really hard to realize this was all I needed to do.” Or worse, “Here’s a moderately complicated pile of code, but you should have seen how much more complicated it was before. At least now someone stands a shot of understanding it.” Ho hum. I guess you had to be there.
You can’t appreciate a feat of organization until you experience the disorganization. But it’s hard to have the patience to wrap your head around a disorganized mess that you don’t care about. Only if the disorganized mess is your responsibility, something that means more to you than a case study, can you wrap your head around it and appreciate improvements. This means that while you can learn algorithmic wizardry through homework assignments, you’re unlikely to learn organization skills unless you work on a large project you care about, most likely because you’re paid to care about it.
Artificial intelligence, or at least the perception of artificial intelligence, has gone from disappointing to frightening in the blink of an eye. As Marc Andreessen said on Twitter this morning:
AI: From “It’s so horrible how little progress has been made” to “It’s so horrible how much progress has been made” in one step.
When I read this I thought of Pandora (the mythical figure, not the music service).
“Are you still working on opening that box? Any progress?”
“No, the lid just … won’t … budge … Oh wait, I think I got it.”
Related post: Why the robots aren’t coming in the way you expect by Mark Burgess
Ursula K. Le Guin is asking people not to buy books from Amazon because Amazon markets bestsellers, the literary equivalent of junk food. She said last week:
I believe that reading only packaged microwavable fiction ruins the taste, destabilizes the moral blood pressure, and makes the mind obese.
I agree with that. That’s why I shop at Amazon.
If I liked to read best-selling junk food, I could find it at any bookstore. But I like to read less popular books, books I can only find from online retailers like Amazon. In fact, most of Amazon’s revenue comes from obscure books, not bestsellers.
Suppose I want to read something by, I don’t know, say, Ursula K. Le Guin. I doubt I could find a copy of any of her books, certainly not her less popular books, within 20 miles of my house, and I live in the 4th largest city in the US. There’s nothing by her in the closest Barnes and Noble. But I could easily find anything she’s ever written on Amazon.
If you’d like to support Amazon so they can continue to bring us fine authors like Ursula K. Le Guin, authors you can’t find in stores that mostly sell packaged microwavable fiction, you can buy one of the books mentioned on this blog from Amazon.
There is no logical difference between writing A = B and writing B = A, but there is a psychological difference.
Equations are typically applied left to right. When you write A = B you imply that it may be useful to replace A with B. This is helpful to keep in mind when learning something new: the order in which an equation is written gives a hint as to how it may be applied. However, this way of thinking can also be a limitation. Clever applications often come from realizing that you can apply an equation in the opposite of the usual direction.
For example, Euler’s reflection formula says
Γ(z) Γ(1-z) = π / sin(πz).
Reading from left to right, this says that two unfamiliar/difficult things, values of the Gamma function, are related to a more familiar/simple thing, the sine function. It would be odd to look at this formula and say “Great! Now I can compute sines if I just know values of the Gamma function.” Instead, the usual reaction would be “Great! Now I can relate the value of Gamma at two different places by using sines.”
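The relationship is easy to check numerically. Here’s a small Python sketch that verifies the reflection formula at a few points using the standard library’s math.gamma:

```python
import math

# Check Euler's reflection formula Γ(z) Γ(1-z) = π / sin(πz)
# at a few sample points in (0, 1).
for z in [0.1, 0.3, 0.5, 0.9]:
    lhs = math.gamma(z) * math.gamma(1 - z)
    rhs = math.pi / math.sin(math.pi * z)
    assert abs(lhs - rhs) < 1e-10 * abs(rhs)
```

At z = 0.5 both sides equal π, since Γ(1/2) = √π.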
When we see Einstein’s equation
E = mc²
the first time, we think about creating energy from matter, such as the mass lost in nuclear fission. This applies the formula from left to right, relating what we want to know, an amount of energy, to what we do know, an amount of mass. But you could also read the equation from right to left, calculating the amount of energy, say in an accelerator, necessary to create a particle of a given mass.
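A quick numerical illustration of reading the equation in both directions:

```python
c = 299_792_458  # speed of light in m/s

# Left to right: energy released by converting 1 kg of mass.
E = 1.0 * c**2
print(E)  # about 9e16 joules

# Right to left: the mass equivalent of that same energy.
m = E / c**2
print(m)  # 1.0 kg
```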
Calculus textbooks typically have a list of equations, either inside the covers or in an appendix, that relate an integral on the left to a function or number on the right. This makes sense because calculus students compute integrals. But mathematicians often apply these equations in the opposite direction, replacing a number or function with an integral. To a calculus student this is madness: why replace a familiar thing with a scary thing? But integrals aren’t scary to mathematicians. Expressing a function as an integral is often progress. Properties of a function may be easier to see in integral form. Also, the integral may lend itself to some computational technique, such as reversing the order of integration in a double integral, or reversing the order of taking a limit and an integral.
Calculus textbooks also have lists of equations involving infinite sums, the summation always being on the left. Calculus students want to replace the scary thing, the infinite sum, with the familiar thing, the expression on the right. Generating functions turn this around, wanting to replace things with infinite sums. Again this would seem crazy to a calculus student, but it’s a powerful problem solving technique.
Differential equation students solve differential equations. They want to replace what they find scary, a differential equation, with something more familiar, a function that satisfies the differential equation. But mathematicians sometimes want to replace a function with a differential equation that it satisfies. This is common, for example, in studying special functions. Classical orthogonal polynomials satisfy 2nd order differential equations, and the differential equation takes a different form for different families of orthogonal polynomials. Why would you want to take something as tangible and familiar as a polynomial, something you might study as a sophomore in high school, and replace it with something as abstract and mysterious as a differential equation, something you might study as a sophomore in college? Because some properties, properties that you would not have cared about in high school, are more clearly seen via the differential equations.
Haskell advocates are fond of saying that a Haskell function cannot launch missiles without you knowing it. Pure functions have no side effects, so they can only do what they purport to do. In a language that does not enforce functional purity, calling a function could have arbitrary side effects, including launching missiles. But this cannot happen in Haskell.
The difference between pure functional languages and traditional imperative languages is not quite that simple in practice.
Programming with pure functions is conceptually easy but can be awkward in practice. You could just pass each function the state of the world before the call and have it return the state of the world after the call. It’s unrealistic to pass a program’s entire state as an argument each time, so you’d like to pass around just the state you need, and have a convenient way of doing so. You’d also like the compiler to verify that you’re only passing around a limited slice of the world. That’s where monads come in.
Suppose you want a function to compute square roots and log its calls. Your square root function would have to take two arguments: the number to find the root of, and the state of the log before the function call. It would also return two arguments: the square root, and the updated log. This is a pain, and it makes function composition difficult.
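To make the awkwardness concrete, here is a minimal sketch in Python (rather than Haskell, and with made-up function names) of what passing the log around explicitly looks like:

```python
import math

def sqrt_with_log(x, log):
    """A pure function: takes the number and the log so far,
    returns the square root and the updated log. It never
    mutates its arguments."""
    new_log = log + [f"sqrt({x})"]
    return math.sqrt(x), new_log

# Composing two calls means threading the log through by hand.
root1, log = sqrt_with_log(16.0, [])
root2, log = sqrt_with_log(root1, log)
print(root2)  # 2.0
print(log)    # ['sqrt(16.0)', 'sqrt(4.0)']
```

Every caller has to remember to accept and forward the log; monads automate exactly this bookkeeping.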
Monads provide a sort of side-band for passing state around, things like our function call log. You’re still passing around the log, but you can do it implicitly using monads. This makes it easier to call and compose two functions that do logging. It also lets the compiler check that you’re passing around a log but not arbitrary state. A function that updates a log, for example, can affect the state of the log, but it can’t do anything else. It can’t launch missiles.
Once monads get large and complicated, it’s hard to know what side effects they hide. Maybe they can launch missiles after all. You can only be sure by studying the source code. Now how do you know that calling a C function, for example, doesn’t launch missiles? You study the source code. In that sense Haskell and C aren’t entirely different.
The Haskell compiler does give you assurances that a C compiler does not. But ultimately you have to study source code to know what a function does and does not do.
Related post: Monads are hard because …
This afternoon I got a review copy of the book Creating Symmetry: The Artful Mathematics of Wallpaper Patterns. Here’s a striking curve from near the beginning of the book, one that the author calls the “mystery curve.”
The curve is the plot of exp(it) – exp(6it)/2 + i exp(-14it)/3 with t running from 0 to 2π.
Here’s Python code to draw the curve.
```python
import matplotlib.pyplot as plt
from numpy import pi, exp, real, imag, linspace

def f(t):
    return exp(1j*t) - exp(6j*t)/2 + 1j*exp(-14j*t)/3

t = linspace(0, 2*pi, 1000)
plt.plot(real(f(t)), imag(f(t)))

# These two lines make the aspect ratio square
fig = plt.gcf()
fig.gca().set_aspect('equal')

plt.show()
```
Maybe there’s a more direct way to plot curves in the complex plane rather than taking real and imaginary parts.
Updated code for the aspect ratio per Janne’s suggestion in the comments.
Several people have been making fun visualizations that generalize the example above.
Brent Yorgey has written two posts, one choosing frequencies randomly and another that animates the path of a particle along the curve and shows how the frequency components each contribute to the motion.
Mike Croucher developed a Jupyter notebook that lets you vary the frequency components with sliders.
This post gives some notes on ways to create a Unix-like command line experience on Windows, without using a virtual machine like VMWare or a quasi-virtual machine like Cygwin.
Finding Windows ports of Unix utilities is easy. The harder part is finding a shell that behaves as expected. (Of course “as expected” depends on your expectations!)
I’d recommend the combination of Gow and Clink for most people. If you’re an Emacs power user you might like Eshell.
The built-in command line on Windows is cmd. It’s sometimes called the “DOS prompt” though that’s misleading. DOS died two decades ago and the cmd shell has improved quite a bit since then. cmd has some features you might not expect, such as popd. However, I don’t believe it has anything analogous to dirs to let you see the directory stack.
PowerShell is a very sophisticated scripting environment, but the interactive shell itself (e.g. command editing functionality) is basically cmd. (I haven’t kept up with PowerShell and that may have changed.) This means that writing a PowerShell script is completely different from writing a batch file, but the experience of navigating the command line is essentially the same as with cmd.
You can run shells inside Emacs. By default, M-x shell brings up a cmd prompt inside an Emacs buffer. You can also use Emacs’ own shell with the command M-x eshell.
Eshell is a shell implemented in Emacs Lisp. Using Eshell is very similar across platforms. On a fresh Windows machine, with nothing like Gow installed, Eshell provides some of the most common Unix utilities. You can use the which command to see whether you’re using a native executable or Emacs Lisp code. For example, if you type which ls into Eshell, you get the response
eshell/ls is a compiled Lisp function in `em-ls.el'
The primary benefit of Eshell is that it provides integration with Emacs. As the documentation says, Eshell is not a replacement for system shells such as zsh; use Eshell when you want to move text between Emacs and external processes …
Eshell does not provide some of the command editing features you might expect from bash. But the reason for this is clear: if you’re inside Emacs, you’d want to use the full power of Emacs editing, not the stripped-down editing features of a command line. For example, you cannot use ^foo^bar to replace foo with bar in the previous command. Instead, you could retrieve the previous command and edit it just as you edit any other line inside Emacs.
In bash you can use !^ to recall the first argument of the previous command; in Eshell you can use $_ instead. Many of the other bash shortcuts that begin with ! work as expected: !-3, etc. Directory navigation commands like popd work as in bash.
Gow comes with a bash shell, a Windows command line program that creates a bash-like environment. I haven’t had much experience with it, but it seems to be a faithful bash implementation with few compromises for Windows, for better and for worse. For example, it doesn’t understand backslashes as directory separators.
Clink is not a shell per se but an extension to cmd. It adds the functionality of the GNU readline library to the Windows command line, so you can use all the Emacs-like editing commands that you can with bash: Control-a to move to the beginning of a line, Control-k to delete the rest of a line, etc.
Clink also gives you Windows-like behavior that Windows itself doesn’t provide, such as being able to paste text onto the command line with Control-v.
I’ve heard that Clink will work with PowerShell, but I was not able to make it work.
The command editing and history shortcuts beginning with ! mentioned above all work with Clink, as do substitutions like ^foo^bar.
In my opinion, the combination of Gow and Clink gives a good compromise between a Windows and Unix work environment. And if you’re running Windows, a compromise is probably what you want. Otherwise, simply run a (possibly virtual) Linux machine. Attempts to make Windows too Unix-like run into an uncanny valley where it’s easy to waste a lot of time.
Data is code and code is data. The distinction between software (“code”) and input (“data”) is blurry at best, arbitrary at worst. And this distinction, or lack thereof, has interesting implications for regulation.
In some contexts software is regulated but data is not, or at least software comes under different regulations than data. For example, maybe you have to maintain test records for software but not for data.
Suppose as part of some project you need to search for files containing the word “apple” and you use the command line utility grep. The text “apple” is data, input to the grep program. Since grep is a widely used third party tool, it doesn’t have to be validated, and you haven’t written any code.
Next you need to search for “apple” and “Apple” and so you search on the regular expression “[aA]pple” rather than a plain string. Now is the regular expression “[aA]pple” code? It’s at least a tiny step in the direction of code.
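You can see the blurriness if the search moves into a script. Here the pattern is just a string handed to Python’s re module (the sample text is made up for illustration):

```python
import re

text = "Apple pie and apple cider"

# The pattern "[aA]pple" is a string -- data -- yet it directs
# the behavior of the matching engine, like code.
matches = re.findall(r"[aA]pple", text)
print(matches)  # ['Apple', 'apple']
```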
What about more complicated regular expressions? Regular expressions are equivalent to deterministic finite automata, which sure seem like code. And that’s only regular expressions as originally defined. The term “regular expression” has come to mean more expressive patterns. Perl regular expressions can even contain arbitrary Perl code.
In practice we can agree that certain things are “code” and others are “data,” but there are gray areas where people could sincerely disagree. And someone wanting to be argumentative could stretch this gray zone to include everything. One could argue, for example, that all software is data because it’s input to a compiler or interpreter.
You might say “data is what goes into a database and code is what goes into a compiler.” That’s a reasonable rule of thumb, but databases can store code and programs can store data. Programmers routinely have long discussions about what belongs in a database and what belongs in source code. Throw regulatory considerations into the mix and there could be incentives to push more code into the database or more data into the source code.
* * *
See Slava Akhmechet’s essay The Nature of Lisp for a longer discussion of the duality between code and data.
This is a thumbnail version of a large, high-resolution image by Ulysse Carion. Thanks to Aleksey Shipilëv (@shipilev) for pointing it out.
It’s hard to see in the thumbnail, but the map gives the change in velocity needed at each branch point. You can find the full 2239 x 2725 pixel image here or click on the thumbnail above.
Every positive integer can be written as the sum of distinct Fibonacci numbers. For example, 10 = 8 + 2, the sum of the fifth Fibonacci number and the second.
This decomposition is unique if you impose the extra requirement that consecutive Fibonacci numbers are not allowed.  It’s easy to see that the rule against consecutive Fibonacci numbers is necessary for uniqueness. It’s not as easy to see that the rule is sufficient.
Every Fibonacci number after the first two is itself the sum of the two preceding Fibonacci numbers—that’s how they’re defined—so clearly there are at least two ways to write such a number as the sum of Fibonacci numbers, either just itself or its two predecessors. In the example above, 8 = 5 + 3 and so you could write 10 as 5 + 3 + 2.
The nth Fibonacci number is approximately φⁿ/√5 where φ = 1.618… is the golden ratio. So you could think of a Fibonacci sum representation for x as roughly a base φ representation for √5 x.
You can find the Fibonacci representation of a number x using a greedy algorithm: Subtract the largest Fibonacci number from x that you can, then subtract the largest Fibonacci number you can from the remainder, etc.
Programming exercise: How would you implement a function that finds the largest Fibonacci number less than or equal to its input? Once you have this it’s easy to write a program to find Fibonacci representations.
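Here is one possible Python sketch of the greedy algorithm (a solution to the exercise, so look away if you’d rather work it out yourself):

```python
def largest_fib_leq(x):
    """Return the largest Fibonacci number <= x, assuming x >= 1.
    Uses the sequence 1, 2, 3, 5, 8, ... as in the post."""
    a, b = 1, 2
    while b <= x:
        a, b = b, a + b
    return a

def zeckendorf(x):
    """Greedy algorithm: repeatedly subtract the largest
    Fibonacci number that fits."""
    terms = []
    while x > 0:
        f = largest_fib_leq(x)
        terms.append(f)
        x -= f
    return terms

print(zeckendorf(10))   # [8, 2]
print(zeckendorf(100))  # [89, 8, 3]
```

The greedy choice automatically avoids consecutive Fibonacci numbers: after subtracting Fₙ the remainder is less than Fₙ₋₁, or else Fₙ₊₁ = Fₙ + Fₙ₋₁ would have fit.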
* * *
 This is known as Zeckendorf’s theorem, published by E. Zeckendorf in 1972. However, C. G. Lekkerkerker had published the same result 20 years earlier.
Thank you for reading my blog. I’m starting a new email newsletter to address two things that readers have mentioned.
Some say they enjoy the blog, but I post more often than they care to keep up with, particularly if they’re only interested in the non-technical posts.
Others have said they’d like to know more about my consulting business. There are some interesting things going on there, but I’d rather not write about them on the blog.
The newsletter will address both of these groups. I’ll highlight a few posts from the blog, some technical and some not, and I’ll say a little about what I’ve been up to.
If you’d like to receive the newsletter, you can sign up here.
I won’t share your email address with anyone and you can unsubscribe at any time.