AI Spring

Artificial intelligence, or at least the perception of artificial intelligence, has gone from disappointing to frightening in the blink of an eye. As Marc Andreessen said on Twitter this morning:

AI: From “It’s so horrible how little progress has been made” to “It’s so horrible how much progress has been made” in one step.

When I read this I thought of Pandora (the mythical figure, not the music service).

“Are you still working on opening that box? Any progress?”

“No, the lid just … won’t … budge … Oh wait, I think I got it.”

 

Related post: Why the robots aren’t coming in the way you expect by Mark Burgess

Ursula K. Le Guin has it backward

Ursula K. Le Guin is asking people to not buy books from Amazon because they market bestsellers, the literary equivalent of junk food. She said last week

I believe that reading only packaged microwavable fiction ruins the taste, destabilizes the moral blood pressure, and makes the mind obese.

I agree with that. That’s why I shop at Amazon.

If I liked to read best-selling junk food, I could find it at any bookstore. But I like to read less popular books, books I can only find from online retailers like Amazon. If fact, most of Amazon’s revenue comes from obscure books, not bestsellers.

Suppose I want to read something by, I don’t know, say, Ursula K. Le Guin. I doubt I could find a copy of any of her books, certainly not her less popular books, within 20 miles of my house, and I live in the 4th largest city in the US. There’s nothing by her in the closest Barnes and Noble. But I could easy find anything she’s ever written on Amazon.

If you’d like to support Amazon so they can continue to bring us fine authors like Ursula K. Le Guin, authors you can’t find in stores that mostly sell packaged microwavable fiction, you can buy one of the books mentioned on this blog from Amazon.

Reading equations forward and backward

There is no logical difference between writing A = B and writing B = A, but there is a psychological difference.

Equations are typically applied left to right. When you write A = B you imply that it may be useful to replace A with B. This is helpful to keep in mind when learning something new: the order in which an equation is written gives a hint as to how it may be applied. However, this way of thinking can also be a limitation. Clever applications often come from realizing that you can apply an equation in the opposite of the usual direction.

For example, Euler’s reflection formula says

Γ(z) Γ(1-z) = π / sin(πz).

Reading from left to right, this says that two unfamiliar/difficult things, values of the Gamma function, are related to a more familiar/simple thing, the sine function. It would be odd to look at this formula and say “Great! Now I can compute sines if I just know values of the Gamma function.” Instead, the usual reaction would be “Great! Now I can relate the value of Gamma at two different places by using sines.”

When we see Einstein’s equation

E = mc2

the first time, we think about creating energy from matter, such as the mass lost in nuclear fission. This applies the formula from left to right, relating what we want to know, an amount of energy, to what we do know, an amount of mass. But you could also read the equation from right to left, calculating the amount of energy, say in an accelerator, necessary to create a particle of a given mass.

Calculus textbooks typically have a list of equations, either inside the covers or in an appendix, that relate an integral on the left to a function or number on the right. This makes sense because calculus students compute integrals. But mathematicians often apply these equations in the opposite direction, replacing a number or function with an integral. To a calculus student this is madness: why replace a familiar thing with a scary thing? But integrals aren’t scary to mathematicians. Expressing a function as an integral is often progress. Properties of a function may be easier to see in integral form. Also, the integral may lend itself to some computational technique, such as reversing the order of integration in a double integral, or reversing the order to taking a limit and an integral.

Calculus textbooks also have lists of equations involving infinite sums, the summation always being on the left. Calculus students want to replace the scary thing, the infinite sum, with the familiar thing, the expression on the right. Generating functions turn this around, wanting to replace things with infinite sums. Again this would seem crazy to a calculus student, but it’s a powerful problem solving technique.

Differential equation students solve differential equations. They want to replace what they find scary, a differential equation, with something more familiar, a function that satisfies the differential equation. But mathematicians sometimes want to replace a function with a differential equation that it satisfies. This is common, for example, in studying special functions. Classical orthogonal polynomials satisfy 2nd order differential equations, and the differential equation takes a different form for different families of orthogonal polynomials. Why would you want to take something as tangible and familiar as a polynomial, something you might study as a sophomore in high school, and replace it with something as abstract and mysterious as a differential equation, something you might study as a sophomore in college? Because some properties, properties that you would not have cared about in high school, are more clearly seen via the differential equations.

Launching missiles with Haskell

Haskell advocates are fond of saying that a Haskell function cannot launch missiles without you knowing it. Pure functions have no side effects, so they can only do what they purport to do. In a language that does not enforce functional purity, calling a function could have arbitrary side effects, including launching missiles. But this cannot happen in Haskell.

The difference between pure functional languages and traditional imperative languages is not quite that simple in practice.

Programming with pure functions is conceptually easy but can be awkward in practice. You could just pass each function the state of the world before the call, and it returns the state of the world after the call. It’s unrealistic to pass a program’s entire state as an argument each time, so you’d like to pass just that state that you need to, and have a convenient way of doing so. You’d also like the compiler to verify that you’re only passing around a limited slice of the world. That’s where monads come in.

Suppose you want a function to compute square roots and log its calls. Your square root function would have to take two arguments: the number to find the root of, and the state of the log before the function call. It would also return two arguments: the square root, and the updated log. This is a pain, and it makes function composition difficult.

Monads provide a sort of side-band for passing state around, things like our function call log. You’re still passing around the log, but you can do it implicitly using monads. This makes it easier to call and compose two functions that do logging. It also lets the compiler check that you’re passing around a log but not arbitrary state. A function that updates a log, for example, can effect the state of the log, but it can’t do anything else. It can’t launch missiles.

Once monads get large and complicated, it’s hard to know what side effects they hide. Maybe they can launch missiles after all. You can only be sure by studying the source code. Now how do you know that calling a C function, for example, doesn’t launch missiles? You study the source code. In that sense Haskell and C aren’t entirely different.

The Haskell compiler does give you assurances that a C compiler does not. But ultimately you have to study source code to know what a function does and does not do.

Related post: Monads are hard because …

Mystery curve

This afternoon I got a review copy of the book Creating Symmetry: The Artful Mathematics of Wallpaper Patterns. Here’s a striking curves from near the beginning of the book, one that the author calls the “mystery curve.”

The curve is the plot of exp(it) – exp(6it)/2 + i exp(-14it)/3 with t running from 0 to 2π.

Here’s Python code to draw the curve.

import matplotlib.pyplot as plt
from numpy import pi, exp, real, imag, linspace

def f(t):
    return exp(1j*t) - exp(6j*t)/2 + 1j*exp(-14j*t)/3

t = linspace(0, 2*pi, 1000)

plt.plot(real(f(t)), imag(f(t)))

# These two lines make the aspect ratio square
fig = plt.gcf()
fig.gca().set_aspect('equal')

plt.show()

Maybe there’s a more direct way to plot curves in the complex plane rather than taking real and imaginary parts.

Updated code for the aspect ratio per Janne’s suggestion in the comments.

Related posts:

Several people have been making fun visualizations that generalize the example above.

Brent Yorgey has written two posts, one choosing frequencies randomly and another that animates the path of a particle along the curve and shows how the frequency components each contribute to the motion.

Mike Croucher developed a Jupyter notebook that lets you vary the frequency components with sliders.

John Golden created visualizations in GeoGerba here and here.

Jennifer Silverman showed how these curves are related to decorative patterns that popular in the 1960’s. She also created a coloring book and a video.

Dan Anderson accused me of nerd sniping him and created this visualization.

Unix-like shells on Windows

This post gives some notes on ways to create a Unix-like command line experience on Windows, without using a virtual machine like VMWare or a quasi-virtual machine like Cygwin.

Finding Windows ports of Unix utilities is easy. The harder part is finding a shell that behaves as expected. (Of course “as expected” depends on your expectations!)

There have been many projects to port Unix utilities to Windows, particularly GnuWin32 and Gow. Some of the command shells I’ve tried are:

  • Cmd
  • PowerShell
  • Eshell
  • Bash
  • Clink

I’d recommend the combination of Gow and Clink for most people. If you’re an Emacs power user you might like Eshell.

Cmd

The built-in command line on Windows is cmd. It’s sometimes called the “DOS prompt” though that’s misleading. DOS died two decades ago and the cmd shell has improved quite a bit since then.

cmd has some features you might not expect, such as pushd and popd. However, I don’t believe it has anything analogous to dirs to let you see the directory stack.

PowerShell

PowerShell is a very sophisticated scripting environment, but the interactive shell itself (e.g. command editing functionality) is basically cmd. (I haven’t kept up with PowerShell and that may have changed.) This means that writing a PowerShell script is completely different from writing a batch file, but the experience of navigating the command line is essentially the same as cmd.

Eshell

You can run shells inside Emacs. By default, M-x shell brings up a cmd prompt inside an Emacs buffer. You can also use Emacs’ own shell with the command M-x eshell.

Eshell is a shell implemented in Emacs Lisp. Using Eshell is very similar across platforms. On a fresh Windows machine, with nothing like Gow installed, Eshell provides some of the most common Unix utilities. You can use the which command to see whether you’re using a native executable or Emacs Lisp code. For example, if you type which ls into Eshell, you get the response

    eshell/ls is a compiled Lisp function in `em-ls.el'

The primary benefit of Eshell is that provides integration with Emacs. As the documentation says

Eshell is not a replacement for system shells such as bash or zsh. Use Eshell when you want to move text between Emacs and external processes …

Eshell does not provide some of the command editing features you might expect from bash. But the reason for this is clear: if you’re inside Emacs, you’d want to use the full power of Emacs editing, not the stripped-down editing features of a command line. For example, you cannot use ^foo^bar to replace foo with bar in the previous command. Instead, you could retrieve the previous command and edit it just as you edit any other line inside Emacs.

In bash you can use !^ to recall the first argument of the previous command and !$!$ using $_ instead. Many of the other bash shortcuts that begin with ! work as expected: !foo, !!, !-3, etc. Directory navigation commands like cd -, pushd, and popd work as in bash.

Bash

Gow comes with a bash shell, a Windows command line program that creates a bash-like environment. I haven’t had much experience with it, but it seems to be a faithful bash implementation with few compromises for Windows, for better and for worse. For example, it doesn’t understand backslashes as directory separators.

There are other implementations of bash on Windows, but I either haven’t tried them (e.g. win-bash) or have had bad experience with them (e.g. Cygwin).

Clink

Clink is not a shell per se but an extension to cmd. It adds the functionality of the Gnu readline library to the Windows command line and so you can use all the Emacs-like editing commands that you can with bash: Control-a to move to the beginning of a line, Control-k to delete the rest of a line, etc.

Clink also gives you Windows-like behavior that Windows itself doesn’t provide, such as being able to paste text onto the command line with Control-v.

I’ve heard that Clink will work with PowerShell, but I was not able to make it work.

The command editing and history shortcuts beginning with ! mentioned above all work with Clink, as do substitutions like ^foo^bar.

Conclusion

In my opinion, the combination of Gow and Clink gives a good compromise between a Windows and Unix work environment. And if you’re running Windows, a compromise is probably what you want. Otherwise, simply run a (possibly virtual) Linux machine. Attempts to make Windows too Unix-like run down an uncanny valley where it’s easy to waste a lot of time.

Data, code, and regulation

Data is code and code is data. The distinction between software (“code”) and input (“data”) is blurry at best, arbitrary at worst. And this distinction, or lack thereof, has interesting implications for regulation.

In some contexts software is regulated but data is not, or at least software comes under different regulations than data. For example, maybe you have to maintain test records for software but not for data.

Suppose as part of some project you need to search for files containing the word “apple” and you use the command line utility grep. The text “apple” is data, input to the grep program. Since grep is a widely used third party tool, it doesn’t have to be validated, and you haven’t written any code.

Next you need to search for “apple” and “Apple” and so you search on the regular expression “[aA]pple” rather than a plain string. Now is the regular expression “[aA]pple” code? It’s at least a tiny step in the direction of code.

What about more complicated regular expressions? Regular expressions are equivalent to deterministic finite automata, which sure seem like code. And that’s only regular expressions as originally defined. The term “regular expression” has come to mean more expressive patterns.  Perl regular expressions can even contain arbitrary Perl code.

In practice we can agree that certain things are “code” and others are “data,” but there are gray areas where people could sincerely disagree. And someone wanting to be argumentative could stretch this gray zone to include everything. One could argue, for example, that all software is data because it’s input to a compiler or interpreter.

You might say “data is what goes into a database and code is what goes into a compiler.” That’s a reasonable rule of thumb, but databases can store code and programs can store data. Programmers routinely have long discussions about what belongs in a database and what belongs in source code. Throw regulatory considerations into the mix and there could be incentives to push more code into the database or more data into the source code.

* * *

See Slava Akhmechet’s essay The Nature of Lisp for a longer discussion of the duality between code and data.

Subway map of the solar system

This is a thumbnail version of a large, high-resolution image by Ulysse Carion. Thanks to Aleksey Shipilëv (@shipilev) for pointing it out.

It’s hard to see in the thumbnail, but the map gives the change in velocity needed at each branch point. You can find the full 2239 x 2725 pixel image here or click on the thumbnail above.

Fibonacci number system

Every positive integer can be written as the sum of distinct Fibonacci numbers. For example, 10 = 8 + 2, the sum of the fifth Fibonacci number and the second.

This decomposition is unique if you impose the extra requirement that consecutive Fibonacci numbers are not allowed. [1] It’s easy to see that the rule against consecutive Fibonacci numbers is necessary for uniqueness. It’s not as easy to see that the rule is sufficient.

Every Fibonacci number is itself the sum of two consecutive Fibonacci numbers—that’s how they’re defined—so clearly there are at least two ways to write a Fibonacci number as the sum of Fibonacci numbers, either just itself or its two predecessors. In the example above, 8 = 5 + 3 and so you could write 10 as 5 + 3 + 2.

The nth Fibonacci number is approximately φn/√5 where φ = 1.618… is the golden ratio. So you could think of a Fibonacci sum representation for x as roughly a base φ representation for √5x.

You can find the Fibonacci representation of a number x using a greedy algorithm: Subtract the largest Fibonacci number from x that you can, then subtract the largest Fibonacci number you can from the remainder, etc.

Programming exercise: How would you implement a function that finds the largest Fibonacci number less than or equal to its input? Once you have this it’s easy to write a program to find Fibonacci representations.

* * *

[1] This is known as Zeckendorf’s theorem, published by E. Zeckendorf in 1972. However, C. G. Lekkerkerker had published the same result 20 years earlier.

New monthly newsletter

Thank you for reading my blog. I’m starting a new email newsletter to address two things that readers have mentioned.

Some say they enjoy the blog, but I post more often than they care to keep up with, particularly if they’re only interested in the non-technical posts.

Others have said they’d like to know more about my consulting business. There are some interesting things going on there, but I’d rather not write about them on the blog.

The newsletter will address both of these groups. I’ll highlight a few posts from the blog, some technical and some not, and I’ll say a little about what I’ve been up to.

If you’d like to receive the newsletter, you can sign up here.

I won’t share your email address with anyone and you can unsubscribe at any time.

Information hiding

One of the basic principles of software development is information hiding. People agree that it’s desirable, but may not realize they have different ideas of what it means. And when done poorly, well-meaning attempts to make software more maintainable backfire. Leo Brodie cautions

… we should clarify. From what, or whom, are we hiding information?

[T]raditional languages … bend over backwards to ensure that modules hide internal routines and data structures from other modules. The goal is to achieve module independence (a minimum coupling). The fear seems to be that modules strive to attack each other like alien antibodies. Or else, that evil bands of marauding modules are out to clobber the precious family data structures.

This is not what we’re concerned about. The purpose of hiding information, as we mean it, is simply to minimize the effects of a possible design-change by localizing things that might change within each component.

Quote from Thinking Forth. Emphasis added.

 

Rotating PDF pages with Python

Yesterday I got a review copy of Automate the Boring Stuff with Python. It explains, among other things, how to manipulate PDFs from Python. This morning I needed to rotate some pages in a PDF, so I decided to try out the method in the book.

The sample code uses PyPDF2. I’m using Conda for my Python environment, and PyPDF2 isn’t directly available for Conda. I searched Binstar with

binstar search -t conda pypdf2

The first hit was from JimInCO, so I installed PyPDF2 with

conda install -c https://conda.binstar.org/JimInCO pypdf2

I scanned a few pages from a book to PDF, turning the book around every other page, so half the pages in the PDF were upside down. I needed a script to rotate the even numbered pages. The script counts pages from 0, so it rotates the odd numbered pages from its perspective.

import PyPDF2

pdf_in = open('original.pdf', 'rb')
pdf_reader = PyPDF2.PdfFileReader(pdf_in)
pdf_writer = PyPDF2.PdfFileWriter()

for pagenum in range(pdf_reader.numPages):
    page = pdf_reader.getPage(pagenum)
    if pagenum % 2:
        page.rotateClockwise(180)
    pdf_writer.addPage(page)

pdf_out = open('rotated.pdf', 'wb')
pdf_writer.write(pdf_out)
pdf_out.close()
pdf_in.close()

It worked as advertised on the first try.

RSS feeds for Twitter accounts

Twitter once provided RSS feeds for all Twitter accounts. They no longer provide this service. However, third parties can create RSS feeds from the content of Twitter accounts. BazQux has done this for my daily tip accounts, so you can subscribe to any of my accounts via RSS using the feeds linked to below.

If you would like to subscribe to more Twitter accounts via RSS, you could subscribe to the BazQux service and create a custom RSS feed for whatever Twitter, Google+, or Facebook accounts you’d like to follow.

Scientifically valid, practically invalid

In a recent episode of EconTalk, Phil Rosenzweig describes how the artificial conditions necessary to make experiments scientifically valid can also make the results practically invalid.

Rosenzweig discusses experiments designed to study decision making. In order to make clean comparisons, subjects are presented with discrete choices over which they have no control. They cannot look for more options or exercise any other form of agency. The result is an experiment that is easy to analyze and easy to publish, but so unrealistic as to tell us little about real-world decision making.

In his book Left Brain, Right Stuff, Rosenzweig quotes Philip Tetlock’s summary:

Much mischief can be wrought by transplanting this hypothesis-testing logic, which flourishes in controlled lab settings, into the hurly-burly of real-world settings where ceteris paribus never is, and never can be, satisfied.