Mystery curve

This afternoon I got a review copy of the book Creating Symmetry: The Artful Mathematics of Wallpaper Patterns. Here’s a striking curves from near the beginning of the book, one that the author calls the “mystery curve.”

The curve is the plot of exp(it) – exp(6it)/2 + i exp(-14it)/3 with t running from 0 to 2π.

Here’s Python code to draw the curve.

import matplotlib.pyplot as plt
from numpy import pi, exp, real, imag, linspace

def f(t):
    return exp(1j*t) - exp(6j*t)/2 + 1j*exp(-14j*t)/3

t = linspace(0, 2*pi, 1000)

plt.plot(real(f(t)), imag(f(t)))

# These two lines make the aspect ratio square
fig = plt.gcf()
fig.gca().set_aspect('equal')

plt.show()

Maybe there’s a more direct way to plot curves in the complex plane rather than taking real and imaginary parts.

Updated code for the aspect ratio per Janne’s suggestion in the comments.

Related posts:

Several people have been making fun visualizations that generalize the example above.

Brent Yorgey has written two posts, one choosing frequencies randomly and another that animates the path of a particle along the curve and shows how the frequency components each contribute to the motion.

Mike Croucher developed a Jupyter notebook that lets you vary the frequency components with sliders.

John Golden created visualizations in GeoGerba here and here.

Jennifer Silverman showed how these curves are related to decorative patterns that popular in the 1960’s. She also created a coloring book and a video.

Dan Anderson accused me of nerd sniping him and created this visualization.

Unix-like shells on Windows

This post gives some notes on ways to create a Unix-like command line experience on Windows, without using a virtual machine like VMWare or a quasi-virtual machine like Cygwin.

Finding Windows ports of Unix utilities is easy. The harder part is finding a shell that behaves as expected. (Of course “as expected” depends on your expectations!)

There have been many projects to port Unix utilities to Windows, particularly GnuWin32 and Gow. Some of the command shells I’ve tried are:

  • Cmd
  • PowerShell
  • Eshell
  • Bash
  • Clink

I’d recommend the combination of Gow and Clink for most people. If you’re an Emacs power user you might like Eshell.

Cmd

The built-in command line on Windows is cmd. It’s sometimes called the “DOS prompt” though that’s misleading. DOS died two decades ago and the cmd shell has improved quite a bit since then.

cmd has some features you might not expect, such as pushd and popd. However, I don’t believe it has anything analogous to dirs to let you see the directory stack.

PowerShell

PowerShell is a very sophisticated scripting environment, but the interactive shell itself (e.g. command editing functionality) is basically cmd. (I haven’t kept up with PowerShell and that may have changed.) This means that writing a PowerShell script is completely different from writing a batch file, but the experience of navigating the command line is essentially the same as cmd.

Eshell

You can run shells inside Emacs. By default, M-x shell brings up a cmd prompt inside an Emacs buffer. You can also use Emacs’ own shell with the command M-x eshell.

Eshell is a shell implemented in Emacs Lisp. Using Eshell is very similar across platforms. On a fresh Windows machine, with nothing like Gow installed, Eshell provides some of the most common Unix utilities. You can use the which command to see whether you’re using a native executable or Emacs Lisp code. For example, if you type which ls into Eshell, you get the response

    eshell/ls is a compiled Lisp function in `em-ls.el'

The primary benefit of Eshell is that provides integration with Emacs. As the documentation says

Eshell is not a replacement for system shells such as bash or zsh. Use Eshell when you want to move text between Emacs and external processes …

Eshell does not provide some of the command editing features you might expect from bash. But the reason for this is clear: if you’re inside Emacs, you’d want to use the full power of Emacs editing, not the stripped-down editing features of a command line. For example, you cannot use ^foo^bar to replace foo with bar in the previous command. Instead, you could retrieve the previous command and edit it just as you edit any other line inside Emacs.

In bash you can use !^ to recall the first argument of the previous command and !$!$ using $_ instead. Many of the other bash shortcuts that begin with ! work as expected: !foo, !!, !-3, etc. Directory navigation commands like cd -, pushd, and popd work as in bash.

Bash

Gow comes with a bash shell, a Windows command line program that creates a bash-like environment. I haven’t had much experience with it, but it seems to be a faithful bash implementation with few compromises for Windows, for better and for worse. For example, it doesn’t understand backslashes as directory separators.

There are other implementations of bash on Windows, but I either haven’t tried them (e.g. win-bash) or have had bad experience with them (e.g. Cygwin).

Clink

Clink is not a shell per se but an extension to cmd. It adds the functionality of the Gnu readline library to the Windows command line and so you can use all the Emacs-like editing commands that you can with bash: Control-a to move to the beginning of a line, Control-k to delete the rest of a line, etc.

Clink also gives you Windows-like behavior that Windows itself doesn’t provide, such as being able to paste text onto the command line with Control-v.

I’ve heard that Clink will work with PowerShell, but I was not able to make it work.

The command editing and history shortcuts beginning with ! mentioned above all work with Clink, as do substitutions like ^foo^bar.

Conclusion

In my opinion, the combination of Gow and Clink gives a good compromise between a Windows and Unix work environment. And if you’re running Windows, a compromise is probably what you want. Otherwise, simply run a (possibly virtual) Linux machine. Attempts to make Windows too Unix-like run down an uncanny valley where it’s easy to waste a lot of time.

Data, code, and regulation

Data is code and code is data. The distinction between software (“code”) and input (“data”) is blurry at best, arbitrary at worst. And this distinction, or lack thereof, has interesting implications for regulation.

In some contexts software is regulated but data is not, or at least software comes under different regulations than data. For example, maybe you have to maintain test records for software but not for data.

Suppose as part of some project you need to search for files containing the word “apple” and you use the command line utility grep. The text “apple” is data, input to the grep program. Since grep is a widely used third party tool, it doesn’t have to be validated, and you haven’t written any code.

Next you need to search for “apple” and “Apple” and so you search on the regular expression “[aA]pple” rather than a plain string. Now is the regular expression “[aA]pple” code? It’s at least a tiny step in the direction of code.

What about more complicated regular expressions? Regular expressions are equivalent to deterministic finite automata, which sure seem like code. And that’s only regular expressions as originally defined. The term “regular expression” has come to mean more expressive patterns.  Perl regular expressions can even contain arbitrary Perl code.

In practice we can agree that certain things are “code” and others are “data,” but there are gray areas where people could sincerely disagree. And someone wanting to be argumentative could stretch this gray zone to include everything. One could argue, for example, that all software is data because it’s input to a compiler or interpreter.

You might say “data is what goes into a database and code is what goes into a compiler.” That’s a reasonable rule of thumb, but databases can store code and programs can store data. Programmers routinely have long discussions about what belongs in a database and what belongs in source code. Throw regulatory considerations into the mix and there could be incentives to push more code into the database or more data into the source code.

* * *

See Slava Akhmechet’s essay The Nature of Lisp for a longer discussion of the duality between code and data.

Subway map of the solar system

This is a thumbnail version of a large, high-resolution image by Ulysse Carion. Thanks to Aleksey Shipilëv (@shipilev) for pointing it out.

It’s hard to see in the thumbnail, but the map gives the change in velocity needed at each branch point. You can find the full 2239 x 2725 pixel image here or click on the thumbnail above.

Fibonacci number system

Every positive integer can be written as the sum of distinct Fibonacci numbers. For example, 10 = 8 + 2, the sum of the fifth Fibonacci number and the second.

This decomposition is unique if you impose the extra requirement that consecutive Fibonacci numbers are not allowed. [1] It’s easy to see that the rule against consecutive Fibonacci numbers is necessary for uniqueness. It’s not as easy to see that the rule is sufficient.

Every Fibonacci number is itself the sum of two consecutive Fibonacci numbers—that’s how they’re defined—so clearly there are at least two ways to write a Fibonacci number as the sum of Fibonacci numbers, either just itself or its two predecessors. In the example above, 8 = 5 + 3 and so you could write 10 as 5 + 3 + 2.

The nth Fibonacci number is approximately φn/√5 where φ = 1.618… is the golden ratio. So you could think of a Fibonacci sum representation for x as roughly a base φ representation for √5x.

You can find the Fibonacci representation of a number x using a greedy algorithm: Subtract the largest Fibonacci number from x that you can, then subtract the largest Fibonacci number you can from the remainder, etc.

Programming exercise: How would you implement a function that finds the largest Fibonacci number less than or equal to its input? Once you have this it’s easy to write a program to find Fibonacci representations.

* * *

[1] This is known as Zeckendorf’s theorem, published by E. Zeckendorf in 1972. However, C. G. Lekkerkerker had published the same result 20 years earlier.

New monthly newsletter

Thank you for reading my blog. I’m starting a new email newsletter to address two things that readers have mentioned.

Some say they enjoy the blog, but I post more often than they care to keep up with, particularly if they’re only interested in the non-technical posts.

Others have said they’d like to know more about my consulting business. There are some interesting things going on there, but I’d rather not write about them on the blog.

The newsletter will address both of these groups. I’ll highlight a few posts from the blog, some technical and some not, and I’ll say a little about what I’ve been up to.

If you’d like to receive the newsletter, you can sign up here.

I won’t share your email address with anyone and you can unsubscribe at any time.

Information hiding

One of the basic principles of software development is information hiding. People agree that it’s desirable, but may not realize they have different ideas of what it means. And when done poorly, well-meaning attempts to make software more maintainable backfire. Leo Brodie cautions

… we should clarify. From what, or whom, are we hiding information?

[T]raditional languages … bend over backwards to ensure that modules hide internal routines and data structures from other modules. The goal is to achieve module independence (a minimum coupling). The fear seems to be that modules strive to attack each other like alien antibodies. Or else, that evil bands of marauding modules are out to clobber the precious family data structures.

This is not what we’re concerned about. The purpose of hiding information, as we mean it, is simply to minimize the effects of a possible design-change by localizing things that might change within each component.

Quote from Thinking Forth. Emphasis added.

 

Rotating PDF pages with Python

Yesterday I got a review copy of Automate the Boring Stuff with Python. It explains, among other things, how to manipulate PDFs from Python. This morning I needed to rotate some pages in a PDF, so I decided to try out the method in the book.

The sample code uses PyPDF2. I’m using Conda for my Python environment, and PyPDF2 isn’t directly available for Conda. I searched Binstar with

binstar search -t conda pypdf2

The first hit was from JimInCO, so I installed PyPDF2 with

conda install -c https://conda.binstar.org/JimInCO pypdf2

I scanned a few pages from a book to PDF, turning the book around every other page, so half the pages in the PDF were upside down. I needed a script to rotate the even numbered pages. The script counts pages from 0, so it rotates the odd numbered pages from its perspective.

import PyPDF2

pdf_in = open('original.pdf', 'rb')
pdf_reader = PyPDF2.PdfFileReader(pdf_in)
pdf_writer = PyPDF2.PdfFileWriter()

for pagenum in range(pdf_reader.numPages):
    page = pdf_reader.getPage(pagenum)
    if pagenum % 2:
        page.rotateClockwise(180)
    pdf_writer.addPage(page)

pdf_out = open('rotated.pdf', 'wb')
pdf_writer.write(pdf_out)
pdf_out.close()
pdf_in.close()

It worked as advertised on the first try.

RSS feeds for Twitter accounts

Twitter once provided RSS feeds for all Twitter accounts. They no longer provide this service. However, third parties can create RSS feeds from the content of Twitter accounts. BazQux has done this for my daily tip accounts, so you can subscribe to any of my accounts via RSS using the feeds linked to below.

If you would like to subscribe to more Twitter accounts via RSS, you could subscribe to the BazQux service and create a custom RSS feed for whatever Twitter, Google+, or Facebook accounts you’d like to follow.

Scientifically valid, practically invalid

In a recent episode of EconTalk, Phil Rosenzweig describes how the artificial conditions necessary to make experiments scientifically valid can also make the results practically invalid.

Rosenzweig discusses experiments designed to study decision making. In order to make clean comparisons, subjects are presented with discrete choices over which they have no control. They cannot look for more options or exercise any other form of agency. The result is an experiment that is easy to analyze and easy to publish, but so unrealistic as to tell us little about real-world decision making.

In his book Left Brain, Right Stuff, Rosenzweig quotes Philip Tetlock’s summary:

Much mischief can be wrought by transplanting this hypothesis-testing logic, which flourishes in controlled lab settings, into the hurly-burly of real-world settings where ceteris paribus never is, and never can be, satisfied.

The Mozart Myth

I don’t know how many times I’ve heard about how Mozart would compose entire musical scores in his head and only write them down once they were finished. Even authors who stress that creativity requires false starts and hard work have said that Mozart may have been an exception. But maybe he wasn’t.

In his new book How to Fly a Horse, Kevin Ashton says that the Mozart story above is a myth based on a forged letter. According to Ashton,

Mozart’s real letters—to his father, to his sister, and to others—reveal his true creative process. He was exceptionally talented, but he did not write by magic. He sketched his compositions, revised them, and sometimes got stuck. He could not work without a piano or harpsichord. He would set work aside and return to it later. … Masterpieces did not come to him complete in uninterrupted streams of imagination, nor without an instrument, nor did he write them whole and unchanged. The letter is not only forged, it is false.

Related posts:

 

Pedantic arithmetic rules

Generations of math teachers have drilled into their students that they must reduce fractions. That serves some purpose in the early years, but somewhere along the way students need to learn reducing fractions is not only unnecessary, but can be bad for communication. For example, if the fraction 45/365 comes up in the discussion of something that happened 45 days in a year, the fraction 45/365 is clearer than 9/73. The fraction 45/365 is not simpler in a number theoretic sense, but it is psychologically simpler since it’s obvious where the denominator came from. In this context, writing 9/73 is not a simplification but an obfuscation.

Simplifying fractions sometimes makes things clearer, but not always. It depends on context, and context is something students don’t understand at first. So it makes sense to be pedantic at some stage, but then students need to learn that clear communication trumps pedantic conventions.

Along these lines, there is a old taboo against having radicals in the denominator of a fraction. For example, 3/√5 is not allowed and should be rewritten as 3√5/5. This is an arbitrary convention now, though there once was a practical reason for it, namely that in hand calculations it’s easier to multiply by a long fraction than to divide by it. So, for example, if you had to reduce 3/√5 to a decimal in the old days, you’d look up √5 in a table to find it equals 2.2360679775. It would be easier to compute 0.6*2.2360679775 by hand than to compute 3/2.2360679775.

As with unreduced fractions, radicals in the denominator might be not only mathematically equivalent but psychologically preferable. If there’s a 3 in some context, and a √5, then it may be clear that 3/√5 is their ratio. In that same context someone may look at 3√5/5 and ask “Where did that factor of 5 in the denominator come from?”

A possible justification for rules above is that they provide standard forms that make grading easier. But this is only true for the simplest exercises. With moderately complicated exercises, following a student’s work is harder than determining whether two expressions represent the same number.

One final note on pedantic arithmetic rules: If the order of operations isn’t clear, make it clear. Add a pair of parentheses if you need to. Or write division operations as one thing above a horizontal bar and another below, not using the division symbol. Then you (and your reader) don’t have to worry whether, for example, multiplication has higher precedence than division or whether both have equal precedence and are carried out left to right.

QR Codes and Percolation

Percolation theory looks at problems such as the probability of being able to traverse some region with random obstacles. It is motivated by problems such as modeling the flow of a fluid in a porous medium.

Here’s a percolation problem for QR codes: What is the probability that there is a path from one side of a QR code to the opposite side? How far across a QR code would you expect to be able to go? For example, the QR code below was generated from my contact information. It’s not possible to go from one side to the other, and the red line shows what I believe is the deepest path into the code from a side.

This could make an interesting programming exercise. A simple version would be to start with a file of bits representing a particular QR code and find the deepest path into the corresponding image.

The next step up would be to generate simplified QR codes, requiring certain bits to be set, such as the patterns in three of the four corners that allow a QR reader to orient itself.

The next step in sophistication would be to implement the actual QR encoding algorithm, including its error correction encoding, then use this to encode random data.

(Because of the error correction used by QR codes, you could scan the image above and your QR reader would ignore the red path. It would even work if a fairly large portion of the image were missing because the error correction introduces a lot of redundancy.)

Why is an empty sum 0 and an empty product 1?

In response to my earlier post on why 0! should be 1, several people replied that 0! = 1 because an empty product is 1. You can define the factorial of an integer n as the product of all positive numbers less than or equal to n. There are no positive integers less than or equal to 0, so 0! is an empty product. But this raises the question of why an empty product should be 1.

You could say that an empty sum is 0 because 0 is the additive identity and an empty product is 1 because 1 is the multiplicative identity. If you’d like a simple answer, maybe you should stop reading here.

The problem with the answer above is that it doesn’t say why an operation on an empty set should be defined to be the identity for that operation. The identity is certainly a plausible candidate, but why should it make sense to even define an operation on an empty set, and why should the identity turn out so often to be the definition that makes things proceed smoothly?

The convention that the sum over an empty set should be defined as 0, and that a product over an empty set should be defined to be 1 works well in very general settings where “sum”, “product”, “0”, and “1” take on abstract meanings.

The ultimate generalization of products is the notion of products in category theory. Similarly, the ultimate generalization of sums is categorical co-products. (Co-products are sometimes called sums, but they’re usually called co-products due to a symmetry with products.) Category theory simultaneously addresses a wide variety of operations that could be called products or sums (co-products).

The particular advantage of bringing category theory into this discussion is that it has definitions of product and co-product that are the same for any number of objects, including zero objects; there is no special definition for empty products. Empty products and co-products are a consequence of a more general definition, not special cases defined by convention.

In the category of sets, products are Cartesian products. The product of a set with n elements and one with m elements is one with nm elements. Also in the category of sets, co-products are disjoint unions. The co-product of a set with n elements and one with m elements is one with n+m elements. These examples show a connection between products and sums in arithmetic and products and co-products in category theory.

You can find the full definition of a categorical product here. Below I give the definition leaving out details that go away when we look at empty products.

The product of a set of objects is an object P such that given any other object X … there exists a unique morphism from X to P such that ….

If you’ve never seen this before, you might rightfully wonder what in the world this has to do with products. You’ll have to trust me on this one. [1]

When the set of objects is empty, the missing parts of the definition above don’t matter, so we’re left with requiring that there is a unique morphism [2] from each object X to the product P. In other words, P is a terminal object, often denoted 1. So in category theory, you can say empty products are 1.

But that seems like a leap, since “1” now takes on a new meaning that isn’t obviously connected to the idea of 1 we learned as a child. How is an object such that every object has a unique arrow to it at all like, say, the number of noses on a human face?

We drew a connection between arithmetic and categories before by looking at the cardinality of sets. We could define the product of the numbers n and m as the number of elements in the product of a set with n elements and one with m elements. Similarly we could define 1 as the cardinality of the terminal element, also denoted 1. This is because there is a unique map from any set to the set with 1 element. Pick your favorite one-element set and call it 1. Any other choice is isomorphic to your choice.

Now for empty sums. The following is the definition of co-product (sum), leaving out details that go away when we look at empty co-products.

The co-product of a set of objects is an object S such that given any other object X … there exists a unique morphism from S to X such that ….

As before, when the set of objects is empty, the missing parts don’t matter. Notice that the direction of the arrow in the definition is reversed: there is a unique morphism from the co-product S to any object X. In other words, S is an initial object, denoted for good reasons as 0.  [3]

In set theory, the initial object is the empty set. (If that hurts your head, you’re not alone. But if you think of functions in terms of sets of ordered pairs, it makes a little more sense. The function that sends the empty set to another set is an empty set of ordered pairs!) The cardinality of the initial object 0 is the integer 0, just as the cardinality of the initial object 1 is the integer 1.

* * *

[1] Category theory has to define operations entirely in terms of objects and morphisms. It can’t look inside an object and describe things in terms of elements the way you’d usually do to define the product of two numbers or two sets, so the definition of product has to look very different. The benefit of this extra work is a definition that applies much more generally.

To understand the general definition of products, start by understanding the product of two objects. Then learn about categorical limits and how products relate to limits. (As with products, the categorical definition of limits will look entirely different from familiar limits, but they’re related.)

[2] Morphisms are a generalization of functions. In the category of sets, morphisms are functions.

[3] Sometimes initial objects are denoted by ∅, the symbol for the empty set, and sometimes by 0. To make things more confusing, a “zero,” spelled out as a word rather than a symbol, has a different but related meaning in category theory: an object that is both initial and terminal.