Differential Equations and the City

by John on May 19, 2012

This afternoon I got a review copy of X and the City: Modeling Aspects of Urban Life by John A. Adam. It’s a book about mathematical model, taking all its examples from urban life: public transportation, growth, pollution, etc. I’ve only skimmed through the book so far, but it looks like most of the applications involve differential equations. Some depend on algebra or probability.

The book looks interesting. I hope to say more about the book once I’ve had a chance to read it. The examples are all short, so it may be any easy book to read a little at a time.

I also got a review copy of The Book of Inkscape today, and I’m expecting several other books soon. It may take a while to get through these since this is a busy time for me. When it rains, it pours.

{ 2 comments }

Castles and quantum mechanics

by John on May 17, 2012

How are castles and quantum mechanics related? One connection is rook polynomials.

The rook is the chess piece that looks like a castle, and used to be called a castle. It can move vertically or horizontally, any number of spaces.

A rook polynomial is a polynomial whose coefficients give the number of ways rooks can be arranged on a chess board without attacking each other. The coefficient of xk in the polynomial Rm,n(x) is the number of ways you can arrange k rooks on an m by n chessboard such that no two rooks are in the same row or column.

The rook polynomials are related to the Laguerre polynomials by

Rm,n(x) = n! xn Lnm-n(-1/x)

where Lnk(x) is an “associated Laguerre polynomial.” These polynomials satisfy Laguerre’s differential equation

x y” + (n+1-x) y‘ + k y = 0,

an equation that comes up in numerous contexts in physics. In quantum mechanics, these polynomials arise in the solution of the Schrödinger equation for the hydrogen atom.

Related:

Relations between special functions

{ 1 comment }

Mars, magic squares, and music

by John on May 16, 2012

About a year ago I wrote about Jupiter’s magic square. Then yesterday I was listening to the New Sounds podcast that mentioned a magic square associated with Mars. I hadn’t heard of this, so I looked into it and found there were magic squares associated with each of solar system bodies known to antiquity (i.e. Sun, Mercury, Venus, Moon, Mars, Jupiter, and Saturn).

Here is the magic square of Mars:

The podcast featured Secret Pulse by Zack Browning. From the liner notes:

Magic squares provide structure to the music. Structure provides direction to the composer. Direction provides restrictions for the focused inspiration and interpretation of musical materials. The effect of this process? Freedom to compose.

The compositions on this CD use the 5×5 Magic Square of Mars (Secret Pulse), the 9×9 Magic Square of the Moon (Moon Thrust), and the ancient Chinese 3×3 Lo Shu Square found in the Flying Star System of Feng Shui (Hakka Fusion, String Quartet, Flying Tones, and Moon Thrust) as compositional models.  The musical structure created from these magic squares is dramatically articulated by the collision of different musical worlds …

I don’t know how the composer used these magic squares, but you can listen to the title track (Secret Pulse) on the podcast.

Related posts:

Jupiter’s magic square
A magic king’s tour
A magic knight’s tour
A knight’s random walk

{ 0 comments }

Machine Learning in Action

by John on May 15, 2012

A couple months ago I briefly reviewed Machine Learning for Hackers by Drew Conway and John Myles White. Today I’m looking at Machine Learning in Action by Peter Harrington and comparing the two books.

Both books are about the same size and cover many of the same topics. One difference between the two books is choice of programming language: ML for Hackers uses R for its examples, ML in Action uses Python.

ML in Action doesn’t lean heavily on Python libraries. It mostly implements its algorithms from scratch, with a little help from NumPy for linear algebra, but it does not use ML libraries such as scikit-learn. It sometimes uses Matplotlib for plotting and uses Tkinter for building a simple GUI in one chapter. The final chapter introduces Hadoop and Amazon Web Services.

ML for Hackers is a little more of a general introduction to machine learning. ML in Action contains a brief introduction to machine learning in general, but quickly moves on to specific algorithms. ML for Hackers spends a good number of pages discussing data cleaning. ML in Action starts with clean data in order to spend more time on algorithms.

ML in Action takes 8 of the top 10 algorithms in machine learning (as selected by this paper) and organizes around these algorithms. (The two algorithms out of the top 1o that didn’t make it into ML in Action were PageRank, because it has been covered well elsewhere, and EM, because its explanation requires too much mathematics.) The algorithms come first in ML in Action, illustrations second. ML for Hackers puts more emphasis on its examples and reads a bit more like a story. ML in Action reads a little more like a reference book.

http://www.johndcook.com/blog/2008/06/27/wine-beer-and-statistics/#comment-170809

{ 7 comments }

Criteria for a computing setup

by John on May 14, 2012

“My setup” articles have become common. These articles list the hardware and software someone uses, usually with little explanation. The subtext is often the author’s commitment to the Apple brand or to open source, to spending money on the best stuff or to avoid spending money on principle. I don’t find such articles interesting or useful.

Vivek Haldar has written a different kind of  “my setup” article, one that emphasizes the problems he set out to solve and the reasons for the solutions he chose. Here are a couple excerpts describing his goals for preserving his data and his health.

Try to remember the oldest digital artifact that you can still retrieve, and more importantly, decode and view. Can you? How old is it? That should give you some idea of how hard and full of unknowns the problem of long-term preservation is. …

If a significant fraction of your working life is spent working with computers, and you do not yet have even the mildest RSI, you should consider yourself extremely lucky, but not immune. Act like you do have RSI, and change your set up right now to avoid it.

I thought the best part of the article was the criteria, not the solutions. It’s not that I disapprove of his choices, but I appreciate more his explanation of the rationale behind his choices. I don’t expect anybody is going to read the article and say “That’s it! I’m going to copy that setup.” I gather that in at least one detail, his choice of version control system, Vivek wouldn’t even copy his own setup if he were to start over. But people will get ideas to consider in their own circumstances.

Related post: Ford-Chevy arguments in tech

{ 8 comments }

Solutions to knight’s random walk

by John on May 10, 2012

My previous post asked this question:

Start a knight at a corner square of an otherwise-empty chessboard. Move the knight at random by choosing uniformly from the legal knight-moves at each step. What is the mean number of moves until the knight returns to the starting square?

There is a mathematical solution that is a little arcane, but short and exact. You could also approach the problem using simulation, which is more accessible but not exact.

The mathematical solution is to view the problem as a random walk on a graph. The vertices of the graph are the squares of a chess board and the edges connect legal knight moves. The general solution for the time to first return is simply 2N/k where N is the number of edges in the graph, and k is the number of edges meeting at the starting point. Amazingly, the solution hardly depends on the structure of the graph at all. It only requires that the graph is connected. In our case N = 168 and k = 2.

For a full explanation of the math, see this online book, chapter 3, page 9. Start there and work your way backward until you understand the solution.

And now for simulation. The problem says to pick a legal knight’s move at random. The most direct approach would be to find the legal moves at a given point first, then choose one of those at random. The code below achieves the same end with a different approach. It first chooses a random move, and if that move is illegal (i.e. off the board) it throws that move away and tries again.  This will select a legal move with the right probability, though perhaps that’s not obvious. It’s what’s known as an accept-reject random generator.

from random import randint

# Move a knight from (x, y) to a random new position
def new_position(x, y):

    while True:
        dx, dy = 1, 2

        # it takes three bits to determine a random knight move:
        # (1, 2) vs (2, 1), and the sign of each
        r = randint(0, 7)
        if r % 2:
            dx, dy = dy, dx
        if (r >> 1) % 2:
            dx = -dx
        if (r >> 2) % 2:
            dy = -dy

        newx, newy = x + dx, y + dy
        # If the new position is on the board, take it.
        # Otherwise try again.
        if (newx >= 0 and newx < 8 and newy >= 0 and newy < 8):
            return (newx, newy)

# Count the number of steps in one random tour
def random_tour():
    x, y = x0, y0 = 0, 0
    count = 0
    while True:
        x, y = new_position(x, y)
        count += 1
        if x == x0 and y == y0:
            return count

# Average the length of many random tours
sum = 0
num_reps = 100000
for i in xrange(num_reps):
    sum += random_tour()
print sum / float(num_reps)

A theorem is better than a simulation, but a simulation is a lot better than nothing. This problem illustrates how sometimes we think we need to simulate when we don’t. On the other hand, when you have a simulation and a theorem, you have more confidence in your solution because each validates the other.

{ 10 comments }

A knight’s random walk

by John on May 8, 2012

Here’s a puzzle I ran across today:

Start a knight at a corner square of an otherwise-empty chessboard. Move the knight at random by choosing uniformly from the legal knight-moves at each step. What is the mean number of moves until the knight returns to the starting square?

There’s a slick mathematical solution that I’ll give later.

You could also find the answer via simulation: write a program to carry out a knight random walk and count how many steps it takes. Repeat this many times and average your counts.

knight

Related post: A knight’s tour magic square

{ 36 comments }

Web architecture

by John on May 8, 2012

Robert “Uncle Bob” Martin argues that an application’s purpose should determine its architecture; its delivery mechanism should not. If you look at an architectural drawing of a church, for example, you can tell it’s a church. But if you look at an architectural drawing of a web application, the fact that it’s a web application should not be the first thing you notice.

The web is a delivery mechanism. The web is a detail. The web is not particularly important to your application. … The web is a pipe. It is nothing more than a pipe. It is not the central abstraction of your application. It is not the grand, over-arching thing that makes your application the application it is. The web is just a dumb detail. And yet it dominates our code.

From Robert Martin’s keynote at Ruby Midwest 2011. The quote starts about 9 minutes into the presentation.

He goes on to explain how to let the functionality of the application determine its architecture. In the dependency diagram arrows point from the delivery mechanism to the rest of the code, but none point the other way. By following this design you get something easier to test.

Related posts:

Recap of the Robert Martin/Joel Spolksy brouhaha
Why there will always be programmers

{ 16 comments }

Roger Peng asks a good question on his blog this morning: How would you know if someone is great at data analysis? He says that while he has worked with some great data analysts, the nature of their work is that it’s hard to evaluate the work of someone you don’t know personally. And as Josh Grant pointed out, this isn’t unique to data analysts.

I immediately thought of a database administrator I know. Everyone who works with her know she’s great at her job, but I doubt anyone who doesn’t know her has ever said “They must have a great DBA!”

Matthew Crawford argues in Shop Class as Soulcraft that white collar work in general is hard to objectively evaluate and that this explains why offices are so political. Employees are judged on their sensitivity and other nebulous attributes because unlike a welder, for example, they can’t be judged directly on their work. He argues that blue collar people workers greater freedom of speech at work because their work can be objectively evaluated.

Colleagues can identify great data analysts, DBAs, and others whose work isn’t on public display. But this isn’t easy to do through a bureaucratic process, and so technical competence is routinely under-compensated in large organizations. On the other hand, reputation spreads more efficiently outside of organizational channels. This may help explain why highly competent people are often more appreciated by their professional communities than by their employers.

Related post: It doesn’t pay to be the computer guy

{ 7 comments }

An algebra problem from 1798

by John on May 5, 2012

The Lady’s Diary was a popular magazine published in England from 1704 to 1841. It contained mathematical puzzles such as the following, published in 1798.

What two numbers are those whose product, difference of their squares, and the ratio or quotient of their cubes, are all equal to each other?

From Benjamin Wardhaugh’s new book A Wealth of Numbers: An Anthology of 500 Years of Popular Mathematics Writing.

See also my brief review of How to Read Historical Mathematics by the same author.

{ 19 comments }

Beer with a confidence interval

by John on May 4, 2012

Medalla is a Puerto Rican beer. On the side of a can it says

Alcohol by volume over 4%, not more than 6%.

I’d never seen a beer give a confidence interval for its alcohol content. I’d only seen point estimates before. For example, Budweiser announces that its beer contains 5% alcohol by volume. Just 5%, no statistical details.

I wonder why Medalla reports an interval.

  • Is their manufacturing process so variable that they can’t just report an average?
  • Is their manufacturing process no more variable than others, but they disclose their uncertainty?
  • Are they required to report alcohol content the way they do by local law or by the law of countries they export to?

Related posts:

Beer, wine, and statistics
Wine and politics

{ 10 comments }

Visiting Puerto Rico

by John on May 4, 2012

I’ve been in San Juan this week, visiting the University of Puerto Rico. I’ve been here several times before, but here are a few things I noticed about Puerto Rico on this trip.

Coffee: Coffee means espresso here; I haven’t seen it brewed any other way. And it’s cheap. At UPR, a 4 oz pocillo is only $0.70, and at a bakery near my hotel a 6 oz espresso con leche is $1.25.

Gasoline: You can’t pay at the pump with a credit card. You have to go inside to pay for gas. Some gas stations used to let you pay at the pump, but these have gone back to having you pay inside.

Electronics: I’ve been told that you can’t buy electronics from Apple’s web site for delivery in Puerto Rico. You can buy Apple products in stores like Best Buy on the island, but you can’t have them shipped directly here from their web site.

***

Related post: Beer with a confidence interval (Medalla)

{ 2 comments }

Python as a Lisp dialect

by John on May 3, 2012

From Peter Norvig:

Basically, Python can be seen as a dialect of Lisp with “traditional” syntax … Python supports all of Lisp’s essential features except macros, and you don’t miss macros all that much because it does have eval, and operator overloading, and regular expression parsing, so some — but not all — of the use cases for macros are covered.

Source: Python for Lisp Programmers

{ 9 comments }

Reading historical math

by John on May 2, 2012

I recently received review copies of two books by Benjamin Wardhaugh. Here I will discuss How to Read Historical Mathematics. The other book is his anthology of historical popular mathematics which I intend to review later.

Here is the key passage, located near the end of How to Read Historical Mathematics, for identifying the author’s perspective.

But not all historical mathematics is significant. And perhaps there is a second kind of significance, where something can be historically significant without being mathematically significant. Some historians (I’m one of them) delight in investigating mathematical writing that contains little or no important or novel mathematics: popular textbooks, self-instruction manuals, … or old almanacs and popular magazines with mathematical news or puzzles in them. These kinds of writing … are certainly significant for a historian who wants to know about popular experiences of mathematics. But they’re not significant in the sense of containing significant mathematics.

Wardhaugh’s perspective is valuable, though it is not one that I share. My interest in historical math is more on the development of the mathematical ideas rather than their social context. I’m interested, for example, in discovering the concrete problems that motivated mathematics that has become more abstract and formal.

I was hoping for something more along the lines of a mapping from historical definitions and notations to their modern counterparts. This book contains a little of that, but it focuses more on how to read historical mathematics as a historian rather than as a mathematician. However, if you are interested in more of the social angle, the book has many good suggestions (and even exercises) for exploring the larger context of historical mathematical writing.

{ 3 comments }

Big data is easy

by John on May 1, 2012

Big data is easy; big models are hard.

If you just wanted to use simple models with tons of data, that would be easy. You could resample the data, throwing some of it away until you had a quantity of data you could comfortably manage.

But when you have tons of data, you want to take advantage of it and ask questions that simple models cannot answer. (”Big” data is often indirect data.) So the problem isn’t that you have a lot of data, it’s that you’re using models that require a lot of data. And that can be very hard.

I am not saying people should just use simple models. No, people are right to want to take advantage of their data, and often that does require complex models. (See Brad Efron’s explanation why.) But the primary challenge is intellectual, not physical.

Related post: Big data and humility

{ 5 comments }