Julia for Python programmers

One of my clients is writing software in Julia so I’m picking up the language. I looked at Julia briefly when it first came out but haven’t used it for work. My memory of the language was that it was almost a dialect of Python. Now that I’m looking at it a little closer, I can see more differences, though the most basic language syntax is more like Python than any other language I’m familiar with.

Here are a few scattered notes on Julia, especially on how it differs from Python.

  • Array indices in Julia start from 1, like Fortran and R, and unlike any recent language that I know of.
  • Like Python and many other scripting languages, Julia uses # for one-line comments. It also adds #= and =# for multi-line comments, like /* and */ in C.
  • By convention, names of functions that modify their first argument end in !. This is not enforced.
  • Blocks are indented as in Python, but there is no colon at the end of the first line, and there must be an end statement to close the block.
  • Julia uses elseif as in Perl, not elif as in Python [1].
  • Julia uses the Dict constructor to create a dictionary, e.g. Dict("a" => 1). Keys and values are separated with =>, as in Perl, rather than with colons, as in Python.
  • Julia, like Python 3, returns 2.5 when given 5/2 (see the snippet after this list). Julia has a // division operator, but it returns a rational number rather than an integer.
  • The number 3 + 4i would be written 3 + 4im in Julia and 3 + 4j in Python.
  • Strings are contained in double quotes and characters in single quotes, as in C. Python does not distinguish between characters and strings, and uses single and double quotes interchangeably.
  • Julia uses function to define a function, similar to JavaScript and R, where Python uses def.
  • You can access the last element of an array with end, not with -1 as in Perl and Python.
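
Here is the Python side of several of these comparisons, with the Julia counterparts noted in comments (a quick sketch for reference):

    print(5 / 2)      # 2.5, same as Julia; Julia's 5//2 is the rational number 5//2
    print(5 // 2)     # 2, Python's integer division
    print(3 + 4j)     # complex literal; Julia writes 3 + 4im
    d = {"a": 1}      # dict literal with colons; Julia separates keys and values with =>
    xs = [10, 20, 30]
    print(xs[-1])     # 30; Julia writes xs[end] and indexes from 1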

* * *

[1] Actually, Perl uses elsif, as pointed out in the comments below. I can’t remember when to use else if, elseif, elsif, and elif.

Nicholas Higham on Mathematics in Color

This is a reprint of Nick Higham’s post of the same title from the Princeton University Press blog, used with permission.


Color is a fascinating subject. Important early contributions to our understanding of it came from physicists and mathematicians such as Newton, Young, Grassmann, Maxwell, and Helmholtz. Today, the science of color measurement and description is well established and we rely on it in our daily lives, from when we view images on a computer screen to when we order paint, wallpaper, or a car of a specified color.

For practical purposes color space, as perceived by humans, is three-dimensional, because our retinas have three different types of cones, which have peak sensitivities at wavelengths corresponding roughly to red, green, and blue. It’s therefore possible to use linear algebra in three dimensions to analyze various aspects of color.


A good example of the use of linear algebra is to understand metamerism, which is the phenomenon whereby two objects can appear to have the same color but are actually giving off light having different spectral decompositions. This is something we are usually unaware of, but it is welcome in that color output systems (such as televisions and computer monitors) rely on it.

Mathematically, the response of the cones on the retina to light can be modeled as a matrix-vector product Af, where A is a 3-by-n matrix and f is an n-vector that contains samples of the spectral distribution of the light hitting the retina. The parameter n is a discretization parameter that is typically about 80 in practice. Metamerism corresponds to the fact that Af₁ = Af₂ is possible for different vectors f₁ and f₂. This equation is equivalent to saying that Ag = 0 for a nonzero vector g = f₁ − f₂, or, in other words, that a matrix with fewer rows than columns has a nontrivial null space.
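
To make the null space concrete, here is a small numpy sketch (my illustration, not from Higham's post): a random 3-by-80 matrix stands in for the true cone response matrix A.

    import numpy as np

    # A random 3-by-80 matrix standing in for the cone response matrix A.
    rng = np.random.default_rng(0)
    n = 80
    A = rng.random((3, n))

    # Rows of Vh beyond the rank of A span its null space.
    _, _, Vh = np.linalg.svd(A)
    g = Vh[-1]            # a nonzero vector with A g = 0
    f1 = rng.random(n)    # one spectral distribution
    f2 = f1 + g           # a metamer: different spectrum, same cone response

    print(np.allclose(A @ f1, A @ f2))   # True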

Metamerism is not always welcome. If you have ever printed your photographs on an inkjet printer you may have observed that a print that looked fine when viewed indoors under tungsten lighting can have a color cast when viewed in daylight.

LAB Space: Separating Color from Luminosity

In digital imaging the term channel refers to the grayscale image representing the values of the pixels in one of the coordinates, most often R, G, or B (for red, green, and blue) in an RGB image. It is sometimes said that an image has ten channels. The number ten is arrived at by combining coordinates from the representation of an image in three different color spaces. RGB supplies three channels, a space called LAB (pronounced “ell-A-B”) provides another three channels, and the last four channels are from CMYK (cyan, magenta, yellow, black), the color space in which all printing is done.

LAB is a rather esoteric color space that separates luminosity (or lightness, the L coordinate) from color (the A and B coordinates). In recent years photographers have realized that LAB can be very useful for image manipulations, allowing certain things to be done much more easily than in RGB. This usage is an example of a technique used all the time by mathematicians: if we can’t solve a problem in a given form then we transform it into another representation of the problem that we can solve.

As an example of the power of LAB space, consider this image of aeroplanes at Schiphol airport.


Original image.

Suppose that KLM are considering changing their livery from blue to pink. How can the image be edited to illustrate how the new livery would look? “Painting in” the new color over the old using the brush tool in image editing software would be a painstaking task (note the many windows to paint around and the darker blue in the shadow area under the tail). The next image was produced in just a few seconds.


Image converted to LAB space and A channel flipped.

How was it done? The image was converted from RGB to LAB space (which is a nonlinear transformation) and then the coordinates of the A channel were replaced by their negatives. Why did this work? The A channel represents color on a green–magenta axis (and the B channel on a blue–yellow axis). Apart from the blue fuselage, most pixels have a small A component, so reversing the sign of this component doesn’t make much difference to them. But for the blue, which has a negative A component, this flipping of the A channel adds just enough magenta to make the planes pink.
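
If you want to try this yourself, here is a rough sketch using scikit-image (the file name is hypothetical, and this isn't necessarily the exact procedure used for the image above):

    import numpy as np
    from skimage import io
    from skimage.color import rgb2lab, lab2rgb

    rgb = io.imread("schiphol.png")[..., :3] / 255.0
    lab = rgb2lab(rgb)        # nonlinear RGB -> LAB conversion
    lab[..., 1] *= -1         # negate the A channel (green-magenta axis)
    flipped = lab2rgb(lab)    # back to RGB; the blues come out pink
    io.imsave("schiphol_pink.png", (flipped * 255).astype(np.uint8))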

You may recall from earlier this year the infamous photo of a dress that generated a huge amount of interest on the web because some viewers perceived the dress as being blue and black while others saw it as white and gold. A recent paper What Can We Learn from a Dress with Ambiguous Colors? analyzes both the photo and the original dress using LAB coordinates. One reason for using LAB in this context is its device independence, which contrasts with RGB, for which the coordinates have no universally agreed meaning.

Nicholas J. Higham is the Richardson Professor of Applied Mathematics at The University of Manchester, and editor of The Princeton Companion to Applied Mathematics. His article Color Spaces and Digital Imaging in The Princeton Companion to Applied Mathematics gives an introduction to the mathematics of color and the representation and manipulation of digital images. In particular, it emphasizes the role of linear algebra in modeling color and gives more detail on LAB space.


Related posts:

Three algorithms for converting to gray scale

FFT and Big Data (quotes from Princeton Companion to Applied Mathematics)

Mixing Haskell and R

It would be hard to think of two programming languages more dissimilar than Haskell and R.

Haskell was designed to enforce programming discipline; R was designed for interactive use. Haskell emphasizes correctness; R emphasizes convenience.  Haskell emphasizes computer efficiency; R emphasizes interactive user efficiency. Haskell was written to be a proving ground for programming language theorists. R was written to be a workbench for statisticians. Very different goals lead to very different languages.

When I first heard of a project to mix Haskell and R, I was a little shocked. Could it even be done? Aside from the external differences listed above, the differences in language internals are vast. I’m very impressed that the folks at Tweag I/O were able to pull this off. Their HaskellR project lets you call R from Haskell and vice versa. (It’s primarily for Haskell calling R, though you can call Haskell functions from your R code: Haskell calling R calling Haskell. It kinda hurts your brain at first.) Because the languages are so different, some things that are hard in one are easy in the other.

I used HaskellR while it was under initial development. Our project was written in Haskell, but we wanted to access R libraries. There were a few difficulties along the way, as with any project, but these were resolved and eventually it just worked.

The Fast Fourier Transform (FFT) and big data

The most direct way to compute a Fourier transform numerically takes O(n²) operations. The Fast Fourier Transform (FFT) can compute the same result in O(n log n) operations. If n is large, this can be a huge improvement.
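
As a quick sanity check (my sketch, not from the Companion article), here is the naive O(n²) transform next to numpy's FFT:

    import numpy as np

    def direct_dft(x):
        # Naive O(n^2) discrete Fourier transform.
        n = len(x)
        k = np.arange(n)
        W = np.exp(-2j * np.pi * np.outer(k, k) / n)
        return W @ x

    x = np.random.default_rng(1).random(512)
    print(np.allclose(direct_dft(x), np.fft.fft(x)))   # True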

James Cooley and John Tukey (re)discovered the FFT in 1965. It was thought to be an original discovery at the time. Only later did someone find a sketch of the algorithm in the papers of Gauss.

Daniel Rockmore wrote the article on the Fast Fourier Transform in The Princeton Companion to Applied Mathematics. He says

In this new world of 1960s “big data,” a clever reduction in computational complexity (a term not yet widely in use) could make a tremendous difference.

Rockmore adds a very interesting footnote to the sentence above:

Many years later Cooley told me that he believed that the Fast Fourier transform could be thought of as one of the inspirations for asymptotic algorithmic analysis and the study of computational complexity, as previous to the publication of his paper with Tukey very few people had considered data sets large enough to suggest the utility of an asymptotic analysis.


Generalization of Fibonacci ratios

Each Fibonacci number is the sum of its two predecessors. My previous post looked at generalizing this to the so-called Tribonacci numbers, each being the sum of its three predecessors. One could keep going, defining the Tetrabonacci numbers and in general the n-Fibonacci numbers for any n at least 2.

For the definition to be complete, you have to specify the first n of the n-Fibonacci numbers. However, these starting values hardly matter for our purposes. We want to look at the limiting ratio of consecutive n-Fibonacci numbers, and this doesn’t depend on the initial conditions. (If you were determined, you could find starting values where this isn’t true. It’s enough to pick integer initial values, at least one of which is not zero.)

As shown in the previous post, the ratio is the largest eigenvalue of an n by n matrix with 1’s on the first row and 1’s immediately below the main diagonal. The characteristic polynomial of such a matrix is

λⁿ − λⁿ⁻¹ − λⁿ⁻² − ⋯ − 1

and so we look for the largest zero of this polynomial. We can sum the terms with negative coefficients as a geometric series and show that the eigenvalues satisfy

λⁿ − 1/(2 − λ) = 0.

So the limiting ratio of consecutive n-Fibonacci numbers is the largest root of the above equation. You could verify that when n = 2, we get the golden ratio φ as we should, and when n = 3 we get around 1.8393 as in the previous post.

As n gets large, the limiting ratio approaches 2. You can see this by taking the log of the previous equation.

n = -log(2 – λ)/log(λ).

As n goes to infinity, λ must approach 2 so that the right side of the equation also goes to infinity.
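
Here is a quick numerical check (my sketch): compute the largest root of the characteristic polynomial for several values of n and watch the ratio approach 2.

    import numpy as np

    def limiting_ratio(n):
        # Coefficients of lambda^n - lambda^(n-1) - ... - lambda - 1,
        # highest degree first.
        coeffs = [1] + [-1] * n
        return max(abs(np.roots(coeffs)))

    for n in [2, 3, 4, 10, 20]:
        print(n, limiting_ratio(n))
    # n = 2 gives 1.618..., n = 3 gives 1.8393..., and the values approach 2.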

Power method and Fibonacci numbers

Take an n × n matrix A and a vector x of length n. Now multiply x by A, then multiply the result by A, over and over again. The sequence of vectors generated by this process (suitably normalized) will converge to an eigenvector of A. (An eigenvector is a vector whose direction is unchanged when multiplied by A. Multiplying by A may stretch or shrink the vector, but it doesn’t rotate it at all. The amount of stretching or shrinking is called the corresponding eigenvalue.)

The eigenvector produced by this process is the eigenvector corresponding to the largest eigenvalue of A, largest in absolute value. This assumes A has a unique eigenvector associated with its largest eigenvalue. It also assumes you’re not spectacularly unlucky in your choice of vector to start with.

Assume your starting vector x has some component in the direction of v, the eigenvector corresponding to the largest eigenvalue. (The vectors that don’t have such a component lie in an (n−1)-dimensional subspace, which has measure zero. So if you pick a starting vector at random, with probability 1 it will have some component in the direction we’re after. That’s what I meant when I said you can’t start with a spectacularly unlucky initial choice.) Each time you multiply by A, the component in the direction of v gets stretched more than the components orthogonal to v. After enough iterations, the component in the direction of v dominates the other components.
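
Here is a minimal power-method sketch (my illustration; the iterates are normalized at each step so they stay bounded):

    import numpy as np

    def power_method(A, x, iterations=50):
        for _ in range(iterations):
            x = A @ x
            x = x / np.linalg.norm(x)   # keep the iterates unit length
        # The Rayleigh quotient of a unit vector estimates the eigenvalue.
        return x, x @ A @ x

    A = np.array([[1.0, 1.0], [1.0, 0.0]])   # the Fibonacci matrix below
    v, eigenvalue = power_method(A, np.array([1.0, 1.0]))
    print(eigenvalue)   # 1.618..., the golden ratio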

What does this have to do with Fibonacci numbers? The next number in the Fibonacci sequence is the sum of the previous two. In matrix form this says

\left[\begin{array}{c} x_{n+1} \\ x_{n} \end{array}\right] = \left[\begin{array}{cc} 1 & 1 \\ 1 & 0 \end{array}\right] \left[\begin{array}{c} x_{n} \\ x_{n-1} \end{array}\right]

The ratio of consecutive Fibonacci numbers converges to the golden ratio φ because φ is the largest eigenvalue of the matrix above.

The first two Fibonacci numbers are 1 and 1, so the Fibonacci sequence corresponds to repeatedly multiplying by the matrix above, starting with the initial vector x = [1 1]ᵀ. But you could start with any other vector and the ratio of consecutive terms would converge to the golden ratio, provided you don’t start with a vector orthogonal to the dominant eigenvector [φ 1]ᵀ. Starting with any pair of integers, unless both are zero, is enough to avoid this condition, since φ is irrational.

We could generalize this approach to look at other sequences defined by a recurrence relation. For example, we could look at the “Tribonacci” numbers. The Tribonacci sequence starts out 1, 1, 2, and then each successive term is the sum of the three previous terms. We can find the limiting ratio of Tribonacci numbers by finding the largest eigenvalue of the matrix below.

\left[\begin{array}{ccc} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{array}\right]

This eigenvalue is the largest root of x³ − x² − x − 1 = 0, which is about 1.8393. As before, the starting values hardly matter. Start with any three integers, at least one of them non-zero, and define each successive term to be the sum of the previous three terms. The ratio of consecutive terms in this sequence will converge to 1.8393.

By the way, you could compute the limiting ratio of Tribonacci numbers with the following bit of Python code:

      import numpy as np
      from scipy import linalg

      # The Tribonacci companion matrix from above.
      M = np.array([[1, 1, 1], [1, 0, 0], [0, 1, 0]])
      eigenvalues = linalg.eigvals(M)
      print(max(eigenvalues.real))   # about 1.8393

Update: The next post generalizes this one to n-Fibonacci numbers.

Secret equation

I got a call this afternoon from someone who records audio books for the blind. He wanted to know the name of a symbol he didn’t recognize. He then asked me if the equation was real.

Here’s the equation in context, from the book Michael Vey 4: Hunt for Jade Dragon. The context is as follows.

Suddenly math problems she hadn’t understood made sense. Except now they weren’t just numbers and equations, they were patterns and colors. Calculus, geometry, and trigonometry were easy to understand, simple as a game, like shooting balls at a basketball hoop that was a hundred feet wide. Then a specific sequence of numbers, letters, and symbols started running through her mind.

s(t; t_y) = k \frac{Q}{r^2} \hat{r} \int_{R^2} m(x, y) e^{-2\pi i \daleth\left(\frac{G_x xt + \daleth G_y yt_y}{2\pi}\right)} \,dx\,dy

She almost said the equation when a powerful thought came over her not to speak it out loud—that she must not ever divulge it. She knew that what she was receiving was something of great importance, even if she had no idea what it meant.

I believe the symbol in question is the fourth letter of the Hebrew alphabet, ℸ (daleth).

Is this a real equation? The overall form of it looks like an integral transform. However, the two instances of ℸ in the equation look suspicious.

One reason is that I’ve never seen ℸ used in math, though I read somewhere that Cantor used it for the cardinality of some set. Even so, Cantor’s use wouldn’t make sense inside an integral.

Also, the two instances of ℸ are used differently. The first is a function (or else the factors of 2 π could be cancelled out) and the second one apparently is not. Finally, the equation is symmetric in x and y if you remove the two daleths. So I suspect this was a real equation with the daleths added in for extra mystery.

Juggling projects

Yesterday on Twitter I said I was thinking about writing the names of each of my clients and leads on balls so I could literally juggle them. I was only half joking.

I didn’t write my clients and leads on balls, but I did write them on index cards. And it helped a great deal. It’s easier to think about projects when you have physical representations you can easily move around. Moving lines up and down in an org-mode file, or even moving boxes around in 2D in OneNote, doesn’t work as well.

Electronic files are great for storing, editing, and querying ideas. But they’re not the best medium for generating ideas. See Create offline, analyze online. See also Austin Kleon’s idea of having two separate desks, one digital and one analog.

Casting out sevens

A while back I wrote about a method to test whether a number is divisible by seven. I recently ran across another method for testing divisibility by 7 in Martin Gardner’s book The Unexpected Hanging and Other Mathematical Diversions. The method doesn’t save too much effort compared to simply dividing by 7, but it’s interesting. It looks a little mysterious at first, though the explanation of why it works is very simple.

Suppose you want to find whether a number n is divisible by 7. Start with the last digit of n and write a 1 under the last digit, and a 3 under the next to last digit. The digits under the digits of n cycle through 1, 3, 2, 6, 4, 5, repeatedly until there is something under each digit in n. Now multiply each digit of n by the digit under it and add the results.

For example, suppose n = 1394. Write the digits 1, 3, 2, and 6 underneath, from right to left:

    1 3 9 4
    6 2 3 1

The sum we need to compute is 1*6 + 3*2 + 9*3 + 4*1 = 6 + 6 + 27 + 4 = 43.

This sum, 43 in our example, is divisible by 7 if and only if the original number, 1394, is. Since 43 is not divisible by 7, neither is 1394. Not only that, the result of the method has the same remainder when divided by 7 as the number we started with: 43 leaves a remainder of 1, so 1394 also leaves a remainder of 1.

You could apply this method repeatedly, though in this case 43 is small enough that it’s easy enough to see that it leaves a remainder of 1.
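
Here is one step of the method in Python (my sketch):

    def weighted_digit_sum(n):
        # Multiply the digits of n, right to left, by the cycle
        # 1, 3, 2, 6, 4, 5 (the remainders of powers of 10 mod 7) and add.
        multipliers = [1, 3, 2, 6, 4, 5]
        return sum(int(d) * multipliers[k % 6]
                   for k, d in enumerate(reversed(str(n))))

    print(weighted_digit_sum(1394))       # 43, as in the example
    print(weighted_digit_sum(1394) % 7)   # 1, the remainder of 1394 mod 7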

Suppose you started with a 1000-digit number n. Each digit is no more than 9 and is multiplied by a number no more than 6, so the sum would be less than 54,000. You’ve gone from a 1000-digit number to at most a 5-digit number in one step, and one or two more steps should be enough for the remainder to be obvious.

Why does this method work? The key is that the multipliers 1, 3, 2, 6, 4, 5 are the remainders when the powers of 10 are divided by 7. Since 10⁶ has a remainder of 1 when divided by 7, the numbers 10⁶ᵃ⁺ᵇ and 10ᵇ have the same remainder by 7, and that’s why the multipliers have period 6.

All the trick is doing is expanding the base 10 representation of a number and adding up the remainders when each term is divided by 7. In our example, 1394 = 1*1000 + 3*100 + 9*10 + 4*1, and mod 7 this reduces to 1*6 + 3*2 + 9*3 + 4*1, exactly the calculation above.

The trick presented here is analogous to casting out nines. But since every power of 10 leaves a remainder of 1 when divided by 9, all the multipliers in casting out nines are 1.

You could follow the pattern of this method to create a divisibility rule for any other divisor, say 13 for example, by letting the multipliers be the remainders when powers of 10 are divided by 13.
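
Generating the multipliers for another divisor takes only a short loop (a sketch; it assumes the divisor is coprime to 10, so that the remainders cycle back to 1):

    def multipliers(divisor):
        # Remainders of successive powers of 10 mod divisor, one full period.
        ms, m = [], 1
        while True:
            ms.append(m)
            m = (m * 10) % divisor
            if m == ms[0]:
                return ms

    print(multipliers(7))    # [1, 3, 2, 6, 4, 5]
    print(multipliers(13))   # [1, 10, 9, 12, 3, 4]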

Related post: Divisibility rules in hex

Computing square triangular numbers

The previous post stated a formula for f(n), the nth square triangular number (i.e. the nth triangular number that is also a square number):

((17 + 12√2)ⁿ + (17 − 12√2)ⁿ − 2)/32

Now 17 − 12√2 is 0.029… and so the term (17 − 12√2)ⁿ approaches zero very quickly as n increases. So f(n) is very nearly

((17 + 12√2)ⁿ − 2)/32

The error in the approximation is less than 0.001 for all n, so the approximation is exact when rounded to the nearest integer. (Rounding to the nearest integer amounts to adding 1/2 before taking the floor, and adding 1/2 = 16/32 turns the −2 in the numerator into +14.) So the nth square triangular number is

⌊((17 + 12√2)ⁿ + 14)/32⌋

where ⌊x⌋ is the greatest integer less than or equal to x.

Here’s how you might implement this in Python:

    from math import sqrt, floor

    def f(n):
        # nth square triangular number via the closed-form approximation,
        # rounded by adding 14 to the numerator and taking the floor
        x = 17 + 12*sqrt(2)
        return floor((x**n + 14)/32.0)

Unfortunately this formula isn’t that useful in ordinary floating point computation. When n = 11 or larger, the result needs more bits than are available in the significand of a floating point number. The computed value is accurate to around 15 significant digits, but the exact answer is longer than 15 digits.

The result can also be computed with a recurrence relation:

f(n) = 34 f(n-1) – f(n-2) + 2

for n > 2. (Or n > 1 if you define f(0) to be 0, a reasonable thing to do).

This only requires integer arithmetic, so its only limitation is the size of the integers you can compute with. Since Python has arbitrarily large integers, the following works for very large n.

    def f(n):
        if n < 2:
            return n
        return f(n-1)*34 - f(n-2) + 2

This is a direct implementation, though it’s inefficient because of the redundant function evaluations. It could be made more efficient by, for example, using memoization.
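
For example, Python’s functools.lru_cache provides memoization with a single decorator:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def f(n):
        if n < 2:
            return n
        return 34*f(n-1) - f(n-2) + 2

    print(f(2))   # 36, the next square triangular number after 1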

When is a triangle a square?

Of course a triangle cannot be a square, but a triangular number can be a square number.

A triangular number is the sum of the first so many positive integers. For example, 10 is a triangular number because it equals 1+2+3+4. These numbers are called triangular numbers because you can arrange them in a triangle: a row of one coin, then a row of two coins, then a row of three coins, and so on.

The smallest number that is both triangular and square is 1. The next smallest is 36. There are infinitely many numbers that are both triangular and square, and there’s even a formula for the nth number that is both a triangle and a square:

((17 + 12√2)ⁿ + (17 − 12√2)ⁿ − 2)/32

Source: American Mathematical Monthly, February 1962, page 169.

For more on triangle numbers and their generalizations, see Twelve Days of Christmas and Tetrahedral Numbers.

There is also a way to compute square triangular numbers recursively, discussed in the next post.

Basic equations of beam deflection

In the preface to his book Strength of Materials, J. P. Den Hartog says

After the alphabet and the tables of multiplication, nothing has proved quite so useful in my professional life as these six little expressions.

The six expressions he refers to are nicknamed the vergeet-me-nietjes in Dutch, which translates to forget-me-nots in English. They are also known as Dr. Myosotis’s equations because myosotis is the genus for forget-me-nots. The equations give the angular and linear deflections of a cantilever beam.

Imagine a beam anchored at one end and free at the other, subject to one of three kinds of load: a bending moment M at the free end, a point force P at the free end, or a force w per unit length distributed over the length of the beam. The equations below give the rotation (angular deflection) and displacement (linear deflection) of the free end of the beam.

                          Rotation      Displacement
    Bending moment M      ML/EI         ML²/2EI
    Point load P          PL²/2EI       PL³/3EI
    Distributed load w    wL³/6EI       wL⁴/8EI

Here E is the modulus of elasticity, L is the length of the beam, and I is the area moment of inertia.
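
Because the underlying beam theory is linear, the three load cases superpose, so the whole table fits in one small function (a sketch; the function name is mine, and the units are whatever consistent system you choose):

    def cantilever_end(E, I, L, M=0.0, P=0.0, w=0.0):
        # Rotation and displacement of the free end of a cantilever under
        # a bending moment M, point load P, and distributed load w
        # (force per unit length), using the six formulas above.
        rotation     = M*L/(E*I) + P*L**2/(2*E*I) + w*L**3/(6*E*I)
        displacement = M*L**2/(2*E*I) + P*L**3/(3*E*I) + w*L**4/(8*E*I)
        return rotation, displacement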

Deserted island books

You’ve probably heard someone ask someone else what books they would take to a deserted island. It’s usually implied that you’re bringing books for pleasure, not books that would help you survive on the island or leave it.

People often answer the question with a list of their favorite books, perhaps skewed in favor of long books. But I don’t think you should take books that have been your favorites in the past. You should take what you think would be your favorite books on a deserted island. I expect my tastes there would be very different than they are in my current suburban life.

I think of books that I could only read on a desert island, books that I’ve enjoyed in small portions but ordinarily would not have the patience to read cover-to-cover. For example, I’ve found portions of Donald Knuth’s series The Art of Computer Programming enjoyable and useful, but I can’t say I’ve read it all. Perhaps on a deserted island with little to do and few distractions I’d enjoy going through it carefully line by line, attempting all the exercises. I might even learn MMIX, something I can’t imagine doing under ordinary circumstances.

Along those lines, I might want to take some works by Thomas Aquinas such as his Summa Theologica or his commentaries on Aristotle. The little I’ve read of Aquinas has been profound, and more approachable than I expected. Still, I find it hard to read much of him. Alone on an island I might take the time to read him carefully.

For math, I might want to take Spivak’s differential geometry series, depending on how long I expect to be on this island. If I’m going to be there too long and I’m limited on books, I might want to take something else that’s more dense and less familiar.

For science, I’d take Gravitation by Misner, Thorne, and Wheeler. I’ve intended to read that book for many years and have started a couple times. In college I couldn’t afford the price; now I can’t afford the time.

For fiction, I’d take Patrick O’Brian’s Aubrey/Maturin series because I haven’t read it, I’ve heard it is very well written, and it’s long.



Attributing changes to numerator and denominator

This afternoon I helped someone debug a financial spreadsheet. One of the reasons spreadsheets can be so frustrating to work with is that assumptions are hard to see. You have to click on cells one at a time to find formulas, then decode cell coordinates into their meanings.

The root problem turned out to be an assumption that sounds reasonable: if you make two changes, one to a numerator and one to a denominator, the total change equals the sum of the results of each change made separately. Except that’s not so.

At this point, a mathematician would say “Of course you can’t split the effects like that. It’s nonlinear.” But it’s worth pursuing a little further. For one thing, it doesn’t help a general audience to just say “it’s nonlinear.” For another, it’s worth seeing when it is appropriate, at least approximately, to attribute the effects this way.

You start with a numerator and denominator, N/D, then change N to N+n and change D to D+d. The total change is then (N+n)/(D+d) – N/D.

The result from only the change in the numerator is n/D. The result from only the change in denominator is N/(D+d) – N/D.

The difference between the total change and the sum of the two partial changes is

−nd / (D(D + d)).
The assumption that you can take the total change and attribute it to each change separately is wrong in general. But it is correct if n or d is zero, and it is approximately correct when nd is small. This can make the bug harder to find. It could also be useful when nd is indeed small and you don’t need to be exact.

Also, if all the terms are positive, the discrepancy is negative, i.e. the total change is less than the sum of the partial changes. Said another way, allocating the change to each cause separately over-estimates the total change.
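
A quick numerical check of the discrepancy (my sketch, with made-up values):

    N, D, n, d = 100.0, 50.0, 3.0, 2.0

    total       = (N + n)/(D + d) - N/D
    partial_num = n/D                   # change the numerator only
    partial_den = N/(D + d) - N/D       # change the denominator only

    print(total - (partial_num + partial_den))   # -0.0023076...
    print(-n*d / (D*(D + d)))                    # the same: -nd/(D(D+d))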