# Dump a pickle file to a readable text file

I got a data file from a client recently in “pickle” format. I happen to know that pickle is a binary format for serializing Python objects, but trying to open a pickle file could be a puzzle if you didn’t know this.

## Be careful

There are a couple problems with using pickle files for data transfer. First of all, it’s a security risk because an attacker could create a malformed pickle file that would cause your system to run arbitrary code. In the Python Cookbook, the authors David Beazley and Brian K. Jones warn

It’s essential that pickle only be used internally with interpreters that have some ability to authenticate one another.

The second problem is that the format could change. Again quoting the Cookbook,

Because of its Python-specific nature and attachment to source code, you probably shouldn’t use pickle as a format for long-term storage. For example, if the source code changes, all of your stored data might break and become unreadable.

Suppose someone gives you a pickle file and you’re willing to take your chances and open it. It’s from a trusted source, and it was created recently enough that the format probably hasn’t changed. How do you open it?

## Unpickling

The following code will open the file data.pickle and read it into an object obj.

    import pickle


If the object in the pickle file is very  small, you could simply print obj. But if the object is at all large, you probably want to save it to a file rather than dumping it at the command line, and you also want to “pretty” print it than simply printing it.

## Pretty printing

The following code will dump a nicely-formatted version of our pickled object to a text file out.txt.

    import pickle
import pprint

with open("out.txt", "a") as f:
pprint.pprint(obj, stream=f)


In my case, the client’s file contained a dictionary of lists of dictionaries. It printed as one incomprehensible line, but it pretty printed as 40,000 readable lines.

## Prettier printing

Simon Brunning left a comment suggesting that the json module output is even easier to read.

    import json

with open("out.txt", "a") as f:
json.dump(obj, f, indent=2)


And he’s right, at least in my case. The indentation json.dump produces is more what I’d expect, more like what you’d see if you were writing the structure in well-formatted source code.

# Keeping data and code together with org-mode

With org-mode you can keep data, code, and documentation in one file.

Suppose you have an org-mode file containing the following table.

    #+NAME: mydata
| Drug | Patients |
|------+----------|
|    X |      232 |
|    Y |      351 |
|    Z |      117 |


Note that there cannot be a blank line between the NAME header and the beginning of the table.

You can bring this table into Python simply by declaring it to be a variable in the header of a Python code block.

    #+begin_src python :var tbl=mydata :results output
print(tbl)
#+end_src


When you evaluate this block, you see that the table is imported as a list of lists.

    [['X', 232], ['Y', 351], ['Z', 117]]

Note that the column headings were not imported into Python. Now suppose you would like to retain the headers, and use them as column names in a pandas data frame.

    #+begin_src python :var tbl=mydata :colnames no :results output
import pandas as pd
df = pd.DataFrame(tbl[1:], columns=tbl[0])
print(df, "\n")
print(df["Patients"].mean())
#+end_src


When evaluated, this block produces the following.

      Drug  Patients
0    X       232
1    Y       351
2    Z       117

233.33333333333334


Note that in order to import the column names, we told org-mode that there are no column names! We did this with the header option

    :colnames no

This seems backward, but it makes sense. It says do bring in the first row of the table, even though it appears to be a column header that isn’t imported by default. But then we tell pandas that we want to make a data frame out of all but the first row (i.e. tbl[1:]) and we want to use the first row (i.e. tbl[0]) as the column names.

A possible disadvantage to keeping data and code together is that the data could be large. But since org files are naturally in outline mode, you could collapse the part of the outline containing the data so that you don’t have to look at it unless you need to.

# Inline computed content in org-mode

The previous post discussed how to use org-mode as a notebook. You can have blocks of code and blocks of results, analogous to cells in a Jupyter notebook. The code and the results export as obvious blocks when you export the org file to another format, such as LaTeX or HTML. And that’s fine for a notebook.

Now suppose you want to do something more subtle. You want to splice in the result of a computed value without being obvious about it. Maybe you want to compute a value rather than directly enter it so that the document will remain consistent. Maybe you have a template and you want to set the parameters of the template at the top of the file.

Web development languages like PHP do this well. You can write a PHP file that is essentially an HTML file with pieces of code spliced in. You do this my inserting

    <?php … ?>

into the HTML code, and when the page is rendered the code between the <?php and ?> tags is replaced with the result of executing the code. We’d like to do something analogous in org-mode with org-babel. (org-babel is the subsystem of org-mode that interacts with code.)

Here’s an org-mode example that sets length and width as variables at the top of a file and multiplies them later in the body of the file to get area.

We define our variables as follows. The block is marked :exports none because we do not want to display the code or the values. We just want the code to run when we export the file.

    #+begin_src python :session :exports none
length, width = 7, 13
#+end_src


The following almost does what we want [1].

    Area equals src_python[:session]{length*width}.

This renders as

Area equals 91.

if we export our org file to HTML The number 91 is typeset differently than the words before it. This would be more obvious if the computed value were a string rather than a number.

Org-mode is wrapping <code> tags around the computed result. If we were to export the org file to LaTeX it would wrap the result with \texttt{}. This is because, by default, the output of a computation is displayed as computer output, which is conventionally set in a monospace font like Courier. That’s fine in a technical document when we want to make it obvious that a calculation is a calculation, but typically not in a business context. You wouldn’t want, for example, to generate a letter that starts

Dear Michael,

with Michael’s name set in Courier, announcing that this is a form letter.

The fix is to add :results raw to the header session, the part in square brackets between src_python and the Python code.

    Area equals src_python[:session :results raw]{length*width}.

Now the calculation result is reported “raw”, i.e. without any special markup surrounding it.

***

[1] In this example I’m using Python, and so I used the function src_python. org-babel supports dozens of languages, and each has its src_<language> counterpart.

# Org-mode as a lightweight notebook

You can think of org-mode as simply a kind of markdown, a plain text file that can be exported to fancier formats such as HTML or PDF. It’s a lot more than that, but that’s a reasonable place to start.

Org-mode also integrates with source code. You can embed code in your file and have the code and/or the result of running the code appear when you export the file to another format.

## Org-mode as notebook

You can use org-mode as a notebook, something like a Jupyter notebook, but much simpler. An org file is a plain text file, and you can execute embedded code right there in your editor. You don’t need a browser, and there’s no hidden state.

Here’s an example of mixing markup and code:

    The volume of an n-sphere of radius r is

$$\frac{\pi^{\frac{n}{2}}}{\Gamma\left(\frac{n}{2} + 1\right)}r^n.$$

#+begin_src python :session
from scipy import pi
from scipy.special import gamma

def vol(r, n):
return pi**(n/2)*r**n/gamma(n/2 + 1)

vol(1, 5)
#+end_src


If you were to export the file to PDF, the equation for the volume of a sphere would be compiled into a image using LaTeX.

To run the code [1], put your cursor somewhere in the code block and type C-c C-c. When you do, the following lines will appear below your code.

    #+RESULTS:
: 5.263789013914324


If you think of your org-mode file as primary, and you’re just inserting some code as a kind of scratch area, an advantage of org-mode is that you never leave your editor.

## Jupyter notebooks

Now let’s compare that to a Jupyter notebook. Jupyter organizes everything by cells, and a cell can contain markup or code. So you could create a markup cell and enter the exact same introductory text [2].

    The volume of an n-sphere of radius r is

$$\frac{\pi^{\frac{n}{2}}}{\Gamma\left(\frac{n}{2} + 1\right)}r^n$$.


When you “run” the cell, the LaTeX is processed and you see the typeset expression rather than its LaTeX source. You can click on the cell to see the LaTeX code again.

Then you would enter the Python code in another cell. When you run the cell you see the result, much as in org-mode. And you could export your notebook to PDF as with org-mode.

## File diff

Now suppose we make a couple small changes. We want the n and r in the comment section set in math italic, and we’d like to find the volume of a 5-sphere of radius 2 rather than radius 1. We do this, in Jupyter and in org-mode [3], by putting dollar signs around the “n” and the “r”, and we change vol(1, 5) to vol(2, 5).

Let’s run diff on the two versions of the org-mode file and on the two versions of the Jupyter notebook.

The differences in the org files are easy to spot:

    1c1
< The volume of an n-sphere of radius r is
---
> The volume of an $$n$$-sphere of radius $$r$$ is
12c12
< vol(1, 5)
---
> vol(2, 5)
16c16,17
< : 5.263789013914324
---
> : 168.44124844525837


However, the differences in the Jupyter files are more complicated:

    5c5
<    "id": "2a1b0bc4",
---
>    "id": "a0a89fcf",
8c8
<     "The volume of an n-sphere of radius r is \n",
---
>     "The volume of an $n$-sphere of radius $r$ is \n",
15,16c15,16
<    "execution_count": 1,
<    "id": "888660a2",
---
>    "execution_count": 2,
22c22
<        "5.263789013914324"
---
>        "168.44124844525837"
25c25
<      "execution_count": 1,
---
>      "execution_count": 2,
37c37
<     "vol(1, 5)"
---
>     "vol(2, 5)"
43c43
<    "id": "f8d4d1b0",


There’s a lot of extra stuff in a Jupyter notebook. This is a trivial notebook, and more complex notebooks have more extra stuff. An apparently small change to the notebook can cause a large change in the underlying notebook file. This makes it difficult to track changes in a Jupyter notebook in a version control system.

## Related posts

[1] Before this will work, you have to tell Emacs that Python is one of the languages you want to run inside org-mode. I have the following line in my init file to tell Emacs that I want to be able to run Python, DITAA, R, and Perl.

    (org-babel-do-load-languages 'org-babel-load-languages '((python . t) (ditaa . t) (R . t) (perl . t)))

[2] Org-mode will let you use $ and $ to bracket LaTeX code for a displayed equation, and it will also let you use . Jupyter only supports the latter.

[3] In org-mode, putting dollar signs around variables sometimes works and sometimes doesn’t. And in this example, it works for the “r” but not for the “n”. This is very annoying, but it can be fixed by using $$ and $$ to enter and leave math mode rather than use a dollar sign for both.

# Complex AGM

The arithmetic-geometric mean (AGM) of two non-negative real numbers a and b is defined as the limit of the iteration starting with a0 = a and b0 = b and

an+1 = ½ (an + bn)
bn+1 = √(an bn)

for n > 0. This sequence converges very quickly and is useful in numerical algorithms. The limit can be expressed in terms of an elliptic function, and that elliptic function can then be related to other functions. See, for example, this post for how the AGM can be used to compute logarithms to arbitrary precision.

Since the AGM is useful in computing special functions, and we’re often interested in evaluating special functions at complex values, it’s natural to want to evaluate the AGM for complex numbers.

But we immediately run into a difficulty: which square root do we pick to find the new b?

For a non-negative real number x, √x is defined to be the non-negative real number y such that y² = x. But for more general values of x we have to choose a branch of the square root function. Which branch should we use in the AGM?

Often when we need to take the square root of complex numbers we can use the “principal branch,” the branch that gives positive values for positive real inputs and extends to the rest of the complex plane with the negative axis removed. If you compute the square root of a complex number in software, as we’ll do below, this is probably the value you’ll get by default.

But it turns out we cannot simply pick the principal branch. Gauss discovered two centuries ago that the right branch to take could vary at each step [0]. What does that even mean? How do we know we’ve made the “right” choice?

For one thing, we want our iteration to converge. And for another, we’d like it to converge to something non-zero if we start with non-zero inputs [1]. The right choice will guarantee this [2].

So what is this right choice? We provisionally update a and b as above, using either square root for b, and keep the value of b if

|ab| ≤ |a + b|

or

|ab| = |a + b|

and the imaginary part of b/a is positive. In other words, chose the possible value of b that’s closer to a, and use the imaginary part of b/a as a tie breaker.

Here is the AGM implemented in Python, using the right square root at each step.

    def right(a, b):
d = abs(a + b) - abs(a - b)
return d > 0 or d == 0 and (b/a).imag > 0

def agm(a, b):
while abs(a-b) > 1e-14:
a1 = 0.5*(a+b)
b1 = np.sqrt(a*b)
if not right(a1, b1):
b1 = -b1
a, b = a1, b1
return a1


The test d == 0 should make you concerned. Testing floating point numbers for exact equality with zero is seldom the right thing to do, but we’ll gloss over numerical issues for this post.

Here’s an example, giving the a values of the iteration starting with 7 + 30i and 20 + 22i. The iteration converges in four steps.

    13.500000000000000 + 26.000000000000000j
13.784944719026262 + 26.397404494892115j
13.783557503026870 + 26.395953326186888j
13.783557473769877 + 26.395953309190112j


In this example, the right root is the principal root every time. To find an example where the other root is chosen, I generated random starting points until I found one that took an alternate root.

Here are the values of b starting with -1.654 – 1.178i and 2.244 – 1.956i. An asterisk indicates that the principal root was not the right root.

     0.2328790598285062 - 0.728412421988127j
-0.6829999999999998 - 0.589000000000000j
-0.2254063569280081 - 0.799311791579126j *
-0.2250604700857468 - 0.658706210994063j
-0.2261796153095098 - 0.725905503054624j *
-0.2252334135068774 - 0.729009001286595j
-0.2257078598391289 - 0.727456168426402j *
-0.2257065144081936 - 0.727457252170610j
-0.2257071871240875 - 0.727456710298747j *
-0.2257071871236612 - 0.727456710298506j
-0.2257071871238743 - 0.727456710298626j *
-0.2257071871238744 - 0.727456710298626j


## More AGM posts

[0] David Cox. The Arithmetic-Geometric Mean of Gauss. L’Énseignement Mathématique, 30 (1984), p. 275–330.

[1] If a = –b we get zero immediately. But if a and b are both not zero, and a does not equal –b, then taking the right square root at each iteration gives us what we want.

[2] Interestingly, you could make a finite number of wrong choices and still end up with something that might converge, albeit to a different value. This gives you a different branch of the AGM.

# Letter-like Unicode symbols

Unicode provides a way to distinguish symbols that look alike but have different meanings.

We can illustrate this with the following Python code.

    import unicodedata as u

for pair in [('K', 'K'), ('Ω', 'Ω'), ('ℵ', 'א')]:
for c in pair:
print(format(ord(c), '4X'), u.bidirectional(c), u.name(c))


This produces

      4B L LATIN CAPITAL LETTER K
212A L KELVIN SIGN
3A9 L GREEK CAPITAL LETTER OMEGA
2126 L OHM SIGN
2135 L ALEF SYMBOL
5D0 R HEBREW LETTER ALEF


Even though K and K look similar, the former is a Roman letter and the latter is a symbol for temperature in Kelvin. Similarly, Ω and Ω are semantically different even though they look alike.

Or rather, they probably look similar. A font may or may not use different glyphs for different code points. The editor I’m using to write this post uses a font that makes no difference between ohm and omega. The letter K and the Kelvin symbol are slightly different if I look very closely. The two alefs appear substantially different.

Note that the mathematical symbol alef is a left-to-right character and the Hebrew latter alef is a right-to-left character. The former could be useful to tell a word processor “This isn’t really a Hebrew letter; it’s a symbol that looks like a Hebrew letter. Don’t change the direction of my text or switch any language-sensitive features like spell checking.”

These letter-like symbols can be used to provide semantic information, but they can also be used to deceive. For example, a malicious website could change a K in a URL to a Kelvin sign.

# General tetrahedral numbers

1, 1, 1, 1, 1, …

Taking the partial sums of this sequence gives consecutive numbers. That is, the nth number of the new series is the sum of the first n terms of the previous series.

1, 2, 3, 4, 5, …

If we take partial sums again, we get the triangular numbers.

1, 3, 6, 10, 15, …

They’re called triangle numbers because if we arrange coins in a triangle, with n on a side, the total number of coins is the nth triangular number.

If we repeat the process again, we get the tetrahedral numbers.

1, 4, 10, 20, 35, …

The nth tetrahedral number is the number of cannonballs is a tetrahedral stack, with each layer of the stack being a triangle.

We can produce the examples above in Python using the cumsum (cumulative sum) function on NumPy arrays.

    >>> import numpy as np
>>> x = np.ones(5, int)
>>> x
array([1, 1, 1, 1, 1])
>>> x.cumsum()
array([1, 2, 3, 4, 5])
>>> x.cumsum().cumsum()
array([ 1, 3, 6, 10, 15])
>>> x.cumsum().cumsum().cumsum()
array([ 1, 4, 10, 20, 35])


We could continue this further, taking tetrahedral numbers of degree k. The sequences above could be called the tetrahedral numbers of degree k for k = 0, 1, 2, and 3.

Here’s Python code to find the nth tetrahedral number of a given dimension.

    def tetrahedral(n, dim):
x = np.ones(n, int)
for _ in range(dim):
x = x.cumsum()
return x[-1]


It turns out that

and so we could rewrite the code above more simply as

    from math import comb

def tetrahedral(n, dim):
return comb(n + dim - 1, dim)


# Sampling with replacement until you’ve seen everything

Suppose you have a standard deck of 52 cards. You pull out a card, put it back in the deck, shuffle, and pull out another card. How long would you expect to do this until you’ve seen every card?

Here’s a variation on the same problem. Suppose you’re a park ranger keeping data on tagged wolves in a national park. Each day you catch a wolf at random, collect some data, and release it. If there are 100 wolves in the park, how long until you’ve seen every wolf at least once?

We’ll solve this problem via simulation, then exactly.

## Simulation

The Python code to solve the problem via simulation is straight forward. Here we’ll simulate our ranger catching and releasing one of 100 wolves. We’ll repeat our simulation five times to get an idea how much the results vary.

    import numpy as np
from numpy.random import default_rng

rng = default_rng()
num_runs = 5
N = 100 # number of unique items

for run in range(num_runs):
tally = np.zeros(N,dtype=int)
trials = 0
while min(tally) == 0:
tally[rng.integers(N)] += 1
trials += 1
print(trials)


When I ran this, I got 846, 842, 676, 398, and 420.

## Exact solution

Suppose we have a desk of N cards, and we sample the deck with replacement M times. We can select the first card N ways. Ditto for the second, third, and all the rest of the cards. so there are NM possible sequences of card draws.

How many of these sequences include each card at least once? To answer the question, we look at Richard Stanley’s twelvefold way. We’re looking for the number of ways to allocate M balls into N urns such that each urn contains at least one ball. The answer is

N! S(M, N)

where S(M, N) is the Stirling number of the second kind with parameters M and N [1].

So the probability of having seen each card at least once after selecting M cards randomly with replacement is

N! S(M, N) / NM.

## Computing Stirling numbers

We’ve reduced our problem to the problem of computing Stirling numbers (of the second kind), so how do we do that?

We can compute Stirling numbers in terms of binomial coefficients as follows.

Now we have a practical problem if we want to use this formula for larger values of n and k. If we work with floating point numbers, we’re likely to run into overflow. This is the perennial with probability calculations. You may need to compute a moderate-sized number as the ratio of two enormous numbers. Even though the final result is between 0 and 1, the intermediate results may be too large to store in a floating point number. And even if we don’t overflow, we may lose precision because we’re working with an alternating sum.

The usual way to deal with overflow is to work on a log scale. But that won’t work for Stirling numbers because we’re adding terms together, not multiplying them.

One way to solve our problem—not the most efficient way but a way that will work—is to work with integers. This is easy in Python because integers by default can be arbitrarily large.

There are two ways to compute binomial coefficients in Python: scipy.special.binom and math.comb. The former uses floating point operations and the latter uses integers, so we need to use the latter.

We can compute the numerator of our probability with

    from math import comb
def k_factorial_stirling(n, k):
return sum((-1)**i * comb(k, i)*(k-i)**n for i in range(k+1))


Then our probability is

    k_factorial_stirling(M, N) / N**M

If we compute the probability that our ranger has seen all 100 wolves after catching and releasing 500 times, we get 0.5116. So the median number of catch and release cycles is somewhere around 500, which is consistent with our simulation above.

Note that the numerator and denominator in our calculation are both on the order of 101000 while the largest floating point number is on the order of 10308, so we would have overflowed if we’d used binom instead of comb.

## Related posts

[1] It’s unfortunate that these were called the “second kind” because they’re the kind that come up most often in application.

# Logarithmic spiral

I’ve seen an image similar to the following many times, but I don’t recall any source going into detail regarding how the spiral is constructed. This post will do just that.

The previous post constructed iterated golden rectangles. We start with a golden rectangle and imagine chopping of first the blue, then the green, then the orange, and then the red square. We could continue this process on the inner most rectangle, over and over.

At this point, many sources would say “Hey look! We can draw a spiral that goes through two corners of each square” without explicitly stating an equation for the spiral. We’ll find spiral, making everything explicit.

In the previous post we showed how to find the limit of the iterative process at the intersection of the two diagonal lines below.

We showed that the intersection is at

x0 = (2φ + 1)/(φ + 2)

and

y0 = 1/(φ + 2).

## Logarithmic spirals

A logarithmic spiral centered at the origin has the polar equation

r(θ) = a exp(kθ)

for some constants a and k. Our spiral will be such a spiral shifted to center at (x0y0).

Imagine a spiral going from the lower left to the top left of the blue square, then to the top left of the green square, then to the top right of the orange square etc., passing through two corners of every square. Every time θ goes through a quarter turn, i.e. π/2 radians, our rectangles get smaller by a factor of φ. We can use this to solve for k because

a exp(k(θ + π/2)) = φ a exp()

and so

exp(kπ/2) = φ

and

k = 2log(φ) / π.

To summarize what we know so far, if we shift our spiral to the origin, it has equation in polar form

r(θ) = a exp(kθ)

where we now know k but don’t yet know a.

## Finding θ and a

To get our actual spiral, we have to shift the origin to (x0, y0) which we have calculated above. Converting from polar to rectangular coordinates so we can do the shift, we have

x(θ) = x0 + a exp(kθ) cos(θ)

y(θ) = y0 + a exp(kθ) sin(θ)

We can find the parameter a by knowing that our spiral passes through the origin, the bottom left corner in the plots above. At what value of θ does the curve go through the origin? We have a choice; values of θ that differ by multiples of 2π will give different values of a.

The angle θ is the angle made by the line connecting (0, 0) to (x0, y0). We can find θ via

θ0 = atan2(-y0, –x0).

Here’s Python code to find θ0:

    import numpy as np

phi = (1 + 5**0.5)/2
y0 = 1/(2+phi)
x0 = (2*phi + 1)*y0
theta0 = np.arctan2(-y0, -x0)


Then we solve for a from the equation x0) = 0.

    k = 2*np.log(phi)/np.pi
a = -x0 / (np.exp(k*theta0)*np.cos(theta0)))

## Plots

Now we have all the parameters we need to define our logarithmic spiral. The following code draws our logarithmic spiral on top of the plot of the rectangles.

    t = np.linspace(-20, theta0, 1000)
def x(t): return x0 + a*np.exp(k*t)*np.cos(t)
def y(t): return y0 + a*np.exp(k*t)*np.sin(t)
plt.plot(x(t), y(t), "k-")


The result is the image at the top of the post.

# Non-associative multiplication

There are five ways to parenthesize a product of four things:

• ((ab)c)d
• (ab)(cd)
• (a(b(cd))
• (a(bc))d
• (a((bc)d)

In a context where multiplication is not associative, the five products above are not necessarily the same. Maybe all five are different.

This post will give two examples where the products above are all different: octonions and matrix multiplication.

If you’re thinking “But wait: matrix multiplication is associative!” then read further and see in what sense I’m saying it’s not.

## Octonioins

Octonion multiplication is not associative. I wrote a blog post a while back asking how close the octonions come to being associative. That is, if we randomly generate unit-length octonions a, b, and c, we can calculate the norm of

(ab)ca(bc)

and ask about its expected value. Sometimes for a triple of octionions this value is zero, but on average this expression has norm greater than 1. I estimated the average value via simulation, and later Greg Egan worked out the exact value. His write-up had gone down with the Google+ ship, but recently Greg posted a new version of his notes.

In this post I gave Python code for multiplying octonions using a function called CayleyDickson named after the originators of the algorithm. Let’s rename it m to have something shorter to work with and compute all five ways of associating a product of four octonions.

    import numpy as np

def m(x, y):
return CayleyDickson(x, y)

def take_five(a, b, c, d):
return [
m(a, (m(b, m(c, d)))),
m(m(a, b), m(c, d)),
m(m(m(a, b), c), d),
m(a, m(m(b, c), d)),
m(m(a, m(b, c)), d)
]


I first tried products of basis elements, and I only got two different products out of the five ways of associating multiplication, but I only tried a few examples. However, when I tried using four random octonions, I got five different products.

    np.random.seed(20220201)

a = np.random.rand(8)
b = np.random.rand(8)
c = np.random.rand(8)
d = np.random.rand(8)

for t in take_five(a, b, c, d):
print(t)


This gave a very interesting result: I got five different results, but only two different real (first) components. The five vectors all differed in the last seven components, but only produced two distinct first components.

    [ 2.5180856  -2.61618184  ...]
[ 2.5180856   0.32031027  ...]
[ 2.5180856   1.13177500  ...]
[ 3.0280984  -0.30169446  ...]
[ 3.0280984  -2.36523580  ...]


I repeated this experiment a few times, and the first three results always had the same real component, and the last two results had another real component.

I suspect there’s a theorem that says

Re( ((ab)c)d ) = Re( (ab)(cd) ) = Re( a(b(cd)) )

and

Re( (a(bc))d ) = Re( a((bc)d) )

but I haven’t tried to prove it. If you come with a proof, or a counterexample, please post a link in the comments.

## Matrix multiplication

Matrix multiplication is indeed associative, but the efficiency of matrix multiplication is not. That is, any two ways of parenthesizing a matrix product will give the same final matrix, but the cost of the various products are not the same. I first wrote about this here.

This is a very important result in practice. Changing the parentheses in a matrix product can make the difference between a computation being practical or impractical. This comes up, for example, in automatic differentiation and in backpropagation for neural networks.

Suppose A is an m by n matrix and B is an n by p matrix. Then AB is an m by p matrix, and forming the product AB requires mnp scalar multiplications. If C is a by q matrix, then (AB)C takes

mnp + mpq = mp(n + q)

scalar multiplications, but computing A(BC) takes

npqmnq = nq(m + p)

scalar multiplications, and in general these are not equal.

Let’s rewrite our multiplication function m and our take_five function to compute the cost of multiplying four matrices of random size.

We’ve got an interesting programming problem in that our multiplication function needs to do two different things. First of all, we need to know the size of the resulting matrix. But we also want to keep track of the number of scalar multiplications the product would require. We have a sort of main channel and a side channel. Having our multiplication function return both the dimension and the cost would make composition awkward.

This is is kind of a fork in the road. There are two ways to solving this problem, one high-status and one low-status. The high-status approach would be to use a monad. The low-status approach would be to use a global variable. I’m going to take the low road and use a global variable. What’s one little global variable among friends?

    mults = 0

def M(x, y):
global mults
mults += x[0]*x[1]*y[0]
return (x[0], y[1])

def take_five2(a, b, c, d):

global mults
costs = []

mults = 0; M(a, (M(b, M(c, d)))); costs.append(mults)
mults = 0; M(M(a, b), M(c, d));   costs.append(mults)
mults = 0; M(M(M(a, b), c), d);   costs.append(mults)
mults = 0; M(a, M(M(b, c), d));   costs.append(mults)
mults = 0; M(M(a, M(b, c)), d);   costs.append(mults)

return costs


Next, I’ll generate five random integers, and group them in pairs as sizes of matrices conformable for multiplication.

    dims = np.random.random_integers(10, size=5)
dim_pairs = zip(dims[:4], dims[1:])
c = take_five2(*[p for p in dim_pairs])


When I ran this dims was set to

[3 9 7 10 6]

and so my matrices were of size 3×9, 9×7, 7×10, and 10×6.

The number of scalar multiplications required by each way of multiplying the four matrices, computed by take_five2 was

[1384, 1090, 690, 1584, 984]

So each took a different number of operations. The slowest approach would take more than twice as long as the fastest approach. In applications, matrices can be very long and skinny, or very wide and thin, in which case one approach may take orders of magnitude more operations than another.