Rapidly mixing random walks on graphs

Random walks mix faster on some graphs than on others. Rapid mixing is one of the ways to characterize expander graphs.

By rapid mixing we mean that a random walk approaches its limiting (stationary) probability distribution quickly relative to random walks on other graphs.

Here’s a graph that supports rapid mixing. Pick a prime p and label the nodes 0, 1, 2, 3, …, p-1. Create a circular graph by connecting each node k to k+1 (mod p). Then add an edge between each non-zero node k and its multiplicative inverse (mod p). If an element is its own inverse, add a loop connecting the node to itself. For the purpose of this construction, consider 0 to be its own inverse. This construction comes from [1].
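Below is a minimal sketch, in Python, of the construction just described (my illustration, not code from [1]). It computes the inverses via Fermat’s little theorem and returns adjacency lists.

    def cayley_graph(p):
        """Adjacency lists for the graph on nodes 0, 1, ..., p-1, for p prime."""
        def inverse(k):
            # Multiplicative inverse mod p by Fermat's little theorem;
            # 0 is treated as its own inverse, per the convention above.
            return pow(k, p - 2, p) if k != 0 else 0

        adj = {k: [] for k in range(p)}
        for k in range(p):
            adj[k].append((k + 1) % p)  # edge to the next node on the cycle
            adj[k].append((k - 1) % p)  # edge to the previous node on the cycle
            adj[k].append(inverse(k))   # edge (or loop) to the multiplicative inverse
        return adj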

Here’s what the graph looks like with p = 23.

Cayley graph of order 23

This drawing doesn’t show the three loops from a node to itself, at nodes 0, 1, and 22.

At each step of the random walk, we choose one of the current node’s edges uniformly at random and follow it. Because every node has the same degree (three, counting loops), the stationary distribution for a random walk on this graph is uniform. That is, in the long run, you’re equally likely to be at any particular node.

If we pick an arbitrary starting node, 13, and take 30 steps of the random walk, the simulated probability distribution is nearly flat.

By contrast, consider a variation on the random walk above: from each node we move left, move right, or stay in place, each with equal probability. This walk is not as well mixed after 100 steps as the original walk is after only 30 steps.

You can tell from the graph that the walk started around 13. In the previous graph, evidence of the starting point had vanished.

The code below was used to produce these graphs. To investigate how quickly a walk on this graph converges to its limiting distribution, you can run code like the following; a sketch of the slower comparison walk follows the main block.

    from random import random
    import numpy as np
    import matplotlib.pyplot as plt
    m = 23

    # Multiplicative inverses mod 23
    inverse = [0, 1, 12, 8, 6, 14, 4, 10, 3, 18, 7, 21, 2, 16, 5, 20, 13, 19, 9, 17, 15, 11, 22]

    # Verify that the inverses are correct
    assert len(inverse) == m
    for i in range(1, m):
        assert i*inverse[i] % m == 1

    # Simulation
    num_reps = 100000
    num_steps = 30

    def take_cayley_step(loc):
        u = random() * 3
        if u < 1: # move left
            newloc = (loc + m - 1) % m
        elif u < 2: # move right
            newloc = (loc + 1) % m
        else:
            newloc = inverse[loc] # a loop back to loc if loc is its own inverse
        return newloc

    stats = [0]*m
    for i in range(num_reps):
        loc = 13 # arbitrary fixed starting point
        for j in range(num_steps):
            loc = take_cayley_step(loc)
        stats[loc] += 1

    normed = np.array(stats)/num_reps
    plt.plot(normed)
    plt.xlim(0, m-1)
    plt.ylim(0,2/m)
    plt.show()
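For comparison, here’s a sketch of the slower walk described earlier, which moves left, moves right, or stays in place with equal probability. It reuses random and m from the code above; substitute it for take_cayley_step and set num_steps to 100 to reproduce the second graph.

    def take_cycle_step(loc):
        u = random() * 3
        if u < 1:   # move left
            return (loc + m - 1) % m
        elif u < 2: # move right
            return (loc + 1) % m
        else:       # stay in place
            return loc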

* * *

[1] Shlomo Hoory, Nathan Linial, Avi Wigderson. “Expander graphs and their applications.” Bulletin of the American Mathematical Society, Volume 43, Number 4, October 2006, pages 439–561.

Branch cuts and Common Lisp

“Nearly everything is really interesting if you go into it deeply enough.” — Richard Feynman

If you thumb through Guy Steele’s book Common Lisp: The Language, 2nd Edition, you might be surprised how much space is devoted to defining familiar functions: square root, log, arcsine, etc. He gives some history of how these functions were first defined in Lisp and then refined by the ANSI (X3J13) standardization committee in 1989.

There are three sources of complexity:

  1. Complex number arguments
  2. Multi-valued functions
  3. +0 and -0

Complex arguments

The functions under discussion are defined for either real or complex inputs. This does not complicate things much in itself. Defining some functions for complex arguments, such as the exp function, is simple. The complication comes from the interaction of complex arguments with multi-valued functions and floating point representations of zero.

Multi-valued functions

The tricky functions to define are inverse functions, functions where we have to make a choice of range.

Real multi-valued functions

Let’s restrict our attention to real numbers for a moment. How do you define the square root of a positive number x? There are two solutions to the equation y² = x, and √x is defined to be the positive solution.

What about the arcsine of x? This is the number whose sine is x. Except there isn’t a “the” number: there are infinitely many numbers whose sine is x, so we have to make a choice. It seems natural to choose values in an interval symmetric about 0, so we take the arcsine of x to be the number between -π/2 and π/2 whose sine is x.

Now what about arctangent? As with arcsine, we have to make a choice because for any x there are infinitely many numbers y whose tangent is x. Again it’s convenient to define the range to be in an interval symmetric about zero, so we define the arctangent of x to be the number y between -π/2 and π/2 whose tangent is x. But now we have a subtle complication with tangent we didn’t have with sine because tangent is unbounded. How do we want to define the tangent of a vertical angle? Should we call it ∞ or -∞? What do we want to return if someone asks for the arctangent of ±∞? Should we return π/2 or -π/2?
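For what it’s worth, here’s how IEEE-style floating point arithmetic answers the question about the arctangent of ±∞, illustrated with Python’s math module rather than Common Lisp:

    import math

    print(math.atan(float('inf')))   #  1.5707963267948966, i.e.  pi/2
    print(math.atan(float('-inf')))  # -1.5707963267948966, i.e. -pi/2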

Complex multi-valued functions

The discussion above shows there are some minor complications in defining inverse functions on the real line. Things get more complicated when working in the complex plane. To take the square root example, it’s easy to say we’re going to define square root so that the square root of a positive number is another positive number. Fine. But which solution to z² = w should we take to be the square root of a complex number w, such as 2 + 3i or -5 + 17i?

Or consider logarithms. For a positive number x there is only one real number y such that exp(y) = x. But what if we take a negative value of x, such as -1? There’s no real number whose exponential is -1, but there is a complex number. In fact, there are infinitely many complex numbers whose exponential is -1. Which one should we choose?

Floating point representations of zero

A little-known feature of floating point arithmetic (specified by the IEEE 754 standard) is that there are two kinds of zero: +0 and -0. This sounds bizarre at first, but there are good reasons for it, which I explain here. But defining functions to work properly with two kinds of zero takes a lot of work. This was the main reason the ANSI Common Lisp committee had to revise their definitions of several transcendental functions. If a function has a branch cut discontinuity along the real axis, for example, you want your definition to be continuous as you approach x + 0i from above and as you approach x - 0i from below.
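To make the two zeros concrete, here’s a small illustration using Python’s math module (my aside, not an example from Steele’s book). The two zeros compare equal, but the two-argument arctangent can tell them apart, which is exactly what you want when approaching a branch cut from above or below.

    import math

    print(0.0 == -0.0)               # True: the two zeros compare equal
    print(math.copysign(1.0, -0.0))  # -1.0: but -0.0 carries a negative sign bit
    print(math.atan2( 0.0, -1.0))    #  3.141592653589793, approaching -1 from above
    print(math.atan2(-0.0, -1.0))    # -3.141592653589793, approaching -1 from below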

The Common Lisp solution

I’ll cut to the chase and present the solution the X3J13 committee came up with. For a discussion of the changes this required and the detailed justifications, see Guy Steele’s book.

The first step is to carefully define the two-argument arctangent function (atan y x) for all 16 combinations of y and x being positive, negative, +0, or -0. Then other functions are defined as follows.

  1. Define phase in terms of atan.
  2. Define complex abs in terms of real sqrt.
  3. Define complex log in terms of phase, complex abs, and real log.
  4. Define complex sqrt in terms of complex log.
  5. Define everything else in terms of the functions above.

The actual implementations may not follow this sequence, but they have to produce results consistent with this sequence.

The phase of z is defined as the arctangent with arguments the imaginary and real parts of z.

The complex log of z is defined as log |z| + i phase(z).

The square root of z is defined as exp(log(z)/2).
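Here is a minimal sketch of that bootstrapping order in Python (my illustration, not the Lisp implementation), assuming math.atan2 plays the role of the carefully defined two-argument arctangent:

    import math

    def phase(z):
        # 1. phase in terms of the two-argument arctangent
        return math.atan2(z.imag, z.real)

    def cabs(z):
        # 2. complex absolute value in terms of the real square root
        return math.sqrt(z.real**2 + z.imag**2)

    def clog(z):
        # 3. complex log in terms of phase, complex abs, and real log
        return complex(math.log(cabs(z)), phase(z))

    def csqrt(z):
        # 4. complex square root as exp(log(z)/2)
        w = clog(z) / 2
        return math.exp(w.real) * complex(math.cos(w.imag), math.sin(w.imag))

    print(csqrt(complex(-1.0,  0.0)))  # approximately  1j: just above the cut
    print(csqrt(complex(-1.0, -0.0)))  # approximately -1j: just below the cut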

The inverses of circular and hyperbolic functions are defined as follows.

\arcsin z = -i \log \left( iz + \sqrt{1 - z^2} \right)
\arccos z = -i \log \left( z + i \sqrt{1 - z^2} \right)
\mbox{arcsinh}\, z = \log \left( z + \sqrt{1 + z^2} \right)
\mbox{arccosh}\, z = 2 \log \left( \sqrt{(z+1)/2} + \sqrt{(z-1)/2} \right)
\mbox{arctanh}\, z = \log \left( (1+z) \sqrt{1/(1 - z^2)} \right)

Note that there are many ways to define these functions that seem to be equivalent, and are equivalent in some region. Getting the branch cuts right is what makes this complicated.


Subjectivity in statistics

Andrew Gelman on subjectivity in statistics:

Bayesian methods are often characterized as “subjective” because the user must choose a prior distribution, that is, a mathematical expression of prior information. The prior distribution requires information and user input, that’s for sure, but I don’t see this as being any more “subjective” than other aspects of a statistical procedure, such as the choice of model for the data (for example, logistic regression) or the choice of which variables to include in a prediction, the choice of which coefficients should vary over time or across situations, the choice of statistical test, and so forth. Indeed, Bayesian methods can in many ways be more “objective” than conventional approaches in that Bayesian inference, with its smoothing and partial pooling, is well adapted to including diverse sources of information and thus can reduce the number of data coding or data exclusion choice points in an analysis.

People worry about prior distributions not because they’re subjective, but because they’re explicitly subjective. There are many other subjective factors, common to both Bayesian and frequentist statistics, but these are implicitly subjective.

In practice, prior distributions often don’t make much difference. For example, you might show that an optimistic prior and a pessimistic prior lead to the same conclusion.
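As a toy illustration of that kind of sensitivity check (my example, not Gelman’s): with a moderate amount of data, an optimistic and a pessimistic Beta prior on a success probability lead to nearly the same posterior mean.

    # Observe 600 successes in 1000 trials; compare posterior means under
    # an optimistic Beta(8, 2) prior and a pessimistic Beta(2, 8) prior.
    successes, trials = 600, 1000

    for a, b in [(8, 2), (2, 8)]:
        posterior_mean = (a + successes) / (a + b + trials)
        print(f"Beta({a},{b}) prior -> posterior mean {posterior_mean:.3f}")

    # Beta(8,2) prior -> posterior mean 0.602
    # Beta(2,8) prior -> posterior mean 0.596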

If you have so little data that the choice of prior does make a substantial difference, being able to specify the prior is a benefit. Suppose you only have a little data but have to make a decision anyway. A frequentist might say there’s too little data for a statistical analysis to be meaningful. So what do you do? Make a decision entirely subjectively! But with a Bayesian approach, you capture what is known outside of the data at hand in the form of a prior distribution, and update the prior with the little data you have. In this scenario, a Bayesian analysis is less subjective and more informed by the data than a frequentist approach.

Most mathematical problem statement

Every so often college students ask me for advice about going into applied math. I tell them that the first step in an application, and often the hardest step, is formulating a problem, not solving it. People don’t approach you with mathematical problems per se but with problems that can be turned into mathematical problems. Nobody is going to hand you a precisely formulated math problem. That only happens on exams.

Saying “nobody” is a bit of hyperbole. It happens, but not often. Most problems come to you in the language of the client’s business, such as “We want to make sure this machine doesn’t run into that machine” or “We’re trying to predict kidney failure.” [1]

Recently Charles McCreary from CRM Engineering sent me the most specific problem statement I’ve ever seen:

I need to calculate the von Mises yield envelope in the axial force-internal/external pressure domain for Lame’s solution …

That really took me by surprise. Sometimes a lead will mention mathematical terminology like “particle filters” or “finite elements,” though even this level of detail is uncommon. I’ve never seen something so specific.

It’s still the case that a lot of work goes into formulating a problem. I’m sure Charles’ client didn’t approach him with this statement. I’m consulting to a consultant who has already done the hard work of reducing the original application down to a purely mathematical problem. (I’d call it “purely mathematical” rather than “pure math” because it’s definitely applied math.) I look forward to digging into it as soon as I wrap up what I’m currently working on.

* * *

[1] Nobody has come to me wanting to predict kidney failure, but I’ve worked on predicting several other maladies that I’ll not mention to maintain confidentiality.

An integral with a couple lessons

If you present calculus students with a definite integral, their Pavlovian response is “Take the anti-derivative, evaluate it at the limits, and subtract.” They think that’s what it means. But it’s not what a definite integral means. It’s how you (usually) calculate its value. This is not a pedantic fine point but a practically important distinction. It pays to distinguish what something means from how you usually calculate it. Without this distinction, things that are possible may seem impossible. [1]

For example, suppose you want to compute the following integral that comes up frequently in probability.

\int_{-\infty}^\infty e^{-x^2}\, dx

There is no (elementary) function whose derivative is exp(-x²). It’s not just hard to find or ugly. It simply doesn’t exist, not within the universe of elementary functions. There are functions whose derivative is exp(-x²), but these functions are not finite algebraic combinations of the kinds of functions you’d see in high school.

If you think of the definite integral above as meaning “the result you get when you find an antiderivative, let its arguments go off to ∞ and -∞, and subtract the two limits” then you’ll never calculate it. And when you hear that the antiderivative doesn’t exist (in the world of functions you’re familiar with) then you might think that not only can you not calculate the integral, no one can.

In fact the integral is easy to calculate. It requires an ingenious trick [2], but once you see that trick it’s not hard.

Let I be the value of the integral. Changing the integration variable makes no difference, i.e.

I = \int_{-\infty}^\infty e^{-x^2}\, dx = \int_{-\infty}^\infty e^{-y^2}\, dy

and so

I^2 = \left(\int_{-\infty}^\infty e^{-x^2}\, dx\right) \left( \int_{-\infty}^\infty e^{-y^2}\, dy\right) = \int_{-\infty}^\infty\!\int_{-\infty}^\infty e^{-x^2 - y^2} \, dx\, dy

This integral can be converted to polar coordinates. Instead of describing the plane as an infinite square with x and y each going off to infinity in both directions, we can think of it as an infinite disk, with radius going off to infinity. The advantage of this approach is that the Jacobian of the change of variables gives us an extra factor of r that makes the exponential integral tractable.

\int_0^{2\pi} \! \int_0^\infty e^{-r^2} r \, dr\, d\theta = \frac{1}{2} \int_0^{2\pi} 1\, d\theta = \pi

From this we get I² = π and so I = √π.
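As a quick numerical sanity check (not part of the argument above), you can confirm the value with SciPy’s quadrature routine:

    import numpy as np
    from scipy import integrate

    # Numerically integrate exp(-x^2) over the whole real line
    value, error = integrate.quad(lambda x: np.exp(-x**2), -np.inf, np.inf)

    print(value)           # approximately 1.7724538509
    print(np.sqrt(np.pi))  # 1.7724538509055159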

This specific trick comes up occasionally. But more generally, it is often the case that definite integrals are easier to compute than indefinite integrals. One of the most common applications of complex analysis is computing such integrals through the magic of contour integration. This leads to a lesson closely related to the one above, namely that you may not have to do what it looks like you need to do. In this case, you don’t always need to compute indefinite integrals (anti-derivatives) as an intermediate step to compute definite integrals. [3]

Mathematics is filled with theorems that effectively say that you don’t actually have to compute what you conceptually need to compute. Sometimes you can get by with calculating much less.

* * *

[1] One frustration I’ve had working with statisticians is that many have forgotten the distinction between what they want to calculate and how they calculate it. This makes it difficult to suggest better ways of computing things.

[2] Lord Kelvin said of this trick “A mathematician is one to whom that is as obvious as that twice two makes four is to you. Liouville was a mathematician.”

[3] If you look back carefully, we had to compute the integral of exp(-r²) r, which you would do by first computing its anti-derivative. But we didn’t have to compute the anti-derivative of the original integrand. We traded a hard (in some sense impossible) anti-derivative problem for an easy one.

How a couple failed auditions worked out well

When I was in high school, one year I made the Region choir. I had no intention of competing at the next level, Area, because I didn’t think I stood a chance of going all the way to State, and because the music was really hard: Stravinsky’s Symphony of Psalms.

My choir director persuaded me to try anyway, with just a few days before auditions. That wasn’t enough time for me to learn the music with all its strange intervals. But I tried out. I sang the whole thing. As awful as it was, I kept going. It was about as terrible as it could be, just good enough to not be funny. I wanted to walk out, and maybe I should have out of compassion for the judges, but I stuck it out.

I was proud of that audition, not as a musical achievement, but because I powered through something humiliating.

I did better in band than in choir. I made Area in band and tried out for State but didn’t make it. I worked hard for that one and did a fair job, but simply wasn’t good enough.

That turned out well. It was my senior year, and I was debating whether to major in math or music. I’d told myself that if I made State, I’d major in music. I didn’t make State, so I majored in math and took a few music classes for fun. We can never know how alternative paths would have worked out, but it’s hard to imagine that I would have succeeded as a musician. I didn’t have the talent or the temperament for it.

When I was in college I wondered whether I should have done something like acoustical engineering as a sort of compromise between math and music.  I could imagine that working out. Years later I got a chance to do some work in acoustics and enjoyed it, but I’m glad I made a career of math. Applied math has given me the chance to work in a lot of different areas—to play in everyone else’s back yard, as John Tukey put it—and I believe it suits me better than music or acoustics would have.

Setting up Emacs shell on a Mac

Here are a few things I’ve had to figure out in the process of setting up Emacs on a Mac, in particular with getting shell-mode to work as I’d like. Maybe this will save someone else some time if they want to do the same.

I’ve used a Mac occasionally since the days of the beige toasters, but I never owned one until recently. I’ve said for years that I’d buy a Mac as soon as I have a justification, and I recently started a project that needs a Mac.

I’d heard that Emacs was hard to set up on Mac, but that has not been my experience. I’m running Emacs 25.1 on macOS 10.12.1. Maybe there were problems with earlier versions of Emacs or OS X that I skipped. Or maybe there are quirks I haven’t run into yet. So far my only difficulties have been related to running a shell inside Emacs.

Path differences

The first problem I ran into is that my path is not the same inside shell-mode as in a terminal window. A little searching showed a lot of discussion of this problem but no good solutions. My current solution is to run source .bash_profile from my bash shell inside Emacs to manually force it to read the configuration file. There’s probably a way to avoid this, and if you know how please tell me, but this works OK for now.

Manually sourcing the .bash_profile file works for bash but doesn’t work for Eshell. I doubt I’ll have much use for Eshell, however. It’s more useful on Windows when you want a Unix-like shell inside Emacs.

Update: Dan Schmidt pointed out in the comments that Emacs reads .bashrc rather than .bash_profile. It seems that Mac doesn’t read .bashrc at all, at least not if it can find a .bash_profile file. I created a .bashrc file that sources .bash_profile and that fixed my problem, though it did not fix the problem with Eshell or the path problem below.

Scrolling command history

The second problem I had was that Control-up arrow does not scroll through shell history because that key combination has special meaning to the operating system, bringing up Mission Control. Quite a surprise when you expect to scroll through previous commands but instead your entire screen changes.

I got around this by putting the following code in my Emacs config file and using Alt-up and Alt-down instead of Control-up and Control-down to scroll shell history. (I’m using my beloved Microsoft Natural keyboard, so I have an Alt key.)

(add-hook 'shell-mode-hook
  (lambda ()
    (define-key shell-mode-map (kbd "<M-up>") 'comint-previous-input)
    (define-key shell-mode-map (kbd "<M-down>") 'comint-next-input)
  )
)

Another path problem

The last problem I had was running the Clojure REPL inside Emacs. When I ran lein repl from bash inside Emacs I got an error saying the command was not found. Apparently running source .bash_profile didn’t give me entirely the same path in Emacs as in a terminal. I was able to fix the problem by adding the following to my Emacs config file.

(add-to-list 'exec-path "/usr/local/bin")

This works, though there are a couple things I don’t understand. First, I don’t understand why /usr/local/bin was missing from my path inside Emacs. Second, I don’t understand why adding the path customizations from my .bash_profile to exec-path doesn’t work. Until I need to understand this, I’m willing to let it remain a mystery.

Update: LaTeX path problem

After fixing the problems mentioned in the original post, I ran into another problem. Trying to run LaTeX on a file failed saying that pdflatex couldn’t be found. Adding the path to pdflatex to the exec-path didn’t work. But the following code from the TeX Stack Exchange did work:

(getenv "PATH")
(setenv "PATH" (concat "/Library/TeX/texbin" ":" (getenv "PATH")))

This is the path for El Capitan and Sierra. The path is different in earlier versions of the OS.

Portable Emacs config file

By the way, you can use one configuration file across operating systems by putting code like this in your file.

(cond
    ((string-equal system-type "windows-nt") 
        (progn
            ; Windows-specific configurations
            ...
        )
    )
    ((string-equal system-type "gnu/linux")
        (progn
            ; Linux-specific configurations
            ...
        )
    )
    ((string-equal system-type "darwin")
        (progn
            ; Mac-specific configurations
            ...
        )
    )
)

If you need machine-specific configuration for two machines running the same OS, you can test system-name rather than system-type.

Some frequently asked questions

I don’t have an FAQ page per se, but I’ve written a few blog posts where I answer some questions, and here I’ll answer a few more.

Should I get a PhD?

See my answer here and take a look at some of the other answers on the same site.

Do you have any advice for people going out on their own?

Yes. See my post Advice for going solo.

Shortly after I went out on my own, I wrote this post responding to questions people had about my particular situation. My answers there remain valid, except one. I said that I planned to do anything I could do well that also paid well. That was true at the time, but I’ve gotten a little more selective since then.

Can you say more about the work you’ve been doing?

Only in general terms. For example, I did some work with psychoacoustics earlier this year, and lately I’ve been working with medical device startups and giving expert testimony.

Nearly all the work I do is covered under NDA (non-disclosure agreement). Occasionally a project will be public, such as the white paper I wrote for Hitachi Data Systems comparing replication and erasure coding. But usually a project is confidential, though I hope to be able to say more about some projects after they come to market.

Miscellaneous other questions

I wrote an FAQ post of sorts a few years ago. Here are the questions from that post that people still ask fairly often.

Any more questions?

You can use this page to send me a question and see my various contact information. The page also has a link to a vCard you could import into your contact manager.

Longhorn tribute to fallen Aggies

For many years, rivals University of Texas and Texas A&M University played each other in football on Thanksgiving. In 1999, the game fell one week after the collapse of the Aggie Bonfire killed 12 A&M students and injured 27.

The University of Texas band’s half time show that year was a beautiful tribute to the fallen A&M students.

A different kind of network book

Yesterday I got a review copy of The Power of Networks. There’s some math inside, but not much, and what’s there is elementary.

I’d say it’s not a book about networks per se but a collection of topics associated with networks: cell phone protocols, search engines, auctions, recommendation engines, etc. It would be a good introduction for non-technical people who are curious about how these things work. More technically inclined folks probably already know much of what’s here.

Hard work

The pinned tweet on my Twitter account at the moment says “Productivity tip: work hard.” It’s gotten a lot of positive feedback, so I assume it has resonated with a few people.

I don’t know how people take it, but here’s what I meant by it. Sometimes you can find a smarter way to work, and if you can, I assume you’re doing that. Don’t drive nails with your shoe if you can find a hammer. But ultimately the way to get things done is hard work. You might see some marginal increase in productivity from using some app or another, but there’s nothing that’s going to magically make you 10x more productive without extra effort.

Many people have replied on Twitter “I think you mean ‘work smart.'” At some point “work smarter” wasn’t a cliché, but now it is. The problem of our time isn’t people brute-forcing their way with hard, thoughtless work. We’re more likely to wish for a silver bullet. We’re gnostics.

Smart work is a kind of hard work. It may take less physical work but more mental work. Or less mental work and more emotional work. It’s hard work to try to find a new perspective and take risks.

One last thought: hard work is not necessarily long work. Sometimes it is, but often not. Hard creative work requires bursts of mental or emotional effort that cannot be sustained for long.

Ultra-reliable software

From a NASA page advocating formal methods:

We are very good at building complex software systems that work 95% of the time. But we do not know how to build complex software systems that are ultra-reliably safe (i.e. P_f < 10^-7/hour).

Emphasis added.

Developing medium-reliability and high-reliability software are almost entirely different professions. Using typical software development procedures on systems that must be ultra-reliable would invite disaster. But using extremely cautious development methods on systems that can afford to fail relatively often would be an economic disaster.
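To put that failure-rate target in perspective, here’s a quick back-of-the-envelope calculation (mine, not NASA’s), reading P_f < 10^-7/hour as a bound on the failure rate:

    hours_per_year = 24 * 365
    mean_hours_between_failures = 1 / 1e-7  # 10 million hours

    print(mean_hours_between_failures / hours_per_year)  # roughly 1141 years

In other words, such a system is expected to run for more than a thousand years of continuous operation between failures.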

Related post: Formal validation methods let you explore the corners


Technological allegiances

I used to wonder why people “convert” from one technology to another. For example, someone might convert from Windows to Linux and put a penguin sticker on their car. Or they might move from Java to Ruby and feel obligated to talk about how terrible Java is. They don’t add a new technology, they switch from one to the other. In the words of Stephen Sondheim, “Is it always ‘or’? Is it never ‘and’?”

Rivalries seem sillier to outsiders the more similar the two options are. And yet this makes sense. I’ve forgotten the psychological term for it, but the phenomenon has a name: similar things compete for space in your brain more than dissimilar things do. For example, studying French can make it harder to spell English words. (Does literature have two t’s in French and one in English or is it the other way around?) But studying Chinese doesn’t impair English orthography.

It’s been said that academic politics are so vicious because the stakes are so small [1]. Similarly, there are fierce technological loyalties because the differences with competing technologies are so small, small enough to cause confusion. My favorite example: I can’t keep straight which languages use else if, elif, elseif, … in branching.

If you have to learn two similar technologies, it may be easier to devote yourself exclusively to one, then to the other, then use both and learn to keep them straight.

Related post: Ford-Chevy arguments in technology

[1] I first heard this attributed to Henry Kissinger, but there’s no agreement on who first said it. Several people have said similar things.