New Twitter account for functional programming and categories

I’m starting a new Twitter account @FunctorFact for functional programming and category theory.

These two subjects have a lot of overlap, and some tweets will combine both, but many will be strictly about one or the other. So some content will be strictly about programming, some strictly about math, and some combining ideas from both.

FunctorFact icon

ASCII art diagrams in Emacs org-mode

Yesterday I wrote about ASCII art diagrams and gave four reasons you might want to use this ancient approach to creating simple diagrams:

  • It could be quicker than creating a graphical image .
  • You can paste them into plain text documents like source code files.
  • They can be produced programmatically.
  • There is software to turn ASCII art into more polished images.

Today I’ll post a few notes about how to create graphical versions of ASCII diagrams in Emacs with org-mode.

Running code inside org-mode

You can embed and execute source code in org-mode files. I wrote a couple posts about this, one showing how to run Python and R inside org-mode and another showing how to mix languages in org-mode. The latter shows Perl calling R and Python, all in 14 lines of code.

There are currently 39 programming languages that org-mode can call by default. In addition to conventional programming languages like the ones listed above, org-mode also supports ditaa, the language that treats ASCII art as a specification language to produce graphics.

You can edit code blocks just as you would other text in an org-mode file. But if you’d like to edit a code block in its language’s mode, type C-c ' from inside the code block. If you’re editing Python code, for example, this will open a new window, in Python mode, containing just that code block. If you type C-c ' inside a ditaa code block, Emacs opens a new window in “artist mode,” a mode specifically for editing ASCII art.

You can run code inside org-mode two ways: interactively and for export. With your cursor inside a block of code, type C-c C-c to execute the code and report the results. You can also export an entire org-mode document and have the results of code execution embedded in the final document. This works much the same as “*weave” projects like Sweave, Pweave, and Hweave. But while each of these is specific to a particular programming language (R, Python, and Haskell respectively), org-mode works with dozens of languages, including ditaa.

Running ditaa inside org-mode

You embed ditaa code just like you’d embed any other code. In the first post mentioned above, I gave this example of calling R:

    #+begin_src R
    sqrt(42)
    #+end_src

Here’s the analogous code for ditaa:

    #+BEGIN_SRC ditaa :file foo.png
    +-------+
    | Hello |
    +-------+
    #+END_SRC

The markers to begin and end a source code segment can be upper or lower case. I used lower case in my previous post, but it may be more readable to use upper case so that the markers stand out better from their arguments.

The R example above didn’t use any header arguments, though it could have. With ditaa, you must provide a header argument: the name of the file to save the graphics in.

If you run the ditaa code by navigating inside the code block and running C-c C-c, Emacs will add a couple lines after the code block:

    #+RESULTS:
    [[file:foo.png]]

This is the literal text, what you’d see if you opened the file in another editor. But org-mode uses the double brackets for links. You wouldn’t see the double brackets. Instead you’d see a hyperlink with text file:foo.png. Clicking on that link opens the image.

Hello image produce by ditaa

You can export the org-mode file with the command C-c C-e. This brings up a menu of export options: h for HTML, l for LaTeX, etc. Within each of these are further options. For example, you could type l for LaTeX and then type l again to select export to LaTeX or type p to have have org-mode run LaTeX on the file and produce a PDF. If you know you want a PDF, you could do this all in one command: C-c C-l l p.

You can control whether org-mode exports code or the results of running code (or both or neither). Org-mode exports the results of ditaa code, i.e. graphics files, by default. This makes sense: your PDF file will have a nice image version of your diagram, not the ASCII art used as its source code.

Configuration and troubleshooting

By default, the only programming language you can run inside org-mode is Emacs Lisp. This makes sense. You want to be deliberate about what code you run, but if you don’t want to run Emacs Lisp you’d better not run Emacs!

Inside your Emacs configuration file, you specify what languages org-mode is allowed to run. Here’s an example allowing Python and ditaa:

    
    (org-babel-do-load-languages 'org-babel-load-languages '(
        (python . t) 
        (ditaa . t))
    )

Recent versions of Emacs are supposed to ship with ditaa, but the ditaa jar file was missing from the two computers I experimented with. Emacs complained

    org-babel-execute:ditaa: Could not find ditaa.jar at ...

and so I copied ditaa.jar to the place where Emacs said it was looking for it. That worked, but it’s kind of a kludge because now I had two copies of the jar file. A better solution is to tell Emacs where you already have ditaa.

I like to use the exact same init.el file on every machine I use and so I added a section where I put OS-specific and machine-specific configuration. Here I put a different path to ditaa for each OS.

    ;;----------------------------------------------------------------------
    ;; OS specific
    
    (cond
        ((string-equal system-type "windows-nt") ; Microsoft Windows
    
            (progn
                (setq-default ispell-program-name "C:/bin/Aspell/bin/aspell.exe") 
                # etc.
                (setq org-ditaa-jar-path "c:/bin/ditaa/ditaa.jar")
            )
        )
        ((string-equal system-type "gnu/linux") ; Linux
            (progn
                (setq x-select-enable-clipboard t)
                 # etc.
                (setq org-ditaa-jar-path "/usr/bin/ditaa")
            )
        )
    )

ASCII art diagrams

“Technology is additive.” — Kevin Kelly

Old technologies never die. Instead, their range of application shrinks. Or maybe it grows when conditions change.

ASCII art, drawing pictures with fixed-width plain text characters, is no longer how many people want to produce diagrams. Just fire up Adobe Illustrator and you get incomparably more resolution of expression.

And yet there are times when ASCII art comes in handy. You can, for example, paste it into source code files. Someone more familiar with Emacs than Illustrator may be able to produce a simple diagram in the former faster than the latter. And it can be relatively easy to programmatically produce a large number of ASCII art diagrams, depending on the nature of the diagrams.

It’s also possible to use ASCII art as a way to specify nicely rendered images. I’ll show how to do that with ditaa below.

Here’s an ASCII version of the conjugate prior diagram I made some time ago:

 

                          +-------------+
                          |             |
                          | Exponential |
                          |             |
                          +-------------+
                                 |
                          lambda |                                                          
                                 v
+-------------+           +-------------+           +-------------+
|             |   tau     |             |   lambda  |             |
|  Lognormal  |---------->|    Gamma    |<----------|   Poisson   |
|             |           |             |---+       |             |
+-------------+           +-------------+   |       +-------------+
      |                          ^   ^      | beta
      |                          |   |      |
      |                          |   +------+
      |                      tau |           
      |                          |           
      |                   +-------------+   
      |        mu         |             |   
      +------------------>|    Normal   |
                          |             |----+
                          +-------------+    | 
                                     ^       | mu
                                     |       |
                                     +-------+

And here’s the image produced by ditaa processing the ASCII diagram above:

Conjugate prior diagram produced by ditaa

Update: See my next post on how to create ASCII art diagrams and their graphic version from ditaa using Emacs org mode.

Update: When I first made the diagram above, I tried using Greek letters, e.g. using β rather than “beta,” but this didn’t work. I thought “OK, I suppose it’s not really ASCII art if you use non-ASCII characters.” But someone told me Unicode characters worked for him, so I tried again when I wrote the follow up post and it worked.

My first attempt, from a Windows laptop, calling ditaa from the command line, did not work. My second attempt, running inside org-mode from a Windows desktop, did work. My third attempt, running from Emacs on Linux also worked.

ditaa diagram using Unicode symbols

Insertion sort as a fold

I’ve written several posts lately about various algorithms that can be expressed as functional folds:

These have all been numerical algorithms. Insertion sort is an example of a non-numerical algorithm that could be implemented as a fold.

Insertion sort is not the fastest sorting algorithm. It takes O(n2) operations to sort a list of size n while the fastest algorithms take O(n log n) operations. But insertion sort is simple, and as it works its way down a list, the portion it has processed is sorted. If you interrupt it, you have correct results given the input so far. If you interrupt something like a quicksort, you don’t have a sorted list. If you’re receiving a stream of data points and want to sort them as they come, you have to use insertion sort rather than something like quicksort.

The function to fold over a list looks like this:

    f as a = [x | x <- as, x < a] ++ [a] ++ [x | x <- as, x >= a]

Given a sorted list as and a value a, return a new sorted list that has inserted the element a in the proper position. Our function f does this by joining together the list of the elements less than a, the list containing only a, and the list of elements at least as big as a.

Here’s how we could use this to alphabetize the letters in the word “example.”

    foldl f [] "example"

This returns "aeelmpx".

Haskell takes our function f and an empty list of characters [] and returns a new list of characters by folding f over the list of characters making up the string "example".

You can always watch how foldl works by replacing it with scanl to see intermediate results.

    scanl f [] "example"

returns

    ["", "e", "ex", "aex", "aemx", "aempx", "aelmpx", "aeelmpx"]

Computing higher moments with a fold

Folds in functional programming are often introduced as a way to find the sum or product of items in a list. In this case the fold state has the same type as the list items. But more generally the fold state could have a different type, and this allows more interesting applications of folds. Previous posts look at using folds to update conjugate Bayesian models and numerically solve differential equations.

This post uses a fold to compute mean, variance, skewness, and kurtosis. See this earlier post for an object-oriented approach. The code below seems cryptic out of context. The object-oriented post gives references for where these algorithms are developed. The important point for this post is that we can compute mean, variance, skewness, and kurtosis all in one pass through the data even though textbook definitions appear to require at least two passes. It’s also worth noting that the functional version is less than half as much code as the object-oriented version.

(Algorithms that work in one pass through a stream of data, updating for each new input, are sometimes called “online” algorithms. This is unfortunate now that “online” has come to mean something else.)

The Haskell function moments below returns the number of samples and the mean, but does not directly return variance, skewness and kurtosis. Instead it returns moments from which these statistics can easily be calculated using the mvks function.

    moments (n, m1, m2, m3, m4) x = (n', m1', m2', m3', m4')
        where
            n' = n + 1
            delta = x - m1
            delta_n = delta / n'
            delta_n2 = delta_n**2
            t = delta*delta_n*n
            m1' = m1 + delta_n
            m4' = m4 + t*delta_n2*(n'*n' - 3*n' + 3) + 6*delta_n2*m2 - 4*delta_n*m3
            m3' = m3 + t*delta_n*(n' - 2) - 3*delta_n*m2
            m2' = m2 + t

    mvsk (n, m1, m2, m3, m4) = (m1, m2/(n-1.0), (sqrt n)*m3/m2**1.5, n*m4/m2**2 - 3.0)                         

Here’s an example of how you would use this Haskell code to compute statistics for the list [2, 30, 51, 72]:

    ghci>  mvsk $ foldl moments (0,0,0,0,0) [2, 30, 51, 72]
    (38.75, 894.25,-0.1685, -1.2912)

The foldl applies moments first to its initial value, the 5-tuple of zeros. Then it iterates over the data, taking data points one at a time and visiting each point only once, returning a new state from moments each time. Another way to say this is that after processing each data point, moments returns the 5-tuple that it would have returned if that data only consisted of the values up to that point.

For a non-numerical example of folds, see my post on sorting.

Demystifying artificial intelligence

animated face

Computers do what we tell them to do. Period. Any talk of computers doing things they weren’t programmed to do is only a way of speaking. It’s a convenient shorthand when used properly, misleading mysticism when used improperly.

When you write a program

        print(24*7)

you could say that the computer isn’t programmed to print the number 168 in the sense that the code did not say

        print(168)

But of course the computer was programmed to print the number 168. It just wasn’t directly programmed to do so. Instead, it was given data and algorithms to apply to that data to produce the result. The program isn’t very useful because the degree of indirection is tiny. Artificial intelligence is more interesting because it increases the degree of indirection, but it’s still software instructing a computer to take in data and apply algorithms.

When someone says

A computer did this without being programmed!

mentally edit their statement to say

A computer did this without being directly programmed to do so!

The latter may still be impressive, but it’s not magic.

I was at a presentation once where software vendors were claiming that their software “discovered” the equation of motion for a pendulum. The software wasn’t directly programmed to do this, but it was programmed to read in data and find the best fit to the data from a set of basis functions which included sines and cosines. And it correctly found that a linear combination of sines and cosines best described the pendulum’s motion.

The pendulum example was not that impressive, though some applications of artificial intelligence truly are impressive, delivering results several layers of abstraction away from what the software was directly programmed to do. If there’s anything mysterious involved, it’s the statistical regularity of the world that allowed the software to make correct inference from the data it was given.

Technical memento mori

The Latin phrase memento mori means “remember that you must die.” It has been adopted into English to refer to an object that serves as a reminder of death, especially a skull. This is a common theme in art, such as Albrecht Dürer’s engraving St. Jerome in His Study.

Saint Jerome in His Study (Dürer)

I keep a copy of the book Inside OLE as a sort of technological memento mori.

Inside OLE by Kraig Brockschmidt

At some point in the 1990’s I thought OLE was the way of the future. My professional development as a programmer would be complete once I mastered OLE, and this book was the way to get there.

Most of you probably have no idea what OLE is, which is kinda my point. It stands for Object Linking and Embedding, a framework that began at Microsoft in 1990. It was a brilliant solution to the problems it was designed to address, given the limitations of its time. Today it seems quaint. Now I’m doubly removed from OLE: hardly any Windows software developers think about OLE, and I hardly think about Windows development. The thing I was anxious to understand deeply was irrelevant to me a few years later.

Despite my one-time infatuation with OLE, throughout my career I have mostly focused on things that will last. In particular, I’ve focused more on mathematics than technology because the former has a longer shelf life. As I quipped on Twitter one time, technology has the shelf life of bread, but mathematics has the shelf life of honey. Still, man does not live by honey alone. We need bread too, even if it only lasts a day.

Related post: Remembering COM

Frequently rediscovered technologies

Greenspun’s tenth rule of programming says

Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

Here I’m going to take seriously a rule that was only not entirely serious. It’s saying three things about Lisp.

  1. It’s a frequently rediscovered technology. There’s something inevitable about it.
  2. It’s not completely widely known. Not everyone knows about it, so they don’t know that they’re reinventing it.
  3. It’s not easy to implement, hence all the poor implementations.

The same could be said of state machines. A number of projects have grown until they contained an ad hoc, informally-specified, bug-ridden, slow implementation of state machines.

What are other ideas like Lisp and state machines that are frequently and poorly reinvented?

Searching files on Windows

Searching files on Windows is a pain. The built-in search features don’t find everything. There may be ways to make them work, but I haven’t persisted long enough to make them work.

On Linux, the combination of find, xargs, and grep works well, and sometimes it works on Windows using the GOW or GnuWin port of these tools. Again there may be a way to make the ported utilities work more as expected, though I haven’t found it. I suspect the problem isn’t with the tools per se but their interaction with the command line. I also tried Emacs features like rgrep, but these features use the ported find and grep utilities, and so you run into the same problems with Emacs as you do running them directly and more.

Ack logo

It looks like ack is the way to go. I heard about it a long time ago and kept meaning to try it out. Now I finally did. It’s fast, convenient, etc. But here are the two things I most like about it:

  1. Ack works the same across platforms.
  2. Ack uses Perl regular expression syntax.

While the alternatives above are supposed to work the same across platforms, they don’t in my experience. But ack does because it’s a pure Perl program. All the portability has been delegated to Perl, where it is well handled. I imagine once I become more familiar with ack I’ll prefer it on Linux as well.

Because it’s a Perl program, ack uses Perl regex syntax. Perl has the most powerful regex implementation out there, though I seldom need any features unique to Perl. More important for me is that Perl regular expression dialect is the one I remember most easily.

Related posts:

For daily tips on regular expressions, follow @RegexTip on Twitter.

Regex tip icon

Regular expression to match any chemical element

Here’s a frivolous exercise in regular expressions: Write a regex to match any chemical element symbol.

Here’s one solution.

A[cglmrstu]|B[aehikr]?|C[adeflmnorsu]?|D[bsy]|E[rsu]|F[elmr]?|G[ade]|H[efgos]?|I[nr]?|Kr?|L[airuv]|M[dgnot]|N[abdeiop]?|Os?|P[abdmortu]?|R[abefghnu]|S[bcegimnr]?|T[abcehilm]|U(u[opst])?|V|W|Xe|Yb?|Z[nr]

Making it more readable

Here’s the same expression in more readable form:

/
A[cglmrstu]     | 
B[aehikr]?      | 
C[adeflmnorsu]? | 
D[bsy]          | 
E[rsu]          | 
F[elmr]?        | 
G[ade]          | 
H[efgos]?       | 
I[nr]?          | 
Kr?             | 
L[airuv]        | 
M[dgnot]        | 
N[abdeiop]?     | 
Os?             | 
P[abdmortu]?    | 
R[abefghnu]     | 
S[bcegimnr]?    | 
T[abcehilm]     | 
U(u[opst])?     | 
V               | 
W               | 
Xe              | 
Yb?             | 
Z[nr]
/x

The /x option in Perl says to ignore white space. Other regular expression implementations have something similar. Python has two such options, X for similarity with Perl, and VERBOSE for readability. Both have the same behavior.

Regex syntax

The regular expression says that a chemical element symbol may start with A, followed by c, g, l, m, r, s, t, or u; or a B, optionally followed by a, e, h, i, k, or r; or …

The most complicated part of the regex is the part for symbols starting with U. There’s Uranium whose symbols is simply U, and there are the elements who have temporary names based on their atomic numbers: Ununtrium, Ununpentium, Ununseptium, and Ununoctium. These are just Latin for one-one-three, one-one-five, one-one-seven, and one-one-eight. The symbols are U, Uut, Uup, Uus, and Uuo. The regex U(u[opst])? can be read “U, optionally followed by u and one of o, p, s, or t.”

Note that the regex will match any string that contains a chemical element symbol, but it could match more. For example, it would match “I’ve never been to Boston in the fall” because that string contains B, the symbol for boron. Exercise: Modify the regex to only match chemical element symbols.

Regex golf

There may be clever ways to use fewer characters at the cost of being more obfuscated. But this is for fun anyway, so we’ll indulge in a little regex golf.

There are five elements whose symbols start with I or Z: I, In, Ir, Zn, and Zr. You could write [IZ][nr] to match four of these. The regex I|[IZ][nr] would represent all five with 10 characters, while I[nr]?|Z[nr] uses 12. Two characters saved! Can you cut out any more?

Regex resources

Notes on regular expressions in Python, PowerShell, R, Mathematica, and C++

For daily tips on regular expressions, follow @RegexTip on Twitter.

Regex tip icon

Algorithms vs Moore’s Law

I saw an impressive chart once of how numerical linear algebra algorithm efficiency have improved over time. I haven’t been able to find that chart, but here is something similar. Thanks to Naveen Palli for pointing this out.

Even more remarkable [than Moore’s law] — and even less widely understood — is that in many areas, performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed.

… Here is just one example … Grötschel, an expert in optimization, observes that a benchmark production planning model solved using linear programming would have taken 82 years to solve in 1988, using the computers and the linear programming algorithms of the day. Fifteen years later — in 2003 — this same model could be solved in roughly 1 minute, an improvement by a factor of roughly 43 million. Of this, a factor of roughly 1,000 was due to increased processor speed, whereas a factor of roughly 43,000 was due to improvements in algorithms! Grötschel also cites an algorithmic improvement of roughly 30,000 for mixed integer programming between 1991 and 2008.

From a federal report with too long a title, page 71.

As I mentioned in my previous post, many of the speed-ups in optimization are the result of speed-ups in numerical linear algebra, because like so much other applied math, it all boils down to linear algebra.

Click to find out more about consulting for numerical computing

 

The Fast Fourier Transform (FFT) and big data

The most direct way to compute a Fourier transform numerically takes O(n2) operations. The Fast Fourier Transform (FFT) can compute the same result in O(n log n) operations. If n is large, this can be a huge improvement.

James Cooley and John Tukey (re)discovered the FFT in 1965. It was thought to be an original discovery at the time. Only later did someone find a sketch of the algorithm in the papers of Gauss.

Daniel Rockmore wrote the article on the Fast Fourier Transform in The Princeton Companion to Applied Mathematics. He says

In this new world of 1960s “big data,” a clever reduction in computational complexity (a term not yet widely in use) could make a tremendous difference.

Rockmore adds a very interesting footnote to the sentence above:

Many years later Cooley told me that he believed that the Fast Fourier transform could be thought of as one of the inspirations for asymptotic algorithmic analysis and the study of computational complexity, as previous to the publication of his paper with Tukey very few people had considered data sets large enough to suggest the utility of an asymptotic analysis.

Click to learn more about consulting help with signal processing

 

Related posts:

Algorithmic wizardry

Last week I wrote a short commentary on James Hague’s blog post Organization skills beat algorithmic wizardry. This week that post got more traffic than my server could handle. I believe it struck a chord with experienced software developers who know that the challenges they face now are not like the challenges they prepared for in school.

Although I completely agree that “algorithmic wizardry” is over-rated in general, my personal experience has been a little different. My role on projects has frequently been to supply a little bit of algorithmic wizardry. I’ve often been asked to look into a program that is taking too long to run and been able to speed it up by an order of magnitude or two by improving a numerical algorithm. (See an example here.)

James Hague says that “rarely is there some … algorithm that casts a looming shadow over everything else.” I believe he is right, though I’ve been called into projects precisely on those rare occasions when an algorithm does cast a shadow over everything else.

Unix-like shells on Windows

This post gives some notes on ways to create a Unix-like command line experience on Windows, without using a virtual machine like VMWare or a quasi-virtual machine like Cygwin.

Finding Windows ports of Unix utilities is easy. The harder part is finding a shell that behaves as expected. (Of course “as expected” depends on your expectations!)

There have been many projects to port Unix utilities to Windows, particularly GnuWin32 and Gow. Some of the command shells I’ve tried are:

  • Cmd
  • PowerShell
  • Eshell
  • Bash
  • Clink

I’d recommend the combination of Gow and Clink for most people. If you’re an Emacs power user you might like Eshell.

Cmd

The built-in command line on Windows is cmd. It’s sometimes called the “DOS prompt” though that’s misleading. DOS died two decades ago and the cmd shell has improved quite a bit since then.

cmd has some features you might not expect, such as pushd and popd. However, I don’t believe it has anything analogous to dirs to let you see the directory stack.

PowerShell

PowerShell is a very sophisticated scripting environment, but the interactive shell itself (e.g. command editing functionality) is basically cmd. (I haven’t kept up with PowerShell and that may have changed.) This means that writing a PowerShell script is completely different from writing a batch file, but the experience of navigating the command line is essentially the same as cmd.

Eshell

You can run shells inside Emacs. By default, M-x shell brings up a cmd prompt inside an Emacs buffer. You can also use Emacs’ own shell with the command M-x eshell.

Eshell is a shell implemented in Emacs Lisp. Using Eshell is very similar across platforms. On a fresh Windows machine, with nothing like Gow installed, Eshell provides some of the most common Unix utilities. You can use the which command to see whether you’re using a native executable or Emacs Lisp code. For example, if you type which ls into Eshell, you get the response

    eshell/ls is a compiled Lisp function in `em-ls.el'

The primary benefit of Eshell is that provides integration with Emacs. As the documentation says

Eshell is not a replacement for system shells such as bash or zsh. Use Eshell when you want to move text between Emacs and external processes …

Eshell does not provide some of the command editing features you might expect from bash. But the reason for this is clear: if you’re inside Emacs, you’d want to use the full power of Emacs editing, not the stripped-down editing features of a command line. For example, you cannot use ^foo^bar to replace foo with bar in the previous command. Instead, you could retrieve the previous command and edit it just as you edit any other line inside Emacs.

In bash you can use !^ to recall the first argument of the previous command and !$!$ using $_ instead. Many of the other bash shortcuts that begin with ! work as expected: !foo, !!, !-3, etc. Directory navigation commands like cd -, pushd, and popd work as in bash.

Bash

Gow comes with a bash shell, a Windows command line program that creates a bash-like environment. I haven’t had much experience with it, but it seems to be a faithful bash implementation with few compromises for Windows, for better and for worse. For example, it doesn’t understand backslashes as directory separators.

There are other implementations of bash on Windows, but I either haven’t tried them (e.g. win-bash) or have had bad experience with them (e.g. Cygwin).

Clink

Clink is not a shell per se but an extension to cmd. It adds the functionality of the Gnu readline library to the Windows command line and so you can use all the Emacs-like editing commands that you can with bash: Control-a to move to the beginning of a line, Control-k to delete the rest of a line, etc.

Clink also gives you Windows-like behavior that Windows itself doesn’t provide, such as being able to paste text onto the command line with Control-v.

I’ve heard that Clink will work with PowerShell, but I was not able to make it work.

The command editing and history shortcuts beginning with ! mentioned above all work with Clink, as do substitutions like ^foo^bar.

Conclusion

In my opinion, the combination of Gow and Clink gives a good compromise between a Windows and Unix work environment. And if you’re running Windows, a compromise is probably what you want. Otherwise, simply run a (possibly virtual) Linux machine. Attempts to make Windows too Unix-like run down an uncanny valley where it’s easy to waste a lot of time.

Click to learn more about math and computing consulting

QR Codes and Percolation

Percolation theory looks at problems such as the probability of being able to traverse some region with random obstacles. It is motivated by problems such as modeling the flow of a fluid in a porous medium.

Here’s a percolation problem for QR codes: What is the probability that there is a path from one side of a QR code to the opposite side? How far across a QR code would you expect to be able to go? For example, the QR code below was generated from my contact information. It’s not possible to go from one side to the other, and the red line shows what I believe is the deepest path into the code from a side.

This could make an interesting programming exercise. A simple version would be to start with a file of bits representing a particular QR code and find the deepest path into the corresponding image.

The next step up would be to generate simplified QR codes, requiring certain bits to be set, such as the patterns in three of the four corners that allow a QR reader to orient itself.

The next step in sophistication would be to implement the actual QR encoding algorithm, including its error correction encoding, then use this to encode random data.

(Because of the error correction used by QR codes, you could scan the image above and your QR reader would ignore the red path. It would even work if a fairly large portion of the image were missing because the error correction introduces a lot of redundancy.)