Arbitrary precision math in gawk

The idea of using awk for any math beyond basic arithmetic is kinda strange, and yet it has some nice features.

Awk was designed for file munging, a task it does well with compact syntax. GNU awk (gawk) supports the original minimalist version of awk and adds more features. It supports arbitrary precision arithmetic by wrapping the GMP and MPFR libraries. (When I refer to “awk” in this post I mean the features common to awk and gawk.)

If you just want extended precision, it would be easier to use bc. But if you’ve written an awk script and want to do some extended precision calculation in place rather than in a different tool, you could take advantage of gawk’s extension.

Here’s a one-liner to compute π to 100 decimal places using gawk.

    gawk -M -v PREC=333 'BEGIN {printf("%0.100f\n", 4*atan2(1,1))}'

The -M flag tells gawk to use arbitrary precision arithmetic.

NB: Your version of gawk may not support arbitrary precision. If it doesn’t, running the code above will give the following warning:

    gawk: warning: -M ignored: MPFR/GMP support not compiled in

The -v flag tells gawk you’re about to set a variable, namely the PREC variable, which sets the precision. You could set PREC in the body of your code instead.

Precision is measured in bits, not digits, and so for 100 digits we require 333 bits [1].

Awk is line-oriented. There’s an implicit loop around an awk program that loops over every line of input files. But you can specify code to run before the loop in a BEGIN block, and code to run after the loop in an END block. Here we just need the BEGIN block, no loop and no END.

We compute π as 4 times the arctangent of 1. Awk has an atan2 function rather than an atan function. The function atan(z) returns an angle between -π/2 and π/2 whose tangent is z. The function atan2(y, x) returns an angle between -π and π, in the same quadrant as the point (x, y), whose tangent is y/x. The atan2 function is more general than atan since atan(z) equals atan2(z, 1). This is one advantage gawk has over bc, since bc has only an atan function (which it calls a).

***

[1] Where did 333 bits come from? It’s log2(10^100), rounded up to the next integer. You could compute the logarithm in awk using

    awk 'BEGIN {print 100*log(10)/log(2)}'

Note that this uses awk; you don’t need gawk for this calculation. Awk doesn’t have a function to compute log base 2, only natural logs. So you use the fact that

log2(x) = log(x)/log(2).

Unicode arrows: math versus emoji

I used the character ↔︎︎ (U+2194) in a blog post recently and once again got bit by the giant pawn problem. That’s my name for when a character intended to be rendered as text is surprisingly rendered as an emoji. I saw

f (emoji <->) f dx dy dz

when what I intended was

f ↔︎ f dx dy dz

I ran into the same problem a while back in my post on modal logic and security. The page contained a formula in which ↔︎ was rendered as an emoji when I intended it to be rendered as text.

This example is more surprising because → (U+2192) is rendered as text while ↔︎ (U+2194) is rendered as an image. This seems capricious. Why are some symbols rendered as text while other closely-related symbols are not? The reason is that some symbols are interpreted as emoji and some are not. You can find a full list of text characters that might be hijacked as emoji here.

One way to fix this problem is to use the Unicode “variation selector” U+FE0E to tell browsers that you’d like your character interpreted as text. You can also use the variation selector U+FE0F if you really do want a character displayed as an emoji. See this post for details.
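
Here is a short Python sketch (mine, not from the original post) showing the mechanics of appending a variation selector to a character. Whether the result displays as intended still depends on the browser, font, or feed reader.

    arrow = "\u2194"              # ↔
    as_text  = arrow + "\ufe0e"   # U+FE0E requests text presentation
    as_emoji = arrow + "\ufe0f"   # U+FE0F requests emoji presentation
    print(as_text, as_emoji)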

(I use U+FE0E several times in this post. Hopefully your browser or RSS reader is rendering it correctly. If it isn’t, please let me know.)

Update: This post looks terrible in the RSS reader on my phone because the reader ignores the variation selectors. See the comments for a screenshot. It renders as intended in my browser, desktop or mobile.

It’s not always convenient or even possible to use variation selectors, and in that case you can simply use different symbols.

Here are four alternatives to ↔︎ (U+2194):

  • ⟷ (U+27F7)
  • ⇔ (U+21D4)
  • ⟺ (U+27FA)
  • ⇆ (U+21C6)

There is no risk of these characters being interpreted as emoji, and their intent may be clearer. In the two examples above, I used a double-headed arrow to mean two different things.

In the first example I wanted to convey that the Hodge operator takes the left side to the right and the right side to the left. Maybe ⇆ (U+21C6) would have been a better choice. In the second example, maybe it would have been clearer if I had used ⇒ (U+21D2) and ⇔ (U+21D4) rather than → (U+2192) and ↔︎ (U+2194).

Another math symbol that is unfortunately turned into an emoji is ↪︎ (U+21AA). In TeX this is \hookrightarrow. I used this symbol quite a bit in grad school to indicate that one space embeds in another. I don’t know of a good alternative to this one. You could use ⤷ (U+2937), though this looks kinda odd.

The Orange Book

I was spelunking around in Unicode and saw that there’s an emoji for orange book, U+1F4D9.

Orange Book emoji Linux

As is often the case, the emoji renders differently in different contexts. The image above is from my Linux desktop and the image below is from my MacBook. I tried creating an image on my Windows box but it didn’t have a glyph for this character.

Orange book emoji Apple

When I saw “orange book” my first thought was the US Department of Defense publication Trusted Computer System Evaluation Criteria (TCSEC), aka DoDD 5200.28-STD, commonly known in security circles as the “Orange Book.” For a split second I thought “Is this book so famous that there’s an emoji for it?!” But of course that’s not the case.

There are emoji for several colors of books, and there are several books in the DOD “Rainbow Series” publications, such as the Green Book for password management. (There’s also an emoji for green book: U+1F4D7).

TCSEC

So what is The Orange Book? Here’s a photo of the cover.

Department of Defense Trusted Computer System Evaluation Criteria

Here’s how Bruce Schneier describes the book in Applied Cryptography.

The NCSC publishes the infamous “Orange Book.” Its actual title is the Department of Defense Trusted Computer System Evaluation Criteria, but that’s a mouthful to say and the book has an orange cover. The Orange Book attempts to define security requirements, gives computer manufacturers an objective way to measure the security of their systems, and guides them as to what to build into their secure products. It focuses on computer security and doesn’t really say a lot about cryptography.

The Orange Book is now deprecated in favor of The Common Criteria for Information Technology Security Evaluation, ISO/IEC 15408.


What does rotating a matrix do to its determinant?

This post will look at rotating a matrix 90° and what that does to the determinant. This post was motivated by the previous post. There I quoted a paper that had a determinant with 1s in the right column. I debated rotating the matrix so that the 1s would be along the top because that seems more natural, but in the end I kept the original notation. If I had rotated the matrix, I’d need to adjust the value of the determinant.

Rotating a matrix is an unusual operation. When someone speaks of turning a matrix over, they usually mean taking the transpose; taking the transpose is a reflection, not a rotation. And when you see the words “rotation” and “matrix” in close proximity, the topic is a matrix that represents a rotation. That’s not what this post is about: the matrix isn’t performing a rotation; a rotation is being performed on it.

Here’s an example:

\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \rightarrow \begin{pmatrix} c & f & i \\ b & e & h \\ a & d & g \end{pmatrix}

Using subscripts makes the example above a little harder to read but much easier to formalize.

\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \rightarrow \begin{pmatrix} a_{13} & a_{23} & a_{33} \\ a_{12} & a_{22} & a_{32} \\ a_{11} & a_{21} & a_{31} \end{pmatrix}

If we denote a matrix by A and its image after rotation as A', then the components a' of A' are related to the components a of A by

a'_{ij} = a_{j, n-i+1}

To form the rotation of A, we take the transpose of A, then reverse the order of the rows. In matrix notation,

A' = JA^T

where J is the matrix that has 1’s on the secondary diagonal, the diagonal running southwest to northeast, and 0’s everywhere else. The matrix J is the rotation of the identity matrix I.

The determinant of A' is the determinant of J times the determinant of A^T, and the determinant of A^T is the same as the determinant of A.

\det A' = \det J \det A^T = \det J \det A

So the last thing we need to do to compute the determinant of A' is to compute the determinant of J.

We can turn I into J, and vice versa, by reversing the order of the rows. Let’s do this one pair of rows at a time. We swap the first and last rows. Then we swap the second row with the second to last row, then swap the third row from the top with the third row from the bottom, etc.

Suppose I is an n × n matrix, and first assume n is even. Then we need to do n/2 swaps, swapping each of the top n/2 rows with one of the bottom n/2 rows. Each swap reverses the sign of the determinant, and so the determinant of J is (-1)^(n/2). Now if n is odd, the middle row doesn’t move. So we swap the top (n-1)/2 rows with the bottom (n-1)/2 rows. So in that case the determinant of J is (-1)^((n-1)/2). We can combine the even and the odd case using floor notation:

\det A' = (-1)^{\lfloor n/2\rfloor} \det A
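
Here is a quick numerical check of this formula (my addition, not part of the original argument). NumPy’s rot90 performs the 90° counterclockwise rotation described above.

    import numpy as np

    rng = np.random.default_rng(20230101)
    for n in range(2, 8):
        A = rng.standard_normal((n, n))
        det_rotated = np.linalg.det(np.rot90(A))
        predicted = (-1)**(n // 2) * np.linalg.det(A)
        print(n, np.isclose(det_rotated, predicted))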

Area of a triangle in the complex plane

I recently ran across an elegant equation for the area of a triangle in the complex plane with vertices at z1, z2, and z3 [1].

A(z_1, z_2, z_3) = \frac{i}{4} \, \left| \begin{matrix} z_1 & \bar{z}_1 & 1 \\ z_2 & \bar{z}_2 & 1 \\ z_3 & \bar{z}_3 & 1 \\ \end{matrix} \right|

This formula gives the signed area: the area is positive if the points are given in counterclockwise order and negative otherwise.

I’ll illustrate the formula with a little Python code. Let’s generate a random triangle.

    import numpy as np

    np.random.seed(20221204)
    r = 100*np.random.random(6)
    z1 = r[0] + 1j*r[1]
    z2 = r[2] + 1j*r[3]
    z3 = r[4] + 1j*r[5]

Here’s what our triangle looks like plotted.

Now let’s calculate the area using the formula above and using Heron’s formula.

    
    def area_det(z1, z2, z3):
        # Expand the determinant in the formula above along its third column.
        det = 0
        det += z2*z3.conjugate() - z3*z2.conjugate()
        det -= z1*z3.conjugate() - z3*z1.conjugate()
        det += z1*z2.conjugate() - z2*z1.conjugate()
        return 0.25j*det
    
    def area_heron(z1, z2, z3):
        # Heron's formula: compute the area from the three side lengths.
        a = abs(z1-z2)
        b = abs(z2-z3)
        c = abs(z3-z1)
        s = 0.5*(a + b + c)
        return np.sqrt(s*(s-a)*(s-b)*(s-c))
        
    print(area_heron(z1, z2, z3))
    print(area_det(z1, z2, z3))

This prints 209.728 for Heron’s formula and -209.728 for the determinant. The determinant gives a negative area because it was given the points in clockwise order.

[1] Philip J. Davis. Triangle Formulas in the Complex Plane. Mathematics of Computation. January 1964.

Finding a vector potential whose curl is given

The previous post showed that if a vector field F over a simply connected domain has zero curl, then there is a potential function φ whose gradient is F.

An analogous result says that if the vector field F has zero divergence, again over a simply connected domain, then there is a vector potential Φ whose curl is F.

These are both special cases of Poincaré’s lemma.

This post will outline how to calculate Φ. First of all, Φ is far from unique. Any vector field with zero curl can be added to Φ without changing its curl. So if

Φ = (Φ1, Φ2, Φ3)

then we can assume that one of the components, say Φ3, is zero by adding the right curl-free vector field. If you find that argument less than convincing, look at it this way: we’re going to solve a harder problem than simply finding Φ such that ∇×Φ = F by giving ourselves the additional requirement that the last component of Φ must be zero.

Now if Φ = (Φ1, Φ2, 0) and ∇×Φ = F, then

-∂z Φ2 = F1
∂z Φ1 = F2
∂x Φ2 - ∂y Φ1 = F3

We can solve the first equation by integrating F1 with respect to z and adding a function of x and y to be determined later. We can solve the second equation similarly, then use the third equation to determine the functions of x and y left over from solving the first two equations.
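
Here is a small sympy sketch of this procedure for the divergence-free field F = (y, z, x). The example field and the particular choice of the leftover functions of x and y are my own illustration, not from the post.

    from sympy import integrate
    from sympy.vector import CoordSys3D, curl, divergence

    R = CoordSys3D('R')
    x, y, z = R.x, R.y, R.z

    F1, F2, F3 = y, z, x
    assert divergence(F1*R.i + F2*R.j + F3*R.k) == 0

    Phi2 = -integrate(F1, z)   # from -∂z Φ2 = F1: Phi2 = -y*z + g(x, y)
    Phi1 = integrate(F2, z)    # from  ∂z Φ1 = F2: Phi1 = z**2/2 + h(x, y)
    # The remaining equation ∂x Φ2 - ∂y Φ1 = F3 reads ∂g/∂x - ∂h/∂y = x,
    # so take g = x**2/2 and h = 0.
    Phi2 += x**2/2

    Phi = Phi1*R.i + Phi2*R.j  # third component is zero
    print(curl(Phi))           # recovers F: R.y*R.i + R.z*R.j + R.x*R.k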

***

This post is the third, and probably last, in a series of posts looking at vector calculus from a more advanced perspective.  The first post in the series looked at applying {grad, curl, div} to {grad, curl, div}: seeing which combinations are defined, and which combinations are always 0. To illustrate the general pattern, the post dips into differential forms and the Hodge star operator.

The second post looked at finding the (scalar) potential function for a curl-free vector field, and mentions connections to topology and differential equations.

The present post is a little terse, but it makes more sense if you’ve gone through the previous post. The method of solution here is analogous to the method in the previous post, and that post goes into a little more detail.

Conservative vector fields

The previous post discussed the fact that the curl of a gradient is zero. The converse is also true (given one more requirement): a vector field F is the gradient of some potential function φ if ∇×F = 0. In that case we say F is a conservative vector field.

It’s clear that having zero curl is a necessary condition for a vector field to be the gradient of a potential. It’s not as clear that having zero curl is sufficient. And in fact, it’s not quite sufficient: it’s sufficient over a simply connected domain. This means a domain with no holes: any loop you draw in the domain can be continuously deformed to a point without leaving the domain. If you’re working with vector fields defined everywhere on ℝ³ then you don’t have to worry about this condition because ℝ³ is simply connected.
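
The classic counterexample (my illustration, not in the original post) is the field F = (-y/(x² + y²), x/(x² + y²), 0), defined on ℝ³ minus the z-axis, a domain that is not simply connected. Its curl is zero, but it is not the gradient of any function on that domain. A quick sympy check that the curl vanishes:

    from sympy import simplify
    from sympy.vector import CoordSys3D, curl

    R = CoordSys3D('R')
    x, y = R.x, R.y

    F = (-y/(x**2 + y**2))*R.i + (x/(x**2 + y**2))*R.j
    G = curl(F)
    print([simplify(G.dot(v)) for v in (R.i, R.j, R.k)])   # [0, 0, 0]

Locally F is the gradient of the polar angle θ, but θ cannot be defined continuously all the way around the z-axis, so there is no global potential.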

Aside on topology

For a calculus student, the requirement that a domain be simply connected is a footnote. For a topologist, it’s the main thing.

If a domain is not simply connected, then a vector field might have curl zero, but not be the gradient of a potential. So in general we have two kinds of vector fields with zero curl: those that are gradients and those that are not. We could look at the space of all vector fields that have zero curl and mod out by the vector fields that are gradients. The dimension of this space tells us something about how the domain is connected. This is the start of de Rham cohomology.

Calculating the potential function

If a vector field is conservative, i.e. if it is the gradient of a potential function φ, then you can find φ by integration.

The partial derivative of φ with respect to x is the first component of your vector field, so you can integrate to find φ as a function of x (and y and z). This integral will only be unique up to a constant, and functions of y and z alone are constants as far as partial derivatives with respect to x are concerned.

Now the partial derivative of φ with respect to y has to equal the second component of your vector field. So differentiating what you found above with respect to y and matching it to that second component determines your potential up to a function of z. Then you differentiate again, this time with respect to z, set the result equal to the third component of your vector field, and you’ve determined your potential function up to a constant. And that’s as far as it can be determined: any constant term goes away when you take the gradient.
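
Here is a sympy sketch of this procedure for a made-up conservative field F = (2xy, x² + z³, 3yz² + 1); the example field is mine, not from the post.

    from sympy import symbols, integrate, diff

    x, y, z = symbols('x y z')
    F1, F2, F3 = 2*x*y, x**2 + z**3, 3*y*z**2 + 1   # the gradient of x**2*y + y*z**3 + z

    phi = integrate(F1, x)                  # x**2*y, determined up to a function of y and z
    phi += integrate(F2 - diff(phi, y), y)  # add the missing y dependence: y*z**3
    phi += integrate(F3 - diff(phi, z), z)  # add the missing z dependence: z
    print(phi)                              # x**2*y + y*z**3 + z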

Exact differential equations

A differential equation of the form

M(x, y) + N(x, y) y'(x) = 0

is said to be exact if the partial derivative of M with respect to y equals the partial of N with respect to x. In that case you can find a function φ such that the partial of φ with respect to x is M and the partial of φ with respect to y is N (assuming you’re working in a simply connected domain). This function φ is a potential, though differential equation texts don’t call it that, and you find it just as you found φ above. The solution to your differential equation is given (implicitly) by

φ(x, y) = c

where c is a constant to be determined by initial conditions.
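
As a sketch, with an equation I made up rather than one from the post, take M = 2xy and N = x² + cos y. The equation is exact, and the same integration procedure produces φ.

    from sympy import symbols, integrate, diff, cos, Eq

    x, y, c = symbols('x y c')
    M = 2*x*y
    N = x**2 + cos(y)

    assert diff(M, y) == diff(N, x)         # exactness: both partials equal 2*x

    phi = integrate(M, x)                   # x**2*y, up to a function of y
    phi += integrate(N - diff(phi, y), y)   # add the y-only part: sin(y)
    print(Eq(phi, c))                       # Eq(x**2*y + sin(y), c)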

Poincaré’s lemma

For a vector field over a simply connected domain, having zero curl is necessary and sufficient for the existence of a potential function φ. This is a case of Poincaré’s lemma. The next post will look at another case of Poincaré’s lemma, finding a vector potential.

{div, grad, curl} of a {div, grad, curl}

The various combinations of divergence, gradient, and curl are confusing to someone seeing them for the first time, and even for someone having seen them many times. Is the divergence of a curl zero, or is it the divergence of a gradient that’s zero? And there’s another one: is it curl of curl or curl of grad that’s zero?

It’s a mess that’s hard to sort out without pulling out differential forms. This post will show how a calculus student could make some sense out of all this, and how differential forms clarify the situation further.

Vector calculus perspective

We’ll start out looking at things from the perspective of a calculus student. We can make a table of all nine possible combinations of {grad, curl, div} applied to a {grad, curl, div} and start by asking which combinations make sense.

Gradient is something that takes a scalar function and returns a vector field. Curl takes a vector field and returns another vector field. Divergence takes a vector field and returns a scalar function. This means that only five of our nine combinations are even defined.

It turns out that the divergence of a curl is zero, and the curl of a gradient is zero (the zero vector). The other three possibilities are defined, but they are not zero in general.
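
Here is a quick symbolic check of the two zero identities using sympy (my check, not part of the original post):

    from sympy.vector import CoordSys3D, gradient, curl, divergence

    R = CoordSys3D('R')
    x, y, z = R.x, R.y, R.z

    f = x**2*y + y*z**3              # an arbitrary scalar field
    F = x*y*R.i + y*z*R.j + z*x*R.k  # an arbitrary vector field

    print(curl(gradient(f)))         # 0 (the zero vector)
    print(divergence(curl(F)))       # 0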

So we can extend our chart to include the zeros.

Plain text version of chart image included at the bottom of the post.

Differential form perspective

From the perspective of differential forms, a scalar function f is a 0-form.

The differential of a 0-form is a 1-form and corresponds to the gradient.

The differential of a 1-form is a 2-form and corresponds to curl.

The differential of a 2-form is a 3-form and corresponds to divergence.

The differential of a differential is 0: d² = 0. This holds for k-forms in general, for any non-negative integer k. So the curl of a gradient is 0 and the divergence of a curl is 0.

Gradients are 1-forms and curls are 2-forms. They’re different kinds of things. Vector calculus hides this distinction, which initially makes things simpler but ultimately makes things harder.

Now what about the three possibilities marked with question marks in the table above: the divergence of a gradient, the curl of a curl, and the gradient of a divergence?

From the perspective of differential forms, these are illegal operations. You cannot take the divergence of a gradient, because divergence operates on 2-forms, and a gradient is a 1-form. Similarly, you cannot take the curl of a curl or the gradient of a divergence. You could think of differential forms as adding type-checking to vector calculus.

But operations like taking the divergence of a gradient are legal in vector calculus. What gives?

Hodge star

The Hodge star operator is a duality between k-forms and (n-k)-forms. In vector calculus n = 3, and so the Hodge star takes 0-forms to 3-forms and 3-forms to 0-forms.

f ︎↔︎ f dx dy dz

It also takes 1-forms to 2-forms and 2-forms to 1-forms.

f dx + g dy + h dz ︎↔︎ f dy dz + g dz dx + h dx dy.

You can’t take the divergence of a gradient of a function f, but you can translate the 1-form df into the 2-form *df via the Hodge operator, then take the divergence of that. This gives you a 3-form d*df, which you can translate to a 0-form by applying * once more to get *d*df. So the Laplacian, defined to be the divergence of the gradient in vector calculus, is *d*d in the language of differential forms.
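
To spell this out in coordinates (a worked sketch I’m adding here, writing f_x, f_xx, and so on for partial derivatives):

df = f_x dx + f_y dy + f_z dz
*df = f_x dy dz + f_y dz dx + f_z dx dy
d*df = (f_xx + f_yy + f_zz) dx dy dz
*d*df = f_xx + f_yy + f_zz

The last step uses the fact that the Hodge star takes the 3-form dx dy dz back to the 0-form 1, leaving the usual Laplacian of f.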

Curl takes 1-forms to 2-forms, so you can’t take the curl of a curl. But you can turn a curl into a 1-form via the Hodge operator and then take the curl of that. And while you can’t take the divergence of a gradient, you can take the divergence of the Hodge operator applied to a gradient.

In vector calculus the Hodge operator is invisible. Making it visible explains why some combinations of operators always result in zeros and some do not: some identities follow from the general identity d² = 0, but operations requiring a Hodge operator are not zero in general.

The Hodge star operator is not as simple in general as it is in Euclidean space. On a Riemannian manifold the Hodge operator is defined in terms of the metric. Defining the Laplace operator as *d*d extends to Riemannian manifolds, but defining it as the sum of second partial derivatives will not.


Plain text chart:

|------+------+------+-----|
|      | grad | curl | div |
|------+------+------+-----|
| grad | NA   | NA   | ?   |
| curl | 0    | ?    | NA  |
| div  | ?    | 0    | NA  |
|------+------+------+-----|

Arabic numerals and numerals that are Arabic

The characters 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 are called Arabic numerals, but there are a lot of other numerals that are Arabic.

I discovered this when reading the documentation on Perl regular expressions, perlre. Here’s the excerpt from that page that caught my eye.

Many scripts have their own sets of digits equivalent to the Western 0 through 9 ones. A few, such as Arabic, have more than one set. For a string to be considered a script run, all digits in it must come from the same set of ten, as determined by the first digit encountered.

Emphasis added.

I took some code I’d written for previous posts on Unicode numbers and modified it to search the range of Arabic Unicode characters and report all characters that represent 0 through 9.

    from unicodedata import numeric, name

    # Unicode ranges that contain Arabic-script characters
    a = set(range(0x00600, 0x006FF+1)) | \
        set(range(0x00750, 0x0077F+1)) | \
        set(range(0x008A0, 0x008FF+1)) | \
        set(range(0x00870, 0x0089F+1)) | \
        set(range(0x0FB50, 0x0FDFF+1)) | \
        set(range(0x0FE70, 0x0FEFF+1)) | \
        set(range(0x10EC0, 0x10EFF+1)) | \
        set(range(0x1EE00, 0x1EEFF+1)) | \
        set(range(0x1EC70, 0x1ECBF+1)) | \
        set(range(0x1ED00, 0x1ED4F+1)) | \
        set(range(0x10E60, 0x10E7F+1))

    f = open('digits.txt', 'w', encoding='utf8')

    def uni(i):
        return "U+" + format(i, "X")

    # Report every character in the ranges above whose numeric value is 0 through 9.
    for i in sorted(a):
        ch = chr(i)
        if ch.isnumeric() and numeric(ch) in range(10):
            print(ch, uni(i), numeric(ch), name(ch), file=f)

Apparently there are two ways to write 0, eight ways to write 2, and seven ways to write 1, 3, 4, 5, 6, 7, 8, and 9. I’ll include the full results at the bottom of the post.

I first wrote my Python script to write to the command line and redirected the output to a file. This resulted in some of the Arabic characters being replaced with a blank or with 0. Then I changed the script as above to write to a file opened to receive UTF-8 text. All the characters were preserved, though I can’t see most of them because the font my editor is using doesn’t have glyphs for the characters outside the BMP (i.e. those with Unicode values above 0xFFFF).


٠ U+660 0.0 ARABIC-INDIC DIGIT ZERO
١ U+661 1.0 ARABIC-INDIC DIGIT ONE
٢ U+662 2.0 ARABIC-INDIC DIGIT TWO
٣ U+663 3.0 ARABIC-INDIC DIGIT THREE
٤ U+664 4.0 ARABIC-INDIC DIGIT FOUR
٥ U+665 5.0 ARABIC-INDIC DIGIT FIVE
٦ U+666 6.0 ARABIC-INDIC DIGIT SIX
٧ U+667 7.0 ARABIC-INDIC DIGIT SEVEN
٨ U+668 8.0 ARABIC-INDIC DIGIT EIGHT
٩ U+669 9.0 ARABIC-INDIC DIGIT NINE
۰ U+6F0 0.0 EXTENDED ARABIC-INDIC DIGIT ZERO
۱ U+6F1 1.0 EXTENDED ARABIC-INDIC DIGIT ONE
۲ U+6F2 2.0 EXTENDED ARABIC-INDIC DIGIT TWO
۳ U+6F3 3.0 EXTENDED ARABIC-INDIC DIGIT THREE
۴ U+6F4 4.0 EXTENDED ARABIC-INDIC DIGIT FOUR
۵ U+6F5 5.0 EXTENDED ARABIC-INDIC DIGIT FIVE
۶ U+6F6 6.0 EXTENDED ARABIC-INDIC DIGIT SIX
۷ U+6F7 7.0 EXTENDED ARABIC-INDIC DIGIT SEVEN
۸ U+6F8 8.0 EXTENDED ARABIC-INDIC DIGIT EIGHT
۹ U+6F9 9.0 EXTENDED ARABIC-INDIC DIGIT NINE
 U+10E60 1.0 RUMI DIGIT ONE
 U+10E61 2.0 RUMI DIGIT TWO
 U+10E62 3.0 RUMI DIGIT THREE
 U+10E63 4.0 RUMI DIGIT FOUR
 U+10E64 5.0 RUMI DIGIT FIVE
 U+10E65 6.0 RUMI DIGIT SIX
 U+10E66 7.0 RUMI DIGIT SEVEN
 U+10E67 8.0 RUMI DIGIT EIGHT
 U+10E68 9.0 RUMI DIGIT NINE
 U+1EC71 1.0 INDIC SIYAQ NUMBER ONE
 U+1EC72 2.0 INDIC SIYAQ NUMBER TWO
 U+1EC73 3.0 INDIC SIYAQ NUMBER THREE
 U+1EC74 4.0 INDIC SIYAQ NUMBER FOUR
 U+1EC75 5.0 INDIC SIYAQ NUMBER FIVE
 U+1EC76 6.0 INDIC SIYAQ NUMBER SIX
 U+1EC77 7.0 INDIC SIYAQ NUMBER SEVEN
 U+1EC78 8.0 INDIC SIYAQ NUMBER EIGHT
 U+1EC79 9.0 INDIC SIYAQ NUMBER NINE
 U+1ECA3 1.0 INDIC SIYAQ NUMBER PREFIXED ONE
 U+1ECA4 2.0 INDIC SIYAQ NUMBER PREFIXED TWO
 U+1ECA5 3.0 INDIC SIYAQ NUMBER PREFIXED THREE
 U+1ECA6 4.0 INDIC SIYAQ NUMBER PREFIXED FOUR
 U+1ECA7 5.0 INDIC SIYAQ NUMBER PREFIXED FIVE
 U+1ECA8 6.0 INDIC SIYAQ NUMBER PREFIXED SIX
 U+1ECA9 7.0 INDIC SIYAQ NUMBER PREFIXED SEVEN
 U+1ECAA 8.0 INDIC SIYAQ NUMBER PREFIXED EIGHT
 U+1ECAB 9.0 INDIC SIYAQ NUMBER PREFIXED NINE
 U+1ECB1 1.0 INDIC SIYAQ NUMBER ALTERNATE ONE
 U+1ECB2 2.0 INDIC SIYAQ NUMBER ALTERNATE TWO
 U+1ED01 1.0 OTTOMAN SIYAQ NUMBER ONE
 U+1ED02 2.0 OTTOMAN SIYAQ NUMBER TWO
 U+1ED03 3.0 OTTOMAN SIYAQ NUMBER THREE
 U+1ED04 4.0 OTTOMAN SIYAQ NUMBER FOUR
 U+1ED05 5.0 OTTOMAN SIYAQ NUMBER FIVE
 U+1ED06 6.0 OTTOMAN SIYAQ NUMBER SIX
 U+1ED07 7.0 OTTOMAN SIYAQ NUMBER SEVEN
 U+1ED08 8.0 OTTOMAN SIYAQ NUMBER EIGHT
 U+1ED09 9.0 OTTOMAN SIYAQ NUMBER NINE
 U+1ED2F 2.0 OTTOMAN SIYAQ ALTERNATE NUMBER TWO
 U+1ED30 3.0 OTTOMAN SIYAQ ALTERNATE NUMBER THREE
 U+1ED31 4.0 OTTOMAN SIYAQ ALTERNATE NUMBER FOUR
 U+1ED32 5.0 OTTOMAN SIYAQ ALTERNATE NUMBER FIVE
 U+1ED33 6.0 OTTOMAN SIYAQ ALTERNATE NUMBER SIX
 U+1ED34 7.0 OTTOMAN SIYAQ ALTERNATE NUMBER SEVEN
 U+1ED35 8.0 OTTOMAN SIYAQ ALTERNATE NUMBER EIGHT
 U+1ED36 9.0 OTTOMAN SIYAQ ALTERNATE NUMBER NINE

Telescopes, awk, and learning

Here’s a quote I think about often:

“It is faster to make a four-inch mirror and then a six-inch mirror than to make a six-inch mirror.” — Bill McKeeman, Thompson’s law of telescopes

If your goal is to make a six-inch mirror, why make a four-inch mirror first? From a reductionist perspective this makes no sense. But when you take into account how people learn, it makes perfect sense. The bigger project is more likely to succeed after you learn more about mirror-making in the context of a smaller project.

Awk

I was thrilled to discover the awk programming language in college. Munging files with little awk scripts was at least ten times easier than writing C programs.

When I told a friend about awk, he said “Have you seen Perl? It’ll do everything awk does and a lot more.”

If you want to learn Perl, I expect it would be faster to learn awk and then Perl than to learn Perl. I think I would have been intimidated by Perl if I’d tried to learn it first. But thinking of Perl as a more powerful awk made me more willing to try it. Awk made my life easier, and Perl had the potential to make it even easier. I’m not sure whether learning Perl was a good idea—that’s a discussion for another time—but I did.

C

I also learned C before learning C++. That was beneficial for similar reasons, starting with the four-inch mirror version of C++ before going on to the six-inch version.

Many people have said that learning C before C++ is a bad idea, that it teaches bad habits, and that it would be better to learn (modern) C++ from the beginning. That depends on what the realistic alternative is. Maybe if you attempted to learn C++ first you’d be intimidated and give up. As with giving up on learning Perl, giving up on learning C++ might be a good idea. At the time, however, learning C++ was a good move. Knowing C++ served me well when I left academia.

Learning on your own

Teaching yourself something requires different tactics than learning something in a classroom. The four-inch mirror warmup is more important when you’re learning on your own.

If I were teaching a course on C++, I would not teach C first. The added structure of a classroom makes it easier to learn C++ directly. The instructor can pace students through the material so as to avoid the intimidation they might face if they were attempting to learn C++ alone. Students don’t become overwhelmed and give up because they have the accountability of homework assignments etc. Of course some students will give up, but more would give up without the structure of a class.

Top-down vs bottom-up

From a strictly logical perspective, it’s most efficient to learn the most abstract version of a theorem first. But this is bad pedagogy. The people who are excited about the efficiency of compressing math this way, e.g. Bourbaki, learned what they know more concretely and incrementally, and think in hindsight that the process could be shortened.

It does save time to present things at some level of generality. However, the number of steps you can go up the abstraction ladder at a time varies by person. Some people might need to go one rung at a time, some could go two at a time or maybe three, but everyone has a limit. And you can take bigger steps when you have a teacher, or even better a tutor, to guide you and to rescue you if you try to take too big of a step.

You typically understand something better, and are more able to apply it, when you learn it bottom-up. People think they can specialize more easily than they can generalize, but the opposite is usually true. It’s easier to generalize from a few specific examples than to realize that a particular problem is an instance of a general pattern.

I’ve noticed this personally, and I’ve noticed it in other people. On Twitter, for example, I sometimes post a general and a concrete version of a theorem, and the more concrete version gets more engagement. The response to a general theorem may be “Ho hum. Everybody knows that.” but the response to a particular application may be “Wow, I never thought of that!” even when the latter is a trivial consequence of the former.
