Prime chains

The title of this post has a double meaning. We will look at chains in the sense of number theory and in the sense of cryptocurrency: Cunningham chains and blockchains that involve prime numbers.

Cunningham chains

A chain of primes is a sequence of prime numbers in which each is almost double its predecessor. That is, the next number after p is 2p ± 1.

In a Cunningham chain of the first kind, the successor of p is 2p + 1. For example,

41, 83, 167.

In a Cunningham chain of the second kind, the successor of p is 2p − 1. For example,

19, 37, 73.

Two questions come up immediately. First, are there infinitely many Cunningham chains? Second, how long can a Cunningham chain be? What is known and what is conjectured are at opposite ends of the spectrum. It is unknown whether there are infinitely many Cunningham chains of length 2, but it is conjectured that there are infinitely many Cunningham chains of all lengths.

According to this page, the longest known Cunningham chain of the first kind has length 17, and the longest known Cunningham chain of the second kind has length 19. We can verify these results with the following Python code.

from sympy import isprime

def chain_length(start, kind):
    # kind = +1 for a chain of the first kind, -1 for the second kind
    p = start
    c = 0
    while isprime(p):
        c += 1
        p = 2*p + kind
    return c

print(chain_length(2759832934171386593519, 1))
print(chain_length(79910197721667870187016101, -1))

Bi-twin chains

A number n is the basis of a bi-twin chain of length k if n − 1 is the start of a Cunningham chain of the first kind of length k and n + 1 is the start of a Cunningham chain of the second kind of length k.
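As a quick check on this definition, here is a sketch of a function that computes the length of the bi-twin chain with basis n, reusing the chain-length idea above. The function names are mine.

```python
from sympy import isprime

def cunningham_length(p, kind):
    # kind = +1 for a chain of the first kind, -1 for the second kind
    c = 0
    while isprime(p):
        c += 1
        p = 2*p + kind
    return c

def bitwin_length(n):
    # n - 1 must start a Cunningham chain of the first kind and n + 1
    # a chain of the second kind; the bi-twin chain is as long as the
    # shorter of the two
    return min(cunningham_length(n - 1, 1), cunningham_length(n + 1, -1))

print(bitwin_length(6))  # 5, 7, 11, 13 are all prime, so the length is 2
```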

I say more about bi-twin prime chains in the next post.

Primecoin

Primecoin was one of the first cryptocurrencies, coming out four years after Bitcoin. Primecoin is still going, though its market cap is six orders of magnitude smaller than that of Bitcoin.

What’s interesting about Primecoin is that it uses finding prime chains as its proof of work task [1]. To mint a new Primecoin block, you must find a prime chain of the required length whose origin is a multiple of the hash of the block header [2].

Primecoin allows any of the three kinds of prime chains mentioned above: Cunningham chains of the first or second kind, or a bi-twin prime chain. Primecoin adjusts its mining difficulty over time by varying the length of Cunningham or bi-twin chain needed to mint a new block.

There are also cryptocurrencies based on finding prime clusters and prime gaps.

Related posts

[1] Strictly speaking, Primecoin requires finding probable prime chains, as explained here.

[2] The origin of a prime chain is n if the first item in a Cunningham chain of the second kind is n + 1, or if the first item in a Cunningham chain of the first kind or a bi-twin chain is n − 1.

Compressing a set of hash values

Suppose you have a set of k hash values, each n bits long. Can you compress the set into less than kn bits?

It’s not possible to compress a list of hashes into less than kn bits, but you can compress a set into fewer bits.

Suppose you have a set of 2³⁰, roughly a billion, 64-bit hashes. Sort the set and look at the size of gaps between elements. You might expect consecutive items on the list to be roughly 2³⁴ apart, and so you could reconstruct the list by reporting the first item and the gaps between the rest, which are 34-bit numbers, not 64-bit numbers, a savings of 30 bits per hash.

This doesn’t exactly work, but it’s the kernel of a good idea. We don’t know that the distance between hashes can be represented by a 34-bit number. The gap could be more or less than 2³⁴, but we don’t expect it to often be much more than 2³⁴. So we use a variable-length encoding such that when the distance between values is on the order of 2³⁴, or less, we save bits, but we allow for the distance to be any size.
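A quick simulation supports the expected gap size. The following sketch uses a smaller set, 2²⁰ random 64-bit values, so the expected average gap is 2⁶⁴/2²⁰ = 2⁴⁴.

```python
import random

random.seed(0)
n = 2**20  # a smaller set than in the text, for speed
hashes = sorted(random.getrandbits(64) for _ in range(n))
gaps = [b - a for a, b in zip(hashes, hashes[1:])]
avg = sum(gaps) / len(gaps)
print(avg / 2**44)  # the ratio to the expected gap size is close to 1
```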

Applications

What we’ve described is the essence of Golomb-Rice coding. Its implementation in the Bitcoin protocol is referred to as Golomb-Coded Sets (GCS), described in BIP 158. Golomb-Rice coding is also used in other applications where the values to be compressed are not hashes, such as in lossless audio compression.

Encoding

Let’s go into some detail as to how the distances between sorted values are represented. Suppose you expect the differences to be on the order of M, where M is a power of 2. For each difference d, let q and r be the quotient and remainder upon division by M, i.e.

d = qM + r.

Encode q as a unary number, i.e. a string of q 1s, and encode r as an ordinary binary number. The Golomb-Rice coding of d is then the concatenation of the representations of q and r with a 0 in the middle as a separator. Using || to denote string concatenation we have

unary(q)  ||  0  ||  binary(r).

In general, unary encoding is extremely inefficient, but we’re betting that q will typically be quite small.

The reason we require M to be a power of 2 is so the representation of r will have a fixed length [1].

Let’s work out an example. Suppose M = 2²⁰ and

d = 2²² + 123456 = 4 × M + 123456.

Then we write q as 1111 and r as 00011110001001000000, the 20-bit binary representation of 123456, and encode d as the string

1111000011110001001000000
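A few lines of Python reproduce this encoding. This is a sketch; the function name is mine, and it assumes M is a power of 2.

```python
def golomb_rice_encode(d, M):
    # unary q, then a 0 separator, then r as a fixed-width binary number
    q, r = divmod(d, M)
    width = M.bit_length() - 1  # M = 2**width
    return "1"*q + "0" + format(r, f"0{width}b")

print(golomb_rice_encode(2**22 + 123456, 2**20))
```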

Decoding

You can concatenate the encodings of consecutive d values and still be able to unambiguously decode the result. Because the r representations have a constant length, you know when an r ends and the next q begins.

For example, suppose we have the following string of bits.

1110010011001011001011111001000000110010001110011101111000101111011

We decode the string from left to right. We count how many 1s we see, skip over a 0, then regard the next 20 bits as a binary number.

111 0 01001100101100101111 1 0 01000000110010001110 0 11101111000101111011

We see three 1s before the first 0 and so we conclude q1 = 3. We skip over the 0 and read the value of r.

r1 = 01001100101100101111₂ = 314159.

Next we see a single 1 before the next 0, so q2 = 1. We read the next value of r as

r2 = 01000000110010001110₂ = 265358.

Next we see a 0, i.e. there are no 1s, and so the final q3 = 0. And we have

r3 = 11101111000101111011₂ = 979323.

So our reconstructed values of d are

d1 = q1M + r1 = 3 × 2²⁰ + 314159 = 3459887

d2 = q2M + r2 = 1 × 2²⁰ + 265358 = 1313934

d3 = q3M + r3 = 0 × 2²⁰ + 979323 = 979323.

Now these are difference values. We need to know the smallest value m in order to reconstruct the original set of values from the differences. Then the full values are m, m + d1, m + d1 + d2, and m + d1 + d2 + d3.
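Here is a sketch of a corresponding decoder, applied to the bit string above. Again the function name is mine.

```python
def golomb_rice_decode(bits, M):
    # split a concatenated bit string back into the encoded differences
    width = M.bit_length() - 1  # fixed width of each remainder r
    ds, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":   # unary part
            q += 1
            i += 1
        i += 1                  # skip the 0 separator
        r = int(bits[i:i+width], 2)
        i += width
        ds.append(q*M + r)
    return ds

s = "1110010011001011001011111001000000110010001110011101111000101111011"
print(golomb_rice_decode(s, 2**20))  # [3459887, 1313934, 979323]
```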

Related posts

[1] This is the Rice part. Robert Rice simplified Samuel Golomb’s encoding scheme in the special case that M is a power of 2.

Memorizing chemical element symbols

Here’s something I’ve wondered about before: are there good mnemonics for chemical element symbols?

Some element symbols are based on Latin or German names and seem arbitrary to English speakers, such as K (kalium) for potassium or Fe (ferrum) for iron. However, these elements are very common and so their names and symbols are familiar.

When you take out the elements whose symbols are mnemonic in another language, every element symbol begins with the first letter of the element name. The tricky part is the second letter. For example, does Ra stand for radon or radium?

The following rule of thumb usually holds whenever a chemical symbol corresponds to the first letters of two different element names:

 The lightest/longest-known element wins.

Scientists didn’t wait until the periodic table was complete before assigning symbols, and the easiest names were handed out first. Calcium (20) was assigned Ca, for example, before cadmium (48) and californium (98) were known.

The elements were discovered roughly in order of atomic weight. For example, beryllium (4) was discovered before berkelium (97) and neon (10) was discovered before neptunium (93). So sometimes you can substitute knowledge of chemistry for knowledge of history. [1]

There are instances where the heavier element got to claim the first-two-letter symbol. Usually the heavier element was discovered first. That’s why Ra stands for radium (88) and not radon (86). One glaring exception to this rule is that palladium (Pd) was discovered a century before protactinium (Pa).

Often the element that was discovered first is more familiar, and so you could almost say that when there’s a conflict, the more familiar element wins. For example, Li stands for lithium and not livermorium. This revises our rule of thumb above:

The lightest/longest-known/most familiar element wins.

To return to the question at the top of the post, I’m not aware of a satisfying set of mnemonics for chemical element symbols. But there are some heuristics. Generally the elements that are the lightest, most familiar, and have been known the longest get the simpler names. Maybe you can remember, for example, that berkelium must be Bk because B, Be, and Br were already taken by the time berkelium was discovered.

After using this heuristic, you could apply more brute-force mnemonic techniques for whenever the heuristic doesn’t work. (Whenever it doesn’t work for you: mnemonics are very personal.) For example, you might imagine a registered nurse (an RN) spraying the insecticide Raid on a fish, fish being a Major system encoding of the number 86, the atomic number of radon.

Related posts

[1] Chemical elements named after scientists, planets, and laboratories appear toward the end of the table and are recent discoveries.

Largest known compositorial prime

I ran across a blog post here that said a new record has been set for the largest compositorial prime. [1]

OK, so what is a compositorial prime? It is a prime number of the form

n! / n# + 1

where n# denotes n primorial, the product of the prime numbers no greater than n.

The newly discovered prime is

N = 751882!/751882# + 1

It was described in the article cited above as

751882!/751879# + 1,

but the two numbers are the same because there are no primes greater than 751879 and less than 751882, i.e.

751879# = 751882#.
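The same thing happens on a small scale: since there are no primes greater than 7 and no greater than 10, we have 7# = 10#. A sketch of a primorial function, written here with sympy rather than any built-in, shows this:

```python
from math import prod
from sympy import primerange

def primorial(n):
    # product of all primes no greater than n
    return prod(primerange(2, n + 1))

print(primorial(7), primorial(10))  # both equal 2*3*5*7 = 210
```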

About how large is N? We can calculate the log of the numerator easily enough:

>>> import scipy.special
>>> scipy.special.loggamma(751883)
9421340.780760147

However, the denominator is harder to compute. According to OEIS we have

n# = exp((1 + o(1)) n)

which would give us the estimate

log(751882#) ≈ 751882.

So

log10(N) = log(N) / log(10) ≈ (9421340 − 751882) / log(10) ≈ 3765097.
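This back-of-the-envelope calculation can be done entirely with the Python standard library; here is a sketch.

```python
from math import lgamma, log

# log N ≈ log 751882! − log 751882#, using the estimate log(751882#) ≈ 751882
log10_N = (lgamma(751883) - 751882) / log(10)
print(log10_N)  # roughly 3.765 million
```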

According to this page,

log10(N) = 3765620.3395779

and so our approximation above was good to four figures.

So N has between 3 and 4 million digits, making it much smaller than the largest known prime, which has roughly 41 million digits. Overall, N is the 110th largest known prime.

 

[1] I misread the post at first and thought it said there was a new prime record (skipping over the “compositorial” part) and was surprised because the number is not a Mersenne number. For a long time now the largest known prime has been a Mersenne prime because there is a special algorithm for testing whether Mersenne numbers are prime, one that is much more efficient than testing numbers in general.

 

log2(3) and log2(5)

AlmostSure on X pointed out that

log2 3 ≈ 19/12,

an approximation that’s pretty good relative to the size of the denominator. To get an approximation that’s as accurate or better requires a larger denominator for log2 5.

log2 5 ≈ 65/28

The above observations are correct, but are they indicative of a more general pattern? Is log2 3 easier to approximate than log2 5 using rational numbers? There are theoretical ways to quantify this—irrationality measures—but they’re hard to compute.

If you look at the series of approximations for both numbers, based on continued fraction convergents, the nth convergent for log2 5 is more accurate than the nth convergent for log2 3, at least for the first 16 terms. After that I ran out of floating point precision and wasn’t sufficiently interested to resort to extended precision.
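Here is a sketch of how one might generate the convergents using only the standard library. Float precision limits how many terms are reliable, as noted above.

```python
from fractions import Fraction
from math import log2

def convergents(x, n):
    # first n continued fraction convergents of x
    a = int(x)
    h0, h1 = 1, a   # numerator recurrence
    k0, k1 = 0, 1   # denominator recurrence
    cs = [Fraction(h1, k1)]
    x -= a
    for _ in range(n - 1):
        x = 1/x
        a = int(x)
        x -= a
        h0, h1 = h1, a*h1 + h0
        k0, k1 = k1, a*k1 + k0
        cs.append(Fraction(h1, k1))
    return cs

print(convergents(log2(3), 5))  # ends with 19/12
print(convergents(log2(5), 3))  # ends with 65/28
```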

Admittedly this is a non-standard way to evaluate approximation error. Typically you look at the approximation error relative to the size of the denominator, not relative to the index of the convergents.

Here’s a more conventional comparison, plotting the log of approximation error against the log of the denominators.

Continued fraction posts

In-shuffles and out-shuffles

The previous post talked about doing perfect shuffles: divide a deck in half, and alternately let one card from each half fall.

It matters which half lets a card fall first. If the top half’s bottom card falls first, this is called an in-shuffle. If the bottom half’s bottom card falls first, it’s called an out-shuffle.

With an out-shuffle, the top and bottom cards don’t move. Presumably it’s called an out-shuffle because the outside cards remain in place.

An out-shuffle amounts to an in-shuffle of the inner cards, i.e. the rest of the deck not including the top and bottom card.

The previous post had a Python function for doing an in-shuffle. Here we generalize the function to do either an in-shuffle or an out-shuffle. We also get rid of the list comprehension, making the code longer but easier to understand.

def shuffle2(deck, inside=True):
    n = len(deck)
    top = deck[: n//2]
    bottom = deck[n//2 :]
    if inside:
        first, second = bottom, top
    else:
        first, second = top, bottom
    newdeck = []
    for p in zip(first, second):
        newdeck.extend(p)
    return newdeck

Let’s use this code to demonstrate that an out-shuffle amounts to an in-shuffle of the inner cards.

deck = list(range(10))
d1 = shuffle2(deck, False) 
d2 = [deck[0]] + shuffle2(deck[1:9], True) + [deck[9]]
print(d1)
print(d2)

Both print statements produce [0, 5, 1, 6, 2, 7, 3, 8, 4, 9].

I said in the previous post that k perfect in-shuffles will restore the order of a deck of n cards if

2ᵏ ≡ 1 (mod n + 1).

It follows that k perfect out-shuffles will restore the order of a deck of n cards if

2ᵏ ≡ 1 (mod n − 1)

since an out-shuffle of n cards is essentially an in-shuffle of the n − 2 cards in the middle.

So, for example, it only takes 8 out-shuffles to return a deck of 52 cards to its original order. In the previous post we said it takes 52 in-shuffles, so it takes a lot fewer out-shuffles than in-shuffles.

It’s plausible to conjecture that it takes fewer out-shuffles than in-shuffles to return a deck to its initial order, since the former leaves the two outside cards in place. But that’s not always true. It’s true for a deck of 52 cards, but not for a deck of 14, for example. For a deck of 14 cards, it takes 4 in-shuffles or 12 out-shuffles to restore the deck.
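The minimum counts in both cases are multiplicative orders of 2, which sympy can compute directly. A sketch, with function names of my own:

```python
from sympy import n_order  # multiplicative order of a mod n

def in_shuffles(n):
    # smallest k with 2**k ≡ 1 (mod n + 1)
    return n_order(2, n + 1)

def out_shuffles(n):
    # smallest k with 2**k ≡ 1 (mod n - 1)
    return n_order(2, n - 1)

print(in_shuffles(52), out_shuffles(52))  # 52 8
print(in_shuffles(14), out_shuffles(14))  # 4 12
```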

Perfect and imperfect shuffles

Take a deck of cards and cut it in half, placing the top half of the deck in one hand and the bottom half in the other. Now bend the stack of cards in each hand and let cards alternately fall from each hand. This is called a riffle shuffle.

Random shuffles

Persi Diaconis proved that it takes seven shuffles to fully randomize a deck of 52 cards. He studied videos of people shuffling cards in order to construct a realistic model of the shuffling process.

Shuffling randomizes a deck of cards due to imperfections in the process. You may not cut the deck exactly in half, and you don’t exactly interleave the two halves of the deck. Maybe one card falls from your left hand, then two from your right, etc.

Diaconis modeled the process with a probability distribution on how many cards are likely to fall each time. And because his model was realistic, after seven shuffles a deck really is well randomized.

Perfect shuffles

Now suppose we take the imperfection out of shuffling. We do cut the deck of cards exactly in half each time, and we let exactly one card fall from each half each time. And to be specific, let’s say the first card will always fall from the top half of the deck. That is, we do an in-shuffle. (See the next post for a discussion of in-shuffles and out-shuffles.) A perfect shuffle does not randomize a deck because it’s a deterministic permutation.

To illustrate a perfect in-shuffle, suppose you start with a deck of these six cards.

A23456

Then you divide the deck into two halves.

A23 456

Then after the shuffle you have the following.

4A5263

Incidentally, I created the images above using a font that included glyphs for the Unicode characters for playing cards. More on that here. The font produced black-and-white images, so I edited the output in GIMP to turn things red that should be red.

Coming full circle

If you do enough perfect shuffles, the deck returns to its original order. This could be the basis for a magic trick, if the magician has the skill to repeatedly perform a perfect shuffle.

Performing k perfect in-shuffles will restore the order of a deck of n cards if

2ᵏ ≡ 1 (mod n + 1).

So, for example, after 52 in-shuffles, a deck of 52 cards returns to its original order. We can see this from a quick calculation at the Python REPL:

>>> 2**52 % 53
1

With slightly more work we can show that fewer than 52 shuffles won’t do.

>>> for k in range(1, 53):
...     if 2**k % 53 == 1: print(k)
...
52

The minimum number of shuffles is not always the same as the size of the deck. For example, it takes 4 shuffles to restore the order of a deck of 14 cards.

>>> 2**4 % 15
1

Shuffle code

Here’s a function to perform a perfect in-shuffle.

def shuffle(deck):
    n = len(deck)
    return [item for pair in zip(deck[n//2 :], deck[:n//2]) for item in pair]

With this you can confirm the results above. For example,

n = 14
k = 4
deck = list(range(n))
for _ in range(k):
    deck = shuffle(deck)
print(deck)

This prints 0, 1, 2, …, 13 as expected.

Related posts

Knight’s tour with fewest obtuse angles

Donald Knuth gives a public lecture each year around Christmas. This year was his 29th Christmas lecture, Adventures with Knight’s Tours.

I reproduced one of the images from his lecture.

Knight's tour with the minimum number of obtuse angles

This is the knight’s tour with the minimum number of obtuse angles, marked with red dots. The solution is unique, up to rotations and reflections.

Knuth said he thought this was one of the most beautiful knight’s tours. He discusses this tour about 44 minutes into the video here.

More knight’s tour posts

The center of the earth is not straight down

If the earth were a perfect sphere, “down” would be the direction to the center of the earth, wherever you stand. But because our planet is a bit flattened at the poles, a line perpendicular to the surface and a line to the center of the earth are not the same. They’re nearly the same because the earth is nearly a sphere, but not exactly, unless you’re at the equator or at one of the poles. Sometimes the difference matters and sometimes it does not.

From a given point on the earth’s surface, draw two lines: one straight down (i.e. perpendicular to the surface) and one straight to the center of the earth. The angle φ that the former makes with the equatorial plane is geographic latitude. The angle θ that the latter makes with the equatorial plane is geocentric latitude.

For illustration we will draw an ellipse that is far more eccentric than a polar cross-section of the earth.

At first it may not be clear why geographic latitude is defined the way it is; geocentric latitude is conceptually simpler. But geographic latitude is easier to measure: a plumb bob will show you which direction is straight down.

There may be some slight variation between the direction of a plumb bob and a perpendicular to the earth’s surface due to variations in surface gravity. However, the deviations due to gravity are a couple orders of magnitude smaller than the differences between geographic and geocentric latitude.

Conversion formulas

The conversion between the two latitudes is as follows.

\begin{align*} \theta &= \text{atan2}((1 - e^2)\sin\varphi, \cos\varphi) \\ \varphi &= \text{atan2}(\sin\theta, (1 - e^2)\cos\theta) \end{align*}

Here e is eccentricity. The equations above work for any ellipsoid, but for earth in particular e² = 0.00669438.

The function atan2(y, x) returns an angle in the same quadrant as the point (x, y) whose tangent is y/x. [1]

As a quick sanity check on the equations, note that when eccentricity e is zero, i.e. in the case of a circle, φ = θ. Also, if φ = 0 then θ = φ for all eccentricity values.
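Here is a sketch of the conversion formulas in Python, working in degrees. The function names are mine.

```python
from math import atan2, degrees, radians, sin, cos

E2 = 0.00669438  # eccentricity squared for the earth

def geocentric(phi):
    # geographic latitude phi -> geocentric latitude theta, in degrees
    p = radians(phi)
    return degrees(atan2((1 - E2)*sin(p), cos(p)))

def geographic(theta):
    # geocentric latitude theta -> geographic latitude phi, in degrees
    t = radians(theta)
    return degrees(atan2(sin(t), (1 - E2)*cos(t)))

print(geocentric(45))              # slightly less than 45
print(geographic(geocentric(30)))  # round trip gives back 30
```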

Next we give a proof of the equations above.

Proof

We can parameterize an ellipse with semi-major axis a and semi-minor axis b by

(x(t), y(t)) = (a \cos t, b \sin t)

The slope at a point (x(t), y(t)) is the ratio

\frac{y^\prime(t)}{x^\prime(t)} = \frac{b \cos t}{-a \sin t}

and so the slope of a line perpendicular to the tangent, i.e. tan φ, is

\tan \varphi = \frac{a \sin t}{b \cos t} = \frac{a}{b} \tan t

Now

\tan \theta = \frac{b \sin t}{a \cos t} = \frac{b}{a} \tan t

and so

\begin{align*} \tan \varphi &= \frac{a}{b} \tan t \\ &= \frac{a}{b} \left( \frac{a}{b} \tan \theta \right) \\ &= \frac{a^2}{b^2} \tan \theta \\ &= \frac{1}{1 - e^2} \tan \theta \end{align*}

where e² = 1 − b²/a² is the eccentricity of the ellipse. Therefore

(1 - e^2) \tan \varphi = \tan \theta

and the equations at the top of the post follow.

Difference

For the earth’s shape, e² = 0.006694 per WGS84. For small eccentricities, the difference between geographic and geocentric latitude is approximately symmetric around 45°.

But for larger values of eccentricity the asymmetry becomes more pronounced.

Related posts

[1] There are a couple complications with programming language implementations of atan2. Some call the function arctan2 and some reverse the order of the arguments. More on that here.

Interesting categories are big

One of the things I found off-putting about category theory when I was first exposed to it was its reliance on the notion of “collections” that are not sets. That seemed to place the entire theory on dubious foundations with paradoxes looming around every corner.

It turns out you can mostly ignore such issues in application. You can, for example, talk about the forgetful functor that maps a group to the set of its elements, ignoring the group structure, without having to think deeply about the collection of all sets, which Russell’s paradox tells us cannot itself be a set.

And yet issues of cardinality are not entirely avoidable. There is a theorem [1] that says in effect that category theory would be uninteresting without collections too large to be sets.

Every category C with a set of arrows is isomorphic to one in which the objects are sets and the arrows are functions.

If the collection of arrows (morphisms) between objects in C is so small as to be a set, then C is isomorphic to a subcategory of the category of sets. As Awodey explains, “the only special properties such categories can possess are ones that are categorically irrelevant, such as features of the objects that do not affect the arrows in any way.”

Most categories of interest have too many objects to be a set, and even more morphisms than objects.

Related posts

[1] Category Theory by Steve Awodey. Theorem 1.6.