Text case changes the size of QR codes

Let’s make a QR code out of a sentence two ways: mixed case and upper case. We’ll use Python with the qrcode library.

>>> import qrcode
>>> s = "The quick brown fox jumps over the lazy dog."
>>> qrcode.make(s).save("mixed.png")
>>> qrcode.make(s.upper()).save("upper.png")

Here are the mixed case and upper case QR codes.

The QR code creation algorithm interprets the mixed case sentence as binary data but it interprets the upper case sentence as alphanumeric data.

Alphanumeric data, in the context of QR codes, comes from the following alphabet of 45 characters:

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:

Since 45² = 2025 < 2048 = 211 two alphanumeric characters can be encoded in 11 bits. If text contains a single character outside this alphabet, such as a lower case letter, then the text is encoded as ISO/IEC 8859-1 using 8 bits per character.

Switching from mixed-case text to upper case text reduces the bits per character from 8 to 5.5, and so we should expect the resulting QR code to require about 30% fewer pixels. In the example above we go from a 33 × 33 grid down to a 29 × 29 grid, from 1089 pixels to 841.

Application to Bitcoin addressess

Bech32 encoding uses an alphabet of 32 characters while Base58 encoding uses an alphabet of 58 characters, and so the former needs about 17% more characters to represent the same data. But Bech32 uses a monocase alphabet, and base 58 does not, and so Bech32 encoding requires fewer QR code pixels to represent the same data as Base58 encoding.

(Bech32 encoding uses a lower case alphabet, but the letters are converted to upper case before creating QR codes.)

Related posts

An ancient generalization of the Pythagorean theorem

Apollonius of Perga (c. 262 BC – c. 190 BC) discovered a theorem that generalizes the Pythagorean theorem but isn’t nearly as well known.

Let ABC be a general triangle, and let D be the midpoint of the segment AB. Let a be the length of the side opposite A and b the length of the side opposite B. Let m be the length of AD and h the length of the mediant, the line CD.

Apollonius’s theorem says

a² + b² = 2(m² + h²).

To see that this is a generalization of the Pythagorean theorem, apply Apollonius’ theorem to an isosceles triangle. Now ab and ACD is a right triangle.

Apollonius’ theorem says

2b² = 2m² + 2h²

which is the Pythagorean theorem applied to ACD with each term doubled.

Mentally compute logs base 2

The previous post required computing

\frac{128}{\log_2 5}

After writing the post, I thought about how you would mentally approximate log2 5. The most crude approximation would round 5 down to 4 and use log2 4 = 2 to approximate log2 5. That would be good enough for an order of magnitude guess, but we can do much better without too much more work.

Simple approximation

I’ve written before about the approximation

\log_2 x \approx 3\frac{x - 1}{x + 1}

for x between 1/√2 and √2. We can write 5 as 4 (5/4) and so

\begin{align*} \log_2 5 &= \log_2 4 (5/4) \\ &= \log_2 4 + \log_2 (5/4) \\ &\approx 2 + 3\frac{5/4 - 1}{5/4 + 1} \\ &= 2 + 3 \frac{1/4}{9/4} \\ &= 7/3 \end{align*}

How accurate is this? The exact value of log2 5 is 2.3219…. Approximating this number by 7/3 is much better than approximating it by just 2, reducing the relative error from 16% down to 0.5%.

Origin story

Where did the approximation

\log_2 x \approx 3\frac{x - 1}{x + 1}

come from?

I don’t remember where I found it. I wouldn’t be surprised if it was from something Ron Doerfler wrote. But how might someone have derived it?

You’d like an approximation that works on the interval from 1/√2 to √2 because you can always multiply or divide by a power of 2 to reduce the problem to this interval. Rational approximations are the usual way to approximate functions over an interval [1], and for mental calculation you’d want to use the lowest order possible, i.e. degree 1 in the numerator and denominator.

Here’s how we could ask Mathematica to find a rational approximation for us [2].

Simplify[
    N[
        ResourceFunction["EconomizedRationalApproximation"][
            Log[2, x], { x, {1/Sqrt[2], Sqrt[2]}, 1, 1}]]]

This returns

(2.97035 x − 2.97155) / (1.04593 + x)

which we round off to

(3 x − 3) / (1 + x).

The N function turns a symbolic result into one with floating point numbers. Without this call we get a complicated expression involving square roots and logs of rational numbers.

The Simplify function returns an algebraically equivalent but simpler expression for its argument. In our case the function finishes the calculation by removing some parentheses.

Related posts

[1] Power series approximations are easier to compute, but power series approximations don’t give the best accuracy over an interval. Power series are excellent at the point where they’re centered, but degrade as you move away from the center. Rational approximations spread the error more uniformly.

[2] I first tried using Mathematica’s MiniMaxApproximation function, but it ran into numerical problems, so I switched to EconomizedRationalApproximation.

Freshman’s dream

The “Freshman’s dream” is the statement

(x + y)p = xp + yp

It’s not true in general, but it is true mod p if p is a prime. It’s a cute result, but it’s also useful in applications, such as finite field computations in cryptography.

Here’s a demonstration of the Freshman’s dream in Python.

>>> p = 5
>>> [((x + y)**p - x**p - y**p) % p for x in range(p) for y in range(p)]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Here’s an example using a prime too large for verify the results by looking at the output.

>>> import numpy as np
>>> p = 103
>>> v = [((x + y)**p - x**p - y**p) % p for x in range(p) for y in range(p)]
>>> np.all( np.array(v) == 0 )
True

You can use the same code to show that the Freshman’s dream is not true in general if p is not a prime, and it’s not true in general if p is a prime but the exponent is less than p.

987654321 / 123456789

I recently saw someone post [1] that 987654321/123456789 is very nearly 8, specifically 8.0000000729.

I wondered whether there’s anything distinct about base 10 in this. For example, would the ratio of 54321six and 12345six be close to an integer? The ratio is 4.00268, which is pretty close to 4.

What about a larger base? Let’s try base 16. The expression

0xFEDCBA987654321 / 0x123456789ABCDEF

in Python returns 14. The exact ratio is not 14, but it’s as close to 14 as a standard floating point number can be.

For a base b, let denom(b) to be the number formed by concatenating all the digits in ascending order and let num(b) be the number formed by concatenating all the digits in descending order.

\begin{align*} \text{num}(b) &= \sum_{k=1}^{b-1} kb^{k-1} \\ \text{denom}(b) &= \sum_{k=1}^{b-1} (b-k)b^{k-1} \end{align*}

Then for b > 2 we have

\frac{\text{num}(b)}{\text{denom}(b)} = b - 2 + \frac{b-1}{\text{denom}(b)}

The following Python code demonstrates [2] that this is true for b up to 1000.

num = lambda b: sum([k*b**(k-1) for k in range(1, b)])
denom = lambda b: sum([(b-k)*b**(k-1) for k in range(1, b)])

for b in range(3, 1001):
    n, d = num(b), denom(b)
    assert(n // d == b-2)
    assert(n % d == b-1)

So for any base the ratio is nearly an integer, namely b − 2, and the fractional part is roughly 1/bb−2.

When b = 16, as in the example above, the result is approximately

14 + 16−14 = 8 + 4 + 2 + 2−56

which would take 60 bits to represent exactly, but a floating point fraction only has 53 bits. That’s why our calculation returned exactly 14 with no fractional part.

 

[1] I saw @ColinTheMathmo post it on Mastodon. He said he saw it on Fermat’s Library somewhere. I assume it’s a very old observation and that the analysis I did above has been done many times before.

[2] Why include a script rather than a proof? One reason is that the proof is straight-forward but tedious and the script is compact.

A more general reason that I give computational demonstrations of theorems is that programs are complementary to proofs. Programs and proofs are both subject to bugs, but they’re not likely to have the same bugs. And because programs made details explicit by necessity, a program might fill in gaps that aren’t sufficiently spelled out in a proof.

How to make a Smith chart

The Smith chart from electrical engineering is the image of a Cartesian grid under the function

f(z) = (z − 1)/(z + 1).

More specifically, it’s the image of a grid in the right half-plane.

Smith chart

This post will derive the basic mathematical properties of this graph but will not go into the applications. Said another way, I’ll explain how to make a Smith chart, not how to use one.

We will use z to denote points in the right half-plane and w to denote the image of these points under f. We will speak of lines in the z plane and the circles they correspond to in the w plane.

Möbius transformations

Our function f is a special case of a Möbius transformation. There is a theorem that says Möbius transformation map generalized circles to generalized circles. Here a generalized circle means a circle or a line; you can think of a line as a circle with infinite radius. We’re going to get a lot of mileage out of that theorem.

Image of the imaginary axis

The function f maps the imaginary axis in the z plane to the unit circle in the w plane. We can prove this using the theorem above. The imaginary axis is a line, so it’s image is either a line or a circle. We can take three points on the imaginary axis in the z plane and see where they go.

When we pick z equal to 0, i, and −i from the imaginary axis we get w values of −1, i, and −i. These three w values do not line on a line, so the image of the imaginary axis must be a circle. Furthermore, three points uniquely determine a circle, so the image of the imaginary axis is the circle containing −1, i, and −i, i.e. the unit circle.

Image of the right half-plane

The imaginary axis is the boundary of the right half-plane. Since it is mapped to the unit circle, the right half-plane is either mapped to the interior of the unit circle or the exterior of the unit circle. The point z = 1 goes to w = 0, and so the right half-plane is mapped inside the unit circle.

Images of vertical lines

Let’s think about what happens to vertical lines in the z plane, lines with constant positive real part. The images of these lines in the w plane must be either lines or circles. And since the right-half plane gets mapped inside the unit circle, these lines must get mapped to circles.

We can say a little more. All lines contain the point ∞, and f(∞) = 1, so the image of every vertical line in the z plane is a circle in the w plane, inside the unit circle and tangent to the unit circle at w = 1. (Tossing around ∞ is a bit informal, but it’s easy to make rigorous.)

The vertical lines in the z plane

map to tangent circles in the w plane.

Images of horizontal lines

Next, let’s think about horizontal lines in the z plane, lines with constant imaginary part. The image of these lines is either a line or a circle. Which is it? The image of a line is a line if it contains ∞, otherwise it’s a circle. Now f(z) = ∞ if and only if z = −1, and so the image of the real axis is a line, but the image of every other horizontal line is a circle.

Since f(∞) = 1, the image of every horizontal line passes through 1, just as the images of all the vertical lines passes through 1.

Since horizontal lines extend past the right half-plane, the image circles extend past the unit circle. The part of the line with positive real part gets mapped inside the unit circle, and the part of the line with negative real part gets mapped outside the unit circle. In particular, the image of the positive real axis is the interval [−1, 1].

Möbius transformations are conformal maps, and so they preserve angles of intersection. Since horizontal lines are perpendicular to vertical lines, the circles that are the images of the horizontal lines meet the circles that are the images of vertical lines at right angles.

The horizontal rays in the z plane

become partial circles in the w plane.

If we were to look at horizontal lines rather than rays, i.e. if we extended the lines into the left half-plane, the images in the w plane would be full circles.

Now let’s put our images together. The grid

in the z plane becomes the following in the w plane.

Spacing

An evenly spaced grid in the z plane becomes a very unevenly spaced graph in the w plane. Things are crowded on the right hand side and sparse on the left. A useable Smith chart needs to be roughly evenly filled in, which means it has to be the image of an unevenly filled in grid in the z plane. For example, you’d need more vertical lines in the z plane with small real values than with large real values.

I address the issue of spacing in the next post.

Generating random points in Colorado

The previous post looked at how to generate random points on a sphere, generating spherical coordinates directly. I wanted to include a demonstration that this generates points with the same distribution as the more customary way of generating points on a sphere, and then decided the demonstration should be its own post.

I’ll generate random points in Colorado using both methods and show that both put the same proportion of points in the state, within a margin that we’d expect from random points.

I thought about using Wyoming, because I’ve been there recently, or Kansas, because I’m going there soon. Although Wyoming and Kansas are essentially rectangles, there are minor complications to the coordinates of the corners. But Colorado is very simple: it’s bounded by latitudes 37°N and 41°N and by longitudes 102.05°W and 109.05°W.

Here are the two ways of generating points on a sphere.

from numpy import *

# Random point on unit sphere in spherical coordinates
def random_spherical():
    rho = 1
    theta = 2*pi*random.random()
    phi = arccos(1 - 2*random.random())
    return (rho, theta, phi)

# Random point on unit sphere in Cartesian coordinates
def random_cartesian():
    v = random.normal(size=3)
    return v / linalg.norm(v)

We’ll need to convert Cartesian coordinates to spherical coordinates.

def cartesian_to_spherical(v):
    rho = linalg.norm(v)
    theta = arctan2(v[1], v[0])
    phi = arccos(v[2] / rho)
    return (rho, theta, phi)

And we’ll need to convert spherical coordinates to latitude and longitude in degrees. Note that latitude is π/2 minus the polar angle.

def spherical_to_lat_long(rho, theta, phi):
    lat = rad2deg(pi/2 - phi)
    long = rad2deg(theta)
    return (lat, long)

And finally, we need a way to test whether a (lat, long) pair is in Colorado.

def in_colorado(lat, long):
    return (37 <= lat < 41) and (102.05 <= long <= 109.05)

Now we’ll generate a million random points each way and count how many land in Colorado.

random.seed(20251022)
N = 1_000_000

count = 0
for _ in range(N):
    rho, theta, phi = random_spherical()
    lat, long = spherical_to_lat_long(rho, theta, phi)
    if in_colorado(lat, long):
        count += 1
print(count / N)

count = 0
for _ in range(N):
    v = random_cartesian()
    rho, theta, phi = cartesian_to_spherical(v)
    lat, long = spherical_to_lat_long(rho, theta, phi)
    if in_colorado(lat, long):
        count += 1
print(count / N) 

This prints 0.000534 and 0.000536.

Efficiency

Note that this isn’t a very efficient way to generate points in Colorado since only about 5 out of every 10,000 points land in the state. If you wanted to generate points only in Colorado, you could randomly generate latitudes between 37° and 41°N and randomly generate longitudes 102.05° and 109.05°. This might be good enough in practice, depending on the application, but it’s not quite right because Colorado is not a Euclidean rectangle but a spherical rectangle.

A more accurate approach would be to randomly generate longitudes 102.05° and 109.05° as above, but generate latitudes as the arccos of points uniformly generated between cos(37°) and cos(41°).

 

Random spherical coordinates

The topic of generating random points on a unit sphere has come up several times here. The standard method using normal random samples generates points (x, y, z) in Cartesian coordinates.

If you wanted points in spherical coordinates, you could first generate points in Cartesian coordinates, then convert the points to spherical coordinates. But it would seem more natural, and possibly more efficient, to directly generate spherical coordinates (ρ, θ, ϕ). That’s what we’ll do in this post.

Unfortunately there are several conventions for spherical coordinates, as described here, so we should start by saying we’re using the math convention that (ρ, θ, ϕ) means (radius, longitude, polar angle). The polar angle ϕ is 0 at the north pole and π at the south pole.

Generating the first two spherical coordinate components is trivial. On a unit sphere, ρ = 1, and θ we generate uniformly on [0, 2φ]. The polar angle ϕ requires more thought. If we simply generated φ uniformly from [0, π] we’d put too many points near the poles and too few points near the equator.

As I wrote about a few days ago, the x, y, and z coordinates of points on a sphere are uniformly distributed. In particular, the z coordinate is uniformly distributed, and so we need cos ϕ to be uniformly distributed, not ϕ itself. We can accomplish this by generating uniform points from [−1, 1] and taking the inverse cosine.

Here’s Python code summarizing the algorithm for generating random points on a sphere in spherical coordinates.

from numpy import random, pi, arccos

def random_pt() # in spherical coordinates
    rho = 1
    theta = 2*pi*random.random()
    phi = arccos(1 - 2*random.random())
    return (rho, theta, phi)

See the next post for a demonstration.

ODE to Fisher’s transform

I was calculating a correlation coefficient this afternoon and ran into something interesting.

Suppose you have two uncorrelated random variables X and Y. If you draw, say, a thousand samples each from X and Y and compute Pearson’s correlation coefficient, you almost certainly will not get 0, though you very likely will get a small number.

How do you find a confidence interval around a correlation coefficient?

Sample correlation coefficient values do not follow a normally distribution, though the distribution is approximately normal if the population correlation is small. The distribution gets further from normal as the correlation gets close to 1 or −1.

Enter Fisher’s transformation. If you run the sample correlation coefficient r through the function

½ log((1 + r)/(1 − r)) = arctanh(r)

you get something that has a distribution closer to the normal distribution. You find a confidence interval for the transformed variable, then undo the transformation.

Now where did the Fisher transform come from?

I don’t know whether this was Fisher’s derivation, but Hotelling came up the following derivation. Assume you apply a transform G(r) to the correlation coefficient. Write an asymptotic expansion for the kurtosis κ3 and set the first term equal to zero. This leads to the ordinary differential equation

3(1 − r³) G″(r) − 6r G′(r) = 0

which has the solution G(r) = arctanh(r).

I found this interesting because I’ve worked with differential equations and with statistics, but I’ve rarely seen them overlap.

 

Quality metrics

Sometimes you’ll hear that a process has so many nines of reliability or that the error rate is so many sigmas. A few years ago I wrote a post on converting between nines and sigmas. See that post for details, approximations, etc.

Here I’d like to post a new graph that I believe is an improvement on the graph in the original post.

The grid lines make it easier to see how to convert between nines and sigmas. For example, 5 nines corresponds to a little more than 4 sigmas.

It’s curious that 3 nines is approximately 3 sigmas, and 9 nines is very nearly 6 sigmas. There are no more points on the red curve with near integer coordinates until you go out to 72 nines, which is nearly 18 sigmas.