Bessel function crossings

The previous post looked at the angles that graphs make when they cross. For example, sin(x) and cos(x) always cross with the same angle. The same holds for sin(kx) and cos(kx) since the k simply rescales the x-axis.

The post ended by wondering about functions analogous to sine and cosine, such as Bessel functions. This post will look at that question in more detail. Specifically, we’ll look at the functions Jν and Yν.

Because these two Bessel functions satisfy the same second order linear homogeneous differential equation, the Sturm separation theorem says that their zeros are interlaced: between each pair of consecutive zeros of Jν is exactly one zero of Yν, and between each pair of consecutive zeros of Yν there is exactly one zero of Jν.
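
As a quick sanity check, we can verify the interlacing numerically with SciPy’s jn_zeros and yn_zeros. This is a minimal sketch, relying on the fact that the first zero of Yν precedes the first zero of Jν.

    from scipy.special import jn_zeros, yn_zeros

    n, N = 3, 100
    jz = jn_zeros(n, N) # first N zeros of J_n
    yz = yn_zeros(n, N) # first N zeros of Y_n
    # The first zero of Y_n precedes the first zero of J_n,
    # so interlacing means y_1 < j_1 < y_2 < j_2 < ...
    assert all(yz[i] < jz[i] < yz[i+1] for i in range(N-1))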

Plotting Bessel functions J_3 and Y_3

In the following Python code, we find zeros of Jν, then look in between for places where Jν and Yν cross. Next we find the angle the two curves make at each intersection and plot the angles.

    from scipy.special import jn_zeros, jv, yv
    from scipy.optimize import bisect
    from numpy import empty, arccos
    import matplotlib.pyplot as plt
    
    n = 3 # bessel function order
    N = 100 # number of zeros
    
    z = jn_zeros(n, N) # Zeros of J_n
    crossings = empty(N-1)
    
    # J_n - Y_n changes sign between consecutive zeros of J_n,
    # so bisection finds a crossing in each interval
    f = lambda x: jv(n,x) - yv(n,x)
    for i in range(N-1):
        crossings[i] = bisect(f, z[i], z[i+1])
    
    def angle(n, x):
        # Derivatives of J_nu and Y_nu
        dj = 0.5*(jv(n-1,x) - jv(n+1,x))
        dy = 0.5*(yv(n-1,x) - yv(n+1,x))
        
        # angle between curves with slopes dj and dy:
        # cos(theta) = (1 + dj*dy)/sqrt((1 + dj^2)(1 + dy^2))
        top = 1 + dj*dy
        bottom = ((1 + dj**2)*(1 + dy**2))**0.5
        return arccos(top/bottom)
        
    y = angle(n, crossings)
    plt.plot(y)
    plt.xlabel("Crossing number")
    plt.ylabel("Angle in radians")
    plt.show()

This shows that the angles steadily decrease, apparently quadratically.

Angles of crossing of J_3 and Y_3

This quadratic behavior is what we should expect from the asymptotics of Jν and Yν: For large arguments they act like shifted and rescaled versions of sin(x)/√x. So if we looked at √xJν and √xYν rather than Jν and Yν we’d expect the angles to reach some positive asymptote, and they do, as shown below.
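
Here’s one way to carry out that computation, a sketch that reuses the crossings found above (valid since multiplying by √x doesn’t move the crossings) and differentiates √x Jν and √x Yν by the product rule.

    def angle_scaled(n, x):
        # derivatives of J_n and Y_n from the standard recurrence
        dj = 0.5*(jv(n-1,x) - jv(n+1,x))
        dy = 0.5*(yv(n-1,x) - yv(n+1,x))
        s = x**0.5
        # product rule: d/dx sqrt(x) f(x) = f(x)/(2 sqrt(x)) + sqrt(x) f'(x)
        mj = jv(n,x)/(2*s) + s*dj
        my = yv(n,x)/(2*s) + s*dy
        top = 1 + mj*my
        bottom = ((1 + mj**2)*(1 + my**2))**0.5
        return arccos(top/bottom)

    plt.plot(angle_scaled(n, crossings))
    plt.show()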

Angles of crossing of √x J_3 and √x Y_3


Squircle perimeter and the isoperimetric problem

If you have a fixed length of rope and you want to enclose the most area inside the rope, make it into a circle. This is the solution to the so-called isoperimetric problem.

Dido’s problem is similar. If one side of your bounded area is given by a straight line, make your rope into a semi-circle.

This post looks at a similar problem for a squircle. Peter Panholzer mentioned the problem of finding the squircle exponent that led to the largest area in proportion to its arclength. This sounds like the isoperimetric problem, but it’s not.

The isoperimetric problem holds perimeter constant and lets the shape enclosed vary, maximizing the area. The question here is to hold the axes constant and maximize the ratio of the area to the perimeter. Panholzer reports the maximum occurs at p = 4.39365.

Computing the perimeter

The area of a squircle can be found in closed form, and I’ve mentioned the equation a few times, for example here. The perimeter, however, cannot be found in closed form, as far as I know, except for special exponents.

We can solve for y as a function of x and find the arclength using the formula taught in calculus courses. However, the derivative of y has a singularity at x = 1. By switching to polar coordinates, we can find arclength in terms of an integrand with no singularities. We can also simplify things a little by computing the total arclength as 4 times the arclength in the first quadrant; this avoids having to take absolute values.

The following Python code computes the perimeter and the ratio of the area to the perimeter.

    from numpy import sin, cos, pi
    from scipy.integrate import quad
    from scipy.special import gamma
    
    def r(t, p):
        return (cos(t)**p + sin(t)**p)**-(1/p)
    
    def drdt(t, p):
        deriv = (cos(t)**p + sin(t)**p)**(-1-1/p)
        deriv *= cos(t)**(p-1)*sin(t) - sin(t)**(p-1)*cos(t)
        return deriv
    
    def integrand(t, p):
        return (drdt(t,p)**2 + r(t,p)**2)**0.5
    
    def arclength(p):
        integral = quad(lambda t: integrand(t,p), 0, pi/2)[0]
        return 4*integral
    
    def area(p):
        return 4*gamma(1 + 1/p)**2 / gamma(1 + 2/p)
    
    def ratio(p):
        return area(p) / arclength(p)

Basic geometry tells us the ratio is 1/2 when p = 2 and we have a circle. The ratio is also 1/2 when p = ∞, i.e. when we have a square. We can use the code above to find that the ratio exceeds 1/2 in between, reaching 0.528627 near p = 4.4, so there is at least one local maximum for values of p between 2 and ∞.
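
For example, a quick check with the functions above:

    print(ratio(2))   # 0.5 for the circle
    print(ratio(4.4)) # about 0.5286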

Here’s a plot of the ratio of area to perimeter as a function of p.

ratio of area to perimeter for squircle

The plot is quite flat for large values of p, but if we zoom in we can see that there is a local maximum near 4.4.

close up of previous graph near the maximum

When I find the maximum of the ratio function using Brent’s method (scipy.optimize.brent) I find p = 4.39365679, which agrees with the value Panholzer stated.
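
Here’s a minimal sketch of that computation. Since brent minimizes, we hand it the negative of the ratio; the bracketing interval (2, 10) is an arbitrary choice that contains the maximum.

    from scipy.optimize import brent

    # maximize ratio(p) by minimizing its negative
    p_best = brent(lambda p: -ratio(p), brack=(2, 10))
    print(p_best) # about 4.39365679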

Here’s a plot of the squircle corresponding to this value of p.

squircle with largest area to perimeter ratio

Back to the isoperimetric problem

Now why doesn’t this contradict the isoperimetric problem? Area scales quadratically but perimeter scales linearly. If you don’t hold perimeter constant, you can find a larger ratio of area to perimeter by making the perimeter larger. And that’s what we’ve done. When p = 2, we have a unit circle, with area π and perimeter 2π. When p = 4.39365679 we have area 3.750961567 and perimeter 7.09566295. If we were to take a circle with the same perimeter, it would have area 4.00660097, larger than the squircle we started with.


An infinite product challenge

Gil Kalai wrote a blog post yesterday entitled “Test Your Intuition (or knowledge, or programming skills) 36.” The challenge is to evaluate the infinite product

\prod_{p\,\, \mathrm{prime}} \frac{p^2+1}{p^2 - 1}

I imagine there’s an elegant analytical solution, but since the title suggested that programming might suffice, I decided to try a little Python. I used primerange from SymPy to generate the list of primes up to 200, and cumprod from NumPy to generate the list of partial products.

    from sympy import primerange
    from numpy import cumprod

    cumprod(
        [(p*p+1)/(p*p-1) for p in primerange(1, 200)]
    )

Apparently the product converges to 5/2, and a plot suggests that it converges very quickly. (And indeed there is an elegant analytical solution: since (p^2 + 1)/(p^2 − 1) = (1 − p^{-4})/(1 − p^{-2})^2, multiplying Euler products shows the full product equals ζ(2)²/ζ(4) = (π²/6)² / (π⁴/90) = 5/2.)

Plot of partial products

Here’s another plot to look more closely at the rate of convergence. Here we look at the difference between 5/2 and the partial products, on a log scale, for primes less than 2000.

Plot of 2.5 minus partial products, log scale

Implementing the ChaCha RNG in Python

My previous post talked about ChaCha, a cryptographically secure pseudorandom number generator (CSPRNG), and how Google is using it in a stream cipher for encryption on low-end devices. This post talks about how to implement ChaCha in pure Python.

First of all, the only reason to implement ChaCha in pure Python is to play with it. It would be more natural and more efficient to implement ChaCha in C.

RFC 8439 gives detailed, language-neutral directions for how to implement ChaCha, including test cases for intermediate results. At its core is the function that does a “quarter round” operation on four unsigned integers. This function depends on three operations:

  • addition mod 2^32, denoted +,
  • bitwise XOR, denoted ^, and
  • bit rotation by n bits, denoted <<<= n.

In C, the += operator on unsigned integers would do what the RFC denotes by +=, but in Python, where integers are signed and arbitrary-precision, we need to take remainders mod 2^32 explicitly. The Python bitwise XOR operator ^ can be used directly. We’ll write a function roll that corresponds to <<<=.

So the following line of pseudocode from the RFC

    a += b; d ^= a; d <<<= 16;

becomes

    a = (a+b) % 2**32; d = roll(d^a, 16)

in Python. One way to implement roll would be to use the bitstring library:

    from bitstring import Bits

    def roll(x, n):
        bits = Bits(uint=x, length=32)
        return (bits[n:] + bits[:n]).uint

Another approach, a little harder to understand but not needing an external library, would be

    def roll2(x, n):
        return (x << n) % (2 << 31) + (x >> (32-n))
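
A quick sanity check, assuming the bitstring package is installed, that the two implementations agree:

    from random import randrange

    for _ in range(10000):
        x, k = randrange(2**32), randrange(1, 32)
        assert roll(x, k) == roll2(x, k)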

So here’s an implementation of the ChaCha quarter round:

    def quarter_round(a, b, c, d):
        a = (a+b) % 2**32; d = roll(d^a, 16)
        c = (c+d) % 2**32; b = roll(b^c, 12)
        a = (a+b) % 2**32; d = roll(d^a,  8)
        c = (c+d) % 2**32; b = roll(b^c,  7)
        return a, b, c, d

ChaCha has a state consisting of 16 unsigned integers. A “round” of ChaCha consists of four quarter rounds, operating on four of these integers at a time. All the details are in the RFC.
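
We can check quarter_round against the test vector in Section 2.1.1 of RFC 8439:

    # Test vector from RFC 8439, Section 2.1.1
    a, b, c, d = 0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567
    assert quarter_round(a, b, c, d) == (
        0xea2a92f4, 0xcb1cf8ce, 0x4581472e, 0x5881c4bb)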

Incidentally, the inner workings of the BLAKE2 secure hash function are similar to those of ChaCha.


Computing Legendre and Jacobi symbols

In an earlier post I introduced the Legendre symbol

\left(\frac{a}{p}\right)

where a is a positive integer and p is prime. It is defined to be 0 if a is a multiple of p, 1 if a has a square root mod p, and −1 otherwise.

The Jacobi symbol is a generalization of the Legendre symbol and uses the same notation. It relaxes the requirement that p be prime and only requires that p is odd.

If m has prime factors p_i with exponents e_i, then the Jacobi symbol is defined by

\left(\frac{n}{m}\right) = \prod \left(\frac{n}{p_i} \right )^{e_i}

Note that the symbol on the left is a Jacobi symbol while the symbols on the right are Legendre symbols.

The Legendre and Jacobi symbols are not fractions, but they act in some ways like fractions, and so the notation is suggestive. They come up in applications of number theory, so it’s useful to be able to compute them.

Algorithm for computing Jacobi symbols

Since the Legendre symbol is a special case of the Jacobi symbol, we only need an algorithm for computing the latter.

In the earlier post mentioned above, I outline an algorithm for computing Legendre symbols. The code below is more explicit, and more general. It’s Python code, but it doesn’t depend on any libraries or special features of Python, so it could easily be translated to another language. The algorithm is taken from Algorithmic Number Theory by Bach and Shallit. Its execution time is O( (log a)(log n) ).

    def jacobi(a, n):
        assert(n > a > 0 and n%2 == 1)
        t = 1
        while a != 0:
            # pull out factors of 2, adjusting the sign via (2/n)
            while a % 2 == 0:
                a //= 2
                r = n % 8
                if r == 3 or r == 5:
                    t = -t
            # quadratic reciprocity
            a, n = n, a
            if a % 4 == n % 4 == 3:
                t = -t
            a %= n
        if n == 1:
            return t
        else:
            return 0

Testing the Python code

To test the code we randomly generate positive integers a and odd integers n greater than a. We compare our self-contained Jacobi symbol function to the one in SymPy.

    from random import randrange
    from sympy import jacobi_symbol

    N = 1000
    for _ in range(100):
        a = randrange(1, N)
        n = randrange(a+1, 2*N)
        if n%2 == 0:
            n += 1
            
        j1 = jacobi_symbol(a, n)
        j2 = jacobi(a, n)
        if j1 != j2:
            print(a, n, j1, j2)

This prints nothing, suggesting that we coded the algorithm correctly.


Exploring the sum-product conjecture

Quanta Magazine posted an article yesterday about the sum-product problem of Paul Erdős and Endre Szemerédi. This problem starts with a finite set of real numbers A and then considers the sizes of the sets A+A and A*A. That is, if we add every element of A to every element of A, how many distinct sums are there? If we take products instead, how many distinct products are there?

Proven results

Erdős and Szemerédi proved that there are constants c and ε > 0 such that

\max\{|A+A|, |A*A|\} \geq c|A|^{1+\epsilon}

In other words, either A+A or A*A is substantially bigger than A. Erdős and Szemerédi only proved that some positive ε exists, but they suspected ε could be chosen close to 1, i.e. that either |A+A| or |A*A| is bounded below by a fixed multiple of |A|² or nearly so. George Shakan later showed that one can take ε to be any value less than

1/3 + 5/5277 = 0.3342808…

but the conjecture remains that the upper limit on ε is 1.

Python code

The following Python code will let you explore the sum-product conjecture empirically. It randomly selects sets of size N from the non-negative integers less than R, then computes the sum and product sets using set comprehensions.

    from numpy.random import choice

    def trial(R, N):
        # R = integer range, N = sample size
        x = choice(R, N, replace=False)
        s = {a+b for a in x for b in x}
        p = {a*b for a in x for b in x}
        return (len(s), len(p))

When I first tried this code I thought it had a bug. I called trial 10 times and got the same values for |A+A| and |A*A| every time. That was because I chose R large relative to N. In that case, it is likely that every sum and every product will be unique, aside from the redundancy from commutativity. That is, if R >> N, it is likely that |A+A| and |A*A| will both equal N(N+1)/2. Things get more interesting when N is closer to R.
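
Here’s an illustration; the particular values of R and N are arbitrary choices:

    # R >> N: every sum and product is likely distinct,
    # so both counts equal N(N+1)/2 = 5050
    print(trial(10**6, 100))
    # N close to R: collisions shrink both counts
    print(trial(200, 100))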

Probability vs combinatorics

The Erdős-Szemerédi problem is a problem in combinatorics, looking for deterministic lower bounds. But it seems natural to consider a probabilistic extension. Instead of asking about lower bounds on |A+A| and |A*A| you could ask for the distribution on |A+A| and |A*A| when the sets A are drawn from some probability distribution.

If the set A is drawn from a continuous distribution, then |A+A| and |A*A| both equal N(N+1)/2 with probability 1. Only careful choices, ones that would happen randomly with probability zero, could prevent the sums and products from being unique, modulo commutativity, as in the case R >> N above.

If the set A is an arithmetic sequence then |A+A| is small and |A*A| is large, and the opposite holds if A is a geometric sequence. So it might be interesting to look at the correlation of |A+A| and |A*A| when A comes from a discrete distribution, such as choosing N integers uniformly from [1, R] when N/R is not too small.
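
Here’s a sketch of such an experiment; the parameters R = 200, N = 100, and 1000 trials are arbitrary choices, not from any reference.

    from numpy import corrcoef

    results = [trial(200, 100) for _ in range(1000)]
    sums, prods = zip(*results)
    # sample correlation of |A+A| and |A*A|
    print(corrcoef(sums, prods)[0, 1])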

Projecting Unicode to ASCII

Sometimes you need to downgrade Unicode text to more restricted ASCII text. For example, while working on my previous post, I was surprised that there didn’t appear to be an asteroid named after Poincaré. There is one, but it was listed as Poincare in my list of asteroid names.

Python module

I used the Python module unidecode to convert names to ASCII before searching, and that fixed the problem. Here’s a small example showing how the code works.

    import unidecode

    for x in ["Poincaré", "Gödel"]:
        print(x, unidecode.unidecode(x))

This produces

    Poincaré Poincare
    Gödel Godel

Installing the unidecode module also installs a command line utility by the same name. So you could, for example, pipe text to that utility.

As someone pointed out on Hacker News, this isn’t so impressive for Western languages, but if you need to project Arabic, Russian or Chinese, unidecode is close to black magic:

>>> from unidecode import unidecode
>>> unidecode("北亰")
'Bei Jing '

(Someone has said in the comments that 北亰 is a typo and should be 北京. I can’t say whether this is right, but I can say that unidecode transliterates both to “Bei Jing.”)

Projections

I titled this post “Projecting Unicode to ASCII” because this code is a projection in the mathematical sense. A projection is a function P such that for all inputs x,

P(P(x)) = P(x).

That is, applying the function twice does the same thing as applying the function once. The name comes from projection in the colloquial sense, such as projecting a three dimensional object onto a two dimensional plane. An equivalent term is to say P is idempotent. [1]

The unidecode function maps the full range of Unicode characters into the range 0x00 to 0x7F, and if you apply it to a character already in that range, the function leaves it unchanged. So the function is a projection, or you could say the function is idempotent.
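
We can verify the projection property directly:

    # applying unidecode twice gives the same result as applying it once
    for x in ["Poincaré", "Gödel", "北亰"]:
        once = unidecode.unidecode(x)
        assert unidecode.unidecode(once) == once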

Projection is such a simple condition that it hardly seems worth giving it a name. And yet it is extremely useful. A general principle in user interface design is to make something a projection if the user expects it to be a projection. Users probably don’t have the vocabulary to say “I expected this to be a projection” but they’ll be frustrated if something is almost a projection but not quite.

For example, if software has a button to convert an image from color to grayscale, it would be surprising if (accidentally) clicking the button a second time had any effect. It would be unexpected if it returned the original color image, and it would be even more unexpected if it did something else, such as keeping the image in grayscale but lowering the resolution.


[1] The term “idempotent” may be used more generally than “projection,” the latter being more common in linear algebra. Some people may think of a projection as a linear idempotent function. We’re not exactly doing linear algebra here, but people do think of portions of Unicode geometrically, speaking of “planes.”

Groups of order 2019

How many groups have 2019 elements? What are these groups?

2019 is a semiprime, i.e. the product of two primes, 3 and 673. For every semiprime s, there are either one or two distinct groups of order s.

As explained here, if s = pq with p > q, all groups of order s are isomorphic if q is not a factor of p − 1. If q does divide p − 1 then there are exactly two non-isomorphic groups of order s, one abelian and one non-abelian. In our case, p = 673 and q = 3, and 3 does divide 672, so there are two groups of order 2019. The first is easy: the cyclic group of order 2019. The second is more complex.

You could take the direct product of the cyclic groups of order 3 and 673, but that turns out to be isomorphic to the cyclic group of order 2019. But if you take the semidirect product you get the other group of order 2019, the non-abelian one.

Semidirect products

Starting with two groups G and H, the direct product G × H is the Cartesian product of G and H with multiplication defined componentwise. The semidirect product of G and H, written [1]

G \rtimes H

also starts with the Cartesian product of G and H but defines multiplication differently.

In our application, G will be the integers mod 673 with addition and H will be a three-element subgroup of the integers mod 673 with multiplication [2]. Let H be the set {1, 255, 417} with respect to multiplication [3]. Note that 1 is its own inverse and 255 and 417 are inverses of each other.
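
Following footnote [3], we can compute the elements of H directly:

    # elements of H: 5 raised to the powers 0, 672/3, and 2*672/3, mod 673
    H = sorted(pow(5, k*672//3, 673) for k in range(3))
    print(H) # [1, 255, 417]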

Product

Now the group product of (g1, h1) and (g2, h2) is defined to be

(g_1 + h_1^{-1}g_2, \; h_1 h_2)

So, for example, the product of (5, 255) and (334, 417) is (5 + 417*334, 255*417) which reduces to (645, 1) working mod 673.

(We haven’t defined the semidirect product in general, but the procedure above suffices to define the semidirect product for any two groups of prime order, and so it is sufficient to find all groups of semiprime order.)

Note that our group is non-abelian. For example, if we reverse the order of multiplication above we get (263, 1).

Identity

The identity element is just the pair consisting of the identity elements from G and H. In our case, this is (0, 1) because 0 is the additive identity and 1 is the multiplicative identity.

Inverse

The inverse of an element (g, h) is given by

(-gh, \; h^{-1}).

So, for example, the inverse of (600, 255) is (444, 417).

Python code

The goal of this post is to be concrete rather than general.

So to make everything perfectly explicit, we’ll write a little Python code implementing the product and inverse.

    def hinv(h):
        if h == 255:
            return 417
        if h == 417:
            return 255
        if h == 1:
            return 1
    
    def prod(x, y):
        g1, h1 = x
        g2, h2 = y
        g3 = (g1 + hinv(h1)*g2) % 673
        h3 = (h1*h2) % 673
        return (g3, h3)
    
    def group_inv(x):
        g, h = x
        return ((-g*h)%673, hinv(h))
    
    x = (5, 255)
    y = (334, 417)
    print(prod(x, y))
    print(prod(y, x))
    print(group_inv((600, 255)))

The following code verifies that our group satisfies the axioms of a group.

    from itertools import product
    
    identity = (0, 1)
    h_list = [1, 255, 417]
    
    def elem(x):
        g, h = x
        g_ok = (0 <= g <= 672)
        h_ok = (h in h_list)
        return (g_ok and h_ok)
    
    group = list(product(range(673), h_list))
    assert(len(group) == 2019)
    
    # closed under multiplication    
    for x in group:
        for y in group:
            assert( elem(prod(x, y)) )
    
    # multiplication is associative
    # (note: this triple loop makes 2019**3 iterations, so it is very slow)
    for x in group:
        for y in group:
            for z in group:
                xy_z = prod(prod(x, y),z)
                x_yz = prod(x, prod(y,z))
                assert(xy_z == x_yz)
    
    # identity acts like it's supposed to
    for x in group:
        assert( prod(x, identity) == x )
        assert( prod(identity, x) == x )
    
    # every element has an inverse
    for x in group:
        ginv = group_inv(x)
        assert( elem(ginv) )
        assert( prod(x, ginv) == identity )
        assert( prod(ginv, x) == identity )


[1] The symbol for semidirect product is ⋊. It’s U+22CA in Unicode and \rtimes in LaTeX.

[2] In general, the semidirect product depends on a choice of an action of the group H on the group G. Here the action is multiplication by an element of H. Different actions can result in different groups. Sometimes the particular choice of action is made explicit as a subscript on the ⋊ symbol.

[3] How did I find these numbers? There are 672 non-zero numbers mod 673, so I picked a number, it happened to be 5, and raised it to the powers 672/3 and 2*672/3.


Check sums and error detection

The previous post looked at Crockford’s base 32 encoding, a minor variation on the way math conventionally represents base 32 numbers, with concessions for human use. By not using the letter O, for example, it avoids confusion with the digit 0.

Crockford recommends the following check sum procedure, a simple error detection code:

The check symbol encodes the number modulo 37, 37 being the least prime number greater than 32.

That is, we take the remainder when the base 32 number is divided by 37 and append the result to the original encoded number. The remainder could be larger than 31, so we need to expand our alphabet of symbols. Crockford recommends using the symbols *, ~, $, =, and U to represent remainders of 32, 33, 34, 35, and 36.

Crockford says his check sum will “detect wrong-symbol and transposed-symbol errors.” We will show that this is the case in the proof below.

Python example

Here’s a little Python code to demonstrate how the checksum works.

    from base32_crockford import encode, decode

    s = "H88CMK9BVJ1V"
    w = "H88CMK9BVJ1W" # wrong last char
    t = "H88CMK9BVJV1" # transposed last chars

    def append_checksum(s):
        return encode(decode(s), checksum=True)

    print(append_checksum(s))
    print(append_checksum(w))
    print(append_checksum(t))

This produces the following output.

    H88CMK9BVJ1VP
    H88CMK9BVJ1WQ
    H88CMK9BVJV1E

The checksum character of the original string is P. When the last character is changed, the checksum changes from P to Q. Similarly, transposing the last two characters changes the checksum from P to E.

The following code illustrates that the check sum can be a non-alphabetic character.

    s = "H88CMK9BVJ10"
    n = decode(s)
    r = n % 37
    print(r)
    print(encode(n, checksum=True))

This produces

    32
    H88CMK9BVJ10*

As we said above, a remainder of 32 is represented by *.

Proof

If you change one character in a base 32 number, its remainder by 37 will change as well, and so the check sum changes.

Specifically, if you change the nth digit from the right, counting from 0, by a nonzero amount k, then you change the number by k · 32^n. Since 0 < |k| < 32, k is not divisible by 37, nor is 32^n. Because 37 is prime, k · 32^n is not divisible by 37 [1]. The same argument holds if we replace 37 by any larger prime.
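
We can see this with the strings from the example above: s and w differ in their last character, and their decoded values leave different remainders mod 37.

    # the decoded values differ by an amount not divisible by 37,
    # so the check sums differ
    assert (decode(s) - decode(w)) % 37 != 0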

Now what about transpositions? If you swap consecutive digits a and b in a number, you also change the remainder by 37 (or any larger prime) and hence the check sum.

Again, let’s be specific. Suppose we transpose the nth and (n + 1)st digits from the right, again counting from 0. Denote these digits by a and b respectively. Then swapping these two digits changes the number by an amount

(b \cdot 32^{n+1} + a \cdot 32^n) - (a \cdot 32^{n+1} + b \cdot 32^n) = (b - a) \cdot 31 \cdot 32^n

If a ≠ b, then b − a is a number between −31 and 31, but not 0, and so b − a is not divisible by 37. Neither 31 nor any power of 32 is divisible by 37, so we’ve changed the remainder mod 37, i.e. changed the check sum. And as before, the same argument works if we replace 37 by any larger prime.


[1] A prime p divides a product ab only if it divides a or it divides b. This isn’t true for composite numbers. For example, 6 divides 4*9 = 36, but 6 doesn’t divide 4 or 9.

Base 32 and base 64 encoding

Math has a conventional way to represent numbers in bases larger than 10, and software development has a couple variations on this theme that are only incidentally mathematical.

Math convention

By convention, math books typically represent numbers in bases larger than 10 by using letters as new digit symbols following 9. For example, base 16 would use 0, 1, 2, …, 9, A, B, C, D, E, and F as its “digits.” This works for bases up to 36; base 36 would use all the letters of the alphabet. There’s no firm convention for whether to use upper or lower case letters.

Base 64 encoding

The common use for base 64 encoding isn’t to represent bits as numbers per se, but to have an efficient way to transmit bits in a context that requires text characters.

There are around 100 possible characters on a keyboard, and 64 is the largest power of 2 less than 100 [1], and so base 64 is the most dense encoding using common characters in a base that is a power of 2.

Base 64 encoding does not follow the math convention of using the digits first and then adding more symbols; it’s free not to because there’s no intention of treating the output as numbers. Instead, the capital letters A through Z represent the numbers 0 through 25, the lower case letters a through z represent the numbers 26 through 51, and the digits 0 through 9 represent the numbers 52 through 61. The symbol + is used for 62 and / is used for 63.
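
For example, the three bytes 0x00, 0x10, 0x83 break into the 6-bit groups 0, 1, 2, and 3, which encode to ABCD:

    import base64

    print(base64.b64encode(bytes([0x00, 0x10, 0x83]))) # b'ABCD'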

Crockford’s base 32 encoding

Douglas Crockford proposed an interesting form of base 32 encoding. His encoding mostly follows the math convention: 0, 1, 2, …, 9, A, B, …, except he does not use the letters I, L, O, and U. This eliminates the possibility of confusing i, I, or l with 1, or confusing O with 0. Crockford had one more letter he could eliminate, and he chose U in order to avoid an “accidental obscenity.” [2]

Crockford’s base 32 encoding is a compromise between efficiency and human legibility. It is more efficient than hexadecimal, representing 25% more bits per character. It’s less efficient than base 64, representing 17% fewer bits per character, but is more legible than base 64 encoding because it eliminates commonly confused characters.

His encoding is also case insensitive. He recommends using only capital letters for output, but permitting upper or lower case letters in input. This is in the spirit of Postel’s law, also known as the robustness principle:

Be conservative in what you send, and liberal in what you accept.

See the next post for an explanation of Crockford’s check sum proposal.

A password generator

Here’s a Python script to generate passwords using Crockford’s base 32 encoding.

    from secrets import randbits
    from base32_crockford import encode

    def gen_pwd(numbits):
        print(encode(randbits(numbits)))

For example, gen_pwd(60) would create a 12-character password with 60-bits of entropy, and this password would be free of commonly confused characters.


[1] We want to use powers of 2 because it’s easy to convert between base 2 and base 2^n: start at the right end and convert bits in groups of n. For example, to convert a binary string to hexadecimal (base 2^4 = 16), convert groups of four bits each to hexadecimal. So to convert the binary number 101111001 to hex, we break it into 1 0111 1001 and convert each piece to hex, with 1 -> 1, 0111 -> 7, and 1001 -> 9, to find 101111001 -> 179. If we used a base that is not a power of 2, the conversion would be more complicated and not so localized.

[2] All the words on George Carlin’s infamous list include either an I or a U, and so none can result from Crockford’s base 32 encoding. If one were willing to risk accidental obscenities, it would be good to put U back in and remove S since the latter resembles 5, particularly in some fonts.