Conway’s pinwheel tiling

John Conway discovered a right triangle that can be partitioned into five similar triangles. The sides are in proportion 1 : 2 : √5.

You can make a larger similar triangle by making the entire triangle the central (green) triangle of a new triangle.

Here’s the same image with the small triangles filled in as in the original.

Repeating this process creates an aperiodic tiling of the plane.

The tiling was discovered by Conway, but Charles Radin was the first to describe it in a publication [1]. Radin attributes the tiling to Conway.

Alternate visualization

It would be easiest to illustrate the tiling if we were standing together in a room and placing new triangles on the floor, watching the tiling expand. Given the limitations of a screen, it may be easier to visualize subdividing the triangle rather than tiling the plane.

Imagine the smallest triangles are a constant size and at each step we’re viewing the process from further away. We see a constant size outer triangle at each step, but the triangle is growing and covering the plane.

Here’s an animated GIF of the process.
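To make the subdivision concrete, here is a minimal sketch (my own, not from the original post) that splits a 1 : 2 : √5 right triangle into its five similar pieces. The coordinates and the ordering convention are assumptions chosen so the function can be applied to its own output.

import numpy as np

def subdivide(A, B, C):
    """Split a 1:2:sqrt(5) right triangle into five similar copies.
    A is the right-angle vertex and |AB| = 2|AC|, so BC is the hypotenuse.
    Each piece is returned in the same (right angle, long-leg end, short-leg end)
    order, so subdivide can be applied again to each piece."""
    A, B, C = (np.asarray(P, dtype=float) for P in (A, B, C))
    # Foot of the altitude from the right angle A to the hypotenuse BC
    t = np.dot(A - B, C - B) / np.dot(C - B, C - B)
    F = B + t*(C - B)
    # Midpoints of the sides of the larger piece, triangle ABF
    M_AB, M_AF, M_BF = (A + B)/2, (A + F)/2, (B + F)/2
    return [
        (F, A, C),              # the piece cut off by the altitude
        (M_AF, M_AB, A),        # four congruent halves-of-halves of triangle ABF
        (M_BF, B, M_AB),
        (F, M_BF, M_AF),
        (M_AB, M_AF, M_BF),
    ]

# Repeated subdivision: after two rounds there are 5^2 = 25 small triangles.
tiles = [((0, 0), (2, 0), (0, 1))]
for _ in range(2):
    tiles = [piece for tri in tiles for piece in subdivide(*tri)]
print(len(tiles))   # 25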


Related posts

[1] Charles Radin. “The Pinwheel Tilings of the Plane.” Annals of Mathematics, vol. 139, no. 3, 1994, pp. 661–702.

Silent Payments

Bitcoin transactions appear to be private because names are not attached to accounts. But that is not sufficient to ensure privacy; if it were, much of my work in data privacy would be unnecessary. It’s quite possible to identify people in data that does not contain any direct identifiers.

I hesitate to use the term pseudonymization because people define it multiple ways, but I’d say Bitcoin addresses provide pseudonymization but not necessarily deidentification.

Privacy and public ledgers are in tension. The Bitcoin ledger is superficially private because it does not contain user information, only addresses [1]. However, transaction details are publicly recorded on the blockchain.

To add some privacy to Bitcoin, addresses are typically used only once. Wallet software generates new addresses for each transaction, and so it’s not trivial to track all the money flowing between two people. But it’s not impossible either. Forensic tools make it possible to identify parties behind blockchain transactions via metadata and correlating information on the blockchain with information available off-chain.

Silent Payments

Suppose you want to take donations via Bitcoin. If you put a Bitcoin address on your site, that address has to be permanent, and it’s publicly associated with you. It would be trivial to identify (the addresses of) everyone who sends you a donation.

Silent payments provide a way to work around this problem. Alice could send donations to Bob repeatedly without it being obvious who either party is, and Bob would not have to change his site. To be clear, there would be no way to tell from the addresses that funds are moving between Alice and Bob. The metadata vulnerabilities remain.

Silent payments are not implemented in Bitcoin Core, but there are partial implementations out there. See BIP 352.

The silent payment proposal depends on ECDH (elliptic curve Diffie-Hellman) key exchange, so I’ll do a digression on ECDH before returning to silent payments per se.

ECDH

The first thing to know about elliptic curves, as far as cryptography is concerned, is that there is a way to add two points on an elliptic curve and obtain a third point. This turns the curve into an Abelian group.

You can bootstrap this addition to create a multiplication operation: given a point g on an elliptic curve and an integer n, ng means to add g to itself n times. Multiplication can be implemented efficiently even when n is an enormous number. However, and this is key, multiplication cannot be undone efficiently. Given g and n, you can compute ng quickly, but it’s practically impossible to start with knowledge of ng and g and recover n. To put it another way, multiplication is efficient but division is practically impossible [2].

Suppose Alice and Bob both agree on an elliptic curve and a point g on the curve. This information can be public. ECDH lets Alice and Bob share a secret as follows. Alice generates a large random integer a, her private key, and computes a public key A = ag. Similarly, Bob generates a large random integer b and computes a public key B = bg. Their shared secret is

aB = bA.

Alice can compute aB because she (alone) knows a and B is public information. Similarly Bob can compute bA. The two points on the curve aB and bA are the same because

aB = a(bg) = b(ag) = bA.
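Here is a toy numerical illustration of ECDH, a sketch that is not part of the original post. The tiny curve, generator, and private keys below are made up for illustration; real systems use curves such as secp256k1 with enormous parameters.

# Toy ECDH over the curve y^2 = x^3 + 2x + 2 (mod 17) with generator g = (5, 1).
p, a = 17, 2            # field prime and curve coefficient a
g = (5, 1)              # public base point

def add(P, Q):
    """Add two points on the curve (None represents the point at infinity)."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                  # P + (-P) = infinity
    if P == Q:
        s = (3*x1*x1 + a) * pow(2*y1, -1, p) % p     # tangent slope
    else:
        s = (y2 - y1) * pow(x2 - x1, -1, p) % p      # chord slope
    x3 = (s*s - x1 - x2) % p
    return (x3, (s*(x1 - x3) - y1) % p)

def mult(n, P):
    """Compute nP by double-and-add, efficient even for very large n."""
    R = None
    while n:
        if n & 1:
            R = add(R, P)
        P = add(P, P)
        n >>= 1
    return R

alice_private, bob_private = 7, 11          # made-up private keys
A = mult(alice_private, g)                  # Alice's public key
B = mult(bob_private, g)                    # Bob's public key
assert mult(alice_private, B) == mult(bob_private, A)   # shared secret: aB == bA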

Back to silent payments

ECDH lets Alice and Bob share a secret, a point on a (very large) elliptic curve. This is the heart of silent payments.

In its simplest form, Alice can send Bob funds using the address

P = B + hash(aB)g.

A slightly more complicated form lets Alice send Bob funds multiple times. The kth time she sends money to Bob she uses the address

P = B + hash(aB || k)g

where || denotes concatenation.

But how do you know k? You have to search the blockchain to determine k, much like the way hierarchical wallets search the blockchain. Is the address corresponding to k = 0 on the blockchain? Then try again with k = 1. Keep doing this until you find the first unused k. Making this process more efficient is one of the things that is being worked on to fully implement silent payments.
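Below is a hypothetical sketch of that scanning loop, reusing the toy curve functions (add, mult, g) and keys from the ECDH example above. The helper names tweak and payment_address and the stand-in set seen_on_chain are mine; BIP 352 specifies secp256k1 and tagged hashes, which this toy code does not attempt to follow.

import hashlib

order = 19                                 # group order of the toy curve

def tweak(S, k):
    """hash(aB || k), reduced mod the group order."""
    data = f"{S[0]},{S[1]},{k}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % order

def payment_address(B, S, k):
    """Address for the k-th payment: P = B + hash(aB || k) g."""
    return add(B, mult(tweak(S, k), g))

S = mult(alice_private, B)                 # Alice's copy of the shared secret aB
seen_on_chain = set()                      # stand-in for addresses found on the blockchain
k = 0
while payment_address(B, S, k) in seen_on_chain:
    k += 1                                 # first k whose address is unused
print("next payment address:", payment_address(B, S, k))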

Note that Bob needs to make B public, but B is not a Bitcoin address per se; it’s information needed to generate addresses via the process described above. Actual addresses are never reused.

Related posts

[1] Though if you obtain your Bitcoin through an exchange, KYC laws require them to save a lot of private information.

[2] Recovering n from ng is known as the discrete logarithm problem. It would be more logical to call it the discrete division problem, but if you write the group operation on an elliptic curve as multiplication rather than addition, then it’s a discrete logarithm, i.e. trying to solve for an unknown exponent. If and when a large-scale quantum computer exists, the discrete logarithm problem will be practical to solve, but presumably not until then.

An inverse problem for the quadratic equation

In elementary algebra we learn how to solve ax² + bx + c = 0 for roots r = r1, r2. But how about the inverse problem: given a real-valued r, find integer coefficients a, b, and c such that r is (at least approximately) a root of the corresponding quadratic equation.

A simple approximation can be found by setting b = 0, so r² = −c/a. By appropriately choosing large values of a and c, one can approximate the solution to this inverse problem to arbitrary accuracy.

Are there any other solutions (possibly taking advantage of b for a better approximation)?

Two families of algorithms are available: PSLQ [1-3] and LLL [4-5]. The problem formulation underlying these algorithms can be understood geometrically. For fixed r, L(a, b, c) = ar² + br + c is a linear function of (a, b, c), here considering a, b, and c as real-valued. Thus, for such r, the equation L(a, b, c) = 0 defines a known plane in 3-D space. However, we seek a, b, c that are integers. The problem then is to find a lattice point (a, b, c), with a, b, c integers, that is “near” this plane (for example, within tolerance τ of the plane, meaning a thin 2-D “slab” of thickness 2τ surrounding the plane contains a lattice point). We would also prefer the magnitudes of a, b, and c to be “small” (e.g., bounded by a constant H).

Given this formulation, the PSLQ algorithm attempts to find an exact solution, and gives up if it can’t find one. The LLL algorithm on the other hand finds approximate solutions within a tolerance.

Python has the mpmath package, which contains mpmath.pslq for the PSLQ algorithm. LLL reduction is also available in Python, for example in the SymPy and fpylll packages.
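As a quick illustration of the PSLQ route (an example I'm adding, not taken from the post), we can recover the quadratic satisfied by r = 1 + √2:

from mpmath import mp, mpf, sqrt, pslq

mp.dps = 30                     # working precision in decimal digits
r = 1 + sqrt(2)                 # test value; satisfies r^2 - 2r - 1 = 0
rel = pslq([r**2, r, mpf(1)])   # look for integers (a, b, c) with a*r^2 + b*r + c ~ 0
print(rel)                      # expect [1, -2, -1], up to sign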

A colleague of mine was previously submitting overnight runs to compute solutions using brute force search. More recently, using Python code for the LLL algorithm, he was able to solve a large problem in seconds, with answers that exactly reproduce the brute force solution.

References

[1] Ferguson, Helaman R. P.; Bailey, David H. (1992). A Polynomial Time, Numerically Stable Integer Relation Algorithm. RNR Technical Report RNR-91-032, NASA Ames Research Center. URL: https://www.davidhbailey.com/dhbpapers/pslq.pdf

[2] Ferguson, Helaman R. P.; Bailey, David H.; Arno, Steve (1999). Analysis of PSLQ, an integer relation finding algorithm. Mathematics of Computation 68(225): 351–369. URL (AMS PDF): https://www.ams.org/mcom/1999-68-225/S0025-5718-99-00995-3/S0025-5718-99-00995-3.pdf

[3] Bailey, David H. (2020). PSLQ: An Algorithm to Discover Integer Relations. (Expository overview.) URL: https://www.davidhbailey.com/dhbpapers/pslq-comp-alg.pdf

[4] Lenstra, Arjen K.; Lenstra, Hendrik W., Jr.; Lovász, László (1982). Factoring Polynomials with Rational Coefficients. Mathematische Annalen 261: 515–534. DOI: https://doi.org/10.1007/BF01457454 (Springer page: https://link.springer.com/article/10.1007/BF01457454; cf. https://www.math.ucdavis.edu/~deloera/MISC/LA-BIBLIO/trunk/Lovasz/LovaszLenstraLenstrafactor.pdf)

[5] Nguyen, Phong Q.; Vallée, Brigitte (eds.) (2010). The LLL Algorithm: Survey and Applications. Springer. URL: https://link.springer.com/book/10.1007/978-3-642-02295-1

Inverting series that are flat at zero

The previous post looked at solving the equation

2\theta + \sin(2\theta) = \pi \sin(\varphi)

which arises from the Mollweide map projection. Newton’s method works well unless φ is near π/2. Using a modified version of Newton’s method makes the convergence faster when φ = π/2, which is kinda useless because we know the solution is θ = π/2 there. When φ is near π/2, the modified Newton’s method may diverge. I ended the previous post by saying a series solution would work better when φ is sufficiently close to π/2. This post will flesh that out.

Let x = π − 2θ. Now the task is to solve

x - \sin(x) = y

for small positive values of y, where y = π(1 − sin φ).

The left side is

\frac{x^3}{3!} - \frac{x^5}{5!} + \frac{x^7}{7!} - \cdots

and so for very small values of y, and thus very small values of x, we have

x = (6y)^{1/3}

If this solution is not sufficiently accurate, we can invert the power series above to get a power series in y that gives the solution x. However, the Lagrange inversion theorem does not apply because the series has a zero derivative at 0. Instead, we have to use Puiseux series inversion, looking for a series in y1/3 rather than a series in y. From the Puiseux series we can see that

x = (6y)^{1/3} + y/10

is a more accurate solution. For even more accuracy, you can compute more terms of the Puiseux series.
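Here is a quick numerical check, not from the original post, comparing the one-term and two-term approximations against a brute-force root of x − sin(x) = y. The test values of y are my own choice.

from scipy.optimize import brentq
import numpy as np

for y in [1e-3, 1e-2, 1e-1]:
    x_exact = brentq(lambda x: x - np.sin(x) - y, 0, np.pi)
    x1 = (6*y)**(1/3)                 # leading-order term
    x2 = (6*y)**(1/3) + y/10          # with the next Puiseux term
    print(f"y={y:g}  error, one term: {abs(x1-x_exact):.2e}  two terms: {abs(x2-x_exact):.2e}")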

Mollweide map projection and Newton’s method

Mollweide projection

Karl Brandan Mollweide (1774-1825) designed an equal-area map projection, mapping the surface of the earth to an ellipse with an aspect ratio of 2:1. If you were looking at the earth from a distance, the face you’re looking at would correspond to a circle in the middle of the Mollweide map [1]. The part of the earth that you can’t see is mapped to the rest of the ellipse.

The lines of latitude are not quite evenly spaced; some distortion is required to achieve equal area. Latitude φ corresponds to the angle θ on the map according to the equation

2\theta + \sin(2\theta) = \pi \sin( \varphi)

There is no closed-form solution for θ as a function of φ and so the equation must be solved numerically. For each φ we have to find the root of

f(\theta) = 2\theta + \sin(2\theta) - \pi \sin(\varphi)

Here’s a plot of θ as a function of φ.

Newton’s method

Newton’s method works efficiently (converges quadratically) except when φ is close to π/2. The reason Newton’s method slows down for high latitudes is that f and its derivative are both zero at θ = π/2, i.e. f has a double root there.

Newton’s method slows down from quadratic to linear convergence near a double root, but there is a minor modification that maintains quadratic convergence near a double root.

x_{n+1} = x_n - m\frac{f(x_n)}{f^\prime(x_n)}

When m = 1 this is the usual form of Newton’s method. Setting m = 2 tunes the method for double roots.

The modified version of Newton’s method with m = 2 works when the root you’re trying to find is exactly a double root. However, if you’re trying to find a root near a double root, setting m = 2 can cause the method to diverge, so you have to be very careful with changing m.

Illustration

Here’s a version of Newton’s method, specialized for the Mollweide equation. We can specify the value of m, and the initial estimate of the root defaults to θ = φ.

from numpy import sin, cos, pi

def mynewton(phi, m, x0 = None):
    """Solve 2x + sin(2x) = pi sin(phi) for x by (modified) Newton's method.
    m = 1 is ordinary Newton's method; m = 2 is the tuning for a double root."""
    x = phi if x0 is None else x0   # initial estimate defaults to theta = phi
    delta = 1
    iterations = 0

    while abs(delta) > 1e-12:
        fx = 2*x + sin(2*x) - pi*sin(phi)
        fprime = 2 + 2*cos(2*x)
        delta = m*fx/fprime
        x -= delta
        iterations += 1

    return (x, iterations)

Far from the double root

Say φ = 0.25 and m = 1. When we use the default initial estimate Newton’s method converges in 5 iterations.

Near the double root

Next, say φ = π/2 − 0.001. Again we set m = 1 and use the default initial estimate. Now Newton’s method takes 16 iterations. The convergence is slower because we’re getting close to the double root at π/2.

But if we switch to m = 2, Newton’s method diverges. Slow convergence is better than no convergence, so we’re better off sticking with m = 1.

On the double root

Now let’s say φ = π/2. In this case the default guess is exactly correct, but the code divides by zero and crashes. So let’s take our initial estimate to be π/2 − 0.001. When m = 1, Newton’s method takes 15 steps to converge. When m = 2, it converges in 6 steps, which is clearly faster. However, in both cases the returned result is only good to 6 decimal places, even though the code is looking for 12 decimal places.
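For concreteness, here is a small driver (mine, not part of the original post) that runs the cases discussed above; exact iteration counts may vary slightly with the environment.

# Reproduce the cases above using the mynewton function defined earlier.
print(mynewton(0.25, 1))                   # far from the double root
print(mynewton(pi/2 - 0.001, 1))           # near the double root, m = 1
print(mynewton(pi/2, 1, pi/2 - 0.001))     # at the double root, m = 1
print(mynewton(pi/2, 2, pi/2 - 0.001))     # at the double root, m = 2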

This shows that as we get close to φ = π/2, we have both a speed and an accuracy issue.

A better method

I haven’t worked this out, but here’s how I imagine I would fix this. I’d tune Newton’s method, preventing it from taking steps that are too large, so that the method is accurate for φ in the range [0, π/2 − ε] and use a different method, such as power series inversion, for φ in [π/2 − ε, π/2].

Update: The next post works out the series solution alluded to above.

Related posts

[1] Someone pointed out on Hacker News that this statement could be confusing. The perspective you’d see from a distance is not the Mollweide projection—that would not be equal area—but it corresponds to the same region. As someone put it on HN “a circle in the center corresponds to one half of the globe.”

1420 MHz

There are four manmade objects that have left our solar system: Pioneer 10, Pioneer 11, Voyager 1, and Voyager 2.

Yesterday I wrote about the Golden Record, which is on both of the Voyager probes. Some of the graphics on the cover of the Golden Record are also on plaques attached to the Pioneer probes. In particular, the two plaques and the two record covers have the following image.

Pioneer plaque hydrogen diagram

This image is essential to decoding the rest of the images. It represents a “spin flip transition” of a neutral hydrogen atom. This event has a frequency of 1420.405 MHz, corresponding to a wavelength of about 21 cm. The little hash mark is meant to indicate that this is a unit of measurement, both of length (21 cm) and time (the time it takes light to travel 21 cm).
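For reference, the wavelength follows from dividing the speed of light by the frequency (this bit of arithmetic is mine, not from the original post):

\lambda = \frac{c}{f} \approx \frac{2.998 \times 10^8 \text{ m/s}}{1420.405 \times 10^6 \text{ Hz}} \approx 0.211 \text{ m} \approx 21.1 \text{ cm}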

The thought behind the unit of measurement was that it might make sense to an extraterrestrial being. Frequencies associated with hydrogen atoms are more universal than seconds (which ultimately derive from the period of Earth’s rotation) or meters (which derive from the circumference of the Earth).

The Wow! signal

The frequency 1420 MHz is special, so special that scientists believed advanced civilizations would recognize it. That’s why an astronomer, Jerry R. Ehman, was excited to observe a strong signal at that frequency on August 15, 1977. (Incidentally, this was five days before Voyager 2 was launched.)

Ehman wrote “Wow!” on a computer printout of the signal, and the signal has become known as “the Wow! signal.”

Ehman thought this signal might have come from an extraterrestrial intelligence for the same reason we were using its frequency in our message to potential extraterrestrials.

Wow! 6EQUJ5

Related posts

The AI fork in the road

If AI can do part of your job, you’re likely to either be fired or promoted.

AI will always have some error rate, and if it can do your job at an acceptable error rate, you’ll need to find another job.

But if the error rate is not acceptable, the ability to identify and fix errors becomes more valuable.

People with the experience to recognize errors and fix them quickly are more productive when delegating work to AI. Higher productivity should result in higher wages, though you may have to become self-employed to be paid more for getting more done.

Contractors and consultants are more often paid in proportion to the value of their work than salaried employees are, especially since 1971.

Recognizing errors takes experience. This is especially true of LLM-generated errors because LLMs always return plausible results (according to some model) though they do not always return correct results.

Day of the week centuries from now

Yesterday I wrote a post outlining mental math posts I’d written over the years. I don’t write about mental math that often, but I’ve been writing for a long time, and so the posts add up. In the process of writing the outline post I noticed a small gap.

In the post on mentally calculating the day of the week I focus on this century, then tersely say “For dates in the 20th century, add 1. For dates in the 22nd century, subtract 2.” This post will expand on that. I’ll say a little more about what these rules mean, how to extend them to other centuries, and why they work.

Dates in the 20th century

Suppose you wanted to know the day of the week for July 4, 1976. You could use the method in the post linked above and end up with Saturday. Moving ahead one day of the week gives you Sunday.

Dates in future centuries

For days in the 2100s you would find the day of the week for the corresponding date in this century, then subtract two days. For the 2200s you’d add three days, and for the 2300s you’d add one day.

The Gregorian calendar repeats every 400 years, so if you can find the days of the week for the years 2000 through 2399, you can find the day of the week for all years in the future by subtracting enough multiples of 400 to get into that range of years. For example, in the year 2825, September 18 will fall on a Thursday because it falls on a Thursday today in 2025.

You could also go backward by centuries as well as forward, but there’s a complication. If you run the Gregorian calendar backward far enough, you run into a time before the Gregorian calendar was adopted.

Why the rules work

So why do the century adjustment rules work? There’s a simple but incomplete explanation, and a more complicated complete explanation.

Partial explanation

The number of days in most centuries, i.e. those not divisible by 400, is

365 × 100 + 24 = 36524

which is congruent to 5 mod 7, which is congruent to −2 mod 7. Dates in the 2100s are 36524 days ahead of their counterparts in the 2000s, so we move back two days in the week. The same applies when going from the 2100s to the 2200s, and from the 2200s to the 2300s.

For example, September 18, 2125 is 36524 days from now—note that February, 2100 will not have a Leap Day—and so it will fall two days earlier in the week, on a Tuesday.

That explanation is mostly correct, but there’s a wrinkle. Not all dates in the 2100s are 36524 days ahead of their counterparts in the 2000s. There are indeed 36524 days between September 18, 2025 and September 18, 2125. But there are 36525 days between January 1, 2000 and January 1, 2100. The former was on a Saturday and the latter will be on a Friday. It would seem that our process is wrong, but it’s not.

Full explanation

Let’s go back and look at the algorithm for finding days of the week:

  1. Take the last two digits of the year and add the number of times 4 divides that number [1].
  2. Add a constant corresponding to the month.
  3. Add the day of the month.
  4. Subtract 1 in January or February of a leap year.
  5. Take the remainder by 7.
  6. Adjust for the century.

Suppose we wanted to find the day of the week for January 1, 2100. The first step gives us 0. The constant for January is 6, so at the end of step 2 we have 6. At the end of step 3 we have 7.

Now comes the key part: in step 4 we do not subtract 1 because 2100 is not a leap year. Step 5 gives us 0, i.e. Sunday. When we adjust for the century by moving back two days, we get Friday.

This explanation is admittedly tedious, but it reveals a subtle part of the algorithm: you have to carry out the algorithm, in particular step 4, for the original year. To find the day of the week for a year in the 2200s, for example, you carry out the algorithm as if it were valid for the 2200s, until you get to the last step.
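Here is a sketch of steps 1 through 6 in code, valid for years 2000 through 2399, with day 0 being Sunday. Only the January constant (6) appears in the post; the other month constants and the century table are my reconstruction, so treat them as assumptions to be checked.

MONTH_CONSTANTS = [6, 2, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4]   # Jan..Dec (reconstructed)
CENTURY_ADJUST  = {20: 0, 21: -2, 22: 3, 23: 1}          # 2000s..2300s

def day_of_week(year, month, day):
    yy = year % 100
    total = yy + yy // 4                     # step 1: year share
    total += MONTH_CONSTANTS[month - 1]      # step 2: month constant
    total += day                             # step 3: day of the month
    leap = yy % 4 == 0 and (yy != 0 or year % 400 == 0)
    if leap and month <= 2:                  # step 4: Jan/Feb of a leap year
        total -= 1
    total %= 7                               # step 5: remainder mod 7
    total = (total + CENTURY_ADJUST[year // 100]) % 7   # step 6: century adjustment
    return ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"][total]

print(day_of_week(2000, 1, 1))    # Saturday
print(day_of_week(2100, 1, 1))    # Friday
print(day_of_week(2025, 9, 18))   # Thursday
print(day_of_week(2125, 9, 18))   # Tuesday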

Related posts

[1] This is known as the year share. There are many ways to calculate it (mod 7) that are less obvious but easier to calculate mentally. It’s interesting that these methods generally look more complicated in writing, but they’re easier to carry out.

Learning languages with the help of algorithms

Suppose you’re learning a new language and want to boost your vocabulary in a very time-efficient way.

People learn languages in many different ways. Suppose you wanted to improve your vocabulary by reading books in that language. To get the most impact, you’d like to pick books that cover as many common words in the language as possible.

Here is a formalization. Suppose you have a large set of m books with average length n words, and you want to pick the single book with the highest vocabulary impact. The vocabulary impact of a book is a weighted sum over the words in the book, with each word weighted by how common it is in the language, as measured by word frequency across all the books; this essentially assigns a probability weight to each word of the language.

That’s an easy problem to solve. First, optionally filter out stop words, such as (in the case of English) “the” and “and”, that are considered in some sense to have “not much meaning.” Second, build across all books a list of unique words along with the number of occurrences of each. Finally, for each book evaluate its coverage of those words, computing a score as described above.

Computing the unique word list can be done by a hashing process that typically runs in time linear in mn, with worst case mn log(mn). Computing the scores also takes linear time on average, so the entire process can be done in linear time.

What if you want to find the best two books to read, with the best joint vocabulary coverage? The best known methods for finding the exact optimum in the general case take quadratic rather than linear time.

How about the best k books for arbitrary k > 0?

This is an NP-hard problem, meaning that for the best known algorithms the compute time to solve the problem exactly in the general case grows exponentially in k, the size of your desired set of reading books.

So you cannot expect to solve this problem exactly for large k. All hope is not lost, however. This is a maximum weighted coverage problem, which lies in a subset of NP-hard problems known as submodular problems. Because of this, approximation algorithms are known that are guaranteed to come within a certain known factor of the true best solution (see [1-5]).

These algorithms are described in [6], [7], [8]. The basic idea of the algorithms is to add a single high-impact book at a time to the running set of high-impact books—a greedy algorithm. It is not guaranteed to be the best book list, but it is reasonable.
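Here is a minimal sketch of that greedy idea (mine, not from the references), using word frequencies across the whole collection as weights. The function name greedy_book_selection and the toy book list are made up for illustration.

from collections import Counter

def greedy_book_selection(books, k, stop_words=frozenset()):
    """books: list of word lists. Return indices of k books chosen greedily by
    frequency-weighted coverage of words not already covered."""
    # Word frequencies across all books serve as the weights.
    freq = Counter(w for book in books for w in book if w not in stop_words)
    covered, chosen = set(), []
    for _ in range(k):
        def gain(i):
            # Total frequency weight of new words book i would add.
            return sum(freq[w] for w in set(books[i]) - covered if w not in stop_words)
        best = max((i for i in range(len(books)) if i not in chosen), key=gain)
        chosen.append(best)
        covered |= set(books[best])
    return chosen

books = [
    "the cat sat on the mat".split(),
    "a dog ate my homework".split(),
    "the dog chased the cat".split(),
]
print(greedy_book_selection(books, k=2, stop_words={"the", "a", "on", "my"}))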

The helpful Python submodlib package runs very fast on this problem.

One can improve the quality of the result by spending more computation. First, you could use a blocking strategy: at each step compute exactly the best two books to add, or three books, and so forth (the computational complexity becomes quadratic, cubic, and so forth). Similarly, one could use a look-ahead strategy: add the two books to the list that are jointly best by an exact computation, then leave one off and find the best second and third books, and so forth. In general these do not improve on the submodularity bound, but in practice the results are more accurate.

One can also use various heuristics to improve performance in some cases. For example, if a book has little or no vocabulary that is not present in the other books, it can sometimes be safely discarded. However, in general the exact case remains hard (unless P=NP).

References

[1] Abhimanyu Das and David Kempe. Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection. Proceedings of ICML, 2011.

[2] Abhimanyu Das and David Kempe. Approximate Submodularity and its Applications: Subset Selection, Sparse Approximation and Dictionary Selection.
Journal of Machine Learning Research (JMLR), 19(3):1–35, 2018.

[3] M. Conforti and G. Cornuéjols. Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds. Discrete Applied Mathematics, 7(3):251–274, 1984.

[4] Maxim Sviridenko, Jan Vondrák, and Justin Ward. Optimal approximation for submodular and supermodular optimization with bounded curvature. Proceedings of SODA, 2013.

[5] Rishabh Iyer, Stefanie Jegelka, and Jeff Bilmes. Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions. 2013.
https://arxiv.org/abs/1311.2110

[6] Nemhauser, George L., Laurence A. Wolsey, and Marshall L. Fisher. “An analysis of approximations for maximizing submodular set functions—I.” Mathematical programming 14, no. 1 (1978): 265-294.

[7] “Maximum coverage problem,” https://en.wikipedia.org/wiki/Maximum_coverage_problem.

[8] Wei, Kai, Rishabh Iyer, and Jeff Bilmes. “Fast multi-stage submodular maximization.” In International conference on machine learning, pp. 1494-1502. PMLR, 2014.


Mental math posts

I’ve written quite a few posts on mental math over the years. I think mental math is important/interesting for a couple of reasons. First, there is some utility in being able to carry out small calculations with rough accuracy without having to stop and open up a calculator. Second, the constraints imposed by mental calculation make you look at a problem differently, often in interesting ways. An approximate mental calculation often requires more understanding than a precise machine calculation.

Divisibility and factoring

Day of the week

Transcendental functions

This page outlines how to calculate logarithms base 2, e, and 10, and their inverses, linking to other posts for details.

It also outlines computing trig functions, roots, and the gamma function. See the links below for more accurate ways to compute logarithms.

Incidentally, when I’ve posted about these approximations before, inevitably someone will say these are just Taylor approximations. No, they aren’t.

Logarithms

Miscellaneous