
Example of memorizing a 256-bit private key

There are techniques that can enable anyone to memorize much more than may seem possible. This post will show how I generated and memorized a 256-bit encryption key this morning using the approach explained here.

TANSTAAFL

There ain’t no such thing as a free lunch. This saying is abbreviated TANSTAAFL in Heinlein’s novel The Moon is a Harsh Mistress. It takes effort to save effort.

Memorization techniques make it easier to remember new things if you invest effort into the techniques. The more you invest into the techniques, the easier memorizing new things is.

Whether this investment is practical depends on the person. I find it more interesting than practical; I rarely need to memorize anything. But I know of people who have used these techniques to great advantage.

Generating a key

First, I’ll generate a 256-bit number using Python.

>>> import secrets
>>> s = secrets.randbits(256)

This produced the following:

14769232028620221959695310712392700269168526908419649910136349315042507303581

If you were to run the same code you would not get the same number, which is the point of the secrets module. It seeds a random number generator with entropy extracted from the state of the computer it is running on.

Parsing digits

The number above has 77 digits, so I split it into a two-digit number, 14, and 25 three-digit numbers: 769, 232, …, 581. That makes 26 numbers, one for each letter of the NATO phonetic alphabet. I encode each number as a word using the Major mnemonic system and associate that word with the corresponding NATO letter, in order.
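The split described above can be sketched in a few lines of Python. The names `split_digits` and `NATO` are illustrative labels, not anything from the secrets module, and the Major-system words still have to be chosen by hand.

```python
# Illustrative helper names; only the splitting and pairing steps are automated.
NATO = [
    "Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf",
    "Hotel", "India", "Juliett", "Kilo", "Lima", "Mike", "November",
    "Oscar", "Papa", "Quebec", "Romeo", "Sierra", "Tango", "Uniform",
    "Victor", "Whiskey", "X-ray", "Yankee", "Zulu",
]

def split_digits(n):
    """Split n into a short leading chunk followed by three-digit chunks."""
    s = str(n)
    head = len(s) % 3 or 3  # length of the leading chunk (2 for a 77-digit number)
    return [s[:head]] + [s[i:i + 3] for i in range(head, len(s), 3)]

key = 14769232028620221959695310712392700269168526908419649910136349315042507303581
chunks = split_digits(key)

# Pair each chunk with a NATO word: Alpha 14, Bravo 769, and so on
for word, chunk in zip(NATO, chunks):
    print(word, chunk)
```

A random 256-bit number will not always have 77 digits, so the number of chunks can come out smaller than 26.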

I encode 14 as “tar”, and the NATO word for the letter A is “alpha,” so I imagine an alpha male covered in tar.

I encode 769 as “ketchup,” and the NATO word for B is “bravo,” so I imagine a brave bottle of ketchup, arms akimbo, and wearing a superhero cape.

I encode 581 as “elevator” [1], and the phonetic word for Z is Zulu, and so I imagine Shaka Zulu riding in an elevator.

Using this process I memorized the random number above in a few minutes.

Just for fun I asked DALL-E 2 to produce an image of “Shaka Zulu, spear in hand, riding in an elevator.” The image below is one of the ones it created.

Shaka Zulu, spear in hand, riding in an elevator


[1] Strictly speaking, “elevator” decodes as 5814. But I use a common convention of only considering the first three consonants in a word. This is a good trade-off because it’s not likely you could encode a 4-digit number in a single word.

Top twelve posts of 2023

Illustrating Euclid Proposition XIII.10

These were the most popular posts I wrote in 2023.

Privacy and encryption

Geometry

Number theory

Linear algebra

Miscellaneous

Doomsday 2024

John Conway’s “Doomsday” rule observes that every year, the dates 4/4, 6/6, 8/8, 10/10, 12/12, 5/9, 9/5, 7/11, and 11/7 fall on the same day of the week. In 2024 all these dates fall on a Thursday.

These dates are easy to memorize because they break down into doubled even numbers (4/4, 6/6, 8/8, 10/10, and 12/12) and symmetric pairs of odd numbers (5/9 and 9/5, 7/11 and 11/7). Because of their symmetry, this set of dates is the same whether interpreted according to the American or European convention.

In addition, the last day of February falls on “Doomsday.” Since 2024 is a leap year, this will be February 29.
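The rule is easy to check with Python’s datetime module. This is a minimal sketch; the function name `doomsday` and the constant names are illustrative.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# (month, day) pairs from the rule
DOOMSDAY_DATES = [(4, 4), (6, 6), (8, 8), (10, 10), (12, 12),
                  (5, 9), (9, 5), (7, 11), (11, 7)]

def doomsday(year):
    """Return the weekday name shared by the Doomsday dates in the given year."""
    days = [date(year, m, d) for m, d in DOOMSDAY_DATES]
    days.append(date(year, 3, 1) - timedelta(days=1))  # last day of February
    weekdays = {d.weekday() for d in days}  # Monday is 0
    if len(weekdays) != 1:
        raise AssertionError("the dates do not share a weekday")
    return DAY_NAMES[weekdays.pop()]

print(doomsday(2024))  # Thursday
```

Computing the last day of February as the day before March 1 handles leap years without a special case.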

Related post: Mentally calculating the day of the week

Solving a triangle the size of Argentina

The numbers in today’s date—11, 28, and 23—make up the sides of a triangle. This doesn’t always happen; the two smaller numbers have to add up to more than the larger number.

We’ll look at triangles with sides 11, 23, and 28 in the plane, on a sphere, and on a pseudosphere. Most of the post will be devoted to the middle case, a large triangle on the surface of the earth.

Solving a triangle in the plane

If we draw a triangle with sides 11, 23, and 28, we can find out the angles of the triangle using the law of cosines:

c² = a² + b² – 2ab cos C

where C is the angle opposite the side c. We can find each of the angles of the triangle by rotating which side we call c.

If we let c = 11, then C = arccos((23² + 28² − 11²)/(2 × 23 × 28)) = 22.26°.

If we let c = 23, then C = arccos((11² + 28² − 23²)/(2 × 11 × 28)) = 52.38°.

If we let c = 28, then C = arccos((11² + 23² − 28²)/(2 × 11 × 23)) = 105.36°.
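The three calculations above can be reproduced with a short Python sketch. The function name `planar_angles` is illustrative.

```python
from math import acos, degrees

def planar_angles(a, b, c):
    """Return the angles (in degrees) opposite sides a, b, c of a plane triangle."""
    def opposite(x, y, z):
        # Angle opposite side z, from z^2 = x^2 + y^2 - 2xy cos Z
        return degrees(acos((x * x + y * y - z * z) / (2 * x * y)))
    return opposite(b, c, a), opposite(a, c, b), opposite(a, b, c)

A, B, C = planar_angles(11, 23, 28)
print(f"{A:.2f} {B:.2f} {C:.2f} sum = {A + B + C:.2f}")
```

The three angles sum to 180°, as they must in the plane.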

Solving a triangle on a sphere

Now suppose we make our 11-23-28 triangle very large, drawing our triangle on the face of the earth. We pick our unit of measurement to be 100 miles, and we get a triangle very roughly the size and shape of Argentina.

We can still use the law of cosines, but it takes a different form, and the meaning of the terms changes. The law of cosines on a sphere is

cos(c) = cos(a) cos(b) + sin(a) sin(b) cos(C).

As before, a, b, and c are sides of the triangle, and the sides a and b intersect at an angle C. However, now the sides themselves are angles because they are arcs on a sphere. Now a, b, and c are measured in degrees or radians, not in miles.

If the length of an arc is x, the angular measure of the arc in radians is x/R where R is the radius of the sphere. The mean radius of the earth is 3959 miles, and we’ll assume the earth is a sphere with that radius.

We can solve for the angle opposite the longest side by using

C = arccos( (cos(c) – cos(a) cos(b)) / (sin(a) sin(b)) )

where

a = 1100 / 3959
b = 2300 / 3959
c = 2800 / 3959

It turns out that C ≈ 106.9°, and the other angles are approximately 23.8° and 53.9°.

Importantly, the sum of these three angles is more than 180°. In fact it’s about 184.6°.

The sum of the vertex angles in a spherical triangle is always more than 180°, and the bigger the triangle, the more the sum exceeds 180°. The amount by which the sum exceeds 180° is called the spherical excess E and it is proportional to the area. In radians,

E = area / R².

In our example the excess is about 4.64° and so the area of our triangle is

4.64° × (π radians / 180°) × 3959² miles² ≈ 1,270,000 miles².

Now Argentina has an area of about a million square miles, so our triangle is somewhat bigger than Argentina, and much smaller than South America (6.8 million square miles). Argentina is about 2300 miles from north to south, so one of the sides of our triangle matches Argentina well.

Note that there are no similar triangles on a sphere: if you change the lengths of the sides proportionately, you change the vertex angles.
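Here is a minimal sketch of the spherical calculation. It assumes that the angular measure of a side is its arc length divided by the radius (x/R, in radians) and that the earth is a sphere of radius 3959 miles. The function names are illustrative.

```python
from math import acos, cos, sin, degrees, radians

R = 3959  # mean radius of the earth in miles

def spherical_angles(a, b, c):
    """Vertex angles (degrees) opposite the angular sides a, b, c (radians)."""
    def opposite(x, y, z):
        # Spherical law of cosines solved for the angle opposite side z
        return degrees(acos((cos(z) - cos(x) * cos(y)) / (sin(x) * sin(y))))
    return opposite(b, c, a), opposite(a, c, b), opposite(a, b, c)

def solve(sides_miles, radius=R):
    """Return (angles in degrees, spherical excess in degrees, area in square miles)."""
    a, b, c = (s / radius for s in sides_miles)  # assumed conversion: radians = length / radius
    angles = spherical_angles(a, b, c)
    excess = sum(angles) - 180
    area = radians(excess) * radius**2  # E = area / R^2 with E in radians
    return angles, excess, area

angles, excess, area = solve((1100, 2300, 2800))
print(angles, excess, area)

# Halving every side changes the angles: no similar triangles on a sphere
half_angles, half_excess, _ = solve((550, 1150, 1400))
print(half_angles, half_excess)
```

The half-size triangle has angles closer to the planar values and a smaller excess, since the excess is proportional to the area.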

Solving a triangle on a pseudosphere

In a hyperbolic space, such as the surface of a pseudosphere, a surface that looks sorta like the bell of a trombone, the law of cosines becomes

cosh(c) = cosh(a) cosh(b) + κ sinh(a) sinh(b) cos(C)

where κ < 0 is the curvature of the space. Note that if we set κ = 1 and delete all the hs this would become the law of cosines on a sphere.

Just as the angles in a triangle add up to more than 180° on a sphere, and exactly 180° in a plane, they add up to less than 180° on a pseudosphere. I suppose you could call the difference between 180° and the sum of the vertex angles the spherical deficiency by analogy with spherical excess, but I don’t recall hearing that term used.


Radius of a stretched spring

When you stretch a coiled spring, the radius decreases slightly, so slightly that you can’t see the difference unless you stretch the spring so much that you damage it.

The math is essentially the same as in the previous post about wrapping Christmas lights around a tree trunk.

If you have a coiled spring of radius r, the points along the coil can be described by

(r cos t, r sin t, ht/2π)

where h is the spacing between turns. If t runs from 0 to T, the length of the spring is hT/2π and the length of the material in the spring, if it were uncoiled, would be

√(r² + h²/4π²) T.

When we stretch a spring, we increase h. We don’t increase the total amount of material, so the radius must decrease, though not by much.

Suppose the spring initially has radius r₁ and coil spacing h₁. Then when we stretch it the spring has radius r₂ and coil spacing h₂. Since we haven’t created new material, we must have

√(r₁² + h₁²/4π²) T = √(r₂² + h₂²/4π²) T

and so

r₁² + h₁²/4π² = r₂² + h₂²/4π².

A small change in h results in a change in r an order of magnitude smaller, for reasons given in the previous post. Both posts boil down to the observation that for y small relative to x,

√(x² + y²) − x = y²/2x + O(y⁴).

If we choose our units so that the initial radius is on the order of 1, then a change in length on the order of y results in a change in radius on the order of y².
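A small numerical sketch shows how little the radius changes. The starting values (radius 1, coil spacing 0.1 stretched to 0.2) are made up for illustration, and the function names are not from any library.

```python
from math import pi, sqrt

def stretched_radius(r1, h1, h2):
    """Radius after the coil spacing changes from h1 to h2 with the wire length fixed."""
    r2_squared = r1**2 + (h1**2 - h2**2) / (4 * pi**2)
    if r2_squared < 0:
        raise ValueError("spacing too large for the available wire")
    return sqrt(r2_squared)

def wire_length_per_radian(r, h):
    """Length of wire per unit of the parameter t."""
    return sqrt(r**2 + h**2 / (4 * pi**2))

r1, h1, h2 = 1.0, 0.1, 0.2  # illustrative values only
r2 = stretched_radius(r1, h1, h2)
print(r2, r1 - r2)
```

With these values, doubling the length of the spring shrinks the radius by roughly 0.04%.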

Partitioning dots and dashes

Given a set of dots and dashes, how many ways can they be partitioned into a set of Morse code letters?

There is at least one way, since you could take each dot to be an E and each dash to be a T.

If you have a sequence of n dots and dashes, there are no more than 2^(n−1) ways to partition the symbols: at each of the n − 1 spaces between symbols, you either start a new partition or you don’t. This is an over-estimate for large n since a Morse code letter has at most 4 dots or dashes, and not all combinations of four dots and dashes correspond to a letter.

Last year I wrote about the song YYZ and how it was inspired by the sound of “YYZ” in Morse code, YYZ being the designation of the Toronto airport. Here’s the song’s opening theme:

The C code given here enumerates partitions of dots and dashes, and it shows that there are 1324 ways to partition -.---.----.. into the Morse code for letters [1]. This number 1324 is closer to our upper estimate of 2¹¹ = 2048 than our lower estimate of 1.

Define a function M(n) as follows. Express n in binary, convert the 0s to dots and the 1s to dashes, and let M(n) be the number of ways this sequence of dots and dashes can be partitioned into letters. The n corresponding to the Morse code for YYZ is 101110111100₂ = 3004₁₀, so M(3004) = 1324.
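The count can also be computed without listing the partitions. The following Python sketch uses dynamic programming, which is a different approach from the enumeration in the C code, and the names `count_partitions` and `MORSE_LETTERS` are illustrative.

```python
# International Morse code for the 26 letters
MORSE_LETTERS = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E", "..-.": "F",
    "--.": "G", "....": "H", "..": "I", ".---": "J", "-.-": "K", ".-..": "L",
    "--": "M", "-.": "N", "---": "O", ".--.": "P", "--.-": "Q", ".-.": "R",
    "...": "S", "-": "T", "..-": "U", "...-": "V", ".--": "W", "-..-": "X",
    "-.--": "Y", "--..": "Z",
}

def count_partitions(symbols):
    """Count the ways to split a dot-dash string into Morse code letters."""
    # ways[i] is the number of ways to partition the first i symbols
    ways = [1] + [0] * len(symbols)
    for i in range(1, len(symbols) + 1):
        for k in range(1, min(4, i) + 1):  # a letter has at most 4 symbols
            if symbols[i - k:i] in MORSE_LETTERS:
                ways[i] += ways[i - k]
    return ways[-1]

def M(n):
    """M(n) as defined above: binary digits of n with 0 as dot and 1 as dash."""
    return count_partitions(bin(n)[2:].replace("0", ".").replace("1", "-"))

print(count_partitions("-.---.----.."))  # 1324
print(M(3004))                           # 1324
```

If the membership test is dropped, so that every group of up to four symbols counts, the same recurrence gives 1490.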

I looked to see whether M(n) were in OEIS and it doesn’t appear to be, though there are several sequences in OEIS that include Morse code in their definition.

It’s easy to see that 1 ≤ M(n) ≤ n. Exercise for the reader: find sharper upper and lower bounds for M(n). For example, show that every group of three bits can be partitioned four ways, and so M(n) has a lower bound something like n^(2/3).


[1] The code returns 1490 possibilities, but some of these contain one or more asterisks indicating combinations of dots and dashes that do not correspond to letters.

DICOM image data

X ray of hand and arm

 

The previous post discussed EXIF data embedded in a digital photo. DICOM files are the analogous format for medical images.

You can think of a DICOM image as a JPEG with medical metadata. Strictly speaking a DICOM file is a sort of database, and one of the fields in the database contains the pixels. The pixels are usually stored in JPEG format or some variation thereof, but they don’t have to be.

DICOM stands for Digital Imaging and Communications in Medicine. It is a standard created by the ACR (American College of Radiology) and NEMA (National Electrical Manufacturers Association). DICOM uses other standards, such as JPEG, and is used by standards built on top of it, such as IHE and HL7.

Consumer photos may contain a lot of EXIF metadata, but DICOM images can contain even more metadata. The DICOM standard is huge. It comes in 16 parts, and the data dictionary part alone is 274 pages. Pages 23 through 176 of the data dictionary consist of one long table defining possible DICOM data fields. Assuming an average of 30 fields per page, this is about 4,600 data fields. To make matters worse (or rather, more flexible), many of these fields can contain arbitrarily long text strings. Well, not entirely arbitrary: fields must be less than 2³² characters. Moby Dick is about 2²⁰ characters, so a 2³² character limit is essentially no practical limit.

I have had numerous clients send me a description of their data in a small Excel file. Then they’ll say “and we have some images” meaning DICOM images. Maybe the Excel file contains a few dozen fields, but then the DICOM images potentially contain thousands of fields. The Excel file is burying the lede: the vast majority of the data (potentially) is in the DICOM images.

The enormous number of fields, and the lack of much structure to these fields, is a widely recognized problem. According to Mustra et al. [1],

A major disadvantage of the DICOM Standard is the possibility for entering probably too many optional fields. This disadvantage is mostly showing in inconsistency of filling all the fields with the data. Some image objects are often incomplete because some fields are left blank and some are filled with incorrect data.

This is quite an understatement, saying that over four thousand fields is “probably too many optional fields.”


[1] Mario Mustra, Kresimir Delac, Mislav Grgic. Overview of the DICOM Standard. 50th International Symposium ELMAR-2008, 10-12 September 2008, Zadar, Croatia

Photo by Cara Shelton on Unsplash

Personal information in digital photos

Is it possible to identify the people in the photo above? Maybe. Digital images potentially contain a large amount of metadata that could reveal the photographer’s identity and location. There may also be a surprising number of clues in the photo itself.

EXIF metadata

The standard format for image metadata is EXIF, Exchangeable Image File Format. Some of this information is obviously identifiable, such as fields called CameraOwnerName, Photographer, and ImageEditor. A camera may or may not include such information, and someone may remove this information from photos after they are taken, but it may well be inside the photo.

Similarly, the photo may include information regarding where the photo was taken, such as in the GPSLatitude, GPSLongitude, and GPSAltitude fields. There are also fields for recording when the photo was taken or edited.

A recurring theme in data privacy is that information that is not obviously identifiable may still be used to identify someone. If this data doesn’t do the whole job, it narrows down possibilities to the point that other known information may complete the identification.

For example, the highly technical fields contained in an image could identify the camera equipment. The camera serial number directly identifies the camera, but other fields may indirectly identify the camera.

Similarly, an image without GPS data may still contain indirect location information. For example, there are fields for recording temperature, humidity, and atmospheric pressure. These fields used in combination with timestamps could identify a location, or at least narrow down the set of possible locations.

There are many EXIF fields that are allowed to be arbitrarily long ASCII or Unicode (UTF-8) sequences. A program for editing EXIF data would allow someone to copy the contents of Moby Dick into one of these fields.

The next post describes a similar situation for medical images.

Clues in the photo itself

Stripping EXIF data from an image before making it public is a good idea both for privacy and for size. If a free text field does contain Moby Dick, you could make your image 1.2 MB smaller by removing it.
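As a rough illustration of what stripping involves, here is a minimal standard-library sketch that drops Exif segments from the bytes of a JPEG file. It assumes the usual JPEG layout, in which Exif data sits in APP1 segments (marker FF E1, payload beginning with "Exif" and two zero bytes) ahead of the start-of-scan marker. The function name `strip_exif` is made up for this example, and a dedicated tool is a better choice for real files because it also handles other metadata blocks and unusual layouts.

```python
import struct

def strip_exif(jpeg):
    """Return a copy of the JPEG bytes with Exif APP1 segments removed."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing start-of-image marker")
    out = bytearray(jpeg[:2])
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a segment marker")
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        # The two-byte big-endian length counts itself but not the marker
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segment = jpeg[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    out += jpeg[pos:]  # copy the image data unchanged
    return bytes(out)
```

Only the metadata segment is removed; the compressed image data is copied byte for byte.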

However, it’s often possible to detect from the photo itself where the photo was taken. I stumbled on a YouTube channel of someone who identifies photos as a hobby. No doubt there are many such people. The host invites people to send in photos and he uses openly available information to track down where they are.

If you strip the precise time and location information from the metadata, someone may be able to infer approximate replacements from clues in the photo itself such as shadows or seasonal vegetation.

Ordinary people have no idea how much location information can be inferred from a photo. Neither do some people who ought to know better. There was a story a few months ago about a photo at a secret military location whose position was inferred from, among other clues, stars that faintly appeared in the sky near dusk.

Update: As noted in the comments, Facebook has a patent on a way to identify people from the pattern of dust on their camera lenses.


Photo by Evgeniy Prokofiev on Unsplash

What can you learn from a phone number?

What can someone learn about you from your phone number?

The answer depends on what other information someone has. Identifiers always depend on context. To a naked man in a tree [1] the phone number doesn’t carry any information. But to someone with a list of names and phone numbers, some sort of reverse phone number look up, it might tell them your name.

A while back I wrote about area codes and how they are distributed among states. NANPA publicly posts data that goes into greater detail with central office codes. Using this data, you can look up the first six digits of a phone number and find more specifically where the central office associated with the number is located geographically.

For example, take the phone number 469 863 7090. This is a business phone number, and so you could type it into a search engine and find out exactly whose number it is. But if that weren’t possible, you could look up 469-863 in the NANPA database to find that the number is located in Frisco, Texas. In fact, the number belongs to Sky Rocket Burger. Recommended.
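The lookup amounts to splitting the number into its three parts and using the first two as a key. In this sketch the table `CENTRAL_OFFICES` is a toy stand-in for the NANPA data, containing only the example from this post, and the function name `split_nanp` is made up for illustration.

```python
import re

# Toy stand-in for the NANPA central office data (single entry, from the example above)
CENTRAL_OFFICES = {("469", "863"): "Frisco, Texas"}

def split_nanp(number):
    """Split a 10-digit number into (area code, central office code, line number)."""
    digits = re.sub(r"\D", "", number)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        raise ValueError("expected a 10-digit number")
    return digits[:3], digits[3:6], digits[6:]

area, office, line = split_nanp("469 863 7090")
print(CENTRAL_OFFICES.get((area, office), "unknown"))  # Frisco, Texas
```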

Now people can move around and keep their mobile phone numbers, so any kind of phone look up may tell you about where someone used to be rather than where they are. That could be even more useful.


[1] A lawyer once told me that his law school professor said that the only thing the interstate commerce clause of the US Constitution doesn’t apply to is a naked man in a tree.