Archiving data on paper

This is a guest post by Ondřej Čertík. Ondřej formerly worked at Los Alamos National Labs and now works for GSI Technologies. He is known in the Python community for starting the SymPy project and in the Fortran community for starting LFortran. — John

 

Ondřej Čertík

I finally got to experiment a bit with archiving data on a regular A4 or US Letter page using a regular printer and a phone camera to read it. It’s been bothering me for about 10 years. What is the maximum data size that we can store on the paper and reliably retrieve?

My setup

It seems it is limited by my camera, not my printer. This image stores 30KB of data. I printed it, took a photo with my iPhone, and then decoded and unpacked it using tar and bzip2. It correctly unpacks into 360KB of C++ files, 6305 lines.

I am using the Twibright Optar software.

It uses Golay codes (used by Voyager) that can fix 3 bad bits in each 24-bit code word (which contains 12-bit payload, and 12-bit parity bits). If there are 4 bad bits, the errors are detected, but cannot be corrected. In my experiment, there were 6 codewords which had 3 bad bits, and no codewords with more bad bits, so there was no data loss. It seems most of the bad bits were in the area where my phone cast a shadow on the paper, so possibly retaking the picture in broad daylight might help.

Bad bits

Here are the stats from the optar code:

7305 bits bad from 483840, bit error rate 1.5098%.

49.1855% black dirt, 50.8145% white dirt and 0 (0%) irreparable. Golay stats
===========
0 bad bits      13164
1 bad bit       6693
2 bad bits      297
3 bad bits      6
4 bad bits      0
total codewords 20160

The original setup can store 200KB of data. I tried it, it seems to print fine. But it’s really tiny, and my iPhone camera can’t read it well enough, so nothing is recovered. So I used larger pixels. The 30KB is what I was able to store and retrieve and from the stats above, it seems this is the limit.

Other possibilities

Competing products, such as this one only store 3-4KB/page, so my experiment above is 10x that.

One idea is to use a different error correction scheme. The Golay above only uses 50% to store data. If we could use 75% to store data, that gives us 50% more capacity.

Another idea to improve is to use colors, with 8 colors we get 3x larger capacity, with 16 colors we get 4x more. That would give us around 100KB/page with colors. Can we do better?

Comparison with other storage techniques

I think the closest to compare is floppy disks. The 5¼-inch floppy disk that I used as a kid could store 180 KB single side, 360KB both sides. The 3½-inch floppy disk could store 720 KB single side or 1.44MB both sides. We can also print on both sides of the paper, so let’s just compare single side for both:

  • Optar original (requires a good scanner): 200 KB
  • My experiment above (iPhone): 30 KB
  • Estimate with 8 colors: 120 KB
  • 5¼-inch floppy disk: 180 KB
  • 3½-inch floppy disk: 720 KB

It seems that the 5¼-inch floppy disk is a good target.

Applications

One application is to store this at the end of a book, so that you don’t need to distribute CDs or floppy disks (as used to be the habit), but you just put 10-20 pages to the appendix, this should be possible to decode even 100 years from now quite reliably.

Another application is just archiving any projects that I care about. I still have some floppy disks in my parents’ attic, and I am quite sure they are completely unreadable by now. While I also have some printouts on paper from the same era 30 years ago, and they are perfectly readable.

I think the requirement to use a regular iPhone is good (mine is a few years old, perhaps the newest one can do better?). If we allow scanners, then of course we can do better, but not many people have a high quality scanner at home, and there is no limit to it: GitHub’s Arctic Vault uses microfilm and high quality scanner. I want something that can be put into a book, or article on paper, something that anyone can decode with any phone, and that doesn’t require any special treatment to store.

Emails moved to Substack

Substack icon

Until recently I used two email services: one to send out daily blog post announcements and another for monthly blog highlights. I’ve combined these into one Substack account for weekly blog highlights.

Apparently readers really like this move. Daily and monthly email subscriptions flatlined some time ago, but Substack subscriptions are going up steadily.

Substack is a kind of hybrid of RSS and Twitter. Like RSS, you can subscribe to articles. But like Twitter (and Mastodon, and many other platforms) you can also have a timeline of brief messages, which Substack calls notes.

Only articles trigger an email. You have to use the Substack app to see notes. I think this will work out well. My plan for now is to write a Substack article about once a week with blog highlights, and use notes to announce posts as they come out.

 

What’s the Best Code Editor?

Emacs, vi, TextEdit, nano, Sublime, Notepad, Wordpad, Visual Studio, Eclipse, etc., etc.—everyone’s got a favorite.

I used Visual Studio previously and liked the integrated debugger. Recently I started using VS again and found the code editing windows rather cluttered. Thankfully you can tone this down, if you can locate the right options.

Eclipse for Java has instantaneous checking for syntax errors. I have mixed feelings on this. Perhaps you could type a little more code before getting a glaring error message?

Concerning IDEs (integrated development environments) like this—I’ve met people who think that a full GUI-based IDE is the only way to go. Maybe so. However , there’s another view.

You’d think if anyone would know how to write code quickly, accurately and effectively, it would be world-class competitive programmers. They’re the best, right?

One of the very top people is Gennady Korotkevich. He’s won many international competitions.

What does he use? Far Manager, a text-based user interface tool with a mere two panels and command prompt. It’s based on 1980s pre-GUI file manager methodologies that were implemented under DOS.

It reminds me of a conversation I had with our admin when I was in grad school. I asked, “Why do you use vi instead of MS Word for editing documents?” Answer: “I like vi because it’s faster—your fingers never need to leave the keyboard.”

Admittedly, not all developer workflows would necessarily find this approach optimal. But still it makes you think. Sometimes the conventional answer is not the best one.

Do you have a favorite code editor? Please let us know in the comments.

Bounding the perimeter of a triangle between circles

Suppose you have a triangle and you know the size of the largest circle that can fit inside (the incircle) and the size of the smallest circle that can fit outside (the circumcircle). How would you estimate the perimeter of the triangle?

In terms of the figure below, if you know the circumference of the red and blue circles, how could you estimate the perimeter of the black triangle?

Triangle with inscribed and circumscribed circles

A crude estimate is that the triangle perimeter must be greater than the incircle circumference and less than the circumcircle circumference. But we can do better.

Gerretsen’s inequalities

It is conventional in this kind of problem to work with the semiperimeter of the triangle, half the perimeter, rather than the perimeter. Let r be the radius of the incircle, R the radius of the circumcircle, and s the semiperimeter of the triangle. Then Gerretsen’s inequalities say

16Rr − 5r² ≤ s² ≤ 4R² + 4Rr + 3r²

In the figure above,

r = 1.058, R = 2.5074, s = 6.1427

and Gerretsen’s inequalities give

36.8532  ≤ s² = 37.7327 ≤ 39.1200.

In the case of an equilateral triangle, Gerretsen’s inequalities are in fact equations.

Note that Gerretsen’s inequalities say nothing about the centers of the circles. An incircle must be inside a circumcircle, but for a variety of triangles with the centers of the two circles in different relations you have the same bounds.

Kooi’s inequality

Kooi’s inequality [2] gives another upper bound for the perimeter of the triangle:

s² ≤ ½ R(4R + r)² / (2Rr)

which in the example above gives a tighter upper bound, 38.9515.

Kooi’s upper bound is uniformly better than the upper bound half of Gerretsen’s inequalities. But further refinements are possible [3].

Related posts

[1] J. C. H. Gerretsen, Ongelijkheden in de driehoek. Nieuw Tijdschr 41 (1953), 1–7.

[2] O. Kooi, Inequalities for the triangle, Simon Stevin, 32 (1958), 97–101.

[3] Martin Lukarevski and Dan Stefan Marinescu. A refinement of the Kooi’s inequality, Mittenpunkt and applications. Journal of Mathematical Applications. Volume 13, Number 3 (2019), 827–832

Music of the spheres

The idea of “music of the spheres” dates back to the Pythagoreans. They saw an analogy between orbital frequency ratios and musical frequency ratios.

HD 110067 is a star 105 light years away that has six known planets in orbital resonance. The orbital frequencies of the planets are related to each other by small integer ratios.

The planets, starting from the star, are labeled b, c, d, e, f, and g. In 9 “years”, from the perspective of g, the planets complete 54, 36, 24, 16, 12, and 9 orbits respectively. So the ratio of orbital frequencies between each pair of consecutive planets are either 3:2 or 4:3. In musical terms, these ratios are fifths and fourths.

In the chord below, the musical frequency ratios are the same as the orbital frequency rations in the HD 110067 system.

Here’s what the chord sounds like on a piano:

hd11067.wav

Related posts

The Real Book

I listened to the 99% Invisible podcast about The Real Book this morning and thought back to my first copy.

My first year in college I had a jazz class, and I needed to get a copy of The Real Book, a book of sheet music for jazz standards. The book that was illegal at the time, but there was no legal alternative, and I had no scruples about copyright back then.

When a legal version came out later I replaced my original book with the one in the photo below.

The New Real Book Legal

The podcast refers to “When Hal Leonard finally published the legal version of the Real Book in 2004 …” but my book says “Copyright 1988 Sher Music Co.” Maybe Hal Leonard published a version in 2004, but there was a version that came out years earlier.

The podcast also says “Hal Leonard actually hired a copyist to mimic the old Real Book’s iconic script and turn it into a digital font.” But my 1988 version looks not unlike the original. Maybe my version used a kind of typesetting common in jazz, but the Hal Leonard version looks even more like the original handwritten sheet music.

Substack replacing email subscription

The service that sent out my email to blog subscribers stopped working a couple weeks ago, and I’m trying out Substack as a replacement. You can find my Substack account here.

My plan for now is to use this account to make blog post announcements, maybe once a week, with a little introductory commentary for each link. I expect to adjust course in response to feedback. Maybe I’ll write some posts just on Substack, but for now I intend to use it as a place to post blog round-ups.

The Stubstack is free and I have no intention to ever charge for it. It’s possible I might make some premium add-on in the future, but I doubt it. If I did, the blog post round-up would remain free.

 

Determinant of an infinite matrix

What does the infinite determinant

D = \left| \begin{array}{lllll} 1 & a_1 & 0 & 0 & \cdots \\ b_1 & 1 & a_2 & 0 & \cdots \\ 0 & b_2 & 1 & a_3 & \\ 0 & 0 & b_3 & 1 & \ddots \\ \vdots & \vdots & & \ddots & \ddots \\ \end{array} \right|

mean and when does it converge?

The determinant D above is the limit of the determinants Dn defined by

D_n = \left| \begin{array}{lllll} 1 & a_1 & & & \\ b_1 & 1 & a_2 & & \\ & b_2 & 1 & \ddots & \\ & & \ddots & \ddots & a_{n-1} \\ & & & b_{n-1} & 1 \\ \end{array} \right|

If all the a‘s are 1 and all the b‘s are −1 then this post shows that Dn = Fn, the nth Fibonacci number. The Fibonacci numbers obviously don’t converge, so in this case the determinant of the infinite matrix does not converge.

In 1895, Helge von Koch said in a letter to Poincaré that the infinite determinant is absolutely convergent if and only if the sum

\sum_{i=1}^\infty a_ib_i

is absolutely convergent. A proof is given in [1].

The proof shows that the Dn are bounded by

\prod_{i=1}^n\left(1 + |a_ib_i| \right)

and so the infinite determinant converges if the corresponding infinite product converges. And a theorem on infinite products says

\prod_{i=1}^\infty\left(1 + |a_ib_i| \right)

converges absolute if the sum in Koch’s theorem converges. In fact,

\prod_{i=1}^\infty\left(1 + |a_ib_i| \right) \leq \exp\left(\sum_{i=1}^\infty |a_ib_i| \right )

and so we have an upper bound on the infinite determinant.

Related post: Triadiagonal systems, determinants, and cubic splines

[1] A. A. Shaw. H. von Koch’s First Lemma and Its Generalization. The American Mathematical Monthly, April 1931, Vol. 38, No. 4, pp. 188–194

Area of quadrilateral as a determinant

I’ve written several posts about how determinants come up in geometry. These determinants often look similar, having columns related to coordinates and a column of ones. You can find several examples here along with an explanation for this pattern.

If you have three points z1, z2, and z3 in the complex plane, you can find the area of a triangle with these points as vertices

A(z_1, z_2, z_3) = \frac{i}{4} \, \left| \begin{matrix} z_1 & \bar{z}_1 & 1 \\ z_2 & \bar{z}_2 & 1 \\ z_3 & \bar{z}_3 & 1 \\ \end{matrix} \right|

You can read more about this here.

If you add the second column to the first, and subtract the first column from the second, you can get the equation for the area of a triangle in the real plane.

A = \frac{1}{2} \, \left| \begin{matrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \\ \end{matrix} \right|

Presumably the former equation was first derived from the latter, but the two are equivalent.

Now suppose you have a quadrilateral whose vertices are numbered in clockwise or counterclockwise order. Then the area is given by

A = \frac{1}{2} \, \left| \begin{matrix} x_1 & y_1 & 1 & 0\\ x_2 & y_2 & 1 & 1\\ x_3 & y_3 & 1 & 0\\ x_4 & y_4 & 1 & 1\\ \end{matrix} \right|

The proof is easy. If you expand the determinant by minors on the last column, you get the sum of two 3 × 3 determinants corresponding to the areas of the two triangles formed by splitting the quadrilateral along the diagonal connecting the first and third points.

A very accurate logarithm approximation

The previous post looked at an efficient way to approximate nth roots of fractions near 1 by hand. This post does the same for logarithms.

As before, we assume x = p/q and define

s = p + q
d = pq

Because we’re interested in values of x near 1, d is small, and small numbers are convenient to work with by hand.

In [1] Kellogg gives the approximation

log x ≈ 3(x² − 1)/((x+ 1)² + 2x) = 6ds/(3s² − d²)

So, for example, suppose we wanted to take the natural log of 7/8. then p = 7, q = 8, s = 15, and d = −1.

log x ≈ (6×15×(−1))/(3×225 − 1) = − 90/674 = − 45/337.

This approximation is good to six decimal places.

Kellogg claims that

This value of E [the natural logarithm], if q [what I’ve called x] be between .9 and 1.1, is true to the seventh decimal.

He then goes on to explain how to create an even more accurate approximation, and how to deal with larger values of x.

Here’s a plot verifying Kellogg’s claim.

Note the that scale of the plot is 10−8. As the flat spot in the middle suggests, you get even more decimal places for x closer to 1.

[1] Ansel N. Kellogg. Empirical formulæ; for Approximate Computation. The American Mathematical Monthly. February 1987, Vol. 4 No. 2, pp. 39–49.