Annuity and factorial notation

Actuarial mathematics makes frequent use of a notation that, as far as I know, isn’t used anywhere else: a bracket enclosing the northeast corner of a symbol.

\angln

This is used as a subscript of another symbol, such as an a or an s, and there may be other decorations such as additional superscripts or subscripts.

\ddot{a}_{\angln i}

For typesetting in LaTeX, the bracket is the only part that’s not standard. If you can put a bracket around a symbol, you can make the bracketed symbol a subscript just as you’d make anything else a subscript. The LaTeX package actuarialangle lets you do this.
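For example, here is a minimal sketch of a document that typesets the formula above. It assumes the \angln command shown above is the one provided by the actuarialangle package, as the text suggests; check the package documentation for the exact commands it defines.

    \documentclass{article}
    \usepackage{actuarialangle}
    \begin{document}
    % annuity-due symbol with an angle-bracketed n and interest rate i
    $ \ddot{a}_{\angln i} $
    \end{document}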

The angle notation that actuaries use for annuity-related calculations is not used anywhere else that I know of, but a variation was once used for factorials back in the 1800s, putting a bracket around the southwest corner of a symbol rather than the northeast. You can see examples here, including one use by David Hilbert.

snippet of paper by David Hilbert

Related post: Floor, ceiling, bracket

Base58 versus Base85 encoding

Base58 encoding and Base85 encoding are used to represent binary data in a human-friendly way. Base58 uses a smaller character set and so is more conservative. Base85 uses a larger character set and so is more efficient.

There is a gotcha in that “base” means something different in Base58 compared to Base85. More on that below.

Base58

Base58 encoding is primarily used as part of the Bitcoin system. It is part of the Base58Check protocol used for encoding addresses and keys.

Base58 encoding is essentially the same as mathematical base 58 encoding, with a specific character set. The symbols for the “digits” 0 through 57 are chosen to avoid typographically similar letters. We’ll give that character set in the examples below.

There is only one version of Base58 in common use as far as I know, unlike Base85.

Base85

Base85 is a more compact alternative to Base64 encoding. The former encodes 4 bytes in 5 characters while the latter requires 6 characters. Base85 is used inside the PDF format. It is also used in the patch encoding for git.

Base85 encoding is analogous to binary-coded decimal (BCD). In some early computer systems, integers would not be expressed in binary per se. Instead, each digit would be represented by four bits. So to represent a number like 427, you’d express 4, 2, and 7 in binary: 0100 0010 0111. If you were to express 427 in binary you’d get 110101011.

Base85 breaks the data into 32-bit words, then expresses each word in base 85. So by analogy with binary-coded decimal, you might call it base-85-encoded 32-bit words.

There are variations on Base85 encoding that use different alphabets, and so two software packages that say they do Base85 encoding might produce different results.

Base85 is more efficient than Base58 in the sense that it represents data using fewer symbols. It is also more computationally efficient because each 32-bit word is encoded independently.

Examples

We give four examples below: Base58 and Base85 applied to four bytes of data and eight bytes of data. The data length matters for Base85.

Base58, four bytes

Let n = CAFEBABE_hex = 3405691582_ten. This is the “magic number” at the beginning of Java class files, a pun on “java” as slang for coffee.

In base 58 this number would be

5:10:55:3:26:22

We can verify this as follows:

    >>> 5*58**5 + 10*58**4 + 55*58**3 + 3*58**2 + 26*58 + 22
    3405691582
    >>>  hex(_)
    '0xcafebabe'

The Base58 alphabet is

    123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz

and so the Base58 encoding of 0xCAFEBABE would be the 5th, 10th, 55th, … elements of this alphabet (using zero-based indexing), which results in 6Bx4TP.

Note that the Base58 alphabet contains the digit 1 but not the letter l. It contains the lower case letter o but not the capital letter O or the digit 0.
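For illustration, here is a minimal Python sketch of the digit-conversion step using the alphabet above. The full Base58Check scheme used by Bitcoin also represents leading zero bytes as 1s and appends a checksum; this sketch leaves those out.

    ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

    def base58_encode(n):
        # repeatedly divide by 58 and map remainders to alphabet characters
        digits = []
        while n > 0:
            n, r = divmod(n, 58)
            digits.append(ALPHABET[r])
        return "".join(reversed(digits)) or ALPHABET[0]

    print(base58_encode(0xCAFEBABE))   # 6Bx4TP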

Base85, four bytes

Now suppose we want to encode n using Base85. Now we would get

65:20:50:84:67

If we use the alphabet

    !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu

then the “digits” above become b5Sud.

Note that the Base85 alphabet contains characters that could be confused, such as 0 (zero), O (capital letter), o (lower case letter). The characters were chosen to be printable ASCII characters, not necessarily visually distinct.
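Here is a similar sketch (my code, not from any particular library) for encoding one 32-bit word with the ‘!’-through-‘u’ alphabet above. Variants of Base85 use other alphabets, as mentioned earlier.

    def base85_word(word):
        # express one 32-bit word as five base-85 digits, most significant first
        digits = []
        for _ in range(5):
            word, r = divmod(word, 85)
            digits.append(chr(33 + r))   # '!' is ASCII 33
        return "".join(reversed(digits))

    print(base85_word(0xCAFEBABE))   # b5Sud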

Base58, eight bytes

Now suppose n = CAFEBABECAFEBABE_hex = 14627333968358193854_ten.

We convert n to base 58 to get

33:55:17:43:49:44:3:47:49:44:26

which becomes axJkrm4prmT in the Base58 alphabet.

Base85, eight bytes

To encode CAFEBABECAFEBABE_hex in Base85 we do not convert the number to base 85. Instead, we convert each 4-byte word to base 85. So we get two copies of CAFEBABE_hex and so the encoding is b5Sudb5Sud.

If we were to wrongly convert n to base 85, we’d get

63:13:1:27:77:35:57:62:38:49

which becomes `."<nDZ_GR which is not the correct encoding.


Hyperinflation changes everything

Zimbabwe one hundred trillion dollar note

My two latest blog posts have been about compound interest. I gave examples of interest rates of 6% up to 18% per year.

Hyperinflation, with rates of inflation in excess of 50% per month, changes everything. Although many economists accept 50% per month as the threshold of hyperinflation, the world has seen worse. Zimbabwe, for example, faced 98% inflation per day. With hyperinflation lots of implicit assumptions and approximations break down.

One takeaway from the previous posts is that at moderate interest rates, the frequency of compounding doesn’t make that much difference. A nominal interest rate of 12%, compounded continuously, is effectively a 12.75% interest rate. Compound less than continuously, say monthly or daily, and the effective rate will be somewhere between 12% and 12.75%.

But now say interest is 50% per month. Simple interest would be 600% a year, but nobody would accept simple interest. Compounded every month, the effective interest rate would be 12975%. Compounded daily it would be 38433%. And compounded continuously it would be 40343%.

In the numerical post, I said that the difference between continuous interest and interest compounded n times per year was approximately r² / 2n. That works fine when r is say 0.12. When r = 10 it’s off by two orders of magnitude. The implicit assumption that r² < r breaks down when r > 1.

Hyperinflation causes unusual behavior, such as paying for a meal before you eat it rather than afterward, because by the end of the meal the price will be appreciably higher.

What I find hardest to understand about hyperinflation is that people continue to use hyperinflated currency far longer than I would imagine. Once you start using a clock rather than a calendar when doing interest calculations, I would think that people would abandon the inflated currency in favor of something harder, like gold or silver, or even cigarettes. And eventually people do, but “eventually” is further out than I would imagine. It’s absurd to haul paper money in a wheelbarrow, and yet people do it.

Numerical problem with an interest calculation

The previous post looked at the difference between continuously compounded interest and interest compounded a large discrete number of times. This difference was calculated using the following Python function.

    from math import exp
    def f(P, n, r) : return P*(exp(r) - (1 + r/n)**n)

where the function arguments are principal, number of compoundings, and interest rate.

When I was playing around with this I noticed

    f(1000000, 365*24*60*60, 0.12)

returned a negative value, −0.00066. If this were correct, it would mean that compounding a 12% loan once a second results in more interest than continuous compounding. But this cannot happen. Compounding more often can only increase the amount of interest.

The problem with the Python function is that when n is very large, exp(r) and (1 + r/n)n are so nearly equal that their subtraction results in a complete loss of precision, a phenomenon known as catastrophic cancellation. The two terms are indistinguishable within the limits of floating point calculation because the former is the limit of the latter as n goes to infinity.

One way to calculate

exp(r) − (1 + r/n)^n

for moderately small r (such as typical interest rates) and very large n (such as the number of seconds in a year) would be to write out the power series for exp(r), expand (1 + r/n)n using the binomial theorem, and subtract.

Then we find that

exp(r) − (1 + r/n)^n ≈ r² / 2n

is a good approximation.

When n = 365 × 24 × 60 × 60 and r = 0.12, the approximation gives 2.28 × 10^−10 and the correct result is 2.57 × 10^−10. The approximation is only good to one significant figure, but it gets the sign and the order of magnitude correct. You could retain more series terms for more accuracy.
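Another option, a sketch of my own rather than something from the post, is to rearrange the expression so the small difference is computed directly: exp(r) − (1 + r/n)^n = −exp(r) · expm1(n·log1p(r/n) − r). This avoids subtracting two nearly equal large numbers.

    from math import exp, expm1, log1p

    # numerically stable version of f (the name f_stable is mine)
    def f_stable(P, n, r):
        return -P * exp(r) * expm1(n * log1p(r/n) - r)

    n = 365*24*60*60
    print(f_stable(1_000_000, n, 0.12))   # about 2.57e-4, i.e. 2.57e-10 per dollar
    print(1_000_000 * 0.12**2 / (2*n))    # about 2.28e-4, the series approximation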

Interest compounding with every heartbeat

When I was a child, I heard an advertisement for a bank that compounded the interest on your savings account with every heartbeat. I thought that was an odd thing to say and wondered what it meant. If you have a rapid heart rate, does your money compound more frequently?

I figured there was probably some fine print, such as saying interest was compounded once a second or something like that. Beyond some frequency it doesn’t matter that much how often interest is compounded, and that’s essentially what continuously compounded interest is: interest compounded so often that it doesn’t matter how often it is compounded [1].

So how often do you need to compound interest before the difference between discretely compounded interest and continuously compounded interest doesn’t matter? Well, that depends on what you think matters. The more demanding you are about what matters, the finer the discrete compounding needs to be. It also matters what the interest rate is. The following Python function [2] gives the difference between continuous compounding and compounding n times per year, at interest rate r and with principal P.

    from math import exp
    def f(P, n, r) : return P*(exp(r) - (1 + r/n)**n)

Let’s first say that the frequency of compounding matters if it makes a difference of more than $1 on a loan of $1,000,000 over a year. The difference between continuous interest and compounding daily at 6% is $5.24. If we increase the frequency of compounding to hourly, the difference is $0.22, which we are saying does not matter.

When the interest rate goes up, the difference between continuous and discrete compounding also goes up. If we triple the interest rate to 18%, the difference for hourly compounding is $2.21, but if we go to compounding every minute, the difference is $0.04.

Now if we’re more demanding, and we want the difference in interest to be less than a cent on a principal of one million dollars, we need to compound even more often. In that case compounding once a second is enough, given an interest rate of 18%, which means that’s frequent enough for any lower interest rate.
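Here is a quick check of the dollar figures above, using the function f as defined (with the exp import shown), and keeping in mind the caveat in footnote [2] about very large n.

    print(round(f(1_000_000, 365,       0.06), 2))   # 5.24, daily at 6%
    print(round(f(1_000_000, 365*24,    0.06), 2))   # 0.22, hourly at 6%
    print(round(f(1_000_000, 365*24,    0.18), 2))   # 2.21, hourly at 18%
    print(round(f(1_000_000, 365*24*60, 0.18), 2))   # 0.04, every minute at 18%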


[1] You could make this statement rigorous by saying for every definition of what matters, i.e. for every tolerance ε, there exists an N such that for all n > N the difference between continuous compounding and compounding with n periods is less than ε.

[2] The Python function is correct in theory, and also in practice as long as n isn’t too big. Very large n could lead to a numerical problem, addressed in the next post.

Powers of 3 + √2

Yesterday’s post looked at the distribution of powers of x mod 1. For almost all x > 1 the distribution is uniform in the limit. But there are exceptions, and the post raised the question of whether 3 + √2 is an exception.

A plot made it look like 3 + √2 is an exception, but that turned out to be a numerical problem.

A higher precision calculation showed that the zeros on the right end of the plot were erroneous.

So this raises the question of how to calculate (3 + √2)n accurately for large n. The way I created the second plot was to use bc to numerically calculate the powers of 3 + √2. In this post, I’ll look at using Mathematica to calculate the powers symbolically.

For all positive integers n,

(3 + √2)^n = a_n + b_n √2

where a_n and b_n are positive integers. We want to compute the a and b values.

If you ask Mathematica to compute (3 + √2)^n it will simply echo the expression. But if you use the Expand function it will give you what you want. For example

    Expand[(3 + Sqrt[2])^10]

returns

    1404491 + 993054 √2

We can use the Coefficient function to split a + b √2 into a and b.

    parts[n_] := 
        Module[{x = (3 + Sqrt[2])^n}, 
            {Coefficient[x, Sqrt[2], 0], Coefficient[x, Sqrt[2], 1]}]

Now parts[10] returns the pair {1404491, 993054}.
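If you want the same integer pairs without symbolic algebra, note that multiplying a + b√2 by 3 + √2 gives (3a + 2b) + (a + 3b)√2, so the coefficients satisfy a simple recurrence. Here is a Python sketch (my code, not from the post) that computes them exactly with integer arithmetic.

    def parts(n):
        # integer pair (a, b) with (3 + sqrt(2))**n = a + b*sqrt(2)
        a, b = 1, 0   # n = 0 case: 1 + 0*sqrt(2)
        for _ in range(n):
            a, b = 3*a + 2*b, a + 3*b
        return a, b

    print(parts(10))   # (1404491, 993054), matching the Mathematica result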

Here’s something interesting. If we set

(3 + √2)^n = a_n + b_n √2

as above, then the two halves of the expression on the right are asymptotically equal. That is, as n goes to infinity, the ratio

a_n / (b_n √2)

converges to 1.

We can see this by defining

    ratio[n_] := 
        Module[ {a = Part[ parts[n], 1], b = Part[parts[n], 2]}, 
        N[a / (b Sqrt[2])]]

and evaluating ratio at increasing values of n. ratio[12] returns 1.00001 and ratio[13] returns 1, not that the ratio is exactly 1, but it is as close to 1 as a floating point number can represent.

This seems to be true more generally, as we can investigate with the following function.

    ratio2[p_, q_, r_, n_] := 
        Module[{x = (p + q Sqrt[r])^n}, 
            N[Coefficient[x, Sqrt[r], 0]/(Coefficient[x, Sqrt[r], 1] Sqrt[r])]]

When r is a prime and

(p + q√r)^n = a_n + b_n √r

then it seems that the ratio a_n / (b_n √r) converges to 1 as n goes to infinity. For example, ratio2[3, 5, 11, 40] returns 1, meaning that the two halves of the expression for (3 + 5√11)^n are asymptotically equal.

I don’t know whether the suggested result is true, or how to prove it if it is true. Feels like a result from algebraic number theory, which is not something I know much about.

Update: An anonymous person on X suggested a clever and simple proof. Observe that

\begin{align*} a_n &= \frac{(3+\sqrt{2})^n + (3-\sqrt{2})^n}{2} \\ b_n\sqrt{2} &= \frac{(3 + \sqrt{2})^n - (3-\sqrt{2})^n}{2} \end{align*}

In this form it’s clear that the ratio a_n / (b_n √2) converges to 1, and the proof can be generalized to cover more cases.

Multiples and powers mod 1

For a positive real number x, the expression x mod 1 means the fractional part of x:

x mod 1 = x − ⌊ x ⌋.

This post will look at the distributions of nx mod 1 and xn mod 1 for n = 1, 2, 3, ….

The distribution of nx mod 1 is easy to classify. If x is rational then nx mod 1 will cycle through a finite number of values in the interval [0, 1]. If x is irrational then nx mod 1 will be uniformly distributed in the interval.

The distribution of xn mod 1 is more subtle. We have three basic facts.

  1. If 0 ≤ x < 1, then xn → 0 as n → ∞.
  2. If x = 1 then xn = 1 for all n.
  3. For almost all x > 1 the sequence xn mod 1 is uniformly distributed. But for some values of x it is not, and there’s no known classification for these exceptional values of x.

The rest of the post will focus on the third fact.

Suppose x = √2. Then for all even n, x^n mod 1 = 0. The sequence isn’t uniformly distributed in [0, 1] because half the sequence is piled up at 0.

For another example, let’s look at powers of the golden ratio φ = (1 + √5)/2. For even values of n the sequence φ^n mod 1 converges to 1, and for odd values of n it converges to 0. You can find a proof here.

At one time it was not known whether x^n mod 1 is uniformly distributed for x = 3 + √2. (See HAKMEM, item 32.) I don’t know whether this is still unknown. Let’s see what we can tell from a plot.

Apparently sometime around n = 25 the sequence suddenly converges to 0. That looks awfully suspicious. Let’s calculate x^25 using bc.

    $ echo '(3 + sqrt(2))^25' | bc -l
    13223107213151867.43024035874391918714

This says x^25 mod 1 should be around 0.43, not near 0. The reason we’re seeing all zeros is that all the bits in the significand are being used to store the integer part and none are left over for the fractional part. A standard floating point number has 53 bits of significand and

(3 + √2)^25 > 2^53.

For more details, see Anatomy of a floating point number.
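Incidentally, you could also do this computation in Python with a multiple-precision library. Here is a sketch using mpmath (my choice of tool, not the post’s), scaling the working precision with n so the integer part doesn’t consume all the digits.

    from mpmath import mp, sqrt, floor

    def frac_power(n):
        # (3 + sqrt(2))**n has roughly 0.65*n decimal digits in its integer
        # part, so allow that many digits plus a margin before taking mod 1
        mp.dps = int(0.65 * n) + 30
        x = (3 + sqrt(2)) ** n
        return x - floor(x)

    print(frac_power(25))   # about 0.43024, matching the bc value above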

Now let’s see what happens when we calculate x^n mod 1 correctly using bc. Here’s the revised plot.

Is this uniformly distributed? Well, it’s not obviously not uniformly distributed. But we can’t tell by looking at a plot because uniform distribution is an asymptotic property.

We didn’t give the definition of a uniformly distributed sequence until now because an intuitive understanding of the term was sufficient. Now we should be more precise.

Given a set E, a sequence ω, and an integer N, define A(E, ω, N) to be the number of elements in the first N elements of the sequence ω that fall inside E. We say ω is uniformly distributed mod 1 if for every 0 ≤ a < b ≤ 1,

lim_{N → ∞} A([a, b), ω, N) / N = b − a.

That is, the relative proportion of points that fall in an interval is equal to the length of the interval.
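To make the definition concrete, here is a small Python sketch (mine, not the post’s) that estimates A([a, b), ω, N) / N for the sequence of multiples n√2 mod 1, which is uniformly distributed.

    from math import sqrt

    def relative_count(a, b, N):
        # fraction of the first N terms of n*sqrt(2) mod 1 that land in [a, b)
        hits = sum(1 for n in range(1, N + 1) if a <= (n * sqrt(2)) % 1 < b)
        return hits / N

    print(relative_count(0.2, 0.5, 10_000))   # close to 0.3, the length of [0.2, 0.5)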

Now let’s go back to the example x = √2. We know that the powers of this value of x are not uniformly distributed mod 1. But we also said that for almost all x > 1 the powers of x are uniformly distributed. That means every tiny little interval around √2 contains a number y such that the powers of y are uniformly distributed mod 1. Now if y is very close to √2 and we plot the first few values of yn mod 1 then half of the values will be near 0. The sequence will not look uniformly distributed, though it is uniformly distributed in the limit.

Evaluating and lowering AI hallucination cost

AI models hallucinate, and always will [1]. In some sense this is nothing new: everything has an error rate. But what is new is the nature of AI errors. The output of an AI is plausible by construction [2], and so errors can look reasonable even when they’re wrong. For example, if you ask for the capital city of Illinois, an AI might err by saying Chicago, but it wouldn’t say Seattle.

You need to decide two things when evaluating whether to use an AI for a task: the cost of errors and the cost of validation.

Cost of error on horizontal axis, cost of validation on vertical axis

Image generation is an ideal task for AIs because the consequences of an error are usually low. And validation is almost effortless: if an image looks OK, it’s OK. So image generation is in the bottom left corner of the image above.

Text-to-speech is a little more interesting to evaluate. The consequences of an error depend on user expectations. Someone may be mildly annoyed when a computer makes an error reading an email out loud, but the same person may be angry if he thought he was talking to a human and discovers he’s been talking to a machine.

Validating text-to-speech requires a human to listen to the audio. This doesn’t require much effort, though the cost might be too much in some contexts. (I’m in the middle of an AI-narrated book and wondering why the producer didn’t go to the effort of fixing the pronunciation errors. The print version of the book has high production quality, but not as much effort went into the audio.) So I’d say text-to-speech is somewhere in the bottom half of the graph, but the horizontal location depends on expectations.

Code generation could be anywhere on the graph, depending on who is generating the code and what the code is being used for. If an experienced programmer is asking AI to generate code that in principle he could have written, then it should be fairly easy to find and fix errors. But it’s not a good idea for a non-programmer to generate code for a safety-critical system.

Mitigating risks and costs

The place you don’t want to be is up and to the right: errors are consequential and validation is expensive. If that’s where your problem lies, then you want to either mitigate the consequences of errors or reduce the cost of validation.

This is something my consultancy helps clients with. We find ways to identify errors and to mitigate the impact of inevitable errors that slip through.

If you’d like help moving down and to the left, lowering the cost of errors and the cost of validation, let’s set up a meeting to discuss how we can help.


[1] LLMs have billions of parameters, pushing a trillion. But as large as the parameter space is, the space of potential prompts is far larger, and so the parameters do not contain enough information to respond completely accurately to every possible prompt.

[2] LLMs predict the word that is likely to come next given the words produced so far. In that sense, the output is always reasonable. If an output does not appear reasonable to a human, it is because the human has more context.

A Continental Divide for Newton’s Method

Newton’s method is a simple and efficient method for finding the roots of equations, provided you start close enough to the root. But determining the set of starting points that converge to a given root, or converge at all, can be very complicated.

In one case it is easy to completely classify where points converge. Suppose you have a quadratic polynomial p(x) with distinct roots a and b in the complex plane. The perpendicular bisector of the line segment joining a and b is a sort of continental divide, splitting the plane into two watersheds. If you start Newton’s method at any point on a root’s side of the line, it will converge to that root.

To illustrate this, we will set a = 7 + 14i and b = 20 + 25i, and use Newton’s method to find the roots of

p(z) = (z − a)(z − b).

You can start far from the root and still converge, which isn’t typical but is the case for quadratic polynomials.

And the point you’ll converge to is determined by which side of the continental divide you start on. For example, the following code shows that if you start at 1000 + 2000i you will converge to 20 + 25i.

    a = 7 + 14j
    b = 20 + 25j

    # p is the quadratic with roots a and b, pprime is its derivative
    p      = lambda z: (z - a)*(z - b)
    pprime = lambda z: (z - a) + (z - b)
    newton = lambda z: z - p(z)/pprime(z)

    # iterate Newton's method from a starting point far from both roots
    z = 1000 + 2000j
    for _ in range(12):
        z = newton(z)
        print(z)

If you’re not familiar with Python, note that it uses j for the imaginary unit. Here’s why.

Things are more complicated if you start Newton’s method exactly on the line dividing the two watersheds. The method will not converge. There is a dense set of starting points on the line that will cycle through a finite number of points. And there is a dense set of points that lead to division by zero. (See HAKMEM, item 3.)
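For a concrete instance of the division by zero, consider the midpoint of the two roots, which lies on the dividing line. Continuing with the Python definitions above:

    # The derivative vanishes at the midpoint of the roots, so the very
    # first Newton step divides by zero.
    z = (a + b)/2        # 13.5 + 19.5j
    print(pprime(z))     # 0j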
