Curiously simple approximations

As I’ve written about here and elsewhere, the following simple approximations are fairly accurate.

log10 x ≈ (x-1)/(x+1)

loge x ≈ 2 (x – 1)/(x + 1)

log2 x ≈ 3(x – 1)/(x + 1)

It’s a little surprising that each is as accurate as it is, but it’s also surprising that the approximations for loge and log2 are small integer multiples of the approximation for log10.

Logarithms in all bases are proportional: If b and β are two bases, then for all x,

logβ(x) = logb(x) / logb(β).

So it’s not surprising that the approximations above are proportional. But the proportionality constants are not what the equation above would predict.

There are a couple things going on. First, these approximations are intended for rough mental calculations, so round numbers are desirable. Second, we want to minimize the error over some range. The approximation for logb(x) needs to work over the interval from 1/√b to √b. The approximations don’t work outside this range, but they don’t need to since you can always reduce the problem to computing logs in this interval. These two goals work together.

We could make the log10 x approximation more accurate near 1 if we multiplied it by 0.9. That would make the approximation a little harder to use, but it wouldn’t improve the accuracy over the interval 1/√10 to √10. The constant 1 works just as well as 0.9, and it’s easier to multiply by 1.

Here’s a plot of the approximation errors. Notice that different bases are plotted over different ranges since the log for each base b needs to work from 1/√b to √b.
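
If you’d like to check this numerically, here’s a minimal Python sketch (not from the original post) that computes the maximum absolute error of each approximation over its interval, using the rounded multipliers 1, 2, and 3 from above.

from numpy import linspace, sqrt, log, e

for base, k in [(10, 1), (e, 2), (2, 3)]:
    x = linspace(1/sqrt(base), sqrt(base), 1000)  # interval from 1/sqrt(b) to sqrt(b)
    approx = k*(x - 1)/(x + 1)                    # approximation with rounded constant
    exact = log(x)/log(base)                      # log base b of x
    print(base, abs(approx - exact).max())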


100 digits worth memorizing

I was thinking today about how people memorize many digits of π, and how it would be much more practical to memorize a modest collection of numbers to low precision.

So suppose instead of memorizing 100 digits of π, you memorized 100 digits of other numbers. What might those numbers be? I decided to take a shot at it. I exclude things that are common knowledge, like multiplication tables up to 12 and familiar constants like the number of hours in a day.

There’s some ambiguity over what constitutes a digit. For example, if you say the speed of light is 300,000 km/s, is that one digit? Five digits? My first thought was to count it as one digit, but then what about Avogadro’s number 6×10²³? I decided to write numbers in scientific notation, so the speed of light is 2 digits (3e5 km/s) and Avogadro’s number is 3 digits (6e23).

Here are 40 numbers worth remembering, with a total of 100 digits.

Powers of 2

2³ = 8
2⁴ = 16
2⁵ = 32
2⁶ = 64
2⁷ = 128
2⁸ = 256
2⁹ = 512
2¹⁰ = 1024

Squares

13² = 169
14² = 196
15² = 225

Probability

P(|Z| < 1) ≈ 0.68
P(|Z| < 2) ≈ 0.95

Here Z is a standard normal random variable.

Music

Middle C ≈ 262 Hz
2^(7/12) ≈ 1.5

The second fact says that seven half steps (an equal-tempered fifth) very nearly equal a perfect (Pythagorean) fifth, a frequency ratio of 3:2.

Mathematical constants

π ≈ 3.14
√2 ≈ 1.414
1/√2 ≈ 0.707
φ ≈ 1.618
loge 10 ≈ 2.3
log10 e ≈ 0.4343
γ ≈ 0.577
e ≈ 2.718

Here φ is the golden ratio and γ is the Euler-Mascheroni constant.

I included √2 and 1/√2 because both come up so often.

Similarly, loge 10 and log10 e are reciprocals, but it’s convenient to know both.

The number of significant figures above is intentionally inconsistent. It’s at least as easy, and maybe easier, to remember √2 as 1.414 as it is to remember it as 1.41. Similarly, if you’re going to memorize that log10 e is 0.43, you might as well memorize 0.4343. Buy two, get two free.

Each constant is truncated before a digit less than 5, so all the figures are correct and correctly rounded. For φ and loge 10 the next digit is a zero, so you get an implicit extra digit of precision.

The requirement that truncation = rounding means that you have to truncate e at either 2.7 or 2.718. If you’re going to memorize the latter, you could memorize six more digits with little extra effort since these digits are repetitive:

e = 2.7 1828 1828

Measurements and unit conversion

c = 3e8 m/s
g = 9.8 m/s²
N_A = 6e23
Earth circumference = 4e7 m
1 AU = 1.5e8 km
1 inch = 2.54 cm

Floating point

Maximum double = 1.8e308
Epsilon = 2e-16

These numbers could change from system to system, but they rarely do. See Anatomy of a floating point number.
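
In Python, for example, you can read the exact values for your system off of sys.float_info. A quick sketch:

import sys

print(sys.float_info.max)      # 1.7976931348623157e+308
print(sys.float_info.epsilon)  # 2.220446049250313e-16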

Decibels

1 dB = 10^0.1 ≈ 1.25
2 dB = 10^0.2 ≈ 1.6
3 dB = 10^0.3 ≈ 2
4 dB = 10^0.4 ≈ 2.5
5 dB = 10^0.5 ≈ 3.2
6 dB = 10^0.6 ≈ 4
7 dB = 10^0.7 ≈ 5
8 dB = 10^0.8 ≈ 6.3
9 dB = 10^0.9 ≈ 8

These numbers are handy, even if you don’t work with decibels per se.
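
The rounded values in the table come straight from powers of ten. Here’s a short Python sketch that regenerates them.

# each decibel is a factor of 10^(1/10) in power
for n in range(1, 10):
    print(f"{n} dB = 10^{n/10:.1f} = {10**(n/10):.3f}")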

Update: The next post points out a remarkable pattern between the first and last sets of numbers in this post.


What use is mental math today?

Now that most people are carrying around a powerful computer in their pocket, what use is it to be able to do math in your head?

Here’s something I’ve noticed lately: being able to do quick approximations in mid-conversation is a superpower.

Zoom call

When I’m on Zoom with a client, I can’t say “Excuse me a second. Something you said gave me an idea, and I’d like to pull out my calculator app.” Instead, I can say things like “That would require collecting four times as much data. Are you OK with that?”

There’s no advantage to being able to do calculations to six decimal places on the spot like Mr. Spock, and I can’t do that anyway. But being able to come up with one significant figure or even an order-of-magnitude approximation quickly keeps the conversation flowing.

I have never had a client say something like “Could you be more precise? You said between 10 and 15, and our project is only worth doing if the answer is more than 13.2.” If they did say something like that, I’d say “I will look at this more carefully offline and get back to you with a more precise answer.”

I’m combining two closely related but separate skills here. One is the ability to do simple calculations. The other is knowing what to calculate, i.e. how to do so-called Fermi problems. These problems are named after Enrico Fermi, who was known for being able to make rough estimates with little or no data.

A famous example of a Fermi problem is “How many piano tuners are there in New York?” I don’t know whether this goes back to Fermi himself, but it’s the kind of question he would ask. Of course nobody knows exactly how many piano tuners there are in New York, but you could guess about how many piano owners there are, how often a piano needs to be tuned, and how many tuners it would take to service this demand.

The piano tuner example is more complicated than the kinds of calculations I have to do on Zoom calls, but it may be the most well-known Fermi problem.

In my work with data privacy, for example, I often have to estimate how common some combination of personal characteristics is. Of course nobody should bet their company on guesses I pull out of the air, but it does help keep a conversation going if I can say on the spot whether something sounds like a privacy risk or not. If a project sounds feasible, then I go back and make things more precise.


Mental math trick and deeper issues

I’d like to do two things in this post. I’ll present a fun bit of mental math, then use it to illustrate a few deeper points.

The story starts with a Twitter thread I wrote on @AlgebraFact.

The Fibonacci series begins

1, 1, 2, 3, 5, 8, 13, 21, ….

The ratio of each Fibonacci number to the next is close to the number of miles in a kilometer, so this gives us a sequence of possible conversion factors from kilometers to miles:

1/1, 1/2, 2/3, 3/5, 5/8, 8/13, 13/21, ….

Which conversion factor is best?

“Best” depends on one’s optimization criteria. A lot of misunderstandings boil down to implicitly assuming everyone has the same optimization criteria in mind when they do not.

For the purposes of mentally approximating the width of the Strait of Gibraltar, 8/13 is ideal because the resulting arithmetic is trivial. Accuracy is not an issue here: surely the Strait is not exactly 13 kilometers wide. There’s probably more error in the 13 km statement than there is in the conversion to miles.

But what if accuracy were an issue?

The reason consecutive Fibonacci ratios make good conversion factors between miles and kilometers is that

Fₙ₊₁ / Fₙ → φ = 1.6180 …

as n grows, where φ is the golden ratio. We’re wanting to convert from kilometers to miles, so we actually want the conjugate golden ratio

1/φ = φ − 1 = 0.6180 …

This is very close to the number of miles in a kilometer, which is 0.6214.

So back to our question: which conversion factor is best, if by best we mean most accurate?

The further out we go in the Fibonacci sequence, the closer the ratio of consecutive numbers is to the golden ratio, so we should go out as far as possible.

Except that’s wrong. We’ve committed a common error: optimizing a proxy. We don’t want to get as close as possible to the golden ratio; we want to get as close as possible to the conversion factor. The golden ratio was a proxy.

Initially, getting closer to our proxy got us closer to our actual goal. As we progress through the sequence

1/1, 1/2, 2/3, 3/5, 5/8, 8/13, 13/21

we get closer to 1/φ = 0.6180 and closer to our conversion factor 0.6214. But after 13/21 our goals diverge. Going further brings us closer to 1/φ but further from our conversion factor as the following plot illustrates.
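
Here’s a small Python sketch of that comparison, with 0.6214 miles per kilometer as the target and successive Fibonacci ratios as the approximations.

target = 0.6214  # miles per kilometer
fibs = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
for a, b in zip(fibs, fibs[1:]):
    print(f"{a}/{b} = {a/b:.4f}, error = {abs(a/b - target):.4f}")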

Optimizing a proxy is not an error per se, but reification is. This is when you lose sight of your goal and forget that a proxy is a proxy. Optimizing a proxy is a practical expedient, and may be sufficient, but you have to remain aware that that’s what you’re doing.

For more along these lines, see the McNamara fallacy.

Mentally compute cosine

Ronald Doerfler presents a simple approximation for the cosine of an angle in degrees in his book Dead Reckoning.

cos d° ≈ 1 − d(d + 1)/7000

The approximation is good to about three digits for angles between 0° and 40°.
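
Here’s a quick numerical sketch of that claim: the maximum error over the interval from 0° to 40° is on the order of 10⁻³.

from numpy import linspace, cos, radians

d = linspace(0, 40, 401)
approx = 1 - d*(d + 1)/7000
print(abs(approx - cos(radians(d))).max())  # roughly 1e-3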

Although cosine is an even function, the approximation above is not, and the error is larger for negative angles. But there’s no need to use negative values since cos(-d) = cos(d). Just compute the cosine of the absolute value of the angle.

The simplest thing to do would have been to use a Taylor approximation, but Taylor approximations don’t work well over a large interval; they’re excessively accurate near the middle and not accurate enough at the ends. The error at 40° would have been 100x larger. Also, the Taylor approach would have required multiplying or dividing by a more mentally demanding number than 7000.

If we wanted to come up with a quadratic approximation for a computer rather than for a human, we could squeeze out more accuracy. The error curve over the interval [0, 40] dips down significantly more than it goes up. An optimal approximation would go up and down about the same amount.


More on why simple approximations work

A few weeks ago I wrote several blog posts about very simple approximations that are surprisingly accurate. These approximations are valid over a limited range, but with range reduction they can be used over the full range of the functions.

In this post I want to look again at

exp(x) ≈ (2 + x)/(2 − x)

and

log(x) ≈ (2x − 2)/(x + 1)

Padé approximation

It turns out that the approximations above are both Padé approximants [1], rational functions that match the first few terms of the power series of the function being approximated.

“First few” means up to degree m + n, where m is the degree of the numerator and n is the degree of the denominator. In our examples, m = n = 1, and so the series terms up to order 2 match.

Luck

The approximations I wrote about before were derived by solving for a constant that made the approximation error vanish at the ends of the interval of interest. Note that there’s no interval in the definition of a Padé approximant.

Also, the constants that I derived were rounded in order to have something easy to compute mentally. The approximation for log, for example, works out to have a factor of 2.0413, but I rounded it to 2 for convenience.

And yet the end result was exactly a Padé approximant.

Exp

First let’s look at the exponential function. We can see that the series for our approximation and for exp match up to x².

exp(x) = 1 + x + x²/2 + x³/6 + …

(2 + x)/(2 − x) = 1 + x + x²/2 + x³/4 + …

The error in the Padé approximation for exp is less than the error in the 2nd order power series approximation for all x less than around 0.78.
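
Here’s a sketch comparing the two errors on either side of the crossover point.

from numpy import exp

for x in [0.5, 0.78, 1.0]:
    pade = (2 + x)/(2 - x)    # Padé approximant
    taylor = 1 + x + x**2/2   # 2nd order power series
    print(x, abs(pade - exp(x)), abs(taylor - exp(x)))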

Log

Here again we see that our function and our approximation have series that agree up to the x² terms.

log(x) = (x − 1) − (x − 1)²/2 + (x − 1)³/3 + …

(2x − 2)/(x + 1) = (x − 1) − (x − 1)²/2 + (x − 1)³/4 + …

The error in the Padé approximation for log is less than the error in the 2nd order power series approximation for all x

[1] The other approximations I presented in that series are not Padé approximations.

Better approximation for ln, still doable by hand

A while back I presented a very simple algorithm for computing natural logs:

log(x) ≈ (2x − 2)/(x + 1)

for x between exp(−0.5) and exp(0.5). It’s accurate enough for quick mental estimates.

I recently found an approximation by Ronald Doerfler that is a little more complicated but much more accurate:

log(x) ≈ 6(x − 1)/(x + 1 + 4√x)

for x in the same range. This comes from Doerfler’s book Dead Reckoning.

It requires calculating a square root, and in exchange for this complication gives about three orders of magnitude better accuracy. You could use it, for example, on a calculator that has a square root key but no log key. Or, if you’re truly dead reckoning, you’d need to take the square root by hand.

Here’s a plot of the error for both approximations.

The simpler approximation has error about 10⁻² on each end, whereas Doerfler’s algorithm has error about 10⁻⁵ on each end.
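
You can confirm the endpoint errors with a couple lines of Python. This sketch evaluates both approximations at x = √e, where the exact log is 0.5.

from math import exp, sqrt

x = exp(0.5)
simple = (2*x - 2)/(x + 1)
doerfler = 6*(x - 1)/(x + 1 + 4*sqrt(x))
print(abs(simple - 0.5))    # about 1e-2
print(abs(doerfler - 0.5))  # about 1e-5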

If you can reduce your range to [1/√2, √2] by pulling out powers of 2 first and remembering the value of log(2), then Doerfler’s algorithm is about six times more accurate.

By the way, you might wonder where Doerfler’s approximation comes from. It’s not recognizable as, say, a series expansion. It comes from Richardson extrapolation, with the result simplified algebraically rather than left as an algorithm.

Calculating square roots

Here’s a simple way to estimate the square root of a number x. Take a guess g at the root and compute the average of g and x/g.

If you want to compute square roots mentally or with pencil and paper, how accurate can you get with this method? Could you, for example, get within 1%?

Initial guess

Obviously the answer depends on your guess. One way to form an initial guess is to round x up to the nearest square and take the root of that as your guess. In symbols, we take ⌈√x⌉ as our guess.

For example, if x = 38, you could take 7 as your guess since 49 is the next square after 38. In that case you’d get

(7 + 38/7)/2 = 6.214

The correct value to three places is 6.164, so our error is about 0.8%.

But we could do better. 38 is much closer to 36 than to 49, so we could take 6 as our initial guess. That is, ⌊√38⌋. Then we have

(6 + 38/6)/2 = 6.167

and our relative error is less than 0.04%.

We’ll look at three strategies:

  1. ⌊√x⌋
  2. ⌈√x⌉
  3. ⌊√x⌋ or ⌈√x⌉

In strategy 3, we choose the guess whose square is closer to x.

Initial accuracy

Here’s what we get when we use the method above to estimate the square roots of the numbers from 1 to 100.

Guess 1 has maximum error 15.5%, whereas guess 2 and guess 3 have maximum error 6%.

Guess 2, taking the ceiling, is generally better than guess 1, taking the floor. But guess 3, taking the better of the two, is even better.
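
Here’s a sketch that reproduces those maximum errors for x from 1 to 100. The helper names are mine, not from the original.

from math import floor, ceil, sqrt

def improve(x, g):
    return (g + x/g)/2  # average the guess with x divided by the guess

worst = {1: 0.0, 2: 0.0, 3: 0.0}
for x in range(1, 101):
    lo, hi = floor(sqrt(x)), ceil(sqrt(x))
    near = lo if x - lo*lo <= hi*hi - x else hi  # guess whose square is closer to x
    for strategy, g in [(1, lo), (2, hi), (3, near)]:
        rel_error = abs(improve(x, g) - sqrt(x))/sqrt(x)
        worst[strategy] = max(worst[strategy], rel_error)

print(worst)  # roughly {1: 0.155, 2: 0.061, 3: 0.061}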

More important than which of the three methods we use to find the initial guess is having x near the far end of its range. We can assume x is between 25 and 100 if we first multiply x by a square, if necessary, to get it into that range. To find the square root of 13, for example, we multiply by 4 and calculate the square root of 13 as half the square root of 52.

If we assume x is between 25 and 100, the maximum errors for the three initial guess methods are 1.4%, 1.3%, and 0.4%.

So to answer our initial question, yes, you can get relative error less than 1%, but you have to do a couple things. You have to multiply by a square to get your number into the range [25, 100] and you have to use the third method of guessing a starting approximation.

Initial and final error

You don’t have to move your number into the range [25, 100] if you put a little more effort into your initial guess. In the example of x = 13 mentioned above, multiplying by 4 before taking the square root is essentially the same as taking 7/2 as your initial guess for √13 instead of using 3 or 4 as an initial guess.

In short, our discussion of the range on x was really a discussion of initial error in disguise. The size of x determines the quality of our initial guesses, but the size of x itself doesn’t matter to the relative error.

Here’s a plot of the final error as a function of the error in the initial guess.

Notice two things. First, the final error is roughly proportional to the square of the error in the initial guess. Second, it’s a little better to guess high than to guess low, which explains why the second method above for guessing starting points is a little better than the first.

Iteration

Another way to get more accuracy is to repeat our process. That is, after we find our estimate, we use it as a new guess and apply our method again. For example, suppose we want to compute the square root of 30. We would find

(5 + 30/5)/2 = 5.5
(5.5 + 30/5.5)/2 = 5.47727

which is correct to four decimal places.

We can find square roots to any accuracy we wish by applying this method enough times. In fact, what we’re doing is applying a special case of Newton’s root-finding method.

We showed above that the error after each iteration is on the order of the square of the previous error. So roughly speaking, each iteration doubles the number of correct decimal places.
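
Here’s a tiny sketch of the iteration; the function name is mine.

def sqrt_iter(x, guess, n=4):
    g = guess
    for _ in range(n):
        g = (g + x/g)/2  # each pass roughly doubles the number of correct digits
    return g

print(sqrt_iter(30, 5))  # 5.477225575051661, i.e. sqrt(30) to full double precision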

Higher-order roots

See the next post for an analogous method for computing cube roots and nth roots in general.


Within one percent

This post looks at some common approximations and determines the range over which they have an error of less than 1 percent. So everywhere in this post “≈” means “with relative error less than 1%.”

Whether 1% relative error is good enough completely depends on context.

Constants

The familiar approximations for π and e are good to within 1%: π ≈ 22/7 and e ≈ 19/7. (OK, the approximation for e isn’t so familiar, but it should be.)

Also, the speed of light is c ≈ 300,000 km/s and the fine structure constant is α ≈ 1/137. See also Koide’s coincidence.

Trig functions

The following hold for angles in radians.

  • sin x ≈ x for |x| < 0.244.
  • cos x ≈ 1 − x²/2 for |x| < 0.662.
  • tan x ≈ x for |x| < 0.173.

Inverse trig functions

Here again angles are in radians.

  • arcsin x ≈ x for |x| < 0.242.
  • arccos x ≈ π/2 − x for |x| < 0.4.
  • arctan x ≈ x for |x| < 0.173.

Log

Natural log has the following useful approximation:

  • log(1 + x) ≈ x for −0.0199 < x < 0.0200.

Factorial and gamma

Stirling’s approximation leads to the following.

  • Γ(x) ≈ √(2π/x) (x/e)^x for x > 8.2876.
  • n! ≈ √(2π/(n + 1)) ((n + 1)/e)^(n+1) for n ≥ 8.

Commentary

Stirling’s approximation is different from the other approximations because it is an asymptotic approximation: it improves as its argument gets larger.

The rest of the approximations are valid over finite intervals. These intervals are symmetric when the function being approximated is symmetric, that is, even or odd. So, for example, the interval is symmetric for sine but not for log(1 + x).

For sine and tangent, and their inverses, the absolute error is O(x³) and the value is O(x), so the relative error is O(x²). [1]

The widest interval is for cosine. That’s because the absolute error and relative error are O(x⁴). [2]

The narrowest interval is for log(1 + x), due to lack of symmetry. The absolute error is O(x²), the value is O(x), and so the relative error is only O(x).

Verification

Here’s Python code to validate the claims above, assuming the maximum relative error always occurs on the ends, which it does in these examples. We only need to test one side of symmetric approximations to symmetric functions because they have symmetric error.

from numpy import *
from scipy.special import gamma

# Stirling's approximation to the gamma function
def stirling_gamma(x):
    return sqrt(2*pi/x)*(x/e)**x

id = lambda x: x

for f, approx, x in [
        (sin, id, 0.244),
        (tan, id, 0.173),
        (arcsin, id, 0.242),
        (arctan, id, 0.173),
        (cos, lambda x: 1 - 0.5*x*x, 0.662),
        (arccos, lambda x: 0.5*pi - x, 0.4),
        (log1p, id, 0.02),
        (log1p, id, -0.0199),
        (gamma, stirling_gamma, 8.2876)
    ]:
    assert( abs((f(x) - approx(x))/f(x)) < 0.01 )


[1] Odd functions have only terms with odd exponents in their series expansion around 0. The error near 0 in a truncated series is roughly equal to the first series term not included. That’s why we get third order absolute error from a first order approximation.

[2] Even functions have only even terms in their series expansion, so our second order expansion for cosine has fourth order error. And because cos(0) = 1, the relative error is basically the absolute error near 0.

Simple approximation for Gamma function

I find simple approximations more interesting than highly accurate approximations. Highly accurate approximations can be interesting too, in a different way. Somebody has to write the code to compute special functions to many decimal places, and sometimes that person has been me. But somewhat ironically, these complicated approximations are better known than simple approximations.

One reason I find simple approximations interesting is that they suggest further exploration. For example, the approximation for the perimeter of an ellipse here is interesting because of the connection to r-means. There are more accurate approximations, but not more interesting approximations, in my opinion.

Another reason I find simple approximations interesting is that if they are simple enough to be memorable and easy to evaluate, they’re useful for quick mental calculations.

I’ve written several posts about simple approximations, and this may be the last one, at least for a while. I’ve covered trig functions, logs and exponents. The most commonly used function I haven’t discussed is the gamma function, which I cover now.

Gamma

The most important function that’s not likely to be on a calculator is the gamma function Γ(x). This function comes up all the time in probability and statistics, as well as in other areas of math.

The gamma function satisfies

Γ(x + 1) = x Γ(x),

and so you can evaluate the function anywhere if you can evaluate it on an interval of length 1. For example, the next section gives an approximation on the interval [2, 3]. We could reduce calculating Γ(4.2) to a problem on that interval by

Γ(4.2) = 3.2 Γ(3.2) = 3.2 × 2.2 Γ(2.2)

Simple approximation

It turns out that

Γ(x) ≈ 2/(4 − x)

for 2 ≤ x ≤ 3. The approximation is exact on the end points, and the relative error is less than 0.9% over the interval.
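
Here’s a minimal Python sketch that combines this approximation with the recurrence above to evaluate Γ for any x > 0; the function name is mine, not from the post.

from math import gamma

def gamma_approx(x):
    if x > 3:
        return (x - 1)*gamma_approx(x - 1)  # reduce downward via the recurrence
    if x < 2:
        return gamma_approx(x + 1)/x        # reduce upward via the recurrence
    return 2/(4 - x)                        # approximation on [2, 3]

print(gamma_approx(4.2), gamma(4.2))  # about 7.82 vs 7.76, within 1%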

An even simpler approximation over the interval [2, 3] is

Γ(x) ≈ x − 1.

It has a maximum relative error of about 13%, so it’s far less accurate, but it’s trivial to compute in your head.

Derivation

The way I found the rational approximation was to use Mathematica to find the best bilinear approximation using

    MiniMaxApproximation[Gamma[x], {x, {2, 3}, 1, 1}]

and rounding the coefficients. In hindsight, I could have simply used bilinear interpolation. I did use linear interpolation to derive the approximation x − 1.
