Floor, ceiling, bracket

Mathematics notation changes slowly over time, generally for the better. I can’t think of an instance that was a step backward.

Gauss introduced the notation [x] for the greatest integer less than or equal to x in 1808. The notation was standard until relatively recently, though some authors used the same notation to mean the integer part of x. The two definitions agree if x is positive, but not if x is negative.

Not only is there an ambiguity between the two meanings of [x], it’s not immediately obvious that there is an ambiguity since we naturally think first of positive numbers. This leads to latent errors, such as software that works fine until the first person gives something a negative input.

In 1962 Kenneth Iverson introduced the notation ⌊x⌋ (“floor of x”) and ⌈x⌉ (“ceiling of x”) in his book A Programming Language, the book that introduced APL. According to Concrete Mathematics, Iverson

found that typesetters could handle the symbols by shaving off the tops and bottoms of ‘[’ and ‘]’.

This slight modification of the existing notation made things much clearer. The notation [x] is not mnemonic, but clearly ⌊x⌋ means to move down and ⌈x⌉ means to move up.

Before Iverson introduced his ceiling function, there wasn’t a standard notation for the smallest integer greater than or equal to x. If you did need to refer to what we now call the ceiling function, it was awkward to do so. And if there was a symmetry in some operation between rounding down and rounding up, the symmetry was obscured by asymmetric notation.

My impression is that ⌊x⌋ became more common than [x] somewhere around 1990, maybe earlier in computer science and later in mathematics.

Iverson and APL

Iverson’s introduction of the floor and ceiling functions was brilliant. The notation is mnemonic, and it filled what in retrospect was a gaping hole. In hindsight, it’s obvious that if you have a notation for what we now call floor, you should also have a notation for what we now call ceiling.

Iverson also introduced the indicator function notation, putting a Boolean expression in brackets to denote the function that is 1 when the expression is true and 0 when the expression is false. Like his floor and ceiling notation, the indicator function notation is brilliant. I give an example of this notation in action here.

I had a small consulting project once where my main contribution was to introduce indicator function notation. That simple change in notation made it clear how to untangle a complicated calculation.

Since two of Iverson’s notations were so simple and useful, might there be more? He introduced a lot of new notations in his programming language APL, and so it makes sense to mine APL for more notations that might be useful. But at least in my experience, that hasn’t paid off.

I’ve tried to read Iverson’s lecture Notation as a Tool of Thought several times, and every time I’ve given up in frustration. Judging by which notations have been widely adopted, the consensus seems to be that the floor, ceiling, and indicator function notations were the only ones worth stealing from APL.

Box in ball in box in high dimension

Start with a unit circle and draw the largest square you can inside the circle and the smallest square you can outside the circle. In geometry lingo these are the inscribed and circumscribed squares.

The circle fits inside the square better than the square fits inside the circle. That is, the ratio of the area of the circle to the area of the circumscribed square is larger than the ratio of the area of the inscribed square to the area of the circle.

David Singmaster [1] generalized this theorem to higher dimensions as follows:

The n-ball fits better in the n-cube than the n-cube fits in the n-ball if and only if n ≤ 8.

Singmaster rigorously proves his statement in his paper. Here I will illustrate that it’s true with a graph.

We need three facts. First, the volume of a ball in n dimensions is

r^n π^(n/2) / Γ((n+2)/2).

Second, the volume of a cube in n dimensions with edge 2r is

(2r)^n.
And finally, the edge of the inscribed cube is 2/√n. (To see this is right, note that the distance from the center of the cube to a corner is 1.)

This gives us all we need to create the following plot.

Both ratio curves go to zero quickly as n increases. (Note that the vertical scale is logarithmic, so the curves are going to zero exponentially.) This result is surprising but well known. Singmaster’s theorem about the ratio of the two curves is not as well known.
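Putting the three facts together, the two ratios are easy to compute directly. Here’s a sketch in Python (the function names are mine) that confirms the crossover at n = 8:

```python
from math import pi, gamma, sqrt

def ball_volume(n, r=1):
    # Volume of an n-ball of radius r
    return r**n * pi**(n/2) / gamma((n + 2)/2)

def ball_in_cube(n):
    # Unit ball inside its circumscribed cube, which has edge 2
    return ball_volume(n) / 2**n

def cube_in_ball(n):
    # The inscribed cube has edge 2/sqrt(n), hence volume (2/sqrt(n))**n
    return (2/sqrt(n))**n / ball_volume(n)

for n in (2, 8, 9):
    print(n, ball_in_cube(n) > cube_in_ball(n))
```

For n from 2 through 8 the ball-in-cube ratio is larger; starting at n = 9 the inequality flips, just as Singmaster proved.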


[1] David Singmaster. On Round Pegs in Square Holes and Square Pegs in Round Holes. Mathematics Magazine, Nov., 1964, Vol. 37, No. 5, pp. 335-337

More on why simple approximations work

A few weeks ago I wrote several blog posts about very simple approximations that are surprisingly accurate. These approximations are valid over a limited range, but with range reduction they can be used over the full range of the functions.

In this post I want to look again at

exp(x) ≈ (2 + x)/(2 − x)

and

log(x) ≈ (2x − 2)/(x + 1).

Padé approximation

It turns out that the approximations above are both Padé approximants [1], rational functions that match the first few terms of the power series of the function being approximated.

“First few” means up to degree m + n where m is the degree of the numerator and n is the degree of the denominator. In our examples, m = n = 1, and so the series terms up to order 2 match.


The approximations I wrote about before were derived by solving for a constant that made the approximation error vanish at the ends of the interval of interest. Note that there’s no interval in the definition of a Padé approximant.

Also, the constants that I derived were rounded in order to have something easy to compute mentally. The approximation for log, for example, works out to have a factor of 2.0413, but I rounded it to 2 for convenience.

And yet the end result is exactly a Padé approximant.


First let’s look at the exponential function. We can see that the series for our approximation and for exp match up to x².

\begin{align*} \exp(x) &= 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \ldots \\ \frac{2+x}{2-x} &= 1 + x + \frac{x^2}{2} + \frac{x^3}{4} + \ldots \\ \end{align*}
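These expansions can be checked mechanically with sympy; a quick sketch:

```python
from sympy import symbols, exp, series

x = symbols('x')

# Taylor series of exp(x) and of the Padé approximant, through x^2
lhs = series(exp(x), x, 0, 3).removeO()
rhs = series((2 + x)/(2 - x), x, 0, 3).removeO()
print(lhs - rhs)   # 0: the two series agree through second order
```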

The error in the Padé approximation for exp is less than the error in the 2nd order power series approximation for all x less than around 0.78.


Here again we see that our function and our approximation have series that agree up to the x² terms.

\begin{align*} \log(x) &= (x-1) - \frac{1}{2}(x-1)^2 + \frac{1}{3}(x-1)^3 + \ldots \\ \frac{2x-2}{x+1} &= (x-1) - \frac{1}{2}(x-1)^2 + \frac{1}{4}(x-1)^3 + \ldots \end{align*}

The error in the Padé approximation for log is less than the error in the 2nd order power series approximation for all x sufficiently near 1.

[1] The other approximations I presented in that series are not Padé approximations.

Mathematical stability vs numerical stability

Is 0 a stable fixed point of f(x) = 4x (1-x)?

If you set x = 0 and compute f(x) you will get exactly 0. Apply f a thousand times and you’ll never get anything but zero.

But this does not mean 0 is a stable attractor, and in fact it is not stable.

It’s easy to get mathematical stability and numerical stability confused, because the latter often illustrates the former. In this post I point out that the points on a particular unstable orbit cannot be represented exactly in floating point numbers, so iterations that start at the floating point representations of these points will drift away.

In the example here, 0 can be represented exactly, and so we do have a computationally stable fixed point. But 0 is not a stable fixed point in the mathematical sense. Any starting point close to 0 but not exactly 0 will eventually go all over the place.

If we start at some very small ε > 0, then f(ε) ≈ 4ε. Every iteration multiplies the result by approximately 4, and eventually the result is large enough that the approximation no longer holds.

For example, suppose we start with x = 10⁻¹⁰⁰. After 100 iterations we have

x ≈ 4¹⁰⁰ × 10⁻¹⁰⁰ ≈ 1.6 × 10⁻⁴⁰.
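This arithmetic is easy to check in floating point, since 10⁻¹⁰⁰ is representable as a double:

```python
def f(x):
    return 4*x*(1 - x)

x = 1e-100
for _ in range(100):
    x = f(x)
x100 = x
print(x100)   # about 1.6e-40: still tiny after 100 iterations

for _ in range(100):
    x = f(x)
print(x)   # 200 iterations in, the iterate is somewhere in (0, 1);
           # the 4ε approximation broke down long ago
```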

If we were to plot this, we wouldn’t see any movement. But by the time we iterate our function 200 times, the result is chaotic. I get the following:

Spaceship operator in Python

Some programming languages, such as Perl, have an infix operator <=> that returns a three-state comparison. The expression

    a <=> b

evaluates to -1, 0, or 1 depending on whether a < b, a = b, or a > b. You could think of <=> as a concatenation of <, =, and >.

The <=> operator is often called the “spaceship operator” because it looks like Darth Vader’s ship in Star Wars.

Python doesn’t have a spaceship operator, but you can get the same effect with numpy.sign(a-b). For example, suppose you wanted to write a program to compare two integers.

You could write

    from numpy import sign
    def compare(x, y):
        cmp = ["equal to", "greater than", "less than"][sign(x-y)]
        print(f"{x} is {cmp} {y}.")

Here we take advantage of the fact that an index of -1 points to the last element of a list.

The sign function will return an integer if its argument is an integer or a float if its argument is a float. The code above will break if you pass in floating point numbers because sign will return -1.0, 0.0, or 1.0. But if you replace sign(x-y) with int(sign(x-y)) it will work for floating point arguments.
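Putting that fix together, here is a version that works for both integer and floating point arguments (the helper name spaceship is mine):

```python
from numpy import sign

def spaceship(a, b):
    # int() is needed because numpy's sign returns a float for float inputs,
    # and a float can't be used as a list index
    return int(sign(a - b))

def compare(x, y):
    cmp = ["equal to", "greater than", "less than"][spaceship(x, y)]
    print(f"{x} is {cmp} {y}.")

compare(1.5, 2.5)   # prints "1.5 is less than 2.5."
```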

Related post: Symbol pronunciation

Real radical roots

The previous post Does chaos imply period 3? ended by looking at a couple of cubic polynomials whose roots have period 3 under the mapping f(x) = 4x(1-x). These are

64 x³ – 112 x² + 56 x – 7

and

64 x³ – 96 x² + 36 x – 3.

I ended the post by saying you could find their roots numerically using the NSolve function in Mathematica.

What if you wanted to find the roots exactly? You could try using the Solve function rather than the NSolve function, but you won’t get anything immediately useful back: Mathematica returns the roots as implicit Root objects. Doesn’t Mathematica know how to solve a cubic equation? Yes it does, and you can force it to use the cubic formula by applying the ToRadicals function.

If you apply ToRadicals to the roots, Mathematica will give you the roots in radical form, with expressions involving complex numbers. For example, here’s the smallest root of the first polynomial:

 x = \frac{7}{12} -\frac { 7^{2/3} \left(1+i\sqrt{3}\right) }{ 12\ 2^{2/3} \sqrt[3]{-1+3 i \sqrt{3}} } -\frac{1}{24} \left(1-i \sqrt{3}\right) \sqrt[3]{ \frac{7}{2} \left(-1+3 i \sqrt{3}\right)}

But the roots of these polynomials are clearly real as you can see from their graphs.


If we didn’t have access to a graph, we could show that the roots are real and distinct by computing the discriminant and seeing that it is positive.

Casus irreducibilis

The polynomials have degree less than 5, so their roots can be expressed in terms of radicals. And the roots are real. But they cannot be expressed in terms of real radicals!

This is an example of the “casus irreducibilis.” Most cubic equations with three real roots cannot be solved using real radicals. The casus irreducibilis theorem says that the roots of such a cubic polynomial can be written in terms of real radicals if and only if one of the roots is rational.

This says there’s a sort of excluded middle ground: the roots cannot be expressed using cube roots of real numbers alone. They are either simpler or more complex. If there is a rational root, you can factor it out and find the other two roots with the quadratic formula, i.e. using only square roots, not cube roots. But if there is no rational root, expressing the roots in terms of radicals requires cube roots of complex numbers.

Our polynomials

The rational root test says that if a polynomial has a rational root p/q then q must be a factor of the leading coefficient and p must be a factor of the constant term. So in our first cubic

64 x³ – 112 x² + 56 x – 7

any rational root must have denominator 1, 2, 4, …, 64 and numerator either 1 or 7. We can check all the possibilities and see that none of them are zeros.

Similarly the only possible rational roots of

64 x³ – 96 x² + 36 x – 3

must have denominator 1, 2, 4, …, 64 and numerator either 1 or 3, and none of them are zeros.
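Checking all the candidates is mechanical. Here’s a sketch using exact rational arithmetic (the helper name has_rational_root is mine); it confirms that neither cubic has a rational root:

```python
from fractions import Fraction

def has_rational_root(coeffs, ps, qs):
    # coeffs lists the cubic as a3 x^3 + a2 x^2 + a1 x + a0.
    # Candidate roots are ±p/q per the rational root test.
    for p in ps:
        for q in qs:
            for s in (1, -1):
                x = Fraction(s * p, q)
                if sum(c * x**k for k, c in enumerate(reversed(coeffs))) == 0:
                    return True
    return False

qs = [1, 2, 4, 8, 16, 32, 64]
print(has_rational_root([64, -112, 56, -7], [1, 7], qs))  # False
print(has_rational_root([64, -96, 36, -3], [1, 3], qs))   # False
```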

So the roots of our cubic polynomials can be expressed in terms of radicals, but not without complex numbers, even though the roots are all real.

Unstable orbit

The reason we were interested in these two polynomials is that we were looking for orbits of period 3 under iterations of 4x(1-x). And we found them. But they form an unstable orbit.

If we make a cobweb plot of the iterations starting at any one of the roots, it appears that multiple applications of 4x(1-x) just rotate between the three roots.

But with a few more iterations—the plot below shows 100 iterations—we can see that these points are not stable. Unless you start exactly on one of the roots, which you cannot since they’re irrational, you will wander away from them.

Does chaos imply period 3?

Sharkovskii’s theorem says that if a continuous map f from an interval I to itself has a point with period 3, then it has a point with period 5. And if it has a point with period 5, then it has points with period 7, etc. The theorem gives a chain of implications that begins with 3: if you have points with period 3, you have points of all periods.

Now suppose you want to go “upstream” in Sharkovskii’s chain of implications. If you have a point with period 5, do you have a point with period 3? The answer is no: a map can have points with period 5 without having points of period 3, though it necessarily has points with all other positive integer periods except 3.

There’s an example illustrating the claim above that I’ve seen in multiple places, but I haven’t seen it presented graphically. You could work through the example analytically, but here I present it graphically.

This is the example function written in Python.

    def f(x):
        assert 1 <= x <= 5
        if x < 2:
            return 1 + 2*x
        if x < 3:
            return 7 - x
        if x < 4:
            return 10 - 2*x
        return 6 - x

Here’s a graphical demonstration that f has a fixed point, but no points of period 3.

The only point fixed under applying f three times is the point that was already fixed under applying f once.

This graph shows that f has points with period 5:
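The period-5 orbit can also be confirmed numerically. Restating f so the snippet runs on its own, and following the orbit starting at 1:

```python
def f(x):
    if x < 2:
        return 1 + 2*x
    if x < 3:
        return 7 - x
    if x < 4:
        return 10 - 2*x
    return 6 - x

# Follow the orbit starting at 1: it returns to 1 after five steps
orbit = [1]
for _ in range(5):
    orbit.append(f(orbit[-1]))
print(orbit)   # [1, 3, 4, 2, 5, 1]
```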

By Sharkovskii’s theorem f must have points with all other periods, except 3. Here’s a demonstration that it has points with period 6.

The map f is chaotic, but it does not have a point with period 3.

Logistic map

Let’s look at the most famous chaotic map, the logistic map.

f(x) = rx (1 – x)

where x is in [0, 1] and r is in [0, 4].

logistic bifurcation diagram

The image above shows orbits as r ranges over [0, 4]. Clearly f has points with period 2. There’s a whole interval of values of r that lead to points with period 2, roughly for r between 3 and 3.5. And we can see for r a little bigger there are points of period 4. But is there any point with period 3?

We can look for points of period 3 at the end of the plot, where r = 4, using Mathematica. We define


    f[x_] := 4 x (1 - x)

and look for points where f³(x) = x using

    Factor[f[f[f[x]]] - x]

This shows that the solutions are the roots of

x (-3 + 4 x) (-7 + 56x – 112x² + 64 x³) (-3 + 36x – 96x² + 64x³)

The first two roots are fixed points, points of period 1, but the roots of the two cubic factors are points with period 3.

The cubics clearly have all their roots in the interval [0,1] and we could find their numerical values with

    NSolve[f[f[f[x]]] == x, x]
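The factorization can be double-checked in Python with sympy: both cubic factors divide f(f(f(x))) − x exactly.

```python
from sympy import symbols, expand, rem

x = symbols('x')
f = 4*x*(1 - x)
p = expand(f.subs(x, f.subs(x, f)) - x)   # f(f(f(x))) - x, a degree-8 polynomial

for cubic in (64*x**3 - 112*x**2 + 56*x - 7,
              64*x**3 - 96*x**2 + 36*x - 3):
    print(rem(p, cubic, x))   # 0: roots of each cubic satisfy f(f(f(x))) = x
```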

Although the roots are fixed points of the third iterate f³, they are unstable, as demonstrated at the bottom of the next post.


Sarkovsky’s theorem

The previous post explained what is meant by period three implies chaos. This post is a follow-on that looks at Sarkovsky’s theorem, which is mostly a generalization of that theorem, but not entirely [1].

First of all, Mr. Sarkovsky is variously known as Sharkovsky, Sharkovskii, etc. As with many Slavic names, his name can be anglicized multiple ways. You might use the regular expression Sh?arkovsk(ii|y) in a search.

The theorem in the previous post, by Li and Yorke, says that if a continuous function from a closed interval to itself has a point with period three, it has points with all positive periods. This was published in 1975.

Unbeknownst to Li and Yorke, and everyone else in the West at the time, Sarkovsky had published a more general result in 1964 in a Ukrainian journal. He demonstrated a total order on the positive integers so that the existence of a point with a given period implies the existence of points with all periods further down the sequence. The sequence starts with 3, and every other positive integer is in the sequence somewhere, so period 3 implies the rest.

Sarkovsky showed that period 3 implies period 5, period 5 implies period 7, period 7 implies period 9, etc. If a continuous map of an interval to itself has a point of odd period n > 1, it has points with period given by all odd numbers larger than n. That is, Sarkovsky’s order starts out

3 > 5 > 7 > …

The sequence continues

… 2×3 > 2×5 > 2×7 > …


… 2²×3 > 2²×5 > 2²×7 > …


… 2³×3 > 2³×5 > 2³×7 > …

and so on for all powers of 2 times odd numbers greater than 1.

The sequence ends with the powers of 2 in reverse order

… 2³ > 2² > 1.

Here’s Python code to determine whether period m implies period n, assuming m and n are not equal.

    from sympy import factorint

    # Return whether m comes before n in Sarkovsky order
    def before(m, n):
        assert m != n

        if m == 1 or n == 1:
            return m > n

        m_factors = factorint(m)
        n_factors = factorint(n)

        m_odd = 2 not in m_factors
        n_odd = 2 not in n_factors
        m_power_of_2 = len(m_factors) == 1 and not m_odd
        n_power_of_2 = len(n_factors) == 1 and not n_odd

        if m_odd:
            return m < n if n_odd else True
        if m_power_of_2:
            return m > n if n_power_of_2 else False

        # m is even and not a power of 2
        if n_odd:
            return False
        if n_power_of_2:
            return True
        if m_factors[2] < n_factors[2]:
            return True
        if m_factors[2] > n_factors[2]:
            return False
        return m < n
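Another way to see the order is to generate it directly with a sort key. This self-contained sketch (the key function is mine) lists the integers 1 through 20 in Sarkovsky order:

```python
def sarkovsky_key(n):
    # Write n = 2^a * odd; odds > 1 come first ordered by (a, odd),
    # and pure powers of 2 come last in descending order.
    a, odd = 0, n
    while odd % 2 == 0:
        odd //= 2
        a += 1
    if odd == 1:          # n is a power of 2
        return (1, -n)
    return (0, a, odd)

order = sorted(range(1, 21), key=sarkovsky_key)
print(order)
# [3, 5, 7, 9, 11, 13, 15, 17, 19, 6, 10, 14, 18, 12, 20, 16, 8, 4, 2, 1]
```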

Next post: Can you swim “upstream” in Sarkovsky’s order?

[1] There are two parts to the paper of Li and Yorke. First, that period three implies all other periods. This is a very special case of Sarkovsky’s theorem. But Li and Yorke also proved that period three implies an uncountable number of non-periodic points, which is not part of Sarkovsky’s paper.

Period three implies chaos

One of the most famous theorems in chaos theory, maybe the most famous, is that “period three implies chaos.” This compact statement comes from the title of a paper [1] by the same name. But what does it mean?

This post will look at what the statement means, and the next post will look at a generalization.


First of all, the theorem refers to period in the sense of function iterations, not in the sense of translations. And it applies to particular points, not to the function as a whole.

The sine function is periodic in the sense that it doesn’t change if you shift it by 2π. That is,

sin(2π+x) = sin(x)

for all x. The sine function has period 2π in the sense of translations.

In dynamical systems, period refers to getting the same result, not when you shift a function, but when you apply it to itself. A point x has period n under a function f if applying the function n times gives you x back, but applying it any less than n times does not.

So, for example, the function f(x) = –x has period 2 for non-zero x, and period 1 for x = 0.


Period three specifically means

f( f( f(x) ) ) = x

but

x ≠ f( f(x) ).

Note that this is a property of x, not a property of f per se. That is, it is a property of f that one such x exists, but it’s not true of all points.

In fact it’s necessarily far from true for all points, which leads to what we mean by chaos.


If f is a continuous function from some interval I to itself, it cannot be the case that all points have period 3.

If one point has period 3, then some other point must have period 4. And another point must have period 97. And some point has period 1776.

If some point in I has period 3, then there are points in I that have period n for all positive n. Buy one, get infinitely many for free.

And there are some points that are not periodic, but every point in I is arbitrarily close to a point that is periodic. That is, the periodic points are dense in the interval. That is what is meant by chaos [2].

This is really an amazing theorem. It says that if there is one point that satisfies a simple condition (period three) then the function as a whole must have very complicated behavior as far as iteration is concerned.


If the existence of a point with period 3 implies the existence of points with every other period, what can you say about, for example, period 10? That’s the subject of the next post.

[1] Li, T.Y.; Yorke, J.A. (1975). “Period Three Implies Chaos”. American Mathematical Monthly. 82 (10): 985–992.

[2] There is no single definition of chaos. Some take the existence of dense periodic orbits as the defining characteristic of a chaotic system. See, for example, [3].

[3] Bernd Aulbach and Bernd Kieninger. On Three Definitions of Chaos. Nonlinear Dynamics and Systems Theory, 1(1) (2001) 23–37

Better approximation for ln, still doable by hand

A while back I presented a very simple algorithm for computing natural logs:

log(x) ≈ (2x – 2)/(x + 1)

for x between exp(-0.5) and exp(0.5). It’s accurate enough for quick mental estimates.

I recently found an approximation by Ronald Doerfler that is a little more complicated but much more accurate:

log(x) ≈ 6(x – 1)/(x + 1 + 4√x)

for x in the same range. This comes from Doerfler’s book Dead Reckoning.

It requires calculating a square root, but in exchange for this complication it gives about three orders of magnitude better accuracy. You could use it, for example, on a calculator that has a square root key but no log key. If you’re doing the whole calculation by hand, dead-reckoning style, you’d need to take the square root by hand as well.

Here’s a plot of the error for both approximations.

The simpler approximation has error about 10⁻² at each end, whereas Doerfler’s algorithm has error about 10⁻⁵ at each end.
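The two error figures are easy to reproduce at the right endpoint x = exp(0.5), where the true log is 0.5:

```python
from math import exp, sqrt

def simple(x):
    return (2*x - 2)/(x + 1)

def doerfler(x):
    return 6*(x - 1)/(x + 1 + 4*sqrt(x))

x = exp(0.5)   # right endpoint of the range, where log(x) = 0.5
print(abs(simple(x) - 0.5))     # about 1e-2
print(abs(doerfler(x) - 0.5))   # about 1e-5
```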

If you reduce your range to [1/√2, √2] by pulling out powers of 2 first and remembering the value of log(2), then Doerfler’s algorithm is about six times more accurate.

By the way, you might wonder where Doerfler’s approximation comes from. It’s not recognizable as, say, a series expansion. It comes from Richardson extrapolation, with the result simplified algebraically rather than left as an algorithm.