One infinity or two?

If you want to add ∞ to the real numbers, should you add one infinity or two? The answer depends on context. This post gives several examples of each case: when it’s appropriate to add one infinity and when it’s appropriate to add two.

Two infinities: relativistic addition

A couple days ago I wrote about relativistic addition, where the sum of two numbers x and y is defined by

x \oplus y \equiv \frac{x + y}{1 + \dfrac{xy}{c^2}}

where c is some positive constant. If we restrict x and y to be inside the open interval (-c, c) there are no complications. We could extend this to the closed interval [-c, c] with c acting like ∞ and -c acting like -∞. The addition rule above is based on the addition of velocities in special relativity, with the speed of light acting like an infinity. There are two infinities, corresponding to light traveling in opposite directions. There is one caveat: c ⊕ (-c) is undefined.
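
Here’s a quick numerical sketch of the two infinities. This is my own code, with c = 1 for convenience:

    def rel_add(x, y, c=1.0):
        # relativistic addition on (-c, c), with ±c acting as ±∞
        return (x + y) / (1 + x*y/c**2)

    print(rel_add(1.0, 0.7))   # 1.0: adding anything to c gives c back
    print(rel_add(-1.0, 0.7))  # -1.0: likewise for -c
    # rel_add(1.0, -1.0) raises ZeroDivisionError (0/0),
    # matching the caveat that c ⊕ (-c) is undefined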

One infinity: parallel addition

Another novel way to add real numbers is parallel addition.

x \parallel y \equiv \frac{1}{\,\,\dfrac{1}{x} + \dfrac{1}{y}\,\,}

This kind of addition comes up often in applications. For example, the combined resistance of two resistors in parallel is the parallel sum of their individual resistances, hence the name. The harmonic mean of two numbers is twice their parallel sum.

If we understand 1/∞ to be zero, then ∞ is the identity element for parallel addition, i.e.

x \parallel \infty = \infty \parallel x = x

Now there’s only one infinity. If we incorporated -∞ it would behave exactly like ∞.
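
Here’s a minimal sketch in Python, using the convention that 1/∞ is zero, which IEEE floats supply for free:

    from math import inf

    def parallel(x, y):
        return 1/(1/x + 1/y)

    print(parallel(2, 2))     # 1.0: two 2-ohm resistors in parallel
    print(parallel(3, inf))   # 3.0: inf acts as the identity
    print(parallel(3, -inf))  # 3.0: -inf behaves just like inf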

Two infinities: IEEE numbers

Standard floating point numbers have two infinities; numbers can overflow in either the positive or negative direction. For more on that, see IEEE floating point exceptions and Anatomy of a floating point number.
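
For example, Python’s floats are IEEE 754 double precision numbers, and arithmetic can overflow in either direction:

    >>> 1e308 * 10
    inf
    >>> -1e308 * 10
    -inf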

One infinity: Posit numbers

Posit numbers are an alternative to IEEE floating point numbers. Unlike IEEE numbers, posit numbers have only one infinity. This is graphically depicted in the unum and posit project logo below.

Posit project logo

One advantage of posits for low-precision arithmetic is that fewer bits are devoted to infinities and other exceptional numbers. The overhead of these non-numbers is relatively greater at low precision.

For an introduction to posits, see Anatomy of a posit number.

Topology: compactification

Compactification is the process of adding points to a topological space in order to create a compact space. In topological terms, this post is about one-point compactification and two-point compactification.

The two-point compactification of the real numbers is the extended real numbers [-∞, ∞], which are homeomorphic (topologically equivalent) to a closed interval, like [0, 1].

The one-point compactification of the real numbers, (-∞, ∞) ∪ {∞}, is homeomorphic to a circle. Think of bending the real line into a circle with a point missing, then adding ∞ to fill in the missing point.

Parallel addition can be extended to the complex numbers, and one ∞ is still all that’s needed to make things complete. The one-point compactification of the complex plane is homeomorphic to a sphere.

Elliptic curves

Elliptic curves have a single point at infinity, and as with parallel addition the point at infinity is the additive identity. Interestingly, elliptic curves over finite fields have a finite number of points. That is, they have something that acts like infinity even though they do not have an infinite number of points. This shows that the algebraic and topological ideas of infinity are distinct from the idea of infinity as a cardinality.

Radio Frequency Bands

The radio spectrum is conventionally [1] divided into frequency bands that seem arbitrary at first glance. For example, VHF runs from 30 to 300 MHz. All the frequency band boundaries are 3 times a power of 10.

Why all the 3’s? Two reasons: 3 is roughly the square root of 10, and the speed of light is 300,000 km/s. The former matters when you think in terms of frequencies, and the latter matters when you think in terms of wavelengths.

Frequencies

The frequency bands are centered (on a log scale) on powers of 10. The band centered at 10^n Hz runs from

3 × 10^(n−1) Hz

to

3 × 10^n Hz.

Since 10^0.5 ≈ 3, this means that the log base 10 of the frequency band runs from roughly n − 1/2 to n + 1/2.

    |--------+-------+-----------------------------|
    | Center | Short | Long                        |
    |--------+-------+-----------------------------|
    |  10^0  | TLF   | Tremendously Low Frequency  |
    |  10^1  | ELF   | Extremely Low Frequency     |
    |  10^2  | SLF   | Super Low Frequency         |
    |  10^3  | ULF   | Ultra Low Frequency         |
    |  10^4  | VLF   | Very Low Frequency          |
    |  10^5  |  LF   | Low Frequency               |
    |  10^6  |  MF   | Medium Frequency            |
    |  10^7  |  HF   | High Frequency              |
    |  10^8  | VHF   | Very High Frequency         |
    |  10^9  | UHF   | Ultra High Frequency        |
    |  10^10 | SHF   | Super High Frequency        |
    |  10^11 | EHF   | Extremely High Frequency    |
    |  10^12 | THF   | Tremendously High Frequency |
    |--------+-------+-----------------------------|

The table is easier to remember if you read it from the middle because it’s symmetric about 1 MHz.

From MF you either go up to HF or down to LF. Above HF is VHF and below LF is VLF. Above VHF is UHF, and below VLF is ULF. And so on.

The following bit of Python code computes the name of a frequency’s band following the discussion above.

    from math import log10

    def frequency_band(freq):
        # the band centered at 10^n runs from 3 × 10^(n-1) to 3 × 10^n,
        # so rounding the log10 of the frequency gives n
        decade = round(log10(freq))
        if decade in [5, 6, 7]:
            name = "LMH"[decade-5]
        else:
            # the remaining bands fan out symmetrically from MF at 10^6
            name = "VUSET"[abs(decade-6)-2]
            name += "H" if decade > 6 else "L"
        return name + "F"
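
For example, assuming the function above:

    >>> frequency_band(98.5e6)   # an FM radio station at 98.5 MHz
    'VHF'
    >>> frequency_band(60)       # 60 Hz power line frequency
    'SLF'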

Wavelengths

A radio wave with frequency f has wavelength c/f. As noted above, the band centered at 10^n Hz runs from 3 × 10^(n−1) Hz to 3 × 10^n Hz. This means the wavelengths vary from

(3 × 10^8 m/s) / (3 × 10^n Hz)

to

(3 × 10^8 m/s) / (3 × 10^(n−1) Hz)

and so the wavelength is between 10^(8−n) and 10^(9−n) meters.

To change the code above to take wavelength as an argument, calculate the decade as

    decade = 8 - floor(log10(wavelength))

with floor imported from math. Note that casting to integer with int() would not work here: it truncates toward zero rather than taking the floor, which gives the wrong decade for wavelengths shorter than a meter, where the log is negative.
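
Putting it together, a complete wavelength version might look like this. This is my own sketch, reusing the naming logic from the frequency version above.

    from math import floor, log10

    def wavelength_band(wavelength):
        # wavelength in meters; the band centered at 10^n Hz covers
        # wavelengths from 10^(8-n) up to 10^(9-n) meters
        decade = 8 - floor(log10(wavelength))
        if decade in [5, 6, 7]:
            name = "LMH"[decade - 5]
        else:
            name = "VUSET"[abs(decade - 6) - 2]
            name += "H" if decade > 6 else "L"
        return name + "F"

    print(wavelength_band(3.0))  # 'VHF', e.g. a 100 MHz signal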

[1] Specifically, these are the conventions established by the International Telecommunication Union (ITU).

Relativistic addition

Let c be a positive constant and define a new addition operation on numbers in the interval (-c, c) by

x \oplus y \equiv \frac{x + y}{1 + \dfrac{xy}{c^2}}

This addition has several interesting properties. If x and y are small relative to c, then x ⊕ y is approximately x + y. But the closer x or y gets to c, the more x ⊕ y differs from x + y. In fact, x ⊕ y can never escape the interval (-c, c).

The number c acts as a sort of point at infinity: c ⊕ x = c for any x in (-c, c), i.e. nothing you add to c can make it any larger.

The constant is called c after the speed of light. The addition above is simply adding x and y as velocities in special relativity.

You can show that mapping z to c tanh z gives an isomorphism from the real numbers with ordinary addition to the interval (-c, c) with ⊕ for addition. This is the transformation I used yesterday to graph a function that had too large a range to plot directly.
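
Here’s a quick numerical check of the isomorphism claim, again a sketch with c = 1:

    import numpy as np

    c = 1.0

    def rel_add(x, y):
        return (x + y)/(1 + x*y/c**2)

    a, b = 0.8, -1.3
    print(c*np.tanh(a + b))                     # -0.46211715726000974
    print(rel_add(c*np.tanh(a), c*np.tanh(b)))  # same value, up to rounding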

Update: Now that we have relativistic addition, how would we define relativistic multiplication?

Galileo’s polygon theorem

William J. Milne [1] attributes the following theorem to Galileo:

The area of a circle is a mean proportional between the areas of any two similar polygons, one of which is circumscribed about the circle and the other is isoperimetric with the circle.

So imagine a polygon P and let S be an inscribed circle, that is, the largest circle that can fit inside P. Also imagine a second polygon P′ the same shape as P but with perimeter equal to that of S. Then the ratio of the area of P to the area of S equals the ratio of the area of S to the area of P′. Said another way, the area of S is the geometric mean between the areas of P and P′.

In the figures above, the blue pentagon is P, the black circle is S, the green pentagon is P′, and S and P′ have the same perimeter.

Clearly the blue pentagon has a greater area than the black circle. The isoperimetric theorem says that a circle has the maximum area for a given perimeter, so the black circle must have larger area than the smaller pentagon. This doesn’t prove Galileo’s theorem, but it does at least prove that the area of the circle is somewhere between the areas of the two pentagons.

You can prove Galileo’s theorem for regular n-gons using the formulas in this post and one other fact: the ratio of the radii of inscribed and circumscribed circles sandwiching a regular n-gon is cos(π/n).
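
Here’s a quick numerical check of Galileo’s theorem for a regular pentagon circumscribed about a unit circle, a sketch using the standard area and perimeter formulas for regular polygons:

    from math import pi, tan

    n = 5
    area_P = n * tan(pi/n)       # n-gon circumscribed about a unit circle
    perim_P = 2 * n * tan(pi/n)
    area_S = pi                  # the unit circle
    k = 2*pi / perim_P           # shrink P so its perimeter equals 2π
    area_P2 = k**2 * area_P      # the similar polygon P′

    print(area_P / area_S)   # 1.1563...
    print(area_S / area_P2)  # the same ratio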

[1] William J. Milne. Plane and Solid Geometry. 1899. p. 377, exercise 135.

Visualizing real roots of a high degree polynomial

The previous post looked at finding the expected number of real zeros of high degree polynomials. If you want to see how many real roots a particular high degree polynomial has, you run into several difficulties.

If you use something like Descartes’ rule of signs, you’re likely to greatly over-estimate the number of roots. It would be easier to plot the function and look for the roots. But this brings up a couple more problems.

First, what range should you plot over? However far out you plot, how do you know there’s not another zero outside your plot?

Second, a high-degree polynomial is likely to vary by many orders of magnitude, making it impossible to see a plot on a single scale.

The first difficulty can be overcome by a theorem of Lagrange [1] that says all the zeros of an nth degree polynomial are contained in a disk of radius

\max\left(1, \sum_{i=0}^{n-1} \left| \frac{a_i}{a_n} \right| \right )

centered at the origin of the complex plane. Here a_i is the coefficient of x^i.

The second difficulty can be overcome by plotting a transform of the polynomial rather than the polynomial itself. The hyperbolic tangent function works well because it maps 0 to 0, and because it compresses the real line into the interval (-1, 1). It will give a distorted picture of the polynomial, but it will preserve the zeros.

We will illustrate all this with some Python code.

We can evaluate our polynomial from an array of coefficients using Horner’s method.

    def poly(coeff, x):
        # Horner's method; coeff[i] is the coefficient of x**i
        p = 0.0
        for c in reversed(coeff):
            p *= x
            p += c
        return p

Next, we generate the coefficients of a random 100th degree polynomial.

    import numpy as np
    from scipy.stats import norm

    np.random.seed(20201228) # today's date

    degree = 100
    coeff = norm.rvs(size=degree+1)

Now we find Lagrange’s bound on the roots.

    m = max(1, sum(abs(coeff[:-1]/coeff[-1])))

Now we can plot the tanh of our polynomial.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(-m, m, 1000)
    y = poly(coeff, x)
    z = np.tanh(y)
    plt.plot(x, z)

This produces a plot that’s basically just a spike. Lagrange’s bound is very generous in this case, but at least we know we’ve looked widely enough to start with. If we zoom in on the spike we get the following.

If we hadn’t taken the tanh our plot would vary by over 80 orders of magnitude.

We can zoom in a bit to see the zeros more clearly.

[1] Another bound on zeros due to Cauchy is

1 + \max_{0 \leq i < n}\left\{ \left|\frac{a_i}{a_n}\right|\right \}

In this example, Cauchy’s bound is tighter. You could always compute both bounds and take the smaller bound.
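
Continuing the NumPy example above, Cauchy’s bound can be computed as

    cauchy = 1 + max(abs(coeff[:-1]/coeff[-1]))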

Expected number of roots

Suppose you create a 100th degree polynomial by picking coefficients at random from a standard normal. How many real roots would you expect?

There are 100 complex roots by the fundamental theorem of algebra, but how many would you expect to be real?

A lot fewer than I would have expected. There’s a theorem due to Kac [1] that says the expected number of real zeros is asymptotically

2 log(n) / π.

A higher order asymptotic series is

2 log(n) / π + 0.6257… + 2/(nπ) + O(n^-2)

So for a 100th degree polynomial, you’d expect around 3 or 4 real roots. The number of real roots has to be even, since the only way to have an odd number of real roots is to have a multiple root, and such roots occur with probability zero. So four real roots would be common. Of course there could be up to 100 real roots, but that is extremely unlikely.

It turns out a polynomial would have to have degree over two million before the expected number of real roots would be 10.
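
Here’s a quick numerical check of the asymptotic formula, a sketch using NumPy’s roots function. The tolerance for deciding that a computed root is real is my own arbitrary choice.

    import numpy as np

    rng = np.random.default_rng(20201228)

    def count_real_roots(degree, tol=1e-8):
        coeff = rng.normal(size=degree + 1)
        return np.sum(np.abs(np.roots(coeff).imag) < tol)

    n = 100
    kac = 2*np.log(n)/np.pi + 0.6257 + 2/(n*np.pi)
    sim = np.mean([count_real_roots(n) for _ in range(1000)])
    print(kac, sim)  # both around 3.6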

[1] Alan Edelman and Eric Kostlan. How many zeros of a random polynomial are real? Bulletin of the AMS. Volume 32, Number 1, January 1995

Large matrices rarely have saddlepoints

A matrix is said to have a saddlepoint if an element is the smallest element in its row and the largest element in its column. For example, 0.546 is a saddlepoint in the matrix below because it is the smallest element in the third row and the largest element in the first column.

\begin{bmatrix} -0.850 & 1.529 & \phantom{-}0.896 \\ -1.878 & 1.233 & -0.200 \\ \phantom{-}0.546 & 0.762 & \phantom{-}1.763 \\ -0.449 & 3.136 & \phantom{-}1.345 \end{bmatrix}

In [1], Goldman considers the probability of an m by n matrix having a saddlepoint if its elements are chosen independently at random from the same continuous distribution. He reports this probability as

\frac{m!\,\,n!}{(m+n-1)!}

but it may be easier to interpret as

\left. (m+n) \middle/ {m+n \choose n} \right.

When m and n are large, the binomial coefficient in the denominator is very large.
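
As a sanity check that the two expressions agree, here’s a quick computation for the 4 by 3 case used in the simulation below:

    from math import comb, factorial

    m, n = 4, 3
    print(factorial(m)*factorial(n)/factorial(m+n-1))  # 0.2
    print((m + n)/comb(m + n, n))                      # 0.2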

For square matrices, m and n are equal, and the probability above is 2n/(n + 1) divided by the nth Catalan number C_n.

A couple years ago I wrote about bounds on the middle binomial coefficient

\frac{2^{2n-1}}{\sqrt{n}} < \binom{2n}{n} < \frac{2^{2n-1/2}}{\sqrt{n}}

by E. T. H. Wang. These bounds let us conclude that P(n, n), the probability of an n by n matrix having a saddlepoint, is bounded as follows.

\frac{n^{3/2}}{2^{2n - 3/2}} < P(n,n) < \frac{n^{3/2}}{2^{2n-2}}

Here’s a plot of P(n, n) and its bounds.

plot of saddlepoint probability and its bounds

This shows that the probability goes to zero exponentially as n increases; it is visually indistinguishable from 0 in the plot above by the time n = 8.

The upper bound on P(n, n) based on Wang’s inequalities is tighter than the corresponding lower bound. This is because Wang’s lower bound on the middle binomial coefficient is tighter (see plot here) and we’ve taken reciprocals.
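
Here’s a sketch of code that generates a similar plot. I’ve used a log scale, my own choice, so that the exponential decay shows up as roughly straight lines.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.special import comb

    n = np.arange(2, 21)
    P = 2*n / comb(2*n, n)
    lower = n**1.5 / 2**(2*n - 1.5)
    upper = n**1.5 / 2**(2*n - 2)

    plt.semilogy(n, P, label="P(n,n)")
    plt.semilogy(n, lower, "--", label="lower bound")
    plt.semilogy(n, upper, "--", label="upper bound")
    plt.xlabel("n")
    plt.legend()
    plt.show()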

Simulation

The theory above says that if we generate 4 by 3 matrices with independent normal random values, each matrix will have a saddlepoint with probability 1/5. The following code generates 10,000 random 4 by 3 matrices and counts what proportion have a saddlepoint.

    import numpy as np
    from scipy.stats import norm

    def has_saddlepoint(M):
        m, n = M.shape
        # location of the smallest element in each row
        # and the largest element in each column
        row_min_index = [np.argmin(M[i,:]) for i in range(m)]
        col_max_index = [np.argmax(M[:,j]) for j in range(n)]
        # a saddlepoint is a row minimum that is also a column maximum
        for i in range(m):
            if col_max_index[row_min_index[i]] == i:
                return True
        return False

    m, n = 4, 3
    N = 10000
    saddlepoints = 0

    for _ in range(N):
        M = norm.rvs(size=(m,n))
        if has_saddlepoint(M):
            saddlepoints += 1
    print(saddlepoints / N)

When I ran this, I got 0.1997, very close to the expected value.

[1] A. J. Goldman. The Probability of a Saddlepoint. The American Mathematical Monthly, Vol. 64, No. 10, pp. 729-730

A generalization of sine and cosine

David Shelupsky [1] suggested a generalization of sine and cosine based on solutions to the system of differential equations

\begin{align*} \frac{d}{dt} \alpha_s(t) &= \phantom{-}\beta_s^{s-1}(t) \\ \frac{d}{dt} \beta_s(t) &= -\alpha_s^{s-1}(t) \end{align*}

with initial conditions α_s(0) = 0 and β_s(0) = 1.

If s = 2, then α(t) = sin(t) and β(t) = cos(t). The differential equations above reduce to the familiar fact that the derivative of sine is cosine, and the derivative of cosine is negative sine.

For larger even values of s, the functions α_s and β_s look like sine and cosine respectively, though flatter at their maxima and minima. Numerical experiments suggest that the solutions are periodic and the period increases with s. [2]

Here’s a plot for s = 4.

The first zero of α(t) is at 3.7066, greater than π. In the plot t ranges from 0 to 4π, but the second period isn’t finished.

If we look at the phase plot, i.e. (α(t), β(t)), we get a shape that I’ve blogged about before: a squircle!

This is because, as Shelupsky proved,

\alpha_s^s(t) + \beta_s^s(t) = 1
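
These claims are easy to check numerically. Here’s a sketch using SciPy’s ODE solver with s = 4; the tolerances and the bracketing interval for the root are my own choices.

    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.optimize import brentq

    s = 4

    def rhs(t, y):
        alpha, beta = y
        return [beta**(s-1), -alpha**(s-1)]

    sol = solve_ivp(rhs, [0, 4*np.pi], [0, 1], dense_output=True,
                    rtol=1e-10, atol=1e-12)

    t = np.linspace(0, 4*np.pi, 1000)
    alpha, beta = sol.sol(t)
    print(np.max(np.abs(alpha**s + beta**s - 1)))  # tiny, confirming the invariant

    print(brentq(lambda t: sol.sol(t)[0], 2, 5))   # first zero of alpha ≈ 3.7066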

Odd order

The comments above mostly concern the case of even s. When s is odd, the functions α_s and β_s don’t seem much like sine or cosine. Here are plots for s = 3

and s = 5.

[1] David Shelupsky. A Generalization of the Trigonometric Functions. The American Mathematical Monthly, Dec. 1959, pp. 879-884

[2] After doing my numerical experiments I looked back more carefully at [1] and saw that the author proves that the solutions for even values of s are periodic, and that the periods increase with s, converging to 8 as s goes to infinity.

A connected topology for the integers

You can define a topology on the positive integers by taking as a basis of open sets the arithmetic sequences an + b where a and b are relatively prime positive integers. Solomon Golomb defines this topology in [1] and proves that it is connected.

But that paper doesn’t prove that the proposed basis really is a basis. That is implicitly an exercise for the reader, and we carry out that exercise here. The proof will say that certain things can be computed, and we’ll go a step further and actually compute them with Python.

Proof

To prove that these sets form the basis of a topology, we need to establish two things.

  1. Every point is contained in some basis element.
  2. The intersection of basis elements is another basis element or empty.

The first is easy. For any positive integer x, the sequence (x + 1)n + x contains x (take n = 0), and x + 1 is relatively prime to x. Any other number relatively prime to x would work just as well.

The second is more interesting. Consider two sequences: an + b and cm + d. A number x in the intersection satisfies the pair of congruences

x ≡ b (mod a)
x ≡ d (mod c).

If a and c are relatively prime, the Chinese Remainder Theorem says that the pair of congruences has a solution y, and the solutions are unique mod ac. So the intersection of the two sequences is acn + y.

If a and c are not relatively prime, let r be their greatest common divisor. Then the intersection of an + b and cm + d is another sequence of the same form if b and d are congruent mod r, and is empty otherwise.

Separating points

A topological space is connected if it is not the union of two disjoint nonempty open sets. An easy way to make a space connected is to not have any pairs of disjoint nonempty open sets. But Golomb’s topology has plenty of disjoint open sets.

In fact, his topology is Hausdorff, meaning that any two points can be separated by disjoint open sets. The proof is simple. Take two distinct points s and t. Let p be any prime bigger than both s and t, and consider the two sets pn + s and pn + t. These are basis elements because p is relatively prime to both s and t, and they are disjoint because s and t are not congruent mod p.

Python calculation

Our proof splits into two cases: when the strides of the two sequences are relatively prime and when they are not.

Relatively prime strides

To get concrete, suppose we have the sequences

2, 9, 16, 23, …

and

3, 14, 25, 36, …

or in other words 7n + 2 and 11n + 3. According to the theory above, these sequences intersect because 7 and 11 are relatively prime, and the intersection should be of the form 77n + y, where y can be computed via the Chinese Remainder Theorem. We will use the function crt from SymPy. (See another example of using this function here.)

    >>> from sympy.ntheory.modular import crt
    >>> crt([7,11], [2, 3], symmetric=False)
    (58, 77)

This reports that y = 58.

Now let’s verify that the intersection of our two sequences looks like 77n + 58.

    >>> A = set(2+7*n for n in range(100))
    >>> B = set(3+11*n for n in range(100))
    >>> sorted(A.intersection(B))
    [58, 135, 212, 289, 366, 443, 520, 597, 674]

Strides with a common factor: empty intersection

Now for the case where a and c have greatest common divisor r > 1. Consider, for example,

1, 16, 31, 46, …

and

2, 14, 26, 38, …

The sequences 15n + 1 and 12m + 2 do not intersect. The greatest common divisor of 15 and 12 is 3. Numbers in the first sequence are congruent to 1 mod 3 and numbers in the second sequence are congruent to 2 mod 3, so no number can be in both. If we didn’t realize this, we could call

    crt([15,12], [1, 2], symmetric=False)

and see that it returns None.

Strides with a common factor: finding the intersection

But now let’s look at

1, 16, 31, 46, …

and

13, 25, 37, 49, …

Now both sequences are congruent to 1 mod 3, so it’s at least feasible that the two sequences may intersect. And in fact they do.

    >>> crt([15,12], [1, 13], symmetric=False)
    (1, 60)

This says that the solutions are all congruent to 1 mod 60. Now 1 itself is not in the second sequence, but 61 is in both, and so is 60n + 1 for all positive integers n. We can illustrate this as before by computing the intersection of the two sequences.

    >>> A = set(1+15*n for n in range(30))
    >>> B = set(13+12*n for n in range(30))
    >>> sorted(A.intersection(B))
    [61, 121, 181, 241, 301, 361]
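
Putting the pieces together, here’s a small helper of my own that returns the intersection of the sequences an + b and cm + d as a (stride, offset) pair, or None if the intersection is empty:

    from sympy.ntheory.modular import crt

    def intersect(a, b, c, d):
        result = crt([a, c], [b, d], symmetric=False)
        if result is None:
            return None
        y, m = result
        # move up to the smallest value that both sequences reach,
        # since each sequence starts at its constant term
        while y < b or y < d:
            y += m
        return (m, y)

    print(intersect(7, 2, 11, 3))    # (77, 58)
    print(intersect(15, 1, 12, 2))   # None
    print(intersect(15, 1, 12, 13))  # (60, 61)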

[1] Solomon W. Golomb. A Connected Topology for the Integers. The American Mathematical Monthly, Oct. 1959, pp. 663–665.