Solid angle of a star

The apparent size of a distant object can be measured by projecting the object onto a unit sphere around the observer and calculating the area of the projected image.

A unit sphere has area 4π. If you’re in a ship far from land, the solid angle of the sky is 2π steradians because it takes up half a sphere.

If the object you’re looking at is a sphere of radius r whose center is a distance d away, then its apparent size is

\Omega = 2\pi\left(1 - \frac{\sqrt{d^2 - r^2}}{d}\right)

steradians. This formula assumes d > r. Otherwise you’re not looking out at the sphere; you’re inside the sphere.

If you’re looking at a star, then d is much larger than r, and we can simplify the equation above. The math is very similar to the math in an earlier post on measuring tapes. If you want to measure the size of a room, and something is blocking you from measuring straight from wall to wall, it doesn’t make much difference if the object is small relative to the room. It all has to do with Taylor series and the Pythagorean theorem.

Think of the expression above as a function of r and expand it in a Taylor series around r = 0.

\Omega = 2\pi\left(1 - \frac{\sqrt{d^2 - r^2}}{d}\right) = 2\pi\left(\frac{r^2}{2d^2} + \frac{r^4}{8d^4} + \cdots \right)

and so

\Omega \approx \frac{\pi r^2}{d^2}

with an error on the order of (r/d)⁴. To put it another way, the error in our approximation for Ω is on the order of Ω². The largest object in the sky is the sun, and it has apparent size less than 10⁻⁴, so Ω is always small when looking at astronomical objects, and Ω² is negligible.

So for practical purposes, the apparent size of a celestial object is π times the square of the ratio of its radius to its distance. This works fine for star gazing. The approximation wouldn’t be as accurate for watching a hot air balloon launch up close.
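As a sanity check, here is a short Python sketch comparing the exact formula to the small-angle approximation. The solar radius and distance below are round mean values supplied as assumptions:

```python
from math import pi, sqrt

def solid_angle_exact(r, d):
    """Solid angle of a sphere of radius r whose center is distance d away (d > r)."""
    return 2*pi*(1 - sqrt(d**2 - r**2)/d)

def solid_angle_approx(r, d):
    """Small-angle approximation pi (r/d)^2."""
    return pi*(r/d)**2

# rough mean values for the sun, in km
r_sun = 6.957e5
d_sun = 1.496e8

exact = solid_angle_exact(r_sun, d_sun)
approx = solid_angle_approx(r_sun, d_sun)
print(exact, approx)  # both about 6.8e-5 steradians
```

The two values agree to about five parts per million, consistent with an error on the order of Ω².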

Square degrees

Sometimes solid angles are measured in square degrees, given by π times the square of the apparent radius in degrees. This implicitly uses the approximation above, since the apparent radius in radians is approximately r/d.
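For reference, a whole sphere is 4π steradians, or about 41,253 square degrees, since one steradian is (180/π)² square degrees. A one-line conversion sketch, with the rough solar solid angle from above plugged in:

```python
from math import pi

def sr_to_sq_deg(omega):
    """Convert a solid angle from steradians to square degrees."""
    return omega * (180/pi)**2

# the sun subtends roughly 6.8e-5 steradians
print(sr_to_sq_deg(6.8e-5))  # about 0.22 square degrees
```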


When I typed

    3.1416 (radius of sun / distance to sun)^2

into Wolfram Alpha I got 6.85 × 10⁻⁵. (When I used “pi” rather than 3.1416 it interpreted this as the radius of a pion particle.)

When I typed

    3.1416 (radius of moon / distance to moon)^2

I got 7.184 × 10⁻⁵, confirming that the sun and moon are approximately the same apparent size, which makes a solar eclipse possible.

The brightest star in the night sky is Sirius. Asking Wolfram Alpha

    3.1416 (radius of Sirius / distance to Sirius)^2

we get 6.73 × 10⁻¹⁶.
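The same three computations are easy to do directly in Python. The radii and distances below are rough mean values I'm supplying as assumptions, so the results differ a little from Wolfram Alpha's, which draws on its own curated data:

```python
from math import pi

def apparent_size(r, d):
    # small-angle approximation pi (r/d)^2, in steradians
    return pi * (r/d)**2

# rough mean values, all in km
print(apparent_size(6.957e5, 1.496e8))   # sun: about 6.8e-5
print(apparent_size(1.7374e3, 3.844e5))  # moon: about 6.4e-5
print(apparent_size(1.19e6, 8.14e13))    # Sirius: about 6.7e-16
```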


Fixed points of the Fourier transform

This previous post looked at the hyperbolic secant distribution. This distribution has density

f_H(x) = \frac{1}{2} \text{sech} \left(\frac{\pi x}{2} \right)

and characteristic function sech(t). It’s curious that the density and characteristic function are so similar.

The characteristic function is essentially the Fourier transform of the density function, so this says that the hyperbolic secant function, properly scaled, is a fixed point of the Fourier transform. I’ve long known that the normal density is its own Fourier transform, but only recently learned that the same is true of the hyperbolic secant.

Hermite functions

The Hermite functions are also fixed points of the Fourier transform, or rather eigenfunctions of the Fourier transform. The eigenvalues are 1, i, −1, and −i. When the eigenvalue is 1, we have a fixed point.

There are two conventions for defining the Hermite functions, and multiple conventions for defining the Fourier transform, so the truth of the preceding paragraph depends on the conventions used.

For this post, we will define the Fourier transform of a function f to be

\hat{f}(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty \exp(-i \omega x)\, f(x)\, dx

Then the Fourier transform of exp(-x²/2) is the same function. Since the Fourier transform is linear, this means the same holds for the density of the standard normal distribution.
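This is easy to check numerically. Here is a minimal sketch that approximates the transform integral above on a grid; the grid range and spacing are arbitrary choices, wide and fine enough for good accuracy:

```python
import numpy as np

# grid for approximating the transform integral
x = np.linspace(-20, 20, 4001)
dx = x[1] - x[0]
f = np.exp(-x**2/2)

def ft(omega):
    # (1/sqrt(2 pi)) * integral of exp(-i omega x) f(x) dx, as a Riemann sum
    return (np.exp(-1j*omega*x) * f).sum() * dx / np.sqrt(2*np.pi)

for w in [0.0, 1.0, 2.0]:
    print(w, ft(w).real, np.exp(-w**2/2))  # the last two columns agree
```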

We will define the Hermite polynomials by

H_n(x) = (-1)^n e^{x^2}\frac{d^n}{dx^n}e^{-x^2}

using the so-called physics convention. Hn is an nth degree polynomial.

The Hermite functions ψn(x) are the Hermite polynomials multiplied by exp(-x²/2). That is,

\psi_n(x) = H_n(x) \exp(-x^2/2)

With these definitions, the Fourier transform of ψn(x) equals (−i)ⁿ ψn(x). So when n is a multiple of 4, the Fourier transform of ψn(x) is ψn(x).

[The definition of the Hermite functions above omits a normalizing constant that depends on n but not on x. So our Hermite functions are proportional to the standard Hermite functions. But proportionality constants don’t matter when you’re looking for eigenfunctions or fixed points.]

Hyperbolic secant

Using the definition of Fourier transform above, the function sech(√(π/2) x) is its own Fourier transform.

This is surprising because the Hermite functions form a basis for L²(ℝ), and all have tails on the order of exp(-x²), but the hyperbolic secant has tails like exp(-x). Each Hermite function eventually decays like exp(-x²), but this happens later as n increases, so an infinite sum of Hermite functions can have thicker tails than any particular Hermite function.
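The same kind of numerical check as for the Gaussian works here, using the Fourier transform convention defined above; again the grid parameters are arbitrary choices:

```python
import numpy as np

x = np.linspace(-60, 60, 12001)
dx = x[1] - x[0]
g = 1/np.cosh(np.sqrt(np.pi/2)*x)   # sech(sqrt(pi/2) x)

def ft(omega):
    # unitary Fourier transform, approximated by a Riemann sum
    return (np.exp(-1j*omega*x) * g).sum() * dx / np.sqrt(2*np.pi)

for w in [0.0, 1.0, 2.0]:
    print(w, ft(w).real, 1/np.cosh(np.sqrt(np.pi/2)*w))  # columns agree
```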


Hyperbolic secant distribution

I hadn’t run into the hyperbolic secant distribution until I saw a paper by Peng Ding [1] recently. If C is a standard Cauchy random variable, then (2/π) log |C| has a hyperbolic secant distribution. Three applications of this distribution are given in [1].
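Here's a quick Monte Carlo sanity check of that transformation, comparing the sample mean and variance of (2/π) log |C| to the hyperbolic secant distribution's mean 0 and variance 1. The sample size and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)
c = rng.standard_cauchy(1_000_000)   # standard Cauchy samples
y = (2/np.pi) * np.log(np.abs(c))    # should be hyperbolic secant distributed

print(y.mean(), y.var())  # close to 0 and 1
```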

Ding’s paper contains a plot comparing the density functions for the hyperbolic secant distribution, the standard normal distribution, and the logistic distribution with scale √3/π. The scale for the logistic was chosen so that all three distributions would have variance 1.

There’s something interesting about comparing the logistic and hyperbolic secant densities: the former is the square of the latter, aside from scaling both vertically and horizontally, and yet the two functions are similar. You don’t often approximate a function by its square.

Here’s a plot of the two densities.

The hyperbolic secant density, the blue curve, crosses the logistic density around ± 0.56 and around ± 2.33.

The hyperbolic secant distribution has density

f_H(x) = \frac{1}{2} \text{sech} \left(\frac{\pi x}{2} \right)

and the logistic distribution, as scaled in above, has density

f_L(x) = \frac{\pi}{4\sqrt 3} \,\text{sech}^2 \left(\frac{\pi x}{2\sqrt 3} \right)

and so

\frac{\pi}{\sqrt 3} \, f_H\left(\frac{x}{\sqrt 3}\right)^2 = f_L(x)
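A quick numerical check of the relation between the two densities, with the horizontal scaling written out explicitly:

```python
import numpy as np

def f_H(x):
    # hyperbolic secant density
    return 0.5 / np.cosh(np.pi*x/2)

def f_L(x):
    # logistic density with scale sqrt(3)/pi, i.e. variance 1
    return np.pi/(4*np.sqrt(3)) / np.cosh(np.pi*x/(2*np.sqrt(3)))**2

x = np.linspace(-5, 5, 101)
lhs = np.pi/np.sqrt(3) * f_H(x/np.sqrt(3))**2
print(np.max(np.abs(lhs - f_L(x))))  # zero to machine precision
```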


[1] Peng Ding. Three Occurrences of the Hyperbolic-Secant Distribution. The American Statistician, Vol. 68, No. 1 (February 2014), pp. 32–35.

Two-letter abbreviations

Countries and regions have two-letter abbreviations (ISO 3166-1), as do languages (ISO 639-1). So do chemical elements, US states, and books filed using the Library of Congress system.

I was curious how many of the 676 possible two-letter combinations are used by the abbreviation systems above. About two thirds, not as many as I expected.

There are 798 abbreviations in the lists mentioned, but there is a lot of overlap. For example, FR represents the country France, the language French, and the chemical element francium.

There are five abbreviations that are part of all five lists: GA, LA, MT, NE, and PA.

  • GA (Gabon, Irish, gallium, Georgia, cartography)
  • LA (Lao People’s Democratic Republic, Latin, lanthanum, Louisiana, history of education)
  • MT (Malta, Maltese, meitnerium, Montana, music instruction)
  • NE (Niger, Nepali, neon, Nebraska, print media)
  • PA (Panama, Punjabi, protactinium, Pennsylvania, Greek and Latin language and literature)

Interpolating rotations with SLERP

Naive interpolation of rotation matrices does not produce a rotation matrix. That is, if R1 and R2 are rotation (orthogonal) matrices and 0 < t < 1, then

R(t) = (1-t)R_1 + tR_2

is not in general a rotation matrix.
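A two-line check makes the problem concrete. Averaging the identity with a 90° rotation about the z-axis gives a matrix that is not orthogonal:

```python
import numpy as np

R1 = np.eye(3)                  # identity
R2 = np.array([[0., -1., 0.],   # 90 degree rotation about the z-axis
               [1.,  0., 0.],
               [0.,  0., 1.]])

R = 0.5*R1 + 0.5*R2    # naive interpolation at t = 1/2
print(R @ R.T)         # not the identity, so R is not orthogonal
print(np.linalg.det(R))  # 0.5, not 1
```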

You can represent rotations with unit quaternions rather than orthogonal matrices (see details here), so a reasonable approach might be to interpolate between the rotations represented by unit quaternions q1 and q2 using

q(t) = (1-t)q_1 + tq_2

but this has a similar problem: the quaternion above is not a unit quaternion.

One way to patch this up would be to normalize the expression above, dividing by its norm. That would indeed produce unit quaternions, and hence correspond to rotations. However, uniformly varying t from 0 to 1 does not produce a uniform rotation.

The solution, first developed by Ken Shoemake [1], is to use spherical linear interpolation or SLERP.

Let θ be the angle between q1 and q2. Then the spherical linear interpolation between q1 and q2 is given by

q(t) = \frac{\sin((1-t)\theta)}{\sin\theta}q_1 + \frac{\sin(t\theta)}{\sin\theta}q_2

Now q(t) is a unit quaternion, and uniformly increasing t from 0 to 1 creates a uniform rotation.
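Here is a minimal SLERP sketch with quaternions represented as numpy 4-vectors. It assumes the inputs are already unit quaternions and skips the edge cases a production version would handle (nearly identical or antipodal quaternions, where sin θ ≈ 0):

```python
import numpy as np

def slerp(q1, q2, t):
    """Spherical linear interpolation between unit quaternions q1 and q2."""
    theta = np.arccos(np.clip(np.dot(q1, q2), -1.0, 1.0))  # angle between q1 and q2
    return (np.sin((1-t)*theta)*q1 + np.sin(t*theta)*q2) / np.sin(theta)

# interpolate between the identity and a 90 degree rotation about the z-axis
q1 = np.array([1., 0., 0., 0.])
q2 = np.array([np.cos(np.pi/4), 0., 0., np.sin(np.pi/4)])

q = slerp(q1, q2, 0.5)
print(np.linalg.norm(q))  # 1.0: the interpolant is still a unit quaternion
```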

[1] Ken Shoemake. “Animating Rotation with Quaternion Curves.” SIGGRAPH 1985.

Shuffle product


The shuffle product of two words, w1 and w2, written

w1 Ш w2,

is the set of all words formed by the letters in w1 and w2, preserving the order of each word’s letters. The name comes from the analogy with doing a riffle shuffle of two decks of cards.

For example, bcd Ш ae, the shuffle product of bcd and ae, would be all permutations of abcde in which the consonants appear in alphabetical order and the vowels are also in alphabetical order. So abecd and baecd would be included, but badec would not be because the d and c are in the wrong order.

Side note on Ш

Incidentally, the symbol for shuffle product is the Cyrillic letter sha (Ш, U+0428), the only Cyrillic letter commonly used in mathematics, at least internationally. Presumably Russian mathematicians use other Cyrillic letters, but the only Cyrillic letter an American mathematician, for example, is likely to use is Ш.

The uses of Ш that I’m aware of are the Dirac comb distribution, the Tate–Shafarevich group, and the shuffle product.

Duplicate letters

What is the shuffle product of words containing duplicate letters? For example, what about the shuffle product of bread and crumb? Each word contains an r. The shuffle product, defined above as a set, doesn’t distinguish between the two rs. But another way to define the shuffle product is as a formal sum, with coefficients that count duplicates.

Imagine coloring the letters in abc blue and the letters in cde red. Then the word abccde can arise in two different ways: with the blue c before the red c, or the other way around. This term in the formal sum would be 2abccde, the 2 capturing that there are two ways to arrive at this word.

You could also have duplicate letters within a single word. So in banana, for example, you could imagine coloring each a a different color and coloring the two ns different colors.
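As a cross-check on the definitions, here is a small recursive Python sketch, independent of the Mathematica code that follows, which computes all interleavings with multiplicity:

```python
from collections import Counter

def shuffle(w1, w2):
    """All interleavings of w1 and w2, preserving each word's letter order."""
    if not w1:
        return [w2]
    if not w2:
        return [w1]
    # the first letter of the result comes from either w1 or w2
    return ([w1[0] + s for s in shuffle(w1[1:], w2)] +
            [w2[0] + s for s in shuffle(w1, w2[1:])])

words = shuffle("abc", "ac")
print(len(words))      # 10 interleavings, i.e. C(5, 2)
print(Counter(words))  # multiplicities 4, 2, 2, 1, 1
```

The Counter gives the coefficients of the formal sum, and its keys give the set version.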

Mathematica code

This page gives an implementation of the shuffle product in Mathematica.

shuffleW[s1_, s2_] := 
     Module[{p, tp, ord}, 
          p = Permutations@Join[1 & /@ s1, 0 & /@ s2]\[Transpose];
          tp = BitXor[p, 1];
          ord = Accumulate[p] p + (Accumulate[tp] + Length[s1]) tp;
          Outer[Part, {Join[s1, s2]}, ord, 1][[1]]\[Transpose]]

This code takes two lists of characters and returns a list of lists of characters. You can use this to compute both senses of the shuffle product. For example, let’s compute abc Ш ac.

The Mathematica command

    shuffleW[{a, b, c}, {a, c}]

returns a list of 10 lists:

    {{a, b, c, a, c}, {a, b, a, c, c}, {a, b, a, c, c}, 
     {a, a, b, c, c}, {a, a, b, c, c}, {a, a, c, b, c}, 
     {a, a, b, c, c}, {a, a, b, c, c}, {a, a, c, b, c}, 
     {a, c, a, b, c}}

If we ask for the union of the set above with Union[%] we get

    {{a, a, b, c, c}, {a, a, c, b, c}, {a, b, a, c, c}, 
     {a, b, c, a, c}, {a, c, a, b, c}}

So using the set definition, we could say

abc Ш ac = {aabcc, aacbc, abacc, abcac, acabc}.

Using the formal sum definition we could say

abc Ш ac = 4aabcc + 2aacbc + 2abacc + abcac + acabc.



Prime numbers and Taylor’s law

The previous post commented that although the digits in the decimal representation of π are not random, it is sometimes useful to think of them as random. Similarly, it is often useful to think of prime numbers as being randomly distributed.

If prime numbers were samples from a random variable, it would be natural to look into the mean and variance of that random variable. We can’t just compute the mean of all primes, but we can compute the mean and variance of all primes less than an upper bound x.

Let M(x) be the mean of all primes less than x and let V(x) be the corresponding variance. Then we have the following asymptotic results:

M(x) ~ x / 2


V(x) ~ x²/12.

We can investigate how well these limiting results fit for finite x with the following Python code.

    from sympy import sieve
    def stats(x):
        s = 0
        ss = 0
        count = 0
        for p in sieve.primerange(x):
            s += p
            ss += p**2
            count += 1
        mean = s / count
        variance = ss/count - mean**2
        return (mean, variance)

So, for example, when x = 1,000 we get a mean of 453.14, a little less than the predicted value of 500. We get a variance of 88389.44, a bit more than the predicted value of 83333.33.

When x = 1,000,000 we get closer to values predicted by the limiting formula. We get a mean of 478,361, still less than the prediction of 500,000, but closer. And we get a variance of 85,742,831,604, still larger than the prediction 83,333,333,333, but again closer. (Closer here means the ratios are getting closer to 1; the absolute difference is actually getting larger.)

Taylor’s law

Taylor’s law is named after ecologist Lionel Taylor (1924–2007) who proposed the law in 1961. Taylor observed that variance and mean are often approximately related by a power law independent of sample size, that is

V(x) ≈ a M(x)ᵇ

independent of x.

Taylor’s law is an empirical observation in ecology, but it is a theorem when applied to the distribution of primes. According to the asymptotic results above, we have a = 1/3 and b = 2 in the limit as x goes to infinity. Let’s use the code above to look at the ratio

V(x) / a M(x)ᵇ

for increasing values of x.

If we let x = 10ᵏ for k = 1, 2, 3, …, 8 we get ratios

0.612, 1.392, 1.291, 1.207, 1.156, 1.124, 1.102, 1.087

which are slowly converging to 1.
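The ratio for a modest x can be reproduced with a self-contained sieve, no sympy required; for x = 1000 it should match the k = 3 entry above:

```python
def primes_below(x):
    # simple sieve of Eratosthenes
    is_prime = [True] * x
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(x**0.5) + 1):
        if is_prime[n]:
            for m in range(n*n, x, n):
                is_prime[m] = False
    return [n for n in range(x) if is_prime[n]]

ps = primes_below(1000)
mean = sum(ps) / len(ps)
variance = sum(p**2 for p in ps) / len(ps) - mean**2
ratio = variance / (mean**2 / 3)   # V(x) / (a M(x)^b) with a = 1/3, b = 2
print(round(ratio, 3))  # 1.291
```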


Reference: Joel E. Cohen. Statistics of Primes (and Probably Twin Primes) Satisfy Taylor’s Law from Ecology. The American Statistician, Vol. 70, No. 4 (November 2016), pp. 399–404

The coupon collector problem and π

How far do you have to go down the decimal digits of π until you’ve seen all the digits 0 through 9?

We can print out the first few digits of π and see that there’s no 0 until the 32nd decimal place.


It’s easy to verify that the remaining digits occur before the 0, so the answer is 32.

Now suppose we want to look at pairs of digits. How far out do we have to go until we’ve seen all pairs of digits (or base 100 digits if you want to think of it that way)? And what about triples of digits?

We know we’ll need at least 100 pairs, and at least 1000 triples, so this has gotten bigger than we want to do by hand. So here’s a little Python script that will do the work for us.

    from mpmath import mp
    mp.dps = 30_000
    s = str(mp.pi)[2:]  # digits after the decimal point
    for k in [1, 2, 3]:
        # break the digits into non-overlapping blocks of length k
        tuples = [s[i:i+k] for i in range(0, len(s), k)]
        d = dict()
        i = 0
        while len(d) < 10**k:
            d[tuples[i]] = 1
            i += 1
        print(i)

The output:

    32
    396
    6076

This confirms that at the 32nd decimal place we will have seen all 10 possible digits. It says we need 396 pairs of digits before we see all 100 possible digit pairs, and we’ll need 6076 triples before we’ve seen all possible triples.

We could have used the asymptotic solution to the “coupon collector problem” to approximately predict the results above.

Suppose you have an urn with n uniquely labeled balls. You randomly select one ball at a time, return the ball to the urn, and select randomly again. The coupon collector problem asks how many draws you’ll have to make before you’ve selected each ball at least once.

The expected value for the number of draws is

n Hn

where Hn is the nth harmonic number. For large n this is approximately equal to

n(log n + γ)

where γ is the Euler-Mascheroni constant. (More on the gamma constant here.)

Now assume the digits of π are random. Of course they’re not random, but random is as random does. We can get useful estimates by making the modeling assumption that the digits behave like a random sequence.

The solution to the coupon collector problem says we’d expect, on average, to sample 28 digits before we see each digit, 518 pairs before we see each pair, and 7485 triples before we see each triple. “On average” doesn’t mean much since there’s only one π, but you could interpret this as saying what you’d expect if you repeatedly chose real numbers at random and looked at their digits, assuming the normal number conjecture.
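The expected values quoted above are easy to recompute. The sketch below prints both the exact expectation n Hn and the asymptotic approximation n(log n + γ); the exact value runs slightly above the approximation:

```python
from math import log

gamma = 0.5772156649015329  # Euler-Mascheroni constant

def exact(n):
    # n times the nth harmonic number
    return n * sum(1/k for k in range(1, n + 1))

def approx(n):
    return n * (log(n) + gamma)

for n in [10, 100, 1000]:
    print(n, round(exact(n), 1), round(approx(n), 1))
```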

The variance of the number of draws needed is asymptotically π²n²/6, so the number of draws will usually be within about 2n of the expected value.

If you want the details of the coupon collector problem, not just the expected value but the probabilities for different number of draws, see Sampling with replacement until you’ve seen everything.


Adding stars to constellations

Until yesterday, I was only aware of the traditional assignment of stars to constellations. In the comments to yesterday’s post I learned that H. A. Rey, best known for writing the Curious George books, came up with a new way of viewing the constellations in 1952, adding stars and connecting lines in order to make the constellations look more like their names. For example, to make Leo look more like a lion.

Book cover of Find The Constellations by H. A. Rey

The International Astronomical Union (IAU) makes beautiful star charts of the constellations, and uses Rey’s conventions, sorta.

This post will look at the example of Leo, from the IAU chart and from Rey’s book Find The Constellations.

(I wonder whether the ancients also added stars to what we received as the traditional versions of constellations. Maybe they didn’t consciously notice the other stars. Or maybe they did, but only saw the need to record the brightest stars, something like the way Hebrew only recorded the consonants of words.)

Here is the IAU star chart for Leo, cropped to just show the constellation graph. (The white region is Leo-as-region and the green lines are Leo-as-graph.)

Leo star chart IAU

Rey’s version of Leo is a little different. Here is my attempt to reproduce Rey’s version from page 9 of Find the Constellations.

And for comparison, here’s my reproduction of the IAU version.

The solid blue lines are traditional. The dashed green lines were added by Rey and the IAU respectively.

Here is the Python code that produced the two images. Star names and coordinates are explained in the previous post.

# data from
import matplotlib.pyplot as plt

# star coordinates
δ = (11 + 14/60, 20 + 41/60)  
β = (11 + 49/60, 14 + 34/60)  
θ = (11 + 14/60, 15 + 26/60)  
α = (10 +  8/60, 11 + 58/60)  
η = (10 +  7/60, 16 + 46/60)  
γ = (10 + 20/60, 19 + 51/60)  
ζ = (10 + 17/60, 23 + 25/60)  
μ = ( 9 + 53/60, 26 +  0/60)  
ε = ( 9 + 46/60, 23 + 46/60)  
κ = ( 9 + 25/60, 26 + 11/60)  
λ = ( 9 + 32/60, 22 + 58/60)  
ι = (11 + 24/60, 10 + 32/60)  
σ = (11 + 21/60,  6 +  2/60)
ο = ( 9 + 41/60,  9 + 54/60)
ρ = (10 + 33/60,  9 + 18/60)

k = -20 # reverse and scale horizontal axis

def plot_stars(ss):
    for s in ss:
        plt.plot(k*s[0], s[1], 'ko')
def join(s0, s1, style, color):
    plt.plot([k*s0[0], k*s1[0]], [s0[1], s1[1]], style, color=color)    

def draw_iau():

    # traditional
    join(δ, β, '-', 'b')
    join(β, θ, '-', 'b')
    join(θ, η, '-', 'b')
    join(η, γ, '-', 'b')
    join(γ, ζ, '-', 'b')
    join(ζ, μ, '-', 'b')
    join(μ, ε, '-', 'b')
    join(δ, θ, '-', 'b')
    # added
    join(θ, ι, '--', 'g')
    join(ι, σ, '--', 'g')
    join(δ, γ, '--', 'g')
    join(ε, η, '--', 'g')
    join(μ, κ, '--', 'g')
    join(κ, λ, '--', 'g')
    join(λ, ε, '--', 'g')
    join(η, α, '--', 'g')

def draw_rey():
    plot_stars([δ,β,θ,α,η,γ,ζ,μ,ε,λ,ι,σ, ρ,ο])        

    # traditional
    join(δ, β, '-', 'b')
    # join(β, θ, '-', 'b')
    join(θ, η, '-', 'b')
    join(η, γ, '-', 'b')
    join(γ, ζ, '-', 'b')
    join(ζ, μ, '-', 'b')
    join(μ, ε, '-', 'b')
    join(δ, θ, '-', 'b')
    # added
    join(θ, ι, '--', 'g')
    join(ι, σ, '--', 'g')
    join(δ, γ, '--', 'g')
    join(λ, ε, '--', 'g')
    join(η, α, '--', 'g')
    join(θ, ρ, '--', 'g')
    join(η, ο, '--', 'g')

Plotting constellations

Suppose you wanted to write a program to plot constellations. This leads down some interesting rabbit trails.

When you look up data on stars in constellations you run into two meanings of constellation. For example, Leo is a region of the night sky containing an untold number of stars. It is also a pattern of nine particular stars connected by imaginary lines. It’s easier to find data on the former, say sorted by brightness.

Are the nine brightest stars in Leo-the-region the nine stars of Leo-the-stick-figure? Not exactly, but close.

Wikipedia has an article that lists the stars in each constellation region, and there are star charts that draw constellations as stick figures. If the stars on a chart are labeled, you can cross-reference them with Wikipedia.

On a particular star chart I have, the stars in Leo are labeled with their Bayer designation. Roughly speaking the Bayer designation labels the stars within a constellation with Greek letters in descending order of brightness, but there are inconsistencies. The nomenclature goes back to Johann Bayer (1572–1625) and has its flaws.

The stars in Leo, in line-drawing order, are

  1. δ Leo
  2. Denebola (β Leo)
  3. θ Leo
  4. Regulus (α Leo)
  5. η Leo
  6. γ Leo
  7. ζ Leo
  8. μ Leo
  9. ε Leo

You can look up the coordinates of these stars here. Line-drawing order does not correspond to brightness order, so without a labeled star chart you’d have some research to do. My chart labels all the stars in Leo (the stick figure), but not, for example, in Virgo.

γ Leo is actually two stars, and Wikipedia ranks the brightness of the stars a little differently than Bayer did, which is understandable since brightness could not be objectively measured in his day. Wikipedia also inserts a few stars in between the stars listed above.

Here’s a plot of Leo using the data referenced above.