# Area and volume of hypersphere cap

A spherical cap is the portion of a sphere above some horizontal plane. For example, the polar ice cap of the earth is the region above some latitude. I mentioned in this post that the area above a latitude φ is

where R is the earth’s radius. Latitude is the angle up from the equator. If we use the angle θ down from the pole, we get

I recently ran across a generalization of this formula to higher-dimensional spheres in [1]. This paper uses the polar angle θ rather than latitude φ. Throughout this post we assume 0 ≤ θ ≤ π/2.

The paper also includes a formula for the volume of a hypersphere cap which I will include here.

## Definitions

Let S be the surface of a ball in n-dimensional space and let An(R) be its surface area.

Let Ix(a, b) be the incomplete beta function with parameters a and b evaluated at x. (This notation is arcane but standard.)

This is, aside from a normalizing constant, the CDF function of a beta(a, b) random variable. To make it into the CDF, divide by B(a, b), the (complete) beta function.

## Area equation

Now we can state the equation for the area of a spherical cap of a hypersphere in n dimensions.

Recall that we assume the polar angle θ satisfies 0 ≤ θ ≤ π/2.

It’s not obvious that this reduces to the equation at the top of the post when n = 3, but it does.

## Volume equation

The equation for the volume of the spherical cap is very similar:

where Vn(R) is the volume of a ball of radius R in n dimensions.

## Related posts

[1] Shengqiao Li. Concise Formulas for the Area and Volume of a Hyperspherical Cap. Asian Journal of Mathematics and Statistics 4 (1): 66–70, 2011.

# Extending harmonic numbers

For a positive integer n, the nth harmonic number is defined to be the sum of the reciprocals of the first n positive integers:

How might we extend this definition so that n does not have to be a positive integer?

## First approach

One way to extend harmonic numbers is as follows. Start with the equation

Then

Integrate both sides from 0 to 1.

So when x is an integer,

is a theorem. When x is not an integer, we take this as a definition.

## Second approach

then take the logarithm and derivative of both sides. This gives

where the digamma function ψ is defined to be the derivative of the log of the gamma function.

If x is an integer and we apply the identity above repeatedly we get

where γ is Euler’s constant. Then we can define

for general values of x.

## Are they equal?

We’ve shown two ways of extending the harmonic numbers. Are these two different extensions or are they equal? They are in fact equal, which follows from equation 12.16 in Whittaker and Watson, citing a theorem of Legendre.

Taking either approach as our definition we could, for example, compute the πth harmonic number (1.87274) or even the ith harmonic number (0.671866 + 1.07667i).

The digamma function satisfies an addition rule

which can be proved by taking the logarithm and derivative of Gauss’s multiplication rule for the gamma function.

Let z = x + 1/2 and add γ to both sizes. This shows that harmonic numbers satisfy the addition rule

# When does a function have an addition theorem?

## Motivating examples

The addition theorem for cosine says that

and the addition theorem for hyperbolic cosine is analogous, though with a sign change.

An addition theorem is a theorem that relates a function’s value at x + y to its values at x and at y. The squaring function satisfies a very simple addition theorem

and the Jacobi function sn satisfies a more complicated addition theorem.

## Defining an algebraic addition theorem

Which functions have addition theorems? Before we can answer this question we need to be more precise about what an addition theorem is. We’ve said that an addition theorem for φ relates φ(x + y) to φ(x) and φ(y). But what exactly do we mean by “relate”? What counts as a relation?

Also, the examples above don’t exactly satisfy this definition. The addition law for cosines, for example, relates cos(x + y) to the values of cos(x) and cos(y) but also to sin(x) and sin(y). Somehow that feels OK because sine and cosine are related. But here again we’re talking about things being related without saying exactly what we mean.

Weierstrass (1815–1897) made the idea of an addition theorem precise and classified functions having addition theorems. A function satisfies an algebraic addition theorem if there is a polynomial F in three variables such that

For example, if φ(x) = x² then

and so we could take F to be

Similarly, if φ(x) = cos x then

and so we could take F to be

## Classifying functions with algebraic addition theorems

Now for Weierstrass’ theorem. A meromorphic function φ(z) has an algebraic addition theorem if and only if it is an elliptic function of z, a rational function of z, or a rational function of exp(λz).

A meromorphic function is one that is analytic everywhere except at isolated singularities. To put it another way, we assume φ has a convergent power series everywhere in the complex plane except at isolated points.

The examples above illustrate all three cases of Weierstrass’ theorem. The function sn(z) is elliptic, the function z² is rational, and the functions cos(z) and cosh(z) are rational functions of exp(iz).

## Other kinds of addition theorems

Algebraic addition theorems are not the only kind of addition theorems. For example, Bessel functions satisfy a different kind of addition theorem:

This theorem relates the value of a Bessel function at x + y to the values of other Bessel functions at x and at y, but it is not an algebraic addition theorem because the right hand side is an infinite sum and because the Bessel functions are not algebraically related to each other.

# Circular, hyperbolic, and elliptic functions

This post will explore how the trigonometric functions and the hyperbolic trigonometric functions relate to the Jacobi elliptic functions.

There are six circular functions: sin, cos, tan, sec, csc, and cot.

There are six hyperbolic functions: just stick an ‘h’ on the end of each of the circular functions.

There are an infinite number of elliptic functions, but we will focus on 12 elliptic functions, the Jacobi functions.

## sn, sin, and tanh

The Jacobi functions have names like sn and cn that imply a relation to sine and cosine. This is both true and misleading.

Jacobi functions are functions of two variables, but we think of the second variable as a fixed parameter. So the function sn(z, m), for example, is a family of functions of z, with a parameter m. When that parameter is set to 0, sn coincides with sine, i.e.

sn(z, 0) = sin(z).

And for small values of m, sn(z, m) is a lot like sin(z). But when m = 1,

sn(z, 1) = tanh(z).

Think of m as a knob attached to the sn function. When we turn the knob to 0 we get the sine function. As we turn the knob up, sn becomes less like sine and more like hyperbolic tangent. The period becomes longer as a function of m, diverging to infinity as m approaches 1. Here’s a plot of one period of sn(x, m) for several values of m.

So while it’s true that sn can be thought of as a generalization of sine, it’s also a generalization of hyperbolic tangent.

Similarly, cn can be thought of as a generalization of cosine, but it’s also a generalization of hyperbolic secant.

All of the Jacobi functions are circular functions when m = 0 and hyperbolic functions when m = 1, with a caveat that we’ll get to shortly.

## The extra function

For the rest of this article, I’d like to say there are seven circular functions, making the constant function 1 an honorary trig function. I’d also like to call it a hyperbolic function and a Jacobi function.

With this addition, we can say that the seven circular functions consist of {sin, cos, 1} and all their ratios. And the seven hyperbolic functions consist of {sinh, cosh, 1} and all their ratios.

The 13 Jacobi functions consist of {sn, cn, dn, 1} and all their ratios.

## Circular functions are Jacobi functions

I mentioned above that sn(z, 0) = sin(z) and cn(z, 0) = cos(z). And the honorary Jacobi function 1 is always 1, regardless of m. So by taking ratios of sn, cn, and 1 we can obtain all ratios of sin, cos, and 1. Thus all circular functions correspond to a Jacobi function with m = 0.

There are more Jacobi functions than circular functions. We’ve shown that some Jacobi functions are circular functions when m = 0. Next we’ll show that all Jacobi functions are circular functions when m = 0.  For this we need one more fact:

dn(z. 0) = 1.

Never mind what dn is. For the purpose of this article it’s just one of the generators for the Jacobi functions. Since all the generators of the Jacobi functions are circular functions when m = 0, it follows that all Jacobi functions are circular functions when m = 0. Note that this would not be true if we hadn’t decided to call 1 a Jacobi function.

## Hyperbolic functions are Jacobi functions

We can repeat the basic argument above to show that when m = 1, hyperbolic functions are Jacobi functions and Jacobi functions are hyperbolic functions.

I mentioned above that sn(z, 1) = tanh(z) and cn(z, 1) = sech(z). I said above that the hyperbolic functions are generated by {sinh, cosh, 1} but they are also generated by {tanh, sech, 1} because

cosh(z) = 1/sech(z)

and

sinh(z) = tanh(z) / sech(z).

A set of generators for the hyperbolic functions are Jacobi functions with m = 1, so all hyperbolic functions and Jacobi functions with m = 1. To prove the converse we also need the fact that

dn(z, 1) = sech(z).

Since sn, cn, dn, 1 correspond to hyperbolic functions when m = 1, all Jacobi functions do as well.

## When Jacobi functions aren’t elliptic

The Jacobi functions are elliptic functions in general, but not when m = 0 or 1. So it’s not quite right to say that the circular and hyperbolic functions are special cases of the Jacobi elliptic functions. More precisely, they are special cases of the Jacobi functions, functions which are almost always elliptic, though not in the special cases discussed here. You might say that circular and hyperbolic functions are degenerate elliptic functions.

Elliptic functions are doubly periodic. They repeat themselves in two independent directions in the complex plane.

Circular functions are periodic along the real axis but not along the imaginary axis.

Hyperbolic functions are periodic along the imaginary axis but not along the real axis.

Jacobi functions have a horizontal period and a vertical period, and both are functions of m. When 0 < m < 1 both periods are finite. When m = 0 the vertical period becomes infinite, and when m = 1 the horizontal period becomes infinite.

# What good is a DE once you have the solution?

In an undergraduate course on differential equations, you see an equation, you find a solution. Wax on, wax off. It would be reasonable to assume that the differential equation is simply an obstacle to overcome and that once you have the solution, the differential equation has done its job and can be discarded.

It would seem backward to ask whether a function satisfies a differential equation. It is backward, but that doesn’t mean it’s a bad idea. It can be very useful to know that a function you’re interested in satisfies a nice differential equation. (It’s trivial to make up differential equations for any function; the key is to discover a nice differential equation, a simple relationship between a function and its derivatives.)

The previous post gave an example of this. There we said that although Halley’s root-finding method converges faster than Newton’s method, it also requires one more function evaluation, namely the second derivative of the function whose root you’re finding. But when that function satisfies a second order differential equation, you can get the second derivative for free once you’ve evaluated the function and its derivative. Well, at a deep discount at least if not free.

In that post we gave the example of a Bessel function. Once you have evaluated a Bessel function and its first derivative at a point. you can calculate the second derivative in much less time that it took to evaluate the Bessel function itself.

Bessel functions are not unique in this regard. Hypergeometric functions are a large and useful class of functions, and these functions all satisfy Riemann’s differential equation.

All the classic orthogonal polynomials—Chebyshev polynomials, Jacobi polynomials, Laguerre polynomials, Hermite polynomials, etc.—satisfy a linear second order differential equation. For example, the Legendre polynomials Pn satisfy

And its not just Bessel functions, hypergeometric functions, and orthogonal polynomials. Most special functions satisfy a useful differential equation. More specifically, a linear second order differential equation.

So far the only reason given for wanting to know what differential equation a function satisfies is to speed up calculations: evaluate two functions and get a third one for cheap. But there are many other reasons to care what differential equation a function satisfies. For example, a common way to show that two functions are equal is to show that they satisfy the same differential equation with the same initial conditions. This is applying a uniqueness theorem for differential equations, but many other equations about differential equations could be useful to apply.

# Bounds on the incomplete beta function (beta CDF)

The incomplete beta function is defined by

It is called incomplete because the limit of integration doesn’t necessarily run all the way up to 1. When z = 1 the incomplete beta function is simply the beta function

discussed in the previous post [1].

The incomplete beta function is proportional to the CDF of a beta random variable, and the proportionality constant is 1/B(x, y). That is, if X is a beta(a, b) random variable, then

The right-hand side of the equation above is often called the regularized incomplete beta function. This is what an analyst would call it. A statistician would call it the beta distribution function.

The previous post gave simple bounds on the (complete) beta function. This post gives bounds on the incomplete beta function. These bounds are more complicated but also more useful. The incomplete beta function is more difficult to work with than the beta function, and so upper and lower bounds are more appreciated.

One application of these bounds would be to speed up software that evaluates beta probabilities. If you only need to know whether a probability is above or below some threshold, you could first compute an upper or lower bound. Maybe computing the exact probability is unnecessary because you can decide from your bounds whether the probability is above or below your threshold. If not, then you compute the exact probability. This is less work if your bounds are usually sufficient, and more work if they’re usually not. In the former case this could result in a significant speed up because the incomplete beta function is relatively expensive to evaluate.

Cerone, cited in the previous post, gives two lower bounds, L1 and L2, and two upper bounds, U1 and U2, for B(x, y, z). Therefore the maximum of L1 and L2 is a lower bound and the minimum of U1 and U2 is an upper bound. The bounds are complicated, but easier to work with than the incomplete beta function.

where

## Related posts

[1] Incidentally, Mathematica overloads the Beta function just as we overload B. Mathematica’s Beta[x, y] corresponds to our B(x, y), but the three-argument version of Beta, the incomplete beta function, puts the limit of integration z first. That is, our B(x, y, z) corresponds to Beta[z, x, y] in Mathematica.

# Upper and lower bounds on the beta function

The beta function B(x, y) is defined by

and is the normalizing constant for the beta probability distribution. It is related to the gamma function via

The beta function comes up often in applications. It can be challenging to work with, however, and so estimates for the function are welcome.

The function 1/xy gives simple approximation and upper bound for B(x, y). Alzer [1] proved that when x > 1 and y > 1

where the constant b is defined by

Cerone [2] gives a different bound which varies with x and y and is usually better than Alzer’s bound. For x and y greater than 1, Cerone shows

where

and

Cerone’s bound is slightly larger in the worst case, near x = y = (3 + √5)/2, but is smaller in general.

The difference between B(x, y) and 1/xy is largest when x or y is small. We can visualize this with the Mathematica command

    Plot3D[Beta[x, y] - 1/(x y), {x, 0.5, 2.5}, {y, 0.5, 2.5}]

which produces the following plot.

The plot dips down in the corner where x and y are near 0.5 and curls upward on the edges where one of the variables is near 0.5 and the other is not.

Let’s look at B(x, y) and 1/xy at along a diagonal slice (3t, 4t).

This suggests that approximating B(x, y) with 1/xy works best when the arguments are either small or large, with the maximum difference being when the arguments are moderate-sized. In the plot we see B(3, 4) is not particularly close to 1/12.

Next lets look at 1/xyB(x, y) along the same diagonal slice.

This shows that the error bound C(x) C(y) is not too tight, but better than the constant bound except near the maximum of 1/xyB(x, y).

## Related posts

[1] H. Alzer. Monotonicity properties of the Hurwitz zeta function. Canadian Mathematical Bulletin 48 (2005), 333–339.

[2] P. Cerone. Special functions: approximations and bounds. Applicable Analysis and Discrete Mathematics, 2007, Vol. 1, No. 1, pp. 72–91

# nth root of n!

Last week the function

came up in a blog post as the solution to the equation n! = bn.

After writing that post I stumbled upon an article [1] that begins by saying, in my notation,

The function b(x) … has many applications in pure and applied mathematics.

The article mentions a couple applications but mostly focuses on approximations and bounds for the function b(x) and especially for the ratio b(x)/b(x + 1). In the article x is a positive real number, not necessarily an integer.

One of the results is the approximation

which is the beginning of an asymptotic series given in the paper.

Another result is the following upper and lower bounds on b(n).

Let’s look at some examples of these results using the following Python script.

    from numpy import exp, pi
from scipy.special import loggamma

b = lambda x: exp(loggamma(x+1)/x)

bratio = lambda x: b(x)/b(x+1)

bratio_approx = lambda x: (2*pi*x)**(0.5/(x*(x+1)))

def lower_bound_b(x):
r = exp(-0.5/x) * x*(2*pi*x)**(0.5/x)
r /= (1 + 1/x)**x
return r

def upper_bound_b(x):
return lower_bound_b(x)*exp(5/(12*x*x))

for x in [25, 100, 1000]:
print(bratio(x), bratio_approx(x))

for x in [25, 100, 1000]:
print(lower_bound_b(x), b(x), upper_bound_b(x))


The code exercising the ratio approximation prints

    0.964567 1.003897
0.990366 1.000319
0.999004 1.000004


and the code exercising the lower and upper bounds prints the following.

     10.170517  10.177141   10.177299
37.991115  37.992689   38.992698
369.491509 369.491663  369.491663


[1] Chao-Ping Chen and Cristinel Mortici. Asymptotic expansions and inequalities relating to the gamma function. Applicable Analysis and Discrete Mathematics, Vol. 16, No. 2 (October, 2022), pp. 379–399

# Drag equation exponent variation

The motion of a falling body of mass m is given by

where the term −kvr accounts for drag due to air resistance. One can derive r = 2 under simple physical assumptions, but if I remember correctly other values of r may be more realistic in certain circumstances. I don’t know much about the physics here; if you know about the use of other values of r, please let me know by leaving a comment.

## Terminal velocity

When r = 1 or r = 2 the differential equation above can be solved in terms of elementary functions, but otherwise it cannot. Nevertheless one can show that for all values of r the object reaches a terminal velocity, and calculate that velocity without explicitly solving the differential equation. William Waterhouse demonstrated this in a one-page article [1]. He rewrites the equation to look at time as a function of velocity rather than velocity as a function of time

and concludes

He notes that the integral diverges as v approaches

and so that is the terminal velocity, i.e. it takes an infinite amount of time to achieve this velocity. Waterhouse recommends this derivation as “a good example of deriving information about a problem without knowing an explicit solution.”

I would add that such an approach is the norm, not an exception. A closed-form solution to a differential equation in nice when you can get it, but usually not possible. And even when you can find a closed-form solution, you may be able to achieve your goal more directly by not using it.

## Hypergeometric solution

I suspected the differential equation could be solved for general values of r using special functions, and that is the case. Mathematica was able to evaluate the integral for t as a function of v in terms of a hypergeometric function.

When I asked Mathematica to solve the differential equation directly, it said that the solution is the inverse function of the function above. Apparently Waterhouse and Mathematica agree that it is easier to think of t as a function of v rather than the original formulation.

The notation 2F1 indicates there are two upper parameters and one lower parameter. In our application, the upper parameters are 1 and 1/r, the lower parameter is 1 + 1/r, and the function is evaluated at gkvr/m. You can find a brief introduction to hypergeometric functions here. A hypergeometric function 2F1 has a singularity at 1, and so we could derive the terminal velocity from the explicit solution.

## Mathematica implementation

Let c = gk/m. Then we can express velocity as a function of time in Mathematica by

    f[r_, c_, v_] := InverseFunction
[
#1 Hypergeometric2F1[1, 1/r, 1 + 1/r, c #1^r] &
][t]

and use this to make plots of the velocity for various values of c and r.

The following sets c = 2 and varies r over 1, 1.1, 1.2, … 2.

    Plot[Table[f[2, d/10, t], {d, 10, 20}], {t, 0, 4}, PlotRange -> All]

Here’s the output.

The terminal velocity decreases as r increases. The opposite is true for c < 1.

[1] William C. Waterhouse. A Fact about Falling Bodies. Mathematics Magazine, Vol. 44, No. 1 (Jan., 1971), pp. 33–34. The article straddles two pages, but takes up less than half of each page.

# Every factorial is a power

The previous post mentioned that 24! ≈ 1024 and 25! ≈ 1025.

For every n, there is some base b such that n! = bn. For example, 30! ≈ 1230.

It’s easy to find b [1]:

What’s interesting is that b is very nearly a linear function of n.

In hindsight it’s clear that this should be the case—it follows easily from Stirling’s approximation—but I didn’t anticipate this before I plotted it.

Now fix n and find b such that n! = bn. Since the relationship between n and b(n) is nearly linear, this suggests

which is true. It follows from the multiplication identity for the gamma function:

Let z = n + 1/2 so that the left side is (2n)!. On the right side, Γ(z + 1/2) = n! and Γ(z) is not too different from n!. The rest of the right side is 22n/√π.

So our observation that b(n) is nearly linear gave us a hint of Gauss’s multiplication formula.

[1] Numerically you would probably evaluate this function by calling a routine that computes log Γ(n + 1) directly without computing Γ(n + 1) first. This avoids overflow for large n.

This is why mathematical libraries will have not only gamma functions but also loggamma functions. The latter seems redundant, but it’s not.