Four generalizations of the Pythagorean theorem

Here are four theorems that generalize the Pythagorean theorem. Follow the links for more details regarding each equation.

1. Apollonius's theorem for general triangles.

a^2 + b^2 = 2(m^2 + h^2)

2. Edsger Dijkstra's extension of the Pythagorean theorem for general triangles.

\text{sgn}(\alpha + \beta - \gamma) = \text{sgn}(a^2 + b^2 - c^2)

3. A generalization of the Pythagorean theorem to tetrahedra.

V_0^2 = \sum_{i=1}^n V_i^2

4. A unified Pythagorean theorem that covers spherical, plane, and hyperbolic geometry.

A(c) = A(a) + A(b) - \kappa \frac{A(a) \, A(b)}{2\pi}
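As a quick sanity check (a sketch of my own, not taken from the linked post), the unified formula can be verified numerically on the unit sphere, taking A(r) to be the area of a geodesic disk of radius r, κ = 1, and using the spherical relation cos c = cos a cos b for a right triangle with legs a and b.

import numpy as np

def A(r):
    # area of a geodesic disk of radius r on the unit sphere
    return 2*np.pi*(1 - np.cos(r))

a, b = 0.6, 0.9                     # legs of a spherical right triangle
c = np.arccos(np.cos(a)*np.cos(b))  # hypotenuse via the spherical Pythagorean relation
kappa = 1                           # curvature of the unit sphere

print(A(c) - (A(a) + A(b) - kappa*A(a)*A(b)/(2*np.pi)))  # prints roughly 0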

Elementary symmetric polynomials and optimization

The mth elementary symmetric polynomial in n variables

e_m(x_1, x_2, \ldots, x_n)

is the sum of all products of m distinct variables. So, for example,

\begin{align*} e_1(w, x, y, z) &= w + x + y + z \\ e_2(w, x, y, z) &= wx + wy + wz + xy + xz + yz \\ e_3(w, x, y, z) &= xyz + wyz + wxz + wxy \\ e_4(w, x, y, z) &= wxyz \end{align*}
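Here is a small sketch (not from the original post) that computes e_m directly from the definition above; the numerical values are arbitrary.

from itertools import combinations
from math import prod

def esp(m, xs):
    # sum over all products of m distinct variables
    return sum(prod(c) for c in combinations(xs, m))

w, x, y, z = 2, 3, 5, 7
print(esp(2, [w, x, y, z]))  # wx + wy + wz + xy + xz + yz = 101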
These polynomials came up in the previous post: the problem of choosing weights to minimize the variance of a weighted sum of random variables can be solved using elementary symmetric polynomials.

To state the optimization problem more generally, suppose you want to minimize

t_1^2 x_1 + t_2^2x_2 + \cdots + t_n^2 x_n

where the ti and xi are positive and the ti sum to 1. You can use Lagrange multipliers to show that the solution is

t_i = \frac{e_n(x_1, x_2, \cdots, x_n)}{x_i \,e_{n-1}(x_1, x_2, \cdots, x_n)}
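As a sketch of how you might check this (the values of the x_i below are made up for illustration), the closed-form weights can be compared against a numerical optimizer:

import numpy as np
from scipy.optimize import minimize
from itertools import combinations
from math import prod

def esp(m, xs):
    # mth elementary symmetric polynomial of the values in xs
    return sum(prod(c) for c in combinations(xs, m))

x = [2.0, 3.0, 5.0, 7.0]
n = len(x)

# closed-form solution t_i = e_n / (x_i e_{n-1})
t_formula = [esp(n, x)/(xi*esp(n - 1, x)) for xi in x]

# numerical solution of min sum t_i^2 x_i subject to sum t_i = 1
objective = lambda t: np.dot(t**2, x)
constraint = {"type": "eq", "fun": lambda t: np.sum(t) - 1}
t_numeric = minimize(objective, np.full(n, 1/n), constraints=[constraint]).x

print(t_formula)
print(t_numeric)  # should agree with t_formula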


Weighting an average to minimize variance

Suppose you have $100 to invest in two independent assets, A and B, and you want to minimize volatility. Suppose A is more volatile than B. Then putting all your money on A would be the worst thing to do, but putting all your money on B would not be the best thing to do.

The optimal allocation would be some mix of A and B, with more (but not all) going to B. We will formalize this problem and determine the optimal allocation, then generalize the problem to more assets.

Two variables

Let X and Y be two independent random variables with finite variance and assume at least one of X and Y is not constant. We want to find t that minimizes

\text{Var}[tX + (1-t)Y]

subject to the constraint 0 ≤ t ≤ 1. Because X and Y are independent,

\text{Var}[tX + (1-t)Y] = t^2 \text{Var}[X] + (1-t)^2 \text{Var}[Y]

Taking the derivative with respect to t and setting it to zero shows that

t = \frac{\text{Var}[Y]}{\text{Var}[X] + \text{Var}[Y]}

So the smaller the variance on Y, the less we allocate to X. If Y is constant, we allocate nothing to X and go all in on Y.  If X and Y have equal variance, we allocate an equal amount to each. If X has twice the variance of Y, we allocate 1/3 to X and 2/3 to Y.

Multiple variables

Now suppose we have n independent random variables Xi for i running from 1 to n, and at least one of the variables is not constant. Then we want to minimize

\text{Var}\left[ \sum_{i=1}^n t_i X_i \right] = \sum_{i=1}^n t_i^2 \text{Var}[X_i]

subject to the constraint

\sum_{i=1}^n t_i = 1

and all ti non-negative. We can solve this optimization problem with Lagrange multipliers and find that

t_i \text{Var}[X_i] = t_j \text{Var}[X_j]

for all 1 ≤ i, j ≤ n. These n − 1 equations along with the constraint that all the ti sum to 1 give us a system of equations whose solution is

t_i = \frac{\prod_{j \ne i} \text{Var}[X_j]}{\sum_{k = 1}^n \prod_{j \ne k} \text{Var}[X_j]}

Incidentally, the denominator has a name: the (n − 1)st elementary symmetric polynomial in n variables. More on this in the next post.
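Before moving on, here's a short illustration of the formula above with made-up variances, checking that the optimal weights beat equal weighting:

import numpy as np

var = np.array([2.0, 1.0, 4.0])   # Var[X_i], made up for illustration

# numerator: product of all the other variances
prods = np.array([np.prod(np.delete(var, i)) for i in range(len(var))])
t = prods / prods.sum()           # optimal weights, summing to 1

print(t)                                # [2/7, 4/7, 1/7]
print(np.dot(t**2, var))                # 4/7, the minimal variance
print(np.dot(np.full(3, 1/3)**2, var))  # 7/9, equal weighting does worse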


Rolling correlation

Suppose you have data on the closing prices of two stocks over 1,000 days and you want to look at the correlation between the two asset prices over time in rolling 30 day windows.

It seems that the rolling correlation is periodic, peaking about every 50 days.

But this is an artifact of the rolling window, not a feature of the data. I created the two simulated stock time series by creating random walks. The price of the stock each day is the price the previous day plus a sample from a normal random variable with mean zero and variance 1.

import numpy as np
from scipy.stats import norm

n = 1000
x = np.cumsum(norm.rvs(size=n))
y = np.cumsum(norm.rvs(size=n))    
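Here is one way the rolling correlation could then be computed, using pandas. The 30-day window comes from the setup above; the plotting details are my own choices, not from the original post.

import pandas as pd
import matplotlib.pyplot as plt

window = 30
r = pd.Series(x).rolling(window).corr(pd.Series(y))

plt.plot(r)
plt.xlabel("day")
plt.ylabel("30-day rolling correlation")
plt.show()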

If you use a wider window, say 60 days, you’ll still see a periodic pattern in the rolling correlation, though with lower frequency.


Analog of Heron’s formula on a sphere

The area of a triangle can be computed directly from the lengths of its sides via Heron’s formula.

A = \sqrt{s(s-a)(s-b)(s-c)}

Here s is the semiperimeter, s = (a + b + c)/2.

Is there an analogous formula for spherical triangles? It’s not obvious there should be, but there is a formula by Simon Antoine Jean L’Huilier (1750–1840).

\tan^2 \frac{S}{4} = \tan \frac{s}{2} \tan \frac{s-a}{2} \tan \frac{s-b}{2} \tan \frac{s-c}{2}

Here we denote area by S for surface area, rather than A because in the context of spherical trigonometry A usually denotes the angle opposite side a. The same convention applies in plane trigonometry, but the potential for confusion is greater in L’Huilier’s formula since the area appears inside a tangent function.

Now tan θ ≈ θ for small θ, and so L’Huilier’s formula reduces to Heron’s formula for small triangles.

Imagine the Earth as a sphere of radius 1 and take a spherical triangle with one vertex at the north pole and two vertices on the equator 90° longitude apart. Then a = b = c = π/2 and s = 3π/4. Such a triangle takes up 1/8 of the Earth’s surface area of 4π, so the area S is π/2. You can verify that L’Huilier’s formula gives the correct area.
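Here is that verification as a few lines of Python:

import numpy as np

a = b = c = np.pi/2
s = (a + b + c)/2

# right-hand side of L'Huilier's formula
rhs = np.tan(s/2)*np.tan((s - a)/2)*np.tan((s - b)/2)*np.tan((s - c)/2)

# solve for the area S
S = 4*np.arctan(np.sqrt(rhs))
print(S, np.pi/2)  # both approximately 1.5707963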

It’s not a proof, but it’s a good sanity check that L’Huilier’s formula is correct for small triangles and for at least one big triangle.

How much is a gigawatt?

There’s increasing talk of gigawatt data centers. Currently the largest data center, Switch’s Citadel Campus in Nevada, uses 850 megawatts of power. OpenAI’s Stargate data center, under construction, is supposed to use 1.2 gigawatts.

Gigawatt

An average French nuclear reactor produces about a gigawatt of power. If the US were allowed to build nuclear reactors, we could simply build one reactor for every gigantic data center. Unfortunately, the Nuclear Regulatory Commission essentially prohibits the construction of profitable nuclear reactors.

An American home uses about 1200 watts of power, so a gigawatt of electricity could power 800,000 homes. So roughly, a gigawatt is a megahome.

Gigawatt-year

A gigawatt is a unit of power, not energy. Energy is power multiplied by time.

A gigawatt-year is about 3 × 10¹⁶ joules, or 30 petajoules.

A SpaceX Starship launch releases 50 terajoules of energy, so a gigawatt-year is about 600 Starship launches.

A couple of months ago I wrote about illustrating cryptographic strength in terms of the amount of energy needed to break it, and how much water that much energy would boil. Let’s do something similar for a gigawatt-year.

It takes about 300 kilojoules of energy to boil a liter of water [1], so 30 petajoules would boil 100 billion liters of water. So a gigawatt-year of energy would be enough to boil Coniston Water, the third largest lake in England.

If you could convert a kilogram of matter to energy according to E = mc², this would release 90 petajoules. So a gigawatt-year is the energy in about 300 grams of matter.
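Here are the back-of-the-envelope conversions above collected into a few lines of Python, using the figures quoted in this post:

gigawatt_year = 1e9 * 365.25 * 24 * 3600  # joules, about 3.2e16

print(gigawatt_year / 50e12)  # Starship launches at 50 TJ each, about 630
print(gigawatt_year / 300e3)  # liters of water boiled at 300 kJ each, about 1e11
print(gigawatt_year / 9e16)   # kilograms of matter via E = mc², about 1/3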

 

[1] In detail, boiling a liter of water is defined as raising its temperature from 20° C to 100° C at sea level.

 

Japanese polygon theorem

Here’s an interesting theorem that leads to some aesthetically pleasing images. It’s known as the Japanese cyclic polygon theorem.

For all triangulations of a cyclic polygon, the sum of inradii of the triangles is constant. Conversely, if the sum of inradii is independent of the triangulation, then the polygon is cyclic.

The image above shows two triangulations of an irregular hexagon. By the end of the post we’ll see the code that created these images, and we’ll see one more triangulation. And we’ll see that the sum of the radii of the circles is the same in each image.

Glossary

In case any of the terms in the theorem above are unfamiliar, here’s a quick glossary.

A triangulation of a polygon is a way to divide the polygon into triangles, with the restriction that the triangle vertices come from the polygon vertices.

A polygon is cyclic if there exists a circle that passes through all of its vertices. Incidentally, if a polygon has n + 2 sides, the number of possible triangulations is the nth Catalan number. More on that here.

The inradius of a triangle is the radius of the largest circle that fits inside the triangle. More on inner and outer radii here.

Equations for inradius and incenter

We’d like to illustrate the theorem by drawing some images and by adding up the inradii. This means we need to be able to find the inradius and the incenter.

The inradius of a triangle equals its area divided by its semiperimeter. The area we can find using Heron’s formula. The semiperimeter is half the perimeter.

The incenter is the weighted sum of the vertices, weighted by the lengths of the opposite sides, and normalized. If the vertices are A, B, and C, then the incenter is

\frac{aA + bB + cC}{a + b + c}

where A is the vertex opposite side a, B is the vertex opposite side b, and C is the vertex opposite side c.

Python code

The following Python code will draw a triangle with its incircle. Calling this function for each triangle in the triangulation will create the illustrations we’re after.

from numpy import *
import matplotlib.pyplot as plt

def draw_triangle_with_incircle(A, B, C):
    # side lengths, each opposite the vertex of the same letter
    a = linalg.norm(B - C)
    b = linalg.norm(A - C)
    c = linalg.norm(A - B)
    # inradius = area/semiperimeter, with the area from Heron's formula
    s = (a + b + c)/2
    r = sqrt(s*(s-a)*(s-b)*(s-c))/s
    # incenter as the side-length-weighted average of the vertices
    center = (a*A + b*B + c*C)/(a + b + c)
    # draw the triangle
    plt.plot([A[0], B[0]], [A[1], B[1]], color="C0")
    plt.plot([B[0], C[0]], [B[1], C[1]], color="C0")
    plt.plot([A[0], C[0]], [A[1], C[1]], color="C0")
    # draw the incircle
    t = linspace(0, 2*pi)
    plt.plot(r*cos(t) + center[0], r*sin(t) + center[1], color="C2")
    return r

Next, we pick six points not quite evenly spaced around a circle.

angle = [0, 50, 110, 160, 220, 315] # degrees
p = [array([cos(deg2rad(angle[i])), sin(deg2rad(angle[i]))]) for i in range(6)]

Now we draw three different triangulations of the hexagon defined by the points above. We also print the sum of the inradii to verify that each sum is the same.

def draw_polygon(triangles):
    rsum = 0
    for t in triangles:
        rsum += draw_triangle_with_incircle(p[t[0]], p[t[1]], p[t[2]])        
    print(rsum)
    plt.axis("off")
    plt.gca().set_aspect("equal")
    plt.show()

draw_polygon([[0,1,2], [0,2,3], [0,3,4], [0,4,5]])
draw_polygon([[0,1,2], [2,3,4], [4,5,0], [0,2,4]])
draw_polygon([[0,1,2], [0,2,3], [0,3,5], [3,4,5]])

This produces the two images at the top of the post and the one below. All the inradius sums are 1.1441361217691244, with a little variation in the last decimal place.

 


Tetrahedral analog of the Pythagorean theorem

A tetrahedron has four triangular faces. Suppose three of those faces come together like the corner of a cube, each face perpendicular to the other two. Let A1, A2, and A3 be the areas of the three triangles that meet at this corner and let A0 be the area of the remaining face, the one opposite the right-angle corner.

De Gua’s theorem, published in 1783, says

A0² = A1² + A2² + A3².

We will illustrate De Gua’s theorem using Python.

from numpy import *

def area(p1, p2, p3):
    # area of a triangle: half the norm of the cross product of two edge vectors
    return 0.5*abs(linalg.norm(cross(p1 - p3, p2 - p3)))

p0 = array([ 0, 0,  0])
p1 = array([11, 0,  0])
p2 = array([ 0, 3,  0])
p3 = array([ 0, 0, 25])

A0 = area(p1, p2, p3)
A1 = area(p0, p2, p3)
A2 = area(p0, p1, p3)
A3 = area(p0, p1, p2)

print(A0**2 - A1**2 - A2**2 - A3**2)

This prints 0, as expected.

Higher dimensions

A natural question is whether De Gua’s theorem generalizes to higher dimensions, and indeed it does.

Suppose you have a 4-simplex where one corner fits in the corner of a hypercube. The 4-simplex has 5 vertices. If you leave out any vertex, the remaining 4 points determine a tetrahedron. Let V0 be the volume of the tetrahedron formed by leaving out the vertex at the corner of the hypercube, and let V1, V2, V3, and V4 be the volumes of the tetrahedra formed by dropping out each of the other vertices. Then

V0² = V1² + V2² + V3² + V4².

You can extend the theorem to even higher dimensions analogously.

Let’s illustrate the theorem for the 4-simplex in Python. The volume of a tetrahedron can be computed as

V = \frac{\sqrt{\det G}}{6}

where G is the Gram matrix computed in the code below.

def volume(p1, p2, p3, p4):
    u = p1 - p4
    v = p2 - p4
    w = p3 - p4

    # construct the Gram matrix
    G = array( [
            [dot(u, u), dot(u, v), dot(u, w)],
            [dot(v, u), dot(v, v), dot(v, w)],
            [dot(w, u), dot(w, v), dot(w, w)] ])

    return sqrt(linalg.det(G))/6

p0 = array([ 0, 0,  0, 0])
p1 = array([11, 0,  0, 0])
p2 = array([ 0, 3,  0, 0])
p3 = array([ 0, 0, 25, 0])
p4 = array([ 0, 0,  0, 7])

V0 = volume(p1, p2, p3, p4)
V1 = volume(p0, p2, p3, p4)
V2 = volume(p0, p1, p3, p4)
V3 = volume(p0, p1, p2, p4)
V4 = volume(p0, p1, p2, p3)

print(V0**2 - V1**2 - V2**2 - V3**2 - V4**2)

This prints -9.458744898438454e-11. The result is 0, modulo the limitations of floating point arithmetic.

Numerical analysis

Floating point arithmetic is generally accurate to about 15 significant figures. Why is the numerical error above relatively large? The loss of precision is the usual suspect: subtracting nearly equal numbers. We have

V0**2 = 130978.7777777778

and

V1**2 + V2**2 + V3**2 + V4**2 = 130978.7777777779

Both results are correct to about 16 significant figures, but they agree in nearly all of those figures, so subtracting them cancels almost all of the precision.

The anti-Smith chart

As I’ve written about several times lately, the Smith chart is the image of a rectangular grid in the right half-plane under the function

f(z) = (z − 1)/(z + 1).

What would the image of a grid in the left half-plane look like?

For starters, since f maps the right half-plane to the interior of the unit circle, it must map the left half-plane to the exterior of the unit circle.

As we said before, the function f is a Möbius transformation, and so it takes generalized circles, i.e. either a circle or a line, to generalized circles. So the grid lines in the left half-plane are either mapped to lines or circles. Which is it?

The function f has a singularity at −1 and so the image of any line (or circle) through z = −1 is unbounded, i.e. a line, not a circle. Any line not passing through −1 has a bounded image, which must be a circle.

The line Re(z) = −1 in the z plane is mapped to the line Re(w) = 1 in the w plane. Otherwise a vertical line crossing the real axis at x is mapped to a circle passing through w = (x − 1)/(x + 1). The circle also passes through w = 1 because f(∞) = 1. The circle is symmetric about the real axis, and so this is enough information to uniquely determine the circle.

Note that (x − 1)/(x + 1) > 1 when x < −1 and so vertical lines with real part less than −1 are mapped to circles to the right of w = 1. When −1 < x < 0, vertical lines are mapped to circles to the left of w = 1.

The images of horizontal lines we’ve looked at before. These are all circles passing through w = 1 and tangent to the circles that are images of vertical lines. But this time instead of taking the portion of the circles inside the unit circle, we take the portion outside the unit circle.

And without further ado, we present the anti-Smith chart, the image of a grid in the left half-plane.
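Here is a minimal sketch of code that could draw such a chart. The particular grid lines and plot limits are arbitrary choices of mine.

import numpy as np
import matplotlib.pyplot as plt

f = lambda z: (z - 1)/(z + 1)

# images of vertical lines Re(z) = x < 0
t = np.linspace(-50, 50, 4000)
for x in [-0.2, -0.5, -1, -2, -5]:
    w = f(x + 1j*t)
    plt.plot(w.real, w.imag, color="C0")

# images of horizontal half-lines Im(z) = y, Re(z) <= 0
s = np.linspace(-50, 0, 4000)
for y in [-5, -2, -1, -0.5, 0.5, 1, 2, 5]:
    w = f(s + 1j*y)
    plt.plot(w.real, w.imag, color="C1")

plt.gca().set_aspect("equal")
plt.xlim(-4, 6)
plt.ylim(-5, 5)
plt.show()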