Means of means bounding the logarithmic mean

The geometric, logarithmic, and arithmetic means of a and b are defined as follows.

\begin{align*} G &= \sqrt{ab} \\ L &= \frac{b - a}{\log b - \log a} \\ A &= \frac{a + b}{2} \end{align*}

A few days ago I mentioned that G ≤ L ≤ A: the logarithmic mean slips between the geometric and arithmetic means.

Or to put it another way, the logarithmic mean is bounded by the geometric and arithmetic means. You can bound the logarithmic mean more tightly with a mixture of the geometric and arithmetic means.

In [1] the authors show that

G^{2/3} A^{1/3} \leq L \leq \tfrac{2}{3}G + \tfrac{1}{3}A

Note that the leftmost expression is the geometric mean of G, G, and A, and the rightmost expression is the arithmetic mean of G, G, and A. We can write this as

G(G, G, A) \leq L \leq A(G, G, A)

where G with no argument is the geometric mean of a and b and G with three arguments is the geometric mean of those arguments, and similarly for A.
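
Here's a quick numerical check of the double inequality (my own Python sketch, not from [1]; the values a = 1 and b = 10 are arbitrary):

```python
from math import sqrt, log

def geometric(a, b):   return sqrt(a * b)
def logarithmic(a, b): return (b - a) / (log(b) - log(a))
def arithmetic(a, b):  return (a + b) / 2

a, b = 1, 10
G_, L_, A_ = geometric(a, b), logarithmic(a, b), arithmetic(a, b)

lower = (G_ * G_ * A_) ** (1/3)   # geometric mean of G, G, A
upper = (2*G_ + A_) / 3           # arithmetic mean of G, G, A

assert lower <= L_ <= upper
```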

The following plot shows how well these means of means bound the logarithmic mean. We let a = 1 and let b vary from 1 to 100.

The upper bound is especially tight for moderate values of b. When I first made the plot I let b run up to 10, and there appeared to be only four curves in the plot. I had to pick a larger value of b before the curves for L and (2G + A)/3 could be distinguished.

[1] Graham Jameson and Peter R. Mercer. The Logarithmic Mean Revisited. The American Mathematical Monthly, Vol. 126, No. 7 (2019), pp. 641–645.

Ky Fan’s inequality


Consider a vector of real numbers

x = (x_1, x_2, x_3, \ldots, x_n)

with each component satisfying 0 < x_i ≤ 1/2. Define the complement x′ by taking the complement of each entry.

x' = (1 - x_1, 1 - x_2, 1 - x_3, \ldots, 1 - x_n)

Let G and A represent the geometric and arithmetic mean respectively.

Then Ky Fan’s inequality says

\frac{G(x)}{G(x')} \leq \frac{A(x)}{A(x')}

Now let H be the harmonic mean. Since in general H ≤ G ≤ A, you might guess that Ky Fan’s inequality could be extended to

\frac{H(x)}{H(x')} \leq \frac{G(x)}{G(x')} \leq \frac{A(x)}{A(x')}

and indeed this is correct.
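
Here's a quick numerical check (my own sketch; the sample vector x is arbitrary, with entries in (0, 1/2]):

```python
from math import prod

def A(v): return sum(v) / len(v)
def G(v): return prod(v) ** (1 / len(v))
def H(v): return len(v) / sum(1 / t for t in v)

x  = [0.1, 0.25, 0.4, 0.5]   # entries in (0, 1/2]
xc = [1 - t for t in x]      # the complement x'

assert H(x) / H(xc) <= G(x) / G(xc) <= A(x) / A(xc)
```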

Source: József Sándor. Theory of Means and Their Inequalities.

Integral representations of means

The average of two numbers, a and b, can be written as the average of x over the interval [a, b]. This is easily verified as follows.

\frac{1}{b-a}\int_a^b x\, dx = \frac{1}{b-a} \left( \frac{b^2}{2} - \frac{a^2}{2}\right) = \frac{a+b}{2}

The average is the arithmetic mean. We can represent other means as above if we generalize the pattern to be

\varphi^{-1}\left(\,\text{average of } \varphi(x) \text{ over } [a, b] \,\right )

For the arithmetic mean, φ(x) = x.

Logarithmic mean

If we set φ(x) = 1/x we have

\left(\frac{1}{b-a} \int_a^b x^{-1}\, dx \right)^{-1} = \left(\frac{\log b - \log a}{b - a} \right)^{-1} = \frac{b - a}{\log b - \log a}

and the last expression is known as the logarithmic mean of a and b.

Geometric mean

If we set φ(x) = 1/x² we have

\left(\frac{1}{b-a} \int_a^b x^{-2}\, dx \right)^{-1/2} = \left(\frac{1}{b-a}\left(\frac{1}{a} - \frac{1}{b} \right )\right)^{-1/2} = \sqrt{ab}

which gives the geometric mean of a and b.
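
Here's a quick numerical check of the pattern for the three means so far. The `quasi_mean` helper and the midpoint-rule integration are my own illustration:

```python
from math import log, sqrt

def quasi_mean(phi, phi_inv, a, b, n=20_000):
    # phi_inv( average of phi(x) over [a, b] ), via a midpoint rule
    h = (b - a) / n
    avg = sum(phi(a + (i + 0.5) * h) for i in range(n)) * h / (b - a)
    return phi_inv(avg)

a, b = 2.0, 5.0
arith = quasi_mean(lambda x: x,     lambda y: y,       a, b)  # (a + b)/2
logm  = quasi_mean(lambda x: 1/x,   lambda y: 1/y,     a, b)  # logarithmic mean
geom  = quasi_mean(lambda x: x**-2, lambda y: y**-0.5, a, b)  # sqrt(ab)
```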

Identric mean

In light of the means above, it’s reasonable to ask what happens if we set φ(x) = log x. When we do, we get a more arcane mean, known as the identric mean.

The integral representation of the identric mean seems natural, but when we compute the integral we get something that looks arbitrary.

\begin{align*} \exp\left( \frac{1}{b-a} \int_a^b \log x\, dx \right) &= \exp\left( \left.\frac{1}{b-a} (x \log x - x)\right|_a^b \right) \\ &= \exp\left( \frac{b \log b - a \log a - b + a}{b-a} \right) \\ &= \frac{1}{e} \left( \frac{b^b}{a^a} \right)^{1/(b-a)} \end{align*}

The initial expression looks like something that might come up in application. The final expression looks artificial.

Because the latter is more compact, you’re likely to see the identric mean defined by this expression, then later you might see the integral representation. This is unfortunate since the integral representation makes more sense.

Order of means

It is well known that the geometric mean is no greater than the arithmetic mean. The logarithmic and identric means squeeze in between the geometric and arithmetic means.

If we denote the geometric, logarithmic, identric, and arithmetic means of a and b by G, L, I, and A respectively,

G \leq L \leq I \leq A
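
Here's a quick numerical check of the chain of inequalities, using the closed forms above with arbitrary values a = 2 and b = 5:

```python
from math import sqrt, log, e

a, b = 2.0, 5.0
G = sqrt(a * b)                               # geometric
L = (b - a) / (log(b) - log(a))               # logarithmic
I = (1 / e) * (b**b / a**a) ** (1 / (b - a))  # identric
A = (a + b) / 2                               # arithmetic

assert G <= L <= I <= A
```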

Sierpiński’s inequality

Let An, Gn and Hn be the arithmetic mean, geometric mean, and harmonic mean of a set of n numbers.

When n = 2, the arithmetic mean times the harmonic mean is the geometric mean squared. The proof is simple:

A_2(x, y) H_2(x, y) = \left(\frac{x + y}{2}\right)\left(\frac{2}{\frac{1}{x} + \frac{1}{y}} \right ) = xy = G_2(x,y)^2

When n > 2 we no longer have equality. However, W. Sierpiński, perhaps best known for the Sierpiński triangle, proved that an inequality holds for all n [1]. Given

x = (x_1, x_2, \ldots, x_n)

we have the inequality

H_n(x)^{n-1}\, A_n(x) \leq G_n(x)^n \leq A_n(x)^{n-1}\, H_n(x)
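
Here's a quick numerical check (my own sketch, with an arbitrary random sample of positive numbers):

```python
from math import prod
import random

def A(v): return sum(v) / len(v)
def G(v): return prod(v) ** (1 / len(v))
def H(v): return len(v) / sum(1 / t for t in v)

random.seed(0)
x = [random.uniform(0.1, 10) for _ in range(7)]
n = len(x)

assert H(x)**(n - 1) * A(x) <= G(x)**n <= A(x)**(n - 1) * H(x)
```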

[1] W. Sierpiński. Sur une inégalité pour la moyenne arithmétique, géométrique et harmonique. Warsch. Sitzungsber. 2 (1909), pp. 354–357.

sin(cos(x)) versus cos(sin(x))

A few days ago I wrote about sufficient conditions for f(g(x)) to bound g(f(x)). This evening I stumbled on an analogous theorem.

For real numbers γ and δ,

cos(γ sin(x)) > sin(δ cos(x))

for all real x provided

γ² + δ² < (π/2)².

Source: American Mathematical Monthly. February 2009. Solution to problem 11309, page 184.

The reference gives two proofs of the theorem above.

Here’s a quick and dirty Python script that suggests the theorem and its converse are both true.

from numpy import *
import matplotlib.pyplot as plt

N = 200
xs = linspace(0, pi, N)
ds = linspace(-0.5*pi, 0.5*pi, N)
gs = linspace(-0.5*pi, 0.5*pi, N)

def f(x, d, g): return cos(g*sin(x)) - sin(d*cos(x))

for d in ds:
    for g in gs:
        if all(f(xs, d, g) > 0):
            plt.plot(d, g, 'bo')
            if d**2 + g**2 > (pi/2)**2:
                print(d, g)

This produces a big blue disk of radius π/2, confirming that the condition

γ² + δ² < (π/2)²

is sufficient. Furthermore, it prints nothing, which suggests the condition is also necessary.

f(g(x)) versus g(f(x))

I stumbled upon a theorem today that I feel like I’ve needed in the past, though I can’t remember any particular applications. I’m writing it up here as a note to my future self should the need reappear.

The theorem gives sufficient conditions to conclude

f(g(x)) ≤ g(f(x))

and uses this to prove, for example, that

arcsin( sinh(x/2) ) ≤ sinh( arcsin(x)/2 )

on the interval [0, 1].

If you think of any applications, please leave a comment.

Here’s the theorem, found in [1].

Let f be continuous with domain 0 ≤ x < 1 or 0 ≤ x ≤ 1, f(0) = 0, f(1) > 1 (including the possibility that f(1) = +∞); let g be continuous with domain the range of f, and g(1) ≤ 1. Let f(x)/x and g(x)/x be strictly increasing on their domains. Finally let f(x) ≠ x for 0 < x < 1. Then f(g(x)) ≤ g(f(x)) for 0 < x < 1.
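
Here's a quick numerical check (mine, not from [1]) of the arcsin/sinh example above on a grid over [0, 1]:

```python
from math import asin, sinh

# f = arcsin, g(x) = sinh(x/2); check f(g(x)) <= g(f(x)) on [0, 1].
# Note sinh(x/2) <= sinh(1/2) < 1 there, so arcsin is defined.
for i in range(101):
    x = i / 100
    assert asin(sinh(x / 2)) <= sinh(asin(x) / 2) + 1e-12
```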

[1] Ralph P. Boas. Inequalities for a Collection. Mathematics Magazine, January 1979, Vol. 52, No. 1, pp. 28–31

Maclaurin’s inequality

This afternoon I wrote a brief post about Terence Tao’s new paper A Maclaurin type inequality. That paper builds on two classical inequalities: Newton’s inequality and Maclaurin’s inequality. The previous post expanded a bit on Newton’s inequality. This post will do the same for Maclaurin’s inequality.

As before, let x be a list of real numbers and define Sn(x) to be the average over all products of n elements from x. Maclaurin’s inequality says that Sn(x)^(1/n) is a decreasing function of n.

S_1(x) \geq S_2(x)^{1/2} \geq S_3(x)^{1/3} \geq \cdots

We can illustrate this using the Python code used in the previous post with a couple minor changes. We change the definition of ys to

   ys = [S(xs, n)**(1/n) for n in ns]

and change the label on the vertical axis accordingly.

Looks like a decreasing function to me.
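
For a self-contained check, here's a sketch (mine) that assembles the pieces without the plotting:

```python
from itertools import combinations
from math import comb, prod
import random

def S(x, n):
    # average of all products of n elements from x
    return sum(prod(c) for c in combinations(x, n)) / comb(len(x), n)

random.seed(1)
x = [random.uniform(0.1, 10) for _ in range(8)]
means = [S(x, n) ** (1 / n) for n in range(1, len(x) + 1)]

# Maclaurin: the sequence of means is decreasing
assert all(m1 >= m2 for m1, m2 in zip(means, means[1:]))
```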

Newton’s inequality and log concave sequences

The previous post mentioned Newton’s inequality. This post will explore this inequality.

Let x be a list of real numbers and define Sn(x) to be the average over all products of n elements from x. Newton’s inequality says that

S_{n-1}\, S_{n+1} \leq S_n^2

In terminology more recent than Newton, we say that the sequence Sn is log-concave.

The name comes from the fact that if the elements of x are positive, and hence the S's are positive, we can take the logarithm of both sides of the inequality and have

\frac{\log S_{n-1} + \log S_{n+1}}{2} \leq \log S_n

which is the discrete analog of saying log Sk is concave as a function of k.

Let’s illustrate this with some Python code.

from itertools import combinations as C
from math import comb, log, prod
from numpy import random
import matplotlib.pyplot as plt

def S(x, n):
    # average of all products of n elements from x
    return sum(prod(c) for c in C(x, n)) / comb(len(x), n)

N = 10
xs = random.random(N)
ns = range(1, N+1)
ys = [log(S(xs, n)) for n in ns]

plt.plot(ns, ys, 'o')
plt.ylabel(r"$\log S_n$")
plt.show()

This produces the following plot.

This plot looks nearly linear. It’s plausible that it’s concave.

Cosine similarity does not satisfy the triangle inequality

The previous post looked at cosine similarity for embeddings of words in vector spaces. Word embeddings like word2vec map words into high-dimensional vector spaces in such a way that related words correspond to vectors that are roughly parallel. Ideally the more similar the words, the smaller the angle between their corresponding vectors.

The cosine similarity of vectors x and y is given by

\cos(\theta) = \frac{\mathbf{x} \cdot \mathbf{y}}{ ||\mathbf{x} || \,||\mathbf{y} || }

If the vectors are normalized to have length 1 (which word embeddings are typically not) then cosine similarity is just dot product. Vectors are similar when their cosine similarity is near 1 and dissimilar when their cosine similarity is near 0.

If x is similar to y and y is similar to z, is x similar to z? We could quantify this as follows.

Let cossim(x, y) be the cosine similarity between x and y.

If cossim(x, y) = 1 − ε1 and cossim(y, z) = 1 − ε2, is cossim(x, z) at least 1 − ε1 − ε2? In other words, does the complement of cosine similarity satisfy the triangle inequality?


The answer is no. I wrote a script to search for a counterexample by generating random points on a sphere. Here’s one of the examples it came up with.

    x = [−0.9289  −0.0013   0.3704]
    y = [−0.8257   0.3963   0.4015]
    z = [−0.6977   0.7163  −0.0091]

Let d1 = 1 − cossim(x, y), d2 = 1 − cossim(y, z), and d3 be 1 − cossim(x, z).

Then d1 = 0.0849, d2 = 0.1437, and d3 = 0.3562 and so d3 > d1 + d2.
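
Here's a short script (my own) verifying the counterexample above:

```python
import numpy as np

def cossim(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

x = np.array([-0.9289, -0.0013,  0.3704])
y = np.array([-0.8257,  0.3963,  0.4015])
z = np.array([-0.6977,  0.7163, -0.0091])

d1 = 1 - cossim(x, y)
d2 = 1 - cossim(y, z)
d3 = 1 - cossim(x, z)

assert d3 > d1 + d2   # the triangle inequality fails
```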

Triangle inequality

The triangle inequality does hold if we work with θs rather than their cosines. The angle θ between two vectors is the distance between these two vectors interpreted as points on a sphere and the triangle inequality does hold on the sphere.
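
Using the same three vectors as the counterexample above, a quick check (mine) confirms the angles do satisfy the triangle inequality:

```python
from math import acos, sqrt

def angle(u, v):
    # angle between u and v = distance between them as points on a sphere
    dot  = sum(a * b for a, b in zip(u, v))
    norm = lambda w: sqrt(sum(t * t for t in w))
    return acos(dot / (norm(u) * norm(v)))

x = (-0.9289, -0.0013,  0.3704)
y = (-0.8257,  0.3963,  0.4015)
z = (-0.6977,  0.7163, -0.0091)

assert angle(x, z) <= angle(x, y) + angle(y, z)
```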

Approximate triangle inequality

If the cosine similarity between x and y is close to 1, and the cosine similarity between y and z is close to 1, then the cosine similarity between x and z is close to one, though the triangle inequality may not hold. I wrote about this before in the post Nearly parallel is nearly transitive.

I wrote in that post that the law of cosines for spherical trigonometry says

cos c = cos a cos b + sin a sin b cos γ

where γ is the angle between the arcs a and b. If cos a = 1 − ε1, and cos b = 1 − ε2, then

cos a cos b = (1 − ε1)(1 − ε2) = 1 − ε1 − ε2 + ε1 ε2

If ε1 and ε2 are small, then their product is an order of magnitude smaller. Also, the term

sin a sin b cos γ

is of the same order of magnitude. So if 1 − cossim(x, y) =  ε1 and 1 − cossim(y, z) =  ε2 then

1 − cossim(x, z) = ε1 + ε2 + O(ε1 ε2)

Is the triangle inequality desirable?

Cosine similarity does not satisfy the triangle inequality, but do we want a measure of similarity between words to satisfy the triangle inequality? You could argue that semantic similarity isn’t transitive. For example, lion is similar to king in that a lion is a symbol for a king. And lion is similar to house cat in that both are cats. But king and house cat are not similar. So the fact that cosine similarity does not satisfy the triangle inequality might be a feature rather than a bug.

Lehman’s inequality, circuits, and LaTeX

Let A, B, C, and D be positive numbers. Then Lehman’s inequality says

\frac{(A+B)(C+D)}{A+B+C+D} \geq \frac{AC}{A+C} + \frac{BD}{B+D}

Proof by circuit

This inequality can be proved analytically, but Lehman’s proof is interesting because he uses electrical circuits [1].

Let A, B, C, and D be the resistances of resistors arranged as in the circuit on the left.

Resistors R1 and R2 in series have a combined resistance of

R = R_1 + R_2

and the same resistors in parallel have a combined resistance of

R = \frac{R_1 R_2}{R_1 + R_2}

This means the circuit on the left has combined resistance

\frac{(A+B)(C+D)}{A+B+C+D}

The resistance of the circuit on the right is

\frac{AC}{A+C} + \frac{BD}{B+D}

Adding a short cannot increase resistance, so the resistance of the circuit on the right must be the same or lower than the resistance of the one on the left. Therefore

\frac{(A+B)(C+D)}{A+B+C+D} \geq \frac{AC}{A+C} + \frac{BD}{B+D}
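
The inequality is easy to spot-check numerically; here's a quick random search (my own sketch) that finds no violation:

```python
import random

random.seed(0)
for _ in range(10_000):
    A, B, C, D = (random.uniform(0.01, 100) for _ in range(4))
    left  = (A + B) * (C + D) / (A + B + C + D)  # left circuit, no short
    right = A * C / (A + C) + B * D / (B + D)    # right circuit, with short
    assert left >= right - 1e-12
```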

Drawing circuits in LaTeX

I drew the circuits above using the circuitikz package in LaTeX. Here’s the code to draw the circuit on the left.

          \draw (0,0)
          to[R=$B$] (0,2) 
          to[R=$A$] (0,4)
          to[short] (2,4) 
          to[R=$C$] (2,2)
          to[R=$D$] (2,0)
          to[short] (0,0);
          \draw (1,4)
          to[short] (1, 5);
          \draw (1, 0)
          to[short] (1, -1);
          %\draw (0, 2)
          %to[short] (2, 2);

The code to draw the second circuit removes the %’s commenting out the code that draws the short between the two parallel arms of the circuit.

[1] Alfred Lehman proves a more general result using circuits in SIAM Review, Vol. 4, No. 2 (April 1962) pp. 150–151. Fazlollah Reza gives a purely mathematical proof on pages 151 and 152 of the same journal.