An AI Odyssey, Part 3: Lost Needle in the Haystack

While shopping on a major e-commerce site, I wanted to get an answer to an obscure question about a certain product.

Not finding the answer immediately on the product page, I thought I’d try clicking the AI shopping assistant helper tool to ask this question.

I waited with anticipation for an answer I would expect be more informative and useful than a standard search result. But it was not to be. The AI tool had nothing worthwhile.

Then I decided on an old-fashioned keyword search across all the product reviews. And, lo and behold, I immediately found several credible reviews addressing my question.

It is not good usability when multiple search mechanisms exist but only one of them is reliable. And it is surprising that a retrieval-based approach (e.g., RAG) could not at least match the effectiveness of a simple keyword search over reviews.

Models are capable, but effective integration can be lacking. Without improvements for cases like this, customers will not be satisfied users of these new AI tools.

Related posts

Computing sine and cosine of complex arguments with only real functions

Suppose you have a calculator or math library that only handles real arguments but you need to evaluate sin(3 + 4i). What do you do?

If you’re using Python, for example, and you don’t have NumPy installed, you can use the built-in math library, but it will not accept complex inputs.

>>> import math
>>> math.sin(3 + 4j)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: must be real number, not complex

You can use the following identities to calculate sine and cosine for complex arguments using only real functions.

\begin{align*} \sin(x + iy) &= \sin x \cosh y + i \cos x \sinh y \\ \cos(x + iy) &= \cos x \cosh y - i \sin x \sinh y \\ \end{align*}

The proof is very simple: just use the addition formulas for sine and cosine, and the following identities.

\begin{align*} \sin iz &= i \sinh z \\ \cos iz &= \cosh z \end{align*}

The following code implements sine and cosine for complex arguments using only the built-in Python functions that accept real arguments. It then tests these against the NumPy versions that accept complex arguments.

from math import *
import numpy as np

def complex_sin(z):
    x, y = z.real, z.imag
    return sin(x)*cosh(y) + 1j*cos(x)*sinh(y)

def complex_cos(z):
    x, y = z.real, z.imag
    return cos(x)*cosh(y) - 1j*sin(x)*sinh(y)

z = 3 + 4j
mysin = complex_sin(z)
mycos = complex_cos(z)
npsin = np.sin(z)
npcos = np.cos(z)
assert(abs(mysin - npsin) < 1e-14)
assert(abs(mycos - npcos) < 1e-14)

Related posts

Lebesgue constants

I alluded to Lebesgue constants in the previous post without giving them a name. There I said that the bound on order n interpolation error has the form

ch^{n+1} + \lambda \delta

where h is the spacing between interpolation points and δ is the error in the tabulated values. The constant c depends on the function f being interpolated, and to a lesser extent on n. The constant λ is independent of f but depends on n and on the relative spacing between the interpolation nodes. This post will look closer at λ.

Given a set of n + 1 nodes T

a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b

define

\ell_j(x) := \prod_{\begin{smallmatrix}i=0\\ j\neq i\end{smallmatrix}}^{n} \frac{x-x_i}{x_j-x_i}

Then the Lebesgue function is defined by

\lambda_n(x) = \sum_{j=0}^n |\ell_j(x)|

and the Lebesgue constant for the grid is the maximum value of the Lebesgue function

\Lambda_n(T)=\max_{x\in[a,b]} \lambda_n(x)

The values of Λ are difficult to compute, but there are nice asymptotic expressions for Λ when the grid is evenly spaced:

\Lambda_n \sim \frac{2^{n+1}}{n \log n}

When the grid points are at the roots of a Chebyshev polynomial then

\Lambda_n \approx \frac{2}{\pi} \log(n + 1) + 1

The previous post mentioned the cases n = 11 and n = 29 for evenly spaced grids. The corresponding values of Λ are approximately 155 and 10995642. So 11th order interpolation is amplifying the rounding error in the tabulated points by a factor of 155, which might be acceptable. But 29th order interpolation is amplifying the rounding error by a factor of over 10 million.

The corresponding values of Λ for Chebyshev-spaced nodes are 2.58 and 3.17. Chebyshev spacing is clearly better for high-order interpolation, when you have that option.

How much precision can you squeeze out of a table?

Richard Feynman said that almost everything becomes interesting if you look into it deeply enough. Looking up numbers in a table is certainly not interesting, but it becomes more interesting when you dig into how well you can fill in the gaps.

If you want to know the value of a tabulated function between values of x given in the table, you have to use interpolation. Linear interpolation is often adequate, but you could get more accurate results using higher-order interpolation.

Suppose you have a function f(x) tabulated at x = 3.00, 3.01, 3.02, …, 3.99, 4.00 and you want to approximate the value of the function at π. You could approximate f(π) using the values of f(3.14) and f(3.15) with linear interpolation, but you could also take advantage of more points in the table. For example, you could use cubic interpolation to calculate f(π) using f(3.13), f(3.14), f(3.15), and f(3.16). Or you could use 29th degree interpolation with the values of f at 3.00, 3.01, 3.02, …, 3.29.

The Lagrange interpolation theorem lets you compute an upper bound on your interpolation error. However, the theorem assumes the values at each of the tabulated points are exact. And for ordinary use, you can assume the tabulated values are exact. The biggest source of error is typically the size of the gap between tabulated x values, not the precision of the tabulated values. Tables were designed so this is true [1].

The bound on order n interpolation error has the form

c hn + 1 + λ δ

where h is the spacing between interpolation points and δ is the error in the tabulated values. The value of c depends on the derivatives of the function you’re interpolating [2]. The value of λ is at least 1 since λδ is the “interpolation” error at the tabulated points.

The accuracy of an interpolated value cannot be better than δ in general, and so you pick the value of n that makes c hn + 1 less than δ. Any higher value of n is not helpful. And in fact higher values of n are harmful since λ grows exponentially as a function of n [3].

See the next post for mathematical details regarding the λs.

Examples

Let’s look at a specific example. Here’s a piece of a table for natural logarithms from A&S.

Here h = 10−3, and so linear interpolation would give you an error on the order of h² = 10−6. You’re never going to get error less than 10−15 since that’s the error in the tabulated values, so 4th order interpolation gives you about as much precision as you’re going to get. Carefully bounding the error would require using the values of c and λ above that are specific to this context. In fact, the interpolation error is on the order of 10−8 using 5th order interpolation, and that’s the best you can do.

I’ll briefly mention a couple more examples from A&S. The book includes a table of sine values, tabulated to 23 decimal places, in increments of h = 0.001 radians. A rough estimate would suggest 7th order interpolation is as high as you should go, and in fact the book indicates that 7th order interpolation will give you 9 figures of accuracy,

Another table from A&S gives values of the Bessel function J0 in with 15 digit values in increments of h = 0.1. It says that 11th order interpolation will give you four decimal places of precision. In this case, fairly high-order interpolation is useful and even necessary. A large number of decimal places are needed in the tabulated values relative to the output precision because the spacing between points is so wide.

Related posts

[1] I say were because of course people rarely look up function values in tables anymore. Tables and interpolation are still widely used, just not directly by people; computers do the lookup and interpolation on their behalf.

[2] For functions like sine, the value of c doesn’t grow with n, and in fact decreases slowly as n increases. But for other functions, c can grow with n, which can cause problems like Runge phenomena.

[2] The constant λ grows exponentially with n for evenly spaced interpolation points, and values in a table are evenly spaced. The constant λ grows only logarithmically for Chebyshev spacing, but this isn’t practical for a general purpose table.

From Mendeleev to Fourier

The previous post looked at an inequality discovered by Dmitri Mendeleev and generalized by Andrey Markov:

Theorem (Markov): If P(x) is a real polynomial of degree n, and |P(x)| ≤ 1 on [−1, 1] then |P′(x)| ≤ n² on [−1, 1].

If P(x) is a trigonometric polynomial then Bernstein proved that the bound decreases from n² to n.

Theorem (Bernstein): If P(x) is a trigonometric polynomial of degree n, and |P(z)| ≤ 1 on [−π, π] then |P′(x)| ≤ n on [−π, π].

Now a trigonometric polynomial is a truncated Fourier series

T(x) = a_0 + \sum_{n=1}^N a_n \cos nx + \sum_{n=1}^N b_n \sin nx

and so the max norm of the T′ is no more than n times the max norm of T.

This post and the previous one were motivated by Terence Tao’s latest post on Bernstein theory.

Related posts

Mendeleev’s inequality

Dmitri Mendeleev is best known for creating the first periodic table of chemical elements. He also discovered an interesting mathematical theorem. Empirical research led him to a question about interpolation, which in turn led him to a theorem about polynomials and their derivatives.

I ran across Mendeleev’s theorem via a paper by Boas [1]. The opening paragraph describes what Mendeleev was working on.

Some years after the chemist Mendeleev invented the periodic table of the elements he made a study of the specific gravity of a solution as a function of the percentage of the dissolved substance. This function is of some practical importance: for example, it is used in testing beer and wine for alcoholic content, and in testing the cooling system of an automobile for concentration of anti-freeze; but present-day physical chemists do not seem to find it as interesting as Mendeleev did.

Mendeleev fit his data by patching together quadratic polynomials, i.e. he used quadratic splines. A question about the slopes of these splines lead to the following.

Theorem (Mendeleev): Let P(x) be a quadratic polynomial on [−1, 1] such that |P(x)| ≤ 1. Then |P′(x)| ≤ 4.

Mendeleev showed his result to mathematician Andrey Markov who generalized it to the following.

Theorem (Markov): If P(x) is a real polynomial of degree n, and |P(x)| ≤ 1 on [−1, 1] then |P′(x)| ≤ n² on [−1, 1].

Both inequalities are sharp with equality if and only if P(x) = ±Tn(x), the nth Chebyshev polynomial. In the special case of Mendeleev’s inequality, equality holds for

T2(x) = 2x² − 1.

Andrey Markov’s brother Vladimir proved an extension of Andrey’s theorem to higher derivatives,

Related posts

[1] R. P. Boas, Jr. Inequalities for the Derivatives of Polynomials. Mathematics Magazine, Vol. 42, No. 4 (Sep., 1969), pp. 165–174

Set intersection and difference at the command line

A few years ago I wrote about comm, a utility that lets you do set theory at the command line. It’s a really useful little program, but it has two drawbacks: the syntax is hard to remember, and the input files must be sorted.

If A and B are two sorted lists,

    comm A B

prints A − B, B − A, and A ∩ B. You usually don’t want all three, and so comm lets you filter the output. It’s a little quirky in that you specify what you don’t want instead of what you do. And you have to remember that 1, 2, and 3 correspond to A − B, B − A, and A ∩ B respectively.

Venn diagram of comm parameters

A couple little scripts can hide the quirks. I have a script intersect

    comm -12 <(sort "$1") <(sort "$2")

and another script setminus

    comm -23 <(sort "$1") <(sort "$2")

that sort the input files on the fly and eliminate the need to remember comm‘s filtering syntax.

The setminus script computes A − B. To find B − A call the script with the arguments reversed.

Embedded regex flags

The hardest part of using regular expressions is not crafting regular expressions per se. In my opinion, the two hardest parts are minor syntax variations between implementations, and all the environmental stuff outside of regular expressions per se.

Embedded regular expression modifiers address one of the environmental complications by putting the modifier in the regular expression itself.

For example, if you want to make a grep search case-insensitive, you pass it the -i flag. But if you want to make a regex case-insensitive inside a Python program, you pass a function the argument re.IGNORECASE. But if you put (?i) at the beginning of your regular expression, then the intention to make the match case-insensitive is embedded directly into the regex. You could use the regex in any environment that supports (?i) without having to know how to specify modifiers in that environment.

I was debugging a Python script this morning that worked under one version of Python and not under another. The root of the problem was that it was using re.findall() with several huge regular expression that had embedded modifiers. That was OK up to Python 3.5, then it was a warning between versions 3.6 and 3.10, and it’s an error in versions 3.11 and later.

The problem isn’t with all embedded modifiers, only global modifiers that don’t appear at the beginning of the regex. Older versions of Python, following Perl’s lead, would let you put a modifier like (?i) in the middle of a regex, and apply the modifier from that point to the end of the expression. In the latest versions of Python, you can either place the modifier at the beginning of the regex, or use a scoped modifier like (?:…) in the middle of the expression.

I didn’t want to edit the regular expressions in my code—some had over a thousand characters—so I changed re.findall() to regex.findall(). The third-party regex module is generally more Perl-compatible than Python’s standard re module.

A lesser-known characterization of the gamma function

The gamma function Γ(z) extends the factorial function from integers to complex numbers. (Technically, Γ(z + 1) extends factorial.) There are other ways to extend the factorial function, so what makes the gamma function the right choice?

The most common answer is the Bohr-Mollerup theorem. This theorem says that if f: (0, ∞) → (0, ∞) satisfies

  1. f(x + 1) = x f(x)
  2. f(1) = 1
  3. log f is convex

then f(x) = Γ(x). The theorem applies on the positive real axis, and there is a unique holomorphic continuation of this function to the complex plane.

But the Bohr-Mollerup theorem is not the only theorem characterizing the gamma function. Another theorem was by Helmut Wielandt. His theorem says that if f is holomorphic in the right half-plane and

  1. f(z + 1) = z f(z)
  2. f(1) = 1
  3. f(z) is bounded for {z: 1 ≤ Re z ≤ 2}

then f(x) = Γ(x). In short, Weilandt replaces the log-convexity for positive reals with the requirement that f is bounded in a strip of the complex plane.

You might wonder what is the bound alluded to in Wielandt’s theorem. You can show from the integral definition of Γ(z) that

|Γ(z)| ≤ |Γ(Re z)|

for z in the right half-plane. So the bound on the complex strip {z: 1 ≤ Re z ≤ 2} equals the bound on the real interval [1, 2], which is 1.

Tighter bounds on alternating series remainder

The alternating series test is part of the standard calculus curriculum. It says that if you truncate an alternating series, the remainder is bounded by the first term that was left out. This fact goes by in a blur for most students, but it becomes useful later if you need to do numerical computing.

To be more precise, assume we have a series of the form

  \sum_{i=1}^\infty (-1)^i a_i

where the ai are positive and monotonically converge to zero. Then the tail of the series is bounded by its first term:

\left|R_n\right| = \left| \sum_{i=n+1}^\infty (-1)^i a_i \right| \leq a_{n+1}

The more we can say about the behavior of the ai the more we can say about the remainder. So far we’ve assumed that these terms go monotonically to zero. If their differences

\Delta a_i = a_i - a_{i+1}

also go monotonically to zero, then we have an upper and lower bound on the truncation error:

\frac{a_{n+1}}{2} \leq |R_n| \leq \frac{a_n}{2}

If the differences of the differences,

\Delta^2 a_i = \Delta (\Delta a_i)

also converge monotonically to zero, we can get a larger lower bound and a smaller upper bound on the remainder. In general, if the differences up to order k of the ai go to zero monotonically, then the remainder term can be bounded as follows.

\frac{a_{n+1}}{2}
+\frac{\Delta a_{n+1}}{2^2}
+\cdots+
\frac{\Delta^k a_{n+1}}{2^{k+1}}
< \left|R_n\right| <
\frac{a_n}{2}
-\left\{
\frac{\Delta a_n}{2^2}
+\cdots+
\frac{\Delta^k a_n}{2^{k+1}}
\right\}.

Source: Mark B. Villarino. The Error in an Alternating Series. American Mathematical Monthly, April 2018, pp. 360–364.

Related posts