Peaks of Sinc

Yesterday I wrote two posts about finding the peaks of the sinc function. Both focused on numerical methods, the first using a contraction mapping and the second using Newton’s method. This post will focus on the locations of the peaks rather than ways of computing them.

The first post included this discussion of the peak locations.

The peaks of sin(x)/x are approximately at the same positions as sin(x), and so we use (2n + 1/2)π as our initial guess. In fact, all our peaks will be a little to the left of the corresponding peak in the sine function because dividing by x pulls the peak to the left. The larger x is, the less it pulls the root over.

This post will refine the observation above. The paragraph above suggests that for large n, the nth peak is located at approximately (2n + 1/2)π. This is a zeroth order asymptotic approximation. Here we will give a first order asymptotic approximation.

For a fixed positive n, let

θ = (2n + 1/2)π

and let

x = θ + ε

be the location of the nth peak. We will improve our approximation of the location of x by estimating ε.

As described in the first post in this series, setting the derivative of the sinc function to zero says x satisfies

x cos x – sin x = 0.

Therefore

(θ + ε) cos(θ + ε) = sin(θ + ε)

Applying the sum-angle identities for sine and cosine shows

(θ + ε) (cos θ cos ε – sin θ sin ε) = sin θ cos ε + cos θ sin ε

Now sin θ = 1 and cos θ = 0, so

-(θ + ε) sin ε = cos ε.

or

tan ε = -1/(θ + ε).

So far our calculations are exact. You could, for example, solve the equation above for ε using a numerical method. But now we’re going to make a couple very simple approximations [1]. On the left, we will approximate tan ε with ε. On the right, we will approximate ε with 0. This gives us

ε ≈ -1/θ = -1/((2n + 1/2)π).

This says the nth peak is located at approximately

θ – 1/θ

where θ = (2n + 1/2)π. This refines the earlier statement “our peaks will be a little to the left of the corresponding peak in the sine function.” As n gets larger, the term we subtract off gets smaller. This makes the statement above “The larger x is, the less it pulls the root over” more precise.

Now let’s see visually how well our approximation works. The graph below plots the error in approximating the nth peak location by θ and by θ – 1/θ.

Note the log scale. The error in approximating the location of the 10th peak by 20.5π is between 0.01 and 0.1, and the error in approximating the location by 20.5π – 1/(20.5π) is approximately 0.000001.
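
Here's a minimal Python sketch, not the code used to make the plot, that computes both errors. It finds the "true" peak locations with Newton's method applied to x cos x – sin x = 0.

    import numpy as np

    def peak(n, tol=1e-14):
        # Newton's method on f(x) = x cos x - sin x, starting from (2n + 1/2) pi
        x = (2*n + 0.5)*np.pi
        step = 1.0
        while abs(step) > tol:
            step = (x*np.cos(x) - np.sin(x)) / (-x*np.sin(x))
            x -= step
        return x

    for n in range(1, 11):
        theta = (2*n + 0.5)*np.pi
        x = peak(n)
        print(n, abs(x - theta), abs(x - (theta - 1/theta)))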

[1] As pointed out in the comments, you could get a slightly better approximation by not being so simple. Instead of approximating 1/(θ + ε) by 1/θ you could use 1/θ – ε/θ² and solve for ε.

Gell-Mann amnesia and its opposite

Michael Crichton coined the term Gell-Mann Amnesia effect to describe forgetting how unreliable a source is in one area when you trust it in another area. In Crichton’s words:

Briefly stated, the Gell-Mann Amnesia effect is as follows. You open the newspaper to an article on some subject you know well. In Murray [Gell-Mann]’s case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the “wet streets cause rain” stories. Paper’s full of them.

In any case, you read with exasperation or amusement the multiple errors in a story, and then turn the page to national or international affairs, and read as if the rest of the newspaper was somehow more accurate about Palestine than the baloney you just read. You turn the page, and forget what you know.

I think about the Gell-Mann Amnesia effect when I read news stories that totally botch science or statistics. Most of the time when I read a news story that touches on something I happen to know about, it’s at best misleading and at worst just plain wrong.

Yesterday I had the opposite experience. I was trying out a new podcast, not one focused on science or statistics, that was mostly correct when it touched on statistical matters that I’ve looked into. They didn’t bat 1000, but they did better than popular news sites. That increased my estimate of how likely the podcast is to be accurate about other matters.

By the way, why is the effect named after the Nobel Prize-winning physicist Murray Gell-Mann? Crichton explained:

I refer to it by this name because I once discussed it with Murray Gell-Mann, and by dropping a famous name I imply greater importance to myself, and to the effect, than it would otherwise have.

Rate of convergence for Newton’s method

In the previous post I looked at the problem of finding the peaks of the sinc function.

In this post we use this problem to illustrate how two formulations of the same problem can behave very differently with Newton’s method.

The previous post mentioned finding the peaks by solving either

x cos x – sin x = 0

or equivalently

tan x – x = 0

It turns out that the former is much better suited to Newton’s method. Newton’s method applied to the first equation will converge quickly without needing to start particularly close to the root. Newton’s method applied to the second equation will fail to converge at all unless the method begins close to the root, and even then the method may not be accurate.

Here’s why. The rate of convergence in solving

f(x) = 0

with Newton’s method is determined by the ratio of the second derivative to the first derivative

| f″(x) / f′(x) |

near the root.

Think of the second derivative as curvature. Dividing by the first derivative normalizes the scale. So convergence is fast when the curvature relative to the scale is small. Which makes sense intuitively: When a function is fairly straight, Newton’s method zooms down to the root. When a function is more curved, Newton’s method requires more steps.
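
Here's a quick experiment of my own, not from the original post, that you can run to see the difference. It applies a bare-bones Newton iteration to both formulations, starting from 7.5 as in the previous post; the first formulation homes in on 7.7252518… within a few steps, while the second wanders away from the root, as described above.

    import numpy as np

    def newton(f, fprime, x0, num_steps=8):
        # Return the sequence of Newton iterates starting from x0.
        xs = [x0]
        for _ in range(num_steps):
            x = xs[-1]
            xs.append(x - f(x)/fprime(x))
        return xs

    # Equation 1: x cos x - sin x = 0
    print(newton(lambda x: x*np.cos(x) - np.sin(x),
                 lambda x: -x*np.sin(x), 7.5))

    # Equation 2: tan x - x = 0, whose derivative is sec^2 x - 1 = tan^2 x
    print(newton(lambda x: np.tan(x) - x,
                 lambda x: np.tan(x)**2, 7.5))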

The following table gives the absolute value of the ratio of the second derivative to the first derivative at the first ten peaks, using both equations. The bound on the error in Newton’s method is proportional to this ratio.

    |------+------------+------------|
    | peak | Equation 1 | Equation 2 |
    |------+------------+------------|
    |    1 |      0.259 |       15.7 |
    |    2 |      0.142 |       28.3 |
    |    3 |      0.098 |       40.8 |
    |    4 |      0.075 |       53.4 |
    |    5 |      0.061 |       66.0 |
    |    6 |      0.051 |       78.5 |
    |    7 |      0.044 |       91.1 |
    |    8 |      0.039 |      103.7 |
    |    9 |      0.034 |      116.2 |
    |   10 |      0.031 |      128.8 |
    |------+------------+------------|

The error terms are all small for the first equation, and they get smaller as we look at peaks further from the origin. The error terms for the second equation are all large, and get larger as we look at peaks further out.
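
Incidentally, the ratios in the table have simple closed forms at the peaks. For the first equation, f′(x) = –x sin x and f″(x) = –sin x – x cos x, and using x cos x = sin x at a peak the ratio reduces to 2/x. For the second equation, g′(x) = tan² x and g″(x) = 2 tan x (1 + tan² x), and using tan x = x at a peak the ratio reduces to 2(1 + x²)/x. The short sketch below, mine rather than the post's, reproduces the table from these formulas, using the refined peak approximation θ – 1/θ from the first post above.

    import numpy as np

    for n in range(1, 11):
        theta = (2*n + 0.5)*np.pi
        x = theta - 1/theta          # approximate location of the nth peak
        ratio1 = 2/x                 # |f''/f'| for x cos x - sin x = 0
        ratio2 = 2*(1 + x*x)/x       # |g''/g'| for tan x - x = 0
        print(f"{n:4d} {ratio1:12.3f} {ratio2:12.1f}")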

For the third and final post in this series, see Peaks of Sinc.

Reverse iteration root-finding

The sinc function is defined by

sinc(x) = sin(x)/x.

This function comes up constantly in signal processing. Here’s a plot.

We would like to find the location of the function’s peaks. Let’s focus first on the first positive peak, the one that’s somewhere between 5 and 10. Once we can find that one, the rest will be easy.

If you take the derivative of sinc and set it to zero, you find that the peaks must satisfy

x cos x – sin x = 0

which means that

tan x = x.

So our task reduces to finding the fixed points of the tangent function. One way to find fixed points is simply to iterate the function. We pick a starting point near the peak we’re after, then take its tangent, then take the tangent of that, etc.

The peak appears to be located around 7.5, so we’ll use that as a starting point. Then iterates of tangent give

     2.7060138667726910
    -0.4653906051625444
    -0.5021806478408769
    -0.5491373258198057
    -0.6119188887713993
    -0.7017789436750164
    -0.8453339618848119
    -1.1276769374114777
    -2.1070512803092996
     1.6825094538261074

That didn’t work at all. That’s because the derivative of tangent, sec² x, is greater than 1 (about 1 + x² ≈ 60 at the fixed point we’re after), so tangent is not a contraction mapping.

The iterates took us away from the root we were after. This brings up an idea: is there some way to iterate a negative number of times? Well, sorta. We can run our iteration backward.

Instead of solving

tan x = x

we could equivalently solve

arctan x = x.

Since iterating tangent pushes points away, iterating arctan should bring them closer. In fact, the derivative of arctan is 1/(1 + x²), which is less than 1, so it is a contraction mapping, and we will get a fixed point.

Let’s start again with 7.5 and iterate arctan. This quickly converges to the peak we’re after.

    7.721430101677809
    7.725188823982156
    7.725250798474231
    7.725251819823800
    7.725251836655669
    7.725251836933059
    7.725251836937630
    7.725251836937706
    7.725251836937707
    7.725251836937707

There is a little complication: we have to iterate the right inverse tangent function. Since tangent is periodic, there are infinitely many values of x that have a particular tangent value. The arctan function in NumPy returns a value between -π/2 and π/2. So if we add 2π to this value, we get values in an interval including the peak we’re after.

Here’s how we can find all the peaks. The peak at 0 is obvious, and by symmetry we only need to find the positive peaks; the nth negative peak is just the negative of the nth positive peak.

The peaks of sin(x)/x are approximately at the same positions as sin(x), and so we use (2n + 1/2)π as our initial guess. In fact, all our peaks will be a little to the left of the corresponding peak in the sine function because dividing by x pulls the peak to the left. The larger x is, the less it pulls the root over.

The following Python code will find the nth positive peak of the sinc function.

    import numpy as np

    def iterate(n, num_iterates=6):
        # Start at the nth peak of sin(x) and iterate the branch of arctan
        # whose range contains the nth positive peak of sinc.
        x = (2*n + 0.5)*np.pi
        for _ in range(num_iterates):
            x = np.arctan(x) + 2*n*np.pi
        return x
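
For example, iterate(1) reproduces the fixed point found by iterating from 7.5 above:

    print(iterate(1))   # approximately 7.725251836937707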

My next post will revisit the problem in this post using Newton’s method.

Low-tech transparency

I recently received two large data files from a client, with names like foo.xlsx and foo.csv. Presumably these are redundant; the latter is probably an export of the former. I did a spot check and that seems to be the case.

Then I had a bright idea: use pandas to make sure the two files are the same. It’s an elegant solution: import both files as data frames, then use the compare() function to verify that they’re the same.
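
In code, the idea was something along these lines. (This is only a sketch of the approach, with the file names from above; pandas.read_excel needs an Excel engine such as openpyxl installed.)

    import pandas as pd

    # Import both versions of the data.
    df_xlsx = pd.read_excel("foo.xlsx")
    df_csv = pd.read_csv("foo.csv")

    # compare() returns an empty DataFrame when the two frames agree;
    # it requires the frames to have identical row and column labels.
    diff = df_xlsx.compare(df_csv)
    print("files match" if diff.empty else diff)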

Except it didn’t work. I got a series of mysterious and/or misleading messages as I tried to track down the source of the problem, playing whack-a-mole with the data. There could be any number of reasons why compare() might not work on imported data: character encodings, inferred data types, etc.

So I used brute force. I exported the Excel file as CSV and compared the text files. This is low-tech, but transparent. It’s easier to compare text files than to plumb the depths of pandas.

One of the problems was that the data contained heights, such as 5'9". This causes problems with quoting, whether you enclose strings in single or double quotes. A couple quick sed one-liners resolved most of the mismatches. (Though not all. Of course it couldn’t be that simple …)

It’s easier to work with data in a high-level environment like pandas. But it’s also handy to be able to use low-level tools like diff and sed for troubleshooting.

I suppose someone could write a book on how to import CSV files. If all goes well, it’s one line of code. Then there are a handful of patterns that handle the majority of remaining cases. Then there’s the long tail of unique pathologies. As Tolstoy would say, happy data sets are all alike, but unhappy data sets are each unhappy in their own way.

Zeros of trigonometric polynomials

A periodic function has at least as many real zeros as its lowest frequency Fourier component. In more detail, the Sturm-Hurwitz theorem says that

f(x) = \sum_{k=n}^N a_k \sin kx + b_k \cos kx

has at least 2n zeros in the interval [0, 2π) if a_n and b_n are not both zero. You could take N to be infinity if you’d like.

Note that the lowest frequency term

a_n \sin nx + b_n \cos nx

can be written as

 c \sin(nx + \varphi)

for some amplitude c and phase φ as explained here. This function clearly has 2n zeros in each period. The remarkable thing about the Sturm-Hurwitz theorem is that adding higher frequency components can increase the number of zeros, but it cannot decrease the number of zeros.
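
Explicitly (a standard identity, not spelled out in the post), matching coefficients of sin nx and cos nx gives

c = \sqrt{a_n^2 + b_n^2}, \qquad \tan \varphi = b_n / a_n

with the quadrant of φ chosen so that cos φ has the sign of a_n.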

To illustrate this theorem, we’ll look at a couple random trigonometric polynomials with n = 5 and N = 9 and see how many zeros they have. Theory says they should have at least 10 zeros.

The first has 16 zeros:

And the second has 12 zeros:

(It’s difficult to see just how many zeros there are in the plots above, but if we zoom in by limiting the vertical axis we can see the zeros more easily. For example, we can see that the second plot does not have a zero between 4 and 5; it almost reaches up to the x-axis but doesn’t quite make it.)

Here’s the code that made these plots.

    import matplotlib.pyplot as plt
    import numpy as np
    
    n = 5
    N = 9
    np.random.seed(20210114)
    
    for p in range(2):
        a = np.random.random(size=N+1)
        b = np.random.random(size=N+1)
    
        x = np.linspace(0, 2*np.pi, 200)
        y = np.zeros_like(x)
        for k in range(n, N+1):
            y += a[k]*np.sin(k*x) + b[k]*np.cos(k*x)
        plt.plot(x, y)
        plt.grid()
        plt.savefig(f"sturm{p}.png")
        plt.ylim([-0.1, 0.1])
        plt.savefig(f"sturm{p}zoom.png")
        plt.close()
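
As a sanity check on the zero counts, here's a small sketch of my own that counts sign changes of the trigonometric polynomial over one period on a fine grid. It assumes the grid is fine enough to separate neighboring zeros and that no zero is a tangency; called with the same a and b arrays generated inside the loop above, it should report 16 and 12 zeros for the two polynomials.

    import numpy as np

    def count_zeros(a, b, n, N, num_points=100_000):
        # Sample the trigonometric polynomial over one period and count
        # sign changes, including the wrap-around from 2*pi back to 0.
        x = np.linspace(0, 2*np.pi, num_points, endpoint=False)
        y = np.zeros_like(x)
        for k in range(n, N+1):
            y += a[k]*np.sin(k*x) + b[k]*np.cos(k*x)
        return int(np.sum(y * np.roll(y, -1) < 0))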

Just-in-case revisited

Just-in-time learning means learning something just when you need it. The alternative is just-in-case, learning something in case you need it. I discussed this in an earlier post, and today I’d like to add a little to that discussion.

There are some things you need to know (or at least be familiar with) before you have a chance to use them. Here’s a variation on that idea: some things you need to have practiced before you need them in order to overcome an effort barrier.

Suppose you tell yourself that you’ll learn to use Photoshop or GIMP when you need to. Then you need to edit a photo. Faced with the prospect of learning either of these software packages, you might decide that the photo in question looks good enough after all.

There are things that in principle you could learn just-in-time, though in practice this is not psychologically feasible. The mental “activation energy” is too high. Some things you need to practice beforehand, not because you couldn’t look them up when needed, but because they would be too daunting to learn when needed.

Related post: Bicycle skills

Data pathology

This post is an expansion of something I wrote on Twitter:

Data scientists often complain that the bulk of their work is data cleaning.

But if you see data cleaning as the work, not just an obstacle to the work, it can be interesting.

You could think of it as data pathology, a kind of analysis before the intended analysis.

Anything you have to do before you can get down to what you thought you would be doing can be irritating. And if you bid a project not anticipating preliminary difficulties, it can be costly as well.

But if you anticipate problems with dirty data, these problems can be fun to solve. They may offer more variety than the “real work.” These problems bring up some interesting questions.

  • How are the data actually represented and why?
  • Can you automate the cleaning process, or at least automate some of it?
  • Is the corruption in the data informative, i.e. does it reveal something in addition to the data itself?

Courses in mathematical statistics will not prepare you to answer any of these questions. Beginning data scientists may find this sort of forensic work irritating because they aren’t prepared for it. Maybe it’s a job for regular expressions rather than regression.

Or maybe they can do it well but would rather get on to something else.

Or maybe this sort of data pathology work would be more interesting with a change of attitude, seeing it as important work that requires experience and provides opportunities for creativity.

More stability, less stress

It’s been eight years since I started my consulting business. Two of the things I love about having my own business are the stability and the reduced stress. This may sound like a joke, but I’m completely serious.

Having a business is ostensibly less stable and more stressful than having a salaried job, but at a deeper level it can be more stable and less stressful.

If you are an employee, you have one client. If you lose that client, you lose 100% of your income. If you have a business with a dozen clients, losing a client or two at the same time is disappointing, but it’s not devastating.

As for stress, I prefer the stress of owning a business to the stresses of employment. My net stress level dropped when I went out on my own. My sleep, for example, improved immediately.

At first I never knew where the next project was coming from. But I found this less stressful than office politics, questioning the value of my work, a lack of correlation between my efforts and my rewards, etc.

If you’re thinking of striking out on your own, I wish you well. Here is some advice I wrote a few years ago that you may find helpful.

Quaternion square roots

If y is a quaternion, how many solutions does x² = y have? That is, does every quaternion have a square root, and if so, how many square roots does it have?

A quaternion can have more than two square roots. There is an example right in the definition of quaternions:

i² = j² = k² = -1

and so -1 has at least three square roots. It follows that every negative real number has at least three square roots.

Negative reals

In fact, every square root of -1 corresponds to a square root of any other negative number. To see this, let p be a positive real number and let r = √p > 0. If x is a square root of -1, then rx is a square root of –p. Conversely, if x is a square root of –p, then x/r is a square root of -1.

So all negative real numbers have the same number of square roots in the quaternions, and that number is at least 3. In fact, it turns out that all negative real numbers have infinitely many quaternion roots. A calculation shows that

(a + bi + cj + dk)² = -1

if and only if a = 0 and b² + c² + d² = 1. So there are as many square roots of -1 as there are points on the unit sphere.
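
You can spot-check this with the quaternionic package used at the end of this post: square any pure quaternion whose coefficients lie on the unit sphere and you should get -1, up to rounding. Here's a quick sketch along those lines.

    import numpy as np
    import quaternionic

    v = np.random.random(3)
    v /= np.linalg.norm(v)              # a point (b, c, d) on the unit sphere
    q = quaternionic.array([0.0, *v])   # the pure quaternion bi + cj + dk
    print(q * q)                        # approximately [-1, 0, 0, 0]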

Not negative reals

If a quaternion is not a negative real number, then the story is much more familiar: the equation x² = y has one solution if y = 0 and two solutions otherwise.

To summarize, x² = y has

  • one solution if y = 0,
  • infinitely many solutions if y is negative and real, and
  • two solutions otherwise.

It’s easy to show that 0 has only one root: |x²| = |x|², and so if x² = 0, then |x| = 0 and so x = 0. Here the norm of a quaternion is defined by

|a + bi + cj + dk| = √(a² + b² + c² + d²).

If p > 0, the solutions to x² = –p are bi + cj + dk where b² + c² + d² = p.

What’s left is to show that if y is not zero and not a negative real, then x² = y has two solutions.

Polar form and calculating roots

Quaternions have a polar form analogous to the polar form for complex numbers. That is, we can write a quaternion y in the form

y = r(cos θ + u sin θ)

where r is real and positive, θ is real, and u is a unit quaternion with zero real part. We can find r, θ, and u as follows. Write y as

a + bi + cj + dk = a + q

so that a is the real part and q is the pure quaternion part. Then we can find r, θ, and u by

r = |y|
cos θ = a/|y|
u = q / |q|

Then the two roots are

x = ±√r (cos θ/2 + u sin θ/2).

Python code

First we’ll implement the algorithm above, then show that it produces the same results as the Python package quaternionic.

First, our code to compute x satisfying x² = y.

    import numpy as np

    np.random.seed(20210106)

    def norm(q):
        return sum(t*t for t in q)**0.5

    y = np.random.random(size=4)
    r = norm(y)
    theta = np.arccos(y[0]/r)

    u = y + 0 # make a copy
    u[0] = 0
    u /= norm(u)

    x = np.sin(theta/2)*u
    x[0] = np.cos(theta/2)
    x *= r**0.5

And now we check our results using quaternionic.

    import quaternionic

    qx = quaternionic.array(x)
    qy = quaternionic.array(y)

    print(f"y   = {y}")
    print(f"x*x = {qx*qx}")       # should equal y
    print(f"x   = {x}")           # our root
    print(f"x   = {np.sqrt(qy)}") # root via package

This produces

    y   = [0.61615367 0.07612092 0.09606777 0.11150865]
    x*x = [0.61615367 0.07612092 0.09606777 0.11150865]
    x   = [0.79189641 0.04806243 0.06065678 0.07040609]
    x   = [0.79189641 0.04806243 0.06065678 0.07040609]

For an exploration of the roots of general polynomial equations over the quaternions, see Ivan Niven, Equations in Quaternions, The American Mathematical Monthly, Dec. 1941, Vol. 48, pp. 654–661.