Accessible math posts

Several people have told me they can’t understand most of my math posts, but they subscribe because they enjoy the occasional math post that they do understand. If you’re in that boat, thanks for following, and I wanted to let you know there have been a few posts lately that are more accessible than you might think.

The posts I refer to don’t require any advanced math to follow, though they may require some advanced math to fully appreciate.

For example, my latest post on means and sums only requires pre-calculus math to understand, though it introduces ideas that usually aren’t introduced until they’re needed in more advanced courses.

A few days ago I started what turned into a series of four blog posts, all consequences of combining the binomial theorem and Euler’s theorem. Here are the four posts:

These posts do not require calculus, and should be mostly accessible with high school math.

Chebyshev polynomials come up several times in these posts. High school math will not prepare you to appreciate the significance of these particular polynomials, but you can follow along by ignoring the name and just thinking “OK, they’re polynomials. I know what a polynomial is.”

Sum and mean inequalities move in opposite directions

It would seem that sums and means are trivially related; the mean is just the sum divided by the number of items. But when you generalize things a bit, means and sums act differently.

Let x be a list of n non-negative numbers,

x = (x_1, x_2, \ldots, x_n)

and let r > 0 [*]. Then the r-mean is defined to be

M_r(x) = \left( \frac{1}{n} \sum_{i=1}^n x_i^r \right)^{1/r}

and the r-sum is defined to be

 S_r(x) = \left( \sum_{i=1}^n x_i^r \right)^{1/r}

These definitions come from the classic book Inequalities by Hardy, Littlewood, and Pólya, except the authors use the Fraktur forms of M and S. If r = 1 we have the elementary mean and sum.

Here’s the theorem alluded to in the title of this post:

As r increases, M_r(x) increases and S_r(x) decreases.

If x has at least two non-zero components then M_r(x) is a strictly increasing function of r and S_r(x) is a strictly decreasing function of r. Otherwise M_r(x) and S_r(x) are constant.
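A quick numerical check illustrates the theorem. This is a sketch, not from the original post; the function names are mine:

```python
import math

def r_mean(x, r):
    """The r-mean M_r(x) = ((1/n) * sum(x_i^r))^(1/r)."""
    n = len(x)
    return (sum(t**r for t in x) / n) ** (1 / r)

def r_sum(x, r):
    """The r-sum S_r(x) = (sum(x_i^r))^(1/r)."""
    return sum(t**r for t in x) ** (1 / r)

x = (1, 2, 3)
rs = [0.5, 1, 2, 4, 8, 16]
means = [r_mean(x, r) for r in rs]
sums  = [r_sum(x, r) for r in rs]

# M_r strictly increases with r and S_r strictly decreases,
# with both squeezing toward max(x) = 3.
assert all(a < b for a, b in zip(means, means[1:]))
assert all(a > b for a, b in zip(sums, sums[1:]))
```

At r = 1 the two functions reduce to the ordinary mean and sum, 2 and 6 here, and for large r both are close to 3.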

The theorem holds under more general definitions of M and S, such as letting the sums be infinite and inserting weights. And indeed much of Hardy, Littlewood, and Pólya is devoted to studying variations on M and S in fine detail.

Here are log-log plots of M_r(x) and S_r(x) for x = (1, 2).

Plot of M_r and S_r

Note that both curves asymptotically approach max(x), M from below and S from above.

Related posts

[*] Note that r is only required to be greater than 0; analysis books typically focus on r ≥ 1.

Pretending OOP never happened

I ran across someone recently who says the way to move past object oriented programming (OOP) is to go back to simply telling the computer what to do, to clear OOP from your mind like it never happened. I don’t think that’s a good idea, but I also don’t think it’s possible.

Object oriented programming, for all its later excesses, was a big step forward in software engineering. It made it possible to develop much larger programs than before, maybe 10x larger. Someone might counter by saying that programs had to be 10x larger because of all the overhead of OOP, but that’s not the case. OOP does add some overhead, and the amount of overhead grew over time as tools and frameworks became more complicated, but OOP made it possible to write programs that could not have been written before.

OOP provides a way for programmers to organize their code. It may not be the best way, depending on the problem, but the way to move past OOP is to replace it with another discipline. And I imagine most people who have learned and then rejected OOP do just that, whether they realize it or not. Maybe they retain some organizational patterns that they learned in the context of OOP.

That has been my experience. I hardly ever write classes anymore; I write functions. But I don’t write functions quite the way I did before I spent years writing classes.

And while I don’t often write classes, I do often use classes that come from libraries. Sometimes these objects seem like they’d be better off as bare functions, but I imagine the same libraries would be harder to use if no functions were wrapped in objects.

There are many alternatives to OOP for organizing code. I generally like functional programming, but in my experience there’s a hockey stick effort curve as you try to push the purity to 100%. James Hague said it well:

100% pure functional programming doesn’t work. Even 98% pure functional programming doesn’t work. But if the slider between functional purity and 1980s BASIC-style imperative messiness is kicked down a few notches — say to 85% — then it really does work. You get all the advantages of functional programming, but without the extreme mental effort and unmaintainability that increases as you get closer and closer to perfectly pure.

It’s possible, and a good idea, to develop large parts of a system in purely functional code. But someone has to write the messy parts that interact with the outside world.

More programming posts

Elementary approximation to “impossible” integral

The previous post looked at an integral that is “impossible” in the sense that it cannot be computed in closed form. It can be integrated in terms of special functions, and it can easily be computed numerically to as much accuracy as anyone would want.

In this post I’ll present a simple approximation that calculus students should find accessible.

The basic idea is to tweak a Taylor polynomial into something we can take the square root of. Specifically, we will replace the Taylor series

\sin \theta = \theta - \frac{\theta^3}{6} + \frac{\theta^5}{120} - \cdots

with

\sin \theta \approx \theta - \frac{\theta^3}{6} + \frac{\theta^5}{144} = \theta\left(1 - \frac{\theta^2}{12}\right)^2

If θ is small, θ⁵ is extremely small, so the approximation should work well for small x, and it doesn’t need to work for large x because the integrand is only real for x less than π.

Define

f(x) = \int_0^x \sqrt{\sin \theta} \, d\theta

Then

\begin{align*} f(x) &= \int_0^x \sqrt{\theta - \frac{\theta^3}{6} + \frac{\theta^5}{120} - \cdots } \,\, d\theta \\ &\approx \int_0^x \sqrt{\theta} \sqrt{ \left(1 - \frac{\theta^2}{12}\right)^2 } \, d\theta \\ &= \int_0^x \sqrt{\theta} \left(1 - \frac{\theta^2}{12}\right) \, d\theta \\ &= \frac{2}{3} x^{3/2} - \frac{1}{42} x^{7/2} \end{align*}

Here’s a plot showing how good the approximation is.

plot

So the approximation is good to about seven figures for x < 0.2, and even for x as large as 1.5 the approximation is still good to three figures.

Since sin θ is positive for 0 ≤ θ ≤ π, it’s natural to evaluate f(x) for 0 ≤ x ≤ π. Our approximation error would continue to increase if we directly used values of x greater than π/2. But because sine is symmetric about π/2, we can avoid this problem: If π/2 ≤ x ≤ π, then we can compute f(x) as 2f(π/2) – f(π – x).
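Here is a sketch in Python (the function names are mine) that compares the approximation against direct numerical integration and applies the reflection trick for π/2 ≤ x ≤ π:

```python
import math

def f_approx(x):
    """The post's approximation (2/3) x^(3/2) - (1/42) x^(7/2),
    intended for 0 <= x <= pi/2."""
    return (2/3) * x**1.5 - (1/42) * x**3.5

def f_numeric(x, n=2000):
    """Reference value: integrate sqrt(sin(theta)) from 0 to x.
    Substituting theta = t^2 removes the square-root behavior at 0,
    so Simpson's rule converges quickly."""
    b = math.sqrt(x)
    h = b / n
    def g(t):
        return 2 * t * math.sqrt(math.sin(t * t))
    s = g(0) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(i * h)
    return s * h / 3

def f_full(x):
    """Extend the approximation to [0, pi] using the symmetry
    of sine about pi/2: f(x) = 2 f(pi/2) - f(pi - x)."""
    if x <= math.pi / 2:
        return f_approx(x)
    return 2 * f_approx(math.pi / 2) - f_approx(math.pi - x)

err = abs(f_approx(0.2) - f_numeric(0.2))  # agrees to about seven figures
```

The substitution θ = t² in `f_numeric` matters: without it, the square-root shape of the integrand at 0 would slow Simpson’s rule down considerably.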

To integrate the impossible integral

Don Quixote

In the Broadway musical Man of La Mancha, Don Quixote sings

To dream the impossible dream
To fight the unbeatable foe
To bear with unbearable sorrow
To run where the brave dare not go

Yesterday my daughter asked me to integrate the impossible integral, and this post has a few thoughts on the quixotic quest to run where the brave calculus student dare not go.

The problem apparently required computing the indefinite integral of the square root of sine:

\int \sqrt{\sin x} \,dx

I say apparently for reasons that will soon be clear.

My first thought was that some sort of clever u-substitution would work. This was coming from a homework problem, so I was in the mindset of expecting every problem to be doable. If I ran across this integral while I had my consultant hat on rather than my father/tutor hat, I would have thought “It probably doesn’t have an elementary closed form because most integrals don’t.”

It turns out I should have had my professional hat on, because the integral indeed does not have an elementary closed form. There is a well-known function which is an antiderivative of √sin, but it’s not an elementary function. (“Well-known” relative to professional mathematicians, not relative to calculus students.)

You can evaluate the integral as

\int \sqrt{\sin x} \,dx = -2 E\left(\frac{\pi - 2x}{4}\, \middle|\, 2 \right) + C

where E is Legendre’s “elliptic integral of the second kind.”

I assume it wasn’t actually necessary to compute the antiderivative, though I didn’t see the full problem. In the artificial world of the calculus classroom, everything works out nicely. And this is a shame.

For one thing, it creates a false impression. Most integrals are “impossible” in the sense of not having an elementary closed form. Students who go on to use calculus for anything more than artificial homework problems may incorrectly assume they’ve done something wrong when they encounter an integral in the wild.

For another, it’s a missed opportunity. Or maybe two or three missed opportunities. These may or may not be appropriate to bring up, depending on how receptive the audience is. (But we said at the beginning we would “run where the brave calculus student dare not go.”)

It’s a missed opportunity to emphasize what integrals really are, to separate meaning from calculation technique. Is the integral of √sin impossible? Not in the sense that it doesn’t exist. Yes, there are functions whose derivative is √sin, at least if we limit ourselves to a range where sine is not negative. No, the integral does not exist in the sense of a finite combination of functions a calculus student would have seen.

It’s also a missed opportunity to show that we can define new functions as the solution to problems we can’t solve otherwise. The elliptic integrals mentioned above are functions defined in terms of integrals that cannot be computed in more elementary terms. By giving the function a name, we can compare where it comes up in other contexts. For example, I wrote a few months ago about the problem of finding the length of a serpentine wall. This also pops out of a calculus problem that doesn’t have an elementary solution, and in fact it also has a solution in terms of elliptic integrals.

Finally, it’s a missed opportunity to give a glimpse of the wider mathematical landscape. If not every elementary function has an elementary antiderivative, do they at least have antiderivatives in terms of special functions such as elliptic integrals? No, but that’s a good question.

Any continuous function has an antiderivative, and that antiderivative might be an elementary function, or it might be a combination of elementary functions and familiar special functions. Or it might be an obscure special function, something not exactly common, but something that has been named and studied before. Or it might be a perfectly valid function that hasn’t been given a name yet. Maybe it’s too specialized to deserve a name, or maybe you’ve discovered something that comes up repeatedly and deserves to be cataloged.

This touches more broadly on the subject of what functions exist versus what functions have been named. Students implicitly assume these two categories are the same. Here’s an example of the same phenomenon in probability. It also touches on the gray area between what has been named and what hasn’t, and how you decide whether something deserves to be named.

Update: The next post gives a simple approximation to the integral in this post.

Extracting independent random bits from dependent sources

Sometimes you have a poor quality source of randomness and you want to refine it into a better source. You might, for example, want to generate cryptographic keys from a hardware source that is more likely to produce 1’s than 0’s. Or maybe your source of bits is dependent, more likely to produce a 1 when it has just produced a 1.

Von Neumann’s method

If the bits are biased but independent, a simple algorithm by John von Neumann will produce unbiased bits. Assume each bit of input is a 1 with probability p and a 0 otherwise. As long as 0 < p < 1 and the bits are independent, it doesn’t matter what p is [1].

Von Neumann’s algorithm works on consecutive pairs of bits. If the pair is 00 or 11, don’t output anything. When you see 10 output a 1, and when you see 01 output a 0. This will output a 0 or a 1 with equal probability.
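The algorithm is only a few lines of code. A sketch (the function name is mine):

```python
def von_neumann(bits):
    """Debias a stream of independent bits.

    Scan consecutive, non-overlapping pairs:
    10 emits 1, 01 emits 0, and 00 or 11 emits nothing.
    """
    out = []
    for i in range(0, len(bits) - 1, 2):
        if bits[i] != bits[i + 1]:
            out.append(bits[i])  # 10 -> 1, 01 -> 0
    return out
```

For example, the input 1, 0, 0, 1, 1, 1, 0, 1 splits into the pairs 10, 01, 11, 01 and produces the output 1, 0, 0.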

Manuel Blum’s method

Von Neumann’s algorithm can remove bias, but not dependence. Manuel Blum built on von Neumann’s algorithm to extract independent bits from dependent sources. Blum’s algorithm can remove both bias and dependence, provided the dependence takes the form of a Markov chain.

Blum’s method assumes each state in the Markov chain can be exited two ways, labeled 0 and 1. You could focus on a single state and use von Neumann’s algorithm on its exits: output a 1 when the state exits by the 1 path and then by the 0 path, and output a 0 in the opposite case. But if there are many states, this method is inefficient since it produces nothing while the chain is in any of the other states.
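The single-state warm-up can be sketched as follows. This is the inefficient version described above, not Blum’s full algorithm, and the names are mine. By the Markov property, successive exits from a fixed state are independent and identically distributed, so von Neumann’s pairing applies to that subsequence:

```python
def extract_from_state(states, exits, target):
    """Apply von Neumann's pairing to the exit labels (0 or 1)
    observed each time the chain leaves the state `target`.

    `states[i]` is the state occupied at step i and `exits[i]` is
    the label of the transition taken out of it.
    """
    labels = [e for s, e in zip(states, exits) if s == target]
    out = []
    for i in range(0, len(labels) - 1, 2):
        if labels[i] != labels[i + 1]:
            out.append(labels[i])
    return out
```

All the exits from the other states are simply discarded, which is exactly the waste Blum’s algorithm avoids.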

Blum basically applies the same method to all states, not just one state. But there’s a catch! His paper shows that the naive implementation of this idea is wrong. He gives a pseudo-theorem and a pseudo-proof that the naive algorithm is correct before showing that it is not. Then he shows that the correct algorithm really is correct.

This is excellent exposition. It would have been easier to say nothing of the naive algorithm, or simply mention in passing that the most obvious approach is flawed. But instead he went to the effort of fully developing the wrong algorithm (with a warning up-front that it will be shown to be wrong).

Related posts

[1] It matters for efficiency. The probability that a pair of input bits will produce an output bit is 2p(1 – p), so the probability is maximized when p = 0.5. If p = 0.999, for example, the method will still work correctly, but you’ll have to generate 500 pairs of input on average to produce each output bit.
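The efficiency figure in the footnote is easy to check empirically. A sketch, with the same pairing step as von Neumann’s algorithm above:

```python
import random

def von_neumann(bits):
    # 10 -> 1, 01 -> 0, 00/11 -> nothing
    return [bits[i] for i in range(0, len(bits) - 1, 2)
            if bits[i] != bits[i + 1]]

random.seed(42)
p = 0.9
n_pairs = 100_000
bits = [1 if random.random() < p else 0 for _ in range(2 * n_pairs)]
out = von_neumann(bits)

# Expected yield is about 2p(1-p) = 0.18 output bits per input pair,
# and the output should be nearly unbiased despite 90% ones in the input.
yield_rate = len(out) / n_pairs
ones_frac = sum(out) / len(out)
```

Pushing p toward 0 or 1 drives `yield_rate` toward zero, which is the sense in which extreme bias is expensive rather than incorrect.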

Convert PostScript to PDF locally

There are numerous websites that will let you upload a PostScript (.ps) file and download it as a PDF. Some of these online file format conversion sites look sketchy, and the ones that seem more reputable don’t always do a good job.

If you want to convert PostScript to PDF locally, I’d suggest trying the Ghostscript utility ps2pdf. I’ve had good luck with it.

To convert the file input.ps to the file output.pdf you simply say

    ps2pdf input.ps output.pdf

If ps2pdf is not installed on your computer, you can run

    sudo apt install ghostscript

on Linux.

Windows installation

On Windows, you can install Ghostscript by first installing WSL then using the command above.

I’m half joking. That advice sounds like something the stereotypical arrogant Unix user would say: the way to run software on Windows is to first install Linux, then run the software in Linux. You can install a Windows port of Ghostscript directly on Windows.

Source http://dilbert.com/strip/1995-06-24

But in all seriousness, I’ve used Windows ports of Unix-first software for a long time with mixed success. Sometimes the ports work seamlessly, but often they fail or half work. I find it less frustrating to use WSL (Windows Subsystem for Linux) to run software on Windows that was first written for Unix.

WSL is not a virtual machine per se but an implementation of Linux using the Windows API. It starts up quickly, and it’s well integrated with Windows. I have no complaints with it.

Related posts

Chebyshev’s other polynomials

There are two sequences of polynomials named after Chebyshev, and the first is so much more common that when authors say “Chebyshev polynomial” with no further qualification, they mean Chebyshev polynomials of the first kind. These are denoted T_n, so they get Chebyshev’s initial [1]. The Chebyshev polynomials of the second kind are denoted U_n.

Chebyshev polynomials of the first kind are closely related to cosines. It would be nice to say that Chebyshev polynomials of the second kind are to sines what Chebyshev polynomials of the first kind are to cosines. That would be tidy, but it’s not true. There are relationships between the two kinds of Chebyshev polynomials, but they’re not that symmetric.

It is true that Chebyshev polynomials of the second kind satisfy a relation somewhat analogous to the relation

\cos n\theta = T_n(\cos\theta)

for his polynomials of the first kind, and it involves sines:

\frac{\sin n\theta}{\sin \theta} = U_{n-1}(\cos\theta)

We can prove this with the equation we’ve been discussing in several posts lately, so there is yet more juice to squeeze from this lemon.

Once again we start with the equation

\cos n\theta + i\sin n\theta = \sum_{j=0}^n {n\choose j} i^j(\cos\theta)^{n-j} (\sin\theta)^j

and take the imaginary part of both sides. The odd terms of the sum contribute to the imaginary part, so we can assume j = 2k + 1. We make the replacement

(\sin\theta)^{2k+1} = (1 - \cos^2\theta)^k \sin\theta

and so we’re left with a polynomial in cos θ, except for an extra factor of sin θ in every term.

This shows that sin nθ / sin θ is a polynomial in cos θ, and in fact a polynomial of degree n − 1. Given the theorem that

\frac{\sin n\theta}{\sin \theta} = U_{n-1}(\cos\theta)

it follows that the polynomial in question must be U_{n−1}.
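We can check the identity numerically. Here is a sketch in Python using the standard three-term recurrence U_0 = 1, U_1 = 2x, U_{n+1}(x) = 2x U_n(x) − U_{n−1}(x); the function name is mine:

```python
import math

def U(n, x):
    """Chebyshev polynomial of the second kind, evaluated via the
    recurrence U_0 = 1, U_1 = 2x, U_{n+1} = 2x U_n - U_{n-1}."""
    a, b = 1.0, 2.0 * x
    if n == 0:
        return a
    for _ in range(n - 1):
        a, b = b, 2.0 * x * b - a
    return b

# Check sin(n*theta) / sin(theta) == U_{n-1}(cos(theta))
theta = 0.7
for n in range(1, 8):
    lhs = math.sin(n * theta) / math.sin(theta)
    rhs = U(n - 1, math.cos(theta))
    assert abs(lhs - rhs) < 1e-12
```

The n = 2 case is the familiar double angle formula: sin 2θ / sin θ = 2 cos θ = U_1(cos θ).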

More special function posts

[1] Chebyshev has been transliterated from the Russian as Chebysheff, Chebychov, Chebyshov, Tchebychev, Tchebycheff, Tschebyschev, Tschebyschef, Tschebyscheff, Chebychev, etc. It is conventional now to use “Chebyshev” as the name, at least in English, and to use “T” for the polynomials.

More juice in the lemon

There’s more juice left in the lemon we’ve been squeezing lately.

A few days ago I first brought up the equation

\cos n\theta + i\sin n\theta = \sum_{j=0}^n {n\choose j} i^j(\cos\theta)^{n-j} (\sin\theta)^j

which holds because both sides equal exp(inθ).

Then a couple days ago I concluded a blog post by noting that by taking the real part of this equation and replacing sin²θ with 1 – cos²θ one could express cos nθ as a polynomial in cos θ,

\cos n\theta = \sum {n\choose 2k} (-1)^k (\cos\theta)^{n-2k} (1 - \cos^2\theta)^k

and in fact this polynomial is the nth Chebyshev polynomial Tn since these polynomials satisfy

\cos n\theta = T_n(\cos\theta)

Now in this post I’d like to prove a relationship between Chebyshev polynomials and sines starting with the same raw material. The relationship between Chebyshev polynomials and cosines is well known, even a matter of definition depending on where you start, but the connection to sines is less well known.

Let’s go back to the equation at the top of the post, replace n with 2n + 1, and take the imaginary part of both sides. The odd terms of the sum contribute to the imaginary part, so we sum over j = 2ℓ + 1.

\begin{align*} \sin((2n+1)\theta) &= \sum_{\ell} (-1)^\ell {2n+1 \choose 2\ell + 1} (\cos\theta)^{2n-2\ell} (\sin\theta)^{2\ell + 1} \\ &= \sum_k (-1)^{n-k}{2n + 1 \choose 2k} (\sin \theta)^{2n + 1 -2k} (\cos\theta)^{2k} \\ &= (-1)^n \sum_k (-1)^k{2n + 1 \choose 2k} (\sin \theta)^{2n + 1 -2k} (1 -\sin^2\theta)^{k} \end{align*}

Here we did a change of variables k = n – ℓ.

The final expression is the expression we began with, only evaluated at sin θ instead of cos θ. That is,

 \sin((2n+1)\theta) = (-1)^n T_{2n+1}(\sin\theta)

So for all n we have

\cos n\theta = T_n(\cos\theta)

and for odd n we also have

 \sin n\theta = \pm \,\,T_n(\sin\theta)

The sign is positive when n is congruent to 1 mod 4 and negative when n is congruent to 3 mod 4.
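Both identities are easy to check numerically with the recurrence T_0 = 1, T_1 = x, T_{n+1}(x) = 2x T_n(x) − T_{n−1}(x). A sketch (the function name is mine):

```python
import math

def T(n, x):
    """Chebyshev polynomial of the first kind, evaluated via the
    recurrence T_0 = 1, T_1 = x, T_{n+1} = 2x T_n - T_{n-1}."""
    a, b = 1.0, x
    if n == 0:
        return a
    for _ in range(n - 1):
        a, b = b, 2.0 * x * b - a
    return b

theta = 0.9
for n in range(1, 12):
    # cos(n*theta) = T_n(cos(theta)) for all n
    assert abs(math.cos(n * theta) - T(n, math.cos(theta))) < 1e-12
    if n % 2 == 1:
        # sin(n*theta) = +/- T_n(sin(theta)) for odd n:
        # + when n = 1 mod 4, - when n = 3 mod 4
        sign = 1 if n % 4 == 1 else -1
        assert abs(math.sin(n * theta) - sign * T(n, math.sin(theta))) < 1e-12
```

The n = 3 case is the triple angle formula in disguise: −T_3(s) = 3s − 4s³, which is sin 3θ with s = sin θ.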

More Chebyshev posts

CRM consulting gig

This morning I had someone from a pharmaceutical company call me with questions about conducting a CRM dose-finding trial and I mentioned it to my wife.

Then this afternoon she was reading a book in which there was a dialog between husband and wife including this sentence:

He launched into a technical explanation of his current consulting gig—something about a CRM implementation.

You can’t make this kind of thing up. A few hours before reading this line, my wife had exactly this conversation. However, I doubt the author and I had the same thing in mind.

In my mind, CRM stands for Continual Reassessment Method, a Bayesian method for Phase I clinical trials, especially in oncology. We ran a lot of CRM trials while I worked in biostatistics at MD Anderson Cancer Center.

For most people, presumably including the author of the book quoted above, CRM stands for Customer Relationship Management software.

Like my fictional counterpart, I know a few things about CRM implementation, but it’s a different kind of CRM.

More clinical trial posts