The post Refinements to the prime number theorem first appeared on John D. Cook.

This means that in the limit as *x* goes to infinity, the relative error in approximating π(*x*) with *x*/log(*x*) goes to 0. However, there is room for improvement. The relative approximation error goes to 0 faster if we replace *x*/log(*x*) with li(*x*), the logarithmic integral

li(*x*) = ∫_{0}^{*x*} *dt*/log(*t*).

The prime number theorem says that for large *x*, the error in approximating π(*x*) by li(*x*) is small relative to π(*x*) itself. It would appear that li(*x*) is not only an approximation for π(*x*), but it is also an upper bound. That is, it seems that li(*x*) > π(*x*). However, that’s not true for all *x*.
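To see these quantities side by side, here's a quick numerical check. The sieve helper below is illustrative (not from the post), and the `li` function comes from `mpmath`:

```python
import numpy as np
from mpmath import li

def primepi(x):
    # count primes <= x with a sieve of Eratosthenes
    sieve = np.ones(x + 1, dtype=bool)
    sieve[:2] = False
    for p in range(2, int(x**0.5) + 1):
        if sieve[p]:
            sieve[p*p::p] = False
    return int(sieve.sum())

x = 10**6
print(primepi(x))    # 78498
print(float(li(x)))  # about 78627.5, a little larger
```

At this scale li(*x*) is indeed a little larger than π(*x*), consistent with the discussion above.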

Littlewood proved in 1914 that there is some *x* for which π(*x*) > li(*x*). We still don’t know a specific number *x* for which this holds, though we know such numbers exist. The smallest such *x* is the definition of Skewes’ number. The number of digits in Skewes’ number is known to be between 20 and 317, and is believed to be close to the latter.

Littlewood not only proved that li(*x*) – π(*x*) is sometimes negative, he proved that it changes sign infinitely often. So naturally there is interest in estimating li(*x*) – π(*x*) for very large values of *x*.

A new result was published a few days ago [1] refining previous bounds to prove that

for all *x* > exp(2000).

When *x* = exp(2000), the right side is roughly 10^{857} and π(*x*) is roughly 10^{865}, and so the relative error is roughly 10^{-8}. That is, the li(*x*) approximation to π(*x*) is accurate to 8 significant figures, and the accuracy increases as *x* gets larger.

***

[1] Platt and Trudgian. The error term in the prime number theorem. Mathematics of Computation. November 16, 2020. https://doi.org/10.1090/mcom/3583


The post Minimizing random Boolean expressions first appeared on John D. Cook.

and so the possibilities explode as *n* increases. We could do *n* = 3 and 4, but 5 would be a lot of work, and 6 is out of the question.

So we do what we always do when a space is too big to explore exhaustively: we explore at random.

The Python module we’ve been using, `qm`, specifies a function of *n* Boolean variables in terms of the set of product terms on which the function evaluates to 1. These product terms can be encoded as integers, and so a Boolean function of *n* variables corresponds to a subset of the integers 0 through 2^{n} – 1.

We can generate a subset of these numbers by generating a random mask consisting of 0s and 1s, and keeping the numbers where the mask value is 1. We could do this with code like the following.

```python
import numpy as np

N = 2**n
x = np.arange(N)
mask = np.random.randint(2, size=N)
ones = set(mask*x)
```

There’s a small problem with this approach: the set `ones` always contains 0. We want it to contain 0 if and only if the 0th mask value is a 1.

The following code generates a Boolean expression on *n* variables, simplifies it, and returns the length of the simplified expression [1].

```python
import numpy as np
from qm import qm

def random_sample(n):
    N = 2**n
    x = np.arange(N)
    mask = np.random.randint(2, size=N)
    ones = set(mask*x)
    if mask[0] == 0:
        ones.remove(0)
    return len(qm(ones=ones, dc={}))
```

We can create several random samples and make a histogram with the following code.

```python
def histogram(n, reps):
    counts = np.zeros(2**n + 1, dtype=int)
    for _ in range(reps):
        counts[random_sample(n)] += 1
    return counts
```

The data in the following graph comes from calling `histogram(5, 1000)`.

Note that the length of the random expressions is distributed symmetrically around 16 (half of 2^{5}). So minimization turns a distribution centered around 16 into a distribution centered around 8.

The code is slow because the Quine-McCluskey algorithm is slow, and our Python implementation of the algorithm isn’t as fast as it could be. But Boolean minimization is an NP-hard problem, so no exact algorithm is going to scale well. To get faster results, we could switch to something like the Espresso heuristic logic minimizer, which often gets close to a minimum expression.

***

[1] The code above will fail if the set of terms where the function is 1 is empty. However this is extremely unlikely: we’d expect it to happen once in every 2^(2^*n*) times and so when *n* = 5 this is less than one time in four billion. The fully correct approach would be to call `qm` with `zeros=x` when `ones` is empty.


The post How much can Boolean expressions be simplified? first appeared on John D. Cook.

In the previous post we looked at minimizing Boolean expressions with the Python module `qm`. In this post we’d like to look at how much the minimization process shortens expressions.

With *n* Boolean variables, you can create 2^*n* product terms in which each variable appears exactly once, either plain or negated. You can specify a Boolean function by specifying the subset of such terms on which it takes the value 1, and so there are 2^(2^*n*) Boolean functions on *n* variables. For very small values of *n* we can minimize every possible Boolean function.

To do this, we need a way to iterate through the power set (set of all subsets) of the integers from 0 up to 2^*n* – 1. Here’s a function to do that, borrowed from the itertools recipes.

```python
from itertools import chain, combinations

def powerset(iterable):
    xs = list(iterable)
    return chain.from_iterable(
        combinations(xs, n) for n in range(len(xs) + 1))
```
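As a quick sanity check, a 3-element set should have 2^3 = 8 subsets:

```python
from itertools import chain, combinations

def powerset(iterable):
    # same recipe as above
    xs = list(iterable)
    return chain.from_iterable(
        combinations(xs, n) for n in range(len(xs) + 1))

subsets = list(powerset(range(3)))
print(len(subsets))  # 8
print(subsets[-1])   # (0, 1, 2)
```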

Next, we use this code to run all Boolean functions on 3 variables through the minimizer. We use a matrix to keep track of how long the input expressions are and how long the minimized expressions are.

```python
from numpy import zeros
from qm import qm

n = 3
N = 2**n
tally = zeros((N, N), dtype=int)
for p in powerset(range(N)):
    if not p:
        continue  # qm can't take an empty set
    i = len(p)
    j = len(qm(ones=p, dc={}))
    tally[i-1, j-1] += 1
```

Here’s a table summarizing the results [1].

The first column gives the number of product terms in the input expression and the subsequent columns give the number of product terms in the output expressions.

For example, of the expressions of length 2, there were 12 that could be reduced to expressions of length 1 but the remaining 16 could not be reduced. (There are 28 possible input expressions of length 2 because there are 28 ways to choose 2 items from a set of 8 things.)

There are no nonzero values above the main diagonal, i.e. no expression got longer in the process of minimization. Of course that’s to be expected, but it’s reassuring that nothing went obviously wrong.

We can repeat this exercise for expressions in 4 variables by setting *n* = 4 in the code above. This gives the following results.

We quickly run into a wall as *n* increases. Not only does the Quine-McCluskey algorithm take about twice as long every time we add a new variable, the number of possible Boolean functions grows even faster. There were 2^(2^3) = 256 possibilities to explore when *n* = 3, and 2^(2^4) = 65,536 when *n* = 4.

If we want to explore all Boolean functions on five variables, we need to look at 2^(2^5) = 4,294,967,296 possibilities. I estimate this would take over a year on my laptop. The `qm` module could be made more efficient, and in fact someone has done that. But even if you made the code a billion times faster, six variables would still be out of the question.

To explore functions of more variables, we need to switch from exhaustive enumeration to random sampling. I may do that in a future post. (Update: I did.)

***

[1] The raw data for the tables presented as images is available here.


The post Minimizing boolean expressions first appeared on John D. Cook.

We will write AND like multiplication, OR like addition, and use primes for negation. For example,

*wx* + *z*'

denotes

(*w* AND *x*) OR (NOT *z*).

You may notice that the expression

*wx*'*z* + *wxz*

can be simplified to *wz*, for example, but it’s not feasible to simplify complicated expressions without a systematic approach.

One such approach is the Quine-McCluskey algorithm. Its run time increases exponentially with the problem size, but for a small number of terms it’s quick enough [1]. We’ll show how to use the Python module qm which implements the algorithm.

How are you going to pass a Boolean expression to a Python function? You could pass it an expression as a string and expect the function to parse the string, but then you’d have to specify the grammar of the little language you’ve created. Or you could pass in an actual Python function, which is more work than necessary, especially if you’re going to be passing in a lot of expressions.

A simpler way is to pass in the set of places where the function evaluates to 1, encoded as numbers.

For example, suppose your function is

*wxy*'*z* + *w*'*xyz*'

This function evaluates to 1 when either the first term evaluates to 1 or the second term evaluates to 1. That is, when either

(*w*, *x*, *y*, *z*) = (1, 1, 0, 1)

or

(*w*, *x*, *y*, *z*) = (0, 1, 1, 0).

Interpreting the left sides as binary numbers, you could specify the expression with the set {13, 6} which describes where the function is 1.
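A one-line helper makes the encoding explicit (the function is just for illustration, not part of `qm`):

```python
def encode(w, x, y, z):
    # read (w, x, y, z) as the bits of a binary number, w most significant
    return 8*w + 4*x + 2*y + z

print(encode(1, 1, 0, 1))  # 13
print(encode(0, 1, 1, 0))  # 6
```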

If you prefer, you could express your numbers in binary to make the correspondence to terms more explicit, i.e. `{0b1101, 0b110}`.

One more thing before we use `qm`: your Boolean expression might not be fully specified. Maybe you want it to be 1 on some values, 0 on others, and you don’t care what it equals on the rest.

The `qm` module lets you specify these with arguments `ones`, `zeros`, and `dc`. If you specify two out of these three sets, `qm` will infer the third one.

For example, in the code below

```python
from qm import qm

print(qm(ones={0b111, 0b110, 0b1101}, dc={}))
```

we’re asking `qm` to minimize the expression

*xyz* + *xyz*' + *wxy*'*z*.

Since the don’t-care set is empty, we’re saying our function equals 0 everywhere we haven’t said that it equals 1. The function prints

['1101', '011X']

which corresponds to

*wxy*'*z* + *w*'*xy*,

the X meaning that the fourth variable, *z*, is not part of the second term.

Note that the minimized expression is not unique: we could tell by inspection that

*xyz* + *xyz*' + *wxy*'*z*

could be reduced to

*xy* + *wxy*'*z*.

Also, our code defines a minimum expression to be one with the fewest terms in the sum. Both simplifications in this example are sums of two terms. But *xy* + *wxy*'*z* is simpler than *wxy*'*z* + *w*'*xy* in the sense of having one fewer literal, so there’s room for improvement, or at least discussion, as to how to quantify the complexity of an expression.
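If we wanted to count literals instead, that's easy to do on `qm`-style output strings, where every character other than X is one literal. This helper is my own sketch, not part of the `qm` module:

```python
def literal_count(terms):
    # each character that isn't 'X' stands for one literal in the product term
    return sum(c != 'X' for t in terms for c in t)

print(literal_count(['1101', '011X']))  # 7
print(literal_count(['1101', 'X11X']))  # 6
```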

In the next post I use `qm` to explore how much minimization reduces the size of Boolean expressions.

***

[1] The Boolean expression minimization problem is NP-hard, and so no known algorithm that always produces an exact answer will scale well. But there are heuristic algorithms like Espresso and its variations that usually provide optimal or near-optimal results.


The post Rotating symbols in LaTeX first appeared on John D. Cook.

The symbol is U+214B in Unicode.

I was looking into how to produce this character in LaTeX when I found that the package `cmll` has two commands that produce this character, one semantic and one descriptive: `\parr` and `\invamp` [1].

This got me to wondering how you might create a symbol like the one above if there wasn’t one built into a package. You can do that by using the `graphicx` package and the `\rotatebox` command. Here’s how you could roll your own par operator:

\rotatebox[origin=c]{180}{\&}

There’s a backslash in front of the & because it’s a special character in LaTeX. If you wanted to rotate a K, for example, there would be no need for a backslash.
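If you'd like a complete file to compile, here's a minimal example document:

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
The par operator: \rotatebox[origin=c]{180}{\&}
\end{document}
```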

The `\rotatebox` command can rotate any number of degrees, and so you could rotate an ampersand 30° with

\rotatebox[origin=c]{30}{\&}

to produce a tilted ampersand.

[1] The name `\parr` comes from the fact that the operator is sometimes pronounced “par” in linear logic. (It’s not simply `\par` because LaTeX already has a command `\par` for inserting a paragraph break.)

The name `\invamp` is short for “inverse ampersand.” Note however that the symbol is not an inverted ampersand in the sense of being a reflection; it is an ampersand rotated 180°.


The post The smallest number with a given number of divisors first appeared on John D. Cook.

16 = 2^{4}

and the divisors of 16 are 2^{k} where *k* = 0, 1, 2, 3, or 4.

This approach generalizes: For any prime *q*, the smallest number with *q* divisors is 2^{q-1}.

Now suppose you want to find the smallest number with 6 divisors. One candidate would be 32 = 2^{5}, but you could do better. Instead of just looking at numbers divisible by the smallest prime, you could consider numbers that are divisible by the two smallest primes. And in fact

12 = 2^{2} 3

is the smallest number with 6 divisors.

This approach also generalizes. If *h* is the product of 2 primes, say *h* = *pq* where *p* ≥ *q*, then the smallest number with *h* divisors is

2^{p-1} 3^{q-1}.

The divisors come from letting the exponent on 2 range from 0 to *p*-1 and letting the exponent on 3 range from 0 to *q*-1.

For example, the smallest number with 35 divisors is

5184 = 2^{7-1} 3^{5-1}.

Note that we did not require *p* and *q* to be different. We said *p* ≥ *q*, and not *p* > *q*. And so, for example, the smallest number with 25 divisors is

1296 = 2^{5-1} 3^{5-1}.

Now, suppose we want to find the smallest number with 1001 divisors. The number 1001 factors as 7*11*13, which has some interesting consequences. It turns out that the smallest number with 1001 divisors is

2^{13-1} 3^{11-1} 5^{7-1}.

Does this solution generalize? **Usually, but not always**.

Let *h* = *pqr* where *p*, *q*, and *r* are primes with *p* ≥ *q* ≥ *r*. Then the smallest number with *h* divisors is

2^{p-1} 3^{q-1} 5^{r-1}

with one exception. The smallest number with 8 divisors would be 30 = 2*3*5 if the theorem always held, but in fact the smallest number with 8 divisors is 24.

In [1] M. E. Grost examines the exceptions to the general pattern. We’ve looked at the smallest number with *h* divisors when *h* is the product of 1, or 2, or 3 (not necessarily distinct) primes. Grost considers values of *h* equal to the product of up to 6 primes.

We’ve said that the pattern above holds for all *h* the product of 1 or 2 primes, and for all but one value of *h* the product of 3 primes. There are two exceptions for *h* the product of 4 primes. That is, if *h* = *pqrs* where *p* ≥ *q* ≥ *r* ≥ *s* are primes, then the smallest number with *h* divisors is

2^{p-1} 3^{q-1} 5^{r-1} 7^{s-1}

with two exceptions. The smallest number with 2^{4} divisors is 2^{3} × 3 × 5, and the smallest number with 3 × 2^{3} divisors is 2^{3} × 3^{2} × 5.
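The exceptions are easy to confirm by brute force, at least for small *h*. The helpers below are illustrative, not from [1]:

```python
def num_divisors(n):
    # count divisors by checking divisor pairs up to sqrt(n)
    count = 0
    d = 1
    while d*d <= n:
        if n % d == 0:
            count += 2 if d*d < n else 1
        d += 1
    return count

def smallest_with(h):
    # smallest positive integer with exactly h divisors, by trial
    n = 1
    while num_divisors(n) != h:
        n += 1
    return n

print(smallest_with(8))   # 24, not 2*3*5 = 30
print(smallest_with(16))  # 120 = 2^3 * 3 * 5
print(smallest_with(24))  # 360 = 2^3 * 3^2 * 5
```

This is hopelessly slow for large *h*, but it confirms the small exceptional cases above.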

When *h* is the product of 5 or 6 primes, there are infinitely many exceptions, but they have a particular form given in [1].

The result discussed here came up recently in something I was working on, but I don’t remember now what. If memory serves, which it may not, I wanted to assume something like what is presented here but wasn’t sure it was true.

***

[1] M. E. Grost. The Smallest Number with a Given Number of Divisors. The American Mathematical Monthly, September 1968, pp. 725-729.


The post Good news from Pfizer and Moderna first appeared on John D. Cook.

That’s great news. The vaccines may turn out to be less than 90% effective when all is said and done, but even so they’re likely to be far more effective than expected.

But there’s other good news that might be overlooked: **the subjects in the control groups did well too**, though not as well as in the active groups.

The infection rate was around 0.4% in the Pfizer control group and around 0.6% in the Moderna control group.

There were 11 severe cases of COVID in the Moderna trial, out of 30,000 subjects, all in the control group.

There were 0 severe cases of COVID in the Pfizer trial in either group, out of 43,000 subjects.


The post I think I'll pass first appeared on John D. Cook.

Anyone who has spent a career using some skill ought to blow away an exam intended for people who have been learning that skill for a semester.

However, after thinking about it more, I’m pretty sure I’d *pass* the test in question, but I’m not at all sure I’d ace it. Academic exams often test unimportant material that is in the short term memory of both the instructor and the students.

When I was in middle school, I remember a question that read

It is a long way from ________ to ________.

I made up two locations that were far apart but my answer was graded as wrong.

My teacher was looking for a direct quote from a photo caption in our textbook that said it was a long way from Timbuktu to some place I can’t remember.

That stuck in my mind as the canonical example of a question that doesn’t test subject matter knowledge but tests the incidental minutia of the course itself [1]. A geography professor would stand no better chance of giving the expected answer than I did.

Almost any time you see a question asking for “the 3 reasons” for something or “the 5 consequences” of this or that, it’s likely a Timbuktu question. In open-world contexts [2], I’m suspicious whenever I see “the” followed by a specific number.

In some contexts you can make exhaustive lists—it makes sense to talk about the 3 branches of the US government or the 5 Platonic solids, but it doesn’t make sense to talk about the 4 causes of World War I. Surely historians could come up with more than 4 causes, and there’s probably no consensus regarding what the 4 most important causes are.

There’s a phrase **teaching to the test** for when the goal is not to teach the subject per se but to prepare the students to pass a standardized test related to the subject. The phenomenon discussed here is sort of the opposite, **testing to the teaching**.

When you ask students for the 4 causes of WWI, you’re asking for the 4 causes *given in lecture* or the 4 causes *in the text book*. You’re not testing knowledge of WWI per se but knowledge of the course materials.

[1] Now that I’m in middle age rather than middle school, I could say that the real question was not geography but psychology. The task was to reverse-engineer from an ambiguous question what someone was thinking. That is an extremely valuable skill, but not one I possessed in middle school.

[2] A closed world is one in which the rules are explicitly known, finite, and exhaustive. Chess is a closed world. Sales is not. Academia often puts a box around some part of an open world so it can think of it as a closed world.


The post Probability of commuting first appeared on John D. Cook.

A related question would be to ask how often quaternions do commute, i.e. the probability that *xy* – *yx* = 0 for randomly chosen *x* and *y*.

There’s a general theorem for this [1]. For a finite non-abelian group, the probability that two elements chosen uniformly at random commute is never more than 5/8.

To put it another way, in a finite group either all pairs of elements commute with each other or no more than 5/8 of all pairs commute, with no possibilities in between. You can’t have a group, for example, in which exactly 3 out of 4 pairs commute.
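To make this concrete, here's a brute-force check on the quaternion group Q₈ = {±1, ±*i*, ±*j*, ±*k*}, where the commuting probability works out to exactly 5/8. The (sign, symbol) encoding is an ad hoc choice for this example:

```python
from fractions import Fraction
from itertools import product

# multiplication table for the symbols 1, i, j, k: table[a][b] = (sign, symbol)
table = {
    '1': {'1': (1, '1'), 'i': (1, 'i'), 'j': (1, 'j'), 'k': (1, 'k')},
    'i': {'1': (1, 'i'), 'i': (-1, '1'), 'j': (1, 'k'), 'k': (-1, 'j')},
    'j': {'1': (1, 'j'), 'i': (-1, 'k'), 'j': (-1, '1'), 'k': (1, 'i')},
    'k': {'1': (1, 'k'), 'i': (1, 'j'), 'j': (-1, 'i'), 'k': (-1, '1')},
}

def mul(x, y):
    # multiply two group elements represented as (sign, symbol) pairs
    sx, ax = x
    sy, ay = y
    s, a = table[ax][ay]
    return (sx*sy*s, a)

Q8 = [(s, a) for s in (1, -1) for a in '1ijk']
pairs = [(x, y) for x, y in product(Q8, Q8) if mul(x, y) == mul(y, x)]
print(Fraction(len(pairs), len(Q8)**2))  # 5/8
```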

What if we have an infinite group like the quaternions?

Before we can answer that, we’ve got to say how we’d compute probabilities. With a finite group, the natural thing to do is make every point have equal probability. For a (locally compact) infinite group the natural choice is Haar measure.

Subject to some technical conditions, Haar measure is the only measure that interacts as expected with the group structure. It’s unique up to a constant multiple, and so it’s unique when we specify that the measure of the whole group has to be 1.

For compact non-abelian groups with Haar measure, we again get the result that no more than 5/8 of pairs commute.

[1] W. H. Gustafson. What is the Probability that Two Group Elements Commute? The American Mathematical Monthly, Nov., 1973, Vol. 80, No. 9, pp. 1031-1034.


The post Test for divisibility by 13 first appeared on John D. Cook.

- A number is divisible by 2 if its last digit is divisible by 2.
- A number is divisible by 3 if the sum of its digits is divisible by 3.
- A number is divisible by 4 if the number formed by its last two digits is divisible by 4.
- A number is divisible by 5 if its last digit is divisible by 5.
- A number is divisible by 6 if it is divisible by 2 and by 3.

There is a rule for divisibility by 7, but it’s a little wonky. Let’s keep going.

- A number is divisible by 8 if the number formed by its last three digits is divisible by 8.
- A number is divisible by 9 if the sum of its digits is divisible by 9.
- A number is divisible by 10 if its last digit is 0.

There’s a rule for divisibility by 11. It’s a little complicated, though not as complicated as the rule for 7. I describe the rule for 11 in the penultimate paragraph here.

A number is divisible by 12 if it’s divisible by 3 and 4. (It matters here that 3 and 4 are relatively prime. It’s not true, for example, that a number is divisible by 12 if it’s divisible by 2 and 6.)

But what do you do when you get to 13?

We’re going to **kill three birds with one stone** by presenting a rule for testing divisibility by 13 that also gives new rules for testing divisibility by 7 and 11. So if you’re trying to factor a number by hand, this will give a way to test three primes at once.

To test divisibility by 7, 11, and 13, write your number with digits grouped into threes as usual. For example,

11,037,989

Then think of each group as a separate number — e.g. 11, 37, and 989 — and take the alternating sum, starting with a + sign on the last term.

989 – 37 + 11

The original number is divisible by 7 (or 11 or 13) if this alternating sum is divisible by 7 (or 11 or 13 respectively).

The alternating sum in our example is 963, which is clearly 9*107, and not divisible by 7, 11, or 13. Therefore 11,037,989 is not divisible by 7, 11, or 13.

Here’s another example. Let’s start with

4,894,498,518

The alternating sum is

518 – 498 + 894 – 4 = 910

The sum takes a bit of work, but less work than dividing a 10-digit number by 7, 11, and 13.

The sum 910 factors into 7*13*10, and so it is divisible by 7 and by 13, but not by 11. That tells us 4,894,498,518 is divisible by 7 and 13 but not by 11.

The heart of the method is that 7*11*13 = 1001. If I subtract a multiple of 1001 from a number, I don’t change its divisibility by 7, 11, or 13. More than that, I don’t change its **remainder** by 7, 11, or 13.

The steps in the method amount to adding or subtracting multiples of 1001 and dividing by 1000. The former doesn’t change the remainder by 7, 11, or 13, but the latter multiplies the remainder by -1, hence the alternating sum. (1000 is congruent to -1 mod 7, mod 11, and mod 13.) See a more formal argument in footnote [1].

So not only can we test for divisibility by 7, 11, and 13 with this method, we can also find the remainders by 7, 11, and 13. The original number and the alternating sum are congruent mod 1001, so they are congruent mod 7, mod 11, and mod 13.

In our first example, *n* = 11,037,989 and the alternating sum was *m* = 963. The remainder when *m* is divided by 7 is 4, so the remainder when *n* is divided by 7 is also 4. That is, *m* is congruent to 4 mod 7, and so *n* is congruent to 4 mod 7. Similarly, *m* is congruent to 6 mod 11, and so *n* is congruent to 6 mod 11. And finally *m* is congruent to 1 mod 13, so *n* is congruent to 1 mod 13.
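The whole procedure is easy to automate. Here's a short sketch; the helper function is mine, not from the post:

```python
def alternating_sum(n):
    # group the digits in threes from the right and alternate signs,
    # starting with + on the last (lowest) group
    groups = []
    while n > 0:
        groups.append(n % 1000)
        n //= 1000
    return sum(g if i % 2 == 0 else -g for i, g in enumerate(groups))

m = alternating_sum(11037989)
print(m)                      # 963 = 989 - 37 + 11
print(m % 7, m % 11, m % 13)  # 4 6 1, same as 11037989 mod 7, 11, 13
```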


[1] The key calculation is that 1000 ≡ -1 (mod 1001), and so

*a* + 1000*b* + 1000^{2}*c* + ⋯ ≡ *a* – *b* + *c* – ⋯ (mod 1001).


The post Some mathematical art first appeared on John D. Cook.

Here’s a plot reproduced from [1], with some color added (the default colors matplotlib uses for multiple plots).

The plot above was based on the gamma function. Here are a few plots replacing the gamma function with another function.

Here’s *x*/sin(*x*):

Here’s *x*^{5}:

And here’s tan(*x*):

Here’s how the plots were created. For a given function *f*, plot the parametric curves given by

See [1] for what this has to do with circles and coordinates.

The plots based on a function *g*(*x*) are given by setting *f*(*x*) = *g*(*x*) + *c* where *c* = -10, -9, -8, …, 10.

[1] Elliot Tanis and Lee Kuivinen, Circular Coordinates and Computer Drawn Designs. Mathematics Magazine. Vol 52 No 3. May, 1979.


The post Counting triangles with integer sides first appeared on John D. Cook.

The authors in [1] developed an algorithm for finding *T*(*N*). The following Python code is a direct implementation of that algorithm.

```python
def T(N: int):
    if N < 3:
        return 0
    base_cases = {4: 0, 6: 1, 8: 1, 10: 2, 12: 3, 14: 4}
    if N in base_cases:
        return base_cases[N]
    if N % 2 == 0:
        R = N % 12
        if R < 4:
            R += 12
        return (N**2 - R**2)//48 + T(R)
    if N % 2 == 1:
        return T(N+3)
```

If you’re running a version of Python that doesn’t support type hinting, just delete the `: int` in the function signature.

Since this is a recursive algorithm, we should convince ourselves that it terminates. In the branch for even `N`, the number *R* is an even number between 4 and 14 inclusive, and so it’s in the `base_cases` dictionary.

In the odd branch, we recurse on `N+3`, which is a little unusual since typically recursive functions decrease their argument. But since `N` is odd, `N+3` is even, and we’ve already shown that the even branch terminates.

The code `(N**2 - R**2)//48` raises a couple questions. Is the numerator divisible by 48? And if so, why specify integer division (`//`) rather than simply division (`/`)?

First, the numerator is indeed divisible by 48. *N* is congruent to *R* mod 12 by construction, and so *N* – *R* is divisible by 12. Furthermore,

*N*² – *R*² = (*N* – *R*)(*N* + *R*).

The first factor on the right is divisible by 12, so if the second factor is divisible by 4, the product is divisible by 48. Since *N* and *R* are congruent mod 12, *N* + *R* is congruent to 2*R* mod 12. Because *R* is even, 2*R* is divisible by 4, and since 12 is also divisible by 4, it follows that *N* + *R* is divisible by 4.

So if (*N*² – *R*²)/48 is an integer, why did I write Python code that implies that I’m taking the integer *part* of the result? Because otherwise the code would sometimes return a floating point value. For example, `T(13)` would return 5.0 rather than 5.
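As a sanity check, we can compare `T` against a direct enumeration of triangles with integer sides *a* ≤ *b* ≤ *c*, perimeter *N*, and *a* + *b* > *c*. The brute-force helper below is my own, not from [1]:

```python
def T_brute(N):
    # count noncongruent triangles with integer sides and perimeter N
    count = 0
    for a in range(1, N + 1):
        for b in range(a, N + 1):
            c = N - a - b
            if c >= b and a + b > c:
                count += 1
    return count

print([T_brute(N) for N in range(3, 15)])
# [1, 0, 1, 1, 2, 1, 3, 2, 4, 3, 5, 4]
```

These values agree with the base cases in the algorithm above and with `T`(13) = 5.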

Here’s a plot of *T*(*N*).

[1] J. H. Jordan, Ray Walch and R. J. Wisner. Triangles with Integer Sides. The American Mathematical Monthly, Vol. 86, No. 8 (Oct., 1979), pp. 686-689


The post Ripples and hyperbolas first appeared on John D. Cook.

*y*' = sin(*xy*).

The authors recommend having students explore numerical solutions to this equation and discover theorems about its solutions.
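One way to start such an exploration is to integrate the equation numerically. Here's a minimal sketch using a hand-rolled fourth-order Runge-Kutta step; the initial conditions are arbitrary choices, and plotting code is omitted:

```python
import numpy as np

def solve(y0, x_max=10.0, h=0.01):
    # integrate y' = sin(x y) from x = 0 with a fourth-order Runge-Kutta step
    f = lambda x, y: np.sin(x * y)
    x, y = 0.0, y0
    xs, ys = [x], [y]
    while x < x_max:
        k1 = f(x, y)
        k2 = f(x + h/2, y + h*k1/2)
        k3 = f(x + h/2, y + h*k2/2)
        k4 = f(x + h, y + h*k3)
        y += h*(k1 + 2*k2 + 2*k3 + k4)/6
        x += h
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

# plot several solutions, e.g. with matplotlib:
# for y0 in (0.5, 1, 2, 4): plt.plot(*solve(y0))
```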

Their paper gives numerous theorems relating solutions and the hyperbolas *xy* = *a*: how many times a solution crosses a hyperbola, at what angle, under what conditions a solution can be tangent to a hyperbola, etc.

The plot above is based on a plot in the original paper, but easier to read. It wasn’t so easy to make nice plots 40 years ago. In the original plot the solutions and the asymptotes were plotted with the same thickness and color, making them hard to tell apart.

[1] Wendell Mills, Boris Weisfeiler and Allan M. Krall. Discovering Theorems with a Computer: The Case of *y*' = sin(*xy*). The American Mathematical Monthly, Vol. 86, No. 9 (Nov., 1979), pp. 733-739.


The post Informative stopping first appeared on John D. Cook.

For example, suppose Alice wants to convince Bob that π has a greater proportion of even digits than odd digits.

Alice: I’ll show you that π has more even digits than odd digits by looking at the first *N* digits. How big would you like *N* to be?

Bob: At least 1,000. Of course more data is always better.

Alice: Right. And how many more even than odd digits would you find convincing?

Bob: If there are at least 10 more evens than odds, I’ll believe you.

Alice: OK. If you look at the first 2589 digits, there are 13 more even digits than odd digits.

Now if Alice wanted to convince Bob that there are more odd digits, she could do that too. If you look at the first 2077 digits, 13 more are odd than even.

No matter what two numbers Bob gives, Alice can find a sample size that will give the result she wants. Here’s Alice’s Python code.

```python
from mpmath import mp
import numpy as np

N = 3000
mp.dps = N + 2
digits = str(mp.pi)[2:]

parity = np.ones(N, dtype=int)
for i in range(N):
    if digits[i] in ['1', '3', '5', '7', '9']:
        parity[i] = -1

excess = parity.cumsum()
print(excess[-1])
print(np.where(excess == 13))
print(np.where(excess == -13))
```

The number `N` is a guess at how far out she might have to look. If it doesn’t work, she increases it and runs the code again.

The array `parity` contains a 1 in positions where the digits of π (after the decimal point) are even and a -1 where they are odd. The cumulative sum shows how many more even than odd digits there have been up to a given point, a negative number meaning there have been more odd digits.

Alice thought that stopping when there are exactly 10 more of the parity she wants would look suspicious, so she looked for places where the difference was 13.

Here are the results:

[ 126, 128, 134, …, 536, 2588, …, 2726]
[ 772, 778, 780, …, 886, 2076, …, 2994]

There’s one minor gotcha. The array `excess` is indexed from zero, so Alice reports 2589 rather than 2588 because the 2589th digit has index 2588.

Bob’s mistake was that he specified a minimum sample size. By saying “at least 1,000” he gave Alice the freedom to pick the sample size to get the result she wanted. If he had specified an exact sample size, there probably would be either more even digits or more odd digits, but there couldn’t be both. And if he were more sophisticated, he could pick an excess value that would be unlikely given that sample size.


[1] This does not contradict the likelihood principle; it says that informative stopping rules should be incorporated into the likelihood function.


The post Expert determination for CCPA first appeared on John D. Cook.

California’s CCPA regulation has been amended to say that data considered deidentified under HIPAA is considered deidentified under CCPA. The amendment was proposed last year and was finally signed into law on September 25, 2020.

This is good news because it’s relatively clear what deidentification means under HIPAA compared to CCPA. In particular, HIPAA has two well-established alternatives for determining that data have been adequately deidentified:

- Safe Harbor, or
- Expert determination.

The latter is especially important because most useful data doesn’t meet the requirements of Safe Harbor.

I provide companies with HIPAA expert determination. And now by extension I can provide expert determination under CCPA.

I’m not a lawyer, and so nothing I write should be considered legal advice. But I work closely with lawyers to provide expert determination. If you would like to discuss how I could help you, let’s talk.

The post Expert determination for CCPA first appeared on John D. Cook.

]]>The post Category theory for programmers made easier first appeared on John D. Cook.

]]>Unfortunately, there are a couple of unnecessary difficulties anyone wanting to understand monads etc. is likely to face immediately. One is some deep set theory.

“A category is a collection of objects …”

“You mean like a set?”

“Ah, well, no. You see, Bertrand Russell showed that …”

There are reasons for such logical niceties, but they don’t matter to someone who wants to understand programming patterns.

Another complication is morphisms.

“As I was saying, a category is a collection of objects and morphisms between objects …”

“You mean like functions?”

“Well, they *might* be functions, but more generally …”

Yes, Virginia, morphisms are functions. It’s true that they might not always be functions, but they will be functions in every example you care about, at least for now.

**Category theory is a framework for describing patterns** in function composition, and so that’s why things like monads find their ultimate home in category theory. But doing category theory rigorously requires some setup that people eager to get into applications don’t have to be concerned with.
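For a concrete taste of “patterns in function composition,” here’s a tiny Python sketch (my example, not from the post): lists form a functor, and the defining pattern is that mapping over a list respects composition.

```python
# The list "functor": it sends a function f on elements to a function
# fmap(f) on lists. The pattern category theory captures is that this
# mapping preserves composition (and the identity function).
def fmap(f):
    return lambda xs: [f(x) for x in xs]

def compose(g, f):
    return lambda x: g(f(x))

f = lambda x: x + 1
g = lambda x: 2 * x
xs = [1, 2, 3]

# Functor law: mapping g∘f equals mapping f and then mapping g.
assert fmap(compose(g, f))(xs) == compose(fmap(g), fmap(f))(xs) == [4, 6, 8]
```

Nothing here requires knowing what a “collection of objects” is; the content is entirely in how the arrows compose.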

Patrick Honner posted on Twitter recently that his 8-year-old child asked him what area is. My first thought on seeing that was that a completely inappropriate answer would be that this is a deep question that wasn’t satisfactorily settled until the 20th century using measure theory. My joking response to Patrick was

Well, first we have to define σ-algebras. They’re kinda like topologies, but closed under countable union and intersection instead of arbitrary union and finite intersection. Anyway, a measure is a …

It would be ridiculous to answer a child this way, and it is nearly as ridiculous to burden a programmer with unnecessary logical nuance when they’re trying to find out why something is called a functor, or a monoid, or a monad, etc.

I saw an applied category theory presentation that began with “A category is a graph …” That sweeps a lot under the rug, but it’s not a bad conceptual approximation.

So my advice to programmers learning category theory is to focus on the arrows in the diagrams. Think of them as functions; they probably are in your application [1]. Think of category theory as a framework for describing patterns. The rigorous foundations can be postponed, perhaps indefinitely, just as an 8-year-old child doesn’t need to know measure theory to begin understanding area.

[1] The term “contravariant functor” has unfortunately become deprecated. In more modern presentations, all functors are covariant, but some are covariant in an opposite category. That does make the presentation more slick, but at the cost of turning arrows around that used to represent functions and now don’t really. In my opinion, category theory would be more approachable if we got rid of all “opposite categories” and said that functors come in two flavors, covariant and contravariant, at least in introductory presentations.

The post Category theory for programmers made easier first appeared on John D. Cook.

]]>The post Is every number a random Fibonacci number? first appeared on John D. Cook.

]]>*f*_{1} = *f*_{2} = 1,

and

*f*_{n} = *f*_{n-1} ± *f*_{n-2}

for *n* > 2, where the sign is chosen randomly to be +1 or -1.

**Conjecture**: Every integer can appear in a random Fibonacci sequence.

Here’s why I believe this might be true. The values in a random Fibonacci sequence of length *n* are bounded between –*F*_{n-3} and *F*_{n}.[1] This range grows like *O*(φ^{n}) where φ is the golden ratio. But the number of ways to pick + and – signs in a random Fibonacci sequence equals 2^{n}.

By the pigeon hole principle, some choices of signs must lead to the same numbers: if you put 2^{n} balls in φ^{n} boxes, some boxes get more than one ball since φ < 2. That’s not quite rigorous since the range is *O*(φ^{n}) rather than exactly φ^{n}, but that’s the idea. The graph included in the previous post shows multiple examples where different random Fibonacci sequences overlap.

Now the pigeon hole principle doesn’t show that the conjecture is true, but it suggests that there could be enough different sequences that it might be true. The fact that the ratio of balls to boxes grows exponentially doesn’t hurt either.

**Empirically**, it appears that as you look at longer and longer random Fibonacci sequences, gaps in the range are filled in.

The following graphs consider all random Fibonacci sequences of length *n*, plotting the smallest positive integer and the largest negative integer not in the range. For the negative integers, we take the absolute value. Both plots are on a log scale.

First positive number missing:

Absolute value of first negative number missing:

The span between the largest and smallest possible random Fibonacci sequence value is growing exponentially with *n*, and the run of consecutive integers covered by the range is apparently also growing exponentially with *n*.

The following Python code was used to explore the gaps.

```python
import numpy as np
from itertools import product

def random_fib_range(N):
    r = set()
    x = np.ones(N, dtype=int)
    for signs in product((-1, 1), repeat=N-2):
        for i in range(2, N):
            b = signs[i-2]
            x[i] = x[i-1] + b*x[i-2]
            r.add(x[i])
    return sorted(list(r))

def stats(r):
    zero_location = r.index(0)
    # r is sorted, so these are the min and max values
    neg_gap = r[0]   # minimum
    pos_gap = r[-1]  # maximum
    for i in range(zero_location-1, -1, -1):
        if r[i] != r[i+1] - 1:
            neg_gap = r[i+1] - 1
            break
    for i in range(zero_location+1, len(r)):
        if r[i] != r[i-1] + 1:
            pos_gap = r[i-1] + 1
            break
    return (neg_gap, pos_gap)

for N in range(5, 25):
    r = random_fib_range(N)
    print(N, stats(r))

**Update**: Nathan Hannon gives a simple proof of the conjecture by induction in the comments.

You can create the sequences (1, 2) and (1, 3). Now assume you can create (1, *n*). Then you can create (1, *n*+2) via (1, *n*, *n*+1, 1, *n*+2). So you can create any positive even number starting from (1, 2) and any positive odd number from (1, 3).

You can do something analogous for negative numbers via (1, *n*, *n*-1, -1, *n*-2, *n*-3, -1, 2-*n*, 3-*n*, 1, *n*-2).

This proof can be used to create an upper bound on the time required to hit a given integer, and a lower bound on the probability of hitting a given integer during a random Fibonacci sequence.

Nathan’s construction requires more steps to produce new negative numbers, but that is consistent with the range of random Fibonacci sequences being wider on the positive side, [-*F*_{n-3}, *F*_{n}].
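Nathan’s construction is concrete enough to code up. Here’s a sketch (function names are mine) that builds, for any positive integer, a valid random Fibonacci sequence containing it, and verifies the ± rule along the way.

```python
def seq_containing(m):
    """Build a random Fibonacci sequence (with the signs chosen
    deliberately, not randomly) containing the positive integer m,
    following Nathan's induction: from a sequence ending in the pair
    (1, n), appending n+1, 1, n+2 yields one ending in (1, n+2)."""
    if m == 1:
        return [1, 1]
    # base sequences ending in the pairs (1, 2) and (1, 3)
    seq = [1, 1, 2] if m % 2 == 0 else [1, 1, 2, 1, 3]
    while seq[-1] < m:
        n = seq[-1]
        seq += [n + 1, 1, n + 2]
    return seq

def is_random_fib(seq):
    """Verify each term is the previous term plus or minus the one before."""
    return all(seq[i] in (seq[i-1] + seq[i-2], seq[i-1] - seq[i-2])
               for i in range(2, len(seq)))

s = seq_containing(10)
print(s)  # ends in 10
assert is_random_fib(s) and 10 in s
```

Since each extension adds three terms to increase the target by two, this also gives the linear upper bound on hitting time mentioned above.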

***

[1] To minimize the random Fibonacci sequence, you can choose the signs so that the values are 1, 1, 0, -1, -1, -2, -3, -5, … Note that the absolute value of this sequence is the ordinary Fibonacci sequence with 3 extra terms spliced in. That’s why the lower bound is –*F*_{n-3}.

The post Is every number a random Fibonacci number? first appeared on John D. Cook.

]]>The post Random Fibonacci numbers first appeared on John D. Cook.

]]>*F*_{n} = *F*_{n-1} + *F*_{n-2}.

A random Fibonacci sequence *f* is defined similarly, except the addition above is replaced with a subtraction with probability 1/2. That is, *f*_{1} = *f*_{2} = 1, and for *n* > 2,

*f*_{n} = *f*_{n-1} + *b* *f*_{n-2}

where *b* is +1 or -1, each with equal probability.

Here’s a graph of three random Fibonacci sequences.

Here’s the Python code that was used to produce the sequences above.

```python
import numpy as np

def rand_fib(length):
    f = np.ones(length)
    for i in range(2, length):
        b = np.random.choice((-1, 1))
        f[i] = f[i-1] + b*f[i-2]
    return f

It’s easy to see that the *n*th random Fibonacci number can be as large as the *n*th ordinary Fibonacci number if all the signs happen to be positive. But the numbers are typically much smaller.

The *n*th (ordinary) Fibonacci number asymptotically approaches φ^{n}/√5, where φ is the golden ratio, φ = (1 + √5)/2 = 1.618…

Another way to say this is that

The *n*th random Fibonacci number does not have an asymptotic value—it wanders randomly between positive and negative values—but with probability 1, the absolute values satisfy |*f*_{n}|^{1/n} → 1.1319882… as *n* → ∞.

This was proved in 1960 [1].

Here’s a little Python code showing by simulation that we get results consistent with this theorem.

```python
N = 500
x = [abs(rand_fib(N)[-1])**(1/N) for _ in range(10)]
print(f"{np.mean(x)} ± {np.std(x)}")

This produced

1.1323 ± 0.0192

which includes the theoretical value 1.1320.

**Update**: The next post looks at whether every integer appears in a random Fibonacci sequence. Empirical evidence suggests the answer is yes.

[1] Furstenberg and Kesten. Products of random matrices. Ann. Math. Stat. 31, 457-469.

The post Random Fibonacci numbers first appeared on John D. Cook.

]]>The post Edsger Dijkstra, blogger first appeared on John D. Cook.

]]>I’ve been thinking about Edsger Dijkstra lately because I suspect some of the ideas he developed will be useful for a project I’m working on.

While searching for some of Dijkstra’s writings I ran across the article Edsger Dijkstra: The Man Who Carried Computer Science on His Shoulders. It occurred to me while reading this article that Dijkstra was essentially a blogger before there were blogs.

Here is a description of his writing from the article:

… Dijkstra’s research output appears respectable, but otherwise unremarkable by current standards. In this case, appearances are indeed deceptive. Judging his body of work in this manner misses the mark completely. Dijkstra was, in fact, a highly prolific writer, albeit in an unusual way.

In 1959, Dijkstra began writing a series of private reports. Consecutively numbered and with his initials as a prefix, they became known as EWDs. He continued writing these reports for more than forty years. The final EWD, number 1,318, is dated April 14, 2002. In total, the EWDs amount to over 7,700 pages. Each report was photocopied by Dijkstra himself and mailed to other computer scientists.

His large collection of small articles sounds a lot like a blog to me.

You can find Dijkstra’s “blog” here.

The post Edsger Dijkstra, blogger first appeared on John D. Cook.

]]>The post Gruntled vs disgruntled first appeared on John D. Cook.

]]>Here’s a comparison of the frequency of *gruntled* vs *disgruntled* from 1860 to 2000.

In 2000, *disgruntled* was about 200x more common than *gruntled* in the books in Google’s English corpus.

But if you look further back, *gruntled* was used a little more often.

But it turns out that the people who were gruntled in the 19th century were chiefly British. If we look at just the American English corpus, no one was gruntled.

There’s a rise in the frequency of disgruntled as you look backward from 1815, which prompted me to look further back. Looking at just the American English corpus, a lot of people were disgruntled between 1766 and 1776 for some reason.

The post Gruntled vs disgruntled first appeared on John D. Cook.

]]>The post Doing well first appeared on John D. Cook.

]]>I’m busy, though my rate of blogging is fairly independent of how busy I am. Sometimes being busy gives me lots of ideas of things to blog about.

Many small businesses have been crushed this year, but I’m grateful that my business has grown despite current events. For now I have all the work I care to do, and a promising stream of projects in the pipeline. Of course things could change suddenly, but ever was it so.

On a more personal note, my family is also doing well and growing. My daughter is getting married soon. It will be a small wedding with live streaming, quite different from our latest family wedding but we’re looking forward to it just as much.

The post Doing well first appeared on John D. Cook.

]]>The post More fun with quatrefoils first appeared on John D. Cook.

]]>*r* = *a* + |cos(2θ)|

Here are some examples of how these curves look for varying values of *a*.

As *a* increases, the curves get rounder. We can quantify this by looking at the angle between the tangents on either side of the cusps. By symmetry, we can pick any one of the four cusps, so we’ll work with the one at θ = π/4 for convenience.

The slopes of the tangent lines are the left and right derivatives

Now the derivative of

*a* + |cos(2θ)|

with respect to θ at θ = π/4 is 2 from one side and -2 from the other.

Since sine and cosine are equal at π/4, they cancel out in the ratio above, and so the two derivatives, the slopes of the two tangent lines, are (2+*a*)/(2-*a*) and (2-*a*)/(2+*a*). The slopes are reciprocals of each other, which is what we’d expect since the quatrefoils are symmetric about the line θ = π/4.

The angles of the two tangent lines are the inverse tangents of the slopes, and so the angle between the two tangent lines is

Note that as *a* goes to zero, so does the angle between the tangent lines.

Here’s a plot of the angle as a function of *a*.

You could start with a desired angle and solve the equation above numerically for the value of *a* that gives the angle. From the graph above, it looks like if we wanted the curves to intersect at 90° we should pick *a* around 2. In fact, we should pick *a* exactly equal to 2. There the slopes are (2+2)/(2-2) = ∞ and (2-2)/(2+2) = 0, i.e. one tangent line is perfectly vertical and the other is perfectly horizontal.
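To check the numbers, here’s a quick sketch (my code) computing the angle between the tangent lines from the two slopes derived above. It avoids *a* = 2 itself, where one slope blows up and the angle is exactly 90°.

```python
from math import atan, degrees

def tangent_angle(a):
    """Angle in degrees between the tangent lines at the cusp θ = π/4 of
    r = a + |cos(2θ)|, from the slopes (2+a)/(2-a) and (2-a)/(2+a)."""
    m1 = (2 + a) / (2 - a)  # slope from one side (blows up at a = 2)
    m2 = (2 - a) / (2 + a)  # slope from the other side
    return degrees(abs(atan(m1) - atan(m2)))

for a in (0.5, 1.0, 1.5, 1.99):
    print(a, tangent_angle(a))
```

At *a* = 1 the angle works out to atan(4/3) ≈ 53.13°, and as *a* approaches 2 the angle approaches 90°, consistent with the discussion above.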

The post More fun with quatrefoils first appeared on John D. Cook.

]]>The post The word problem first appeared on John D. Cook.

]]>The word problem is essentially about whether you can always apply algebraic rules in an automated way. The reason it is called the word problem is that you start with a description of your algebraic system in terms of symbols (“letters”) and concatenations of symbols (“words”) subject to certain rules, also called relations.

For example, you can describe a group by saying it contains *a* and *b*, and it satisfies the relations

*a*² = *b*²

and

*a*^{-1}*b**a* = *b*^{-1}.

A couple things are implicit here. We’ve said this is a group, and since every element in a group has an inverse, we’ve implied that *a*^{-1} and *b*^{-1} are in the group as well. Also from the definition of a group comes the assumption that multiplication is associative, that there’s an identity element, and that inverses work like they’re supposed to.

In the example above, you could derive everything about the group from the information given. In particular, someone could give you two words—strings made up of *a*, *b*, *a*^{-1}, and *b*^{-1}—and you could determine whether they are equal by applying the rules. But in general, this is not possible for groups.

In computer science terminology, the word problem is *undecidable*. There is no algorithm that can tell whether two words are equal given a list of relations, at least not in general. There are special cases where the word problem is solvable, but a general algorithm is not possible.

I presented the word problem above in the context of groups, but you could look at the word problem in more general contexts, such as semigroups. A semigroup is closed under some associative binary operation, and that’s it. There need not be any inverses or even an identity element.

Here’s a concrete example of a semigroup whose word problem has been proven to be undecidable. As before we have two symbols, *a* and *b*. And because we are in a semigroup, not a group, there are no inverses. Our semigroup consists of all finite sequences made out of *a*‘s and *b*‘s, subject to these five relations.

*a**b**a*^{2}*b*^{2} = *b*^{2}*a*^{2}*b**a*

*a*^{2}*b**a**b*^{2}*a* = *b*^{2}*a*^{3}*b**a*

*a**b**a*^{3}*b*^{2} = *a**b*^{2}*a**b**a*^{2}

*b*^{3}*a*^{2}*b*^{2}*a*^{2}*b**a* = *b*^{3}*a*^{2}*b*^{2}*a*^{4}

*a*^{4}*b*^{2}*a*^{2}*b**a* = *b*^{2}*a*^{4}

Source: Term Rewriting and All That
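Even though no algorithm decides equality here, you can still search. Below is a sketch (my code, not from the book) of a breadth-first search over rewrites, with the five relations above encoded as strings. Because the word problem for this semigroup is undecidable, this can only *semi-decide* equality: it halts with True when two words are provably equal, and otherwise gives up after a step budget.

```python
from collections import deque

def equal_words(w1, w2, relations, max_steps=10_000):
    """Semi-decide w1 = w2: breadth-first search that replaces any
    occurrence of one side of a relation with the other side.
    Returns True if a rewrite path from w1 to w2 is found, and None
    if the step budget runs out (equality left undecided)."""
    rules = [(l, r) for l, r in relations] + [(r, l) for l, r in relations]
    seen, queue, steps = {w1}, deque([w1]), 0
    while queue and steps < max_steps:
        w = queue.popleft()
        if w == w2:
            return True
        steps += 1
        for lhs, rhs in rules:
            start = 0
            while (i := w.find(lhs, start)) != -1:
                new = w[:i] + rhs + w[i + len(lhs):]
                if new not in seen:
                    seen.add(new)
                    queue.append(new)
                start = i + 1
    return None

# The five semigroup relations above, written as strings
rels = [("abaabb", "bbaaba"), ("aababba", "bbaaaba"),
        ("abaaabb", "abbabaa"), ("bbbaabbaaba", "bbbaabbaaaa"),
        ("aaaabbaaba", "bbaaaa")]
print(equal_words("abaabb", "bbaaba", rels))  # True, in one rewrite
```

This is the general trade-off with undecidable problems: you can’t have an algorithm that always answers, but you can have a procedure that often does.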

When I first saw groups presented this way, as symbols and relations, I got my hopes up that a large swath of group theory could be automated. A few minutes later my naive hopes were dashed. So in my mind I thought “Well, then this is hopeless.”

But that is not true. Sometimes the word problem *is* solvable. It’s like many other impossibility theorems. There’s no fifth degree analog of the quadratic formula *in general*, but there are fifth degree polynomials whose roots can be found in closed form. There’s no program that can tell whether *any arbitrary program* will halt, but that doesn’t mean you can’t tell whether *some* programs halt.

It didn’t occur to me at the time that it would be worthwhile to explore the boundaries, learning which word problems can or cannot be solved. It also didn’t occur to me that I would run into things like the word problem in practical applications, such as simplifying symbolic expressions and optimizing their evaluation. Undecidable problems lurk everywhere, but you can often step around them.

The post The word problem first appeared on John D. Cook.

]]>The post Real-time analytics first appeared on John D. Cook.

]]>Whom the gods would destroy, they first give real-time analytics.

Having more up-to-date information is only valuable up to a point. Past that point, you’re more likely to be distracted by noise. The closer you look at anything, the more irregularities you see, and the more likely you are to over-steer [1].

I don’t mean to imply that the noise isn’t real. (More on that here.) But there’s a temptation to pay more attention to the small variations you don’t understand than the larger trends you believe you do understand.

I became aware of this effect when simulating Bayesian clinical trial designs. The more often you check your stopping rule, the more often you will stop [2]. You want to monitor a trial often enough to shut it down, or at least pause it, if things change for the worse. But monitoring too often can cause you to stop when you don’t want to.

A long time ago I wrote about the graph below.

The graph looks awfully jagged, until you look at the vertical scale. The curve represents the numerical difference between two functions that are exactly equal in theory. As I explain in that post, the curve is literally smoother than glass, and certainly flatter than a pancake.

[1] See The Logic of Failure for a discussion of how over-steering is a common factor in disasters such as the Chernobyl nuclear failure.

[2] Bayesians are loath to talk about things like α-spending, but when you’re looking at stopping *frequencies*, frequentist phenomena pop up.

The post Real-time analytics first appeared on John D. Cook.

]]>The post Naive modeling first appeared on John D. Cook.

]]>In his book The Algorithm Design Manual, Steven Skiena has several sections called “War Stories” where he talks about his experience designing algorithms for clients.

Here’s an excerpt of a story about finding the best airline ticket prices.

“Look,” I said at the start of the first meeting. “This can’t be so hard. Consider a graph … The path/fare can be found with Dijkstra’s shortest path algorithm. Problem solved!” I announced waving my hand with a flourish.

The assembled cast of the meeting nodded thoughtfully, then burst out laughing.

Skiena had greatly underestimated the complexity of the problem, but he learned, and was able to deliver a useful solution.

This reminds me of a story about a calculus professor who wrote a letter to a company that sold canned food explaining how they could use less metal for the same volume by changing the dimensions of their can. Someone wrote back thanking him for his suggestion listing reasons why the optimization problem was far more complicated than he had imagined. If anybody has a link to that story, please let me know.

**Related post**: Bring out your equations!

The post Naive modeling first appeared on John D. Cook.

]]>The post Opening Windows files from bash and eshell first appeared on John D. Cook.

]]>On the Windows command line, you can type the name of a file and Windows will open the file with the default application associated with its file extension. For example, typing `foo.docx` and pressing Enter will open the file by that name using Microsoft Word, assuming that is your default application for `.docx` files.

Unix shells don’t work that way. The first thing you type at the command prompt must be a command, and `foo.docx` is not a command. The Windows command line generally works this way too, but it makes an exception for files with recognized extensions; the command is inferred from the extension and the file name is an argument to that command.

When you’re running bash on Windows, via WSL (Windows Subsystem for Linux), you can run the Windows utility `start`, which will open a file according to its extension. For example,

cmd.exe /C start foo.pdf

will open the file `foo.pdf` with your default PDF viewer.

You can also use `start` to launch applications without opening a particular file. For example, you could launch Word from bash with

cmd.exe /C start winword.exe

Eshell is a shell written in Emacs Lisp. If you’re running Windows and you do not have access to WSL but you do have Emacs, you can run eshell inside Emacs for a Unix-like environment.

If you try running

start foo.pdf

that will probably not work because eshell does not use the Windows PATH environment variable.

I got around this by creating a Windows batch file named `mystart.bat` and putting it in my path. The batch file simply calls `start` with its argument:

start %1

Now I can open `foo.pdf` from eshell with

mystart foo.pdf

The solution above for bash

cmd.exe /C start foo.pdf

also works from eshell.

(I just realized I said two contradictory things: that eshell does not use your path, and that it found a batch file in my path. I don’t know why the latter works. I keep my batch files in `c:/bin`, which is a Unix-like location, and maybe eshell looks there, not because it’s in my Windows path, but because it’s in what it would expect to be my path based on Unix conventions. I’ve searched the eshell documentation, and I don’t see how to tell what it uses for a path.)

The post Opening Windows files from bash and eshell first appeared on John D. Cook.

]]>The post Generating all primitive Pythagorean triples with linear algebra first appeared on John D. Cook.

]]>*a*² + *b*² = *c*².

A primitive Pythagorean triple (PPT) is a Pythagorean triple whose elements are relatively prime. For example, (50, 120, 130) is a Pythagorean triple, but it’s not primitive because all the entries are divisible by 10. But (5, 12, 13) is a primitive Pythagorean triple.

A method of generating all PPTs has been known since the time of Euclid, but I recently ran across a different approach to generating all PPTs [1].

Let’s standardize things a little by assuming our triples have the form (*a*, *b*, *c*) where *a* is odd, *b* is even, and *c* is the hypotenuse [2]. In every PPT one of the sides is even and one is odd, so we will assume the odd side is listed first.

It turns out that all PPTs can be found by multiplying the column vector [3, 4, 5] repeatedly by matrices *M*_{0}, *M*_{1}, or *M*_{2}. In [1], Romik uses the sequence of matrix multiplications needed to create a PPT as a trinary number associated with the PPT.

The three matrices are given as follows.

Note that all three matrices have the same entries, though with different signs. If you number the columns starting at 1 (as mathematicians commonly do and computer scientists may not) then *M*_{k} is the matrix whose *k*th column is negative. There is no 0th column, so *M*_{0} is the matrix with no negative columns. The numbering I’ve used here differs from that used in [1].

For example, the primitive Pythagorean triple [5, 12, 13] is formed by multiplying [3, 4, 5] on the left by *M*_{2}. The PPT [117, 44, 125] is formed by multiplying [3, 4, 5] by

*M*_{1} *M*_{1} *M*_{2}.
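Here’s a quick sketch in Python (my code; the explicit matrix entries are reconstructed from the column-negation description above, since the images aren’t shown here) that reproduces both examples and spot-checks that the products really are PPTs.

```python
import numpy as np
from itertools import product
from math import gcd

# Base matrix with all positive entries (M0); M1 and M2 negate the
# first and second columns respectively.
M0 = np.array([[1, 2, 2], [2, 1, 2], [2, 2, 3]])
M = [M0, M0 * np.array([-1, 1, 1]), M0 * np.array([1, -1, 1])]

root = np.array([3, 4, 5])
print((M[2] @ root).tolist())                # [5, 12, 13]
print((M[1] @ M[1] @ M[2] @ root).tolist())  # [117, 44, 125]

# Spot check: every product of up to three of these matrices applied to
# (3, 4, 5) is again a PPT with odd first component.
for ks in product(range(3), repeat=3):
    t = root
    for k in ks:
        t = M[k] @ t
    a, b, c = (int(x) for x in t)
    assert a*a + b*b == c*c and a % 2 == 1 and gcd(a, b) == 1
```

Enumerating all sign/trinary strings of length *n* walks the entire tree of PPTs down to depth *n*.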

[1] The dynamics of Pythagorean triples by Dan Romik

[2] Either *a* is odd and *b* is even or vice versa, so we let *a* be the odd one.

If *a* and *b* were both even, *c* would be even, and the triple would not be primitive. If *a* and *b* were both odd, *c*² would be divisible by 2 but not by 4, and so it couldn’t be a square.

The post Generating all primitive Pythagorean triples with linear algebra first appeared on John D. Cook.

]]>The post Playing around with a rational rose first appeared on John D. Cook.

]]>*r* = cos(*k*θ)

where *k* is a positive integer. If *k* is odd, the resulting graph has *k* “petals” and if *k* is even, the plot has 2*k* petals.

Sometimes the term *rose* is generalized to the case of non-integer *k*. This is the sense in which I’m using the phrase “rational rose.” I’m not referring to an awful piece of software by that name [1]. This post will look at a particular rose with *k* = 2/3.

My previous post looked at

*r* = cos(2θ/3)

and gave the plot below.

Unlike the case where *k* is an integer, the petals overlap.

In this post I’d like to look at two things:

- The curvature in the figure above, and
- Differences between polar plots in Python and Mathematica

The graph above has radius 1 since cosine ranges from -1 to 1. The curve is made of arcs that are approximately circular, with the radius of these approximating circles being roughly 1/2, sometimes bigger and sometimes smaller. So we would expect the curvature to oscillate roughly around 2. (The curvature of a circle of radius *r* is 1/*r*.)

The curvature can be computed in Mathematica as follows.

```mathematica
numerator = D[x[t], {t, 1}] D[y[t], {t, 2}] - D[x[t], {t, 2}] D[y[t], {t, 1}]
denominator = (D[x[t], t]^2 + D[y[t], t]^2)^(3/2)
Simplify[numerator / denominator]

This produces

A plot shows that the curvature does indeed oscillate roughly around 2.

The minimum curvature is 13/9, which the curve takes on at polar coordinate (1, 0), as well as at other points. That means that the curve starts out like a circle of radius 9/13 ≈ 0.7.

The maximum curvature is 3 and occurs at the origin. There the curve is approximately a circle of radius 1/3.
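The minimum and maximum can be checked numerically. Here’s a sketch (my code) using the standard polar curvature formula κ = (r² + 2r′² − r r″)/(r² + r′²)^{3/2}:

```python
import numpy as np

def curvature(theta):
    """Curvature of the polar curve r = cos(2θ/3), via the polar formula
    κ = (r² + 2 r'² − r·r'') / (r² + r'²)^(3/2)."""
    r = np.cos(2*theta/3)
    r1 = -(2/3)*np.sin(2*theta/3)   # dr/dθ
    r2 = -(4/9)*np.cos(2*theta/3)   # d²r/dθ²
    return (r**2 + 2*r1**2 - r*r2) / (r**2 + r1**2)**1.5

print(curvature(0.0))          # minimum, 13/9 = 1.444...
print(curvature(3*np.pi/4))    # maximum, 3 (the curve passes through the origin)
```

At θ = 0 the formula reduces to (1 + 4/9)/1 = 13/9, and at θ = 3π/4 (where r = 0) it reduces to 2r′²/|r′|³ = 3, matching the values above.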

To make the plot we’ve been focusing on, I plotted

*r* = cos(2θ/3)

in Mathematica, but in matplotlib I had to plot

*r* = |cos(2θ/3)|.

In both cases, θ runs from 0 to 8π. To highlight the differences in the way the two applications make polar plots, let’s plot over 0 to 2π with both.

Mathematica produces what you might expect.

PolarPlot[Cos[2 t/3], {t, 0, 2 Pi}]

Matplotlib produces something very different. It handles negative *r* values by moving the point *r* = 0 to a circle in the middle of the plot. Notice the *r*-axis labels at about 22° running from -1 to 1.

```python
import matplotlib.pyplot as plt
from numpy import linspace, cos, pi

theta = linspace(0, 2*pi, 1000)
plt.polar(theta, cos(2*theta/3))

Note also that in Mathematica, the first argument to `PolarPlot` is *r*(θ) and the second is the limits on θ. In matplotlib, the first argument is θ and the second argument is *r*(θ).

Note that in this particular example, taking the absolute value of the function being plotted was enough to make matplotlib act like I expected. That only happened to work when plotting over the entire range 0 to 8π. In general you have to do more work than this. If we insert absolute value in the plot above, still plotting from 0 to 2π, we do not reproduce the Mathematica plot.

plt.polar(theta, abs(cos(2*theta/3)))

[1] Rational Rose was horribly buggy when I used it in the 1990s. Maybe it’s not so buggy now. But I imagine I still wouldn’t like the UML-laden style of software development it was built around.

The post Playing around with a rational rose first appeared on John D. Cook.

]]>The post Quatrefoils first appeared on John D. Cook.

]]>There’s no single shape known as a quatrefoil. It’s a family of shapes that look something like the figure above.

I wondered how you might write a fairly simple mathematical equation to draw a quatrefoil. Some quatrefoils are just squares with semicircles glued on their edges. That’s no fun.

Here’s a polar equation I came up with that looks like a quatrefoil, if you ignore the interior lines.

This is the plot of *r* = cos(2θ/3).

**Update**: Based on a suggestion in the comments, I’ve written another post on quatrefoils using an equation that has a parameter to control the shape.

The post Quatrefoils first appeared on John D. Cook.

]]>The post Kronecker sum first appeared on John D. Cook.

]]>In the process of looking around for other matrix products, I ran across the **Kronecker sum**. I’ve seen Kronecker *products* many times, but I’d never heard of Kronecker *sums.*

The Kronecker sum is defined in terms of the Kronecker product, so if you’re not familiar with the latter, you can find a definition and examples here. Essentially, you multiply each scalar element of the first matrix by the second matrix *as a block matrix*.

The Kronecker product of an *m* × *n* matrix *A* and a *p* × *q* matrix *B* is an *mp* × *nq* matrix *K* = *A* ⊗ *B*. You could think of *K* as an *m* × *n* matrix whose entries are *p* × *q* blocks.

So, what is the Kronecker sum? It is defined for two square matrices, an *n* × *n* matrix *A* and an *m* × *m* matrix *B*. The sizes of the two matrices need not match, but the matrices do need to be square. The Kronecker sum of *A* and *B* is

*A* ⊕ *B* = *A* ⊗ *I*_{m} + *I*_{n} ⊗ *B*

where *I*_{m} and *I*_{n} are identity matrices of size *m* and *n* respectively.

Does this make sense dimensionally? The left side of the (ordinary) matrix addition is *nm* × *nm*, and so is the right side, so the addition makes sense.

However, the Kronecker sum is not commutative, and usually things called “sums” are commutative. Products are not always commutative, but it goes against convention to call a non-commutative operation a sum. Still, the Kronecker sum is kinda like a sum, so it’s not a bad name.

I don’t have any application in mind (*yet*) for the Kronecker sum, but presumably it was defined for a good reason, and maybe I’ll run across an application, maybe even on the project alluded to at the beginning.

There are several identities involving Kronecker sums, and here’s one I found interesting:

exp( *A* ) ⊗ exp( *B* ) = exp( *A *⊕ *B* ).

If you haven’t seen the exponential of a matrix before, basically you stick your matrix into the power series for the exponential function.

First, let’s define a couple matrices *A* and *B*.

We can compute the Kronecker sums

*S* = *A* ⊕ *B*

and

*T* = *B* ⊕ *A*

with Mathematica to show they are different.

```mathematica
A = {{1, 2}, {3, 4}}
B = {{1, 0, 1}, {1, 2, 0}, {2, 0, 3}}
S = KroneckerProduct[A, IdentityMatrix[3]] + KroneckerProduct[IdentityMatrix[2], B]
T = KroneckerProduct[B, IdentityMatrix[2]] + KroneckerProduct[IdentityMatrix[3], A]

This shows

and so the two matrices are not equal.

We can compute the matrix exponentials of *A* and *B* with the Mathematica function `MatrixExp` to see that

(I actually used `MatrixExp[N[A]]` and similarly for *B* so Mathematica would compute the exponentials numerically rather than symbolically. The latter takes forever and it’s hard to read the result.)

Now we have

and so the two matrices are equal.
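The same computations can be repeated in Python as a quick check. Here’s a sketch, assuming NumPy and SciPy (whose `expm` computes the matrix exponential) are available:

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential

def kron_sum(A, B):
    """Kronecker sum A ⊕ B = A ⊗ I_m + I_n ⊗ B for square A (n×n), B (m×m)."""
    n, m = A.shape[0], B.shape[0]
    return np.kron(A, np.eye(m)) + np.kron(np.eye(n), B)

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[1., 0., 1.], [1., 2., 0.], [2., 0., 3.]])

S = kron_sum(A, B)
T = kron_sum(B, A)
print(np.allclose(S, T))                                 # False: not commutative
print(np.allclose(np.kron(expm(A), expm(B)), expm(S)))   # True: the identity holds
print(np.allclose(np.kron(expm(B), expm(A)), expm(T)))   # True, with roles swapped
```

The identity works because *A* ⊗ *I* and *I* ⊗ *B* commute, so the exponential of their sum factors into the product of their exponentials.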

Even though the identity

exp( *A* ) ⊗ exp( *B* ) = exp( *A *⊕ *B* )

may look symmetrical, it’s not. The matrices on the left do not commute in general. And not only are *A* ⊕ *B* and *B* ⊕ *A* different in general, their exponentials are also different. For example

The post Kronecker sum first appeared on John D. Cook.

]]>