Advanced questions about a basic diagram

Unit circle trig diagram

I saw a hand-drawn version of the diagram above yesterday and noticed that the points were too evenly distributed. That got me to thinking: is there any objective way to say that this famous diagram is in some sense complete? If you were to make a diagram with more points, what would they be?

Simple numbers

The numbers on the diagram are all simple. Once we’re more precise about what it means for these numbers to be “simple,” we can answer the questions above.

The angles in the diagram are all rational parts of a circle, that is, rational multiples of 2π. For the rest of the post, I’ll say “rational angle” to mean a rational multiple of 2π.

The sines and cosines all involve only one square root, i.e. no nested roots. A more useful way to express this is that all the values are the roots of a quadratic polynomial with integer coefficients.

Completeness

Could we add more rational angles whose sines and cosines are roots of quadratics? Maybe the chart would be too cluttered to put in a textbook, but would it be possible in principle? Could there be some chart analogous to the one above that has, for example, (1 + √7)/5 as one of the labels?

The angles in the common unit circle diagram, integer multiples of π/4 and π/6, are the only rational angles with sines and cosines that are roots of a quadratic polynomial with integer coefficients. That is, these are the only rational angles that have sines and cosines that are algebraic of degree 2. In that sense the diagram is complete.

The number (1 + √7)/5 is algebraic of degree 2 [1] but isn’t on our exhaustive list of possible algebraic values of degree 2. So even if you were to try numbers of the form pπ/q for very large integers p and q, you’ll never get a sine or cosine equal to (1 + √7)/5.

In 1933 Lehmer [2] showed how to classify all rational angles whose sines or cosines are algebraic of given degree. His theorem proves that the only rational angles whose sine is algebraic of degree 2 are integer multiples of π/4 and π/6.

Interestingly, there is another rational angle whose cosine is algebraic of degree 2:

cos(π/5) = (1 + √5)/4

So we could extend the unit circle diagram to include multiples of π/5, but only the cosine would be algebraic of degree 2. The sines are more complicated. For example,

sin(π/5) = √(5/8 − √5/8)

which is algebraic of degree 4.
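We can check these degrees with a computer algebra system. Here's a quick sketch using SymPy, assuming it's available.

from sympy import sin, cos, pi, minimal_polynomial
from sympy.abc import x

# minimal polynomials over the rationals
print(minimal_polynomial(cos(pi/5), x))  # 4*x**2 - 2*x - 1, degree 2
print(minimal_polynomial(sin(pi/5), x))  # 16*x**4 - 20*x**2 + 5, degree 4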

Higher degrees

There are no rational angles whose sine is algebraic of degree 3, so going up to degree 3 wouldn’t help.

If we go up to degree 4 then we could add multiples of π/5, π/8, and π/12. These all have sines and cosines that are algebraic of degree 4.
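Continuing the SymPy sketch above, we can confirm that the sines of these angles are algebraic of degree 4.

from sympy import sin, pi, minimal_polynomial
from sympy.abc import x

for d in [5, 8, 12]:
    print(minimal_polynomial(sin(pi/d), x))  # each is a quartic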


[1] (1 + √7)/5 is a root of 25x² − 10x − 6 = 0.

[2] D. H. Lehmer. A Note on Trigonometric Algebraic Numbers. The American Mathematical Monthly, March 1933, Vol. 40, No. 3, pp. 165–166.

The Borwein integrals

The Borwein integrals introduced in [1] are a famous example of how proof-by-example can go wrong.

Define sinc(x) as sin(x)/x. Then the following equations hold.

 \begin{align*} \int_0^\infty \text{sinc}(x) \,dx &= \frac{\pi}{2} \\ \int_0^\infty \text{sinc}(x) \, \text{sinc}\left(\frac{x}{3}\right) \,dx &= \frac{\pi}{2} \\ \int_0^\infty \text{sinc}(x)\, \text{sinc}\left(\frac{x}{3}\right) \,\text{sinc}\left(\frac{x}{5}\right) \,dx &= \frac{\pi}{2} \\ \vdots &\phantom{=} \\ \int_0^\infty \text{sinc}(x) \, \text{sinc}\left(\frac{x}{3}\right) \cdots \text{sinc}\left(\frac{x}{13}\right) \,dx &= \frac{\pi}{2} \\ \end{align*}

However

\int_0^\infty \text{sinc}(x) \, \text{sinc}\left(\frac{x}{3}\right) \cdots \text{sinc}\left(\frac{x}{15}\right) \,dx = \frac{\pi}{2} - \delta

where δ ≈ 2.3 × 10⁻¹¹.

This is where many presentations end, concluding with the moral that a pattern can hold for a while and then stop. But I’d like to go just a little further.

Define

B(n) = \int_0^\infty \prod_{k=0}^{n} \text{sinc}\left(\frac{x}{2k+1}\right) \, dx.

Then B(n) = π/2 for n = 0, 1, 2, …, 6 but not for n = 7, though it almost holds for n = 7. What happens for larger values of n?

The Borwein brothers proved that B(n) is a monotone function of n, and the limit as n → ∞ exists. In fact the limit is approximately π/2 − 0.0000352.

So while it would be wrong to conclude that B(n) = π/2 based on calculations for n ≤ 6, this conjecture would be approximately correct, never off by more than 0.0000352.
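As a rough numerical check of the n = 6 and n = 7 cases, here is a sketch using SciPy. The integral is truncated at a finite upper limit, and resolving a difference on the order of 10⁻¹¹ is near the limit of double precision quadrature, so treat the output as indicative rather than definitive.

import numpy as np
from scipy.integrate import quad

sinc = lambda x: np.sinc(x / np.pi)  # NumPy's sinc is sin(pi x)/(pi x)

def B(n, upper=500):
    f = lambda x: np.prod([sinc(x / (2*k + 1)) for k in range(n + 1)])
    return quad(f, 0, upper, limit=2000, epsabs=1e-13, epsrel=1e-13)[0]

print(np.pi/2 - B(6))  # essentially zero, up to quadrature error
print(np.pi/2 - B(7))  # roughly 2.3e-11, if the quadrature is accurate enough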

[1] David Borwein and Jonathan Borwein. Some Remarkable Properties of Sinc and Related Integrals. The Ramanujan Journal, 3, 73–89, 2001.

This-way-up and Knuth arrows

I was looking today at a cardboard box that had the “this way up” symbol on it and wondered whether there is a Unicode value for it.

ISO 7000 symbol 0623 This way up

Apparently not. But there is an ISO code for it: ISO 7000 symbol 0623. It’s an international standard symbol for indicating how to orient a package. The name says it all: this way up.

There is a similar symbol in math and computer science: ↑↑. This is so-called up-arrow notation, introduced by Donald Knuth in 1976 [1].

In Knuth’s notation, ↑ indicates exponentiation, i.e. repeated multiplication, and ↑↑ indicates repeated exponentiation. There’s a little ambiguity here: we have to clarify in what order we apply exponentiation. Knuth stipulated that ↑↑ is right-associative, i.e.

b \uparrow\uparrow n = \underbrace{b \uparrow (b \uparrow \cdots (b \uparrow b))}_{n \text{ copies of } b} = \underbrace{b^{b^{.\,^{.\,^{.\,^b}}}}}_{n \text{ copies of } b}

So, for example, 5 ↑↑ 3 equals 5^(5^5) = 5^3125, not (5^5)^5 = 5^25.

In general, n arrows means repeatedly applying n − 1 arrows. If we use ↑ⁿ as a shortcut for writing n up arrows, then we can define ↑ⁿ recursively: b ↑ⁿ m combines m copies of b using ↑ⁿ⁻¹, associating to the right.
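Here's a minimal sketch of that recursion in Python. The function name and argument order are mine, just for illustration.

def up(b, n, arrows):
    # one arrow is ordinary exponentiation
    if arrows == 1:
        return b**n
    # n copies of b combined right-to-left with one fewer arrow
    result = b
    for _ in range(n - 1):
        result = up(b, result, arrows - 1)
    return result

print(up(2, 4, 2))             # 2^(2^(2^2)) = 65536
print(up(5, 3, 2) == 5**5**5)  # True; ** is right-associative in Python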

What I find most interesting about Knuth’s notation is how rarely it is used. I don’t think it’s because anyone objects to Knuth’s notation; it’s just that there isn’t much need for what the notation represents. Its primary use may be in theoretical computer science. There you sometimes want to construct functions that grow ridiculously fast, such as Ackermann’s function, and functions of the form a ↑ⁿ b are good for that.

This is curious. Multiplication is repeated addition, exponentiation is repeated multiplication, and so repeated exponentiation seems like a natural extension. I won’t say that it’s unnatural, but it is very uncommon.


[1] Donald E. Knuth. “Mathematics and Computer Science: Coping with Finiteness”. Science. 194 (4271): 1235–1242.

Factoring pseudoprimes

Fermat’s little theorem says that if p is a prime number, then for any positive integer b < p we have

b^(p−1) = 1 (mod p).

This theorem gives a necessary but not sufficient condition for a number to be prime.

Fermat’s primality test

The converse of Fermat’s little theorem is not always true, but it’s often true. That is, if there exists some base 1 < b < n such that

b^(n−1) = 1 (mod n)

then n is likely to be prime. There are examples where the equation above holds for a pair (b, n) even though n is not prime, and in that case n is called a pseudoprime to the base b.

If you’re searching for large primes, say for use in encryption, then you’d begin by applying Fermat’s little theorem with a few small values of b. This is because although Fermat’s test can’t prove that a number is prime, it can prove that a number is not prime.

For a small example, suppose you wanted to test whether 50621 is prime. You could start by applying Fermat’s test with b = 2 as in the following Python code.

>>> n = 50621
>>> 2**(n-1) % n
9605

Since the result is not 1, we know 50621 is not prime. This doesn’t tell us what the factors of 50621 are, but we know that it has nontrivial factors. We say 2 is a witness that the number 50621 is not prime.

Next, let’s see whether 294409 might be prime.

>>> n = 294409
>>> 2**(n-1) % n
1

This tells us 294409 might be prime. It has passed a test that filters out a lot of composite numbers. What now? We could try other values of b: 3, 5, 7, 11, …. This will not resolve the question of whether 294409 is prime unless we keep going until we try 37. And in fact 37 is the smallest factor of 294409. Our number 294409 is a Carmichael number, a composite number n that passes Fermat’s primality test for all bases b relatively prime to n.

Note that it would be more efficient to use pow(b, n-1, n) rather than 2**(n-1) % n because the former takes advantage of the fact that we don’t need to compute 2^(n−1) per se and can reduce all intermediate calculations mod n.
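For example, using pow as just described, a quick scan of small bases shows that 37 is the first witness:

>>> n = 294409
>>> [b for b in range(2, 40) if pow(b, n-1, n) != 1]
[37]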

Factoring pseudoprimes

Now suppose we have a number n that has passed Fermat’s primality test for some base b and we suspect that n is a pseudoprime. If we want to (try to) factor n, knowing that it is a pseudoprime to the base b gives us a head start. We can exploit the fact that we know b to factor n in polynomial time, unless n is a strong pseudoprime.

Suppose we have a number n that we suspect is a pseudoprime to the base b, and we’re smart enough to at least check that n is an odd number. We begin by pulling out all the factors of 2 that we can from n − 1:

n − 1 = 2^e f.

Next consider the numbers

b^(kf) mod n

for k = 1, 2, 4, …, 2^e. Let x be the last of these numbers that is not congruent to 1 mod n. If n is not a strong pseudoprime to the base b, such an x exists and is not congruent to −1 mod n; this follows from the definition of strong pseudoprime [1].

Then gcd(x − 1, n) and gcd(x + 1, n) are nontrivial factors of n. This is Theorem 10.4 of [2].

Python example

Let n = 873181. This is a pseudoprime to the base b = 3, which we can confirm by seeing that pow(3, n-1, n) returns 1.

Now 873180 is divisible by 4 but not by 8, so e = 2 and f = 873180/4 = 218295. The theorem above says we should compute

>>> n = 873181
>>> b, e = 3, 2
>>> f = (n - 1) // 2**e
>>> [pow(b, f*2**k, n) for k in range(e+1)]

This produces [2643, 1, 1]. So x = 2643,

>>> x = 2643
>>> from sympy import gcd
>>> gcd(x-1, n)
1321
>>> gcd(x+1, n)
661

shows that 1321 and 661 are both factors of 873181.
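Putting the pieces together, here is a sketch of the whole procedure as one function. The function name and structure are mine, not from [2].

from math import gcd

def factor_pseudoprime(n, b):
    # n should be an odd pseudoprime to the base b
    assert n % 2 == 1 and pow(b, n - 1, n) == 1
    # write n - 1 = 2^e f with f odd
    e, f = 0, n - 1
    while f % 2 == 0:
        e, f = e + 1, f // 2
    # find the last term of b^f, b^(2f), ..., b^(2^e f) mod n that is not 1
    x = None
    for k in range(e + 1):
        t = pow(b, f * 2**k, n)
        if t == 1:
            break
        x = t
    if x is None or x == n - 1:
        return None  # n is a strong pseudoprime to the base b; the method fails
    return gcd(x - 1, n), gcd(x + 1, n)

print(factor_pseudoprime(873181, 3))  # (1321, 661)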


[1] Definition of strong pseudoprime. A strong pseudoprime to the base b is a composite odd integer m such that if m − 1 = 2^e f with f odd, then either b^f ≡ 1 (mod m) or b^(2^c f) ≡ −1 (mod m) for some 0 ≤ c < e.

[2] The Joy of Factoring by Samuel S. Wagstaff, Jr.

Is Low Precision Arithmetic Safe?

The popularity of low precision arithmetic for computing has exploded since the 2017 release of the Nvidia Volta GPU. The half precision tensor cores of Volta offered a massive 16X performance gain over double precision for key operations. The “race to the bottom” for lower precision computations continues: some have even solved significant problems using 1-bit precision arithmetic hardware ([1], [2]). And hardware performance is getting even better: the Nvidia H100 tensor core-enabled FP16 is a full 58X faster than standard FP64, and 1-bit precision is yet another 16X faster than this, for total speedup of over 900X for algorithms that can use it [3].

This eye-popping speedup certainly draws attention. However, in scientific computing, low precision arithmetic has typically been seen as unsafe for modeling and simulation codes. Indeed, lower precision can sometimes be used to advantage [4], commonly in a “mixed precision” setting in which only parts of the calculation are done in low precision. However, in general anything less than double precision is considered inadequate to model complex physical phenomena with fidelity (see, e.g., [5]).
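As a toy illustration, not taken from the references, of how easily information is lost: machine epsilon for IEEE half precision is about 10⁻³, so a modest correction to a quantity of order one can disappear entirely.

import numpy as np

print(np.float16(1.0) + np.float16(1e-4))  # 1.0 -- the correction is lost
print(np.float64(1.0) + np.float64(1e-4))  # 1.0001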

In response, developers have created tools to measure the safety of reduced precision arithmetic in application codes [6]. Some tools can even identify which variables or arrays can be safely demoted to lower precision without loss of accuracy in the final result. However, use of these tools in a blind fashion, not backed by some kind of reasoning process, can be hazardous.

An example will illustrate this. The conjugate gradient method for linear system solving and optimization [7] and the closely related Lanczos method for eigenvalue problem solving [8] showed great promise following their invention in the early 1950s. However, they were considered unsafe due to catastrophic roundoff errors under floating point arithmetic—even more pronounced as floating point precision is reduced. Nonetheless, Chris Paige showed in his pioneering work in the 1970s [9] that the roundoff error, though substantial, did not preclude the usefulness of the methods when properly used. The conjugate gradient method has gone on to become a mainstay in scientific computing.

Notice that no tool could possibly arrive at this finding without a careful mathematical analysis of the methods. A tool would detect inaccuracy in the calculation but could not certify that these errors could cause no harm to the final result.

Some might propose instead a purely data-driven approach: just try low precision on some test cases, if it works then use low precision in production. This approach is fraught with peril, however: the test cases may not capture all situations that could be encountered in production.

For example, one might test an aerodynamics code only on smooth flow regimes, but production runs may encounter complex flows with steep gradients that low precision arithmetic cannot correctly model. Academic papers that test low precision methods and tools must rigorously evaluate them in challenging real-world scenarios like this.

Sadly, computational science teams frequently don’t have the time to evaluate their codes for potential use of lower precision arithmetic. Tools could certainly help. Also, libraries that encapsulate mixed precision methods can provide benefits to many users. A great success story here is mixed precision dense linear solvers, founded on the solid theoretical work of Nick Higham and colleagues [10], which has found its way into libraries such as [11].

So the final answer is, “it depends.” Each new case must be looked at carefully, and a determination made based on some combination of analysis and testing.

References

[1] Zhang, Y., Garg, A., Cao, Y., Lew, Ł., Ghorbani, B., Zhang, Z. and Firat, O., 2023. Binarized Neural Machine Translation. arXiv preprint arXiv:2302.04907, https://openreview.net/forum?id=XAyPlfmWpu

[2] Lagergren, J., Cashman, M., Vergara, V.G.M., Eller, P.R., Gazolla, J.G.F.M., Chhetri, H.B., Streich, J., Climer, S., Thornton, P., Joubert, W. and Jacobson, D., 2023. Climatic clustering and longitudinal analysis with impacts on food, bioenergy, and pandemics. Phytobiomes Journal, 7(1), pp.65-77, https://apsjournals.apsnet.org/doi/10.1094/PBIOMES-02-22-0007-R.

[3] “NVIDIA H100 Tensor Core GPU Datasheet,” https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet.

[4] G. Alvarez et al., “New algorithm to enable 400+ TFlop/s sustained performance in simulations of disorder effects in high-Tc superconductors,” SC ’08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, Austin, TX, USA, 2008, pp. 1-10, doi: 10.1109/SC.2008.5218119.

[5] Spafford, K., Meredith, J., Vetter, J., Chen, J., Grout, R., Sankaran, R. (2010). Accelerating S3D: A GPGPU Case Study. In: Lin, HX., et al. Euro-Par 2009 – Parallel Processing Workshops. Euro-Par 2009. Lecture Notes in Computer Science, vol 6043. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14122-5_16.

[6] “Mixed precision analysis tools,” https://scholar.google.com/scholar?q=mixed+precision+analysis+tools

[7] Hestenes, M.R. and Stiefel, E., 1952. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6), pp.409-436, https://nvlpubs.nist.gov/nistpubs/jres/049/jresv49n6p409_a1b.pdf.

[8] Cornelius Lanczos, An Iteration Method for the Solution of the Eigenvalue Problem of Linear Differential and Integral Operators, Journal of Research of the National Bureau of Standards Vol. 45, No. 4, October 1950, https://nvlpubs.nist.gov/nistpubs/jres/045/4/V45.N04.A01.pdf.

[9] Paige, Christopher C.. “The computation of eigenvalues and eigenvectors of very large sparse matrices.” (1971), https://www.cs.mcgill.ca/~chris/pubClassic/PaigeThesis.pdf.

[10] Higham, N.J., Pranesh, S. and Zounon, M., 2019. Squeezing a matrix into half precision, with an application to solving linear systems. SIAM Journal on Scientific Computing, 41(4), pp.A2536-A2551, https://epubs.siam.org/doi/abs/10.1137/18M1229511.

[11] Lu, Hao; Matheson, Michael; Wang, Feiyi; Joubert, Wayne; Ellis, Austin; Oles, Vladyslav. “OpenMxP-Opensource Mixed Precision Computing,” https://www.osti.gov/biblio/1961398.

How likely is a random variable to be far from its center?

There are many answers to the question in the title: How likely is a random variable to be far from its center?

The answers depend on how much you’re willing to assume about your random variable. The more you can assume, the stronger your conclusion. The answers also depend on what you mean by “center,” such as whether you have in mind the mean or the mode.

Chebyshev’s inequality says that the probability of a random variable X taking on a value more than k standard deviations from its mean is less than 1/k². This of course assumes that X has a mean and a standard deviation.

If we assume further that X is unimodal, and k ≥ √(8/3), then the conclusion of Chebyshev’s inequality can be strengthened: the probability of X being more than k standard deviations from its mean is less than 4/(9k²). This is the Vysochanskiĭ-Petunin inequality. More on this inequality here.

If k ≤ √(8/3) the Vysochanskiĭ-Petunin inequality says the probability of X being more than k standard deviations from its mean is less than

4/(3k²) − 1/3.

Gauss’ inequality is similar to the Vysochanskiĭ-Petunin inequality. It also assumes X is unimodal, and for convenience we will assume the mode is at zero (otherwise look at Y = X − m where m is the mode of X). Gauss’ inequality bounds the probability of X being more than k standard deviations away from its mode rather than its mean.

Let τ² be the expected value of X². If the mean value of X is zero then τ² = σ² and the equations below are similar to the Vysochanskiĭ-Petunin inequality. But Gauss does not require that X has mean zero.

Gauss’ inequality says that

P(|X| > kτ) ≤ 4/(9k²)

if k ≥ √(4/3) and

P(|X| > kτ) ≤ 1 − k/√3

otherwise.

Gauss’ inequality is stronger than the Vysochanskiĭ-Petunin inequality when X has zero mean, but it is also assuming more, namely that the mean and mode are equal.
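As a quick numerical sketch, here is how the bounds compare to the exact tail probability of a standard normal distribution, where the mean and mode coincide and σ = τ = 1. SciPy is assumed to be available.

from scipy.stats import norm

k = 2.0
exact = 2 * norm.sf(k)   # P(|X| > k) for a standard normal
chebyshev = 1 / k**2
vp = 4 / (9 * k**2)      # valid since k >= sqrt(8/3)
gauss = 4 / (9 * k**2)   # valid since k >= sqrt(4/3); tau = sigma here
print(exact, chebyshev, vp, gauss)  # about 0.0455, 0.25, 0.111, 0.111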

Related post: Strengthening Markov’s inequality with conditional probability.

Connecting the FFT and quadratic reciprocity

Some readers will look at the title of this post and think “Ah yes, the FFT. I use it all the time. But what is this quadratic reciprocity?”

Others will look at the same title and think “Gauss called the quadratic reciprocity theorem the jewel in the crown of mathematics. But what is this FFT thing? I think I remember an engineer saying something about that.”

Gauss proved a theorem that relates quadratic reciprocity and the FFT. For distinct odd primes p and q, the following equation holds.

\left(\frac{p}{q}\right) \left(\frac{q}{p}\right) = \frac{\text{Tr} {\cal F}_{pq}}{ \text{Tr} {\cal F}_p\, \text{Tr} {\cal F}_q}

I’ll spend the rest of this post unpacking this equation.

Legendre symbols

The expressions on the left are not fractions but rather Legendre symbols. The parentheses are not for grouping but are part of the symbol.

The Legendre symbol

\left(\frac{a}{r}\right)

is defined to be 0 if a is a multiple of r, 1 if a has a square root mod r, and −1 otherwise.
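For example, SymPy can compute Legendre symbols directly:

from sympy.ntheory import legendre_symbol

print(legendre_symbol(2, 7))   # 1: 3² = 9 ≡ 2 (mod 7), so 2 is a square mod 7
print(legendre_symbol(3, 7))   # -1: 3 is not a square mod 7
print(legendre_symbol(14, 7))  # 0: 14 is a multiple of 7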

Fourier transforms

The Discrete Fourier Transform (DFT) of a vector of length n multiplies the vector by the n by n Fourier matrix Fn whose j, k entry is equal to exp(2πi jk / n). The Fast Fourier Transform (FFT) is a way to compute the DFT more quickly than directly multiplying by the Fourier matrix. Since the DFT is nearly always computed using the FFT algorithm, the DFT is commonly referred to as the FFT.

Matrix trace

The trace of a matrix is the sum of the elements along the main diagonal. So the trace of the Fourier matrix of size n is

\text{Tr} {\cal F}_n = \sum_{j=1}^n \exp(2\pi ij^2/n)

Numerical illustration

The quadratic reciprocity theorem, also due to Gauss, is usually stated as

\left(\frac{p}{q}\right) \left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2}\frac{q-1}{2}}

We can illustrate the theorem at the top of the page numerically with the following Python code, using the quadratic reciprocity theorem to evaluate the product of the Legendre symbols.

from numpy import exp, pi

tr = lambda p: sum(exp(2j*pi*k**2/p) for k in range(1, p+1))
p, q = 59, 17
print( tr(p*q)/(tr(p)*tr(q)) )
print( (-1)**((p-1)*(q-1)//4) )

The first print statement produces (0.9999999999998136-1.4048176871018313e-13j) because of some loss of precision in the floating point calculations, but this is essentially 1, which is what the second print statement produces.

If we change q to 19, both statements print −1 (after rounding the first result).

Quadratic Gauss sum

We can quickly prove

\left(\frac{p}{q}\right) \left(\frac{q}{p}\right) = \frac{\text{Tr} {\cal F}_{pq}}{ \text{Tr} {\cal F}_p\, \text{Tr} {\cal F}_q}

if we assume the quadratic reciprocity theorem and the following equation for the trace of the Fourier matrix.

\text{Tr} {\cal F}_n = \sum_{j=1}^n \exp(2\pi ij^2/n) =
\left\{
\begin{array}{ll}
\sqrt{n} & \mbox{if } n \equiv 1 \bmod{4} \\
0 & \mbox{if } n \equiv 2 \bmod{4} \\
i\sqrt{n} & \mbox{if } n \equiv 3 \bmod{4} \\
(1+i)\sqrt{n} & \mbox{if } n \equiv 0 \bmod{4}
\end{array}
\right.

This proof is historically backward. It assumes quadratic reciprocity, but Gauss proved quadratic reciprocity by first proving the equation we’re trying to prove. He then obtained the expression on the right hand side of the quadratic reciprocity theorem using the equation above for the trace of the Fourier matrix.

The trace of the Fourier matrix is now called a quadratic Gauss sum. It’s a special case of more general sums that Gauss studied, motivated by his exploration of quadratic reciprocity.
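As a sanity check, here's a short numerical sketch of the Gauss sum formula above.

from numpy import exp, pi, sqrt, isclose

def gauss_sum(n):
    return sum(exp(2j*pi*j**2/n) for j in range(1, n+1))

for n in [5, 6, 7, 8, 13, 15]:
    expected = {1: sqrt(n), 2: 0, 3: 1j*sqrt(n), 0: (1 + 1j)*sqrt(n)}[n % 4]
    print(n, isclose(gauss_sum(n), expected))  # True for each n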

Incidentally, Gauss gave many proofs of the quadratic reciprocity theorem. I don’t know where the proof outlined here fits into the sequence of proofs he developed.


Bessel zero spacing

Bessel functions are to polar coordinates what sines and cosines are to rectangular coordinates. This is why Bessel functions often arise in applications with radial symmetry.

The locations of the zeros of Bessel functions are important in applications, and so you can find software for computing these zeros in mathematical libraries. In days gone by you could find them in printed tables, such as Table 9.5 in Abramowitz and Stegun (A&S).

Bessel functions are solutions to Bessel’s differential equation,

x^2 y'' + x y' + (x^2 - \nu^2) y = 0

For each ν the functions Jν and Yν, known as the Bessel functions of the first and second kind respectively, form a basis for the solutions to Bessel’s equation. These functions are analogous to cosine and sine respectively.

As x → ∞, Bessel functions asymptotically behave like damped sinusoidal waves. Specifically,

\begin{align*} J_\nu(x) &\sim \sqrt{\frac{2}{\pi x}} \cos(x - \pi\nu/2 - \pi/4) \\ Y_\nu(x) &\sim \sqrt{\frac{2}{\pi x}} \sin(x - \pi\nu/2 - \pi/4) \end{align*}

So since for large x Bessel functions of order ν behave something like sin(x), you’d expect the spacing between the zeros of the Bessel functions to approach π, and this is indeed the case.

We can say more. If ν² > ¼ then the spacing between zeros decreases toward π, and if ν² < ¼ the spacing between zeros increases toward π. This is not just true of Jν and Yν but also of their linear combinations, i.e. of any solution of Bessel’s equation with parameter ν.

If you look carefully, you can see this in the plots of J0 and J1 below. The solid blue curve, the plot of J0, crosses the x-axis at points closer together than π, and the dashed orange curve, the plot of J1, crosses the x-axis at points further apart than π.
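Here's a quick check of the spacing claims using SciPy's jn_zeros.

import numpy as np
from scipy.special import jn_zeros

for nu in [0, 1]:
    zeros = jn_zeros(nu, 6)   # first six positive zeros of J_nu
    print(nu, np.diff(zeros))

# The spacings for J_0 are slightly less than pi and increasing;
# those for J_1 are slightly greater than pi and decreasing.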

For more on the spacing of Bessel zeros see [1].


[1] F. T. Metcalf and Milos Zlamal. On the Zeros of Solutions to Bessel’s Equation. The American Mathematical Monthly, Vol. 73, No. 7, pp. 746–749

Coloring the queen’s graph

Suppose we have an n × n chessboard. The case n = 8 is of course most common, but we consider all positive integer values of n.

The graph of a chess piece has an edge between two squares if and only if the piece can legally move between the two squares.

Now suppose we have a rule that if a piece can move between two squares, the squares must be colored differently. The minimum number of colors necessary to color the chessboard this way is the chromatic number of that piece’s graph.

The chromatic number of a rook is clearly at least n: since a rook can move between any two squares in a row, every square in the row must be a different color. And in fact the chromatic number for a rook’s graph is exactly n.

Bishops are a little more complicated. If n is even, then the chromatic number for a bishop is n. If n is odd, the number is n for a white bishop but n − 1 for a black bishop. Thinking of a 3 × 3 board shows why the two bishops are different.

Now a queen is sorta like a rook and a bishop combined, but determining the chromatic number for a queen’s graph is more difficult. The chromatic number of a queen can be no less than that of a rook since the former can make all the moves of the latter. So the queen’s chromatic number is at least n, but is it equal to n?

The results stated above can be found in [1].

Let k(n) be the chromatic number of a queen’s graph on the n × n chessboard. The results in [1] regarding k(n) are summarized as follows.

  • k(1) = 1
  • k(2) = 4
  • k(3) = k(4) = 5
  • k(6) = 7
  • k(n) = n if n is not divisible by 2 or 3.
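As an aside, the last item on the list can be checked computationally. A standard construction, my addition rather than something from [1], colors square (i, j) with (i + 2j) mod n; when n is not divisible by 2 or 3 this is a proper coloring with n colors.

def proper_queen_coloring(n, color):
    # check that no two squares joined by a queen move share a color
    squares = [(i, j) for i in range(n) for j in range(n)]
    for i1, j1 in squares:
        for i2, j2 in squares:
            if (i1, j1) == (i2, j2):
                continue
            attacks = i1 == i2 or j1 == j2 or abs(i1 - i2) == abs(j1 - j2)
            if attacks and color(i1, j1) == color(i2, j2):
                return False
    return True

for n in [5, 7, 11]:
    print(n, proper_queen_coloring(n, lambda i, j, n=n: (i + 2*j) % n))  # True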

The authors stated that they had found an example proving that k(8) ≤ 9 and conjectured that k(8) = 9. You can find a C++ program that confirms this conjecture here.

The authors conjectured that k(n) ≤ n + 1 for all n > 3. I don’t know whether this has been proven. My impression following a brief search is that the problem of finding k(n) for all n is still not fully resolved.

Update: According to a link in the comments, n = 27 is the smallest n such that k(n) is unknown.


[1] M. R. Iyer and V. V. Menon. On Coloring the n × n Chessboard. The American Mathematical Monthly, 1966, Vol. 73, pp. 721–725.

When is a function of two variables separable?

Given a function f(x, y), how can you tell whether f can be factored into the product of a function g(x) of x alone and a function h(y) of y alone? Depending on how an expression for f is written, it may or may not be obvious whether f(x, y) can be separated into g(x) h(y).

There are several situations in which you might want to know whether a function is separable. For example, the ordinary differential equation

y′ = f(x, y)

can be solved easily when f(x, y) = g(x) h(y).

You might want to do something similar for a partial differential equation, using separation of variables, possibly choosing a coordinate system that allows the separation of variables trick to work.

Aside from applications to differential equations, you might want to know whether a polynomial in two variables can be factored into the product of polynomials in each variable separately.

In [1] David Scott gives a simple necessary condition for f to be separable:

f f_xy = f_x f_y

Here the subscripts indicate partial derivatives.

It’s easy to see this condition is necessary: if f(x, y) = g(x) h(y), then f_x f_y = g′(x) h(y) g(x) h′(y) = f f_xy. Scott shows the condition is also sufficient under some mild technical assumptions.

As an example, determine the value of k such that the differential equation

y′ = 6xy² + 3y² −4x + k

is separable.

Scott’s equation

f f_xy = f_x f_y

becomes

(6xy² + 3y² −4x + k)(12y) = (6y² −4)(12xy + 6y)

which holds if and only if k = −2.
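We can let SymPy do the bookkeeping, as a quick sketch:

from sympy import symbols, diff, simplify, solve

x, y, k = symbols('x y k')
f = 6*x*y**2 + 3*y**2 - 4*x + k

# Scott's condition: f * f_xy - f_x * f_y must vanish identically
condition = simplify(f*diff(f, x, y) - diff(f, x)*diff(f, y))
print(condition)            # 12*y*(k + 2), up to how SymPy arranges it
print(solve(condition, k))  # [-2]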


[1] David Scott. When is an Ordinary Differential Equation Separable? The American Mathematical Monthly, Vol. 92, No. 6, pp. 422–423