Analogies between Weierstrass functions and trig functions

If you look at the Wikipedia article on Weierstrass functions, you’ll find a line that says “the relation between the sigma, zeta, and ℘ functions is analogous to that between the sine, cotangent, and squared cosecant functions.” This post unpacks that sentence.

Weierstrass p function

First of all, what is ℘? It’s the Weierstrass elliptic function, which is the mother of all elliptic functions in some sense. All other elliptic functions can be constructed from this function and its derivatives. As for the symbol itself, ℘ is the traditional symbol for the Weierstrass function. It’s U+2118 in Unicode, &weierp; in HTML, and \wp in LaTeX.

The line above suggests that ℘(x) is analogous to csc²(x). Indeed, the plots of the two functions are nearly identical.

Here’s Weierstrass’ function:

Weierstrass p plot

And here’s csc²:

plot of csc^2

The two plots basically agree to within the thickness of a line.

The Weierstrass function ℘ has two parameters that I haven’t mentioned. Elliptic functions are periodic in two directions in the complex plane, and so their values everywhere are determined by their values over a parallelogram. The two parameters specify the fundamental parallelogram, or at least that’s one way of parametrizing the function. The WeierstrassP function in Mathematica takes two other parameters, called the invariants, and these invariants were chosen in the plot above to match the period of the cosecant function.

    Plot[Sin[x]^-2, {x, -10, 10}]
    Plot[WeierstrassP[
            x, 
            WeierstrassInvariants[{Pi/2, Pi I/2}]
         ], 
        {x, -10, 10}
    ]

The fundamental parallelogram is defined in terms of half periods, usually denoted with ω’s, and the invariants are denoted with g’s. The function WeierstrassInvariants converts from half periods to invariants, from ω’s to g’s.

Note that ℘ and cosecant squared are similar along the real axis, but they’re very different in the full complex plane. Trig functions are periodic along the real axis but grow exponentially along the imaginary axis. Elliptic functions are periodic along both axes.

Weierstrass zeta function

The Weierstrass zeta function is not an elliptic function. It is not periodic but rather quasiperiodic. The derivative of the Weierstrass zeta function is the negative of the Weierstrass elliptic function, i.e.

ζ′(x) = -℘(x)

which is analogous to the fact that the derivative of cot(x) is -csc²(x). So in that sense ζ is to ℘ as cotangent is to cosecant squared.

The plots of ζ(x) and cot(x) are similar as shown below.

Plot of Weierstrass zeta and cotangent

The Mathematica code to make the plot above was

    Plot[
        {WeierstrassZeta[x, WeierstrassInvariants[{Pi/2, Pi I/2}]], 
         Cot[x]}, 
        {x, -10, 10}, 
        PlotLabels -> {"zeta", "cot"}
    ]

Weierstrass sigma function

The Weierstrass sigma function is also not an elliptic function. It is analogous to the sine function as follows. The logarithmic derivative of the Weierstrass sigma function is the Weierstrass zeta function, just as the logarithmic derivative of sine is cotangent. That is,

(log σ(x))′ = ζ(x).

The logarithmic derivative of a function is the derivative of its log, and so the logarithmic derivative of a function f is f′ / f.

However, the plot of the sigma function, WeierstrassSigma in Mathematica, hardly looks like sine.

Plot of Weierstrass sigma and sine

So in summary, the logarithmic derivative takes Weierstrass sigma to Weierstrass zeta just as it takes sine to cotangent. The negative derivative takes Weierstrass zeta to Weierstrass ℘ just as it takes cotangent to cosecant squared.
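
The trig half of these statements is easy to verify symbolically. Here’s a quick check with SymPy; the Weierstrass half would need Mathematica or another library that implements ℘, ζ, and σ.

    from sympy import symbols, sin, cot, csc, log, diff, simplify

    x = symbols('x')
    # derivative of cot is -csc^2, so this sum should print 0
    print(simplify(diff(cot(x), x) + csc(x)**2))
    # logarithmic derivative of sin is cot, so this difference should print 0
    print(simplify(diff(log(sin(x)), x) - cot(x)))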


More efficient way to sum a list comprehension

List comprehensions in Python let you create a list declaratively, much like the way you would describe the set in English. For example,

    [x**2 for x in range(10)]

creates a list of the squares of the numbers 0 through 9.

If you wanted the sum of these numbers, you could of course say

    sum( [x**2 for x in range(10)] )

but in this case the brackets are unnecessary. The code is easier to read and more efficient if you omit the brackets.

    sum( x**2 for x in range(10) )

With the brackets, Python constructs the list first then sums the elements. Without the brackets, you have a generator expression. Python will sum the values as they’re generated, not saving all the values in memory. This makes no difference for a short list as above, but with a large list comprehension the generator expression could be more efficient.
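
To see the difference, you can compare the memory footprint of the two forms for a larger range. This is a rough illustration; exact sizes depend on your Python version.

    import sys

    squares_list = [x**2 for x in range(10**6)]  # builds the whole list up front
    squares_gen  = (x**2 for x in range(10**6))  # builds nothing until iterated

    print(sys.getsizeof(squares_list))  # several megabytes
    print(sys.getsizeof(squares_gen))   # on the order of a hundred bytes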

On this day

I looked back to see what posts I had written on this date in years past. January 14 doesn’t have any particular significance for me, so these posts are a representative cross section of the blog.

Some years I didn’t write anything on January 14, and some years I wrote posts that are not worth revisiting. Here are some posts that have held up well.

On this day in 2008 I wrote about four tips for learning regular expressions. I keep writing about regular expressions occasionally. Readers find them interesting, in part because they’re useful, but perhaps also because they’re fairly simple but not trivial. There are little complications and subtleties in how they’re used.

On this day in 2009 I wrote about how to display figures side by side in LaTeX. I often get one of two responses when I write about LaTeX. The first is “Thanks. That’s what I needed. Now I can get on with my work.” The other is “That’s not the latest way to do things!” I have updated my LaTeX habits over the years, but I’m not ashamed to use older syntax if it produces the same array of pixels in the end.

On this day in 2011 I commented on conversations that can be summarized as “Your job is trivial. (But I couldn’t do it.)” The explicit message is “I could do this myself” while the implicit message is “There’s no way I could do this myself, and so I desperately need you to do it for me.”

(You’ll find several older posts on the frustrations of working inside a large bureaucratic institution. These posts stopped suddenly around the beginning of 2013 for some reason.)

On this day in 2013 I riffed on a line from Dune about adaptive decision making. The next day I announced here that I was leaving my job to start my consultancy.

On this day in 2015 I elaborated on someone’s remark that “the result of striving for ultimate simplicity is intolerable complexity.” I’ve seen that over and over, especially in software projects. A moderate desire for simplicity makes things simpler. But when people obsess on a particular notion of simplicity, they can make things horribly complex. These are often very smart people; they make things difficult in a way that less intelligent people couldn’t pull off.

On this day last year I wrote about algorithms for multiplying very large numbers. This is a practical question in cryptography, for example, but there have been recent theoretical developments which are interesting but as yet impractical.

I wonder what I might be writing about this time next year, assuming I’m still around and still blogging. Probably something about applied math; I write about math more often than the sampling above would imply. Maybe I’ll write something about computing or privacy, topics I’ve written about more often recently.

Thanks for reading.

Logistic bifurcation diagram in detail

The image below is famous. I’ve seen it many times, but rarely with a detailed explanation. If you’ve ever seen this image but weren’t sure exactly what it means, this post is for you.

logistic bifurcation diagram

This complex image comes from iterating a simple function

f(x) = r x(1 – x)

known as the logistic map. The iterates of the function can behave differently for different values of the parameter r.

We’ll start by fixing a value of r, with 0 ≤ r < 1. For any starting point 0 ≤ x ≤ 1, f(x) is going to be smaller than x by at least a factor of r, i.e.

0 ≤ f(x) ≤ rx.

Every time we apply f, the result decreases by at least a factor of r.

0 ≤ f( f(x) ) ≤ r²x.

As we apply f more and more times, the result converges to 0.

For r ≥ 1 it’s not as clear what will happen as we iterate f, so let’s look at a little bit of code to help see what’s going on.

    def f(x, r):
        return r*x*(1-x)

    def iter(x, r, n):
        for _ in range(n):
            x = f(x, r)
        return x
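
As a quick check of the claim above, with r = 0.9 the iterates are already tiny after 100 steps.

    >>> iter(0.5, 0.9, 100) < 1e-4   # r < 1, so the iterates decay toward 0
    True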

We can see that for 1 ≤ r ≤ 3, and 0 ≤ x ≤ 1, the iterations of f converge to a fixed point, and the value of that fixed point increases with r.

    >>> iter(0.1, 1.1, 100)
    0.09090930810626502

    # more iterations have little effect
    >>> iter(0.1, 1.1, 200)
    0.09090909091486

    # different starting point has little effect
    >>> iter(0.2, 1.1, 200)
    0.0909090909410433

    # increasing r increases the fixed point
    >>> iter(0.2, 1.2, 200)
    0.16666666666666674

Incidentally, for 1 ≤ r ≤ 3, the fixed point equals (r – 1)/r. [1]
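
Here’s a quick way to see how the numbers above line up with that formula.

    r = 1.2
    print(iter(0.2, r, 200), (r - 1)/r)   # both approximately 0.1667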

When r is a little bigger than 3, things get more interesting. Instead of a single fixed point, the iterates alternate between a pair of points, an attractor consisting of two points.

    >>> iter(0.2, 3.1, 200)
    0.7645665199585945

    >>> iter(0.2, 3.1, 201)
    0.5580141252026958

    >>> iter(0.2, 3.1, 202)
    0.7645665199585945

How can we write code to detect an attractor? We can look at the set of points we get when we iterate, say, 1000 times, then 1001 times, and so on up to 1009 times. If this set has fewer than 10 elements, the iterates must have returned to one of the earlier values. We’ll round the elements in our set to four digits so we actually get repeated values, not just values that differ by an extremely small amount.

    def attractors(x, r):
        return {round(iter(x, r, n), 4) for n in range(1000,1010)}

This is crude but it’ll do for now. We’ll make it more efficient, and we’ll handle the possibility of more than 10 points in an attractor.

Somewhere around r = 3.5 we get not just two points but four points.

    >>> attractors(0.1, 3.5)
    {0.3828, 0.5009, 0.8269, 0.875}

As r gets larger the number of points keeps doubling [2] until chaos sets in.
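
For example, nudging r into the next window should show an attractor of eight points. A quick check with the crude function above:

    >>> len(attractors(0.1, 3.55))   # r = 3.55 sits in the period-8 window
    8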

The function attractors is simple but not very efficient. After doing 1000 iterations, it starts over from scratch to do 1001 iterations etc. And it assumes there are no more than 10 fixed points. The following revision speeds things up significantly.

    def attractors2(x, r):
        x = iter(x, r, 100)   # let transients die down first
        x0 = round(x, 4)
        ts = {x0}
        for _ in range(1000):
            x = f(x, r)
            xr = round(x, 4)
            if xr in ts:      # back to a point we've seen before: cycle found
                return ts
            else:
                ts.add(xr)
        return ts             # cap reached; return the points found so far

Notice we put a cap of 1000 on the number of points in the attractor for a given r. For some values of r and x, there is no finite set of attracting points.

Finally, here’s the code that produced the image above.

    import numpy as np
    import matplotlib.pyplot as plt
    rs = np.linspace(0, 4, 1000)
    for r in rs:
        ts = attractors2(0.1, r)
        for t in ts:
            plt.plot(r, t, "ko", markersize=1)
    plt.show()

Update: See this post for graphs showing the trajectory of points over time for varying values of r.


[1] Why is this only for 1 ≤ r ≤ 3? Shouldn’t (r – 1)/r be a fixed point for larger r as well? It is, but it’s not a stable fixed point. If x is ever the slightest bit different from (r – 1)/r the iterates will diverge from this point. This post has glossed over some fine points, such as what could happen on a set of measure zero for r > 3.

[2] The spacing between bifurcations decreases roughly geometrically until the system becomes chaotic. The ratio of one spacing to the next reaches a limit known as Feigenbaum’s constant, approximately 4.669. Playing with the code in this post and trying to estimate this constant directly will not get you very far. Feigenbaum’s constant has been computed to many decimal places, but by using indirect methods.

Various kinds of pseudoprimes

Every condition that all primes satisfy has a corresponding primality test and corresponding class of pseudoprimes.

The most famous example is Fermat’s little theorem. That theorem says that for all primes p, a certain congruence holds. The corresponding notion of pseudoprime is a composite number that also satisfies Fermat’s congruence.

To make a useful primality test, a criterion should be efficient to compute, and the set of pseudoprimes should be small.

Fermat pseudoprimes

A Fermat pseudoprime base b is a composite number n such that

b^(n-1) = 1 (mod n).

The smallest example is 341 because

2^340 = 1 mod 341

and 341 = 11*31 is composite. More on Fermat pseudoprimes here.
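
You can check this with Python’s built-in modular exponentiation.

    >>> pow(2, 340, 341)
    1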

Wilson pseudoprimes

Fermat’s little theorem leads to a practical primality test, but not all theorems about primes lead to such a practical test. For example, Wilson’s theorem says that if p is prime, then

(p – 1)! = -1 mod p.

A Wilson pseudoprime would be a composite number n such that

(n – 1)! = -1 mod n.

The good news is that the set of Wilson pseudoprimes is small. In fact, it’s empty!

The full statement of Wilson’s theorem is that

(p – 1)! = -1 mod p

if and only if p is prime and so there are no Wilson pseudoprimes.

The bad news is that computing (p – 1)! is more work than testing whether p is prime.
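
Here’s a small illustration of Wilson’s theorem, checking the congruence for a couple of primes and a composite number. The factorial grows so quickly that this approach is hopeless for large n.

    from math import factorial

    for n in [11, 12, 13]:
        print(n, factorial(n - 1) % n)
    # 11 10   (i.e. -1 mod 11, so 11 is prime)
    # 12  0   (not -1, so 12 is composite)
    # 13 12   (i.e. -1 mod 13, so 13 is prime)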

Euler pseudoprimes

Let p be an odd prime and let b be a positive integer less than p. Then a theorem of Euler says that

b^{(p-1)/2} = \left( \frac{b}{p}\right) \bmod p

where the symbol on the right is the Legendre symbol. It looks like a fraction in parentheses, but that’s not what it means. The symbol is defined to be 1 if b is a square mod p and -1 otherwise. It’s an unfortunate but well-established bit of notation.

We could turn Euler’s theorem around and convert it into a primality test by saying that if a number p does not satisfy the congruence above, it cannot be prime.

There’s one complication with this: the Legendre symbol is only defined if p is prime. However, there is a generalization of the Legendre symbol, the Jacobi symbol, which is defined as long as the top argument is relatively prime to the bottom argument. Euler pseudoprimes are also known as Euler-Jacobi pseudoprimes.

The Jacobi symbol uses the same notation as the Legendre symbol, but there is no ambiguity because if the bottom argument is prime, the Jacobi symbol equals the Legendre symbol. More on Legendre and Jacobi symbols here.

An odd composite number n is an Euler pseudoprime base b if n and b are relatively prime and

b^{(n-1)/2} = \left( \frac{b}{n}\right) \bmod n

where the symbol on the right is now the Jacobi symbol.

There are efficient algorithms for computing Jacobi symbols, and so the criterion above is a practical way to prove a number is not prime. But there are Euler pseudoprimes that slip past this test. A small example is n = 9 and b = 10. Euler pseudoprimes are less common than Fermat pseudoprimes.
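
Here’s a quick check of that example using SymPy’s Jacobi symbol, comparing both sides of the congruence mod n.

    from sympy.ntheory import jacobi_symbol

    n, b = 9, 10
    lhs = pow(b, (n - 1)//2, n)
    rhs = jacobi_symbol(b, n) % n
    print(lhs, rhs)   # both equal 1, even though 9 is composite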

Other examples

There are many other examples: Lucas pseudoprimes, elliptic pseudoprimes, etc. Any theorem about prime numbers can be recast as a primality test, and the composite numbers that slip through are labeled pseudoprimes relative to that test. The question is whether such a test is useful, i.e. efficient to compute and without too many false positives.

Finding large pseudoprimes

Fermat’s little theorem says that if p is a prime number, then for any integer b,

b^(p-1) = 1 (mod p).

This gives a necessary but not sufficient test for a number to be prime. A number that satisfies the equation above but is not prime is called a pseudoprime [1] for base b. For example, 341 is a pseudoprime base 2 because

2^340 = 1 mod 341

and clearly 341 = 11*31.

How might you go about hunting for pseudoprimes? This page gives a way to find pseudoprimes base 2, quoting the following theorem.

If both numbers q and 2q – 1 are primes and n = q(2q – 1) then 2^(n-1) = 1 (mod n) if and only if q is of the form 12k + 1.

Here’s Python code that will find pseudoprimes of a given size in bits.

    from secrets import randbits
    from sympy import isprime

    def find_pseudoprime(numbits):
        n = numbits // 2
        while True:
            q = randbits(n)    
            if isprime(q) and q%12 == 1 and isprime(2*q-1):
                return q*(2*q-1)

    print( find_pseudoprime(256) )

This returned

47362882341088596725068562696893704769436677460225591859092704246296157080253

in under a second.

Clearly the return value of find_pseudoprime is composite since it is computed as the product of two integers. You could confirm it returns a pseudoprime by computing

    pow(2, p-1, p)

on the return value to verify that it equals 1.
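
Putting those two checks together, assuming the find_pseudoprime code and its imports above have already been run:

    p = find_pseudoprime(256)
    assert not isprime(p)          # composite by construction
    assert pow(2, p - 1, p) == 1   # yet it passes the Fermat test base 2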

***

[1] Sometimes such a number is called a Fermat pseudoprime to distinguish it from other kinds of pseudoprimes, i.e. numbers that satisfy other necessary conditions for being prime without actually being prime. For example, Perrin pseudoprimes. See the next post for more examples.

Technology à la carte

I ran across an article this morning about the booming market for low-tech tractors. Farmers are buying up old tractors at auction because the older tractors are cheaper to buy, cheaper to operate, and cheaper to repair.

The article opens with the story of Kris Folland, a Minnesota farmer who bought a 1979 John Deere tractor. The article says he “retrofitted it with automatic steering guided by satellite.” That’s the part of the article I wanted to comment on.

If you suggest that anything new isn’t better, someone will label you a Luddite. But Folland is not a Luddite. He wants satellite technology, though he also wants a machine he can repair himself. He didn’t buy a 1979 tractor out of nostalgia. He bought it because it was an order of magnitude cheaper to own and operate.


Estimating the proportion of smooth numbers


A number is said to be “smooth” if all its prime factors are small. To make this precise, a number is said to be y-smooth if it only has prime factors less than or equal to y. So, for example, 1000 is 5-smooth.

The de Bruijn function φ(x, y) counts the number of y-smooth positive integers up to x, and so φ(x, y)/x is the probability that a number between 1 and x is y-smooth.

It turns out that

φ(x, x^(1/u))/x ≈ 1/u^u.

This means, for example, that the proportion of numbers with less than 100 digits whose factors all have less than 20 digits is roughly 1/5^5 = 1/3125.

Source: The Joy of Factoring
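
In that example u = 100/20 = 5, so the estimated proportion works out to about 3 in 10,000.

    >>> u = 100/20
    >>> 1/u**u
    0.00032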

Here’s a little Python code to experiment with this approximation. We’ll look at the proportion of 96-bit numbers whose largest prime factor has at most 32 bits.

    from secrets import randbits
    from sympy.ntheory.factor_ import smoothness

    smooth = 0
    for _ in range(100):
        x = randbits(96)
        s = smoothness(x)[0]
        if s < 2**32:
            smooth += 1
    print("Num small:", smooth)

The SymPy function smoothness returns a pair, and the first element of the pair is the largest prime dividing its argument.

In our case u = 3, and so we’d expect about 1/27 of our numbers to have no factors larger than 32 bits. I ran this five times, and got 8, 2, 5, 3, and 7. From the approximation above we’d expect results around 4, so our results are not surprising.

Sufficient statistic paradox

A sufficient statistic summarizes a set of data. If the data come from a known distribution with an unknown parameter, then the sufficient statistic carries as much information as the full data [0]. For example, given a set of n coin flips, the number of heads and the number of flips are sufficient statistics. More on sufficient statistics here.

There is a theorem by Koopman, Pitman, and Darmois that essentially says that useful sufficient statistics exist only if the data come from a class of probability distributions known as exponential families [1]. This leads to what Persi Diaconis [2] labeled a paradox.

Exponential families are quite a restricted family of measures. Modern statistics deals with far richer classes of probabilities. This suggests a kind of paradox. If statistics is to be of any real use it must provide ways of boiling down great masses of data to a few humanly interpretable numbers. The Koopman-Pitman-Darmois theorem suggests this is impossible unless nature follows highly specialized laws which no one really believes.

Diaconis suggests two ways around the paradox. First, the theorem he refers to concerns sufficient statistics of a fixed size; it doesn’t apply if the summary size varies with the data size. Second, and more importantly, the theorem says nothing about a summary containing approximately as much information as the full data.

***

[0] Students may naively take this to mean that all you need is sufficient statistics. No, it says that if you know the distribution the data come from then all you need is the sufficient statistics. You cannot test whether a model fits given sufficient statistics that assume the model. For example, mean and variance are sufficient statistics assuming data come from a normal distribution. But knowing only the mean and variance, you can’t assess whether a normal distribution model fits the data.

[1] This does not mean the data have to come from the normal distribution. The exponential family of distributions includes the normal (Gaussian) distribution, but other distributions as well.

[2] Persi Diaconis. Sufficiency as Statistical Symmetry. Proceedings of the AMS Centennial Symposium. August 8–12, 1988.

Number of forms of a finite field

The number of elements in a finite field must be a prime power, and for every prime power there is only one finite field up to isomorphism.

The finite field with 256 elements, GF(2^8), is important in applications. From one perspective, there is only one such field. But there are a lot of different isomorphic representations of that field, and some are more efficient to work with than others.

Just how many ways are there to represent GF(2^8)? Someone with the handle joriki gave a clear answer on Stack Exchange:

There are 2^8 − 1 different non-zero vectors that you can map the first basis vector to. Then there are 2^8 − 2 different vectors that are linearly independent of that vector that you can map the second basis vector to, and so on. In step k, 2^(k−1) vectors are linear combinations of the basis vectors you already have, so 2^8 − 2^(k−1) aren’t, so the total number of different automorphisms is

\prod_{k=1}^8\left(2^8 - 2^{k-1} \right ) = 5348063769211699200

This argument can be extended to count the number of automorphisms of any finite field.
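
As a quick check on the arithmetic, the product above is a one-liner in Python.

    >>> from math import prod
    >>> prod(2**8 - 2**(k-1) for k in range(1, 9))
    5348063769211699200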
