Advantages of redundant coordinates

Since you can describe a point in the plane with two numbers, why would you choose to use three numbers? Why would you ever want to use a coordinate system with more coordinates than necessary?

Barycentric coordinates

One way to indicate the location of a point inside a triangle is to give a weight for each vertex so that the point is the weighted average of the vertices. These three weights are called barycentric coordinates. Why would you use three numbers when two would do?

Barycentric coordinates make some things much simpler. For example, the coordinates of the three vertices are (1, 0, 0), (0, 1, 0), and (0, 0, 1) for any triangle. The points inside are written as convex combinations of the vertices. The coordinates of the center of mass, the barycenter, are (1/3, 1/3, 1/3). The vertices are treated symmetrically, even if the triangle is far from symmetric.

Barycentric coordinates are useful in applications, such as computer graphics and finite element analysis, because they are relative coordinates. When a triangle moves or is rescaled, you only need to keep track of where the vertices went; the coordinates of the points inside relative to the vertices haven’t changed.
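
Here's a minimal sketch of the idea in Python with NumPy (my own illustration; the function names are mine): convert a point to barycentric coordinates by solving a small linear system, and convert back by taking the weighted average of the vertices.

    import numpy as np

    def to_barycentric(p, A, B, C):
        # Solve for weights (l1, l2, l3) with l1*A + l2*B + l3*C = p and l1 + l2 + l3 = 1
        M = np.column_stack([A, B, C])        # 2x3 matrix of vertex coordinates
        M = np.vstack([M, np.ones(3)])        # append the constraint that the weights sum to 1
        return np.linalg.solve(M, np.append(p, 1.0))

    def from_barycentric(w, A, B, C):
        # Recover the Cartesian point as a weighted average of the vertices
        return w[0]*A + w[1]*B + w[2]*C

    A, B, C = np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([1.0, 3.0])
    w = to_barycentric(np.array([2.0, 1.0]), A, B, C)
    print(w, from_barycentric(w, A, B, C))    # weights sum to 1; round trip returns (2, 1)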

This can be generalized to higher dimensions. For example, you could describe a point in a tetrahedron with four coordinates. More generally, you could describe a point in an n-simplex by the convex combination coefficients of its n + 1 vertices.

Barycentric coordinates are related to Dirichlet probability distributions. When you have n probabilities that sum to 1, you’ve got n-1 degrees of freedom. But it often simplifies things to work with n variables. As with the discussion of triangles above, the extra variable makes expressions more symmetric.
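
For example, NumPy's Dirichlet sampler (a small aside of mine, not part of the original discussion) hands you all n coordinates even though only n − 1 of them are free:

    import numpy as np

    rng = np.random.default_rng()
    p = rng.dirichlet([1.0, 1.0, 1.0])   # three coordinates
    print(p, p.sum())                    # they sum to 1, up to floating point rounding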

Quaternions and rotations

A point in three dimensional space can be described with three numbers, but it’s often useful to think of the usual three coordinates as the vector part of a quaternion, a set of four numbers.

Suppose you have a point

a = (x, y, z)

and you want to rotate it by an angle θ around an axis given by a unit vector

b = (u, v, w).

You can compute the rotation by associating the point a with the quaternion

p = (0, x, y, z)

and the axis b with the quaternion

q = (cos(θ/2), sin(θ/2) u, sin(θ/2) v, sin(θ/2) w)

The image of a is then given by the quaternion

q p q⁻¹.

This quaternion will have zero real part, and so the Euclidean coordinates are given by the vector part, the last three components.

Of course the product above is a quaternion product, which is not commutative. That’s why the q and the q⁻¹ don’t cancel out.
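
Here is a minimal sketch of the computation above in Python (my own illustration; the function names are mine), writing out the quaternion product by hand:

    import numpy as np

    def quat_mult(q1, q2):
        # Hamilton product of quaternions given as (real, x, y, z)
        a1, b1, c1, d1 = q1
        a2, b2, c2, d2 = q2
        return np.array([
            a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2,
        ])

    def rotate(point, axis, theta):
        # axis must be a unit vector
        p = np.append(0.0, point)
        q = np.append(np.cos(theta/2), np.sin(theta/2) * np.asarray(axis))
        q_inv = q * np.array([1, -1, -1, -1])   # conjugate = inverse for unit quaternions
        return quat_mult(quat_mult(q, p), q_inv)[1:]

    print(rotate([1, 0, 0], [0, 0, 1], np.pi/2))   # ≈ [0, 1, 0]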

Using quaternions for rotations has several advantages over using rotation matrices. First, the quaternion representation is more compact, describing a rotation with four real numbers rather than nine. Second, the quaternion calculation can be better behaved numerically. But most importantly, the quaternion approach avoids gimbal lock, a kind of singularity that arises when rotations are represented with Euler angles.

Projective planes

In applications of algebra, such as elliptic curve cryptography, you often need to add “points at infinity” to make things work out. To formalize this, you add an extra coordinate. So while an elliptic curve is usually presented as an equation such as

y² = x³ + ax + b,

it’s more formally an equation in three variables

y²z = x³ + axz² + bz³.

Points in the projective plane have coordinates (x, y, z) where points are considered equivalent if they differ by a non-zero multiple, i.e. (x, y, z) is considered the same point as (λx, λy, λz) for any non-zero λ.

You can often ignore the z, choosing λ so that the z coordinate is 1. But when you need to work with the point at infinity in a uniform way, you bring out the full coordinates. Now the “point at infinity” is not some mysterious entity, but simply the point (0, 1, 0).
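
As a quick sanity check (my own sketch, not tied to any particular curve library), you can verify that (0, 1, 0) satisfies the homogeneous equation for any a and b:

    def on_curve(x, y, z, a, b):
        # homogeneous form: y^2 z = x^3 + a x z^2 + b z^3
        return y*y*z == x**3 + a*x*z*z + b*z**3

    print(on_curve(0, 1, 0, a=-3, b=5))   # True: the point at infinity lies on every such curve
    print(on_curve(2, 3, 1, a=-3, b=7))   # True: the affine point (2, 3), since 9 = 8 - 6 + 7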

Common themes

Projective coordinates, like barycentric coordinates, introduce symmetry. With the addition of an extra coordinate, the three coordinates all behave similarly, with no reason to single out any one of them as special. And as with quaternion rotations, projective coordinates make singularities go away, which is a consequence of the added symmetry.

Determining fundamental frequency

My daughter had a homework problem the other day that gave the frequencies of several Fourier components and asked her to find the fundamental frequency. The numbers were nice enough that brute force worked, and I’m sure that’s what students were expected to do. But this could easily be a much more sophisticated problem.

If the frequencies are all integers and exact multiples of a fundamental frequency, you can simply take the greatest common divisor of the frequencies. If you’re told the frequencies are 1760, 2200, and 3080, then the fundamental frequency is apparently 440 since that’s the greatest common divisor.
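
A couple of lines of Python confirm this, assuming the frequencies are exact integers:

    from math import gcd
    from functools import reduce

    print(reduce(gcd, [1760, 2200, 3080]))   # prints 440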

But what if the data are a little different? Say the highest pitch is 3081. Surely 440 should still be considered the fundamental frequency, even though now the greatest common divisor of the frequencies would be 1 Hz. What if the highest frequency was 3078 + π? Surely the fundamental frequency is still 440 for practical purposes.

And what might these practical purposes be? One purpose might be pitch detection. When several frequencies are combined that are small integer multiples of a fundamental frequency, we perceive the combination as having pitch given by that fundamental.

For something like a guitar string, the frequency components are close to small integer multiples of a fundamental frequency. But for something like a church bell, the frequencies don’t line up so neatly, though there’s still a clearly perceived pitch. For something like a metal mixing bowl, it may be difficult to predict what pitch a person will hear when something strikes the bowl.

One complication we haven’t addressed yet is that the fundamental frequency will not be unique without some constraint. In the example above, the frequencies were all multiples of 440, but they’re also all multiples of 440/n for every positive integer n. We might get around this by specifying some lower bound on the fundamental frequency. Or we could say that all other things being equal, we want the largest candidate for the fundamental frequency.

We could formulate the problem of finding the fundamental frequency as an optimization problem. For example, we could form a mixed integer program. Suppose we have three frequencies f1, f2, and f3. We could find a fundamental frequency f and integers n1, n2, and n3 that minimize

(f1 – n1f)² + (f2 – n2f)² + (f3 – n3f)²

subject to a lower bound on f.

We can eliminate the explicit dependence on the integer coefficients by minimizing

(f1/f – [f1/f])² + (f2/f – [f2/f])² + (f3/f – [f3/f])² .

where [x] denotes the nearest integer to x. The first formulation has a more familiar form; the latter has a more complicated objective function, but it is a function of only one variable.
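
Here's a minimal sketch of the second formulation in Python (my code, not from the original post), using an assumed lower bound of 300 Hz to rule out subharmonics like 220 and 110, and a simple grid search:

    import numpy as np

    freqs = np.array([1760.0, 2200.0, 3080.0])

    def objective(f):
        r = freqs / f
        return np.sum((r - np.round(r))**2)   # squared distance of the ratios from the nearest integers

    candidates = np.arange(300.0, 1000.0, 0.1)              # lower bound keeps out 440/n
    best = candidates[np.argmin([objective(f) for f in candidates])]
    print(best)                                             # ≈ 440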

Here’s what the latter looks like for frequencies 1760, 2200, and 3080.

[Plot of the objective function versus candidate fundamental frequency]

Clearly there’s a minimum at 440 Hz.

Here’s the same plot with 10% random noise [1] added to each frequency: 1701, 2368, and 3339.

[Plot of the objective function with the noisy frequencies]

Now there’s a minimum near 336, but the local minimum at 566 is nearly as good.

[1] There are a couple of reasons you might want to solve a problem like this. One is that your frequencies really are integer multiples of a fundamental frequency, but there is measurement error. Another is that the frequencies are not exactly multiples of a fundamental, as when striking a bell or a mixing bowl. How might you formulate the two cases differently?

Illustrating Cayley-Hamilton with Python

If you take a square matrix M, subtract x from the elements on the diagonal, and take the determinant, you get a polynomial in x called the characteristic polynomial of M. For example, let

M = \left[ \begin{matrix} 5 & -2 \\ 1 & \phantom{-}2 \end{matrix} \right]

Then

\left| \begin{matrix} 5-x & -2 \\ 1 & 2-x \end{matrix} \right| = x^2 - 7x + 12

The characteristic equation is the equation that sets the characteristic polynomial to zero. The roots of this polynomial are eigenvalues of the matrix.
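
Incidentally, NumPy can compute the coefficients of the characteristic polynomial for you, via the eigenvalues. This is a small aside of mine, not part of the original illustration.

>>> from numpy import array, poly
>>> poly(array([[5, -2], [1, 2]]))   # coefficients of x**2 - 7*x + 12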

The Cayley-Hamilton theorem says that if you take the original matrix and stick it into the polynomial, you’ll get the zero matrix.

\left[ \begin{matrix} 5 & -2 \\ 1 & \phantom{-}2 \end{matrix} \right]^2 - 7\left[ \begin{matrix} 5 & -2 \\ 1 & \phantom{-}2 \end{matrix} \right] + 12\left[ \begin{matrix} 1 & 0 \\ 0 & 1\end{matrix} \right] = \left[ \begin{matrix} 0 & 0 \\ 0 & 0\end{matrix} \right]

In brief, a matrix satisfies its own characteristic equation. Note that for this to hold we interpret constants, like 12 and 0, as corresponding multiples of the identity matrix.

You could verify the Cayley-Hamilton theorem in Python using scipy.linalg.funm to compute a polynomial function of a matrix.

>>> from numpy import array
>>> from scipy.linalg import funm
>>> m = array([[5, -2], [1, 2]])
>>> funm(m, lambda x: x**2 - 7*x + 12)   # evaluate the characteristic polynomial at m

This returns a zero matrix.

I imagine funm is factoring M into something like PDP⁻¹ where D is a diagonal matrix. Then

f(M) = P f(D) P⁻¹.

This is because f can be applied to a diagonal matrix by simply applying f to each diagonal entry independently. You could use this to prove the Cayley-Hamilton theorem for diagonalizable matrices.
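
You could also check the theorem directly with ordinary matrix arithmetic, no matrix function machinery needed. Here's a quick sketch of my own:

>>> import numpy as np
>>> m = np.array([[5, -2], [1, 2]])
>>> m @ m - 7*m + 12*np.eye(2)   # evaluates to the zero matrix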

Why can’t grep find negative numbers?

Suppose you’re looking for instances of -42 in a file foo.txt. The command

    grep -42 foo.txt

won’t work. Instead you’ll get a warning message like the following.

    Usage: grep [OPTION]... PATTERN [FILE]...
    Try 'grep --help' for more information.

Putting single or double quotes around -42 won’t help. The problem is that grep interprets -42 as a command line option rather than as a pattern to search for. This is a problem if you’re searching for negative numbers, or for any pattern that begins with a dash, such as -able or --version.

The solution is to put -e in front of a regular expression containing a dash. That tells grep that the next token at the command line is a regular expression, not a command line option. So

    grep -e -42 foo.txt

will work.

You can also use -e several times to give grep several regular expressions to search for. For example,

    grep -e cat -e dog foo.txt

will search for “cat” and “dog.”

See the previous post for another example of where grep doesn’t seem to work. By default grep supports a restricted regular expression syntax and may need to be told to use “extended” regular expressions.

Why doesn’t grep work?

If you learned regular expressions by using a programming language like Perl or Python, you may be surprised when tools like grep seem broken. That’s because what you think of simply as regular expressions are what these tools call extended regular expressions. Tell them to use extended regular expressions and some of your frustration will go away.

As an example, we’ll revisit a post I wrote a while back about searching for ICD-9 and ICD-10 codes with regular expressions. From that post:

Most ICD-9 diagnosis codes are just numbers, but they may also start with E or V. Numeric ICD-9 codes are at least three digits. Optionally there may be a decimal followed by one or two more digits. … Sometimes the decimals are left out.

Let’s start with the following regular expression.

    [0-9]{3}\.?[0-9]{0,2}

This says to look for three instances of the digits 0 through 9, optionally followed by a literal period, followed by zero, one, or two more digits. (Since . is a special character in regular expressions, we have to use a backslash to literally match a period.)

The regular expression above will work with Perl or Python, but not with grep or sed by default. That’s because it uses two features of extended regular expressions (ERE), but programs like grep and sed support basic regular expressions (BRE) by default.

Basic regular expressions would use \{3\} rather than {3} to match a pattern three times. So, for example,

   echo 123 | grep "[0-9]\{3\}"

would return 123, but

   echo 123 | grep "[0-9]{3}"

would return nothing.

Similarly,

    echo 123 | sed -n "/[0-9]\{3\}/p"

would return 123 but

    echo 123 | sed -n "/[0-9]{3}/p"

returns nothing.

(The -n option to sed tells it not to print every line by default. The p following the regular expression tells sed to print those lines that match the pattern. Here there’s only one line, the output of echo, but typically grep and sed would be used on files with multiple lines.)

Turning on ERE support

You can tell grep and sed that you want to use extended regular expressions by giving either one the -E option. So, for example, both

   echo 123 | grep -E "[0-9]{3}"

and

    echo 123 | sed -E -n "/[0-9]{3}/p"

will print 123.

You can use egrep as a synonym for grep -E, at least with Gnu implementations.

Incidentally, awk uses extended regular expressions, and so

    echo 123 | awk "/[0-9]{3}/"

will also print 123.

Going back to our full regular expression, using \.? for an optional period works with grep and sed if we ask for ERE support. The following commands all print 123.4.

    echo 123.4 | grep -E "[0-9]{3}\.?[0-9]{0,2}"
    echo 123.4 | sed -E -n "/[0-9]{3}\.?[0-9]{0,2}/p"
    echo 123.4 | awk "/[0-9]{3}\.[0-9]{0,2}/"

Without the -E option, grep and sed will not return a match.

This doesn’t fix everything

At the top of the post I said that if you tell tools you want extended regular expression support “some of your frustration will go away.” The regular expression from my ICD code post was actually

    \d{3}\.?\d{0,2}

rather than

    [0-9]{3}\.?[0-9]{0,2}

I used the shortcut \d to denote a digit. Python, Perl, and Awk will understand this, but grep will not, even with the -E option.
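
For example, Python’s re module accepts the shortcut (a quick check of mine, using a made-up code for illustration):

    import re
    print(re.search(r"\d{3}\.?\d{0,2}", "code 250.42").group())   # prints 250.42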

grep will understand \d if instead you use the -P option, telling it you want to use Perl-compatible regular expressions (PCRE). The Gnu version of grep supports this option, but the man page says “This is experimental and grep -P may warn of unimplemented features.” I don’t know whether other implementations of grep support PCRE. And sed does not have an option to support PCRE.

Top cryptography posts of 2019

Toward the end of each year I write a post or two listing the most popular posts by category. This year the categories will be a little different. I’ll start by listing my most popular posts about cryptography this year.

The next categories will be command line tools, privacy, and math.

(When I wrote this, I started with crypto because I didn’t think I’d write any more posts on the topic. Then the announcement about RSA-240 came out, and so I wrote something about it yesterday.)

New RSA factoring challenge solved

How hard is it to factor large numbers? And how secure are encryption methods based on the difficulty of factoring large numbers?

The RSA factoring challenges were set up to address these questions. Last year RSA-230 was factored, and this week RSA-240 was factored. This is a 240 digit (795 bit) number, the product of two primes.

Researchers solved two related problems at the same time, factoring RSA-240 and solving a discrete logarithm problem. Together these problems took about 4,000 core-years to solve. It’s not clear from the announcement how long it would have taken just to factor RSA-240 alone.

If you were to rent the computing power used, I imagine the cost would be somewhere in the six figures.

This makes 2048-bit and 3072-bit RSA keys look very conservative. However, the weakest link in RSA encryption is implementation flaws, not the ability to factor big numbers.

Assume for a moment that breaking RSA encryption requires factoring keys. (This may not be true in theory [*] or in practice.) How long would it take to factor a 2048 or 3072 bit key?

The time required to factor a number n using the number field sieve is proportional to

\exp\left( \left(\sqrt[3]{\frac{64}{9}} + o(1)\right)(\ln n)^{\frac{1}{3}}(\ln \ln n)^{\frac{2}{3}}\right)

Here o(1) roughly means terms that go away as n gets larger. (More on the notation here.) For simplicity we’ll assume we can ignore these terms.

This suggests that factoring a 2048-bit key is 12 orders of magnitude harder than factoring RSA-240, and that factoring a 3072-bit key is 18 orders of magnitude harder.
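
Here’s a back-of-the-envelope check of those figures (my own sketch, dropping the o(1) term as above):

    from math import exp, log, log10

    def nfs_work(bits):
        # heuristic number field sieve cost with the o(1) term dropped
        ln_n = bits * log(2)                 # natural log of a number with the given bit length
        return exp((64/9)**(1/3) * ln_n**(1/3) * log(ln_n)**(2/3))

    for bits in (2048, 3072):
        ratio = nfs_work(bits) / nfs_work(795)   # RSA-240 is a 795-bit number
        print(bits, round(log10(ratio)))         # roughly 12 and 18 orders of magnitude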

However, I don’t think anyone believes that breaking RSA with 2048-bit keys would require a quadrillion core-years. If the NSA believed this, they wouldn’t be recommending that everyone move to 3072-bit keys.

Why such a large discrepancy? Here are a few reasons. As mentioned above, RSA encryption often has exploitable implementation flaws. And even if implemented perfectly, there is no proof that breaking RSA encryption is as hard as factoring. And there could be breakthroughs in factoring algorithms. And large-scale quantum computers may become practical, in which case factoring would become much easier.

***

[*] Factoring is sufficient to break RSA, but there’s no proof that it’s necessary. Michael Rabin’s variation on RSA is provably as hard to break as factoring: decryption would enable you to factor the key. But as far as I know, Rabin’s method isn’t used anywhere. Even if you know your method is as hard as factoring, maybe factoring isn’t as hard as it seems. Lower bounds on computational difficulty are much harder to obtain than upper bounds.

Distracted by the hard part

Last night I was helping my daughter with calculus homework. I told her that a common mistake was to forget what the original problem was after getting absorbed in sub-problems that have to be solved. I saw this over and over when I taught college.

Then a few minutes later, we both did exactly what I had warned her against. She took the answer to a difficult sub-problem to be the final answer. I checked her work and confirmed it was correct, but then realized we hadn’t actually answered the original question.

As I was waking up this morning, I realized I was about to make the same mistake on a client’s project. The goal was to write software to implement a function f which is a trivial composition of two other functions g and h. These two functions took a lot of work, including a couple levels of code generation. I felt I was done after testing g and h, but I forgot to write tests for f, the very thing I was asked to deliver.

This is a common pattern that goes beyond calculus homework and software development. It’s why checklists are so valuable. We resist checklists because they insult our intelligence, and yet they greatly reduce errors. Experienced people in every field can skip a step, most likely a simple step, without some structure to help them keep track.

Data Science and Star Science

I recently got a review copy of Statistics, Data Mining, and Machine Learning in Astronomy. I’m sure the book is especially useful to astronomers, but those of us who are not astronomers could use it as a survey of data analysis techniques, especially using Python tools, where all the examples happen to come from astronomy. It covers a lot of ground and is pleasant to read.