Things that work best when you don't notice them

Fonts, translations, and programming languages have one thing in common: they work best when you don’t notice them.

If someone says “Hey, look at this cool font I just found!” you probably wouldn’t want to read a book set in that font. At least to an untrained eye, a great font will not stand out in a list of small samples. You have to see large blocks of text set in a font to appreciate it. Even then, most people will not consciously appreciate a very readable font.

Translations are similar. If you find yourself saying “What an interesting translation!” then the translator has probably fallen down on the job. A good translation is neither archaic nor trendy. It does not draw attention to itself but allows you to focus on the original content. I believe the English Standard Version achieves that with Bible translation.

Python is like a good font or a good translation. For years I’d look into Python briefly when someone would recommend it. I’d thumb through a Python book, but it all looked rather plain. Only later did I come to appreciate that the beauty of Python is that it is rather plain. It doesn’t call attention to itself. It just gets out of your way and lets you write programs. It seems to me that compared to other programming language communities, the Python community brags less about their language per se and more about what they’re able to do with it.

How to solve quadratic congruences

Quadratic congruences are much more complex than linear congruences. If you’re interested in the details, see these notes on quadratic congruences.

When I started looking into this, I thought my number theory class years ago covered the details but that I’d forgotten them. Looking back, I see that my instructor and my textbook neglected some cases. Then I looked at a few other references and saw that they also left out some cases. I put together some notes by pulling together information from several incomplete but complementary references.

Syntax coloring for code samples in HTML

Syntax coloring makes it much easier to read source code, especially when you become accustomed to a particular color scheme. For example, I’m used to the default color scheme in Visual Studio: comments are green, keywords are blue, string literals are red, etc. Once you get used to color-coded source code, it’s hard to go back to black-and-white. However, the code samples here are monochrome and I’ve been thinking about doing something about that.

It’s possible to mark up code samples on the web just like any other chuck of HTML, but that would be time-consuming. Also, you want to leave code samples as plain text so readers can copy the source and paste it into their code. So what people usually do is use client-side JavaScript to change the way code samples are displayed in the browser. That way the code appears to be marked up but it’s still unadorned underneath.

I like the way code samples look on Scott Hanselman’s blog, and so I started to use the JavaScript solution that he recommends, SyntaxHighlighter. However, I decided to hold off when I read this comment on his blog.

I tried to use it. Unfortunately the scrolling errors I see on your and other sites are too much to bear.

More recently I heard that StackOverflow is using prettify.js to add syntax coloring for their code samples. I haven’t noticed any problems on that site, so I’m starting to try out prettify. I’ve heard that the script doesn’t always handle VB code correctly, but that doesn’t matter to me.

The prettify script is easy to use. You don’t have to tell it what programming language you’re highlighting. However, you do have the option of  specifying the language, which presumably can’t hurt. I’ve tried it on a page containing C++ code and on another page containing Python code and they both look OK. My only complaint is that the default color scheme is not what I would have chosen. However, the color scheme can be modified by editing a style sheet and I intend to do that.

I’m going to start by experimenting with static files on my site. I’m more cautious about incorporating syntax coloring with the blog since I don’t know what interaction problems there could be with the WordPress software on the server or with blog reader software on the clients. Scott Hanselman says on his blog

… I couldn’t find a syntax highlighting solution that worked in EVERY feed reader. There are lots of problems with online ones like Google Reader and BlogLines if I, as the publisher, try to get tricky with the CSS.

So I may leave the code samples on the blog alone. Also, I’m also not sure what I’ll do about PowerShell samples. The prettify script works well with C-family languages, but PowerShell syntax may be too far afield from what it expects.

Does anyone have suggestions in general? For PowerShell in particular?

Constructive proof of the Chinese Remainder Theorem

The Chinese Remainder Theorem (CRT) is a tool for solving problems involving modular arithmetic. The theorem is called the “Chinese” remainder theorem because the Chinese mathematician Sun Tsu stated a special case of the theorem sometime between 280 and 473 A.D. In its simplest form, the theorem says that if you want to solve a congruence (mod α β), you can solve it (mod α) and (mod β) separately, then combine the solutions to these sub-problems into a solution to the original problem, provided α and β are relatively prime.

More generally, the CRT says that the system of equations xai (mod mi) has a common solution if the numbers m1, m2, …, mr are relatively prime in pairs (i.e. mi and mj have no factors in common if ij). The solution is unique mod m where m is the product of the mi‘s. In a typical application, you may start with a problem stated (mod m) and then factor m into prime powers. Each of these prime powers is then an mi in the CRT. For example, to solve a congruence (mod 1400), you might set m1 = 8, m2 = 25, m3 = 7.

You can find an existence proof of the Chinese Remainder Theorem in any number theory book. Here I present an explicit solution given in Donald Knuth’s book Seminumerical Algorithms. For other algorithms and much more information on the CRT, see the book Chinese Remainder Theorem: Applications in Computing, Coding, and Cryptography. (In case it seems “coding” and “cryptography” are redundant, the book uses “coding” to refer to error-correcting codes, such as the way music is encoded on a CD, not secret codes, which would fall under cryptography.)

First we find numbers yi such that yi ≡ 1 (mod mi) and yi ≡ 0 (mod mj) for i j. Then the sum a1 y1 + a2 y2 + … + ar yr is the solution we’re looking for.

But how to find the yi? We can let yi = (m/mi)φ(mi).

But what is φ and why does this work? The function φ is called the Euler phi function. For a positive integer n, φ(n) is the number of positive integers less than n and relatively prime to n. (I’ll explain how to compute it in a minute.) Euler proved that if two numbers a and n are relatively prime, then aφ(n) ≡ 1 (mod n). Because m/mi is relatively prime to mi, yi ≡ 1 (mod mi). For ji, m/mi is a multiple of mj and so yi ≡ 0 (mod mj).

So how do you compute φ(n)? This is easy once you know two facts about this function. First, if α and β are relatively prime, then φ(α β) = φ(α) φ(β). Second, for a prime p, φ(p) = p-1 and φ(pk) = pkpk-1 for k > 1. Putting these facts together, you can easily calculate φ(n) once you have its factorization into prime powers.

The solution given here is not be the fastest way to compute a solution to the problem posed by the CRT, but it is interesting because it’s explicit. On the other hand, it’s not terribly inefficient. The term (m/mi)φ(mi) can be calculated using no more than 2 log2φ(mi) multiplications using fast exponentiation.

Related posts

Probability of semantic markup being correct

In a comment on my post on RDFa, Daniel Lemire says

The basic problem is that RDF is metadata. … The problem with metadata is that it is wrong, misleading, too general, too specific… you name it… there is never a good match between the metadata and the user query.

I want to focus on the probability of metadata being correct, whether or not it’s useful. I think that’s an interesting angle, one I’ve not heard much about.

Think about the analogy to comments in source code. Well-written comments can be a blessing to the person who inherits the code. But comments are so often wrong that it’s best to read them skeptically. They may start out as true and helpful statements but they often get out of sync with the code they document. If you want to be sure what the code is doing, you’ve got to read the code.

I could imagine a similar state of affairs for HTML pages marked up with RDF(a). Particularly if the metadata is manually added, it could easily get out of sync with the content. Metadata generated by web authoring tools would be more trustworthy, or at least would start out more trustworthy. It’s easy to imagine a page initially created by a tool then subsequently edited by hand.

Returning to software comments, the probability of error increases with the distance between the comment and the code it documents. Comments in external documents are almost certainly wrong. Comments in headers are likely wrong. But inline comments are more likely to start out correct and remain correct over time. I imagine RDFa might be more reliable than other ways of adding metadata just because it gives you a way to embed the metadata closer to the content it describes.

Fast exponentiation

How do you efficiently compute an for integer n? You could multiply a*a*a*…*a, n-1 multiplications, but there are much more efficient ways. For example, a8 could be computed by ((a2)2)2 in three multiplications. This example suggests it may be worthwhile to think about the binary representation of the exponent.

Basic algorithm

Here’s an algorithm. Write the exponent n in binary. Read the binary representation from left to right, starting with the second bit from the left. Start with the number a, and every time you read a 0 bit, square what you’ve got. Every time you read a 1 bit, square what you’ve got and multiply by a. It follows that an can be computed using no more than 2 log2(n) multiplications.

As an example, suppose we want to calculate 713. In binary, 13 is written as 1101. We skip over the leading 1 then read 1, 0, and 1. So we calculate ((((72)7)2)2)7. In C notation, we would compute as follows.

        x = 7;
        x *= x;
        x *= 7;
        x *= x;
        x *= x;
        x *= 7;

Modular exponents

In number theory calculations, such as arise in cryptography, it’s often necessary to compute an (mod m) for very large integers a, n, and m. These numbers may be hundreds or thousands of digits long and so of course the calculations cannot be carried out using ordinary machine integers. You could compute an first then reduce the result (mod m), but it would be far more efficient to reduce (mod m) after every multiplication. This puts an upper bound on the size of numbers (and hence the amount of memory) needed for the calculation.

Matrix exponents

The same algorithm works when a is not an integer but a matrix. For example, consider a large matrix M that gives the connections between nodes in a graph. M contains a 1 in position (i, j) if the ith and jth nodes are connected by an edge and contains a 0 otherwise. The nth power of M tells which nodes are connected by paths of length less than or equal to n. If M is large, we don’t want to do unnecessary multiplications in computing Mn.

Special cases

For some exponents, the method given above can be improved. The smallest exponent for which you can improve on the binary algorithm is 15. Using the binary algorithm, computing a15 requires six multiplications. But you could compute the same quantity in five multiplications as follows.

        b = a*a*a;
        c = b*b;
        c *= c;
        c *= b;

For more examples, see Seminumerical Algorithms section 4.6.3.

How to solve linear congruences

A linear congruence is the problem of finding an integer x satisfying

axb (mod m)

for specified integers a, b, and m. This problem could be restated as finding x such that

  1. the remainder when ax is divided by m is b,
  2. a*x % m == b in C code,
  3. axb is divisible by m, or
  4. in base m, ax ends in b.

Two solutions are considered the same if they differ by a multiple of m. (It’s easy to see that x is a solution if and only if x + km is a solution for all integers k.)

For example, we may want to solve 7x ≡ 3 (mod 10). It turns out x = 9 will do, and in fact that is the only solution. However, linear congruences don’t always have a unique solution. For example, 8x ≡ 3 (mod 10) has no solution; 8x is always an even integer and so it can never end in 3 in base 10. For another example, 8x ≡ 2 (mod 10) has two solutions, x = 4 and x = 9.

In this post, we answer two questions.

  1. How many solutions does axb (mod m) have?
  2. How can we compute them?

The answer to the first question depends on the greatest common divisor of a and m. Let g = gcd(a, m). If b is not divisible by g, there are no solutions. If b is divisible by g, there are g solutions.

So if g does divide b and there are solutions, how do we find them? The brute force solution would be to try each of the numbers 0, 1, 2, …, m-1 and keep track of the ones that work. That works in theory, but it is impractical for large m. Cryptography applications, for example, require solving congruences where m is extremely large and brute force solutions are impossible.

First, suppose a and m are relatively prime. That is, assume g = gcd(a, m) = 1. We first put the congruence axb (mod m) in a standard form. We assume a > 0. If not, replace axb (mod m) with –ax ≡ –b (mod m). Also, we assume a < m. If not, subtract multiples of m from a until a < m.

Now solve my ≡ –b (mod a). This is progress because this new problem is solving a congruence with a smaller modulus since a < m. If y solves this new congruence, then x = (my + b)/a solves the original congruence. We can repeat this process recursively until we get to a congruence that is trivial to solve. The algorithm can be formalized into a procedure suitable for programming. The result is closely related to the Euclidean algorithm.

Now what if the numbers a and m are not relatively prime? Then first solve the congruence (a/g)y ≡ (b/g) (mod (m/g)) using the algorithm above. Then the solutions to axb (mod m) are x = y + tm/g where t = 0, 1, 2, …, g-1.

Here are a couple examples.

First, let’s solve 7x ≡ 13 (mod 100). Since 7 and 100 are relatively prime, there is a unique solution. The algorithm says we should solve 100y ≡ -13(mod 7). Since 100 ≡ 2 (mod 7) and -13 ≡ 1 (mod 7), this problem reduces to solving 2y ≡ 1 (mod 7), which is small enough to solve by simply sticking in numbers. We find y = 4. Then x = (100*4 + 13)/7 = 59. You can verify that 7*59 = 413 so 7*59 ≡ 13 (mod 100). In general, we may have to apply the algorithm multiple times until we get down to a problem small enough to solve easily.

Now let’s find all solutions to 50x ≡ 65 (mod 105). Since gcd(50, 105) = 5 and 65 is divisible by 5, there are 5 solutions. So we first solve 10x ≡ 13 (mod 21). The algorithm above says we can solve this by first solving 21y ≡ -13 (mod 10), which reduces immediately to y ≡ 7 (mod 10), and so we take y = 7. This says we can take x = (105*7 + 65)/50 = 16. The complete set of solutions to our original congruence can be found by adding multiples of 105/5 = 21. So the solutions are 16, 37, 58, 79, and 100.

I intend to write posts in the future about how to solve simultaneous systems of linear congruences and how to solve quadratic congruences.

Update: Here are the posts I intended to write: systems of congruences, quadratic congruences.

Related: Applied number theory

RDFa

Phil Windley had a recent interview with Elias Torres and Ben Adida on RDFa. This is an emerging standard for adding semantic information to HTML documents. The “a” in RDFa stands for attributes. Rather than creating new documents, RDFa allows you to add RDF-like semantic information to your existing HTML pages via element attributes. A great deal of thought has gone into how to accomplish this without unwanted side effects such as changing how pages are rendered.

As I listened to the interview, I tried to think of how I would apply this to my websites. There are standardized vocabularies for expressing friendship relationships, address and calendar information, audio file meta data, etc. However, little of this is relevant to my websites. I’m ready to jump on the RDFa bandwagon once there are standard vocabularies more applicable to the kind of content I publish.

To learn more about RDFa, listen to Phil Windley’s podcast or read the RDFa primer.