I am not an operating system

I thought Apple’s “I’m a Mac” ad campaign was amusing, but I’m not impressed with Microsoft’s belated “I’m a PC” response. When Apple claimed that cool people use Macs, it was lame to respond that some PC users are cool too.

PC users don’t identify with their operating system the way Mac users do. PC users don’t put Windows logos on cars, much less on their bodies, for example. (Maybe some do, but they’re less common.)

This was inevitable, and it has little to do with Apple and Microsoft per se. In any industry, when one vendor has a 95% market share, they’re forced into the position of being all things to all people, making a lot of people moderately happy, or at least not unhappy. And when another vendor has a 5% market share, they have to make that 5% very happy. The minority vendor can be hip and scrappy; the majority vendor cannot.

Maybe a better response to “I’m a Mac” would have been “I’m not an operating system,” poking fun at people who identify strongly with their computer.

What it takes to make Paint.NET easy to install

Writing good installation programs is hard. It takes experience and forethought to imagine all the things that might happen on the client’s computer. Top-notch programmers know that installers are critical to a user’s experience and put a lot of time into making them high quality. Not-so-top-notch programmers think a project is done when they can get it to work on their own computer.

Scott Hanselman has an interview with Paint.NET creator Rick Brewster. The main topic of the interview was the things Rick did to make Paint.NET easy to install.

Paint.NET is an image editing product written using Microsoft’s .NET framework. In a sense, Paint.NET is the geometric mean between the Paint program that ships with Windows and Adobe Photoshop: it does about 20x more than Paint and about 20x less than Photoshop. It does most of the things I need, but it’s small enough that I can do a brute force search of the features if I have to.

The differences between bias, consistency, and efficiency

Sometimes code is easier to understand than abstract math. A few days ago I was having a hard time conveying the ideas of bias, consistency, and efficiency in a statistics class. Writing some pseudo-code on the board seemed to help clear things up. Loops and calls to random number generation routines are more tangible than discussions about random samples.

Later I turned the pseudo-code into Python code (after all, Python is supposed to be “executable pseudo-code”) and fancied it up a bit. The following page gives some explanation, some plots of the output, and the source code.

The difference between an unbiased estimator and a consistent estimator
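
As a flavor of what that page does, here is a minimal sketch in the same spirit. The sample is drawn from a standard normal, so the true mean is 0 and the true variance is 1; the particular estimators are my own illustration, not necessarily the ones used in the class.

import random

def first_observation(sample):
    # Unbiased but not consistent: its variance does not shrink as n grows.
    return sample[0]

def sample_mean(sample):
    # Unbiased and consistent estimator of the population mean.
    return sum(sample) / len(sample)

def mle_variance(sample):
    # Biased (divides by n rather than n-1) but consistent estimator of the variance.
    m = sample_mean(sample)
    return sum((x - m)**2 for x in sample) / len(sample)

random.seed(20081015)
for n in [10, 100, 1000, 10000]:
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    print(n, first_observation(sample), sample_mean(sample), mle_variance(sample))

As n grows, the sample mean and the MLE variance settle down near the true values, while the first-observation estimator bounces around no matter how large the sample is.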

How many numbers are squares mod m

In a previous post, fast way to tell whether a number is a square, the question came up of how many integers are squares modulo an integer m. That is, for how many numbers y in the list 0, 1, 2, …, m − 1 does the equation x² ≡ y (mod m) have a solution? The application in the earlier post was only concerned with m equal to a power of 2, but it turns out there is a completely general solution.

Michael Lugo pointed to a paper (*) that gives a solution for all values of m. I will summarize the paper here and give a Python script implementing the solution.

The paper defines the function s(n) as the number of distinct squares modulo n. The first observation is that s(n) is a multiplicative function. That means that if m and n are relatively prime, s(mn) = s(m) s(n). Since every integer can be factored into prime powers, knowing s() is multiplicative reduces the problem to computing s(pⁿ) for primes p. The paper then shows how to compute s() for powers of odd primes and powers of 2. The full solution is summarized in the code below.

Assume we have factored our number m. For example, suppose we want to know how many possible last three digits there are for squares base 10. That is equivalent to computing s(1000), and so we factor 1000 into 2³ · 5³. We represent that factorization as a list of pairs: [(2, 3), (5, 3)]. The function squares below tells us that s(1000) is 159.

def isOdd(n):
    return n % 2

def s(factor):
    # Number of distinct squares mod p**n, where factor = (p, n).
    (p, n) = factor
    if isOdd(p):
        if n == 1:
            r = (p + 1) // 2
        elif n == 2:
            r = (p*p - p + 2) // 2
        elif isOdd(n):
            r = (p**(n+1) + 2*p + 1) // (2*p + 2)
        else:
            r = (p**(n+1) + p + 2) // (2*p + 2)
    else:
        if isOdd(n):
            r = (2**(n-1) + 5) // 3
        else:
            r = (2**(n-1) + 4) // 3
    return r

def squares(factorization_list):
    # s() is multiplicative: multiply s(p**n) over the prime factorization.
    product = 1
    for factor in factorization_list:
        product *= s(factor)
    return product

# test cases to exercise each branch
assert squares( [(5, 1)] ) == 3
assert squares( [(5, 2)] ) == 11
assert squares( [(5, 3)] ) == 53
assert squares( [(5, 4)] ) == 261
assert squares( [(2, 3)] ) == 3
assert squares( [(2, 4)] ) == 4
assert squares( [(2,3), (5,3)] ) == 159

The importance of this result to the original problem is that looking at the last several bits of a number can screen out no more than 5/6 of the non-squares. Looking at the last four bits, i.e. the last hexadecimal digit, screens 3/4 of the non-squares, so there’s not room to do much better.

Update: So about 1 out of every 6 integers is a square mod 2ⁿ for large n. What about mod pⁿ for an odd prime p? By the formulas above, the corresponding ratio approaches p/(2p + 2), which is just under 1/2 for large p.
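
To check these limits numerically, we can reuse the s() function defined above. (This verification is mine, not from the paper.)

# Assumes the s() function defined above.
for n in [5, 10, 15, 20]:
    print(n, s((2, n)) / 2**n)    # approaches 1/6

for p in [3, 11, 101]:
    print(p, s((p, 10)) / p**10)  # approaches p/(2p + 2)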

Update (21 April 2010): Fixed a couple typos.  Added syntax coloring to the source code. See an implementation of the algorithm in J by Tracy Harms.

* W. D. Stangl, “Counting Squares in Z_n”, Mathematics Magazine, Vol. 69, No. 4, October 1996, pp. 285–289.

Favorite audio book narrators

I have a long commute to work and so I’ve listened to a lot of recorded books. Here are my two favorite narrators for audio books.

John McDonough, narrator of the Mitford series by Jan Karon.

Rob Inglis, narrator of J. R. R. Tolkien’s books The Hobbit and The Lord of the Rings trilogy.

Both gentlemen are pleasant to listen to. They both develop unique voices for a large cast of characters, making it easier to follow their novels while driving.

Five criticisms of significance testing

The following list summarizes five criticisms of significance testing as it is commonly practiced.


Creativity for left-brained people

I’ve written a fair amount about creativity on this blog, though I’ve written much more about other topics. I created a Squidoo lens to pull together some of my creativity posts. Maybe some people who would otherwise be put off by the more technical posts here will enjoy that page. [Update: apparently the Squidoo lens has gone away.]

The Squidoo page organizes posts into three categories: creativity, simplicity and complexity, and innovation. (*) There is a brief description for each link in each category, usually a single sentence.

In case you haven’t heard of Squidoo, it’s a project started by Seth Godin that encourages people to write about narrow topics they enjoy. One of their slogans is that everyone’s an expert at something. It’s fun to browse around and see what obscure topics people have written lenses for. They’re up to over 700,000 pages.

(*) Three cheers for the Oxford comma.

Fast way to test whether a number is a square

A question came up on StackOverflow today regarding testing whether integers were perfect squares. The person asking the question said his first idea was to take the (floating point) square root, round the result to the nearest integer, square that, and see whether he got the original number back.  But he thought maybe there was a faster way that avoided floating point arithmetic.

My suggestion was to first look at the number in hexadecimal (base 16) and see whether the number could be ruled out as a square before proceeding.  Squares in base 16 end in 0, 1, 4, or 9. (You can see this by brute force: square every number from 0 to 15 and take the remainder dividing by 16.) You can find the end of the number in hex by a fast bit operation, then rule out 12 out of every 16 numbers. Then do the square root and square procedure on the 4 out of 16 numbers that slip through the hexadecimal filter.  Here’s C++ code for the algorithm.

#include <cmath>

int PerfectSquare(int n)
{
    int h = n & 0xF; // last hexadecimal "digit"
    if (h > 9)
        return 0; // return immediately in 6 cases out of 16

    // Squares mod 16 end in 0, 1, 4, or 9.
    // Take advantage of Boolean short-circuit evaluation.
    if (h != 2 && h != 3 && h != 5 && h != 6 && h != 7 && h != 8)
    {
        // take square root if you must
        int t = (int) std::floor( std::sqrt((double) n) + 0.5 );
        return t*t == n;
    }
    return 0;
}

This algorithm ran about twice as fast as always taking the square root. The analogous C# code also ran about twice as fast as the more direct method.

What about looking at other bases? We want to use bases that are powers of 2 so we can get the last “digit” quickly. Half the numbers in base 4 are potential squares. Three out of eight numbers in base 8 are potential squares. Four out of 16 are potential squares in base 16. So taking smaller powers of 2 is probably not faster. Seven out of 32 numbers are potential squares base 32, a slightly lower ratio than base 16, but at the cost of more comparisons. I haven’t benchmarked using bases other than 16, but I doubt they are faster. If any are faster, I imagine the difference is small.

Here are a couple of number theory problems that come out of the problem above. First, how many numbers have square roots mod 2ⁿ? That is, for how many values y does x² ≡ y (mod 2ⁿ) have a solution? Call that number g(n). For example, g(3) = 3, g(4) = 4, g(5) = 7, g(6) = 12. Is there a simple formula for g(n)? Second, what is the minimum value of g(n)/2ⁿ?
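
For the record, here is a brute-force check of those values (my own quick verification, not part of the original question).

def g(n):
    # Count distinct squares mod 2**n by brute force.
    m = 2**n
    return len({(x*x) % m for x in range(m)})

print([g(n) for n in range(3, 7)])  # prints [3, 4, 7, 12]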

Update: See Michael Lugo’s solution to the number theory questions in the comments below or more discussion here.

Update: According to the StackOverflow discussion, the base 64 technique was a little faster than the base 16 technique, at least when implemented in C#. The relative speeds of variations on the algorithm depend on what language you’re using.
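
For illustration, here is one way the base 64 idea might look. This is my own sketch in Python rather than the C# from the StackOverflow discussion; it precomputes a bit mask whose set bits mark the 12 squares mod 64.

from math import floor, sqrt

# Bit r of MASK is set iff r is a square mod 64; 12 of the 64 bits are set.
MASK = 0
for k in range(64):
    MASK |= 1 << ((k*k) & 63)

def perfect_square(n):
    if not (MASK >> (n & 63)) & 1:
        return False  # rejects 52 of every 64 numbers immediately
    t = int(floor(sqrt(n) + 0.5))
    return t*t == n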

Related: Applied number theory

Origin of "statistically significant"

The words signify and signal come from the Latin word signum for sign. The phrase “statistically significant” was coined to indicate the existence of a statistical signal that an effect was present. The word significant in this context is not meant to imply that an effect is large or important. David Salsburg elaborates on this in his book The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century (ISBN 0805071342).

The word was used in its late-nineteenth-century meaning, which is simply that the computation signified or showed something. As the English language entered the twentieth century, the word significant began to take on other meanings, until it took on its current meaning, implying something very important. … Unfortunately, those who use statistical analysis often treat a significant test statistic as implying something closer to the modern meaning of the word.

An effect is statistically significant if it is considered unlikely to be a coincidence. An effect can be highly significant (very unlikely to be a coincidence) and yet be very small. Also, calling an effect “statistically significant” does not imply that anyone necessarily thinks the effect is consequential.

Here’s an example from astronomy. Newtonian mechanics describes the motions of the planets pretty well, but not perfectly. There is a measurable contribution from relativistic effects when studying the orbit of Mercury. The effect of relativity in this case would be highly statistically significant in an experiment conducted carefully enough to measure it. I don’t know whether the effect is significant in the sense that NASA would need to take it into consideration when sending a probe to Mercury; it may not be. I imagine the contributions of relativity to the orbits of the rest of the planets are not significant to NASA engineers, though they would be statistically significant if they could be measured.

10,000 spam comments blocked

The spam filter on my blog has now blocked over 10,000 comments since I installed it. As far as I know, there have only been a couple false positives (legitimate messages flagged as spam). Occasionally I’ll skim the blocked comments for false positives, but I usually delete them all without looking.

So far the number of false negatives (spam comments that slip through the filter) has been manageable.  However, the total amount of spam and the number of false negatives has increased as this blog has become more popular.

For reasons I don’t understand, one post has attracted far more spam than any other: One program to rule them all.