Smith’s determinant

Here’s a curious identity I ran across while browsing Knuth’s magnum opus.

\[\left| \begin{array}{llll} \gcd(1,1) & \gcd(1,2) & \ldots & \gcd(1, n) \\ \gcd(2,1) & \gcd(2,2) & \ldots & \gcd(2, n) \\ \vdots & \vdots & & \vdots \\ \gcd(n,1) & \gcd(n,2) & \ldots & \gcd(n, n) \end{array}\right| = \varphi(1) \varphi(2) \cdots \varphi(n) \]

Here gcd(i, j) is the greatest common divisor of i and j, and φ(n) is Euler’s phi (or totient) function, the number of positive integers less than or equal to n and relatively prime to n.

The more general version of Smith’s determinant replaces gcd(i, j) above with f(gcd(i, j)) for an arbitrary function f.

As a hint at a proof, you can do column operations on the matrix to obtain a triangular matrix with φ(i) in the ith spot along the diagonal.
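As a quick numerical sanity check, here’s a short Python sketch (the helper names `phi` and `det` are my own) that evaluates both sides of the identity for n = 6, using an exact permutation-expansion determinant so no floating point is involved:

```python
from itertools import permutations
from math import gcd, prod

def phi(n):
    """Euler's totient: count of 1 <= k <= n with gcd(k, n) == 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def det(M):
    """Exact determinant by permutation expansion (fine for small n)."""
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        # sign of the permutation is (-1)^(number of inversions)
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = prod(M[i][p[i]] for i in range(n))
        total += -term if inv % 2 else term
    return total

n = 6
M = [[gcd(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]
print(det(M))                                  # determinant of the gcd matrix
print(prod(phi(k) for k in range(1, n + 1)))   # φ(1) φ(2) ⋯ φ(6)
```

Both lines print the same value, as the identity promises.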

Source: TAOCP vol 2, section 4.5.2, problem 42.

On the compulsive reading of news

“Remind me … to write a popular article on the compulsive reading of news. The theme will be that most neuroses and some psychoses can be traced to the unnecessary and unhealthy habit of daily wallowing in the troubles and sins of five billion strangers.”

Robert Heinlein, Stranger in a Strange Land

Too many objects

Around 1990, object oriented programming (OOP) was all the buzz. I was curious what the term meant and had a hard time finding a good definition. I still remember a description I saw somewhere that went something like this:

Object oriented programming is a way of organizing large programs. … Unless your program is fairly large, you will not see any difference between object oriented and structured programming.

The second sentence is no longer true. OOP is not just a high-level organizational style. Objects and method calls now permeate software at the lowest level. And that’s where things went wrong. Software developers got the idea that if objects are good, more objects are better. Everything should be an object!

For example, I had a discussion with a colleague once on how to represent depth in an oil well for software we were writing. I said “Let’s just use a number.”

    double depth;

My suggestion was laughed off as hopelessly crude. We need to create depth objects! And not just C++ objects, COM objects! We couldn’t send a double out into the world without first wrapping it in the overhead of an object.

Languages like C# and Java enforce the everything-is-an-object paradigm. You can’t just write a function; you have to write member functions of an object. This leads to a proliferation of “-er” classes that do nothing but wrap a function. For example, suppose you need a function to solve a quadratic equation. You might make it the Solve method on a QuadraticEquationSolver object. That’s just silly. As John Carmack said,

Sometimes, the elegant implementation is a function. Not a method. Not a class. Not a framework. Just a function.

Languages are changing. You can create anonymous functions in C#, for example. You can even create a named function by creating an anonymous function first and saving it to a named variable. A little wonky, but not too bad.

I imagine when people say OOP is terrible, they often mean that OOP as commonly practiced now goes too far.


Consulting update

Here’s a quick update for those who might be interested in what I’m up to.

These days I’m primarily working on two large projects, one in mathematical modeling and the other in data exploration. I also have a couple of smaller projects going on, one in software process improvement and another in mathematical modeling.

I really enjoy what I’m doing. I’ve worked on a variety of interesting projects, and that’s important to me. And I’m spending less time now looking for work and more time doing work. (You can never stop looking for work, but for now I can afford to spend less time doing so than I did at first.)

Golden strings and the rabbit constant

Golden strings are analogous to Fibonacci numbers, except one uses concatenation rather than addition.

Start with s₁ = “1” and s₂ = “10”. Then define sₙ = sₙ₋₁ + sₙ₋₂, where “+” means string concatenation.

The first few golden strings are

  • “1”
  • “10”
  • “101”
  • “10110”
  • “10110101”

The length of sₙ is Fₙ₊₁, the (n+1)st Fibonacci number. Also, sₙ contains Fₙ 1’s and Fₙ₋₁ 0’s. (Source: The Glorious Golden Ratio)
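These claims are easy to check numerically. Here’s a small Python sketch (the variable names are my own) that builds the first few golden strings and verifies the length and digit counts against the Fibonacci numbers:

```python
# Fibonacci numbers with the convention F1 = F2 = 1
fib = [0, 1, 1]
for _ in range(20):
    fib.append(fib[-1] + fib[-2])

# golden strings; s[0] is a placeholder so indices match subscripts
s = ["", "1", "10"]
for n in range(3, 12):
    s.append(s[n - 1] + s[n - 2])

for n in range(1, 12):
    assert len(s[n]) == fib[n + 1]          # length is F_{n+1}
    assert s[n].count("1") == fib[n]        # F_n ones
    assert s[n].count("0") == fib[n - 1]    # F_{n-1} zeros
print("all checks pass")
```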

If we interpret sₙ as the fractional part of a binary number, the sequence converges to the rabbit constant R = 0.7098034428612913…

It turns out that R is related to the golden ratio φ by

\[ R = \sum_{i=1}^\infty 2^{-\lfloor i \phi \rfloor} \]

where ⌊i φ⌋ is the largest integer no greater than iφ.
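A partial sum of this series already agrees with R to machine precision, since the terms decay geometrically. Here’s a quick sketch (the cutoff of 60 terms is arbitrary, chosen so the tail is far below double-precision resolution):

```python
from math import floor, sqrt

phi = (1 + sqrt(5)) / 2  # golden ratio

# partial sum of 2^(-floor(i*phi)); each term is an exact power of two
R = sum(2.0 ** -floor(i * phi) for i in range(1, 60))
print(R)  # ≈ 0.7098034428612913
```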

Here’s a little Python code to print out the first few golden strings and an approximation to the rabbit constant.

    from mpmath import mp, fraction

    # build golden strings: each is the previous one
    # followed by the one before that
    a = "1"
    b = "10"
    for i in range(10):
        b, a = b + a, b
        print(b)

    # interpret b as a binary fraction and print its decimal value
    n = len(b)
    mp.dps = n       # up to n decimal places for denominator 2**n
    denom = 2**n
    num = int(b, 2)

    rabbit = fraction(num, denom)
    print(rabbit)

Note that the code sets the number of decimal places, mp.dps, to the length of the string b. That’s because it takes up to n decimal places to exactly represent a rational number with denominator 2ⁿ.


Three properties of real-world graphs

From Graph Theory in the Information Age by Fan Chung:

Empirically, most real-world graphs have the following properties:

  • sparsity — The number of edges is within a constant multiple of the number of vertices.
  • small world phenomenon — Any two vertices are connected by a short path. Two vertices having a common neighbor are more likely to be neighbors.
  • power law degree distribution — The degree of a vertex is the number of its neighbors. The number of vertices with degree j (or having j neighbors) is proportional to j^(−β) for some fixed constant β.


The seven color map theorem

The famous four color theorem says that any flat map can be colored with four colors so that no two countries that touch at more than a point are the same color. The hard part is to show that four colors are sufficient; it’s easy to show that four colors are necessary.

It’s easier to prove generalizations of the four color theorem to more complex surfaces. These were settled decades before the original four color theorem.

For example, the seven color map theorem says that any map on a torus (doughnut) can be colored using seven colors, and there exists a map that requires seven colors.

Besides the seven color theorem for the torus, there’s an eight color theorem for a surface with two holes, a nine color theorem for a surface with three holes, etc. In fact, there’s an n color theorem for all n at least 7.

From the examples above, it looks like if a surface has g holes (g for genus, what topologists call these holes), the number of colors required is 6 + g. That’s true for g up to 6, but then it fails. The actual theorem is that a surface with g holes requires

\[ \left\lfloor \frac{7 + \sqrt{1+48g}}{2}\right\rfloor \]

colors where ⌊ x ⌋ is the largest integer no greater than x.

Because the expression above grows slowly as a function of g, never jumping by more than 1, it takes on every integer value greater than 6, and takes some values more than once. For example, a map on a surface with 6 holes requires 12 colors, but so does a map on a surface with 7 holes.
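Here’s a short Python sketch of the formula (the function name `heawood` is mine). Using integer square roots keeps the arithmetic exact; taking the floor of the square root before dividing gives the same result as flooring the whole expression, since the numerator 7 + √(1+48g) and 7 + ⌊√(1+48g)⌋ always lie between the same two even integers:

```python
from math import isqrt

def heawood(g):
    """Number of colors needed for a map on a surface with g holes (g >= 1)."""
    # floor((7 + sqrt(1 + 48g)) / 2), computed in exact integer arithmetic
    return (7 + isqrt(1 + 48 * g)) // 2

for g in range(1, 9):
    print(g, heawood(g))
```

Running this shows the repeat noted above: both g = 6 and g = 7 give 12 colors.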

Source: Four Colors Suffice: How the Map Problem Was Solved

Some things can’t be done slowly

Keith Perhac mentioned in a podcast that a client told him he accomplished more in three days than the client had accomplished in six months. That sounds like hyperbole, but it’s actually plausible.

Sometimes a consultant can accomplish in a few days what employees will never accomplish, not because the consultant is necessarily smarter, but because the consultant can give a project a burst of undivided attention.

Some projects can only be done so slowly. If you send up a rocket at half of escape velocity, it’s not going to take twice as long to get where you want it to go. It’s going to take infinitely longer.

Related post: A sort of opposite of Parkinson’s law

21st Century C

I ran across a copy of 21st Century C (ISBN 1491903899) this afternoon. I hadn’t heard of the book, but the title was intriguing. I wrote more C in the 20th century than the 21st, so my ideas regarding C (sans ++) are out of date. (I’ve written a fair amount of C++ this century, but I have only written C under duress and with difficulty.)

I’ve only skimmed through the book so far, but one thing I like about it is that the first 100 pages are devoted to tools, not the C language per se. There’s a lot more to using any language than the language itself, and I find it harder to learn about tools than languages. It’s hard to know what tools to learn, and what features of those tools to learn first.

No data on the need to bring data

The preface to Elements of Statistical Learning opens with the popular quote

In God we trust, all others bring data. — William Edwards Deming

The footnote to the quote is better than the quote:

On the Web, this quote has been widely attributed to both Deming and Robert W. Hayden; however Professor Hayden told us that he can claim no credit for this quote, and ironically we could find no “data” confirming that Deming actually said this.

Emphasis added.

The fact that so many people attributed the quote to Deming is evidence that Deming in fact said it. It’s not conclusive: popular attributions can certainly be wrong. But it is evidence.

Another piece of evidence for the authenticity of the quote is the slightly awkward phrasing “all others bring data.” The quote is often stated in the form “all others must bring data.” The latter is better, which lends credibility to the former: a plausible explanation for why the more awkward version survives would be that it is what someone, maybe Deming, actually said.

The inconclusive evidence in support of Deming being the source of the quote is actually representative of the kind of data people are likely to bring someone like Deming.