Twitter is not micro-blogging

Twitter is often described as a micro-blogging platform. Twitter posts are like blog posts, except they’re limited to 140 characters (so they fit in a cell phone text message). You subscribe to Twitter posts (called “tweets”) sorta like you subscribe to a blog. Some people, like Kathy Sierra, do use Twitter for micro-blogging. Her tweets are little self-contained messages, often one sentence. Here’s a recent example:

Much as I liked Outliers, makes me cringe to see people focus on “it’s all luck/chance” rather than the “it takes 10,000 hours” part.

Nice observation, all in 133 characters. (She’s talking about Malcolm Gladwell’s book Outliers. He argues that success is a matter of accumulated lucky advantages plus around 10,000 hours of deliberate practice.)

Maybe the most common form of micro-blogging is link sharing. Here’s an example, again from Kathy Sierra.

…and for those whose head literally explodes over misuse of the word “literally”  http://literally.barelyfitz.com/

Some people use Twitter as a question-and-answer forum. This is sorta like blogging and inviting comments, and it can be very powerful.

But a lot of traffic on Twitter is not what I’d consider micro-blogging.  It’s more like a public form of instant messaging. I found this disorienting when I first started using Twitter. Who are they talking to? Context?! Why would I want everyone to see my instant messages? I suppose it’s an acquired taste.

Everyone on Twitter has some mixture of micro-blogging, Q&A, and instant messaging. Some people love the instant messaging-style conversations. To each his own. My preferred mix is weighted toward micro-blogging and Q&A.

I’m on Twitter at @johndcook.

Update (29 March 2010): It’s been more than a year since I first wrote this post. I now use the instant messaging aspect of Twitter a little more than I did then, though I still prefer the micro-blogging aspect. And I’ve created several daily tip accounts that are pure micro-blogs.

Probability distributions and object oriented programming

This post looks at applying object oriented programming ideas to probability distribution software. It explains the Liskov Substitution Principle and shows how it can keep you from falling into a subtle trap.

One of the big ideas in object-oriented programming is to organize software into units of data and related functions that represent things in your problem domain. These units are called classes. Particular instances of classes are called objects. For example, a business application could have a customer class. Particular customers are represented by customer objects.

The C++ numerical library that we developed at MDACC has classes that represent probability distributions. A probability distribution class contains methods (functions) such as PDF, CDF, Mean, Mode, etc.  For example, the NormalDistribution class represents normal distributions. Particular NormalDistribution objects each have their own mean and variance parameters.

Another big idea of object-oriented programming is inheritance, a way to organize classes into a hierarchy. This is where things get more interesting.

Inheritance is commonly described as an “is a” relationship. That explanation is often helpful, but sometimes it can get you into trouble. (Listen to this interview with Robert Martin for an explanation.) Probability distributions illustrate when “is a” should be represented by inheritance and when it should not be.

A beta distribution is a continuous distribution. So is a normal distribution. The BetaDistribution and NormalDistribution classes representing these probability distributions both derive from a ContinuousDistribution class. This makes it possible to write generic code that operates on any continuous distribution. We can then pass in a particular type of continuous distribution rather than having to write special code for every kind of continuous distribution.
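Here is a minimal Python sketch of that design. The class names mirror the post, but the methods (pdf, mean, variance) and the generic standard_deviation function are illustrative assumptions, not the MDACC library’s actual C++ interface.

```python
from abc import ABC, abstractmethod
import math


class ContinuousDistribution(ABC):
    """Hypothetical base class: the interface every continuous distribution promises."""

    @abstractmethod
    def pdf(self, x: float) -> float: ...

    @abstractmethod
    def mean(self) -> float: ...

    @abstractmethod
    def variance(self) -> float: ...


class NormalDistribution(ContinuousDistribution):
    def __init__(self, mu: float, sigma: float):
        self.mu, self.sigma = mu, sigma

    def pdf(self, x: float) -> float:
        z = (x - self.mu) / self.sigma
        return math.exp(-0.5 * z * z) / (self.sigma * math.sqrt(2.0 * math.pi))

    def mean(self) -> float:
        return self.mu

    def variance(self) -> float:
        return self.sigma ** 2


class BetaDistribution(ContinuousDistribution):
    def __init__(self, a: float, b: float):
        self.a, self.b = a, b

    def pdf(self, x: float) -> float:
        if not 0.0 < x < 1.0:
            return 0.0
        # log of the beta function B(a, b), computed with lgamma for stability
        log_beta = math.lgamma(self.a) + math.lgamma(self.b) - math.lgamma(self.a + self.b)
        return math.exp((self.a - 1.0) * math.log(x)
                        + (self.b - 1.0) * math.log(1.0 - x) - log_beta)

    def mean(self) -> float:
        return self.a / (self.a + self.b)

    def variance(self) -> float:
        s = self.a + self.b
        return self.a * self.b / (s * s * (s + 1.0))


def standard_deviation(dist: ContinuousDistribution) -> float:
    """Generic code: relies only on the base-class interface."""
    return math.sqrt(dist.variance())


print(standard_deviation(NormalDistribution(0.0, 2.0)))  # 2.0
print(standard_deviation(BetaDistribution(1.0, 1.0)))    # about 0.289
```

The generic function never needs to know which concrete distribution it received; that is the payoff of the shared base class.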

Now think about a chi square distribution. A chi square distribution with ν degrees of freedom is a gamma distribution with shape ν/2 and scale 2. So in a mathematical sense, a chi square distribution “is a” gamma distribution. But should a class representing a chi square distribution inherit from a class representing a gamma distribution? The surprising answer is “no.” A rule called the “Liskov Substitution Principle” (LSP) says this is a bad idea.
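The mathematical “is a” can be checked directly from the densities. With the gamma density written in the shape/scale parameterization (shape k, scale θ), substituting k = ν/2 and θ = 2 gives the chi square density:

```latex
f_{\Gamma}(x;\,k,\theta) = \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\,\theta^{k}}, \quad x > 0
\qquad\Longrightarrow\qquad
f_{\Gamma}\!\left(x;\,\tfrac{\nu}{2},\,2\right)
  = \frac{x^{\nu/2-1} e^{-x/2}}{2^{\nu/2}\,\Gamma(\nu/2)}
  = f_{\chi^2_\nu}(x)
```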

When a class X inherits from a class Y, we say X is the derived class and Y is the base class. The LSP says code should work without surprises when an instance of a derived class is passed into a function written to receive instances of the base class. Deriving a BetaDistribution class from a ContinuousDistribution class should not lead to any surprises: a function that handles continuous distributions in general should work just fine when you give it a specific distribution such as a beta or normal distribution.

Now suppose we derive our ChiSquareDistribution class from the GammaDistribution class. Suppose also we have a function that expects a GammaDistribution object. What happens if we pass it a ChiSquareDistribution? Maybe the function works with no surprises. If the function calls methods like PDF or CDF there’s no problem. But what if the function calls a SetParameters method that specifies the shape and scale of the distribution? Now we have a problem. You can’t set the shape and scale independently for a chi square distribution.
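The trap can be sketched in a few lines of Python. The names GammaDistribution, ChiSquareDistribution, set_parameters, and rescale are hypothetical, not the library’s real API. The override below picks one of several possible “fixes” (silently ignoring the requested scale); any other choice breaks some caller’s expectations in a different way.

```python
class GammaDistribution:
    """Hypothetical gamma class whose contract lets shape and scale vary independently."""

    def __init__(self, shape: float, scale: float):
        self.set_parameters(shape, scale)

    def set_parameters(self, shape: float, scale: float) -> None:
        self.shape, self.scale = shape, scale

    def mean(self) -> float:
        return self.shape * self.scale


class ChiSquareDistribution(GammaDistribution):
    """The tempting but problematic design: chi square as a subclass of gamma."""

    def __init__(self, dof: float):
        super().__init__(dof / 2.0, 2.0)

    def set_parameters(self, shape: float, scale: float) -> None:
        # A chi square must keep scale == 2, so this override ignores the
        # requested scale. That is exactly the surprise the LSP warns about.
        super().set_parameters(shape, 2.0)


def rescale(dist: GammaDistribution, shape: float, scale: float) -> float:
    """Written against the gamma contract: afterwards the mean should be shape * scale."""
    dist.set_parameters(shape, scale)
    return dist.mean()


print(rescale(GammaDistribution(1.0, 1.0), 3.0, 1.0))  # 3.0, as expected
print(rescale(ChiSquareDistribution(4.0), 3.0, 1.0))   # 6.0, a surprise
```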

If you try to make this work, you’re going to dig yourself into a hole. The code can’t be intuitive: two people could have reasonable but different expectations for how the code should behave. And attempts to patch the situation are only going to make things worse, introducing awkward dependencies and generally entangling the code. The LSP says don’t go there. From an object oriented programming viewpoint, the gamma and chi square distributions are simply unrelated. Neither derives from the other.

The canonical explanation of the LSP uses squares and rectangles. Geometrically, a square is a special type of rectangle. But should a Square class derive from a Rectangle class? The LSP says no. You can’t set the length and width of a square independently. What should a Square class do when someone tries to set its length and width? Ignore one of them? Which one? Suppose you just use the length input and set the width equal to the length. Now you’ve got a surprise: setting the length changes the width, not something you’d expect of rectangles. Robert Martin does a good job of explaining this example in the interview mentioned above. On the other hand, if a Square class and a Rectangle class both derive from a Shape class, code written to act on Shape objects will work just fine when passed either Square objects or Rectangle objects.
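A short Python sketch of the square/rectangle surprise follows. The setter names and the stretch function are illustrative assumptions; the Square here takes the “change both sides” approach described above.

```python
class Rectangle:
    def __init__(self, width: float, height: float):
        self.width = width
        self.height = height

    def set_width(self, width: float) -> None:
        self.width = width

    def set_height(self, height: float) -> None:
        self.height = height

    def area(self) -> float:
        return self.width * self.height


class Square(Rectangle):
    """Keeps width == height by letting each setter change both sides."""

    def __init__(self, side: float):
        super().__init__(side, side)

    def set_width(self, width: float) -> None:
        self.width = self.height = width

    def set_height(self, height: float) -> None:
        self.width = self.height = height


def stretch(r: Rectangle) -> float:
    """Written for rectangles: after these two calls the area should be 2 * 5 = 10."""
    r.set_width(2.0)
    r.set_height(5.0)
    return r.area()


print(stretch(Rectangle(1.0, 1.0)))  # 10.0
print(stretch(Square(1.0)))          # 25.0, the surprise
```

Nothing is wrong with either class in isolation; the problem only shows up when a Square is substituted where a Rectangle was expected.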

Programmers will argue till they’re blue in the face over whether a Square “is a” Rectangle, or vice versa, or neither. The resolution to the argument is that inheritance does not mean “is a.” The idea of “is a” is often useful when thinking about inheritance, but not always. Of course a square is a rectangle, but that does not mean it’s wise to derive a Square class from a Rectangle class. Inheritance actually has to do with interface contracts. The pioneers of object oriented programming did not use the term “is a” for inheritance. That terminology came later.

So although a chi square distribution is a gamma distribution, a ChiSquareDistribution class should not inherit from a GammaDistribution class, just as a square is a rectangle but a Square class should not inherit from a Rectangle class. On the other hand, chi square and gamma distributions are continuous distributions, and it’s fine for ChiSquareDistribution and GammaDistribution classes to inherit from a ContinuousDistribution class, just as it’s fine for Square and Rectangle classes to derive from a Shape class. The difference is a matter of software interface functionality and not philosophical classification.
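A sketch of the sibling design in Python: both classes derive from a shared ContinuousDistribution, and neither derives from the other. Reusing the gamma formulas by holding a gamma object internally (composition) is an assumption of this sketch, one option among several, not something the post prescribes. All names are hypothetical.

```python
from abc import ABC, abstractmethod
import math


class ContinuousDistribution(ABC):
    """Hypothetical shared interface for continuous distributions."""

    @abstractmethod
    def pdf(self, x: float) -> float: ...

    @abstractmethod
    def mean(self) -> float: ...


class GammaDistribution(ContinuousDistribution):
    def __init__(self, shape: float, scale: float):
        self.set_parameters(shape, scale)

    def set_parameters(self, shape: float, scale: float) -> None:
        # Independent shape and scale are part of the gamma contract.
        self.shape, self.scale = shape, scale

    def pdf(self, x: float) -> float:
        if x <= 0.0:
            return 0.0  # support taken as x > 0
        k, theta = self.shape, self.scale
        return math.exp((k - 1.0) * math.log(x) - x / theta
                        - math.lgamma(k) - k * math.log(theta))

    def mean(self) -> float:
        return self.shape * self.scale


class ChiSquareDistribution(ContinuousDistribution):
    """A sibling of GammaDistribution, not a subclass: its only parameter is nu."""

    def __init__(self, dof: float):
        self.set_dof(dof)

    def set_dof(self, dof: float) -> None:
        self.dof = dof
        # Reuse the gamma formulas by composition: chi square(nu) = gamma(nu/2, 2).
        self._gamma = GammaDistribution(dof / 2.0, 2.0)

    def pdf(self, x: float) -> float:
        return self._gamma.pdf(x)

    def mean(self) -> float:
        return self.dof


def mean_of(dist: ContinuousDistribution) -> float:
    """Generic code written against the shared base class."""
    return dist.mean()


print(mean_of(GammaDistribution(3.0, 2.0)))  # 6.0
print(mean_of(ChiSquareDistribution(4.0)))   # 4.0
```

Because ChiSquareDistribution exposes no set_parameters method, there is no way for a caller to ask it for an independent scale, and the question of what it should do never arises.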

Free optimization software from Microsoft

This morning I stumbled across Microsoft Solver Foundation, optimization software developed in C#. The site only mentions a free “express edition.” Sounds like they’re releasing a free version first and may sell an upgrade in the future.

Here are the details of the algorithms supported.

  • Revised Simplex Linear and Mixed Integer Programming (Primal and Dual Simplex)
  • Interior Point Method Linear and Quadratic Programming
  • Constraint Programming with Exhaustive Tree Search, Local Search, and Metaheuristic Techniques
  • Compact Quasi-Newton (L-BFGS) Unconstrained Nonlinear Programming

Microsoft is inconsistent in its support for numerical computing. For example, Visual Studio’s math.h implementation does not include some mathematical functions that POSIX systems provide, such as the error function erf. And their implementation of C++ TR1 does not yet include the special mathematical functions that TR1 specifies. On the other hand, they have produced products like Windows HPC Server and Solver Foundation. Clearly some people at Microsoft care about numerical computing.

Distribution of time customers spend in coffee shops

How would you model the time customers spend in a coffee shop?

This post is pure speculation based on no hard data whatsoever, which makes things considerably easier! If anyone has data or suggestions, please leave a comment. Here goes a first attempt.

The time people spend in a coffee shop depends on why they are there.

  1. Some grab their coffee and go.
  2. Some are there to visit with a friend.
  3. Some drink their coffee (alone) and leave.
  4. Some are there to work.

Each group would have its own time distribution, and the overall distribution would be a mixture of these distributions. Since I’m doing this for fun, I’ll ignore (1) and (2) and just concentrate on (3) and (4). I’ll also ignore complications such as how patterns change throughout the day and how they change according to the day of the week.

Say someone comes in alone to have a cup of coffee. Maybe they stay an average of 15 minutes. I’ll assume the time these folks spend in a coffee shop is normally distributed. Not many stay more than 30 minutes, so let’s say the standard deviation is 5 minutes. Since 30 minutes is three standard deviations above the mean, that would put only about 0.13% staying longer than 30 minutes. It would be more realistic to truncate the distribution at zero to eliminate the small probability of spending negative time in the coffee shop (!) and skew the distribution a little to the right, giving more probability to people staying more than 30 minutes.
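The tail probabilities for the drinkers’ normal model can be checked with the complementary error function from Python’s standard math module. The mean of 15 and standard deviation of 5 minutes are the post’s assumptions; the helper names are illustrative.

```python
import math


def normal_upper_tail(x: float, mu: float, sigma: float) -> float:
    """P(T > x) for T ~ Normal(mu, sigma)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def normal_lower_tail(x: float, mu: float, sigma: float) -> float:
    """P(T < x) for T ~ Normal(mu, sigma)."""
    return 0.5 * math.erfc((mu - x) / (sigma * math.sqrt(2.0)))


mu, sigma = 15.0, 5.0                           # drinkers: minutes in the shop
p_long = normal_upper_tail(30.0, mu, sigma)     # stays longer than 30 minutes
p_negative = normal_lower_tail(0.0, mu, sigma)  # the impossible "negative time" mass
print(f"P(stay > 30 min) = {p_long:.4%}")       # about 0.13%
print(f"P(stay < 0 min)  = {p_negative:.4%}")   # about 0.13%, by symmetry
```

Both tails sit three standard deviations from the mean, so they carry the same small probability; truncating at zero would remove the second one.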

The people who come to the coffee shop to work stay considerably longer than the folks who are just there to drink a cup of coffee. And their time distribution would be heavily skewed. These folks are unlikely to stay less than 30 minutes, so the distribution would drop off sharply on the left. How long people work varies widely, so I’d expect a long tail to the right. The inverse gamma distribution fits this description. Say there’s a 5% chance that a worker will stay less than 30 minutes, and a 5% chance they’ll stay more than two hours. Using this software to solve for parameters, we find a shape parameter of 6.047 and a scale parameter of 317.3 fits the time distribution in minutes. This distribution has a mean of about 63 minutes, which I suppose is reasonable.
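As a sanity check on those parameters, here is a stand-alone Python sketch. It assumes the usual parameterization in which scale/X has a gamma(shape, 1) distribution, so the inverse gamma CDF is 1 − P(shape, scale/t), where P is the regularized lower incomplete gamma function. P is computed by its power series, which is adequate for the moderate arguments here; this is not the solver software mentioned above.

```python
import math


def regularized_lower_gamma(a: float, x: float) -> float:
    """P(a, x) via the series x^a e^-x / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n)).
    Fine for moderate x; a production version would switch to a continued
    fraction for large x."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total and n < 1000:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def inverse_gamma_cdf(t: float, shape: float, scale: float) -> float:
    """P(X <= t) for X ~ InverseGamma(shape, scale)."""
    if t <= 0.0:
        return 0.0
    return 1.0 - regularized_lower_gamma(shape, scale / t)


def inverse_gamma_mean(shape: float, scale: float) -> float:
    return scale / (shape - 1.0)  # defined only for shape > 1


shape, scale = 6.047, 317.3
print(inverse_gamma_cdf(30.0, shape, scale))         # roughly 0.05
print(1.0 - inverse_gamma_cdf(120.0, shape, scale))  # roughly 0.05
print(inverse_gamma_mean(shape, scale))              # about 62.9
```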

Here’s what the graphs of the two distributions would look like: a symmetric distribution centered at 15 minutes for the drinkers and a right-skewed distribution for the workers, peaking near 45 minutes with a mean of about 63 minutes.

Now suppose 70% of customers are drinkers and 30% are workers. Then the mixture distribution would look like this.

As the percentage of workers goes down, so does the second hump in the graph. If a coffee shop had about 20% drinkers and 80% workers, the two humps would be about the same height.
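The hump heights can be compared numerically. This Python sketch evaluates the mixture density at each component’s peak: 15 minutes for the drinkers, and the inverse gamma mode scale/(shape + 1), about 45 minutes, for the workers. Reading the hump heights at the component peaks is an approximation, but the components barely overlap, so it is a close one. The function names and default arguments are illustrative, using the post’s parameters.

```python
import math


def normal_pdf(t: float, mu: float, sigma: float) -> float:
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def inverse_gamma_pdf(t: float, shape: float, scale: float) -> float:
    if t <= 0.0:
        return 0.0
    log_pdf = (shape * math.log(scale) - math.lgamma(shape)
               - (shape + 1.0) * math.log(t) - scale / t)
    return math.exp(log_pdf)


def mixture_pdf(t: float, p_workers: float, shape: float = 6.047,
                scale: float = 317.3, mu: float = 15.0, sigma: float = 5.0) -> float:
    """Density of time spent in the shop for a drinker/worker mixture."""
    return ((1.0 - p_workers) * normal_pdf(t, mu, sigma)
            + p_workers * inverse_gamma_pdf(t, shape, scale))


worker_mode = 317.3 / (6.047 + 1.0)  # peak of the workers' density, about 45 minutes
for p in (0.3, 0.8):
    drinker_hump = mixture_pdf(15.0, p)
    worker_hump = mixture_pdf(worker_mode, p)
    print(f"{p:.0%} workers: drinker hump {drinker_hump:.4f}, worker hump {worker_hump:.4f}")
```

The normal peak, 1/(5√(2π)) ≈ 0.080, is about four times the inverse gamma peak, which is why the humps level out at roughly a 20/80 split.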

How would you include people who come to a coffee shop with a friend?

Programming for artists

Last night I listened to the latest FLOSS Weekly podcast, an interview with the creators of Processing. I’d heard of the Processing language before, but I thought it was some sort of ETL (extract, transform, and load) tool for data processing. Instead, it’s a Java-like language for artists. Here’s the description from the processing.org site.

Processing is an open source programming language and environment for people who want to program images, animation, and interactions. It is used by students, artists, designers, researchers, and hobbyists for learning, prototyping, and production. It is created to teach fundamentals of computer programming within a visual context and to serve as a software sketchbook and professional production tool.

Show notes

How unevenly can you split a convex set through its center?

Here’s a surprising theorem. Suppose you have a convex set in ℝⁿ and you pass a hyperplane through its center of mass. How much of the volume of the set can be on one side? No more than about 63%. (Precisely, 1 – 1/e.) This holds in any dimension n and for any orientation of the hyperplane through the center. I don’t have a reference for this theorem except that it is mentioned near the end of lecture 5 in this course.
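The 63% figure is the dimension-free limit of a sharper bound that depends on n. In the form usually credited to Grünbaum, the larger side holds at most 1 − (n/(n+1))ⁿ of the volume, and this increases toward 1 − 1/e as n grows. A short Python check:

```python
import math


def max_fraction_one_side(n: int) -> float:
    """Upper bound on the volume fraction on one side of a hyperplane
    through the centroid of a convex body in R^n: 1 - (n/(n+1))**n."""
    return 1.0 - (n / (n + 1.0)) ** n


for n in (1, 2, 3, 10, 100, 1000):
    print(n, round(max_fraction_one_side(n), 4))
print("limit 1 - 1/e =", round(1.0 - 1.0 / math.e, 4))
```

For n = 2 the bound is 5/9, and a triangle attains it: a line through the centroid parallel to one side cuts off a similar triangle scaled by 2/3, with 4/9 of the area, leaving 5/9 on the other side.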

Update:  See Splitting a convex set through its center for an illustration and a partial proof.

Using SWIG to expose C code to Python

Sergey Fomel left a valuable comment on my post about computing the error function erf(x). He gave some sample code for exposing C functions to Python via SWIG on Linux. I haven’t used SWIG, but I’ve heard good things about it. I believe Google uses SWIG extensively to make C++ code callable from Python.

Here’s Sergey’s code.

bash$ cat erf.i

%module erf
%{
#include <math.h>
%}
double erf(double);

bash$ swig -o erf_wrap.c -python erf.i
bash$ gcc -o erf_wrap.os -c -fPIC -I/usr/include/python2.4 erf_wrap.c
bash$ gcc -o _erf.so -shared erf_wrap.os
bash$ python
>>> from erf import erf
>>> erf(1)
0.84270079294971489

C. S. Lewis on reading old books

C. S. Lewis on the value of reading old books:

Every age has its own outlook. It is specially good at seeing certain truths and specially liable to make certain mistakes. We all, therefore, need the books that will correct the characteristic mistakes of our own period. And that means the old books. All contemporary writers share to some extent the contemporary outlook—even those, like myself, who seem most opposed to it. … To be sure, the books of the future would be just as good a corrective as the books of the past, but unfortunately we cannot get at them.

The quote comes from an introduction he wrote for a translation of Athanasius’s On the Incarnation. In the same introduction, Lewis recommended reading an old book after every new one, or at the very least one old book for every three new ones. I agree that reading old books provides perspective on the present, and I do read old books from time to time, but I don’t come close to his recommended ratio.

What are some old books you’ve enjoyed or intend to read?

Update: Related post, Old math books

Stand-alone normal (Gaussian) distribution function

I’ve seen several people ask lately how to compute the cumulative distribution function (CDF) of a standard normal random variable, often denoted Φ(x). They want to know how to compute it in Java, or Python, or C++, etc. Every language has its own standard libraries, and in general I recommend using standard libraries. However, sometimes you want to minimize dependencies. Or maybe you want more transparency than your library allows. The code given here is in Python, but it is so compact that it could easily be ported to any other language.

I just posted Python code for computing the error function, erf(x). The standard normal CDF Φ(x) is a simple transformation of erf(x). Given code for erf(x), here’s code for Φ(x).

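from math import erf, sqrt  # or the stand-alone erf(x) from the earlier post
def phi(x):
    # Standard normal CDF: Phi(x) = (1 + erf(x/sqrt(2)))/2
    return 0.5*( 1.0 + erf(x/sqrt(2)) )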
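If you want Φ(x) with no dependency beyond basic math, one common stand-alone choice for erf(x) is the rational approximation in Abramowitz and Stegun, formula 7.1.26, with absolute error below about 1.5 × 10⁻⁷. This is an assumption of the sketch below and is not necessarily identical to the code in the earlier post.

```python
import math


def erf(x: float) -> float:
    """Abramowitz & Stegun 7.1.26 approximation; absolute error below about 1.5e-7."""
    a1, a2, a3, a4, a5 = 0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429
    p = 0.3275911
    sign = 1.0 if x >= 0.0 else -1.0  # erf is odd: erf(-x) = -erf(x)
    x = abs(x)
    t = 1.0 / (1.0 + p * x)
    poly = ((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t  # Horner's rule
    return sign * (1.0 - poly * math.exp(-x * x))


def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / math.sqrt(2.0)))


print(phi(1.96))  # about 0.975
```

The only library call is math.exp, which is easy to find in any language, so this pair of functions ports directly.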

After deriving the transformations between erf(x) and Φ(x) several times, including their complements and inverses, I wrote them down to save. See the PDF file Relating Φ and erf.
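For reference, the standard identities of this kind, each following from Φ(x) = ½[1 + erf(x/√2)], are:

```latex
\begin{aligned}
\Phi(x) &= \tfrac{1}{2}\left[1 + \operatorname{erf}\!\left(x/\sqrt{2}\right)\right],
&\qquad \operatorname{erf}(x) &= 2\,\Phi\!\left(x\sqrt{2}\right) - 1, \\
1 - \Phi(x) &= \tfrac{1}{2}\operatorname{erfc}\!\left(x/\sqrt{2}\right),
&\qquad \operatorname{erfc}(x) &= 2\left[1 - \Phi\!\left(x\sqrt{2}\right)\right], \\
\Phi^{-1}(p) &= \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1),
&\qquad \operatorname{erf}^{-1}(y) &= \tfrac{1}{\sqrt{2}}\,\Phi^{-1}\!\left(\tfrac{1+y}{2}\right).
\end{aligned}
```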

See also stand-alone code for computing the inverse of the standard normal CDF.