From the monthly archives:

July 2009

Financial control and useless projects

by John on July 17, 2009

Tom DeMarco has an article in the latest IEEE Software in which he gives an example of two hypothetical software projects. Both are expected to cost around a million dollars. One is expected to return a value of 1.1 million and the other 50 million. Financial controls are crucial for the former but not for the latter. He concludes:

… strict control is something that matters a lot on relatively useless projects and much less on useful projects. It suggests that the more you focus on control, the more likely you’re working on a project that’s striving to deliver something of relatively minor value.
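To make the arithmetic concrete, here is a small Python sketch. Only the cost and value figures come from DeMarco's example; the overrun percentages are illustrative assumptions of mine, and the function name is hypothetical.

```python
# Cost and value figures from DeMarco's hypothetical example (dollars).
COST = 1.0e6
PROJECTS = {"marginal": 1.1e6, "valuable": 50e6}

def net_value(value, cost, overrun):
    """Value delivered minus actual cost, where `overrun` is the fractional cost overrun."""
    return value - cost * (1 + overrun)

# Illustrative overruns (assumed, not from the article): on budget, 10% over, 50% over.
for name, value in PROJECTS.items():
    for overrun in (0.0, 0.10, 0.50):
        net = net_value(value, COST, overrun)
        print(f"{name:>8} project, {overrun:4.0%} overrun: net ${net:>12,.0f}")
```

A 10% overrun erases the marginal project's entire return, while even a 50% overrun leaves the valuable project $48.5 million ahead.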

Thanks to John MacIntyre for pointing out Tom DeMarco’s article.


Solo software development

by John on July 16, 2009

Is it becoming easier or harder to be a solo software developer? I see two trends flowing in opposite directions.

Matt Heusser argues in his article The Boutique Tester that it’s easier to be an independent software developer now than it was a decade ago. You don’t have to burn CDs and ship them; you just put your software up on the web. You don’t have to maintain your own server; you can rent one cheaply. You don’t have to buy expensive development tools; good tools are available for free. All these things are true, but there are other issues.

Software developers are required to know more languages than ever. A decade ago, you could make a career writing desktop applications in Visual Basic or C++ and not need to know any other language. Now, to write a web application you need to know at least HTML, CSS, JavaScript, and SQL in addition to a programming language such as Java, C#, or Ruby. And knowing these languages is only the beginning. You also need to learn a web development framework such as JSP, ASP.NET, or Rails. The list never seems to end. See Joe Brinkman’s article Polyglot Programming: Death By A Thousand DSLs. Programming language proliferation is not the only new difficulty in software development (security, anyone?), but I’ll focus on languages.

Can one developer learn all these languages? The surprising answer is “yes.” You might think that such a menagerie of languages would lead developers to specialize, but programmers are not nearly as specialized as an outsider might expect, even in large organizations. On the other hand, most developers don’t entirely understand what they’re doing, having to work with more languages than they could possibly master. This is no doubt the root cause of many bugs.

Going back to the original question, is it easier or harder to be a solo developer these days? Software development itself has gotten harder, but the external difficulties have been greatly reduced. Programmers have to know more programming languages, but programmers have a knack for that. They don’t have to spend as much time on distribution, system administration, etc. Even sales and marketing, the bane of many developers, is easier now. So while software development itself has become harder, being an independent software developer may have become easier.

Many people disagree that software development has gotten harder; my opinion may be in the minority. Software development tools have certainly improved. It would be much easier to develop 1999-style applications now than it was in 1999. But I believe that developing 2009-style applications with 2009 tools is harder than developing 1999-style applications was with 1999 tools, particularly for high quality software. Throwing together applications that sorta work most of the time may be easier now, but developing quality software is more difficult.

Related post:
Programming language fatigue


I owe Microsoft Word an apology

by John on July 15, 2009

I tried to use the Equation Editor in Microsoft Word years ago and hated it. It was hard to use and produced ugly output. I tried it again recently and was pleasantly surprised. I’m using Word 2007. I don’t remember what version I’d tried before.

I’ve long said that math written in Word is ugly, and it usually is. But the fault lies with users, like myself, not with Word. I realize now that the problem is that most people writing math in Word are not using the Equation Editor. LaTeX produces ugly math too when people do not use it correctly, though this happens less often.

Math typography is subtle. For example, mathematical symbols are set in an italic font that is not quite the same as the italic font used in prose. Also, word-like symbols such as “log” or “cos” are not set in italics. I imagine most people do not consciously notice these conventions — I never noticed until I learned to use LaTeX — but subconsciously notice when the conventions are violated. The conventions of math typography give clues that help readers distinguish, for example, the English indefinite article “a” from a variable named “a” and to distinguish the symbol for maximum from the product of variables “m”, “a”, and “x.”

Microsoft’s Equation Editor typesets math correctly. Word documents usually do not, but only because folks usually do not use the Equation Editor. In the following example, I set the same equation three times: using ordinary text, using ordinary italic for the “x”, and finally using the Equation Editor.

screen shot of trig identity using MS Word

Note that the “x” in the third version is not the same as the italic “x” in the second version. The prose in this example is set in Calibri font and the Equation Editor uses Cambria Math font. Also, I did not tell Word to format “sin” and “cos” one way and “x” another or tell it what font to use; I simply typed sin^2 x + cos^2 x = 1 into the Equation Editor and it formatted the result as above. I haven’t used it much, but the Equation Editor seems to be more capable and easier to use than I thought.

Here are a few more examples of Equation Editor output.

examples of math using Word: Gaussian integral, Fourier series, quadratic equation

I still prefer using LaTeX for documents containing math symbols. I’ve used LaTeX for many years and I can typeset equations very quickly using it. But I’m glad to know that Word can typeset equations well and that the process is easier than I thought.

I tried out the Equation Editor because Bob Matthews suggested I try MathType, a third-party equation editor add-on for Microsoft Word. I haven’t tried MathType yet but from what I hear it produces even better output.

Related post: Contrasting Microsoft Word and LaTeX


Ever feel like a newspaper?

by John on July 13, 2009

Why are newspapers going out of business? The simple explanation is that newspaper owners are stupid; the world around them is changing and they’re oblivious. Michael Nielsen has a more interesting explanation. He says that newspapers are in trouble not because they’re stupid now but because they’ve been smart in the past.

Nielsen argues that newspapers are locked into their current business models because they have been so successful. Any small changes will make their businesses less profitable. I don’t know enough about the newspaper industry to say whether Nielsen is right, though I find his argument plausible. (His article is entitled Is scientific publishing about to be disrupted? However, it is about much more than scientific publishing.)

Nielsen argues that newspapers are standing on the top of one hill and profitable online news sources are standing on a higher hill, a hill that didn’t exist 20 years ago. In mathematical lingo, both businesses are at local maxima. Newspapers are trapped because they can’t improve their situation without first making it worse. Anyone who leads a newspaper down its hill in order to climb a new hill will be fired before he starts gaining altitude again.

I don’t care that much about newspapers, but Nielsen’s article struck me because it provides an explanation for many other situations. I feel like some areas of my life are stuck at a local maximum: there’s plenty of room for improvement, but not by making small changes.


Random inequalities VIII: folded normals

by John on July 13, 2009

Someone who ran into my previous posts on random inequalities asked me how to compute random inequalities for folded normals. (A folded normal random variable is the absolute value of a normal random variable.) So the question is how to compute

Pr(|X| > |Y|)

where X and Y are normally distributed. Here’s my reply as a short tech report: Inequality probabilities for folded normal random variables.
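The tech report works the problem out analytically. As a brute-force cross-check, here is a Python sketch that estimates Pr(|X| > |Y|) by simulation, assuming X and Y are independent. In the special case where both means are zero there is a simple closed form: X/Y then has a Cauchy distribution with scale σx/σy, so Pr(|X| > |Y|) = Pr(|X/Y| > 1) = (2/π) arctan(σx/σy). The function names are mine, and the simulation is not the method used in the report.

```python
import math
import random

def prob_folded_greater_mc(mu_x, sigma_x, mu_y, sigma_y, n=200_000, seed=1):
    """Monte Carlo estimate of Pr(|X| > |Y|) for independent
    X ~ N(mu_x, sigma_x^2) and Y ~ N(mu_y, sigma_y^2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.gauss(mu_x, sigma_x)
        y = rng.gauss(mu_y, sigma_y)
        if abs(x) > abs(y):
            hits += 1
    return hits / n

def prob_folded_greater_zero_mean(sigma_x, sigma_y):
    """Closed form when both means are zero: X/Y is Cauchy with scale
    sigma_x / sigma_y, so Pr(|X/Y| > 1) = (2/pi) * arctan(sigma_x / sigma_y)."""
    return 2.0 / math.pi * math.atan(sigma_x / sigma_y)

print(prob_folded_greater_mc(0.0, 2.0, 0.0, 1.0))
print(prob_folded_greater_zero_mean(2.0, 1.0))
```

With 200,000 samples the standard error of the estimate is about 0.001, so the two printed numbers should agree to roughly two decimal places. For nonzero means only the simulation applies.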

Previous posts in this series:

Introduction
Analytical results
Numerical results
Cauchy distributions
Beta distributions
Gamma distributions
Three or more random variables


F# may succeed where others have failed

by John on July 13, 2009

Philip Wadler wrote an article a decade ago entitled Why no one uses functional languages. He begins the article by explaining that yes, there have been a number of large successful projects developed in functional programming languages. But compared to the number of programmers who work in procedural languages, the number working in functional languages is essentially zero. The reasons he listed fall into eight categories.

  1. Lack of compatibility with existing code
  2. Limited library support compared to popular languages
  3. Lack of portability across operating systems
  4. Small communities and correspondingly little community support
  5. Inability to package code well for reuse
  6. Lack of sophisticated tool support
  7. Lack of training for new developers in functional programming
  8. Lack of popularity

Most of these reasons do not apply to Microsoft’s new functional language F# since it is built on top of the .NET framework. For example, F# has access to the enormous Common Language Runtime library and smoothly interoperates with anything developed with .NET. As for tool support, Visual Studio will support F# starting with the 2010 release. Even portability is not a barrier: the Mono Project has been quite successful in porting .NET code to non-Microsoft platforms. (Listen to this Hanselminutes interview with Aaron Bockover for an idea of how mature Mono is.)

The only issues that may apply to F# are training and popularity. Programmers receive far more training in procedural programming, and the popularity of procedural programming is self-reinforcing. Despite these disadvantages, interest in functional programming in general is growing. And when programmers want to learn a functional programming language, I believe many will choose F#.

It will be interesting to see whether F# catches on. It resolves many of the accidental difficulties of functional programming, but the intrinsic difficulties remain. Functional programming requires a different mindset, one that programmers have been reluctant to adopt. But now programmers have a new incentive to give functional languages a try: multi-core processors.

Individual processor cores are not getting faster, but we’re getting more of them per box. We have to write multi-threaded code to take advantage of extra cores, and multi-threaded programming in procedural languages is hard, beyond the ability of most programmers. Purely functional languages eliminate many of the difficulties with multi-threaded programming, and so it seems likely that at least portions of systems will be written in functional languages.
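Here is a minimal sketch of why avoiding shared mutable state helps, written in Python for brevity rather than F#. Because the hypothetical function `score` depends only on its argument and modifies nothing shared, its calls can be handed to worker threads in any order without locks, and the parallel total must equal the sequential one. (In CPython, threads will not speed up CPU-bound work because of the global interpreter lock; `ProcessPoolExecutor` offers the same `map` interface and does run in parallel. The structural point is the same either way.)

```python
from concurrent.futures import ThreadPoolExecutor

def score(n):
    """A pure function: the result depends only on n, and nothing shared is mutated."""
    return sum(i * i for i in range(n)) % 1_000_003

def parallel_total(inputs, workers=4):
    # No locks are needed: each call works only on its own argument, and the
    # results are combined once, after the workers finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(score, inputs))

inputs = list(range(1_000, 1_100))
print(parallel_total(inputs) == sum(map(score, inputs)))
```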

Related post: Functional in the small, OO in the large


Weekend miscellany

by John on July 11, 2009

Here are a few eclectic links for the weekend.

How globes are made. You might enjoy watching this with your children.

Jason Fried presentation on the 37 Signals approach to small business. My favorite line: “Tomorrow doesn’t happen unless you get today right.” You do not want to watch this one with your children.

A lot of people don’t know what a web browser is.

Scientific American podcast interview with Atul Gawande, author of Complications and Better. Among other things, Gawande explains how process improvements, not new science, have caused a dramatic decrease in battlefield fatalities.

Software projects and power laws. The probability distributions for delays have thick tails.

A quick comparison of US and Canadian law.

Two math blog carnivals came out this week: Carnival of Mathematics and Math Teachers at Play. Anyone know when or where the next Carnival of Mathematics will be?


Emily Dickinson versus Paris Hilton

by John on July 9, 2009

Mark Helprin discusses the decline of serious political discourse in America in his excellent book Digital Barbarism. Earlier generations were more patient, “primed to deliberate rather than merely to react.” He summarizes his argument by comparing Emily Dickinson and Paris Hilton.

That is not to say that all Americans were models of dignity and concentration, but by and large they were quite different from what we are now. … Rather than a massive comparison, suffice it to say that although today not everyone is like Paris Hilton, and in the nineteenth century not everyone was like Emily Dickinson, each of these is far more characteristic of her age than would be the other, and that this is self-evident along with all it implies.

Related post:

Place, privacy, and dignity


My mathematical opposite

by John on July 8, 2009

Eugenia Cheng may be my mathematical opposite. She did a great interview with Peter Rowlett in which she bubbles over with enthusiasm for category theory. She explains that she couldn’t stand applied math, but stuck with math because she believed there was something there she could love. The further she moved from applicable math, the happier she became. Abstract algebra was a big improvement, but still too concrete. When she discovered category theory, she was home.

Category theory is a sort of meta-mathematics. It aims to identify patterns across diverse areas of math the way a particular area of math may identify patterns in nature. I like the idea of category theory, but I get that deer-in-the-headlights look in my eyes almost immediately when I look at category theory in any detail.

I enjoy pure math, though I prefer analysis to algebra. I even enjoyed my first abstract algebra class, but when I ran into category theory I knew I’d exceeded my abstraction tolerance. I moved more toward the applied end of the spectrum the longer I was in college. Afterward, I moved so far toward the applied end that you might say I fell off the end and moved into things that are so applied that they’re not strictly mathematics: mathematical modeling, software development, statistics, etc. I call myself a very applied mathematician because I actually apply math and don’t just study areas of math that could potentially be applied.

I appreciate Eugenia Cheng’s enthusiasm even though I don’t share her taste in math. I have long intended to go back and learn a little category theory. It would be great mental exercise precisely because it is so foreign to my way of thinking. Cheng’s interview inspired me to give it one more try.


How Michelangelo worked

by John on July 7, 2009

Michelangelo's Pietà

The following quote from Irving Stone describes how Michelangelo worked on his Pietà.

He carved in a fury from first light to dark, then threw himself across his bed, without supper and fully clothed, like a dead man. He awoke around midnight, refreshed, his mind seething with sculptural ideas, craving to get at the marble.


Database developers all know the ACID acronym. It says that database transactions should be:

  • Atomic: Everything in a transaction succeeds or the entire transaction is rolled back.
  • Consistent: A transaction cannot leave the database in an inconsistent state.
  • Isolated: Transactions cannot interfere with each other.
  • Durable: Completed transactions persist, even if servers restart.
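A minimal sketch of the first two properties, using SQLite through Python's standard `sqlite3` module. The `accounts` table and the `transfer` function are hypothetical. The transfer credits one account and then debits the other; if the debit would violate the CHECK constraint (a consistency rule), SQLite raises an error and the connection's context manager rolls back the whole transaction, including the credit that already ran.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts ("
                 "name TEXT PRIMARY KEY, "
                 "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move money between accounts atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit was rolled back too.
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = make_db()
print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The failed transfer leaves both balances exactly as they were, rather than leaving Bob credited with money Alice never paid.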

These qualities seem indispensable, and yet they are incompatible with availability and performance in very large systems. For example, suppose you run an online book store and you proudly display how many of each book you have in your inventory. Every time someone is in the process of buying a book, you lock part of the database until they finish so that all visitors around the world will see accurate inventory numbers. That works well if you run The Shop Around the Corner but not if you run Amazon.com.

Amazon might instead use cached data. Users would not see the inventory count at this second, but what it was, say, an hour ago when the last snapshot was taken. Amazon might also violate the “I” in ACID by tolerating a small probability that simultaneous transactions could interfere with each other. For example, two customers might both believe that they just purchased the last copy of a certain book. The company might risk having to apologize to one of the two customers (and maybe compensate them with a gift card) rather than slowing down its site and irritating myriad other customers.
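Here is a toy model of that trade-off, a sketch rather than anything Amazon actually does. Shoppers see a snapshot that is refreshed only periodically, orders are accepted against that stale view without locking, and an oversold copy is handled after the fact with an apology. The class, its methods, and the book title are hypothetical.

```python
class Store:
    """Toy store that shows shoppers a periodically refreshed inventory snapshot
    instead of locking the authoritative count on every purchase."""

    def __init__(self, stock):
        self.stock = dict(stock)          # authoritative inventory
        self.snapshot = dict(self.stock)  # what shoppers see; refreshed, say, hourly
        self.apologies = []               # customers who bought a copy that wasn't there

    def refresh_snapshot(self):
        self.snapshot = dict(self.stock)

    def buy(self, customer, title):
        """Accept the order based on the stale view, then reconcile with real stock."""
        if self.snapshot.get(title, 0) <= 0:
            return False
        self.stock[title] = self.stock.get(title, 0) - 1
        if self.stock[title] < 0:
            # Oversold: apologize (maybe with a gift card) rather than lock for everyone.
            self.apologies.append(customer)
        return True

store = Store({"Some Book": 1})
store.buy("alice", "Some Book")
store.buy("bob", "Some Book")  # both saw "1 in stock" in the snapshot
print(store.stock, store.apologies)
```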

There is a computer science theorem that quantifies the inevitable trade-offs. Eric Brewer’s CAP theorem says that if you want consistency, availability, and partition tolerance, you have to settle for two out of three. (For a distributed system, partition tolerance means the system will continue to work unless there is a total network failure. A few nodes can fail and the system keeps going.)

An alternative to ACID is BASE:

  • Basically Available
  • Soft-state
  • Eventual consistency

Rather than requiring consistency after every transaction, it is enough for the database to eventually be in a consistent state. (Accounting systems do this all the time. It’s called “closing out the books.”) It’s OK to use stale data, and it’s OK to give approximate answers.
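One common way to get eventual consistency is last-writer-wins replication: each replica accepts writes locally, tags them with a timestamp, and periodically exchanges entries with other replicas, keeping the newest write for each key. This is only one technique among several. The sketch assumes timestamps that are totally ordered and comparable across replicas, which real systems have to work for (for example with logical or vector clocks). Between syncs the replicas can disagree and a read may return stale data; once they sync, they converge.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """A replica storing key -> (timestamp, value), merged last-writer-wins."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        self._apply(key, timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def _apply(self, key, timestamp, value):
        current = self.data.get(key)
        # Keep the larger (timestamp, value) pair. Breaking ties on value makes the
        # merge deterministic, so replicas agree no matter what order they sync in.
        if current is None or (timestamp, value) > current:
            self.data[key] = (timestamp, value)

    def sync_from(self, other):
        """Anti-entropy step: pull every entry from another replica and merge it."""
        for key, (timestamp, value) in other.data.items():
            self._apply(key, timestamp, value)

a, b = Replica("a"), Replica("b")
a.write("stock", 5, timestamp=1)
b.write("stock", 3, timestamp=2)
print(a.read("stock"), b.read("stock"))  # replicas disagree before syncing
a.sync_from(b)
b.sync_from(a)
print(a.read("stock"), b.read("stock"))  # and agree afterward
```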

It’s harder to develop software in the fault-tolerant BASE world than in the fastidious ACID world, but Brewer’s CAP theorem says you have no choice if you want to scale up. However, as Brewer points out in this presentation, there is a continuum between ACID and BASE. You can decide how close you want to be to one end of the continuum or the other according to your priorities.


Weekend miscellany

by John on July 4, 2009

Software

Here’s an old NBC news report speculating about technology in the year 2000. Apparently “something called the Internet” will be important. HT: Sorting out Science.

The Mono project, an open source rewrite of Microsoft’s .NET framework, is more mature than I thought. From Hanselminutes.

“Cloud” is a good metaphor for most of what I hear about “cloud computing” because it’s so nebulous. But Michael Stiefel has some solid things to say on the subject.

Python Infrequently Answered Questions

From the Word Aligned blog, a quote from Tony Hoare: “One day software will be the most reliable component of every product which contains it.” I’m not as optimistic as Mr. Hoare, or at least I think “one day” is far away.

Joakim Karlsson says in his post The Locality of Code Changes “The probability that you will change a piece of code in the near future increases when you make changes to that code or to code in its vicinity.”

Economics

The best explanation I’ve seen for why newspapers are dying

Malcolm Gladwell’s rebuttal to Chris Anderson’s “Free” thesis

EconTalk interview with Mark Helprin on copyright

Math and statistics

In Is P = NP an ill-posed problem? Dick Lipton contrasts the Riemann hypothesis and the question of whether P = NP.

Visualizing correlations

Music, coffee, and physics

Classical music in cartoons

Latte art

Fun with an MRI machine. NB: The block is aluminum, not iron. Magnets don’t attract aluminum. But aluminum can conduct a current induced by a magnetic field. HT: Ovablastic.


Glynn Foster from Sun talks about OpenSolaris on FLOSS Weekly episode 75. After explaining how Solaris has always been a robust, scalable operating system, Foster brags that now on a Toshiba laptop with OpenSolaris pre-installed “… the volume works, and the keys work…” Then host Jono Bacon laughs, “The keys work?!” The dialog starts about 23:30 into the podcast.

The other host, Leo Laporte, mumbles “so cool” after Glynn Foster says “the volume works” and apparently would have let him get away with saying “the keys work.” But Jono Bacon is the community manager for Ubuntu, a Linux distribution that cares more about whether the volume and keyboard work than whether the OS scales.

It was amusing to listen to Glynn Foster and Jono Bacon personify their respective operating systems’ priorities: server performance for Solaris and desktop experience for Ubuntu. Foster says that OpenSolaris used to be a royal pain to install and configure but now it has gotten much better. I don’t know how well Ubuntu scales (I imagine it’s not nearly as scalable as OpenSolaris), but it was designed from the beginning to be easy to install.


Three rules of thumb

by John on July 1, 2009

Here are three rules of thumb for back-of-the-envelope estimates:

  1. Duff’s rule: Pi seconds is a nanocentury.
  2. Hopper’s rule: Light travels one foot in a nanosecond.
  3. Rule of 72: An investment at n% interest will double in 72/n years.

How might you use these? How accurate are they?

Duff’s rule comes in handy when converting from times measured in seconds to times measured on calendars. This may not sound useful, but it often happens in software. For example, if a task takes a second to complete, how long would it take to do it a billion times? Well, a billion seconds, obviously. But how long is that in familiar terms? Duff’s rule says a century is about 3.14 billion seconds, so a billion seconds would be something like 30 years.

How accurate is Duff’s rule? A year is 31,536,000 seconds, whereas Duff’s rule would estimate 31,415,927 seconds, so it underestimates the number of seconds in a year by about 0.4%.
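The Duff's rule arithmetic is easy to check in a few lines of Python. Like the figure quoted above, this sketch uses a 365-day year and ignores leap days; the variable and function names are mine.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; ignores leap days

# A nanocentury is 1e-9 * 100 years = 1e-7 years.
nanocentury_seconds = 1e-7 * SECONDS_PER_YEAR  # actual length, about 3.15 seconds
duff_seconds_per_year = math.pi * 1e7          # Duff: pi seconds per 1e-7 years

duff_underestimate = (SECONDS_PER_YEAR - duff_seconds_per_year) / SECONDS_PER_YEAR

def seconds_to_years(seconds, seconds_per_year=SECONDS_PER_YEAR):
    """Convert a duration in seconds to years."""
    return seconds / seconds_per_year

print(f"A nanocentury is {nanocentury_seconds:.4f} s; Duff says {math.pi:.4f} s")
print(f"Duff underestimates a year by {duff_underestimate:.2%}")
print(f"1e9 s is {seconds_to_years(1e9):.1f} years, "
      f"or {seconds_to_years(1e9, duff_seconds_per_year):.1f} by Duff's rule")
```

A billion seconds comes out to a little under 32 years either way, consistent with the "something like 30 years" estimate.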

Hopper’s rule is useful in electrical engineering. For example, you might need to know how long it would take a radio signal to travel between a transmitter and receiver. Hopper’s rule can explain why computer chip clock rates are not increasing. Electrical signals travel at some fraction of the speed of light, and current chip designs are limited by whether a signal can move across the chip during a clock cycle.

How accurate is Hopper’s rule? Light travels 299,792,458 meters per second. That corresponds to 0.983 feet per nanosecond, so Hopper’s rule overestimates by about 1.7%.
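Here is the same check for Hopper's rule, plus a rough version of the clock-cycle argument. Both constants are exact by definition (the meter is defined via the speed of light, and the international foot is exactly 0.3048 m). The 3 GHz clock rate and the `velocity_factor` parameter are illustrative assumptions; on-chip signals travel well below the vacuum speed of light, so the real distance per cycle is shorter.

```python
SPEED_OF_LIGHT = 299_792_458  # meters per second in a vacuum
METERS_PER_FOOT = 0.3048      # international foot

feet_per_ns = SPEED_OF_LIGHT * 1e-9 / METERS_PER_FOOT
hopper_overestimate = (1.0 - feet_per_ns) / feet_per_ns

def inches_per_clock_cycle(clock_ghz, velocity_factor=1.0):
    """Distance a signal covers in one clock cycle.

    velocity_factor is the signal speed as a fraction of c (assumed; 1.0 means vacuum)."""
    cycle_ns = 1.0 / clock_ghz
    return feet_per_ns * velocity_factor * cycle_ns * 12

print(f"Light travels {feet_per_ns:.3f} ft/ns; "
      f"Hopper's rule overestimates by {hopper_overestimate:.1%}")
print(f"At 3 GHz, light covers about {inches_per_clock_cycle(3.0):.1f} inches per cycle")
```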

Here is a terrific video of Grace Hopper explaining Hopper’s rule to David Letterman, around 4:25. (Thanks Bill!)

The Rule of 72 is obviously useful in financial estimates. For example, $1000 invested at 6% interest will become $2000 in 72/6 = 12 years.

How accurate is the rule of 72? The value of an initial investment P after time t under a continuously compounded interest rate r is P exp(rt). Solving exp(rt) = 2 for t gives t = log 2 / r, where log is the natural logarithm. If we express the rate as a percentage n, so that r = n/100, then t = 100 log 2 / n. This says that for continuously compounded interest, the rule of 72 would be exact if “72” were replaced with 100 log 2 ≈ 69.3. So for continuous interest, the rule overestimates the doubling time by a factor of 0.72/log 2 ≈ 1.039, or about 4%. So why use 72 rather than 69.3? There are two reasons. First, 72 is easy to work with mentally since it is divisible by lots of small integers. Second, interest is often compounded periodically, say annually or monthly, rather than continuously.

The doubling time is longer for investments with periodic interest than for continuous interest. The overestimate from using 72 rather than 69.3 is partially offset by the longer doubling time under periodic compounding, and so 72 may work better than 69.3. Exactly how accurate the rule of 72 is for periodically compounded interest depends on the interest rate and the compounding period.
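To see how the rate and compounding period matter, here is a Python sketch comparing the rule of 72 with exact doubling times. For m compounding periods a year, solving (1 + r/m)^(mt) = 2 gives t = log 2 / (m log(1 + r/m)). The function names and the sample rates are my own choices.

```python
import math

def rule_of_72(rate_percent):
    """Rule-of-thumb doubling time in years."""
    return 72 / rate_percent

def doubling_time_continuous(rate_percent):
    """Exact doubling time for continuous compounding: exp(r t) = 2, so t = log 2 / r."""
    return math.log(2) / (rate_percent / 100)

def doubling_time_periodic(rate_percent, periods_per_year):
    """Exact doubling time for compounding m times a year:
    (1 + r/m)^(m t) = 2, so t = log 2 / (m log(1 + r/m))."""
    r = rate_percent / 100
    m = periods_per_year
    return math.log(2) / (m * math.log1p(r / m))

print("rate  rule72  continuous  monthly  annual")
for rate in (2, 6, 10):
    print(f"{rate:3d}%  {rule_of_72(rate):6.2f}  {doubling_time_continuous(rate):10.2f}  "
          f"{doubling_time_periodic(rate, 12):7.2f}  {doubling_time_periodic(rate, 1):6.2f}")
```

At 6% compounded annually, for example, the exact doubling time is about 11.9 years, so 72 (giving 12) beats 69.3 (giving about 11.55), while for continuous compounding 69.3 is exact.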

Related post:

Bancroft’s rule (rule of thumb for estimating linear regression)
