From the category archives:

Software development

Readability

by John on November 28, 2011

The Readability bookmarklet lets you reformat any web page to make it easier to read. It strips out flashing ads and other distractions. It uses black text on a white background, wide margins, a moderate-sized font, etc. I use Readability fairly often. (Instapaper is a similar service. I discuss it at the end of this post.)

Yesterday I used it to reformat an article on literate programming. For some inexplicable reason, the author chose to use a lemon yellow background. It’s ironic that the article is about making source code easier to read. The content of the article is easy to read, but the format is not.

Readability to the rescue! Here are before and after screen shots.

Before:

After:

I recommend the article, Example of Literate Programming in HTML, and I also recommend reformatting the page unless you enjoy reading black text on a yellow background.

Readability did a good job until about halfway through the article. The article has C and HTML code examples, and perhaps these confused Readability. (Readability usually handles code samples well. It correctly formats the first few code samples in this article.) The last half of the article renders like source code, and the font gets smaller and smaller.

I ran the page through an HTML validator to see whether some malformed HTML could be the source of the problem. The validator found numerous problems, so perhaps that was the issue.

I haven’t seen Readability fail like this before. I’ve been surprised how well it has handled some pages I thought might trip it up.

I ended up saving the article and editing its source, changing the bgcolor value to white. It’s a nice article on literate programming once you get past the formatting. The best part of the article is the first section, and that much Readability formats correctly.
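The manual fix above could also be scripted. Here is a minimal sketch in Python, assuming the saved page sets its color through plain `bgcolor` attributes. The function name `whiten_background` and the sample color value are made up for illustration, and the regular expression rewrites every `bgcolor` it finds, not just the one on the body tag.

```python
import re

# Matches bgcolor=<value> where the value is double-quoted, single-quoted, or bare.
_BGCOLOR = re.compile(
    r'(\bbgcolor\s*=\s*)("[^"]*"|\'[^\']*\'|[^\s>]+)',
    re.IGNORECASE,
)

def whiten_background(html):
    """Return html with the value of every bgcolor attribute set to "white"."""
    return _BGCOLOR.sub(r'\1"white"', html)

# Hypothetical input; the real page's color value may differ.
print(whiten_background('<body bgcolor="#FFFF66">'))
```

A regular expression is a blunt instrument for HTML, but for a one-off edit of a saved page it is probably good enough.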

Instapaper

Instapaper reformats web pages similarly. It produces a narrower column of text, but otherwise the output looks quite similar.

Instapaper did not discover the title of the literate programming article. (The title of the article was not in an <h1> tag as software might expect but was only in a <title> tag in the page header.) However, it did format the entire body of the article correctly.
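I don't know how Instapaper or Readability actually look for titles. As a rough sketch of the fallback described above, here is some Python that prefers the first `<h1>` and falls back to `<title>` when there is no heading. The names `TitleFinder` and `guess_title` are my own, not part of either service.

```python
from html.parser import HTMLParser

class TitleFinder(HTMLParser):
    """Collect the text of the first <h1> and the first <title> element."""

    def __init__(self):
        super().__init__()
        self._current = None  # tag whose text is being collected, if any
        self.found = {}       # maps "h1" / "title" to collected text

    def handle_starttag(self, tag, attrs):
        # Only the first occurrence of each tag is recorded.
        if tag in ("h1", "title") and tag not in self.found:
            self._current = tag
            self.found[tag] = ""

    def handle_data(self, data):
        if self._current:
            self.found[self._current] += data

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

def guess_title(html):
    """Return the <h1> text if present, else the <title> text, else None."""
    finder = TitleFinder()
    finder.feed(html)
    for tag in ("h1", "title"):
        text = finder.found.get(tag, "").strip()
        if text:
            return text
    return None

# Hypothetical page shaped like the one described: a <title> but no <h1>.
page = ("<html><head><title>Example of Literate Programming in HTML</title></head>"
        "<body bgcolor='yellow'><p>text</p></body></html>")
print(guess_title(page))
```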

I find it slightly more convenient to use the Readability bookmarklet than to submit a link to Instapaper. I imagine there are browser plug-ins that make Instapaper just as easy to use, though I haven’t looked into this because I’m usually satisfied with Readability.

Related posts:

Literate programming and statistics
Tricky code

{ 12 comments }

The Tangled Web

by John on November 27, 2011

The Tangled Web is a security book that you may find interesting even if you’re not interested in security. The first half of the book is an excellent explanation of how Web technologies work in theory and especially in practice. This material is included in order to discuss security implications, but it’s interesting on its own. The second half of the book is directly devoted to Web security.

The author, Michal Zalewski, has a colorful writing style. His book is serious and loaded with technical detail, but that doesn’t stop him from turning a nice phrase here and there.

Here’s an excerpt from The Tangled Web that I particularly liked, one that explains why security concerns on the Web differ from previous security concerns.

In the traditional model followed by virtually all personal computers … there are very clear boundaries between high-level data objects (documents), user-level code (applications), and the operating system kernel … These boundaries are well studied and useful for building practical security schemes. A file opened in your text editor is unlikely to be able to steal your email …

In the browser world, this separation is practically nonexistent: Documents and code live as parts of the same intermingled blobs of HTML, isolation between completely unrelated applications is partial at best …

In the end, the seemingly unlikely scenario of a text file stealing your email is, in fact, a frustratingly common pattern on the Web.

{ 1 comment }

Norris’ number

by John on November 22, 2011

My friend Clift Norris has identified a fundamental constant that I call Norris’ number, the average amount of code an untrained programmer can write before he or she hits a wall. Clift estimates this as 1,500 lines. Beyond that the code becomes so tangled that the author cannot debug or modify it without herculean effort.

Related posts:

Writes large correct programs
Little programs versus big programs
Experienced programmers and lines of code

{ 26 comments }

Career advice regarding tools

by John on November 21, 2011

J. D. Long wearing a Panama and smoking a Dominican

A few weeks ago, J. D. Long gave some interesting advice in a Google+ discussion. He starts out

Lunch today with an analyst 13 years my junior made me think about things I wish I had known about the technical analytical profession when I was 25. Here’s some things that popped into my head:

The entire list is worth reading, but I want to focus on two things he said about tools.

  • Use tools you don’t have to ask permission to install (i.e. open source).
  • Dependence on tools that are closed license and un-scriptable will limit the scope of problems you can solve. (i.e. Excel) Use them, but build your core skills on more portable & scalable technologies.

I would have disagreed a few years ago, but now I think this is good advice.

In the late ’90s I used mostly Microsoft tools. That was a good time to be a Microsoft developer. Windows was on the rise; Unix and Mac OS were on the ropes. Desktop applications were the norm and were easier to write on Windows. Open source software was hard to install and hard to use. People who used open source software often did so for ideological reasons, not because it made their work easier.

Of course times have changed. Mac recovered from its near death experience. Unix didn’t, but it has been resurrected as Linux. The web made it easier to write cross-platform software. And above all, open source software has matured. The open source community is more positive, focused on promoting good software rather than trying to give some corporation a stick in the eye.

Now the advantages of open source are clearer. There’s not the same hidden cost in frustration that there was a few years ago. Now I would say yes, it’s a great advantage to use tools you can install whenever and wherever you want, without having to go through a purchasing bureaucracy.

It’s interesting that JD equates open source with scriptability. Open source software often is scriptable, not because it’s open source, but because of the Unix aesthetic that pervades the open source community. Closed source software is often not scriptable, not because it’s closed source, but because it is often written for consumers who value usability over composability. Commercial server-side products may be scriptable. If I were to restate JD’s advice on this point, I’d say to keep composability in mind and don’t just think about usability.

I appreciate JD’s attitude toward applications such as Excel. He’s not saying you should never defile your conscience by opening Excel. Some tasks are incredibly easy in Excel. The danger comes from pushing the tool into territory where other tools are better. There are still some in the open source community who believe that opening Excel is a sin, but I’m much more in agreement with the people who say, for example, that Excel isn’t the best tool for statistical analysis.

Portability is funny. In the early days of computing, there were no dominant players, and portability was important (and difficult). Then for a while, portability didn’t matter if you were content with only running on the 95% of the world’s computers that ran Windows. Now portability is important again. Windows still has a huge market share on the desktop, but the desktop itself is losing market share.

And portability matters for more than consumer operating systems. JD mentions portability and scalability in one breath. You may want to move code between operating systems to scale up (e.g. to run on a cluster) or to scale down (e.g. to run on a mobile device).

There’s also the aspect of career portability. You want to master tools that you can take with you from job to job. I would be leery of building a career around a small company’s proprietary tools. If I were in that situation, I’d learn something else on the side that’s more portable.

In closing, I’ll give the rest of JD’s career advice without commentary. These points could make interesting fodder for future blog posts.

  • Be a profit center, not a cost center.
  • Use tools you don’t have to ask permission to install (i.e. open source).
  • Dependence on tools that are closed license and un-scriptable will limit the scope of problems you can solve. (i.e. Excel) Use them, but build your core skills on more portable & scalable technologies.
  • Learn basic database tools.
  • Learn a programming language.
  • Your internal job description may say, “Analyst” but get something else on your business cards. Analyst is so vague as to be meaningless. My external title is currently “Sr. Risk Economist.” I like the term “Data Scientist” for now. I expect that term will be meaningless in 5 years.
  • Large organizations do not properly appreciate agile and smart analytic types. Time at large firms should be seen as subsidized learning. Learn lots, but get out.
  • Ensure you can explain any of your projects to your wife or non-technical friends. It’s good practice for board meetings later in your career.
  • Be sure you know the handful of things that you can do better than most anyone else. Add something to that list every year. Make sure you can explain these things to non techies.
  • Be a profit center, not a cost center. At least be as close to the profit center as possible. The chief analyst for the sales SVP is closer to the profit center than an IT analyst supporting billing operations.
  • Get really good at asking questions so you understand problems before you start solving them.
  • Yes, that bit about being a profit center not a cost center is in there twice. It should probably be in there 5 times.

{ 13 comments }

The plumber programmer

by John on November 15, 2011

I called someone a plumber programmer the other day. The person I was speaking to didn’t realize that “plumber programmer” is a term of great respect. The plumber is often the most experienced programmer on a team.

As with literal plumbing, software plumbing connects things together. It deals with things other people don’t want to see or think about. And it’s crucial.

Thomas Guest made a couple diagrams that illustrate this. Managers draw software diagrams with big boxes and little arrows. The boxes represent software components and the arrows represent the code that connects them together.

This gives the impression that the boxes are the hard part and the arrows are easy. The opposite is probably true. Thomas says if we drew the diagram so that the size of the components is proportional to the effort, it might look like this:

Related posts:

Where does programming effort go?
Your job is trivial (but I couldn’t do it)

{ 36 comments }

Separating presentation from content

by John on November 14, 2011

In the late ’90s I went to a fair number of Microsoft presentations. One presentation would say “The problem with Technology X is that it mixes presentation and content. We’ve introduced Technology Y to make your code cleaner, separating presentation and content.” A few months later I’d be at another presentation that would announce “The problem with Technology Y is that it mixes presentation and content. We’ve introduced Technology Z …” (Does this remind anyone else of The Cat in the Hat Comes Back?)

When I first learned LaTeX, I was told that one of its strengths is that it separates presentation and content. Then a few years later I heard complaints that the problem with LaTeX is that it mingles presentation and content, unlike XHTML. A few years later, guess what? XHTML mixes presentation and content, so we need something else.

I shut down when I hear someone announce that everything before their product was bad because it mixed presentation and content, and now with their solution, presentation and content will be completely separate.

Sometimes one technology really does make a cleaner separation of presentation and content. But at best the separation is relative. LaTeX separates presentation and content more than Word, though not as much as well-written HTML and CSS, maybe. But presentation and content cannot be entirely separated. Nor is there unanimous agreement on what exactly the dividing line is between the two.

Many people don’t want to separate their presentation and content. They don’t understand why this would be desirable, and they’ll fight against anything designed to encourage separation. Maybe they need to learn the advantages, or maybe they’re just doing the best they can to get their job done and they can’t be bothered with long term advantages that may not materialize.

The principle of separating presentation and content is admirable. It really does have advantages, but it’s easier said than done.

{ 8 comments }

Unix tool tips

by John on November 10, 2011

I’ve renamed my SedAwkTip twitter account to UnixToolTip to reflect its new scope. If you were following SedAwkTip, there’s no need to do anything. You’ll just see a different name.

I have about a week’s worth of sed and awk tips scheduled. Then I’ll start adding in tips on grep, find, uniq, etc. And I’ll come back to sed and awk now and then.

These tools came from the Unix world, but they’re also available on Windows.

For now I’m keeping the original icon. I’m open to suggestions if someone has an idea for a better icon.

s///

Related posts:

Thermonuclear word processor
Retro computing

{ 2 comments }

Firsthand knowledge

by John on November 6, 2011

From C. S. Lewis:

It has always therefore been one of my main endeavors as a teacher to persuade the young that firsthand knowledge is not only more worth acquiring than secondhand knowledge, but is usually much easier and more delightful to acquire.

This quote comes from the essay On the Reading of Old Books, part of the collection God in the Dock: Essays on Theology and Ethics. Lewis says here that it is easier to read Plato or St. Paul, for example, than to read books about Plato or St. Paul.  Lewis says that the fear of reading great authors

… springs from humility. The student is half afraid to meet one of the great philosophers face to face. He feels himself inadequate and thinks he will not understand him. But if he only knew, the great man, just because of his greatness, is much more intelligible than his modern commentators.

This does not only apply to literature. I see the same theme in math. Sometimes early math papers are easier to read because they are more concrete. When I was a postdoc at Vanderbilt I asked Richard Arenstorf about a theorem attributed to him in a book I was reading. He scoffed that he didn’t recognize it. He had done his work in a relatively concrete setting and did not approve of the fancy window dressing the author had placed around his theorem. I sat in on a few lectures by Arenstorf and found them amazingly clear.

The same theme appears in software development. Sometimes you can dive to the bottom of an abstraction hierarchy and find that things are simpler there than you would have supposed. The intervening layers obscure the substance of the program, making its core seem unduly mysterious. Like a mediocre mind commenting on the work of a great mind, developers who build layers of software around core functionality intend to make things easier but may do the opposite.

Related posts:

Endless preparation
Opening black boxes
Why Shakespeare is hard to read
C. S. Lewis on reading old books

{ 5 comments }

Code bloat

by John on November 1, 2011

“Back when I was starting out in computer science I thought by today we’d be writing a few lines of code to accomplish much. Instead, we write hundreds of thousands of lines of code to accomplish little.” — Lispian

{ 15 comments }

“Nothing brings fear to my heart more than a floating point number.” — Gerald Jay Sussman

The context of the above quote was Sussman’s presentation We really don’t know how to compute. It was a great presentation and I’m very impressed by Sussman. But I take exception to his quote.

I believe what he meant by his quote was that he finds floating point arithmetic unsettling because it is not as easy to rigorously understand as integer arithmetic. Fair enough. Floating point arithmetic can be tricky. Things can go spectacularly bad for reasons that catch you off guard if you’re unprepared. But I’ve been doing numerical programming long enough that I believe I know where the landmines are and how to stay away from them. And even if I’m wrong, I have bigger worries.

Nothing brings fear to my heart more than modeling error.

The weakest link in applied math is often the step of turning a physical problem into a mathematical problem. We begin with a raft of assumptions that are educated guesses. We know these assumptions can’t be exactly correct, but we suspect (hope) that the deviations from reality are small enough that they won’t invalidate the conclusions. In any case, these assumptions are usually far more questionable than the assumption that floating point arithmetic is sufficiently accurate.

Modeling error is usually several orders of magnitude greater than floating point error. People who nonchalantly model the real world and then sneer at floating point as just an approximation strain at gnats and swallow camels.

In between modeling error and floating point error on my scale of worries is approximation error. As Nick Trefethen has said, if computers were suddenly able to do arithmetic with perfect accuracy, 90% of numerical analysis would remain important.

To illustrate the difference between modeling error, approximation error, and floating point error, suppose you decide that the probability of something can be represented by a normal distribution. This is actually two assumptions: that the process is random, and that as a random variable it has a normal distribution. Those assumptions won’t be exactly true, so this introduces some modeling error.

Next we have to compute something about a normal distribution, say the probability of a normal random variable being in some range. This probability is given by an integral, and some algorithm estimates this integral and introduces approximation error. The approximation error would exist even if the steps in the algorithm could be carried out in infinite precision. But the steps are not carried out with infinite precision, so there is some error introduced by implementing the algorithm with floating point numbers.

For a simple example like this, approximation error and floating point error will typically be about the same size, both extremely small. But in a more complex example, say something involving a high-dimensional integral, the approximation error could be much larger than floating point error, but still smaller than modeling error. I imagine approximation error is often roughly the geometric mean of modeling error and floating point error, i.e. somewhere around the middle of the two on a log scale.
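To make the distinction concrete, here is a small Python sketch. It estimates the probability that a standard normal random variable falls between -1 and 1 using the trapezoid rule, a deliberately crude method I picked so that the approximation error is easy to see, and compares the result with a reference value computed from `math.erf`. The function names and the panel counts are arbitrary choices for illustration.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def trapezoid_probability(a, b, n):
    """Estimate P(a < Z < b) for standard normal Z with n trapezoid panels."""
    h = (b - a) / n
    total = 0.5 * (normal_pdf(a) + normal_pdf(b))
    for i in range(1, n):
        total += normal_pdf(a + i * h)
    return total * h

def reference_probability(a, b):
    """P(a < Z < b) via the error function, treated here as the reference."""
    root2 = math.sqrt(2.0)
    return 0.5 * (math.erf(b / root2) - math.erf(a / root2))

a, b = -1.0, 1.0
exact = reference_probability(a, b)
panel_counts = (4, 16, 64, 256)
errors = [abs(trapezoid_probability(a, b, n) - exact) for n in panel_counts]

for n, err in zip(panel_counts, errors):
    print(n, err)
```

The error shrinks as the number of panels grows, but even with 256 panels it is still many orders of magnitude larger than the rounding error of double precision arithmetic. That gap is approximation error: it comes from the method, not from the floating point numbers. A better quadrature rule would close most of it.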

In Sussman’s presentation he says that people worry too much about correctness. Often correctness is not that important. It’s often good enough to produce a correct answer with reasonably high probability, provided the consequences of an error are controlled. I agree, but in light of that it seems odd to be too worried about inaccuracy from floating point arithmetic. I suspect he’s not that worried about floating point and that the opening quote was just an entertaining way to say that floating point math can be tricky.
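For readers who haven't been bitten yet, here is the sort of surprise people have in mind when they say floating point is tricky. This is a generic Python illustration of my own, not something from Sussman's talk.

```python
import math

# 0.1, 0.2, and 0.3 have no exact binary representation, so a naive
# equality test fails even though the values agree to about 16 digits.
naive_equal = (0.1 + 0.2 == 0.3)
tolerant_equal = math.isclose(0.1 + 0.2, 0.3)
print(naive_equal)     # False
print(tolerant_equal)  # True

# Near 1e16 adjacent doubles are 2 apart, so adding 1 is lost to rounding.
# The order of operations decides whether the 1 survives.
big = 1e16
absorbed = (big + 1.0) - big
reordered = (big - big) + 1.0
print(absorbed)   # 0.0
print(reordered)  # 1.0
```

Both surprises are predictable once you know to look for them: compare with a tolerance, and be wary of adding numbers of very different magnitudes.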

Related posts:

Floating point numbers are a leaky abstraction
Avoiding overflow, underflow, and loss of precision
Just an approximation

{ 13 comments }

Software engineering and alarm clocks

by John on October 30, 2011

This morning at church a woman said she was running late because of a software issue. Her alarm clock was manufactured before the US changed the end date of daylight saving time. Her clock “fell back” an hour because daylight saving time would have ended today had the law not changed.

Here are a few thoughts about what went wrong and how it might have been prevented.

  • Laws have unforeseen consequences. When the change was being debated, I doubt many asked about the impact on alarm clocks and other devices with embedded software.
  • The clock tried to be helpful by automating the time change. It would have been better had it done nothing. Moderately smart software is often worse than no software.
  • Should the clock have been designed to check for software updates? What would it have done to the cost to turn a simple clock into a computer with a network connection?
  • The clock could depend on a radio signal for time. Some do, and they’re very accurate. But they’re also more expensive.
  • Should we get rid of daylight saving time? It made more sense when nearly everyone had a 9:00 to 5:00 work schedule. But now that so many people work shifts or have flexible schedules, it doesn’t seem to add as much value.
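Here is a sketch in Python of what presumably happened inside the clock. I'm assuming its firmware hard-coded the old US rule (daylight saving time ends on the last Sunday in October) while the rule in effect since 2007 ends it on the first Sunday in November. The helper names are mine, and I have no idea how the clock actually computes dates.

```python
from datetime import date, timedelta

def nth_sunday(year, month, n):
    """Return the n-th Sunday (n = 1, 2, ...) of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Sunday is 6.
    days_until_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_until_sunday + 7 * (n - 1))

def last_sunday(year, month):
    """Return the last Sunday of the given month."""
    # Find the last day of the month, then step back to a Sunday.
    first_of_next = date(year + (month == 12), month % 12 + 1, 1)
    last_day = first_of_next - timedelta(days=1)
    return last_day - timedelta(days=(last_day.weekday() + 1) % 7)

old_rule_end = last_sunday(2011, 10)     # rule assumed to be baked into the clock
new_rule_end = nth_sunday(2011, 11, 1)   # rule in effect since 2007

print(old_rule_end)  # 2011-10-30
print(new_rule_end)  # 2011-11-06
```

Under the old rule the clock falls back exactly one week early, which matches what happened this morning.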

Related post:

Universal time

{ 9 comments }

Python is a voluntary language

by John on October 26, 2011

People who write Python choose to write Python.

I don’t hear people say “I use Python at work because I have to, but I’d rather be writing Java.” But often I do hear people say they’d like to use Python if their job would allow it. There must be someone out there writing Python who would rather not, but I think that’s more common with other languages.

My point isn’t that everyone loves Python, but rather that those who don’t care for Python simply don’t write it.

Since Python isn’t a common choice for enterprise software projects, it can resist the pressure to be all things to all people. Having a “Benevolent Dictator for Life” also helps Python maintain conceptual integrity. Python is popular enough to have a critical mass of users, but not so popular that it is under pressure to lose its uniqueness.

I don’t know much about the Ruby world, but I wonder whether the increasing popularity of Ruby for web development has created pressure for Ruby to compromise its original philosophy. And I wonder whether Ruby’s creator Yukihiro Matsumoto has “dictatorial” control over his language analogous to the control Guido van Rossum has over Python.

Related posts:

Plain Python
Ruby, Python, and science

{ 44 comments }

John McCarthy and the origin of Lisp

by John on October 24, 2011

As I write this, word has it that John McCarthy passed away yesterday. TechCrunch is reporting this as fact, citing Hacker News, which in turn cites a single tweet as the ultimate source. So the only authority we have, for now, is one person on Twitter, and we don’t know what relation she has to McCarthy.

[Update: More recent comments on Hacker News corroborate the story. Also, the twitterer cited above, Wendy Grossman, said McCarthy's daughter called her.]

I also have an unsubstantiated story about John McCarthy. I believe I read the following some time ago, but I cannot remember where. If you know of a reference, please let me know. [Update 2: Thanks to Leandro Penz for leaving a link to this article by Paul Graham in the comments below.]

As I recall, McCarthy invented Lisp to be a purely theoretical language, something akin to lambda calculus. When his graduate student Steve Russell spoke of implementing Lisp, McCarthy objected that he didn’t intend Lisp to actually run on a physical computer. Russell then implemented a Lisp interpreter and showed it to McCarthy.

Steve Russell is an unsung hero who deserves some of the credit for Lisp being an actual programming language and not merely a theoretical construct. This does not diminish McCarthy’s achievement, but it does mean that someone else also deserves recognition.

Related posts:

Lisp and the anti-Lisp
Bumblebee software
The myth of the Lisp genius

{ 9 comments }

Why does software have to be maintained?

by John on October 21, 2011

The idea of software maintenance sounds absurd. Why do you have to maintain software? Do the bits try to sneak off the disk so that someone has to put them back?

Software doesn’t change, but the world changes out from under it.

  • People discover bugs. This does not change the software but rather our knowledge of the software.
  • As people use the software, they get new ideas regarding how they want to use it.
  • The human environment around the software changes. Organizational priorities change. Laws change. Project sponsors and users turn over.
  • The technological environment of the software changes. Operating systems, networks, and hardware all change.
  • New possibilities emerge and make us less content with old possibilities.

People often perceive these changes as changes to the software, like someone standing on a dock, eyes fixed on a ship, who feels the dock is moving. We speak of software as if it were some mechanical thing that physically wears out. Of course it isn’t, but the effect may be the same.

Related posts:

Maintenance costs
Taking your code for a walk
Software sins of omission

{ 13 comments }

Software knowledge shelf life

by John on October 18, 2011

In my experience, software knowledge has a longer useful shelf life in the Unix world than in the Microsoft world. (In this post Unix is a shorthand for Unix and Linux.)

A pro-Microsoft explanation would say that Microsoft is more progressive, always improving their APIs and tools, and that Unix is stagnant.

A pro-Unix explanation would say that Unix got a lot of things right the first time, that it is more stable, and that Microsoft’s technology turn-over is more churn than progress.

Pick your explanation. But for better or worse, change comes slower on the Unix side. And when it comes, it’s less disruptive.

At least that’s how it seems to me. Although I’ve used Windows and Unix, I’ve done different kinds of work on the two platforms. Maybe the pace of change relates more to the task than the operating system. Also, I have more experience with Windows and so perhaps I’m more aware of the changes there. But most of the things I knew about Unix 20 years ago are still useful, and most of the things I knew about Windows 10 years ago are not.

Related posts:

Programmers without computers
Where the Unix philosophy breaks down
Software development and the myth of progress

{ 19 comments }