PowerShell browser toolbar

Posted on 15 April 2009 by John

Shay Levy created an amazing browser toolbar for PowerShell. The toolbar works with IE and Firefox. It updates itself using data that Shay maintains. It lets you do Google searches tailored to PowerShell sites, lists popular PowerShell blogs, and has a menu for a wide variety of PowerShell resources. Shay released his toolbar back in June 2008, so this is old news to people more in the know than I am, but I just found out about it yesterday.

I’m not a big fan of toolbars. I installed the toolbar but will keep it hidden most of the time. But I’m glad I installed it just to see Shay’s list of resources.

Shay created his toolbar using Conduit, something else I’d not heard of. Looks like Conduit makes it easy to create other similar toolbars. (Easy in the sense of programming effort; I’m sure a lot of work goes into keeping the resource lists up to date once you’ve created the toolbar.)

Free PowerShell booklet: What I wish I’d known up front

Posted on 14 April 2009 by John

I’ve written a small booklet, 10 pages, of things I wish someone had told me when I first started using Windows PowerShell.

Download here: PowerShell Day 1

Albrecht Dürer’s art and math

Posted on 10 April 2009 by John

Richard Elwes has a fun blog post this morning: Dürer, rhinos, and snowflakes. The post is primarily about the art and mathematics of Albrecht Dürer (1471-1528) but also includes some related links to recent writings, such as Michael Croucher’s blog post on snowflakes.

Durer's rhino

Programming language fatigue

Posted on 10 April 2009 by John

Joe Brinkman wrote an insightful article the other day, Polyglot Programming: Death By A Thousand DSLs. Here’s an excerpt:

I don’t know about other programmers, but I am drowning in DSLs [domain specific languages]. It is hard enough keeping up with my primary development language and the associated platform APIs, but these DSLs are going to be the death of me. The end result is that I have a pretty decent handle on maybe 3 or 4 of these DSLs but rarely do I have the requisite knowledge to make the right choices in anything beyond that.

It takes a dozen programming languages to do any web project these days. Whenever I bring this up in conversation, most developers say “Oh, well. That’s just the way it is. It isn’t so bad.” But I think it really is a problem. Obviously it’s an intimidating amount of material for new developers to learn. But the more subtle problem is that experienced developers who think they understand all the different languages they use are probably wrong.

Case in point: JavaScript. Nearly every web project involves some client-side JavaScript, and 99% of the people who write JavaScript do not know the language. I never claimed to be a JavaScript expert, but I thought I understood the language better than I really did until I saw some presentations by Douglas Crockford.

Crockford has written an excellent book: JavaScript: The Good Parts. His position is that there is an elegant, powerful language at the core of JavaScript but it is surrounded by landmines. His book focuses on the good parts, but along the way he tells you how to avoid or disarm the landmines.

Related post: Programming language subsets

Science versus medicine

Posted on 8 April 2009 by John

Before I started working for a cancer center, I was not aware of the tension between science and medicine. Popular perception is that the two go together hand in glove, but that’s not always true.

Physicians are trained to use their subjective judgment and to be decisive. And for good reason: making a fairly good decision quickly is often better than making the best decision eventually. But scientists must be tentative, withhold judgment, and follow protocols.

Sometimes physician-scientists can reconcile their two roles, but sometimes they have to choose to wear one hat or the other at different times.

The physician-scientist tension is just one facet of the constant tension between treating each patient effectively and learning how to treat future patients more effectively. Sometimes the interests of current patients and future patients coincide completely, but not always.

This ethical tension is part of what makes biostatistics a separate field of statistics. In manufacturing, for example, you don’t need to balance the interests of current light bulbs and future light bulbs. If you need to destroy 1,000 light bulbs to find out how to make better bulbs in the future, no big deal. But different rules apply when experimenting on people. Clinical trials will often use statistical designs that sacrifice some statistical power in order to protect the people participating in the trial. Ethical constraints make biostatistics interesting.

Highlights of one year ago

Posted on 7 April 2009 by John

Five of the better posts here from April 2008:

Four pillars of Bayesian statistics

Posted on 7 April 2009 by John

Anthony O’Hagan’s book Bayesian Inference lists four basic principles of Bayesian statistics at the end of the first chapter:

Prior information. Bayesian statistics provides a systematic way to incorporate what is known about parameters before an experiment is conducted. As a colleague of mine says, if you’re going to measure the distance to the moon, you know not to pick up a yard stick. You always know something before you do an experiment.
Subjective probability. Some Bayesians don’t agree with the subjective probability interpretation, but most do, in practice if not in theory. If you write down reasonable axioms for quantifying degrees of belief, you inevitably end up with Bayesian statistics.
Self-consistency. Even critics of Bayesian statistics acknowledge that Bayesian statistics has a rigorous self-consistent foundation. As O’Hagan says in his book, the difficulties with Bayesian statistics are practical, not foundational, and the practical difficulties are being resolved.
No adhockery. Bruno de Finetti coined the term “adhockery” to describe the profusion of frequentist methods. More on this below.

This year I’ve had the chance to teach a mathematical statistics class primarily focusing on frequentist methods. Teaching frequentist statistics has increased my appreciation for Bayesian statistics. In particular, I better understand the criticism of frequentist adhockery.

For example, consider point estimation. Frequentist statistics to some extent has standardized on minimum variance unbiased estimators as the gold standard. But why? And what do you do when such estimators don’t exist?

Why focus on unbiased estimators? Granted, lack of bias sounds like a good thing to have. All things being equal, it would be better to be unbiased than biased. But all things are not equal. Sometimes unbiased estimators are ridiculous. Why only consider biased vs. unbiased, a binary choice, rather than degree of bias, a continuous choice? Efficiency is also important, and someone may reasonably accept a small amount of bias in exchange for a large increase in efficiency.
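To make the bias-for-efficiency trade concrete, here is a minimal sketch that assumes independent normal samples and compares two estimators of the variance: the sum of squared deviations divided by n - 1 (unbiased) and divided by n + 1 (biased). The function name `variance_estimator_mse` is made up for illustration, and the closed form relies on the normality assumption.

```python
from fractions import Fraction

def variance_estimator_mse(n, d):
    """Mean squared error, in units of sigma^4, of S/d as an estimator of sigma^2.

    S is the sum of squared deviations from the sample mean of n independent
    normal observations, so S/sigma^2 is chi-squared with n - 1 degrees of
    freedom: E[S] = (n - 1) sigma^2 and Var[S] = 2 (n - 1) sigma^4.
    """
    k = n - 1
    variance = Fraction(2 * k, d * d)   # Var[S/d] / sigma^4
    bias = Fraction(k, d) - 1           # (E[S/d] - sigma^2) / sigma^2
    return variance + bias ** 2

n = 10
unbiased = variance_estimator_mse(n, n - 1)  # usual divisor n - 1
shrunk = variance_estimator_mse(n, n + 1)    # biased divisor n + 1
print(unbiased, shrunk)  # 2/9 2/11
```

Under these assumptions the biased estimator has the smaller mean squared error for every sample size, since 2/(n + 1) < 2/(n - 1).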

Why minimize expected mean squared error? Efficiency in classical statistics is typically measured by expected mean squared error. But why not minimize some other measure of error? Why use an exponent of 2 and not 1, or 4, or 2.738? Or why limit yourself to power functions at all? The theory is simplest for squared error, and while this is a reasonable choice in many applications, it is still an arbitrary choice.
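The choice of exponent matters in practice. As a small sketch (the data set and the helper `best_center` are invented for illustration), the following searches a grid for the value c that minimizes the sum of |x - c|^p. With p = 1 the minimizer is the median; with p = 2 it is the mean, which a single outlier drags far away.

```python
def best_center(data, p, candidates):
    """Return the candidate c that minimizes sum(|x - c|**p) over the data."""
    return min(candidates, key=lambda c: sum(abs(x - c) ** p for x in data))

data = [1, 2, 3, 4, 100]                  # one outlier
grid = [i / 2 for i in range(0, 201)]     # 0.0, 0.5, ..., 100.0

print(best_center(data, 1, grid))  # 3.0, the median
print(best_center(data, 2, grid))  # 22.0, the mean
```

A brute-force grid is used only to keep the example short; it is not how one would minimize these losses in real work.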

How much emphasis should be given to robustness? Once you consider robustness, there are infinitely many ways to compromise between efficiency and robustness.

Many frequentists are asking the same questions and are investigating alternatives. But I believe these alternatives are exactly what de Finetti had in mind: there are an infinite number of ad hoc choices you can make. Bayesian methods are criticized because prior distributions are explicitly subjective. But there are myriad subjective choices that go into frequentist statistics as well, though these choices are often implicit.

There is a great deal of latitude in Bayesian statistics as well, but the latitude is confined to fit within a universal framework: specify a likelihood and prior distribution, then update the model with data to compute the posterior distribution. There are many ways to construct a likelihood (exactly as in frequentist statistics), many ways to specify a prior, and many ways to summarize the information contained in the posterior distribution. But the basic framework is fixed. (In fact, the framework is inevitable given certain common-sense rules of inference.)
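The fixed framework can be sketched in a few lines for one simple case. The example below assumes a binomial likelihood with a beta prior, a conjugate pair chosen here only because the posterior has a closed form; the function names and the numbers are illustrative.

```python
from fractions import Fraction

def update_beta(prior_a, prior_b, successes, failures):
    """Posterior Beta(a, b) parameters after binomial data with a Beta prior."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, one way to summarize the posterior."""
    return Fraction(a, a + b)

# Uniform prior Beta(1, 1), then observe 7 successes and 3 failures.
a, b = update_beta(1, 1, successes=7, failures=3)
print(a, b, beta_mean(a, b))  # 8 4 2/3
```

The prior, the likelihood, and the posterior summary are all choices, but the update step itself is not.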

Anatomy of a floating point number

Posted on 6 April 2009 by John

In my previous post, I explained that floating point numbers are a leaky abstraction. Often you can pretend that they are mathematical real numbers, but sometimes you cannot. This post peels back the abstraction and explains exactly what a floating point number is. (Technically, this post describes an IEEE 754 double precision floating point number, by far the most common kind of floating point number in practice.)

A floating point number has 64 bits that encode a number of the form ± p × 2^e. The first bit encodes the sign, 0 for positive numbers and 1 for negative numbers. The next 11 bits encode the exponent e, and the last 52 bits encode the significand p. The encodings of the exponent and the significand require some explanation.

The exponent is stored with a bias of 1023. That is, positive and negative exponents are all stored in a single positive number by storing e + 1023 rather than storing e directly. Eleven bits can represent integers from 0 up to 2047. Subtracting the bias, this corresponds to values of e from -1023 to +1024. Define e_min = -1022 and e_max = +1023. The values e_min – 1 and e_max + 1 are reserved for special use. More on that below.

Floating point numbers are typically stored in normalized form. In base 10, a number is in normalized scientific notation if the significand is ≥ 1 and < 10. For example, 3.14 × 10² is in normalized form, but 0.314 × 10³ and 31.4 × 10² are not. In general, a number in base β is in normalized form if it is of the form p × β^e where 1 ≤ p < β. This says that for binary, i.e. β = 2, the first bit of the significand of a normalized number is always 1. Since this bit never changes, it doesn’t need to be stored. Therefore we can express 53 bits of precision in 52 bits of storage. Instead of storing the significand directly, we store f, the fractional part, where the significand is of the form 1.f.
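The layout described so far can be inspected directly. This is a minimal sketch that assumes Python floats are IEEE 754 doubles, which is true on essentially all current platforms; the function names `decompose` and `rebuild_normalized` are made up for illustration.

```python
import struct

def decompose(x):
    """Split a double into its sign bit, stored (biased) exponent, and fraction bits."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    stored_exponent = (bits >> 52) & 0x7FF   # 11 bits, equals e + 1023
    fraction = bits & ((1 << 52) - 1)        # 52 bits, the f in 1.f
    return sign, stored_exponent, fraction

def rebuild_normalized(sign, stored_exponent, fraction):
    """Evaluate (-1)^sign * 1.f * 2^e for a normalized number."""
    e = stored_exponent - 1023               # remove the bias
    significand = 1 + fraction / 2**52       # restore the implicit leading 1
    return (-1) ** sign * significand * 2.0**e

print(decompose(1.0))    # (0, 1023, 0)
print(decompose(-6.5))   # sign 1, stored exponent 1025, fraction 0.625 * 2**52
print(rebuild_normalized(*decompose(-6.5)))  # -6.5
```

Here -6.5 = -1.625 × 2², so the stored exponent is 2 + 1023 = 1025 and the fraction bits encode 0.625.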

The scheme above does not explain how to store 0. It’s impossible to specify values of f and e so that 1.f × 2^e = 0. The floating point format makes an exception to the rules stated above. When e = e_min – 1 and f = 0, the bits are interpreted as 0. When e = e_min – 1 and f ≠ 0, the result is a denormalized number. The bits are interpreted as 0.f × 2^e_min. In short, the special exponent reserved below e_min is used to represent 0 and denormalized floating point numbers.

The special exponent reserved above e_max is used to represent ∞ and NaN. If e = e_max + 1 and f = 0, the bits are interpreted as ∞. But if e = e_max + 1 and f ≠ 0, the bits are interpreted as a NaN or “not a number.” See IEEE floating point exceptions for more information about ∞ and NaN.
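The two reserved exponents can be told apart by looking at the stored bits. The following is a sketch under the assumption that Python floats are IEEE 754 doubles; `classify` is a hypothetical helper, not a standard library function.

```python
import struct

def classify(x):
    """Classify a double by its stored exponent and fraction bits."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    stored_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if stored_exponent == 0:        # e = e_min - 1
        return "zero" if fraction == 0 else "denormalized"
    if stored_exponent == 2047:     # e = e_max + 1
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"

for value in (0.0, 5e-324, 1.0, float("inf"), float("nan")):
    print(value, classify(value))
```

The sign bit is ignored here, so -0.0 and -∞ land in the same categories as their positive counterparts.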

Since the largest exponent is 1023 and the largest significand is 1.f where f has 52 ones, the largest floating point number is 2¹⁰²³(2 – 2^-52) = 2¹⁰²⁴ – 2⁹⁷¹ ≈ 2¹⁰²⁴ ≈ 1.8 × 10³⁰⁸. In C, this constant is DBL_MAX, defined in <float.h>.
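The same value is exposed in Python as `sys.float_info.max`. A quick check, assuming Python floats are IEEE 754 doubles:

```python
import sys

# Largest significand (2 - 2^-52) times the largest power of two, 2^1023.
largest = (2 - 2.0**-52) * 2.0**1023

print(largest == sys.float_info.max)  # True
print(largest * 2)                    # inf: the product overflows
```

The expression is written as a product because evaluating `2.0**1024` directly would itself overflow.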

Since the smallest exponent is -1022, the smallest positive normalized number is 1.0 × 2^-1022 ≈ 2.2 × 10^-308. In C, this is defined as DBL_MIN. However, it is not the smallest positive number representable as a floating point number, only the smallest normalized floating point number. Smaller numbers can be expressed in denormalized form, albeit at a loss of significance. The smallest denormalized positive number occurs when f has 51 0’s followed by a single 1. This corresponds to 2^-52 × 2^-1022 = 2^-1074 ≈ 4.9 × 10^-324. Attempts to represent any smaller number must underflow to zero.
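These limits can also be checked from Python, again assuming IEEE 754 doubles. `sys.float_info.min` corresponds to DBL_MIN and `sys.float_info.epsilon` is 2^-52.

```python
import sys

smallest_normal = 2.0**-1022
# 2^-52 * 2^-1022 = 2^-1074, the smallest positive denormalized number.
smallest_denormal = sys.float_info.epsilon * sys.float_info.min

print(smallest_normal == sys.float_info.min)  # True
print(smallest_denormal)                      # 5e-324
print(smallest_denormal / 2)                  # 0.0: underflow to zero
```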

C gives the name DBL_EPSILON to the difference between 1 and the smallest floating point number greater than 1. Since the significand has 52 stored bits, it’s clear that DBL_EPSILON = 2^-52 ≈ 2.2 × 10^-16. That is why we say a floating point number has between 15 and 16 significant (decimal) figures.
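A short check of this constant in Python, assuming floats are IEEE 754 doubles with the default round-to-nearest mode:

```python
import sys

eps = 2.0**-52

print(eps == sys.float_info.epsilon)  # True
print(1.0 + eps > 1.0)                # True
print(1.0 + eps / 2 == 1.0)           # True: the sum rounds back to 1
```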

For more details see What Every Computer Scientist Should Know About Floating-Point Arithmetic.

First post in this series: Floating point numbers are a leaky abstraction

Floating point numbers are a leaky abstraction

Posted on 6 April 2009 by John

Joel Spolsky coined the term leaky abstraction for programming concepts that usually shield you from messy details but sometimes break down. A perfect abstraction is a black box that you never have to open. A leaky abstraction is a black box that you have to open up occasionally.

Floating point numbers, the computer representations of real numbers, are leaky abstractions. They work remarkably well: you can usually pretend that a floating point type is a mathematical real number. But sometimes you can’t. The abstraction leaks, though not very often.

Most explanations I’ve heard for the limitations of machine numbers are pedantic. “There are only a finite number of floating point numbers so they can’t represent real numbers well.” That’s not much help. It doesn’t explain why floating point numbers actually do represent real numbers sufficiently well for most applications, and it doesn’t suggest where the abstraction might leak.

A standard floating point number has roughly 16 decimal places of precision and a maximum value on the order of 10³⁰⁸, a 1 followed by 308 zeros. (According to IEEE standard 754, the typical floating point implementation.)

Sixteen decimal places is a lot. Hardly any measured quantity is known to anywhere near that much precision. For example, the constant in Newton’s Law of Gravity is only known to about four significant figures. The charge of an electron is known to 11 significant figures, much more precision than Newton’s gravitational constant, but still less than a floating point number.

So when are 16 figures not enough? One problem area is subtraction. The other elementary operations (addition, multiplication, division) are very accurate. As long as you don’t overflow or underflow, these operations often produce results that are correct to the last bit. But subtraction can be anywhere from exact to completely inaccurate. If two numbers agree to n figures, you can lose up to n figures of precision in their subtraction. This problem can show up unexpectedly in the middle of other calculations. For an example, see this post on calculating standard deviation. See also computing derivatives, the second example in Five Tips for Floating Point Programming.
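Here is a minimal illustration of losing figures in a subtraction, assuming IEEE 754 doubles. The numbers 1 + x and 1 agree to about 15 figures, so their difference retains only about one correct figure. The value of x is arbitrary.

```python
x = 1e-15

# 1 + x is rounded to the nearest double, and subtracting 1 exposes that rounding.
recovered = (1.0 + x) - 1.0
relative_error = abs(recovered - x) / x

print(recovered)       # roughly 1.11e-15 rather than 1e-15
print(relative_error)  # roughly 0.11, i.e. only one correct figure
```

The subtraction itself is exact here. What it does is promote a rounding error in the 16th figure of 1 + x to the leading figures of the result.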

What about overflow or underflow? When do you need numbers bigger than 10³⁰⁸? Often you don’t. But in probability calculations, for example, you need them all the time unless you’re clever. It’s common in probability to compute a medium-sized number that is the product of an astronomically large number and an infinitesimally small number. The final result fits into a computer just fine, but the intermediate numbers might not due to overflow or underflow. For example, the maximum floating point number on most computers is somewhere between 170 factorial and 171 factorial. Such large factorials often appear in applications, often in ratios with other large factorials. See Avoiding Overflow, Underflow, and Loss of Precision for tricks on how to work with factorials that would overflow if computed directly.
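One common way to handle ratios of large factorials is to work with logarithms. The sketch below uses `math.lgamma` from the Python standard library, relying on log(n!) = lgamma(n + 1); the helper `log_binomial` and the specific numbers are illustrative and are not taken from the article cited above.

```python
import math

def log_binomial(n, k):
    """Natural log of n! / (k! (n - k)!), computed without forming any factorial."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

# 170! fits in a double, but 171! does not.
print(float(math.factorial(170)))   # about 7.26e306
try:
    float(math.factorial(171))
except OverflowError:
    print("171! overflows a double")

# 200! overflows, yet 200! / (198! 2!) = 19900 is a modest number.
print(round(math.exp(log_binomial(200, 2))))  # 19900
```

Python integers have arbitrary precision, so `math.factorial` itself never overflows; the overflow appears only on conversion to a floating point number.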

Often you can afford to be blissfully ignorant of the details of floating point arithmetic, but sometimes you cannot. A great place to learn more is David Goldberg’s paper What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Update: See follow-up post, Anatomy of a floating point number.

Living within chosen limits

Posted on 2 April 2009 by John

The latest EconTalk podcast is an interview with Brink Lindsey, author of The Age of Abundance. Lindsey said that in the 1980’s and 90’s we learned how to live with the freedoms gained in the 1960’s and 70’s. Many negative social indicators soared in the 60’s and 70’s: crime, divorce, drug use, abortion, etc. But during the 80’s and 90’s many of these indicators reversed direction, and Lindsey believes it is because many people have learned to replace legal and societal limits with chosen limits.

I don’t know whether I agree with Lindsey’s sweeping sociological analysis, but I do see some truth to it. I like his phrase “living within chosen limits.” I see a movement toward living within chosen limits on technology. The most obvious example may be Twitter. About 8,000,000 people at this point see some value in limiting their correspondence to 140 character messages. Some other ways I hear of people placing voluntary limits on their technology:

Unplugging from the Internet to work
Using terminal-style text editors to minimize distraction
Using browser-based applications with limited functionality to avoid installing software
Setting a five-sentence limit on email messages
Paper organizers, e.g. the Hipster PDA

I imagine the people who adopt these limitations will moderate their approach over time. Instead of unplugging from the Internet, they’ll make better use of it and become more disciplined. They may decide that some modern word processor features are worthwhile but still choose something more streamlined than Microsoft Word.

It may take a generation or more to learn how to take advantage of the new possibilities. We’re in a period of excess now, analogous to the culture of the 1960’s. It will be interesting to see what the analogy of the 80’s and 90’s will be.

Related posts from Kevin Kelly:

Related posts here:

Month: April 2009