<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.3.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>The Endeavour</title>
	<link>http://www.johndcook.com/blog</link>
	<description>The blog of John D. Cook</description>
	<pubDate>Thu, 03 Jul 2008 20:35:20 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.3</generator>
	<language>en</language>
			<item>
		<title>Flaw in Riemann hypothesis proof</title>
		<link>http://www.johndcook.com/blog/2008/07/03/flaw-in-riemann-hypothesis-proof/</link>
		<comments>http://www.johndcook.com/blog/2008/07/03/flaw-in-riemann-hypothesis-proof/#comments</comments>
		<pubDate>Thu, 03 Jul 2008 20:34:06 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Math]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/07/03/flaw-in-riemann-hypothesis-proof/</guid>
		<description><![CDATA[It appears that someone has found a flaw in Xian-Jin Li&#8217;s proposed proof of the Riemann hypothesis according to the Not Even Wrong blog. (Hat tip: Ars Mathematica)
This doesn&#8217;t mean that all is lost. Andrew Wiles&#8217; first attempt at proving Fermat&#8217;s Last Theorem was flawed, but he fixed it. Perhaps Li can patch his proof. [...]]]></description>
			<content:encoded><![CDATA[<p>It appears that someone has found a flaw in Xian-Jin Li&#8217;s proposed proof of the Riemann hypothesis according to the <a href="http://www.math.columbia.edu/~woit/wordpress/?p=707">Not Even Wrong blog</a>. (Hat tip: <a href="http://www.arsmathematica.net/archives/2008/07/03/lis-preprint/">Ars Mathematica</a>)</p>
<p>This doesn&#8217;t mean that all is lost. Andrew Wiles&#8217; first attempt at proving Fermat&#8217;s Last Theorem was flawed, but he fixed it. Perhaps Li can patch his proof. If not, he may be able to salvage a proof of something valuable short of the Riemann hypothesis.</p>
<p>The news of Li&#8217;s <a href="http://www.amazon.com/gp/product/0521290384/002-0840105-5765668?ie=UTF8&amp;tag=theende-20&amp;linkCode=xm2&amp;camp=1789&amp;creativeASIN=0521290384">proof and its refutation</a> underscores a point I made in an earlier post about <a href="http://www.johndcook.com/blog/2008/02/06/proofs-of-false-statements/">proofs of false statements</a>. Namely, &#8220;&#8230; in mainstream areas of math, blunders are usually uncovered very quickly.&#8221; The Riemann hypothesis is very much in the mainstream, and it looks like a blunder was uncovered within 24 hours.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/07/03/flaw-in-riemann-hypothesis-proof/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Quantity and quality</title>
		<link>http://www.johndcook.com/blog/2008/07/03/quantity-and-quality/</link>
		<comments>http://www.johndcook.com/blog/2008/07/03/quantity-and-quality/#comments</comments>
		<pubDate>Thu, 03 Jul 2008 12:02:20 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Creativity]]></category>

		<category><![CDATA[Quality]]></category>

		<category><![CDATA[Quotes]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/07/03/quantity-and-quality/</guid>
		<description><![CDATA[Here&#8217;s a quote from a recent blog post from Tom Peters:
You will be remembered in the long haul for the quality of your work, not the quantity of your work—the quantity part is just your defective ego talking—no one evaluates Picasso based on the number of paintings he churned out.
]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a quote from a recent <a href="http://www.tompeters.com/entries.php?rss=1&amp;note=http://www.tompeters.com/blogs/main/010499.php">blog post</a> from Tom Peters:</p>
<blockquote><p>You will be remembered in the long haul for the <em>quality</em> of your work, not the quantity of your work—the quantity part is just your defective ego talking—no one evaluates Picasso based on the number of paintings he churned out.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/07/03/quantity-and-quality/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Riemann hypothesis proof?</title>
		<link>http://www.johndcook.com/blog/2008/07/02/riemann-hypothesis-proof/</link>
		<comments>http://www.johndcook.com/blog/2008/07/02/riemann-hypothesis-proof/#comments</comments>
		<pubDate>Thu, 03 Jul 2008 01:41:23 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Math]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/07/02/riemann-hypothesis-proof/</guid>
		<description><![CDATA[Xian-Jin Li claims to have proven the Riemann hypothesis, one of the most famous open problems in math. His paper is posted on arXiv.
The Riemann hypothesis is no obscure conjecture. It&#8217;s a natural question and central to number theory. For years mathematicians have been proving theorems of the form &#8220;If the Riemann hypothesis is true, then &#8230;&#8221; and [...]]]></description>
			<content:encoded><![CDATA[<p>Xian-Jin Li claims to have proven the <a href="http://www.claymath.org/millennium/Riemann_Hypothesis/">Riemann hypothesis</a>, one of the most famous open problems in math. His paper is posted on <a href="http://arxiv.org/abs/0807.0090">arXiv</a>.</p>
<p>The Riemann hypothesis is no obscure conjecture. It&#8217;s a natural question and central to number theory. For years mathematicians have been proving theorems of the form &#8220;If the Riemann hypothesis is true, then &#8230;&#8221; and so if Li&#8217;s result is correct, many other new results follow. Also, if the proof holds up, Li wins a $1,000,000 prize from the Clay Institute for solving one of the Millennium Problems.</p>
<p>Hat tip: <span class="fn"><a href="http://godplaysdice.blogspot.com/2008/07/lis-proof-of-riemann.html">Isabel Lugo</a></span></p>
<p><strong>Update</strong>: Looks like there&#8217;s <a href="http://www.johndcook.com/blog/2008/07/03/flaw-in-riemann-hypothesis-proof/">a flaw in the proof</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/07/02/riemann-hypothesis-proof/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Team moon</title>
		<link>http://www.johndcook.com/blog/2008/07/02/team-moon/</link>
		<comments>http://www.johndcook.com/blog/2008/07/02/team-moon/#comments</comments>
		<pubDate>Wed, 02 Jul 2008 13:01:02 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/07/02/team-moon/</guid>
		<description><![CDATA[I ran across the book Team Moon by Catherine Thimmesh when I took my kids to the library. The book&#8217;s subtitle is &#8220;How 400,000 People Landed Apollo 11 on the Moon.&#8221; This children&#8217;s book focuses on the thousands of people who worked behind the scenes of Apollo 11. It highlights some of the things that went wrong or [...]]]></description>
			<content:encoded><![CDATA[<p>I ran across the book <a href="http://www.amazon.com/gp/product/0618507574/104-4299381-6943124?ie=UTF8&amp;tag=theende-20&amp;linkCode=xm2&amp;camp=1789&amp;creativeASIN=0618507574">Team Moon</a> by Catherine Thimmesh when I took my kids to the library. The book&#8217;s subtitle is &#8220;How 400,000 People Landed Apollo 11 on the Moon.&#8221; This children&#8217;s book focuses on the thousands of people who worked behind the scenes of Apollo 11. It highlights some of the things that went wrong or could have gone wrong. One of the early pages of the book quotes the speech that was prepared for President Nixon to read if the mission had failed.</p>
<blockquote><p>Fate has ordained that the men who went to the moon to explore in peace will stay on the moon to rest in peace &#8230; These brave men, Neil Armstrong and [Buzz] Aldrin, know that there is no hope for their recovery. But they also know that there is hope for mankind in their sacrifice.</p></blockquote>
<p>Grim words for a children&#8217;s book. And yet without some explanation of the dangers they faced, it&#8217;s impossible to appreciate the astronauts&#8217; bravery. When I was a child, I was puzzled by talk of brave astronauts. In my mind, astronauts simply got on board a rocket the same way I got in a car. What was brave about that? It didn&#8217;t occur to me that they might not return safely.</p>
<p>Team Moon reminded me of <a href="http://www.amazon.com/gp/product/0684826976/104-4299381-6943124?ie=UTF8&amp;tag=theende-20&amp;linkCode=xm2&amp;camp=1789&amp;creativeASIN=0684826976">Undaunted Courage</a> by Stephen Ambrose. The image of the Lewis and Clark expedition I had from childhood was about as naive as my image of astronauts. I pictured a couple men with coonskin hats in a canoe going on a little trip, not 33 soldiers on a three-year mission. (The name &#8220;Lewis and Clark&#8221; doesn&#8217;t help, implying that they <em>were</em> the expedition rather than the <em>leaders</em> of the expedition.) I didn&#8217;t appreciate the scope or danger of the voyage until I read Ambrose&#8217;s book as an adult. I hope someone writes a children&#8217;s book in the style of Team Moon about the expedition if there&#8217;s not already such a book.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/07/02/team-moon/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Distributions in Mathematica and R/S-PLUS</title>
		<link>http://www.johndcook.com/blog/2008/07/01/distributions-in-mathematica-and-rs-plus/</link>
		<comments>http://www.johndcook.com/blog/2008/07/01/distributions-in-mathematica-and-rs-plus/#comments</comments>
		<pubDate>Wed, 02 Jul 2008 04:06:54 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Computing]]></category>

		<category><![CDATA[Mathematica]]></category>

		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/07/01/distributions-in-mathematica-and-rs-plus/</guid>
		<description><![CDATA[I posted some notes this evening on working with probability distributions in Mathematica and R/S-PLUS.
I much prefer Mathematica&#8217;s syntax. The first time I had to read some R code I ran across a statement something like runif(1, 3, 4). I thought it was some sort of conditional execution statement: run something if some condition holds. No, [...]]]></description>
			<content:encoded><![CDATA[<p>I posted some notes this evening on working with probability distributions in <a href="http://www.johndcook.com/distributions_Mathematica.html">Mathematica</a> and <a href="http://www.johndcook.com/distributions_R_SPLUS.html">R/S-PLUS</a>.</p>
<p>I much prefer Mathematica&#8217;s syntax. The first time I had to read some R code I ran across a statement something like <code>runif(1, 3, 4)</code>. I thought it was some sort of conditional execution statement: run something if some condition holds. No, the code generates a random value uniformly from the interval (3, 4). The corresponding Mathematica syntax is <code>Random[ UniformDistribution[3,4] ]</code>.</p>
<p>Another example. The statement <code>dnorm(x, m, s)</code> in R corresponds to <code>PDF[ NormalDistribution[m, s], x ]</code> in Mathematica. Both evaluate the PDF of a normal random variable with mean m and standard deviation s at the point x.</p>
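<p>R uses a <code>d</code> prefix for density functions and a <code>p</code> prefix for cumulative distribution functions, so <code>dnorm</code> is the normal PDF and <code>pnorm</code> is the normal CDF. The Python sketch below is only an illustration of that distinction using the standard library; the function names <code>normal_pdf</code> and <code>normal_cdf</code> are hypothetical and are not part of R or Mathematica.</p>

```python
import math

def normal_pdf(x, m, s):
    """Density of a normal variable with mean m and standard deviation s at x
    (the quantity R's dnorm evaluates)."""
    z = (x - m) / s
    return math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))

def normal_cdf(x, m, s):
    """Probability that a normal variable with mean m and standard deviation s
    is at most x (the quantity R's pnorm evaluates)."""
    return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))

# At the mean of a standard normal the two functions give clearly different values.
print(normal_pdf(0.0, 0.0, 1.0))
print(normal_cdf(0.0, 0.0, 1.0))
```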
<p>It&#8217;s a matter of taste. Some people prefer terse notation, especially for things they use frequently. I&#8217;d rather type more and remember less.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/07/01/distributions-in-mathematica-and-rs-plus/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Languages that are easy to pick back up</title>
		<link>http://www.johndcook.com/blog/2008/07/01/languages-that-are-easy-to-pick-back-up/</link>
		<comments>http://www.johndcook.com/blog/2008/07/01/languages-that-are-easy-to-pick-back-up/#comments</comments>
		<pubDate>Tue, 01 Jul 2008 15:34:45 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Software development]]></category>

		<category><![CDATA[LaTeX]]></category>

		<category><![CDATA[Mathematica]]></category>

		<category><![CDATA[Perl]]></category>

		<category><![CDATA[Programming]]></category>

		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/07/01/languages-that-are-easy-to-pick-back-up/</guid>
		<description><![CDATA[Some programming languages are much easier to come back to than others. In my previous post I mentioned that Mathematica is easy to come back to, but Perl is not.
I found it easy to come back to LaTeX after not using it for a while. It has a few quirks, but it&#8217;s basically consistent. The LaTeX [...]]]></description>
			<content:encoded><![CDATA[<p>Some programming languages are much easier to come back to than others. In my previous post I mentioned that <a href="http://www.johndcook.com/blog/2008/07/01/mathematica-turns-20/">Mathematica</a> is easy to come back to, but <a href="http://www.johndcook.com/blog/2008/01/31/three-hour-a-week-language/">Perl</a> is not.</p>
<p>I found it easy to come back to LaTeX after not using it for a while. It has a few quirks, but it&#8217;s basically consistent. The LaTeX commands for <a href="http://www.johndcook.com/blog/2008/06/14/greek-letters-and-math-symbols-in-xhtml/">Greek letters</a> are their names, lower case names for lower case letters, upper case names for upper case letters. The command for a mathematical symbol is usually the name a mathematician would give the symbol. Environments always begin with <code>\begin</code> and end with <code>\end</code>.</p>
<p>Python also has a consistent syntax that makes it easier to come back to the language after a break. Someone has said that Python is similar to Perl, except that the word &#8220;except&#8221; does not appear nearly so often in the Python documentation.</p>
<p>It&#8217;s more important that a language be internally consistent than conventional. Each of the languages I mentioned has its peculiarities. Mathematica uses square brackets for function arguments. LaTeX uses percent signs for comments. Python uses indentation to denote blocks. Each of these takes a little getting used to, but each makes sense in its own context.</p>
<p>A special case of consistency is using full names for keywords. Mathematica always spells out words in full. For example, the gamma distribution object is named <code>GammaDistribution</code>. I don&#8217;t mind a little extra typing. I&#8217;d rather optimize for recall and readability than minimize keystrokes since I spend more time recalling and reading than typing. (One flaw in LaTeX is that it occasionally uses unnecessary abbreviations. For example, <code>\infty</code> for infinity. The corresponding Mathematica keyword is <code>Infinity</code>.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/07/01/languages-that-are-easy-to-pick-back-up/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Mathematica turns 20</title>
		<link>http://www.johndcook.com/blog/2008/07/01/mathematica-turns-20/</link>
		<comments>http://www.johndcook.com/blog/2008/07/01/mathematica-turns-20/#comments</comments>
		<pubDate>Tue, 01 Jul 2008 13:51:47 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Computing]]></category>

		<category><![CDATA[Mathematica]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/07/01/mathematica-turns-20/</guid>
		<description><![CDATA[Mathematica was first released June 23, 1988. I started using Mathematica not long after it came out and used it for a few years. Then for several years after that I didn&#8217;t touch it. When I began using Mathematica again, I expected that, like Rip Van Winkle, I&#8217;d find many things had changed while [...]]]></description>
			<content:encoded><![CDATA[<p>Mathematica was first released June 23, 1988. I started using Mathematica not long after it came out and used it for a few years. Then for several years after that I didn&#8217;t touch it. When I began using Mathematica again, I expected that, like Rip Van Winkle, I&#8217;d find many things had changed while I was gone. Instead, I was pleasantly surprised how easy it was to start using it again.</p>
<p>Mathematica syntax is simple, consistent, and predictable. They got this right twenty years ago and stuck to it. They&#8217;ve managed to grow over the years without alienating users, even those of us who take a long hiatus from using the product. I&#8217;ve used Mathematica more or less regularly over the last few years, but I&#8217;ll still go for weeks at a time without using it. It&#8217;s easy to pick up every time I return to it. (The opposite of <a href="http://www.johndcook.com/blog/2008/01/31/three-hour-a-week-language/">my experience with Perl</a>.)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/07/01/mathematica-turns-20/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Aging with grace</title>
		<link>http://www.johndcook.com/blog/2008/06/29/aging-with-grace/</link>
		<comments>http://www.johndcook.com/blog/2008/06/29/aging-with-grace/#comments</comments>
		<pubDate>Mon, 30 Jun 2008 04:17:48 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/29/aging-with-grace/</guid>
		<description><![CDATA[Bill&#8217;s comment on my previous post reminded me of a book I read a few years ago, Aging With Grace by David Snowdon. The author describes what he learned about aging and especially about Alzheimer&#8217;s disease by studying a community of nuns. (Nuns make ideal subjects for epidemiological studies. They have very similar lifestyles, and [...]]]></description>
			<content:encoded><![CDATA[<p>Bill&#8217;s comment on my <a href="http://www.johndcook.com/blog/2008/06/28/brain-plasticity/">previous post</a> reminded me of a book I read a few years ago, <a href="http://www.amazon.com/gp/product/0553380923/104-4299381-6943124?ie=UTF8&amp;tag=theende-20&amp;linkCode=xm2&amp;camp=1789&amp;creativeASIN=0553380923">Aging With Grace</a> by David Snowdon. The author describes what he learned about aging and especially about Alzheimer&#8217;s disease by studying a community of nuns. (Nuns make ideal subjects for epidemiological studies. They have very similar lifestyles, and so a number of confounding variables are reduced. Also, nuns keep excellent records.) The book is a pleasant mixture of science and human interest stories.</p>
<p>Snowdon says in his book that it is nearly impossible to accurately diagnose the extent of Alzheimer&#8217;s disease in a patient without an autopsy. Some nuns in the study who were believed to have advanced Alzheimer&#8217;s disease in fact did not. Others who were mentally sharp until they died were discovered on autopsy to have suffered extensive damage from the disease. (Snowdon tells the story of one nun in particular who was believed to be senile but who was actually quite witty. She was hard of hearing and reluctant to talk. Few people had the patience to carry on a conversation with her, but Snowdon drew her out.)</p>
<p>Nuns who had greater vocabulary and verbal skill earlier in their lives (as measured by essays the nuns wrote upon entering their order) and those who remained mentally active (for example, those who were teachers) fared better as they aged. They may have had more redundant mental pathways so that as Alzheimer&#8217;s disease knocked out pathways at random, enough pathways survived to allow these women to communicate well.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/29/aging-with-grace/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Brain plasticity</title>
		<link>http://www.johndcook.com/blog/2008/06/28/brain-plasticity/</link>
		<comments>http://www.johndcook.com/blog/2008/06/28/brain-plasticity/#comments</comments>
		<pubDate>Sun, 29 Jun 2008 03:25:02 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Science]]></category>

		<category><![CDATA[Neuroplasticity]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/28/brain-plasticity/</guid>
		<description><![CDATA[Today&#8217;s Big Ideas podcast carried a lecture by Norman Doidge on neuroplasticity, the recently-discovered ability of the brain to rewire itself. Doidge relates several amazing stories of people who have recovered from severe strokes or other brain injuries by developing detours around the damaged areas. Hearing of people who have had the persistence to re-learn how to use an [...]]]></description>
			<content:encoded><![CDATA[<p>Today&#8217;s Big Ideas podcast carried a lecture by Norman Doidge on neuroplasticity, the recently-discovered ability of the brain to rewire itself. Doidge relates several amazing stories of people who have recovered from severe strokes or other brain injuries by developing detours around the damaged areas. Hearing of people who have had the persistence to re-learn how to use an arm or leg inspires me to not give up so easily when I face comparatively trivial challenges.</p>
<p>Doidge gives several explanations for why it has taken so long to discover neuroplasticity. Until very recently, scientific orthodoxy has held that neuroplasticity is impossible. Patients were told they&#8217;d never be able, for example, to use their left arm again. This became a self-fulfilling prognosis as most patients would not work to do what they were told would be impossible. But what about patients who ignored medical advice and <em>were</em> able to recover lost functionality? Why did that not persuade scientists that neuroplasticity was possible? The patient&#8217;s recovery was interpreted as evidence that the brain damage must not have been as extensive as initially believed, since the alternative was known to be impossible.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/28/brain-plasticity/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Tips for using regular expressions</title>
		<link>http://www.johndcook.com/blog/2008/06/27/tips-for-using-regular-expressions/</link>
		<comments>http://www.johndcook.com/blog/2008/06/27/tips-for-using-regular-expressions/#comments</comments>
		<pubDate>Fri, 27 Jun 2008 19:40:02 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Software development]]></category>

		<category><![CDATA[Regular expressions]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/27/tips-for-using-regular-expressions/</guid>
		<description><![CDATA[Jeff Atwood just posted a good article on regular expressions. Not the syntax of regular expressions but rather the strategy of when and how to use them.
]]></description>
			<content:encoded><![CDATA[<p>Jeff Atwood just posted a <a href="http://www.codinghorror.com/blog/archives/001016.html">good article on regular expressions</a>. Not the syntax of regular expressions but rather the strategy of when and how to use them.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/27/tips-for-using-regular-expressions/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Wine, Beer, and Statistics</title>
		<link>http://www.johndcook.com/blog/2008/06/27/wine-beer-and-statistics/</link>
		<comments>http://www.johndcook.com/blog/2008/06/27/wine-beer-and-statistics/#comments</comments>
		<pubDate>Fri, 27 Jun 2008 12:37:48 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Statistics]]></category>

		<category><![CDATA[Probability and Statistics]]></category>

		<category><![CDATA[Quality]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/27/wine-beer-and-statistics/</guid>
		<description><![CDATA[William Gosset discovered the t-distribution while working for the Guinness brewing company. Because his employer prevented employees from publishing papers, Gosset published his research under the pseudonym Student. That&#8217;s why his distribution is often called Student&#8217;s t-distribution.
This story is fairly well known. It often appears in the footnotes of statistics textbooks. However, I don&#8217;t think many people realize [...]]]></description>
			<content:encoded><![CDATA[<p><a rel="nofollow" href="http://en.wikipedia.org/wiki/William_Sealy_Gosset">William Gosset</a> discovered the t-distribution while working for the <a rel="nofollow" href="http://www.guinness.com/">Guinness</a> brewing company. Because his employer prevented employees from publishing papers, Gosset published his research under the pseudonym <em>Student</em>. That&#8217;s why his distribution is often called Student&#8217;s t-distribution.</p>
<p>This story is fairly well known. It often appears in the footnotes of statistics textbooks. However, I don&#8217;t think many people realize why it&#8217;s not surprising that fundamental statistical research should come from a brewery, and why we don&#8217;t hear of statistical research coming out of wineries.</p>
<p>Beer makers pride themselves on consistency while wine makers pride themselves on variety. That&#8217;s why you&#8217;ll never hear beer fans talk about a &#8220;good year&#8221; the way wine connoisseurs do. Because they value consistency, beer makers invest more in extensive statistical quality control than wine makers do.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/27/wine-beer-and-statistics/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Two definitions of expectation</title>
		<link>http://www.johndcook.com/blog/2008/06/27/two-definitions-of-expectation/</link>
		<comments>http://www.johndcook.com/blog/2008/06/27/two-definitions-of-expectation/#comments</comments>
		<pubDate>Fri, 27 Jun 2008 12:37:22 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Math]]></category>

		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/27/two-definitions-of-expectation/</guid>
		<description><![CDATA[In an introductory probability class, the expected value of a random variable X is defined as

where fX is the probability density function of X. I&#8217;ll call this the analytical definition.
In a more advanced class the expected value of X is defined as

where (Ω, P) is a probability space. I&#8217;ll call this the measure-theoretic definition. It&#8217;s not [...]]]></description>
			<content:encoded><![CDATA[<p>In an introductory probability class, the expected value of a random variable X is defined as</p>
<p style="text-align: center"><img border="0" src="http://www.johndcook.com/expect1.gif" alt="E(X) = \int_{-\infty}^\infty x\, f_X(x) \,dx" height="38" /></p>
<p>where f<sub>X</sub> is the probability density function of X. I&#8217;ll call this the <strong>analytical</strong> definition.</p>
<p>In a more advanced class the expected value of X is defined as</p>
<p style="text-align: center"><img border="0" src="http://www.johndcook.com/expect2.gif" alt="E(X) = \int_\Omega X \,dP" height="38" /></p>
<p>where (Ω, P) is a probability space. I&#8217;ll call this the <strong>measure-theoretic</strong> definition. It&#8217;s not obvious that these two definitions are equivalent. They may even seem contradictory unless you look closely: they&#8217;re integrating different functions over different spaces.</p>
<p>If for some odd reason you learned the measure-theoretic definition first, you could see the analytical definition as a theorem. But if, like most people, you learn the analytical definition first, the measure-theoretic version is quite mysterious. When you take an advanced course and look at the details previously swept under the rug, probability looks like an entirely different subject, unrelated to your introductory course. The definition of expectation is just one concept among many that takes some work to resolve.</p>
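<p>One standard way to connect the two definitions, sketched here under the assumption that X has a density, is a change of variables. The symbol &mu;<sub>X</sub> is introduced only for this sketch: it denotes the distribution of X, defined by &mu;<sub>X</sub>(B) = P(X &isin; B) for sets B of real numbers. This may or may not be the route taken in the notes linked at the end of this post.</p>

```latex
E(X) = \int_\Omega X \, dP
     = \int_{-\infty}^{\infty} x \, d\mu_X(x)
     = \int_{-\infty}^{\infty} x \, f_X(x) \, dx
```

<p>The first equality is the measure-theoretic definition, the second moves the integral from &Omega; to the real line, and the third holds when &mu;<sub>X</sub> has density f<sub>X</sub> with respect to Lebesgue measure.</p>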
<p>I&#8217;ve written a couple pages of notes that bridge the gap between the two <a href="http://johndcook.com/RelatingTwoDefinitionsOfExpectation.pdf">definitions of expectation</a> and show that they are equivalent.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/27/two-definitions-of-expectation/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Why computer scientists count from zero</title>
		<link>http://www.johndcook.com/blog/2008/06/26/why-computer-scientists-count-from-zero/</link>
		<comments>http://www.johndcook.com/blog/2008/06/26/why-computer-scientists-count-from-zero/#comments</comments>
		<pubDate>Thu, 26 Jun 2008 12:35:29 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Software development]]></category>

		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/26/why-computer-scientists-count-from-zero/</guid>
		<description><![CDATA[In my previous post, cohort assignments in clinical trials, I mentioned in passing how you could calculate cohort numbers from accrual numbers if the world were simpler than it really is.
Suppose you want to treat patients in groups of 3. If you count patients and cohorts starting from 1, then patients 1, 2, and 3 are [...]]]></description>
			<content:encoded><![CDATA[<p>In my previous post, <a href="http://www.johndcook.com/blog/2008/06/26/cohort-assignments-in-clinical-trials/">cohort assignments in clinical trials</a>, I mentioned in passing how you could calculate cohort numbers from accrual numbers if the world were simpler than it really is.</p>
<p>Suppose you want to treat patients in groups of 3. If you count patients and cohorts starting from 1, then patients 1, 2, and 3 are in cohort 1. Patients 4, 5, and 6 are in cohort 2. Patients 7, 8, and 9 are in cohort 3, etc. In general patient n is in cohort 1 + ⌊(n-1)/3⌋.</p>
<p>If you start counting patients and cohorts from 0, then patients 0, 1, and 2 are in cohort 0. Patients 3, 4, and 5 are in cohort 1. Patients 6, 7, and 8 are in cohort 2, etc. In general patient n is in cohort ⌊n/3⌋.</p>
<p>These kinds of calculations, common in computer science, are often simpler when you start counting from 0. If you want to divide things (patients, memory locations, etc.) into groups of size k, the nth item is in group ⌊n/k⌋. In C notation, integer division truncates to an integer and so the expression is even simpler: <code>n/k</code>.</p>
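<p>The two counting conventions can be compared side by side in a short Python sketch. The function names are hypothetical, and the code assumes the idealized situation in which every patient stays in the cohort they were assigned to. Python&#8217;s <code>//</code> operator floors rather than truncates, which gives the same result as C integer division for the non-negative numbers used here.</p>

```python
def cohort_zero_based(n, k=3):
    """Cohort index for patient n when patients and cohorts both count from 0."""
    return n // k

def cohort_one_based(n, k=3):
    """Cohort number for patient n when patients and cohorts both count from 1."""
    return 1 + (n - 1) // k

# Patients 0..8 counted from zero, then patients 1..9 counted from one.
print([cohort_zero_based(n) for n in range(9)])      # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print([cohort_one_based(n) for n in range(1, 10)])   # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```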
<p>Counting centuries is confusing because we count from 1. That&#8217;s why the 1900&#8217;s were the 20th century, and so on. If we called the century immediately following the birth of Christ the 0th century, then the 1900&#8217;s would be the 19th century.</p>
<p>Because computer scientists usually count from 0, most programming languages also count from zero. Fortran and Visual Basic are notable exceptions.</p>
<p>The vast majority of humanity finds counting from 0 unnatural and so there is a conflict between how software producers and consumers count. Demanding that average users learn to count from zero is absurd. So the programmer must either use one-based counting internally, and risk confusing his peers, or use zero-based counting internally, and risk forgetting to do a conversion for input or output. I prefer the latter. The worst option is to vacillate between the two approaches.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/26/why-computer-scientists-count-from-zero/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Cohort assignments in clinical trials</title>
		<link>http://www.johndcook.com/blog/2008/06/26/cohort-assignments-in-clinical-trials/</link>
		<comments>http://www.johndcook.com/blog/2008/06/26/cohort-assignments-in-clinical-trials/#comments</comments>
		<pubDate>Thu, 26 Jun 2008 11:41:43 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Clinical trials]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/26/cohort-assignments-in-clinical-trials/</guid>
		<description><![CDATA[Cohorts are very simple in theory but messy in practice. In a clinical trial, a cohort is a group of patients who receive the same treatment. For example, in dose-finding trials, it is very common to treat patients in groups of three. I&#8217;ll stick with cohorts of three just to be concrete, though nothing here depends [...]]]></description>
			<content:encoded><![CDATA[<p>Cohorts are very simple in theory but messy in practice. In a clinical trial, a cohort is a group of patients who receive the same treatment. For example, in dose-finding trials, it is very common to treat patients in groups of three. I&#8217;ll stick with cohorts of three just to be concrete, though nothing here depends particularly on this choice of cohort size.</p>
<p>If we number patients in the order in which they arrive, patients 1, 2, and 3 would be the first cohort. Patients 4, 5, and 6 would be the second cohort, etc. If it were always that simple, we could determine which cohort a patient belongs to based on their accrual number alone. To calculate a patient&#8217;s cohort number, subtract 1 from their accrual number, divide by 3, throw away any remainder, and add 1. In math symbols, the cohort number for patient #n would be 1 + ⌊(n-1)/3⌋. (See the <a href="http://www.johndcook.com/blog/2008/06/26/why-computer-scientists-count-from-zero/">next post</a>.)</p>
<p>Here&#8217;s an example of why that won&#8217;t work. Suppose you treat patients 1, 2, and 3, then discover that patient #2 was not eligible for the trial after all. (This happens regularly.) Now a 4th patient enters the trial. What cohort are they in? If patient #4 arrived <em>after</em> you discovered that patient #2 was ineligible, you could put patient #4 in the first cohort, essentially taking patient #2&#8217;s place. But if patient #4 arrived <em>before</em> you discovered that patient #2 was ineligible, then patient #4 would receive the treatment assigned to the second cohort; the first cohort would have a hole in it and only contain two patients. You could treat patient #5 with the treatment of the first cohort to try to patch the hole, but that&#8217;s more confusing. It gets even worse if you&#8217;re on to the third or fourth cohort before discovering a gap in the first cohort.</p>
<p>In addition to patients being removed from a trial due to ineligibility, patients can remove themselves from a trial at any time.</p>
<p>There are numerous other ways the naïve view of cohorts can fail. A doctor may decide to give the same treatment to only two consecutive patients, or to four consecutive patients, letting medical judgment override the dose assignment algorithm for a particular patient. A mistake could cause a patient to receive the dose intended for another cohort. Researchers may be unable to access the software needed to make the dose assignment for a new cohort and so they give a new patient the dose from the previous cohort.</p>
<p>Cohort assignments can become so tangled that it is simply not possible to look at an ordered list of patients and their treatments after the fact and determine how the patients were grouped into cohorts. Cohort assignment is to some extent a mental construct, an expression of how the researcher thought about the patients, rather than an objective grouping.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/26/cohort-assignments-in-clinical-trials/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Monitoring legacy code that fails silently</title>
		<link>http://www.johndcook.com/blog/2008/06/24/monitoring-legacy-code-that-fails-silently/</link>
		<comments>http://www.johndcook.com/blog/2008/06/24/monitoring-legacy-code-that-fails-silently/#comments</comments>
		<pubDate>Tue, 24 Jun 2008 20:36:17 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Computing]]></category>

		<category><![CDATA[PowerShell]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/24/monitoring-legacy-code-that-fails-silently/</guid>
		<description><![CDATA[Clift Norris and I just posted an article on CodeProject entitled Monitoring Unreliable Scheduled Tasks about some software Clift wrote to resolve problems we had calling some legacy software that would fail silently. His software adds from the outside monitoring and logging functions that better software would have provided on the inside.
The monitoring and logging software, called [...]]]></description>
			<content:encoded><![CDATA[<p>Clift Norris and I just posted an article on CodeProject entitled <a href="http://www.codeproject.com/KB/install/RunAndWait.aspx">Monitoring Unreliable Scheduled Tasks</a> about some software Clift wrote to resolve problems we had calling some legacy software that would fail silently. His software adds <em>from the outside</em> monitoring and logging functions that better software would have provided <em>on the inside</em>.</p>
<p>The monitoring and logging software, called <code>RunAndWait</code>, kicks off a child process and waits a specified amount of time for the process to complete. If the child does not complete in time, a list of people is notified by email. The software also checks return codes and writes all its activity to a log.</p>
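<p>The general pattern can be sketched in a few lines of Python. This is not the <code>RunAndWait</code> source. The function name <code>run_and_wait</code> and the <code>notify</code> callback, which stands in for both the email notification and the log, are assumptions made for illustration.</p>

```python
import subprocess
import sys


def run_and_wait(command, timeout_seconds, notify):
    """Run command, wait up to timeout_seconds, and report what happened.

    Returns "ok", "failed", or "timeout". The hypothetical notify callable
    stands in for sending email and writing to a log.
    """
    try:
        # subprocess.run kills the child and raises if the timeout expires.
        result = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        notify(f"{command!r} did not finish within {timeout_seconds} seconds")
        return "timeout"
    if result.returncode != 0:
        notify(f"{command!r} exited with return code {result.returncode}")
        return "failed"
    return "ok"


if __name__ == "__main__":
    # A trivial child process that succeeds immediately.
    status = run_and_wait([sys.executable, "-c", "pass"], 30, print)
    print(status)
```

<p>Passing the notifier in as a parameter keeps the sketch testable without a mail server. A real tool would presumably wire in email and a log file instead.</p>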
<p><code>RunAndWait</code> is a simple program, but it has proven very useful over the last year and a half since it was written. We use <code>RunAndWait</code> in combination with PowerShell for scheduling our nightly processes to interact with the legacy system. Since PowerShell has verbose error reporting, calling <code>RunAndWait</code> from PowerShell rather than from <code>cmd.exe</code> gives additional protection against possible silent failures.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/24/monitoring-legacy-code-that-fails-silently/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Fasting may reduce chemotherapy side-effects</title>
		<link>http://www.johndcook.com/blog/2008/06/24/fasting-may-reduce-chemotherapy-side-effects/</link>
		<comments>http://www.johndcook.com/blog/2008/06/24/fasting-may-reduce-chemotherapy-side-effects/#comments</comments>
		<pubDate>Tue, 24 Jun 2008 11:07:38 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Science]]></category>

		<category><![CDATA[Cancer]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/24/fasting-may-reduce-chemotherapy-side-effects/</guid>
		<description><![CDATA[Chemotherapy harms cancer cells as well as normal cells. Chemotherapy is designed to be more harmful to cancer cells than to normal cells, but the damage to normal cells can be brutal.
New studies suggest that fasting prior to receiving chemotherapy may reduce the number of normal cells harmed by the treatment. Fasting may put normal [...]]]></description>
			<content:encoded><![CDATA[<p>Chemotherapy harms cancer cells as well as normal cells. Chemotherapy is designed to be more harmful to cancer cells than to normal cells, but the damage to normal cells can be brutal.</p>
<p>New studies suggest that <a href="http://www.webmd.com/cancer/news/20080331/fasting_may_improve_cancer_chemotherapy">fasting prior to receiving chemotherapy</a> may reduce the number of normal cells harmed by the treatment. Fasting may put normal cells in a defensive mode that increases their resistance to chemical attack.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/24/fasting-may-reduce-chemotherapy-side-effects/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Saving energy by tolerating mistakes</title>
		<link>http://www.johndcook.com/blog/2008/06/23/saving-energy-by-tolerating-mistakes/</link>
		<comments>http://www.johndcook.com/blog/2008/06/23/saving-energy-by-tolerating-mistakes/#comments</comments>
		<pubDate>Mon, 23 Jun 2008 22:06:03 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/23/saving-energy-by-tolerating-mistakes/</guid>
		<description><![CDATA[Computer chips can use significantly less energy if they don&#8217;t have to be correct all the time. That&#8217;s the idea behind PCMOS — probabilistic complementary metal-oxide semiconductor technology. Here&#8217;s an excerpt from Technology Review&#8217;s article on PCMOS.
[Inventor Krishna] Palem&#8217;s idea is to lower the operating voltage of parts of a chip—specifically, the logic circuits that [...]]]></description>
			<content:encoded><![CDATA[<p>Computer chips can use significantly less energy if they don&#8217;t have to be correct all the time. That&#8217;s the idea behind PCMOS — probabilistic complementary metal-oxide semiconductor technology. Here&#8217;s an excerpt from <a href="http://www.technologyreview.com/read_article.aspx?ch=specialsections&amp;sc=emerging08&amp;id=20246&amp;a=">Technology Review&#8217;s article</a> on PCMOS.</p>
<blockquote><p>[Inventor Krishna] Palem&#8217;s idea is to lower the operating voltage of parts of a chip—specifically, the logic circuits that calculate the least significant bits, such as the <em>3</em> in the number <em>21,693</em>. The resulting decrease in signal-to-noise ratio means those circuits would occasionally arrive at the wrong answer, but engineers can calculate the probability of getting the right answer for any specific voltage. &#8220;Relaxing the probability of correctness even a little bit can produce significant savings in energy,&#8221; Palem says.</p></blockquote>
<p>In applications such as video processing, a small probability of error would not make a noticeable difference. It would be an interesting exercise to separate those parts of a system that require accuracy and those that tolerate error. For example, a cell phone might use high-accuracy chips for dialing phone numbers but low-accuracy chips for controlling the display in order to extend battery life.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/23/saving-energy-by-tolerating-mistakes/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Why functional programming hasn&#8217;t taken off</title>
		<link>http://www.johndcook.com/blog/2008/06/22/why-functional-programming-hasnt-taken-off/</link>
		<comments>http://www.johndcook.com/blog/2008/06/22/why-functional-programming-hasnt-taken-off/#comments</comments>
		<pubDate>Sun, 22 Jun 2008 23:00:27 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Software development]]></category>

		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/22/why-functional-programming-hasnt-taken-off/</guid>
		<description><![CDATA[Bjarne Stroustrup made a comment in an interview about functional programming. He said advocates of functional programming have been in charge of computer science departments for 30 years now, and yet functional programming has hardly been used outside academia. Maybe it&#8217;s because it&#8217;s not practical, at least in its purest form.
I&#8217;ve heard other people say [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.research.att.com/~bs/">Bjarne Stroustrup</a> made a comment in an interview about functional programming. He said advocates of functional programming have been in charge of computer science departments for 30 years now, and yet functional programming has hardly been used outside academia. <strong>Maybe it&#8217;s because it&#8217;s not practical</strong>, at least in its purest form.</p>
<p>I&#8217;ve heard other people say that functional programming is the way to go, but most programmers aren&#8217;t smart enough to work that way and it&#8217;s too hard for the ones who are smart enough to go against the herd. But there are too many brilliant maverick programmers out there to make such a condescending explanation plausible. Stroustrup&#8217;s explanation makes more sense.</p>
<p>Let me quickly address some objections.</p>
<ul>
<li>Yes, there have been very successful functional programming projects.</li>
<li>Yes, procedural programming languages are adding support for functional programming.</li>
<li>Yes, the rise of multi-core processors is driving the search for ways to make concurrent programming easier, and functional programming has a lot to offer.</li>
</ul>
<p>I fully expect <strong>there will be more functional programming in the future</strong>, but it will be part of a multi-paradigm approach. On the continuum between pure imperative programming and pure functional programming, development will move toward the functional end, but not all the way. A multi-paradigm approach could be a jumbled mess, but it doesn&#8217;t have to be. One could clearly delineate which parts of a code base are purely functional (say, because they need to run concurrently) and which are not (say, for efficiency). The problem of how to mix functional and procedural programming styles well seems interesting and tractable.</p>
<p>[Stroustrup&#8217;s remark came from an <a href="http://www.informit.com/podcasts/channel.aspx?c=DADF92CA-3BDC-484E-9CD8-CBFE0CFC0DE6">OnSoftware podcast</a>. I&#8217;ve listened to several of his podcasts with OnSoftware lately but I don&#8217;t remember which one contained his comment about functional programming.]</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/22/why-functional-programming-hasnt-taken-off/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Probability approximations</title>
		<link>http://www.johndcook.com/blog/2008/06/22/probability-approximations/</link>
		<comments>http://www.johndcook.com/blog/2008/06/22/probability-approximations/#comments</comments>
		<pubDate>Sun, 22 Jun 2008 22:33:20 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Math]]></category>

		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/22/probability-approximations/</guid>
		<description><![CDATA[When I took my first probability course, it seemed like there were an infinite number of approximation theorems to learn, all mysterious. Looking back, there were probably only two or three, and they don&#8217;t need to be mysterious.
For example, under the right circumstances you can approximate a Binomial(n, p) well with a Normal(np, np(1-p)). While [...]]]></description>
			<content:encoded><![CDATA[<p>When I took my first probability course, it seemed like there were an infinite number of approximation theorems to learn, all mysterious. Looking back, there were probably only two or three, and they don&#8217;t need to be mysterious.</p>
<p>For example, under the right circumstances you can approximate a Binomial(n, p) well with a Normal(np, np(1-p)). While the relationship between the parameters in these two distributions is obvious to the initiated, it&#8217;s not at all obvious to a beginner. It seems much clearer to say that a Binomial can be approximated by a Normal with the same mean and variance. After all, a distribution that doesn&#8217;t get the mean and variance correct doesn&#8217;t sound like a very promising approximation.</p>
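<p>Here is a small numerical check of the &#8220;same mean and variance&#8221; description, using only the Python standard library. The values n = 100, p = 0.5, and k = 55 are arbitrary choices for illustration, and the half-unit continuity correction is a standard refinement that the paragraph above does not mention.</p>

```python
import math


def binomial_cdf(k, n, p):
    """Exact probability that a Binomial(n, p) variable is at most k."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def normal_cdf(x, mean, variance):
    """Probability that a Normal(mean, variance) variable is at most x."""
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * variance)))


n, p, k = 100, 0.5, 55                    # arbitrary illustrative values
mean, variance = n * p, n * p * (1 - p)   # match the binomial's mean and variance

exact = binomial_cdf(k, n, p)
approx = normal_cdf(k + 0.5, mean, variance)   # + 0.5 is a continuity correction
print(exact, approx)
```

<p>For these values the two probabilities agree to within about one percentage point. The agreement degrades when n is small or p is close to 0 or 1.</p>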
<p>Taking it a step further, a good teacher could guide a class to discover this approximation themselves. This would take more time than simply stating the result and working an example or two, but the difference in understanding would be immense. And if you&#8217;re not going to take the time to aim for understanding, what&#8217;s the point in covering approximation theorems at all? They&#8217;re not used that often for computation anymore. In my opinion, the only reason to go over them is the insight they provide.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/22/probability-approximations/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Attention span by nationality</title>
		<link>http://www.johndcook.com/blog/2008/06/20/attention-span-by-nationality/</link>
		<comments>http://www.johndcook.com/blog/2008/06/20/attention-span-by-nationality/#comments</comments>
		<pubDate>Fri, 20 Jun 2008 11:33:55 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/20/attention-span-by-nationality/</guid>
		<description><![CDATA[The Code Wizard blog posted some anecdotal evidence of attention span varying as a function of nationality.
The author looked through the visitor statistics on his blog and observed that Americans spend less time per page than visitors from other countries. Visitors from Canada, Australia, and England spend far more time per page and click more links [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://codewiz51.blogspot.com/2008/06/americans-add-and-google-effect.html">The Code Wizard blog</a> posted some anecdotal evidence of attention span varying as a function of nationality.</p>
<p>The author looked through the visitor statistics on his blog and observed that Americans spend less time per page than visitors from other countries. Visitors from Canada, Australia, and England spend far more time per page and click more links while they&#8217;re there. I&#8217;m not aware of anything in the content of the blog that would be intrinsically more interesting to folks outside the US.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/20/attention-span-by-nationality/feed/</wfw:commentRss>
		</item>
		<item>
		<title>What makes the Mentos-Diet Coke trick work</title>
		<link>http://www.johndcook.com/blog/2008/06/19/what-makes-the-mentos-diet-coke-trick-work/</link>
		<comments>http://www.johndcook.com/blog/2008/06/19/what-makes-the-mentos-diet-coke-trick-work/#comments</comments>
		<pubDate>Thu, 19 Jun 2008 16:37:41 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/19/what-makes-the-mentos-diet-coke-trick-work/</guid>
		<description><![CDATA[The American Journal of Physics has an article in the June issue about the physics of dropping Mentos into Diet Coke. The spectacular result depends on physical characteristics of the Mentos, not their chemical composition. Here&#8217;s an explanation from the 60-Second Science podcast.
]]></description>
			<content:encoded><![CDATA[<p>The American Journal of Physics has an article in the June issue about the physics of dropping Mentos into Diet Coke. The spectacular result depends on physical characteristics of the Mentos, not their chemical composition. Here&#8217;s an explanation from the <a href="http://www.sciam.com/podcast/episode.cfm?id=A0F42D0D-E721-EABE-745EA5C44B66F777">60-Second Science</a> podcast.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/19/what-makes-the-mentos-diet-coke-trick-work/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Bugs in food and software</title>
		<link>http://www.johndcook.com/blog/2008/06/19/bugs-in-food-and-software/</link>
		<comments>http://www.johndcook.com/blog/2008/06/19/bugs-in-food-and-software/#comments</comments>
		<pubDate>Thu, 19 Jun 2008 14:20:29 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Software development]]></category>

		<category><![CDATA[Statistics]]></category>

		<category><![CDATA[Quality]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/19/bugs-in-food-and-software/</guid>
		<description><![CDATA[What is an acceptable probability of finding bug parts in a box of cereal? You can&#8217;t say zero. As the acceptable probability goes to zero, the price of a box of cereal goes to infinity. In practice, the FDA sets very small but non-zero limits on the probability of finding bug parts in food. This is [...]]]></description>
			<content:encoded><![CDATA[<p>What is an acceptable probability of finding bug parts in a box of cereal? You can&#8217;t say zero. As the acceptable probability goes to zero, the price of a box of cereal goes to infinity. In practice, the FDA sets very small but non-zero limits on the probability of finding bug parts in food. This is unsettling at first, but there&#8217;s no rational way around it.</p>
<p>What is an acceptable probability of finding bugs in your software? Again, you can&#8217;t say zero. The cost increases without bound as the quality requirements increase. In my previous post, I wrote about the extraordinary quality procedures for writing <a href="http://www.johndcook.com/blog/2008/06/17/software-in-space/">software for space probes</a>. And yet even these projects have to tolerate some non-zero probability of error. It&#8217;s not worthwhile to spend 10 billion dollars to prevent a bug in a billion dollar mission.</p>
<p>Bugs are a fact of life. We can insist that they are unacceptable or we can pretend they don&#8217;t exist, but neither approach is constructive. It&#8217;s better to focus on the <strong>probability</strong> of running into bugs and <strong>consequences</strong> of running into bugs.</p>
<p>Not all bugs have the same consequences. It&#8217;s distasteful to find a piece of a roach leg in your can of green beans, but it&#8217;s not the end of the world. Toxic microscopic bugs are more serious. Along the same lines, a software bug that causes incorrect hyphenation is hardly the same as a bug that causes a plane crash. To get an idea of the potential economic cost of running into a bug, and therefore the resources worthwhile to detect and fix it, multiply the probability by the consequences.</p>
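<p>A rough sketch of that multiplication, reusing the dollar figures from the space probe example above. The probabilities are hypothetical, and treating the whole mission cost as the consequence of a bug is a simplifying assumption.</p>

```python
def expected_cost(probability, consequence):
    """Expected cost of a bug: probability of hitting it times its cost."""
    return probability * consequence


mission_cost = 1e9        # the billion dollar mission mentioned above
prevention_cost = 10e9    # the 10 billion dollars mentioned above

# Even a bug that is certain to occur cannot cost more than the mission
# itself, so the expected loss never justifies the prevention budget here.
for probability in (0.001, 0.1, 1.0):   # hypothetical probabilities
    loss = expected_cost(probability, mission_cost)
    print(probability, loss, prevention_cost > loss)
```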
<p>How do you estimate the probabilities of software bugs? The same way you estimate the probability of bugs in food: by conducting experiments and analyzing data. Some people find this very hard to accept. They understand that testing is necessary in the physical world, but they think software is entirely different and must be proven correct in some mathematical sense. They object that computer programs are complex systems, too complex to test. Computer programs are complex, but human bodies are far more complex, and yet we conduct tests on human subjects all the time to estimate different probabilities, such as the probabilities of drug toxicity.</p>
<p>Another objection to software testing is that it can only test paths through the software that are actually taken, not all potential paths. That&#8217;s true, but the most important data when estimating the probability of running into a bug is data from people using the software under normal conditions. A bug that you never run into has no consequences.</p>
<p>But what about people using software in unanticipated ways? I certainly find it frustrating when I uncover bugs when I use a program in an atypical way. But this is not terribly different from physical systems. Bridges may fail when they&#8217;re subject to loads they weren&#8217;t designed for. There is a difference, however. Most software is designed to permit far more uses than can be tested, whereas there&#8217;s less of a gap in physical systems between what is permissible and what is testable. Unit testing helps. If every component of a software system works correctly in isolation, it is more likely, though not certain, that the components will work correctly together in a new situation. Still, there&#8217;s no getting around the fact that the best tested uses are the most likely to succeed.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/19/bugs-in-food-and-software/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Software in Space</title>
		<link>http://www.johndcook.com/blog/2008/06/17/software-in-space/</link>
		<comments>http://www.johndcook.com/blog/2008/06/17/software-in-space/#comments</comments>
		<pubDate>Wed, 18 Jun 2008 04:23:46 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Software development]]></category>

		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/17/software-in-space/</guid>
		<description><![CDATA[The latest episode of Software Engineering Radio has an interview with Hans-Joachim Popp of the German aerospace company DLR. A bug in the software embedded in a space probe could cost years of lost time and billions of dollars. These folks have to write solid code.
The interview gives some details of the unusual practices DLR uses [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://se-radio.net/podcast/2008-06/episode-100-software-space">latest episode</a> of Software Engineering Radio has an interview with Hans-Joachim Popp of the German aerospace company <a href="http://www.dlr.de/">DLR</a>. A bug in the software embedded in a space probe could cost years of lost time and billions of dollars. These folks have to write solid code.</p>
<p>The interview gives some details of the unusual practices DLR uses to produce such high quality code. For one, Popp said that his company writes an average of 12 lines of test code for every line of production code. They also pair junior and senior developers. The <em>junior</em> developer writes all the code, and the senior developer picks it apart.</p>
<p>Such extreme attention to quality doesn&#8217;t come cheap. Popp said that they produce about 0.6 lines of (production?) code per hour.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/17/software-in-space/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Small effective sample size does not mean uninformative</title>
		<link>http://www.johndcook.com/blog/2008/06/17/small-effective-sample-size-does-not-mean-uninformative/</link>
		<comments>http://www.johndcook.com/blog/2008/06/17/small-effective-sample-size-does-not-mean-uninformative/#comments</comments>
		<pubDate>Wed, 18 Jun 2008 04:07:28 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Clinical trials]]></category>

		<category><![CDATA[Bayesian]]></category>

		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/17/small-effective-sample-size-does-not-mean-uninformative/</guid>
		<description><![CDATA[Today I talked to a doctor about the design of a randomized clinical trial that would use a Bayesian monitoring rule. The probability of response on each arm would be modeled as a binomial with a beta prior. Simple conjugate model. The historical response rate in this disease is only 5%, and so the doctor [...]]]></description>
			<content:encoded><![CDATA[<p>Today I talked to a doctor about the design of a randomized clinical trial that would use a Bayesian monitoring rule. The probability of response on each arm would be modeled as a binomial with a beta prior. Simple conjugate model. The historical response rate in this disease is only 5%, and so the doctor had chosen a beta(0.1, 1.9) prior so that the prior mean matched the historical response rate.</p>
<p>For beta distributions, the sum of the two parameters is called the effective sample size. There is a simple and natural explanation for why a beta(a, b) distribution is said to contain as much information as a+b data observations. By this criterion, the beta(0.1, 1.9) distribution is not very informative: it only has as much influence as two observations. However, viewed in another light, a beta(0.1, 1.9) distribution is highly informative.</p>
<p>This trial was designed to stop when the posterior probability is more than  0.999 that one treatment is more effective than the other. That&#8217;s an unusually high standard of evidence for stopping a trial — a cutoff of 0.99 or smaller would be much more common — and yet a trial could stop after only six patients. If X is the probability of response on one arm and Y is the probability of response on the other, after three failures on the first treatment and three successes on the other, Pr(Y &gt; X) &gt; 0.999.</p>
<p>The explanation for how the trial could stop so early is that the prior distribution is, in an odd sense, highly informative. The trial starts with a strong assumption that each treatment is ineffective. This assumption is somewhat justified by experience, and yet a beta(0.1, 1.9) distribution doesn&#8217;t fully capture the investigator&#8217;s prior belief.</p>
<p>(Once at least one response has been observed, the beta(0.1, 1.9) prior becomes essentially uninformative. But until then, in this context, the prior is informative.)</p>
<p>A problem with a beta prior is that there is no way to specify the mean at 0.05 without also placing a large proportion of the probability mass below 0.05. The beta prior places too little probability on better outcomes that might reasonably happen. I imagine a more diffuse prior with <em>mode</em> 0.05 rather than mean 0.05 would better describe the prior beliefs regarding the treatments.</p>
<p>The beta prior is convenient because Bayes&#8217; theorem takes a very simple form in this case: starting from a beta(a, b) prior and observing s successes and f failures, the posterior distribution is beta(a+s, b+f). But a prior less convenient algebraically could be more <a href="http://www.johndcook.com/blog/2008/06/08/robust-priors/">robust</a> and better at representing prior information.</p>
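<p>The conjugate update is simple enough to sketch directly. The helper names <code>beta_update</code> and <code>beta_mean</code> are made up for illustration. The numbers are the beta(0.1, 1.9) prior and the six-patient scenario described earlier in this post.</p>

```python
def beta_update(a, b, successes, failures):
    """Posterior beta parameters after observing binomial data."""
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a beta(a, b) distribution."""
    return a / (a + b)


prior = (0.1, 1.9)
print(beta_mean(*prior))   # prior mean, approximately 0.05
print(sum(prior))          # effective sample size, approximately 2

# The six-patient scenario from the text: three failures on one arm,
# three successes on the other.
x_posterior = beta_update(*prior, successes=0, failures=3)   # about (0.1, 4.9)
y_posterior = beta_update(*prior, successes=3, failures=0)   # about (3.1, 1.9)
print(beta_mean(*x_posterior), beta_mean(*y_posterior))
```

<p>The posterior means come out near 0.02 and 0.62. Computing Pr(Y &gt; X) itself would require integrating over both posteriors, which this sketch does not attempt.</p>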
<p>A more basic observation is that &#8220;informative&#8221; and &#8220;uninformative&#8221; depend on context. This is part of what motivated Jeffreys to look for prior distributions that were equally (un)informative under a set of transformations. But Jeffreys&#8217; approach isn&#8217;t the final answer. As far as I know, there&#8217;s no universally acceptable resolution to this <a href="http://www.johndcook.com/blog/2008/04/22/problems-versus-dilemmas/">dilemma</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/17/small-effective-sample-size-does-not-mean-uninformative/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Managing passwords II</title>
		<link>http://www.johndcook.com/blog/2008/06/16/managing-passwords-ii/</link>
		<comments>http://www.johndcook.com/blog/2008/06/16/managing-passwords-ii/#comments</comments>
		<pubDate>Tue, 17 Jun 2008 01:48:54 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Computing]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/16/managing-passwords-ii/</guid>
		<description><![CDATA[PasswordMaker is a clever solution to the problem of managing passwords. Instead of storing passwords for each web site, you use their software to generate a unique password for each site. The idea is quite simple: use a master password and a one-way hash function to turn the URL of a site into the password [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://passwordmaker.org/">PasswordMaker</a> is a clever solution to the problem of <a href="http://www.johndcook.com/blog/2008/05/23/managing-passwords/">managing passwords</a>. Instead of <strong>storing</strong> passwords for each web site, you use their software to <strong>generate</strong> a unique password for each site. The idea is quite simple: use a master password and a one-way hash function to turn the URL of a site into the password for that site. For each site, this generates the same password each time. Each site has a different password, but you only have one password to remember.</p>
<p>You don&#8217;t have to use the URL. You can come up with any string you want to identify a context where you need a password. But the URL is a natural choice.</p>
<p>The software comes in many variations: browser-based, command line, etc.</p>
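<p>The idea of hashing a master password together with a site identifier can be sketched in a few lines of Python. This is only an illustration of the general technique, not the algorithm PasswordMaker actually uses: the function name <code>site_password</code>, the choice of HMAC-SHA256, the Base64 encoding, and the default length of 12 are all assumptions made for the example.</p>

```python
import base64
import hashlib
import hmac


def site_password(master_password: str, site_id: str, length: int = 12) -> str:
    """Derive a repeatable password for one site from a master password.

    Hypothetical helper for illustration; not PasswordMaker's real scheme.
    """
    # HMAC-SHA256 is a keyed one-way hash: the master password is the key
    # and the site identifier (for example, its URL) is the message.
    digest = hmac.new(
        master_password.encode("utf-8"),
        site_id.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    # Base64 turns the raw bytes into printable characters; keep only the
    # first `length` of them.
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


# The same inputs always give the same password; a different site gives a
# different one.
print(site_password("my master password", "http://example.com"))
print(site_password("my master password", "http://example.org"))
```

<p>Nothing is stored here. Anyone who knows the master password and the site string can regenerate the site password, and the one-way hash is what keeps a leaked site password from revealing the master password.</p>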
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/16/managing-passwords-ii/feed/</wfw:commentRss>
		</item>
		<item>
		<title>C++ templates may reduce memory footprint</title>
		<link>http://www.johndcook.com/blog/2008/06/16/c-templates-may-reduce-memory-footprint/</link>
		<comments>http://www.johndcook.com/blog/2008/06/16/c-templates-may-reduce-memory-footprint/#comments</comments>
		<pubDate>Mon, 16 Jun 2008 20:18:55 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Software development]]></category>

		<category><![CDATA[C++]]></category>

		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/16/c-templates-may-reduce-memory-footprint/</guid>
		<description><![CDATA[One of the complaints about C++ templates is that they can cause code bloat. But Scott Meyers pointed out in an interview that some people are using templates in embedded systems applications because templates result in smaller code.
C++ compilers only generate code for template methods that are actually used in an application, so it&#8217;s possible [...]]]></description>
			<content:encoded><![CDATA[<p>One of the complaints about C++ templates is that they can cause code bloat. But Scott Meyers pointed out in an <a href="http://www.informit.com/podcasts/episode.aspx?e=2214a9d9-aacc-4f1f-a946-e9b6c94fa451">interview</a> that some people are using templates in embedded systems applications because templates result in <em>smaller</em> code.</p>
<p>C++ compilers only generate code for template methods that are actually used in an application, so it&#8217;s possible that code using templates may result in a smaller executable than code that uses a more traditional object-oriented approach.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/16/c-templates-may-reduce-memory-footprint/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Greek letters and math symbols in (X)HTML</title>
		<link>http://www.johndcook.com/blog/2008/06/14/greek-letters-and-math-symbols-in-xhtml/</link>
		<comments>http://www.johndcook.com/blog/2008/06/14/greek-letters-and-math-symbols-in-xhtml/#comments</comments>
		<pubDate>Sat, 14 Jun 2008 22:19:37 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<category><![CDATA[Typography]]></category>

		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/14/greek-letters-and-math-symbols-in-xhtml/</guid>
		<description><![CDATA[It&#8217;s not hard to use Greek letters and math symbols in (X)HTML, but apparently it&#8217;s not common knowledge either. Many pages insert little image files every time they need a special character. Such web pages look a little like ransom notes with letters cut from multiple sources.  Sometimes this is necessary but often it can [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s not hard to use Greek letters and math symbols in (X)HTML, but apparently it&#8217;s not common knowledge either. Many pages insert little image files every time they need a special character. Such web pages look a little like ransom notes with letters cut from multiple sources.  Sometimes this is necessary but often it can be avoided.</p>
<p>I&#8217;ve posted a couple pages on using <a href="http://www.johndcook.com/greek_letters.html">Greek letters</a> and <a href="http://www.johndcook.com/math_symbols.html">math symbols</a> in HTML, XML, XHTML, TeX, and Unicode. I included TeX because it&#8217;s the <em>lingua franca</em> for math typography, and I included <a href="http://www.johndcook.com/unicode.html">Unicode</a> because the X(HT)ML representation of symbols is closely related to Unicode.</p>
<p>The notes give charts for encoding Greek letters and some of the most common math symbols. They explain how HTML and XHTML differ in this context and also discuss browser compatibility issues.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/14/greek-letters-and-math-symbols-in-xhtml/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Solving the problem with Visual Source Safe and time zones</title>
		<link>http://www.johndcook.com/blog/2008/06/13/solving-the-problem-with-visual-source-safe-and-time-zones/</link>
		<comments>http://www.johndcook.com/blog/2008/06/13/solving-the-problem-with-visual-source-safe-and-time-zones/#comments</comments>
		<pubDate>Fri, 13 Jun 2008 12:39:03 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Software development]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/13/solving-the-problem-with-visual-source-safe-and-time-zones/</guid>
		<description><![CDATA[If you use Microsoft Visual SourceSafe (VSS) with developers in more than one time zone, you may be in for an unpleasant surprise.
VSS uses the local time on each developer’s box as the time of a check in/out. If every developer’s time is set by the same reference, say an NTP server, then no problems [...]]]></description>
			<content:encoded><![CDATA[<p>If you use Microsoft Visual SourceSafe (VSS) with developers in more than one time zone, you may be in for an unpleasant surprise.</p>
<p>VSS uses the local time on each developer’s box as the time of a check in/out. If every developer’s time is set by the same reference, say an NTP server, then no problems will result. But what if one developer is in a different time zone? Say one developer is in Boston and one in Houston. The Boston developer checks in a file at 2:00 PM Eastern time, then 10 minutes later the Houston developer checks out the file, quickly makes a change, and checks the file back in at 2:20 Eastern time, 1:20 Central local time. VSS now says the latest version of the file is the one made at 2:00 Eastern time. When the Houston developer looks at VSS, the Boston developer’s changes were made 40 minutes into the future!</p>
<p>This has been a problem with VSS for all versions prior to VSS 2005, and is <strong>still a problem in the most recent version by default</strong>. Starting with the 2005 version, you can configure VSS to use “server local time.” This means all transactions will use the time on the server where the VSS repository is located. The time is stored internally as UTC (GMT) but displayed to each user according to his own time zone. In the example above, the server would record the Boston check-in as 7:00 PM UTC and the Houston check-in as 7:20 PM UTC. The Boston user would see the check-ins as happening at 2:00 and 2:20 Eastern time, and the Houston user would see the check-ins as happening at 1:00 and 1:20 Central time. Importantly, everyone agrees which check-in occurred first.</p>
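<p>The arithmetic in the Boston and Houston example can be checked with a short Python sketch. It has nothing to do with how VSS is implemented. The fixed offsets of five and six hours behind UTC (standard time), the calendar date, and the helper name <code>shown_to</code> are assumptions chosen only to reproduce the times quoted in the example.</p>

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets (standard time, no daylight saving adjustment).
EASTERN = timezone(timedelta(hours=-5), "Eastern")
CENTRAL = timezone(timedelta(hours=-6), "Central")

# Client local time: each check-in carries only the wall-clock reading of the
# developer's own machine, so the two readings are not comparable.
naive_boston = datetime(2008, 1, 15, 14, 0)    # 2:00 PM in Boston
naive_houston = datetime(2008, 1, 15, 13, 20)  # 1:20 PM in Houston
print("Boston looks newer:", naive_boston > naive_houston)

# Server-style time: convert every check-in to UTC before storing it.
boston_checkin = naive_boston.replace(tzinfo=EASTERN).astimezone(timezone.utc)
houston_checkin = naive_houston.replace(tzinfo=CENTRAL).astimezone(timezone.utc)
print("Houston is newer:", houston_checkin > boston_checkin)


def shown_to(user_zone: timezone, stamp: datetime) -> str:
    """Format a stored UTC timestamp in the viewing user's own zone."""
    return stamp.astimezone(user_zone).strftime("%H:%M")


# Each user sees different clock readings but the same ordering.
print(shown_to(EASTERN, boston_checkin), shown_to(EASTERN, houston_checkin))
print(shown_to(CENTRAL, boston_checkin), shown_to(CENTRAL, houston_checkin))
```

<p>Comparing the raw wall-clock readings puts the Boston check-in last, which is the bug described above. Comparing the UTC values puts the Houston check-in last, and converting back to each user&#8217;s zone changes only how the times are displayed.</p>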
<p>A more subtle version of the problem can occur even if all users are in the same time zone but have not synchronized their clocks. This is a good reason to use server local time even if everyone works in the same city.</p>
<p>Although it is possible to set server local time in VSS 2005, it still uses client local time by default, presumably for backward compatibility. You have to turn on server local time by opening the VSS Administrator tool and clicking on Tools/Options and going to the Time Zone tab.</p>
<p><img border="0" width="408" src="http://www.johndcook.com/VSSOptions.png" alt="SourceSafe Options dialog for time zones" height="419" /></p>
<p>Microsoft has written about this problem at <a href="http://support.microsoft.com/kb/931804">http://support.microsoft.com/kb/931804</a>. Note that the solution only applies to Visual SourceSafe 2005 and later.</p>
<p>When you set the time zone, you will get dire warnings discouraging you from doing so.</p>
<blockquote><p>To avoid unintended data loss, do not change the time zone for a Visual SourceSafe database after it has been established and is being used.</p></blockquote>
<p>You may have to let VSS sit idle for as many hours as your developers’ time zones span to let everything synchronize.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/13/solving-the-problem-with-visual-source-safe-and-time-zones/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Identical twins are not genetically identical</title>
		<link>http://www.johndcook.com/blog/2008/06/12/identical-twins-are-not-genetically-identical/</link>
		<comments>http://www.johndcook.com/blog/2008/06/12/identical-twins-are-not-genetically-identical/#comments</comments>
		<pubDate>Thu, 12 Jun 2008 10:54:59 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Science]]></category>

		<category><![CDATA[Genetics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/12/identical-twins-are-not-genetically-identical/</guid>
		<description><![CDATA[Researchers recently discovered that identical twins are not genetically identical after all. They differ in the copy numbers of their genes. They have the same genes, but each may have different numbers of copies of certain genes.
Source: &#8220;Copy That&#8221; by Charles Q. Choi, Scientific American, May 2008.
]]></description>
			<content:encoded><![CDATA[<p>Researchers recently discovered that identical twins are not genetically identical after all. They differ in the copy numbers of their genes. They have the same genes, but each may have different numbers of copies of certain genes.</p>
<p>Source: &#8220;Copy That&#8221; by Charles Q. Choi, Scientific American, May 2008.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/12/identical-twins-are-not-genetically-identical/feed/</wfw:commentRss>
		</item>
		<item>
		<title>You have a web site?</title>
		<link>http://www.johndcook.com/blog/2008/06/11/you-have-a-web-site/</link>
		<comments>http://www.johndcook.com/blog/2008/06/11/you-have-a-web-site/#comments</comments>
		<pubDate>Wed, 11 Jun 2008 14:24:45 +0000</pubDate>
		<dc:creator>John</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/2008/06/11/you-have-a-web-site/</guid>
		<description><![CDATA[I was talking to my wife about my web site last night. One of my daughters interrupted with &#8220;You have a web site?!&#8221; Then one of her sisters put things in perspective. &#8220;Yeah, but it doesn&#8217;t have any games.&#8221;
]]></description>
			<content:encoded><![CDATA[<p>I was talking to my wife about my web site last night. One of my daughters interrupted with &#8220;You have a web site?!&#8221; Then one of her sisters put things in perspective. &#8220;Yeah, but it doesn&#8217;t have any games.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2008/06/11/you-have-a-web-site/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
