<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Endeavour &#187; Statistics</title>
	<atom:link href="http://www.johndcook.com/blog/category/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.johndcook.com/blog</link>
	<description>The blog of John D. Cook</description>
	<lastBuildDate>Fri, 10 Feb 2012 23:03:26 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The universal solvent of statistics</title>
		<link>http://www.johndcook.com/blog/2012/02/01/the-universal-solvent-of-statistics/</link>
		<comments>http://www.johndcook.com/blog/2012/02/01/the-universal-solvent-of-statistics/#comments</comments>
		<pubDate>Wed, 01 Feb 2012 16:02:21 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Bayesian]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=10594</guid>
		<description><![CDATA[Andrew Gelman just posted an interesting article on the philosophy of Bayesian statistics. Here&#8217;s my favorite passage.
This reminds me of a standard question that Don Rubin … asks in virtually any  situation:  “What would you do if you had all the data?”  For me, that  “what would you do” question is [...]]]></description>
			<content:encoded><![CDATA[<p>Andrew Gelman just posted an interesting article on the <a href="http://andrewgelman.com/2012/02/philosophy-of-bayesian-statistics-my-reactions-to-cox-and-mayo/">philosophy of Bayesian statistics</a>. Here&#8217;s my favorite passage.</p>
<blockquote><p>This reminds me of a standard question that Don Rubin … asks in virtually any  situation:  “<strong>What would you do if you had all the data</strong>?”  For me, that  “what would you do” question is one of <strong>the universal solvents of  statistics</strong>.</p></blockquote>
<p>Emphasis added.</p>
<p>I had not heard Don Rubin&#8217;s question before, but I think I&#8217;ll be asking it often. It reminds me of Alice&#8217;s famous dialog with the Cheshire Cat:</p>
<blockquote><p>&#8220;Would you tell me, please, which way I ought to go from here?&#8221;</p>
<p>&#8220;That depends a good deal on where you want to get to,&#8221; said the Cat.</p>
<p>&#8220;I don&#8217;t much care where&#8211;&#8221; said Alice.</p>
<p>&#8220;Then it doesn&#8217;t matter which way you go,&#8221; said the Cat.</p></blockquote>
<p><img class="alignnone" src="http://www.johndcook.com/cheshire_cat.png" alt="Cheshire Cat" width="200" height="200" /></p>
<p><strong>Related post</strong>: <a href="http://www.johndcook.com/blog/2008/01/14/irrelevant-uncertainty/">Irrelevant uncertainty</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2012/02/01/the-universal-solvent-of-statistics/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Interpreting statistics</title>
		<link>http://www.johndcook.com/blog/2012/01/13/interpreting-statistics/</link>
		<comments>http://www.johndcook.com/blog/2012/01/13/interpreting-statistics/#comments</comments>
		<pubDate>Fri, 13 Jan 2012 14:41:24 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Bayesian]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=10450</guid>
		<description><![CDATA[From Matt Briggs:
I challenge you to find me in any published statistical  analysis, outside of an introductory textbook, a confidence interval  given the correct interpretation.  If you can find even one instance  where the [frequentist] confidence interval is not interpreted as a [Bayesian] credible interval,  then I will eat your [...]]]></description>
			<content:encoded><![CDATA[<p>From <a href="http://wmbriggs.com/blog/?p=5062">Matt Briggs</a>:</p>
<blockquote><p>I challenge you to find me in <em>any</em> published statistical  analysis, outside of an introductory textbook, a confidence interval  given the correct interpretation.  If you can find even one instance  where the [frequentist] confidence interval is not interpreted as a [Bayesian] credible interval,  then I will eat your hat.</p></blockquote>
<p>Most statistical analysis is carried out by people who do not interpret their results correctly. They carry out frequentist procedures and then give the results a Bayesian interpretation. This is not simply a violation of an academic taboo. It means that people generally underestimate the uncertainty in their conclusions.</p>
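<p>A short simulation makes the distinction concrete. In the frequentist reading, "95%" describes the procedure: across many repeated experiments, about 95% of the intervals it produces contain the true value. It is not the probability that one particular computed interval contains the parameter; that is the credible-interval reading. Below is a minimal Python sketch, assuming normally distributed data with a known standard deviation. The function names and parameter values are hypothetical and chosen only for illustration.</p>

```python
import math
import random

def z_interval(sample, sigma, z=1.96):
    """95% confidence interval for the mean, assuming a known sigma."""
    n = len(sample)
    center = sum(sample) / n
    half_width = z * sigma / math.sqrt(n)
    return center - half_width, center + half_width

def coverage(true_mean=10.0, sigma=2.0, n=25, trials=10_000, seed=1):
    """Repeat the experiment many times and return the fraction of
    intervals that contain the (normally unknown) true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma)
        if lo <= true_mean <= hi:
            hits += 1
    return hits / trials

print(coverage())  # a long-run frequency close to 0.95
```

<p>Each individual interval either contains the true mean or it does not. The 95% only appears as a frequency over many repetitions of the experiment.</p>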
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2008/02/07/most-published-research-results-are-false/">Most published research results are false</a><br />
<a href="http://www.johndcook.com/blog/2009/05/04/classical-statistics-in-a-nutshell/">Classical statistics in a nutshell</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2012/01/13/interpreting-statistics/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>R in Action</title>
		<link>http://www.johndcook.com/blog/2012/01/02/r-in-action/</link>
		<comments>http://www.johndcook.com/blog/2012/01/02/r-in-action/#comments</comments>
		<pubDate>Mon, 02 Jan 2012 16:32:31 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Books]]></category>
		<category><![CDATA[Probability and Statistics]]></category>
		<category><![CDATA[Rstats]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=10364</guid>
		<description><![CDATA[No Starch Press sent me a copy of The Art of R Programming last Fall and I wrote a review of it here. Then a couple weeks ago, Manning sent me a copy of R in Action. Here I&#8217;ll give a quick comparison of the two books, then focus specifically on R in Action.


Comparing R [...]]]></description>
			<content:encoded><![CDATA[<p>No Starch Press sent me a copy of <a href="http://www.amazon.com/gp/product/1593273843/ref=as_li_ss_tl?ie=UTF8&amp;tag=theende-20&amp;linkCode=as2&amp;camp=217145&amp;creative=399373&amp;creativeASIN=1593273843">The Art of R Programming</a> last Fall and I wrote a review of it <a href="http://www.johndcook.com/blog/2011/10/10/the-art-of-r-programming/">here</a>. Then a couple weeks ago, Manning sent me a copy of <a href="http://www.amazon.com/gp/product/1935182390/ref=as_li_ss_tl?ie=UTF8&amp;tag=theende-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=1935182390">R in Action</a>. Here I&#8217;ll give a quick comparison of the two books, then focus specifically on <em>R in Action</em>.</p>
<p><a href="http://www.amazon.com/gp/product/1935182390/ref=as_li_ss_il?ie=UTF8&amp;tag=theende-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=1935182390"><img src="http://ws.assoc-amazon.com/widgets/q?_encoding=UTF8&amp;Format=_SL160_&amp;ASIN=1935182390&amp;MarketPlace=US&amp;ID=AsinImage&amp;WS=1&amp;tag=theende-20&amp;ServiceVersion=20070822" border="0" alt="" /></a></p>
<p><span id="more-10364"></span></p>
<p><strong>Comparing R books</strong></p>
<p>Norman Matloff, author of <em>The Art of R Programming</em>, is a statistician-turned-computer scientist. As the title may imply, Matloff&#8217;s book has more of a programmer&#8217;s perspective on R as a language.</p>
<p>Robert Kabacoff, author of <em>R in Action</em>, is a psychology professor-turned-statistical consultant. And as its title may imply, Kabacoff&#8217;s book is more about using R to analyze data. That is, the book is organized by analytical task rather than by language feature.</p>
<p>Many R books are organized like a statistical text. In fact, many <em>are</em> statistics texts, organized according to the progression of statistical theory with R code sprinkled in. <em>R in Action</em> is organized roughly in the order of steps one would take to analyze data, starting with importing data and ending with producing reports.</p>
<p>In short, <em>The Art of R Programming</em> is for programmers, <em>R in Action</em> is for data analysts, and most other R books I&#8217;ve seen are for statisticians. Of course, a typical R user is to some extent a programmer, an analyst, and a statistician. But this comparison gives you some idea of which book you might want to reach for depending on which hat you&#8217;re wearing at the moment. For example, I&#8217;d pick up <em>The Art of R Programming</em> if I had a question about interfacing R and C, but I&#8217;d pick up <em>R in Action</em> if I wanted to read about importing SAS data or using the <code>ggplot2</code> graphics package.</p>
<p><strong>R in Action</strong></p>
<p>Kabacoff opens his book with two apt quotes.</p>
<blockquote><p>What is the use of a book, without pictures or conversations? — Alice, <em>Alice in Wonderland</em></p>
<p>It&#8217;s wondrous, with treasures to satiate desires both subtle and gross; but it&#8217;s not for the timid. — Q, &#8220;Q Who?&#8221; <em>Star Trek: The Next Generation</em></p></blockquote>
<p><em>R in Action</em> is filled with pictures and conversations. It is also a treasure chest of practical information.</p>
<p>The first third of the book concerns basic data management and graphics. This much of the book would be accessible to someone with no background in statistics. The middle third of the book is devoted to basic statistics: correlation, linear regression, etc. The final third of the book contains more advanced statistics and graphics. (I was pleased to see the book has an appendix on using <code>Sweave</code> and <code>odfWeave</code> to produce reports.)</p>
<p><em>R in Action</em> includes practical details that I have not seen in other books on R. Perhaps this is because the book is focused on analyzing and graphing data rather than exploring the dark corners of R or rounding out statistical theory.</p>
<p>Kabacoff says that he wrote the book that he wishes he&#8217;d had years ago. I also wish I&#8217;d had his book years ago.</p>
<p><strong>Related links</strong>:</p>
<p><a href="http://www.johndcook.com/R_language_for_programmers.html">R programming for those coming from other languages</a> (referenced in <em>R in Action</em>)</p>
<p><a href="http://www.johndcook.com/blog/2011/06/30/calling-cpp-from-r/">Calling C++ from R</a></p>
<p><a href="http://www.johndcook.com/blog/2008/10/31/changing-the-r-console-fonts/">Better R console fonts</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2012/01/02/r-in-action/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Type R error</title>
		<link>http://www.johndcook.com/blog/2011/12/09/type-r-error/</link>
		<comments>http://www.johndcook.com/blog/2011/12/09/type-r-error/#comments</comments>
		<pubDate>Fri, 09 Dec 2011 13:00:14 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=10159</guid>
		<description><![CDATA[Andrew Gelman added a couple more types of error to the standard repertoire of type I and type II errors. He suggests using type S error to describe a result that gets a sign backward, reporting that A is bigger than B when in fact B is bigger than A. He also suggests using type [...]]]></description>
			<content:encoded><![CDATA[<p>Andrew Gelman added <a href="http://www.johndcook.com/blog/2008/04/21/four-types-of-errors/">a couple more</a> types of error to the standard repertoire of <strong>type I</strong> and <strong>type II</strong> errors. He suggests using <strong>type S</strong> error to describe a result that gets a sign backward, reporting that <em>A</em> is bigger than <em>B</em> when in fact <em>B</em> is bigger than <em>A</em>. He also suggests using <strong>type M</strong> error for results that get the magnitude of a result wrong.</p>
<p>Maybe we could add to this list <strong>type R</strong> for reification error: treating an abstraction as if it were real, forgetting that a model is a model and stretching it beyond its limits.</p>
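<p>Type S and type M errors are easy to see in a simulation. This is an illustrative Python sketch, not code from Gelman: it assumes a small true effect (0.1) measured with a large standard error (0.5), keeps only the estimates that would be declared significant at the usual 1.96 threshold, and reports how often those have the wrong sign and how much they overstate the magnitude. All parameter values are hypothetical.</p>

```python
import random

def type_s_m(true_effect=0.1, se=0.5, trials=100_000, seed=2):
    """Simulate unbiased but noisy estimates of true_effect and keep only
    the 'statistically significant' ones (|estimate| > 1.96 * se).

    Returns (type_s_rate, exaggeration_ratio):
      type_s_rate        -- share of significant estimates with the wrong sign
      exaggeration_ratio -- mean |estimate| among them divided by |true_effect|
    """
    rng = random.Random(seed)
    significant = [
        est
        for est in (rng.gauss(true_effect, se) for _ in range(trials))
        if abs(est) > 1.96 * se
    ]
    if not significant:
        raise ValueError("no significant estimates; increase trials")
    wrong_sign = sum(1 for est in significant if est * true_effect < 0)
    type_s_rate = wrong_sign / len(significant)
    mean_abs = sum(abs(est) for est in significant) / len(significant)
    return type_s_rate, mean_abs / abs(true_effect)

type_s_rate, exaggeration = type_s_m()
print(f"type S rate: {type_s_rate:.2f}, exaggeration: {exaggeration:.1f}x")
```

<p>With these settings, a substantial share of the significant estimates point the wrong way, and every one of them overstates the true effect by a factor of roughly ten or more, since clearing the threshold requires an estimate larger than 0.98 in absolute value.</p>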
<p><strong>Related links</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2011/09/30/just-an-approximation/">Just an approximation</a><br />
<a href="http://www.johndcook.com/blog/2011/11/01/floating-point-worries/">Floating point error is the least of my worries</a><br />
<a href="http://www.nobelprize.org/nobel_prizes/economics/laureates/1974/hayek-lecture.html">The Pretense of Knowledge</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/12/09/type-r-error/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Amputating reality</title>
		<link>http://www.johndcook.com/blog/2011/12/01/amputating-reality/</link>
		<comments>http://www.johndcook.com/blog/2011/12/01/amputating-reality/#comments</comments>
		<pubDate>Fri, 02 Dec 2011 01:47:10 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=10120</guid>
		<description><![CDATA[&#8220;If you just rely on one model, you tend to amputate reality to make it fit your model.&#8221; &#8212; David Brooks
Related post: Advantages of crude models

]]></description>
			<content:encoded><![CDATA[<p>&#8220;If you just rely on one model, you tend to amputate reality to make it fit your model.&#8221; &#8212; David Brooks</p>
<p>Related post: <a href="http://www.johndcook.com/blog/2011/05/25/crude-models/">Advantages of crude models</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/12/01/amputating-reality/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Bad logic, but good statistics</title>
		<link>http://www.johndcook.com/blog/2011/11/28/bad-logic-but-good-statistics/</link>
		<comments>http://www.johndcook.com/blog/2011/11/28/bad-logic-but-good-statistics/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 15:22:39 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Bayesian]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=10084</guid>
		<description><![CDATA[Ad hominem arguments are bad logic, but good (Bayesian) statistics. A statement isn&#8217;t necessarily false because it comes from an unreliable source, though it is more likely to be false.
Some people are much more likely to know what they&#8217;re talking about than  others, depending on context. You&#8217;re more likely to get good medical advice [...]]]></description>
			<content:encoded><![CDATA[<p><em>Ad hominem</em> arguments are bad logic, but good (Bayesian) statistics. A statement isn&#8217;t <strong>necessarily</strong> false because it comes from an unreliable source, though it is more <strong>likely</strong> to be false.</p>
<p>Some people are much more likely to know what they&#8217;re talking about than  others, depending on context. You&#8217;re more likely to get good medical advice from a doctor than from an accountant, though the former may be wrong and the latter may be right. (Actors are not likely to know what they&#8217;re talking about when giving advice regarding anything but acting, though that doesn&#8217;t stop them.)</p>
<p><em>Ad hominem</em> guesses are a reasonable way to construct a prior, but the prior needs to be updated with data. Given no other data, the doctor is more likely to know medicine than the accountant is. Assuming <em>a priori</em> that both are equally likely to be correct may be &#8220;fair,&#8221; but it&#8217;s not reasonable. However, as you gather data on the accuracy of each, you could change your mind. The posterior distribution could persuade you that you&#8217;ve been talking to a quack doctor or to an accountant who is unusually knowledgeable about medicine.</p>
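<p>Here is a minimal Python sketch of that updating, assuming a beta-binomial model for how often each person gives correct medical answers. The priors encode the <em>ad hominem</em> guess; the track records are hypothetical numbers chosen to describe a quack doctor and a well-read accountant.</p>

```python
def update(prior, correct, wrong):
    """Beta-binomial update: a beta(a, b) prior on the chance a source is
    right, plus an observed record of correct and wrong answers."""
    a, b = prior
    return a + correct, b + wrong

def mean(params):
    """Mean of a beta(a, b) distribution."""
    a, b = params
    return a / (a + b)

# Hypothetical ad hominem priors on medical questions.
doctor_prior = (8, 2)      # prior mean 0.8
accountant_prior = (2, 8)  # prior mean 0.2

# Hypothetical track records: a quack doctor, a well-read accountant.
doctor_post = update(doctor_prior, correct=3, wrong=17)
accountant_post = update(accountant_prior, correct=18, wrong=2)

print(round(mean(doctor_post), 3))      # 0.367
print(round(mean(accountant_post), 3))  # 0.667
```

<p>After 20 checked answers from each person, the posterior means reverse the ordering of the priors: the data outweigh the initial guess.</p>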
<p><strong>Related post</strong>: <a href="http://www.johndcook.com/blog/2008/01/12/musicians-drunks-and-oliver-cromwell/">Musicians, drunks, and Oliver Cromwell</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/11/28/bad-logic-but-good-statistics/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Career advice regarding tools</title>
		<link>http://www.johndcook.com/blog/2011/11/21/career-advice-regarding-tools/</link>
		<comments>http://www.johndcook.com/blog/2011/11/21/career-advice-regarding-tools/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 15:10:27 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Software development]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=9801</guid>
		<description><![CDATA[
A few weeks ago, J. D. Long gave some interesting advice in a Google+ discussion. He starts out
Lunch today with an analyst 13 years my junior made me think about  things I wish I had known about the technical analytical profession when  I was 25. Here&#8217;s some things that popped into my head:
The [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.johndcook.com/jdlong.jpeg" alt="J. D. Long wearing a Panama and smoking a Dominican" width="254" height="208" /></p>
<p>A few weeks ago, <a href="https://plus.google.com/107121399840634452924/posts">J. D. Long</a> gave some interesting advice in a Google+ discussion. He starts out</p>
<blockquote><p>Lunch today with an analyst 13 years my junior made me think about  things I wish I had known about the technical analytical profession when  I was 25. Here&#8217;s some things that popped into my head:</p></blockquote>
<p>The entire list is worth reading, but I want to focus on two things he said about tools.</p>
<ul>
<li>Use tools you don&#8217;t have to ask permission to install (i.e. open source).</li>
<li>Dependence on tools that are closed license and un-scriptable will   limit the scope of problems you can solve. (i.e. Excel) Use them, but   build your core skills on more portable &amp; scalable technologies.</li>
</ul>
<p>I would have disagreed a few years ago, but now I think this is good advice.</p>
<p>In the late 1990s I used mostly Microsoft tools. That was a good time to be a Microsoft developer. Windows was on the rise; Unix and Mac OS were on the ropes. Desktop applications were the norm and were easier to write on Windows. Open source software was hard to install and hard to use. People who used open source software often did so for ideological reasons, not because it made their work easier.</p>
<p>Of course, times have changed. The Mac recovered from its near-death experience. Unix didn&#8217;t, but it has been resurrected as Linux. The web made it easier to write cross-platform software. And above all, open source software has matured. The open source community is more positive, focused on promoting good software rather than on poking some corporation in the eye.</p>
<p>Now the advantages of open source are clearer. There&#8217;s not the same hidden cost in frustration that there was a few years ago. Now I would say yes, it&#8217;s a great advantage to use tools you can install whenever and wherever you want, without having to go through a purchasing bureaucracy.</p>
<p>It&#8217;s interesting that JD equates open source with scriptability. Open source software often is scriptable, not because it&#8217;s open source, but because of the Unix aesthetic that pervades the open source community. Closed source software is often not scriptable, not because it&#8217;s closed source, but because it is often written for consumers who value <a href="http://www.johndcook.com/blog/2011/08/15/usability-versus-composability/">usability over composability</a>. Commercial server-side products may be scriptable. If I were to restate JD&#8217;s advice on this point, I&#8217;d say to keep composability in mind and don&#8217;t just think about usability.</p>
<p>I appreciate JD&#8217;s attitude toward applications such as Excel. He&#8217;s not saying you should never defile your conscience by opening Excel. Some tasks are incredibly easy in Excel. The danger comes from pushing the tool into territory where other tools are better. There are still some in the open source community who believe that opening Excel is a sin, but I&#8217;m much more in agreement with the people who say, for example, that Excel isn&#8217;t the best tool for statistical analysis.</p>
<p>Portability is funny. In the early days of computing, there were no dominant players, and portability was important (and difficult). Then for a while, portability didn&#8217;t matter if you were content with only running on the 95% of the world&#8217;s computers that ran Windows. Now portability is important again. Windows still has a huge market share on the desktop, but the desktop itself is losing market share.</p>
<p>And portability matters for more than consumer operating systems. JD mentions portability and scalability in one breath. You may want to move code between operating systems to scale up (e.g. to run on a cluster) or to scale down (e.g. to run on a mobile device).</p>
<p>There&#8217;s also the aspect of career portability. You want to master tools that you can take with you from job to job. I would be leery of building a career around a small company&#8217;s proprietary tools. If I were in that situation, I&#8217;d learn something else on the side that&#8217;s more portable.</p>
<p>In closing, here is JD&#8217;s full list of career advice, without commentary. These points could make interesting fodder for future blog posts.</p>
<ul>
<li>Be a profit center, not a cost center.</li>
<li>Use tools you don&#8217;t have to ask permission to install (i.e. open source).</li>
<li>Dependence on tools that are closed license and un-scriptable will limit the scope of problems you can solve. (i.e. Excel) Use them, but build your core skills on more portable &amp; scalable technologies.</li>
<li>Learn basic database tools.</li>
<li>Learn a programming language.</li>
<li>Your internal job description may say, &#8220;Analyst&#8221; but get something else on your business cards. Analyst is so vague as to be meaningless. My external title is currently &#8220;Sr. Risk Economist.&#8221; I like the term &#8220;Data Scientist&#8221; for now. I expect that term will be meaningless in 5 years.</li>
<li>Large organizations do not properly appreciate agile and smart analytic types. Time at large firms should be seen as subsidized learning. Learn lots, but get out.</li>
<li>Ensure you can explain any of your projects to your wife or non-technical friends. It&#8217;s good practice for board meetings later in your career.</li>
<li>Be sure you know the handful of things that you can do better than most anyone else. Add something to that list every year. Make sure you can explain these things to non techies.</li>
<li>Be a profit center, not a cost center. At least be as close to the profit center as possible. The chief analyst for the sales SVP is closer to the profit center than an IT analyst supporting billing operations.</li>
<li>Get really good at asking questions so you understand problems before you start solving them.</li>
<li>Yes, that bit about being a profit center not a cost center is in there twice. It should probably be in there 5 times.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/11/21/career-advice-regarding-tools/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>A Bayesian view of Amazon Resellers</title>
		<link>http://www.johndcook.com/blog/2011/09/27/bayesian-amazon/</link>
		<comments>http://www.johndcook.com/blog/2011/09/27/bayesian-amazon/#comments</comments>
		<pubDate>Wed, 28 Sep 2011 02:43:34 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Bayesian]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=9533</guid>
		<description><![CDATA[I was buying a used book through Amazon this evening. Three resellers offered the book at essentially the same price. Here were their ratings:

94% positive out of 85,193 reviews
98% positive out of 20,785 reviews
99% positive out of 840 reviews

Which reseller is likely to give the best service? Before you assume it&#8217;s the seller with the [...]]]></description>
			<content:encoded><![CDATA[<p>I was buying a used book through Amazon this evening. Three resellers offered the book at essentially the same price. Here were their ratings:</p>
<ul>
<li>94% positive out of 85,193 reviews</li>
<li>98% positive out of 20,785 reviews</li>
<li>99% positive out of 840 reviews</li>
</ul>
<p>Which reseller is likely to give the best service? Before you assume it&#8217;s the seller with the highest percentage of positive reviews, consider the following simpler scenario.</p>
<p>Suppose one reseller has 90 positive reviews out of 100. The other reseller has two reviews, both positive. You could say one has 90% approval and the other has 100% approval, so the one with 100% approval is better. But this doesn&#8217;t take into consideration that there&#8217;s much more data on one than the other. You can have some confidence that 90% of the first reseller&#8217;s customers are satisfied. You don&#8217;t really know about the other because you have only two data points.</p>
<p style="text-align:center"><img src="http://imgs.xkcd.com/comics/a-minus-minus.png" alt="" width="408" height="379" /></p>
<p>A Bayesian view of the problem naturally incorporates the amount of data as well as its average. Let θ<sub>A</sub> be the probability of a customer being satisfied with company A&#8217;s service. Let θ<sub>B</sub> be the corresponding probability for company B. Suppose that before we see any reviews, we consider all satisfaction rates equally likely. That is, we start with a uniform prior distribution on θ<sub>A</sub> and θ<sub>B</sub>. A uniform distribution is the same as a beta(1, 1) distribution.</p>
<p>After observing 90 positive reviews and 10 negative reviews, our posterior estimate on θ<sub>A</sub> has a beta(91, 11) distribution. After observing 2 positive reviews, our posterior estimate on θ<sub>B</sub> has a beta(3, 1) distribution. The probability that a sample from θ<sub>A</sub> is bigger than a sample from θ<sub>B</sub> is 0.713. That is, there&#8217;s a good chance you&#8217;d get better service from the reseller with the lower average approval rating.</p>
<p style="text-align:center"><img title="ineqcalc2" src="http://www.johndcook.com/ineqcalc2.png" alt="beta(91,11) versus beta(3,1)" width="400" height="263" /></p>
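<p>The 0.713 figure can be checked directly. Because the beta(3, 1) distribution has CDF <em>x</em><sup>3</sup>, the probability that θ<sub>B</sub> falls below θ<sub>A</sub> equals the expected value of θ<sub>A</sub><sup>3</sup>, which is the third raw moment of a beta(91, 11) distribution. The Python sketch below computes that moment exactly and compares it with a Monte Carlo estimate from the standard library&#8217;s <code>random.betavariate</code>. This is one convenient route, not necessarily the method used by the tools linked at the end of this post; the function names are hypothetical.</p>

```python
import random

def prob_a_beats_b_exact(a=91, b=11):
    """P(theta_A > theta_B) for theta_A ~ beta(a, b), theta_B ~ beta(3, 1).

    The beta(3, 1) CDF is x**3, so the probability equals E[theta_A**3],
    the third raw moment of beta(a, b).
    """
    return (a * (a + 1) * (a + 2)) / ((a + b) * (a + b + 1) * (a + b + 2))

def prob_a_beats_b_mc(trials=100_000, seed=3):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(91, 11) > rng.betavariate(3, 1) for _ in range(trials)
    )
    return wins / trials

print(round(prob_a_beats_b_exact(), 3))  # 0.713
print(prob_a_beats_b_mc())               # should land near 0.713
```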
<p>Now back to our original question. Which of the three resellers is most likely to satisfy a customer?</p>
<p>Assume a uniform prior on θ<sub>X</sub>, θ<sub>Y</sub>, and θ<sub>Z</sub>, the probabilities of good service for each reseller. The posterior distributions on these variables have distributions beta(80082, 5113), beta(20370, 417), and beta(833, 9).</p>
<p>These beta distributions have such large parameters that we can approximate them by normal distributions with the same mean and variance. (A beta(<em>a</em>, <em>b</em>) random variable has mean <em>a</em>/(<em>a</em>+<em>b</em>) and variance <em>ab</em>/((<em>a</em>+<em>b</em>)<sup>2</sup>(<em>a</em>+<em>b</em>+1)).) The variable with the most variance, θ<sub>Z</sub>, has a standard deviation of about 0.0035. The other variables have even smaller standard deviations. So the three distributions are highly concentrated at their mean values and overlap very little. And so a sample from θ<sub>X</sub> or θ<sub>Y</sub> is unlikely to be higher than a sample from θ<sub>Z</sub>.</p>
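<p>A short Python sketch of that normal approximation, using the beta mean and variance formulas above and the posterior parameters from this post. The helper names are hypothetical, and the results are approximations under the normal assumption, not exact beta calculations.</p>

```python
import math

def beta_mean_sd(a, b):
    """Mean and standard deviation of a beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

def prob_greater_normal(p, q):
    """Normal approximation to P(X > Y) for independent beta random
    variables X ~ beta(*p) and Y ~ beta(*q)."""
    m1, s1 = beta_mean_sd(*p)
    m2, s2 = beta_mean_sd(*q)
    z = (m1 - m2) / math.sqrt(s1 ** 2 + s2 ** 2)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

# Posterior parameters from the post (uniform prior plus reviews).
X, Y, Z = (80082, 5113), (20370, 417), (833, 9)

for name, params in [("X", X), ("Y", Y), ("Z", Z)]:
    m, s = beta_mean_sd(*params)
    print(f"theta_{name}: mean {m:.4f}, sd {s:.5f}")

print("P(theta_Y > theta_Z) ~", prob_greater_normal(Y, Z))
print("P(theta_X > theta_Z) ~", prob_greater_normal(X, Z))
```

<p>Under this approximation, the probability that θ<sub>Y</sub> exceeds θ<sub>Z</sub> comes out well under 1%, and for θ<sub>X</sub> it is negligible.</p>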
<p>In general, going by averages alone works when you have a lot of customer reviews. But when you have a small number of reviews, going by averages alone could be misleading.</p>
<p>Thanks to Charles McCreary for suggesting the xkcd comic.</p>
<p><strong>Related links</strong>:</p>
<p><a href="https://biostatistics.mdanderson.org/SoftwareDownload/SingleSoftware.aspx?Software_Id=9">Inequality Calculator</a><br />
<a href="http://www.bepress.com/mdandersonbiostat/paper46/">Calculating random inequalities</a><br />
<a href="http://www.johndcook.com/UTMDABTR-005-05.pdf">Exact calculation of beta inequalities</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/09/27/bayesian-amazon/feed/</wfw:commentRss>
		<slash:comments>28</slash:comments>
		</item>
		<item>
		<title>Big data and humility</title>
		<link>http://www.johndcook.com/blog/2011/09/22/big-data-and-humility/</link>
		<comments>http://www.johndcook.com/blog/2011/09/22/big-data-and-humility/#comments</comments>
		<pubDate>Thu, 22 Sep 2011 18:53:41 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Bayesian]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=9490</guid>
		<description><![CDATA[One of the challenges with big data is to properly estimate your uncertainty. Often &#8220;big data&#8221; means a huge amount of data that isn&#8217;t exactly what you want.
As an example, suppose you have data on how a drug acts in monkeys and you want to infer how the drug acts in humans. There are two [...]]]></description>
			<content:encoded><![CDATA[<p>One of the challenges with big data is to properly estimate your uncertainty. Often &#8220;big data&#8221; means a huge amount of data that isn&#8217;t exactly what you want.</p>
<p>As an example, suppose you have data on how a drug acts in monkeys and you want to infer how the drug acts in humans. There are two sources of uncertainty:</p>
<ol>
<li>How well do we really know the effects in monkeys?</li>
<li>How well do these results translate to humans?</li>
</ol>
<p>The former can be quantified, and so we focus on that, but the latter may be more important. There&#8217;s a strong temptation to believe that big data regarding one situation tells us more than it does about an analogous situation.</p>
<p>I&#8217;ve seen people reason as follows. We don&#8217;t really know how results translate from monkeys to humans (or from one chemical to a related chemical, from one market to an analogous market, etc.). We have a moderate amount of data on monkeys, so we&#8217;ll down-weight it by a factor of 10 and use that as if it were human data, say in order to come up with a prior distribution.</p>
<p>Down-weighting by a fixed ratio, such as 10 to 1, is misleading. If you had 10x as much data on monkeys, would you know as much about effects in humans as if the original smaller data set had been collected on people? What if you suddenly had &#8220;big data&#8221; involving every monkey on the planet? More data on monkeys drives down your uncertainty about monkeys, but does nothing to lower your uncertainty regarding how monkey results translate to humans.</p>
<p>At some point, more data about analogous cases reaches the point of diminishing returns, and you can&#8217;t go further without data about what you really want to know. Collecting more and more data about how a drug works in adults won&#8217;t help you learn how it works in children. At some point, you need to treat children. Terabytes of analogous data may not be as valuable as kilobytes of highly relevant data.</p>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2010/12/15/big-data-is-not-enough/">Big data is not enough</a><br />
<a href="http://www.johndcook.com/blog/2010/03/12/does-gaining-weight-make-you-taller/">Does gaining weight make you taller?</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/09/22/big-data-and-humility/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Bayes isn&#8217;t magic</title>
		<link>http://www.johndcook.com/blog/2011/09/06/bayes-isnt-magic/</link>
		<comments>http://www.johndcook.com/blog/2011/09/06/bayes-isnt-magic/#comments</comments>
		<pubDate>Tue, 06 Sep 2011 12:44:18 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Bayesian]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=9307</guid>
		<description><![CDATA[If a study is completely infeasible using traditional statistical methods, Bayesian methods are probably not going to rescue it. Bayesian methods can&#8217;t squeeze blood out of a turnip.
The Bayesian approach to statistics has real advantages, but sometimes these advantages are oversold. Bayesian statistics is still statistics, not magic.
]]></description>
			<content:encoded><![CDATA[<p>If a study is completely infeasible using traditional statistical methods, Bayesian methods are probably not going to rescue it. Bayesian methods can&#8217;t squeeze blood out of a turnip.</p>
<p>The Bayesian approach to statistics has real advantages, but sometimes these advantages are oversold. Bayesian statistics is still statistics, not magic.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/09/06/bayes-isnt-magic/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Markov chains don&#8217;t converge</title>
		<link>http://www.johndcook.com/blog/2011/08/10/markov-chains-dont-converge/</link>
		<comments>http://www.johndcook.com/blog/2011/08/10/markov-chains-dont-converge/#comments</comments>
		<pubDate>Wed, 10 Aug 2011 14:23:09 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Bayesian]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=9123</guid>
		<description><![CDATA[I often hear people say they&#8217;re using a burn-in period in MCMC to run a Markov chain until it converges. But Markov chains don&#8217;t converge, at least not the Markov chains that are useful in MCMC. These Markov chains wander around forever exploring the domain they&#8217;re sampling from. Any point that makes a &#8220;bad&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p>I often hear people say they&#8217;re using a burn-in period in MCMC to run a Markov chain until it converges. But Markov chains don&#8217;t converge, at least not the Markov chains that are useful in MCMC. These Markov chains wander around forever exploring the domain they&#8217;re sampling from. Any point that makes a &#8220;bad&#8221; starting point for MCMC is a point you might reach by burn-in.</p>
<p>Not only that, Markov chains can&#8217;t remember how they got where they are. That&#8217;s their defining property. So if your burn-in period ends at a point <em>x</em>, the chain will perform exactly as if you had simply started at <em>x</em>.</p>
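<p>The memoryless property can be checked directly. The sketch below uses a hypothetical random-walk Metropolis sampler for a standard normal target, written with only the Python standard library. It runs one chain for 1200 steps from a poor starting point, then replays the first 200 steps as burn-in and starts a brand-new chain at the point where burn-in ended, using the same random number stream. The two trajectories agree exactly.</p>

```python
import math
import random

def log_target(x):
    """Standard normal log density, up to an additive constant."""
    return -0.5 * x * x

def metropolis(x, steps, rng, step_size=1.0):
    """Random-walk Metropolis chain started at x; returns the visited states."""
    path = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Accept with probability min(1, p(proposal) / p(x)).
        # 1 - random() lies in (0, 1], so the log is always defined.
        if log_target(proposal) - log_target(x) > math.log(1.0 - rng.random()):
            x = proposal
        path.append(x)
    return path

# One continuous chain from a poor starting point.
full_run = metropolis(10.0, 1200, random.Random(42))

# Replay the first 200 steps as burn-in, then start a new chain where it ended.
rng = random.Random(42)
burn_in = metropolis(10.0, 200, rng)
restarted = metropolis(burn_in[-1], 1000, rng)

print(restarted == full_run[200:])  # True
```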
<p>When someone says a Markov chain has converged, they mean that the chain has entered a high-probability region. I&#8217;ll explain in a moment why that&#8217;s desirable. The belief, or hope, is that a burn-in period will put a Markov chain in a high-probability region. And it probably will, but there are a couple of reasons why this isn&#8217;t necessarily the best thing to do.</p>
<ol>
<li><strong>Burn-in may be ineffective</strong>. You could use optimization to be <em>certain</em> that you&#8217;re starting in such a region. Burn-in offers no such assurance. See <a href="http://stronginference.com/weblog/2011/8/9/burn-in-and-other-mcmc-folklore.html">Burn-in and other MCMC folklore</a>.</li>
<li><strong>Burn-in may be inefficient</strong>. Casting aside worries that burn-in may not do what you want, it can be an inefficient way to find a high-probability region. MCMC isn&#8217;t designed to <em>optimize</em> a density function but rather to <em>sample</em> from it.</li>
</ol>
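<p>Here is a sketch of the optimization route, under several assumptions not in the original: a hypothetical one-dimensional Normal(3, 2<sup>2</sup>) target, golden-section search standing in for whatever optimizer you would actually use, and the same kind of random-walk Metropolis sampler for comparison. The optimizer lands on the mode, while 50 burn-in steps started at x = 100 leave the chain far from the high-probability region.</p>

```python
import math
import random

MU, SD = 3.0, 2.0  # hypothetical target density: Normal(3, 2^2)

def log_target(x):
    return -0.5 * ((x - MU) / SD) ** 2

def golden_section_max(f, a, b, tol=1e-8):
    """Locate the maximum of a unimodal function f on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):      # maximum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                # maximum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2.0

def metropolis(x, steps, rng, step_size=1.0):
    """Random-walk Metropolis chain started at x; returns the visited states."""
    path = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        if log_target(proposal) - log_target(x) > math.log(1.0 - rng.random()):
            x = proposal
        path.append(x)
    return path

# Optimization puts the starting point at the mode directly.
start = golden_section_max(log_target, -100.0, 100.0)

# A short burn-in from a poor starting point may still be far away.
after_burn_in = metropolis(100.0, 50, random.Random(1))[-1]

print(round(start, 4), round(after_burn_in, 1))
```

<p>In higher dimensions a library optimizer would replace the golden-section search; the point is only that optimization targets the high-probability region directly, while a sampler merely drifts toward it.</p>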
<p>Why use burn-in? MCMC practitioners often don&#8217;t know how to do optimization, and in any case the corresponding optimization problem may be difficult. Also, if you&#8217;ve got the MCMC code in hand, it&#8217;s convenient to use it to find a starting point as well as for sampling.</p>
<p>So why does it matter whether you start your Markov chain in a high-probability region? In the limit, it doesn&#8217;t matter. But since you&#8217;re averaging some function over a finite number of samples, your average will be a better approximation if you start at a typical point of the density you&#8217;re sampling. If you start at a low-probability location, your average may be more biased.</p>
<p>Samples from Markov chains don&#8217;t converge, but <em>averages</em> of functions applied to these samples may converge. When someone says a Markov chain has converged, they mean they&#8217;re at a point where the average of a finite number of function applications will be a better approximation of the thing they want to compute than if they&#8217;d started at a low probability point.</p>
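<p>A small experiment shows the difference between samples and averages. This sketch again assumes a hypothetical random-walk Metropolis sampler for a standard normal, so the true mean of <em>x</em> is 0, and estimates that mean by averaging a finite number of samples over 100 independent chains. Started at x = 20, far out in the tail, the finite average is biased; the bias shrinks as the chain gets longer. Started at a typical point, the average is close to 0 from the beginning.</p>

```python
import math
import random

def log_target(x):
    """Standard normal log density (up to a constant); the true mean is 0."""
    return -0.5 * x * x

def metropolis(x, steps, rng, step_size=1.0):
    """Random-walk Metropolis chain started at x; returns the visited states."""
    path = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        if log_target(proposal) - log_target(x) > math.log(1.0 - rng.random()):
            x = proposal
        path.append(x)
    return path

def mean_estimate(start, steps, reps=100, seed=0):
    """Average, over independent chains, of each chain's sample mean of f(x) = x."""
    rng = random.Random(seed)
    return sum(sum(metropolis(start, steps, rng)) / steps for _ in range(reps)) / reps

for steps in (200, 2000):
    print(steps,
          round(mean_estimate(20.0, steps), 3),  # low-probability start
          round(mean_estimate(0.0, steps), 3))   # typical start
```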
<p>It&#8217;s not just a matter of imprecise language when people say a Markov chain has converged. It sometimes betrays a misunderstanding of how Markov chains work.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/08/10/markov-chains-dont-converge/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>The single big jump principle</title>
		<link>http://www.johndcook.com/blog/2011/08/09/single-big-jump-principle/</link>
		<comments>http://www.johndcook.com/blog/2011/08/09/single-big-jump-principle/#comments</comments>
		<pubDate>Tue, 09 Aug 2011 11:05:31 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=9105</guid>
		<description><![CDATA[Suppose you&#8217;re playing a game where you take 10 steps of a random size. Here are two variations on the game. Which will give you a better chance of ending up far from where you started?

You take your steps one at a time, starting each new step from where the last one took you.
You return [...]]]></description>
			<content:encoded><![CDATA[<p>Suppose you&#8217;re playing a game where you take 10 steps of a random size. Here are two variations on the game. Which will give you a better chance of ending up far from where you started?</p>
<ol>
<li>You take your steps one at a time, starting each new step from where the last one took you.</li>
<li>You return to the starting point after each step. After 10 turns, you go back to where your largest step took you.</li>
</ol>
<p>Assume all steps are in the same direction. Then you&#8217;re never worse off under the first rule: the sum of your steps is at least as big as the maximum, since the maximum is one of the terms in the sum.</p>
<p>However, depending on the probability distribution on your random steps, you may do almost as well under the second rule, taking the maximum of your steps.</p>
<p>If the distribution on your step size is heavy-tailed (technically, subexponential), then the sum and the maximum have asymptotically equivalent tail probabilities. That is, as <em>x</em> goes to infinity,</p>
<p style="text-align: center;">Pr( <em>X</em><sub>1</sub> + <em>X</em><sub>2</sub> + … + <em>X</em><sub><em>n</em></sub> &gt; <em>x</em> ) ~ Pr( max(<em>X</em><sub>1</sub>, <em>X</em><sub>2</sub>, … , <em>X</em><sub><em>n</em></sub>) &gt; <em>x</em> )</p>
<p>As <em>x</em> gets larger, the relative advantage of taking the sum rather than the maximum goes to zero. This is known as &#8220;the principle of the single big jump.&#8221; If you&#8217;re looking to make a lot of progress, adding up typical jumps isn&#8217;t likely to get you there. You need one big jump. Said another way, <strong>your total progress is about as good as the progress on your best shot</strong>.</p>
<p>Before you draw a life lesson from this, note that this is only true for heavy-tailed distributions. It&#8217;s not true at all if your jumps have a thin-tailed distribution. But sometimes the <a href="http://www.johndcook.com/blog/2009/09/29/achievement-is-log-normal/">payoffs in life&#8217;s games</a> are heavy-tailed.</p>
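<p>For contrast, a thin-tailed case can be worked out exactly. Exponential steps are an illustrative choice, not from the original post. With two independent exponential steps of mean 1, Pr( <em>X</em><sub>1</sub> + <em>X</em><sub>2</sub> &gt; <em>x</em> ) = (1 + <em>x</em>) exp(&#8722;<em>x</em>) and Pr( max(<em>X</em><sub>1</sub>, <em>X</em><sub>2</sub>) &gt; <em>x</em> ) = 2 exp(&#8722;<em>x</em>) &#8722; exp(&#8722;2<em>x</em>). The ratio grows roughly like (1 + <em>x</em>)/2 instead of tending to 1, so for thin tails the sum really does beat the single best step.</p>

```python
import math

def sum_tail(x):
    """Pr(X1 + X2 > x) for independent Exp(1) steps; the sum is Gamma(2, 1)."""
    return (1.0 + x) * math.exp(-x)

def max_tail(x):
    """Pr(max(X1, X2) > x) = 1 - (1 - exp(-x))**2."""
    return 2.0 * math.exp(-x) - math.exp(-2.0 * x)

for x in (1, 5, 10, 20, 40):
    print(x, round(sum_tail(x) / max_tail(x), 3))
```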
<p>What distributions are subexponential? Any heavy-tailed distribution you&#8217;re likely to have heard of: <a href="http://www.johndcook.com/blog/2008/10/02/testing-and-getting-nowhere/">Cauchy</a>, <a href="http://www.johndcook.com/blog/2009/08/19/generalized-central-limit-theorem/">Lévy</a>, Weibull (with shape &lt; 1), <a href="http://www.johndcook.com/blog/2009/09/29/achievement-is-log-normal/">log-normal</a>, etc.</p>
<p>To illustrate the single big jump principle, let <em>X</em><sub>1</sub> and <em>X</em><sub>2</sub> be standard Cauchy random variables. The difference between Pr( <em>X</em><sub>1</sub> + <em>X</em><sub>2</sub> &gt; <em>x</em> ) and Pr( max(<em>X</em><sub>1</sub> , <em>X</em><sub>2</sub>) &gt; <em>x</em> ) becomes impossible to see for <em>x</em> much bigger than 3.</p>
<p style="text-align:center"><img src="http://www.johndcook.com/singlejump1.png" alt="" width="450" height="311" /></p>
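<p>Here&#8217;s a plot of the ratio of the two tail probabilities, Pr( <em>X</em><sub>1</sub> + <em>X</em><sub>2</sub> &gt; <em>x</em> ) / Pr( max(<em>X</em><sub>1</sub>, <em>X</em><sub>2</sub>) &gt; <em>x</em> ).</p>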
<p style="text-align:center"><img src="http://www.johndcook.com/singlejump2.png" alt="" width="450" height="305" /></p>
<p>The ratio tends to 1 in the limit as predicted by the single big jump principle.</p>
<p>I chose to use a Cauchy distribution to simplify the calculations. (The sum of two independent standard Cauchy random variables is another Cauchy random variable, with scale 2. See <a href="http://www.johndcook.com/distribution_chart.html">distribution relationships</a>.) In this case the tail probability of the maximum can actually be larger than that of the sum, because Cauchy random variables can be positive or negative. But it is still the case that the two tails converge as <em>x</em> increases.</p>
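<p>The graphs came from a Sage notebook. As a rough stand-in, here is a minimal Python sketch of the same closed-form calculation; it is an independent reconstruction, not the notebook&#8217;s code. It uses the fact that <em>X</em><sub>1</sub> + <em>X</em><sub>2</sub> is Cauchy with scale 2, and computes the tail of the maximum, 1 &#8722; <em>F</em>(<em>x</em>)<sup>2</sup>, as (1 &#8722; <em>F</em>)(1 + <em>F</em>) to avoid cancellation for large <em>x</em>.</p>

```python
import math

def sum_tail(x):
    """Pr(X1 + X2 > x): the sum of two standard Cauchy variables is Cauchy with scale 2."""
    return 0.5 - math.atan(x / 2.0) / math.pi

def max_tail(x):
    """Pr(max(X1, X2) > x) = 1 - F(x)**2, with F the standard Cauchy CDF."""
    upper = 0.5 - math.atan(x) / math.pi  # 1 - F(x)
    return upper * (2.0 - upper)          # (1 - F)(1 + F), since 1 + F = 2 - upper

for x in (1, 3, 10, 100, 1000):
    print(x, round(sum_tail(x) / max_tail(x), 5))
```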
<p>Here&#8217;s the <a href="http://sagenb.org/home/pub/3018/">Sage notebook</a> that I used to create the graphs.</p>
<p>More on the single big jump <a href="http://www.amazon.com/gp/product/1441994726/ref=as_li_ss_tl?ie=UTF8&amp;tag=theende-20&amp;linkCode=as2&amp;camp=217145&amp;creative=399373&amp;creativeASIN=1441994726">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/08/09/single-big-jump-principle/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Math superhero in training</title>
		<link>http://www.johndcook.com/blog/2011/07/27/math-superhero-in-training/</link>
		<comments>http://www.johndcook.com/blog/2011/07/27/math-superhero-in-training/#comments</comments>
		<pubDate>Wed, 27 Jul 2011 15:03:25 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=9007</guid>
		<description><![CDATA[Steve Yegge has a new project. He&#8217;s in training to become a math superhero. Or at least a sidekick. He said that math/stat folks are superheroes and he wants to join them.
In his presentation at OSCON Data 2011 on Monday, Yegge said that all the hard problems require math and statistics. So he&#8217;s quitting his job [...]]]></description>
			<content:encoded><![CDATA[<p>Steve Yegge has a new project. He&#8217;s in training to become a math superhero. Or at least a sidekick. He said that math/stat folks are superheroes and he wants to join them.</p>
<p>In his presentation at <a href="http://www.oscon.com/data">OSCON Data 2011</a> on Monday, Yegge said that all the hard problems require math and statistics. So he&#8217;s quitting his job at Google to study math in hopes that he can solve big problems three to five years from now.</p>
<p><iframe width="450" height="286" src="http://www.youtube.com/embed/vKmQW_Nkfk8" mce_src="http://www.youtube.com/embed/vKmQW_Nkfk8" frameborder="0" allowfullscreen></iframe></p>
<p>His enthusiasm for math is naive and inspiring. I gather from some of the articles he&#8217;s written that he&#8217;s an original thinker and a hard worker.  It&#8217;ll be interesting to see what he does.</p>
<p><strong>Update</strong>: As pointed out in the comments, Steve Yegge clarified on his blog that he is not quitting Google, only his <em>project</em> at Google. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/07/27/math-superhero-in-training/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>How to fit an elephant</title>
		<link>http://www.johndcook.com/blog/2011/06/21/how-to-fit-an-elephant/</link>
		<comments>http://www.johndcook.com/blog/2011/06/21/how-to-fit-an-elephant/#comments</comments>
		<pubDate>Tue, 21 Jun 2011 12:00:47 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8741</guid>
		<description><![CDATA[John von Neumann famously said
With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.
By this he meant that one should not be impressed when a complex model fits a data set well. With enough parameters, you can fit any data set.
It turns out you can literally fit [...]]]></description>
			<content:encoded><![CDATA[<p>John von Neumann famously said</p>
<blockquote><p>With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.</p></blockquote>
<p>By this he meant that one should not be impressed when a complex model fits a data set well. With enough parameters, you can fit any data set.</p>
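<p>The most elementary version of this is polynomial interpolation: a polynomial with <em>n</em> coefficients can pass exactly through any <em>n</em> points with distinct <em>x</em> values, even when the points are pure noise. A short NumPy sketch, with random and purely hypothetical data:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(5.0)          # five distinct x values
y = rng.normal(size=5)      # pure noise: there is no signal to find

coeffs = np.polyfit(x, y, deg=4)      # five parameters for five points
residuals = y - np.polyval(coeffs, x)
print(np.max(np.abs(residuals)))      # essentially zero: a "perfect" fit to noise
```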
<p>It turns out you can literally fit an elephant with four parameters if you allow the parameters to be complex numbers.</p>
<p style="text-align:center"><img src="http://www.johndcook.com/elephant.png" alt="" width="400" height="302" /></p>
<p>I mentioned von Neumann&#8217;s quote on <a href="http://twitter.com/statfact">StatFact</a> last week and <a href="http://twitter.com/#!/zolnie">Piotr Zolnierczuk</a> replied with reference to a paper explaining how to fit an elephant:</p>
<blockquote><p>&#8220;Drawing an elephant with four complex parameters&#8221; by Jurgen Mayer, Khaled Khairy, and Jonathon Howard,  Am. J. Phys. 78, 648 (2010), DOI:10.1119/1.3254017.</p></blockquote>
<p>Piotr also sent me the following Python code he&#8217;d written to implement the method in the paper. This code produced the image above.</p>
<pre class="brush: plain; title: ; notranslate">
&quot;&quot;&quot;
Author: Piotr A. Zolnierczuk (zolnierczukp at ornl dot gov)

Based on a paper by:
Drawing an elephant with four complex parameters
Jurgen Mayer, Khaled Khairy, and Jonathon Howard,
Am. J. Phys. 78, 648 (2010), DOI:10.1119/1.3254017
&quot;&quot;&quot;
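import numpy as np
import matplotlib.pyplot as plt

# elephant parameters
p1, p2, p3, p4 = (50 - 30j, 18 +  8j, 12 - 10j, -14 - 60j )
p5 = 40 + 20j # eyepiece (only p5.imag, the eye position, is used here)

def fourier(t, C):
    # Truncated Fourier series: sum over k of A[k]*cos(k*t) + B[k]*sin(k*t),
    # where A and B are the real and imaginary parts of the coefficients C.
    f = np.zeros(t.shape)
    A, B = C.real, C.imag
    for k in range(len(C)):
        f = f + A[k]*np.cos(k*t) + B[k]*np.sin(k*t)
    return f

def elephant(t, p1, p2, p3, p4, p5):
    npar = 6
    Cx = np.zeros((npar,), dtype='complex')
    Cy = np.zeros((npar,), dtype='complex')

    # Map the real and imaginary parts of p1..p4 onto the Fourier
    # coefficients of the outline curves x(t) and y(t).
    Cx[1] = p1.real*1j
    Cx[2] = p2.real*1j
    Cx[3] = p3.real
    Cx[5] = p4.real

    Cy[1] = p4.imag + p1.imag*1j
    Cy[2] = p2.imag*1j
    Cy[3] = p3.imag*1j

    # Append one extra point for the eye.
    x = np.append(fourier(t,Cx), [-p5.imag])
    y = np.append(fourier(t,Cy), [p5.imag])

    return x,y

x, y = elephant(np.linspace(0,2*np.pi,1000), p1, p2, p3, p4, p5)
# Plot (y, -x): the outline rotated 90 degrees clockwise.
plt.plot(y,-x,'.')
plt.show()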
</pre>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2011/05/25/crude-models/">Advantages of crude models</a><br />
<a href="http://www.johndcook.com/blog/2011/01/12/occams-razor-bayes-theorem/">Occam&#8217;s razor and Bayes theorem</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/06/21/how-to-fit-an-elephant/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Impure math</title>
		<link>http://www.johndcook.com/blog/2011/06/15/impure-math/</link>
		<comments>http://www.johndcook.com/blog/2011/06/15/impure-math/#comments</comments>
		<pubDate>Wed, 15 Jun 2011 16:57:35 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8733</guid>
		<description><![CDATA[When Samuel Hansen said in his interview &#8220;You&#8217;re not a pure mathematician&#8221; I agreed without thinking, but later the statement bothered me a little. I know what he meant: considering the two categories of pure math and applied math, you&#8217;d put yourself in the latter category. Which is true.
But the term &#8220;pure&#8221; math can be [...]]]></description>
			<content:encoded><![CDATA[<p>When Samuel Hansen said in his <a href="http://www.johndcook.com/blog/2011/06/14/interview-on-strongly-connected-components/">interview</a> &#8220;You&#8217;re not a pure mathematician&#8221; I agreed without thinking, but later the statement bothered me a little. I know what he meant, and he was right: if you divide mathematics into pure and applied, I belong in the applied camp.</p>
<p>But the term &#8220;pure&#8221; math can be misleading, as if everyone else does <em>impure</em> math. Applied math is not an alternative to theoretical math. Applied mathematicians prove theorems etc. We work on applications <em>in addition</em> to doing what is expected of pure mathematicians. The difference between pure and applied math is motivation, not content. Applied math is motivated by direct application to non-mathematical problems. Pure math seeks to advance math for its own sake. <em>Both are important</em>.</p>
<p>Statistics uses the terms &#8220;theoretical&#8221; and &#8220;applied&#8221; rather than &#8220;pure&#8221; and &#8220;applied.&#8221; Math doesn&#8217;t use &#8220;theoretical&#8221; as an antithesis to &#8220;applied&#8221; because applied math is theoretical. But unlike math, being &#8220;applied&#8221; in statistics does mean you&#8217;re often (too often?) excused from proving theorems. The first time I was a coauthor on a statistics paper I was surprised to find out you could publish with just simulation results and no theorems. This happens in applied math as well, but not nearly as often as it does in applied statistics.</p>
<p>On the other hand, when I hear the term &#8220;applied statistics&#8221; I want to ask &#8220;Is there any other kind?&#8221; Statistics <em>is</em> applied (and theoretical!) though some statisticians work more directly on applications than others. As <a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2007/11/how_to_tell_the.html">Andrew Gelman</a> quips, the difference between theoretical and applied statisticians is that</p>
<p style="padding-left: 30px;">The theoretical statistician uses <em>x</em>, the applied statistician uses <em>y</em> (because we reserve <em>x</em> for predictors).</p>
<p>I assume that statement wasn&#8217;t meant to be taken literally, but I agree with the sentiment that the distinction between theoretical and applied statistics can be exaggerated. I&#8217;d say the same applies to pure and applied math.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/06/15/impure-math/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Advantages of crude models</title>
		<link>http://www.johndcook.com/blog/2011/05/25/crude-models/</link>
		<comments>http://www.johndcook.com/blog/2011/05/25/crude-models/#comments</comments>
		<pubDate>Wed, 25 May 2011 12:27:28 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8598</guid>
		<description><![CDATA[One advantage of crude models is that we know they are crude and will not try to read too much from them. With more sophisticated models,
… there is an awful temptation to squeeze the lemon until it is dry and to present a picture of the future which through its very precision and verisimilitude carries [...]]]></description>
			<content:encoded><![CDATA[<p>One advantage of crude models is that we know they are crude and will not try to read too much from them. With more sophisticated models,</p>
<blockquote><p>… there is an awful temptation to squeeze the lemon until it is dry and to present a picture of the future which through its very precision and verisimilitude carries conviction. Yet a man who uses an imaginary map, thinking it is a true one, is like to be worse off than someone with no map at all; for he will fail to inquire whenever he can, to observe every detail on his way, and to search continuously with all his senses and all his intelligence for indications of where he should go.</p></blockquote>
<p>From <a href="http://www.amazon.com/gp/product/0881791695/ref=as_li_ss_tl?ie=UTF8&amp;tag=theende-20&amp;linkCode=as2&amp;camp=217145&amp;creative=399349&amp;creativeASIN=0881791695">Small is Beautiful</a> by E. F. Schumacher.</p>
<p>Crude models are <a href="http://www.johndcook.com/blog/2008/07/08/whats-wrong-with-paper/">easier to implement</a>. They may also be <a href="http://www.johndcook.com/blog/2010/10/18/titanic-effect/">more robust</a> and <a href="http://www.johndcook.com/blog/2011/01/12/occams-razor-bayes-theorem/">better descriptions of reality</a>.</p>
<p>Obviously crude models are not always better. But I like to have some evidence that a complex model is worthwhile before I invest too much effort in it. And I&#8217;m well aware of forces that <a href="http://www.johndcook.com/blog/2010/04/05/rewarding-complexity/">reward complexity</a> for its own sake.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/05/25/crude-models/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Works well versus well understood</title>
		<link>http://www.johndcook.com/blog/2011/05/10/well-understood/</link>
		<comments>http://www.johndcook.com/blog/2011/05/10/well-understood/#comments</comments>
		<pubDate>Tue, 10 May 2011 18:19:48 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Clinical trials]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8487</guid>
		<description><![CDATA[While I was looking up the Tukey quote in my earlier post, I ran across another of his quotes:
The test of a good procedure is how well it works, not how well it is understood.
At some level, it&#8217;s hard to argue against this. Statistical procedures operate on empirical data, so it makes sense that the procedures [...]]]></description>
			<content:encoded><![CDATA[<p>While I was looking up the Tukey quote in my <a href="http://www.johndcook.com/blog/2011/05/09/does-the-answer-matter/">earlier post</a>, I ran across another of his quotes:</p>
<blockquote><p>The test of a good procedure is how well it works, not how well it is understood.</p></blockquote>
<p>At some level, it&#8217;s hard to argue against this. Statistical procedures operate on empirical data, so it makes sense that the procedures themselves be evaluated empirically.</p>
<p>But I question whether we really know that a statistical procedure works well if it isn&#8217;t well understood. Specifically, I&#8217;m skeptical of complex statistical methods whose only credentials are a handful of simulations. &#8220;We don&#8217;t have any theoretical results, but hey, it works well in practice. Just look at the simulations.&#8221;</p>
<p><strong>Every method works well on the scenarios its author publishes</strong>, almost by definition. If the method didn&#8217;t handle a scenario well, the author would publish a different scenario. Even if the author didn&#8217;t select the most flattering scenarios, he or she may simply not have considered unflattering scenarios. The latter is particularly understandable, almost inevitable.</p>
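<p>Here is a small, hypothetical illustration of how the choice of scenarios drives the verdict, using only the Python standard library. If every published scenario draws normal data, the sample mean looks like a fine estimator of the center; add a single Cauchy scenario and it falls apart, while the median holds up. The sample size, number of replications, and the two scenarios are arbitrary choices for this sketch.</p>

```python
import math
import random
import statistics

def median_abs_error(estimator, sampler, n=101, reps=1000, seed=0):
    """Median over replications of |estimate - 0|; both scenarios are centered at 0."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        data = [sampler(rng) for _ in range(n)]
        errors.append(abs(estimator(data)))
    return statistics.median(errors)

def normal(rng):
    return rng.gauss(0.0, 1.0)

def cauchy(rng):
    # Standard Cauchy draw via the inverse CDF.
    return math.tan(math.pi * (rng.random() - 0.5))

SCENARIOS = {"normal": normal, "cauchy": cauchy}
ESTIMATORS = {"mean": statistics.fmean, "median": statistics.median}

for s_name, sampler in SCENARIOS.items():
    for e_name, estimator in ESTIMATORS.items():
        print(s_name, e_name, round(median_abs_error(estimator, sampler), 3))
```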
<p>Simulation results would have more credibility if an adversary rather than an advocate chose the scenarios. Even so, an adversary and an advocate may share the same blind spots and not explore certain situations. Unless there&#8217;s a way to argue that a set of scenarios adequately samples the space of possible inputs, it&#8217;s hard to have a great deal of confidence in a method based on simulation results alone.</p>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2010/10/19/buggy-simulation-code-is-biased/">Buggy code is biased code</a><br />
<a href="http://www.johndcook.com/blog/2010/01/12/software-sins-of-omission/">Software sins of omission</a><br />
<a href="http://www.johndcook.com/blog/2011/01/12/occams-razor-bayes-theorem/">Occam&#8217;s razor and Bayes&#8217; theorem</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/05/10/well-understood/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Move on to the next question</title>
		<link>http://www.johndcook.com/blog/2011/05/09/does-the-answer-matter/</link>
		<comments>http://www.johndcook.com/blog/2011/05/09/does-the-answer-matter/#comments</comments>
		<pubDate>Mon, 09 May 2011 15:30:46 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8470</guid>
		<description><![CDATA[Here&#8217;s a recent discussion from Math Overflow.
Q: I have some data points and, when I plot them on R, it looks like a  normal distribution. I want to know how well my data fits the normal  distribution. What kind of test should I do?
A: There&#8217;s actually a much broader question that you should [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a recent discussion from <a href="http://mathoverflow.net/questions/63972/fit-to-a-normal-distribution">Math Overflow</a>.</p>
<blockquote><p><strong>Q</strong>: I have some data points and, when I plot them on R, it looks like a  normal distribution. I want to know how well my data fits the normal  distribution. What kind of test should I do?</p>
<p><strong>A</strong>: There&#8217;s actually a much broader question that you should be asking  yourself here: does it matter whether your data really is normally  distributed, or will the procedures that you&#8217;re going to perform on the  data be reasonably robust in the presence of a distribution that is only  approximately normal? …</p></blockquote>
<p>The person asking the question was already satisfied that his data were approximately normal. So it was time to move on to the next question: Does what I want to do next work well for approximately normal data? (There&#8217;s no point asking whether your data <em>is</em> normal; it&#8217;s not. Normality is an idealization.)</p>
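<p>Simulation is one way to answer that next question. The sketch below is a hypothetical example, not from the Math Overflow thread: it assumes the next step is a 95% <em>t</em> confidence interval for a mean, feeds it mildly skewed Gamma(10, 1) data as a stand-in for &#8220;approximately normal&#8221; data, and reports how often the interval covers the true mean. Coverage close to 0.95 suggests the procedure is robust enough for that kind of data.</p>

```python
import math
import random
import statistics

N = 50
T_975 = 2.0096  # 97.5% quantile of Student's t with N - 1 = 49 degrees of freedom

def t_interval_coverage(shape=10.0, reps=4000, seed=0):
    """Fraction of 95% t intervals that contain the mean of Gamma(shape, 1) data."""
    rng = random.Random(seed)
    true_mean = shape  # the mean of Gamma(shape, scale=1) is shape
    hits = 0
    for _ in range(reps):
        data = [rng.gammavariate(shape, 1.0) for _ in range(N)]
        half_width = T_975 * statistics.stdev(data) / math.sqrt(N)
        if half_width >= abs(statistics.fmean(data) - true_mean):
            hits += 1
    return hits / reps

print(round(t_interval_coverage(), 3))
```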
<p>We&#8217;re often tempted to add decimal places to the answer to one question instead of moving on to the next question. Maybe we don&#8217;t even realize what the next question should be. Or maybe we do know but we want to stay with the familiar. In either case, this quote from John Tukey comes to mind.</p>
<blockquote><p>An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.</p></blockquote>
<p><strong>Related post</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2010/08/11/what-distribution-does-my-data-have/">What distribution does my data have?</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/05/09/does-the-answer-matter/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Teaching Bayesian stats backward</title>
		<link>http://www.johndcook.com/blog/2011/04/20/teaching-bayesian-stats-backward/</link>
		<comments>http://www.johndcook.com/blog/2011/04/20/teaching-bayesian-stats-backward/#comments</comments>
		<pubDate>Wed, 20 Apr 2011 15:04:42 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Bayesian]]></category>
		<category><![CDATA[Education]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8362</guid>
		<description><![CDATA[Most presentations of Bayesian statistics I&#8217;ve seen start with elementary examples of Bayes&#8217; Theorem. And most of these use the canonical example of testing for rare diseases. But the connection between these examples and Bayesian statistics is not obvious at first. Maybe this isn&#8217;t the best approach.
What if we begin with the end in mind? [...]]]></description>
			<content:encoded><![CDATA[<p>Most presentations of Bayesian statistics I&#8217;ve seen start with elementary examples of Bayes&#8217; Theorem. And most of these use the canonical example of <a href="http://www.johndcook.com/rarediseases.pdf">testing for rare diseases</a>. But the connection between these examples and Bayesian statistics is not obvious at first. Maybe this isn&#8217;t the best approach.</p>
<p>What if we <a href="http://www.amazon.com/gp/product/0743269519/ref=as_li_ss_tl?ie=UTF8&amp;tag=theende-20&amp;linkCode=as2&amp;camp=217145&amp;creative=399349&amp;creativeASIN=0743269519">begin with the end in mind</a>? Bayesian calculations produce posterior probability distributions on parameters. An effective way to teach Bayesian statistics might be to start there. Suppose we had probability distributions on our parameters. Never mind where they came from. Never mind classical objections that say you can&#8217;t do this. What if you could? If you had such distributions, what could you do with them?</p>
<p>For starters, point estimation and interval estimation become trivial. You could, for example, use the distribution mean as a point estimate and the interval between two quantiles (say the 2.5% and 97.5% quantiles) as an interval estimate. The distributions tell you far more than point estimates or interval estimates could; these estimates are simply summaries of the information contained in the distributions.</p>
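<p>As a minimal sketch of this idea, suppose (hypothetically) the posterior for a success probability is Beta(8, 4), which is what 7 successes in 10 trials and a uniform prior would give. Given draws from that distribution, the point estimate, the interval estimate, and other summaries such as the probability that the parameter exceeds 0.5 each take a line or two of Python.</p>

```python
import random
import statistics

# Hypothetical posterior: Beta(8, 4), e.g. 7 successes in 10 trials with a uniform prior.
rng = random.Random(0)
draws = [rng.betavariate(8, 4) for _ in range(100_000)]

point_estimate = statistics.fmean(draws)     # posterior mean; exactly 8/12 in theory
cuts = statistics.quantiles(draws, n=40)     # cut points at 2.5%, 5%, ..., 97.5%
interval = (cuts[0], cuts[-1])               # central 95% interval
prob_above_half = sum(d > 0.5 for d in draws) / len(draws)

print(round(point_estimate, 3), [round(c, 3) for c in interval], round(prob_above_half, 3))
```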
<p>It makes logical sense to start with Bayes&#8217; Theorem since that&#8217;s the tool used to construct posterior distributions. But I think it makes <em>pedagogical</em> sense to start with the posterior distribution and work backward to how one would come up with such a thing.</p>
<p>Bayesian statistics is so named because Bayes&#8217; Theorem is essential to its calculations. But that&#8217;s a little like calling classical statistics &#8220;Central Limitist&#8221; statistics because it relies heavily on the Central Limit Theorem.</p>
<p>The key idea of Bayesian statistics is to represent all uncertainty by probability distributions. That idea can be obscured by an early emphasis on calculations.</p>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2011/02/02/david-spiegelhalter/">Interview with David Spiegelhalter</a><br />
<a href="http://www.johndcook.com/blog/2011/01/12/occams-razor-bayes-theorem/">Occam&#8217;s razor and Bayes&#8217; theorem</a><br />
<a href="http://www.johndcook.com/blog/2009/04/28/reasons-to-use-bayesian-inference/">Four reasons to use Bayesian inference</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/04/20/teaching-bayesian-stats-backward/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Significance testing and Congress</title>
		<link>http://www.johndcook.com/blog/2011/04/14/significance-testing-and-congress/</link>
		<comments>http://www.johndcook.com/blog/2011/04/14/significance-testing-and-congress/#comments</comments>
		<pubDate>Thu, 14 Apr 2011 13:55:51 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Bayesian]]></category>
		<category><![CDATA[Probability and Statistics]]></category>
		<category><![CDATA[Science]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8310</guid>
		<description><![CDATA[The US Supreme Court&#8217;s criticism of significance testing has been in the news lately. Here&#8217;s a criticism of significance testing involving the US Congress. Consider the following syllogism.

If a person is an American, he is not a member of Congress.
This person is a member of Congress.
Therefore he is not American.

The initial premise is false, but [...]]]></description>
			<content:encoded><![CDATA[<p>The US Supreme Court&#8217;s criticism of significance testing has been in the news lately. Here&#8217;s a criticism of significance testing involving the US Congress. Consider the following syllogism.</p>
<ol>
<li>If a person is an American, he is not a member of Congress.</li>
<li>This person is a member of Congress.</li>
<li>Therefore he is not American.</li>
</ol>
<p>The initial premise is false, but the reasoning is correct if we assume the initial premise is true.</p>
<p>The premise that Americans are never members of Congress is clearly false. But it&#8217;s almost true! The probability of an American being a member of Congress is quite small, about 535/309,000,000. So what happens if we try to salvage the syllogism above by inserting &#8220;probably&#8221; in the initial premise and conclusion?</p>
<ol>
<li>If a person is an American, he is <strong>probably</strong> not a member of Congress.</li>
<li>This person is a member of Congress.</li>
<li>Therefore he is <strong>probably</strong> not American.</li>
</ol>
<p>What went wrong? The probability is backward. We want to know the probability that someone is American given he is a member of Congress, not the probability he is a member of Congress given he is American.</p>
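<p>A short Python sketch of the reversal, using Bayes&#8217; theorem. The 535/309,000,000 figure is from the post; the prior probability that a person is American is a hypothetical placeholder, and its value does not matter here because no non-American is a member of Congress.</p>

```python
def bayes_posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """P(H | E) from P(H), P(E | H) and P(E | not H), by Bayes' theorem."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e


# H = "is American", E = "is a member of Congress".
p_congress_given_american = 535 / 309_000_000  # figure from the post
p_congress_given_not_american = 0.0           # non-Americans do not serve in Congress
prior_american = 0.045                        # hypothetical; any value in (0, 1) gives the same answer

p_american_given_congress = bayes_posterior(
    prior_american, p_congress_given_american, p_congress_given_not_american
)
print(p_congress_given_american)  # about 1.7e-06
print(p_american_given_congress)  # 1.0
```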
<p>Science continually uses flawed reasoning analogous to the example above. We start with a &#8220;null hypothesis,&#8221; a hypothesis we seek to disprove. If our data are highly unlikely assuming this hypothesis, we reject that hypothesis.</p>
<ol>
<li>If the null hypothesis is correct, then these data are highly unlikely.</li>
<li>These data have occurred.</li>
<li>Therefore, the null hypothesis is highly unlikely.</li>
</ol>
<p>Again the probability is backward. We want to know the probability of the <em>hypothesis</em> given the <em>data</em>, not the probability of the <em>data</em> given the <em>hypothesis</em>.</p>
<p>We can&#8217;t reject a null hypothesis just because we&#8217;ve seen data that are rare under this hypothesis. Maybe our data are even more rare under the alternative. It is rare for an American to be in Congress, but it is even more rare for someone who is not American to be in the US Congress!</p>
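<p>The comparison with the alternative can be made explicit with the odds form of Bayes&#8217; theorem: posterior odds equal prior odds times the ratio of the probabilities of the data under the two hypotheses. In the Python sketch below, the probabilities and the even prior odds are hypothetical numbers chosen only for illustration.</p>

```python
def posterior_null(prior_null, p_data_given_null, p_data_given_alt):
    """Posterior probability of the null hypothesis, via the odds form of Bayes' theorem."""
    prior_odds = prior_null / (1 - prior_null)
    bayes_factor = p_data_given_null / p_data_given_alt
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


# Hypothetical numbers; prior odds are even in both cases.
rare_under_both = posterior_null(0.5, 0.01, 0.001)    # data rarer still under the alternative
rare_only_under_null = posterior_null(0.5, 0.01, 0.2)
print(round(rare_under_both, 3))       # 0.909
print(round(rare_only_under_null, 3))  # 0.048
```

<p>With the same 1% probability of the data under the null, the null becomes more probable after seeing the data when the alternative makes those data rarer still, and less probable only when the alternative makes them more likely.</p>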
<p>I found this illustration in <a href="http://faculty.washington.edu/tgill/Earth%20is%20Round%20p%2005.pdf">The Earth is Round (p &lt; 0.05)</a> by Jacob Cohen (1994). Cohen in turn credits Pollard and Richardson (1987) in his references.</p>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2011/04/13/pericchi-statistical-significance/">How insignificant is significance testing?</a><br />
<a href="http://www.johndcook.com/blog/2008/11/18/five-criticisms-of-significance-testing/">Five criticisms of significance testing</a><br />
<a href="http://www.johndcook.com/blog/2008/02/07/most-published-research-results-are-false/">Most published research results are false</a><br />
<a href="http://www.johndcook.com/blog/2009/05/04/classical-statistics-in-a-nutshell/">Classical statistics in a nutshell</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/04/14/significance-testing-and-congress/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>How insignificant is statistical significance?</title>
		<link>http://www.johndcook.com/blog/2011/04/13/pericchi-statistical-significance/</link>
		<comments>http://www.johndcook.com/blog/2011/04/13/pericchi-statistical-significance/#comments</comments>
		<pubDate>Wed, 13 Apr 2011 20:48:56 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8297</guid>
		<description><![CDATA[Luis Pericchi sent me a brief note commenting on the recent US Supreme Court decision involving statistical significance and medical reporting. Here is his paper, about a page and a half.
How insignificant is statistical significance? (PDF)
Related post: Significance testing and Congress
]]></description>
			<content:encoded><![CDATA[<p>Luis Pericchi sent me a brief note commenting on the recent <a href="http://online.wsj.com/article/SB10001424052748703712504576235683249040812.html">US Supreme Court decision</a> involving statistical significance and medical reporting. Here is his paper, about a page and a half.</p>
<p style="padding-left: 30px;"><a href="http://www.johndcook.com/SupremeCourtRuling2.pdf">How insignificant is statistical significance?</a> (PDF)</p>
<p>Related post: <a href="http://www.johndcook.com/blog/2011/04/14/significance-testing-and-congress/">Significance testing and Congress</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/04/13/pericchi-statistical-significance/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Saved by symmetry</title>
		<link>http://www.johndcook.com/blog/2011/03/31/saved-by-symmetry/</link>
		<comments>http://www.johndcook.com/blog/2011/03/31/saved-by-symmetry/#comments</comments>
		<pubDate>Thu, 31 Mar 2011 15:24:44 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Integration]]></category>
		<category><![CDATA[Math]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8178</guid>
		<description><![CDATA[When I solve a problem by appealing to symmetry, students&#8217; jaws drop. They look at me as if I&#8217;d pulled a rabbit out of a hat. 
I used to think of these tricks as common knowledge, but now I think they&#8217;re common knowledge in some circles (e.g. physics) and not as common in others. These tricks [...]]]></description>
			<content:encoded><![CDATA[<p>When I solve a problem by appealing to symmetry, students&#8217; jaws drop. They look at me as if I&#8217;d pulled a rabbit out of a hat. </p>
<p>I used to think of these tricks as common knowledge, but now I think they&#8217;re common knowledge in some circles (e.g. physics) and not as common in others. These tricks are simple, but not as many people as I&#8217;d thought have been trained to spot opportunities to apply them.</p>
<p><span id="more-8178"></span></p>
<p>Here&#8217;s an example. </p>
<p><strong>Pop quiz 1</strong>: Evaluate the following intimidating integral.</p>
<p style="text-align:center"><img src="http://www.johndcook.com/odd_integrand.png" alt="\int_{-\infty}^\infty x \log(1 + x^2) e^{-x^2}\, dx" width="178" height="42" /></p>
<p><strong>Solution</strong>: Zero, by symmetry, because the integrand is odd. </p>
<p>The integrand is an odd function (i.e. f(-x) = -f(x)), and the integral of an odd function over a symmetric interval is zero (on an infinite interval, provided the integral converges, as it does here because of the factor exp(-<em>x</em><sup>2</sup>)). This is because the region below the <em>x</em>-axis is symmetric to the region above the <em>x</em>-axis as the following graph shows.</p>
<p style="text-align:center"><img src="http://www.johndcook.com/odd_graph.png" alt="plot of x \log(1 + x^2) exp(-x^2)" width="360" height="223" /></p>
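<p>A quick numerical check, as a Python sketch: integrate each half of the real line separately with a simple midpoint rule, truncating at |x| = 10 (an assumption justified by the exp(-<em>x</em><sup>2</sup>) factor, which makes the integrand negligible beyond that point). Each half is clearly nonzero, but the two halves cancel.</p>

```python
import math


def integrand(x):
    # x log(1 + x^2) exp(-x^2): an odd function of x
    return x * math.log1p(x * x) * math.exp(-x * x)


def midpoint_rule(f, a, b, n=200_000):
    """Approximate the integral of f over [a, b] with n midpoint panels."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# The integrand is below 1e-40 for |x| beyond 10, so truncating there is harmless.
left = midpoint_rule(integrand, -10.0, 0.0)
right = midpoint_rule(integrand, 0.0, 10.0)
print(right)         # about 0.298: each half is far from zero
print(left + right)  # about 0, up to rounding
```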
<p>The example above is not artificial: similar calculations come up constantly in applications. For example, the Fourier series of an odd function contains only sine terms, no cosine terms. Why? The integrals to compute the Fourier coefficients for the cosine terms all involve odd functions integrated over an interval symmetric about zero.</p>
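<p>As an illustration (a sketch, not a production Fourier routine), the following Python computes the first few Fourier coefficients of the odd function f(x) = x on [-π, π] by numerical integration. The cosine coefficients come out as zero up to rounding, while the sine coefficients match the known values 2(-1)<sup>k+1</sup>/k.</p>

```python
import math


def midpoint_rule(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with n midpoint panels."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def fourier_coefficients(f, k):
    """Cosine and sine coefficients a_k, b_k of f on [-pi, pi]."""
    a_k = midpoint_rule(lambda x: f(x) * math.cos(k * x), -math.pi, math.pi) / math.pi
    b_k = midpoint_rule(lambda x: f(x) * math.sin(k * x), -math.pi, math.pi) / math.pi
    return a_k, b_k


# f(x) = x is odd, so f(x) cos(kx) is odd and every a_k should vanish.
for k in range(1, 5):
    a_k, b_k = fourier_coefficients(lambda x: x, k)
    print(k, round(a_k, 6), round(b_k, 6))
```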
<p>Another common application of symmetry is evaluating derivatives. The derivative of an odd function is an even function and <em>vice versa</em>.</p>
<p><strong>Pop quiz 2</strong>: What is the coefficient of <em>x</em><sup>5</sup> in the Taylor series of cos(1 + x<sup>2</sup>)?</p>
<p><strong>Solution</strong>: Zero, by symmetry, because cos(1 + x<sup>2</sup>) is an even function.</p>
<p>Odd functions of <em>x</em> have only odd powers of x in their Taylor series and even functions have only even powers of <em>x</em> in their Taylor series. Why? Because the coefficients come from derivatives evaluated at zero. </p>
<p>If <em>f</em>(<em>x</em>) is an odd function, all of its even derivatives are odd functions. These derivatives are zero at <em>x</em> = 0, and so all the coefficients of even powers of <em>x</em> are zero. A similar argument shows that even functions have only even powers of <em>x</em> in their Taylor series.</p>
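<p>One way to check Pop quiz 2 numerically, without expanding anything by hand, is the Cauchy integral formula, which expresses each Taylor coefficient as an integral of <em>f</em> around a circle centered at 0. The Python sketch below discretizes that integral with the trapezoidal rule; the radius 0.5 and the 64 nodes are arbitrary choices that work well for an entire function like cos(1 + <em>x</em><sup>2</sup>).</p>

```python
import cmath
import math


def taylor_coefficients(f, count, radius=0.5, points=64):
    """Approximate the Taylor coefficients a_0 .. a_{count-1} of f about 0.

    Applies the Cauchy integral formula on the circle |z| = radius and
    discretizes it with the trapezoidal rule at `points` equally spaced nodes.
    f must accept complex arguments and be analytic on and inside the circle.
    """
    nodes = [radius * cmath.exp(2j * math.pi * m / points) for m in range(points)]
    values = [f(z) for z in nodes]
    coeffs = []
    for k in range(count):
        total = sum(v * cmath.exp(-2j * math.pi * m * k / points) for m, v in enumerate(values))
        # f is real on the real axis, so its coefficients are real; drop rounding noise in .imag
        coeffs.append((total / points / radius**k).real)
    return coeffs


coeffs = taylor_coefficients(lambda z: cmath.cos(1 + z * z), 7)
for k, c in enumerate(coeffs):
    print(k, f"{c:+.12f}")
```

<p>The coefficients of <em>x</em>, <em>x</em><sup>3</sup>, and <em>x</em><sup>5</sup> come out as zero up to rounding error, and the even ones match cos(1), -sin(1), -cos(1)/2, and sin(1)/6.</p>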
<p>Symmetry tricks are obvious in hindsight. The hard part is learning to recognize when they apply. Symmetries are harder to recognize, but also more valuable, in complex situations. The key is to think about the problem you&#8217;re trying to solve before you dive into heads-down calculation.</p>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2008/10/14/api-symmetry/">API symmetry</a><br />
<a href="http://www.johndcook.com/blog/2010/03/15/adding-simplicity/">Adding simplicity</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/03/31/saved-by-symmetry/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>A support one-liner</title>
		<link>http://www.johndcook.com/blog/2011/03/15/a-support-one-liner/</link>
		<comments>http://www.johndcook.com/blog/2011/03/15/a-support-one-liner/#comments</comments>
		<pubDate>Tue, 15 Mar 2011 16:11:45 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=8078</guid>
		<description><![CDATA[This morning I had a fun support request related to our software. The exchange took place over email but it could have fit into a couple Twitter messages. Would that all requests could be answered so succinctly.
Question:
Do you have R code to compute P(X &#62; Y) where X ~ gamma(ax, bx) and Y ~ gamma(ay, [...]]]></description>
			<content:encoded><![CDATA[<p>This morning I had a fun support request related to our software. The exchange took place over email but it could have fit into a couple Twitter messages. Would that all requests could be answered so succinctly.</p>
<p>Question:</p>
<blockquote><p>Do you have R code to compute P(X &gt; Y) where X ~ gamma(ax, bx) and Y ~ gamma(ay, by)?</p></blockquote>
<p>Response:</p>
<blockquote><p>ineq &lt;- function(ax, bx, ay, by) pbeta(bx/(bx+by), ay, ax)</p></blockquote>
<p>For more on the problem and the solution, see <a href="http://www.bepress.com/mdandersonbiostat/paper54/">Exact calculation of inequality probabilities</a>.</p>
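<p>The one-liner can be checked against simulation. The sketch below is in Python rather than R and makes two assumptions of its own: bx and by are scale parameters (the parameterization under which the formula holds), and the shape parameters are integers, so the incomplete beta function can be written as a binomial tail sum using only the standard library. The names beta_cdf_int and prob_x_exceeds_y are hypothetical.</p>

```python
import math
import random


def beta_cdf_int(x, a, b):
    """Regularized incomplete beta I_x(a, b) for positive integers a, b.

    For integer shapes, I_x(a, b) is the probability that a
    Binomial(a + b - 1, x) variable is at least a.
    """
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def prob_x_exceeds_y(ax, bx, ay, by):
    """P(X > Y) for independent X ~ Gamma(shape ax, scale bx), Y ~ Gamma(shape ay, scale by).

    Mirrors the R one-liner pbeta(bx/(bx+by), ay, ax), restricted to integer shapes.
    """
    return beta_cdf_int(bx / (bx + by), ay, ax)


# Monte Carlo check; random.gammavariate(alpha, beta) takes shape alpha and scale beta.
ax, bx, ay, by = 3, 2.0, 5, 1.0
exact = prob_x_exceeds_y(ax, bx, ay, by)
rng = random.Random(1)
trials = 200_000
hits = sum(rng.gammavariate(ax, bx) > rng.gammavariate(ay, by) for _ in range(trials))
print(exact, hits / trials)  # exact value is 1248/2187, about 0.571; the simulation should be close
```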
<p><strong>Related links</strong>:</p>
<p><a href="https://biostatistics.mdanderson.org/SoftwareDownload/SingleSoftware.aspx?Software_Id=9">Inequality Calculator software</a><br />
<a href="http://www.johndcook.com/blog/2009/11/20/random-inequalities-ix/">Blog posts on random inequalities</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/03/15/a-support-one-liner/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Absence of evidence</title>
		<link>http://www.johndcook.com/blog/2011/02/22/absence-of-evidence/</link>
		<comments>http://www.johndcook.com/blog/2011/02/22/absence-of-evidence/#comments</comments>
		<pubDate>Tue, 22 Feb 2011 13:54:40 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=7888</guid>
		<description><![CDATA[Here&#8217;s a little saying that irritates me:
Absence of evidence is not evidence of absence.
It&#8217;s the kind of thing a Sherlock Holmes-like character might say in a detective novel. The idea is that we can&#8217;t be sure something doesn&#8217;t exist just because we haven&#8217;t seen it yet.
What bothers me is that the statement misuses the word [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a little saying that irritates me:</p>
<blockquote><p>Absence of evidence is not evidence of absence.</p></blockquote>
<p>It&#8217;s the kind of thing a Sherlock Holmes-like character might say in a detective novel. The idea is that we can&#8217;t be sure something doesn&#8217;t exist just because we haven&#8217;t seen it yet.</p>
<p>What bothers me is that the statement misuses the word &#8220;evidence.&#8221; The statement would be correct if we substituted &#8220;proof&#8221; for &#8220;evidence.&#8221; We can&#8217;t conclude with absolute certainty that something doesn&#8217;t exist just because we haven&#8217;t yet proved that it does. But <em>evidence</em> is not the same as <em>proof</em>.</p>
<p>Why do we believe that dodo birds are extinct? Because no one has seen one in three centuries. That is, there is an absence of evidence that they exist. That is tantamount to evidence that they do not exist. It&#8217;s logically possible that a dodo bird is alive and well somewhere, but there is overwhelming evidence to suggest this is not the case.</p>
<p>Evidence can lead to the wrong conclusion. Why did scientists believe that the coelacanth was extinct? Because no one had seen one except in fossils. The species was believed to have gone extinct 65 million years ago. But in 1938 a fisherman caught one. Absence of evidence is not <em>proof</em> of absence.</p>
<p style="text-align:center"><img src="http://www.johndcook.com/coelacanth.jpeg" alt="coelacanth, a fish once thought to be extinct" width="400" height="145" /></p>
<p>Though it is not proof, absence of evidence is <strong>unusually strong</strong> evidence due to a subtle statistical result. Compare the following two scenarios.</p>
<p><strong>Scenario 1</strong>: You&#8217;ve sequenced the DNA of a large number of prostate tumors and found that not one had a particular genetic mutation. How confident can you be that prostate tumors never have this mutation?</p>
<p><strong>Scenario 2</strong>: You&#8217;ve found that 40% of prostate tumors in your sample have a particular mutation. How confident can you be that 40% of all prostate tumors have this mutation?</p>
<p>It turns out you can have more confidence in the first scenario than the second. If you&#8217;ve tested <em>N</em> subjects and not found the mutation, the length of your confidence interval around zero is inversely proportional to <em>N</em>. But if you&#8217;ve tested <em>N</em> subjects and found the mutation in 40% of subjects, the length of your confidence interval around 0.40 is inversely proportional to √<em>N</em>. So, for example, if <em>N</em> = 10,000 then the former interval has length on the order of 1/10,000 while the latter interval has length on the order of 1/100. The bound in the first scenario is known as the <strong>rule of three</strong>: if none of <em>N</em> subjects has the mutation, a 95% confidence interval for its frequency runs from 0 to approximately 3/<em>N</em>. You can find both a frequentist and a Bayesian justification of the rule <a href="http://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/">here</a>.</p>
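<p>The two interval lengths are easy to compute. The Python sketch below uses the exact bound for zero events, which solves (1 &#8211; <em>p</em>)<sup><em>N</em></sup> = 0.05, and, as an assumed stand-in for the second scenario, the simple normal-approximation (Wald) interval for a proportion.</p>

```python
import math
from statistics import NormalDist


def zero_count_upper_bound(n, alpha=0.05):
    """Exact upper confidence bound for p after 0 positives in n trials.

    Solves (1 - p)**n = alpha for p. For alpha = 0.05 this is about 3/n,
    which is the rule of three.
    """
    return -math.expm1(math.log(alpha) / n)


def wald_interval_length(p_hat, n, level=0.95):
    """Length of the normal-approximation (Wald) interval around p_hat."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return 2 * z * math.sqrt(p_hat * (1 - p_hat) / n)


n = 10_000
print(zero_count_upper_bound(n), 3 / n)  # both about 0.0003
print(wald_interval_length(0.40, n))     # about 0.019
```

<p>Quadrupling <em>N</em> cuts the first length by a factor of four but the second only by a factor of two.</p>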
<p>Absence of evidence is unusually strong evidence that something is at least rare, though it&#8217;s not proof. Sometimes you catch a coelacanth.</p>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2010/03/30/statistical-rule-of-three/">Estimating the chances of something that hasn&#8217;t happened</a><br />
<a href="http://www.johndcook.com/blog/2008/01/10/complementary-validation/">Complementary validation</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/02/22/absence-of-evidence/feed/</wfw:commentRss>
		<slash:comments>27</slash:comments>
		</item>
		<item>
		<title>Like Laplace, only more so</title>
		<link>http://www.johndcook.com/blog/2011/02/17/like-laplace-only-more-so/</link>
		<comments>http://www.johndcook.com/blog/2011/02/17/like-laplace-only-more-so/#comments</comments>
		<pubDate>Thu, 17 Feb 2011 15:21:59 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Math]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Bayesian]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=7834</guid>
		<description><![CDATA[The Laplace distribution is pointy in the middle and fat in the tails relative to the normal distribution. This post is about a probability distribution that is more pointy in the middle and fatter in the tails.

Here are pictures of the normal and Laplace (a.k.a. double exponential) distributions.
[Figures: plots of the normal and Laplace densities]
The normal density is proportional to exp(-x^2/2) and [...]]]></description>
			<content:encoded><![CDATA[<p>The Laplace distribution is pointy in the middle and fat in the tails relative to the normal distribution. This post is about a probability distribution that is more pointy in the middle and fatter in the tails.</p>
<p><span id="more-7834"></span></p>
<p>Here are pictures of the normal and Laplace (a.k.a. double exponential) distributions.</p>
<p>Normal:</p>
<p style="text-align:center"><img src="http://www.johndcook.com/normal_022011.png" alt="" width="360" height="235" /></p>
<p>Laplace:</p>
<p style="text-align:center"><img src="http://www.johndcook.com/laplace_022011.png" alt="" width="360" height="235" /></p>
<p>The normal density is proportional to exp(-<em>x</em><sup>2</sup>/2) and the Laplace density is proportional to exp(-|<em>x</em>|). Near the origin, the normal density looks like 1 &#8211; <em>x</em><sup>2</sup>/2 and the Laplace density looks like 1 &#8211; |<em>x</em>|. And as <em>x</em> gets large, the normal density goes to zero much faster than the Laplace.</p>
<p>Now let&#8217;s look at the distribution with density proportional to</p>
<p style="text-align: center;"><img src="http://www.johndcook.com/log1pxn2.png" alt="\log\left( 1 + \frac{1}{x^2} \right)" width="153" height="42" /></p>
<p>I don&#8217;t know a name for this. Since log(1 + <em>x</em><sup>-2</sup>) integrates to 2π over the real line, the normalized density is log(1 + <em>x</em><sup>-2</sup>)/(2π). I asked on <a href="http://stats.stackexchange.com/questions/7029/does-the-distribution-log1-x-2-2-pi-have-a-name">Cross Validated</a> whether there was a name for this distribution and no one knew of one. The density is related to the bounds on a density presented in <a href="http://dx.doi.org/10.1093/biomet/asq017">this paper</a>. Here&#8217;s a plot.</p>
<p style="text-align:center"><img src="http://www.johndcook.com/log1p_022011.png" alt="" width="360" height="235" /></p>
<p>The density is unbounded near the origin, blowing up like -2 log( |<em>x</em>| ) as <em>x</em> approaches 0, and so is more pointed than the Laplace density. As x becomes large, log(1 + <em>x</em><sup>-2</sup>) is asymptotically <em>x</em><sup>-2</sup> so the distribution has the same tail behavior as a Cauchy distribution, much heavier tailed than the Laplace density.</p>
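<p>Both limits are easy to see numerically. This Python sketch compares the unnormalized density log(1 + <em>x</em><sup>-2</sup>) with -2 log|<em>x</em>| near zero and with <em>x</em><sup>-2</sup> far from zero; the printed ratios approach 1.</p>

```python
import math


def unnormalized_density(x):
    # log(1 + 1/x^2), the density up to its normalizing constant
    return math.log1p(1 / (x * x))


# Near 0 the density should track -2 log|x|; far out it should track x^(-2).
for x in (1e-2, 1e-4, 1e-8):
    print(x, unnormalized_density(x) / (-2 * math.log(abs(x))))
for x in (1e2, 1e4, 1e8):
    print(x, unnormalized_density(x) * x * x)
```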
<p>Here&#8217;s a plot of this new density and the Laplace density together to make the contrast more clear.</p>
<p style="text-align:center"><img src="http://www.johndcook.com/laplace2.png" alt="" width="360" height="235" /></p>
<p>As William Huber pointed out in his answer on Cross Validated, this density has a closed-form CDF:</p>
<p style="text-align: center;">F(<em>x</em>) = 1/2 + (arctan(<em>x</em>) &#8211; <em>x</em> log| sin( arctan(<em>x</em>) ) |)/π</p>
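<p>Because sin(arctan(<em>x</em>)) has the same sign as <em>x</em>, the term &#8211;<em>x</em> log|sin(arctan(<em>x</em>))| simplifies to (<em>x</em>/2) log(1 + <em>x</em><sup>-2</sup>), which is defined for negative <em>x</em> as well. The Python sketch below uses that form, checks it against numerical integration of the normalized density log(1 + <em>x</em><sup>-2</sup>)/(2π), and compares the probability beyond |<em>x</em>| = 10 with that of the Laplace density exp(-|<em>x</em>|)/2.</p>

```python
import math


def density(x):
    """Normalized density log(1 + 1/x^2) / (2 pi)."""
    return math.log1p(1 / (x * x)) / (2 * math.pi)


def cdf(x):
    """Closed-form CDF, using -x log|sin(arctan x)| = (x/2) log(1 + 1/x^2)."""
    if x == 0:
        return 0.5
    return 0.5 + (math.atan(x) + 0.5 * x * math.log1p(1 / (x * x))) / math.pi


def midpoint_rule(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with n midpoint panels."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# The closed form should match direct integration (kept away from the singularity at 0).
print(cdf(3.0) - cdf(0.5), midpoint_rule(density, 0.5, 3.0))
# Probability beyond |x| = 10 for this density versus the Laplace density exp(-|x|)/2.
print(2 * (1 - cdf(10.0)), math.exp(-10))  # about 0.032 versus about 4.5e-05
```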
<p>The paper mentioned above used a similar density as a Bayesian prior distribution in situations where many observations were expected to be small, though large values were expected as well.</p>
<p><strong>Related posts:</strong></p>
<p><a href="http://www.johndcook.com/distribution_chart.html">Probability distribution relationship chart</a><br />
<a href="http://www.johndcook.com/blog/2010/08/30/robust-prior-illustration/">Robust prior illustration</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/02/17/like-laplace-only-more-so/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>The end of hard-edged science?</title>
		<link>http://www.johndcook.com/blog/2011/02/14/the-end-of-hard-edged-science/</link>
		<comments>http://www.johndcook.com/blog/2011/02/14/the-end-of-hard-edged-science/#comments</comments>
		<pubDate>Mon, 14 Feb 2011 14:06:02 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Bayesian]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=7798</guid>
		<description><![CDATA[Bradley Efron says that science is moving away from things like predicting sunrise times and toward predicting things like the weather. The trend is away from studying precisely predictable systems, what Efron calls &#8220;hard-edged science,&#8221; and toward studying systems &#8220;where predictability is tempered by a heavy dose of randomness.&#8221;
Hard-edged science still dominates public perceptions, but [...]]]></description>
			<content:encoded><![CDATA[<p>Bradley Efron says that science is moving away from things like predicting sunrise times and toward predicting things like the weather. The trend is away from studying precisely predictable systems, what Efron calls &#8220;hard-edged science,&#8221; and toward studying systems &#8220;where predictability is tempered by a heavy dose of randomness.&#8221;</p>
<blockquote><p>Hard-edged science still dominates public perceptions, but the attention of modern scientists has swung heavily toward rainfall-like subjects, the kind where random behavior plays a major role. … Deterministic Newtonian science is majestic, and the basis of modern science too, but a few hundred years of it pretty much exhausted nature’s storehouse of precisely predictable events. Subjects like biology, medicine, and economics require a more flexible scientific world view, the kind we statisticians are trained to understand.</p></blockquote>
<p>Certainly there is increased interest in systems containing &#8220;a heavy dose of randomness&#8221; but can we really say that we have &#8220;pretty much exhausted nature&#8217;s storehouse of precisely predictable events&#8221;?</p>
<p>Source: <a href="http://www-stat.stanford.edu/~ckirby/brad/papers/2005NEWModernScience.pdf">Modern Science and the Bayesian-Frequentist Controversy</a></p>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2011/01/17/scientific-results-fading-over-time/">Scientific results fading over time</a><br />
<a href="http://www.johndcook.com/blog/2011/01/12/occams-razor-bayes-theorem/">Occam&#8217;s razor and Bayes&#8217; theorem</a><br />
<a href="http://www.johndcook.com/blog/2010/02/25/the-law-of-medium-numbers/">The law of medium numbers</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/02/14/the-end-of-hard-edged-science/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Interview with David Spiegelhalter</title>
		<link>http://www.johndcook.com/blog/2011/02/02/david-spiegelhalter/</link>
		<comments>http://www.johndcook.com/blog/2011/02/02/david-spiegelhalter/#comments</comments>
		<pubDate>Thu, 03 Feb 2011 04:25:33 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Bayesian]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=7721</guid>
		<description><![CDATA[Samuel Hansen interviews David Spiegelhalter on his mathematical podcast Strongly Connected Components. From the show notes:
On today’s episode of Strongly Connected Components Samuel Hansen called up the Winton Professor for the Public Understanding of Risk, as well as Senior Scientist in the MRC Biostatistics Unit, David Spiegelhalter. They discussed the true meaning [...]]]></description>
			<content:encoded><![CDATA[<p>Samuel Hansen interviews David Spiegelhalter on his mathematical podcast <a href="http://acmescience.com/shows/scc-shows/708">Strongly Connected Components</a>. From the show notes:</p>
<blockquote><p>On today’s episode of Strongly Connected Components Samuel Hansen called up the Winton Professor for the Public Understanding of Risk, as well as Senior Scientist in the MRC Biostatistics Unit, David Spiegelhalter. They discussed the true meaning of risk, the importance of the Bayesian Method, how to get a lot of citations, and even a bit about the bookies.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/02/02/david-spiegelhalter/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>When it works, it works really well</title>
		<link>http://www.johndcook.com/blog/2011/01/27/when-it-works-it-works-really-well/</link>
		<comments>http://www.johndcook.com/blog/2011/01/27/when-it-works-it-works-really-well/#comments</comments>
		<pubDate>Thu, 27 Jan 2011 13:30:37 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=7670</guid>
		<description><![CDATA[Stephen Stigler [1] compares least-squares methods to the iPhone:
In the United States many consumers are entranced by the magic of the new iPhone, even though they can only use it with the AT&#38;T system, a system noted for spotty coverage &#8212; even no receivable signal at all under some conditions. But the magic available when [...]]]></description>
			<content:encoded><![CDATA[<p>Stephen Stigler [1] compares least-squares methods to the iPhone:</p>
<blockquote><p>In the United States many consumers are entranced by the magic of the new iPhone, even though they can only use it with the AT&amp;T system, a system noted for spotty coverage &#8212; even no receivable signal at all under some conditions. But the magic available when it does work overwhelms the very real shortcomings. Just so, least-squares will remain the tool of choice unless someone concocts a robust methodology that can perform the same magic, a step that would require the suspension of the laws of mathematics.</p></blockquote>
<p>In other words, least-squares, like the iPhone, <strong>works so well when it does work that it&#8217;s OK that it fails miserably now and then</strong>. Maybe so, but that depends on context.</p>
<p>In his quote, Stigler argues that Americans feel that missing a phone call occasionally is an acceptable trade-off for the features of the iPhone. Many people would agree. But if you&#8217;re on a transplant waiting list, you might prefer more reliable coverage to a nicer phone.</p>
<p>It&#8217;s not enough to talk about <em>probabilities</em> of failure without also talking about <em>consequences</em> of failure. For example, the consequences of missing a phone call are greater for some people than for others.</p>
<p>Least-squares is a mathematically convenient way to place a cost on errors: the cost is proportional to the square of the size of the error. That&#8217;s often reasonable in applications, but not always. In some applications, the cost is simply proportional to the size of the error. In other applications, it doesn&#8217;t matter how large an error is once it is above some threshold. Sometimes the cost of errors is asymmetric: over-estimating has a different cost than under-estimating by the same amount. Sometimes you&#8217;re more worried about the worst case than the average case. One size does not fit all.</p>
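<p>To see how much the choice of cost matters, here is a small Python sketch with hypothetical data containing one large value. It finds, by brute-force search over a grid, the single estimate that minimizes total cost under four of the cost functions just described; the asymmetric cost, which charges three times as much for over-estimates as for under-estimates, is an arbitrary illustrative choice.</p>

```python
def best_constant(cost, data, grid):
    """Grid value t that minimizes the total cost of predicting every y in data by t."""
    return min(grid, key=lambda t: cost(t, data))


def squared(t, ys):
    return sum((t - y) ** 2 for y in ys)


def absolute(t, ys):
    return sum(abs(t - y) for y in ys)


def asymmetric(t, ys, over=3.0, under=1.0):
    # over-estimating y costs `over` per unit, under-estimating costs `under` per unit
    return sum(over * max(t - y, 0.0) + under * max(y - t, 0.0) for y in ys)


def worst_case(t, ys):
    return max(abs(t - y) for y in ys)


data = [1, 2, 3, 4, 100]                 # hypothetical data with one large value
grid = [i / 100 for i in range(10_001)]  # candidate estimates 0.00, 0.01, ..., 100.00
for name, cost in [("squared", squared), ("absolute", absolute),
                   ("asymmetric", asymmetric), ("worst case", worst_case)]:
    print(name, best_constant(cost, data, grid))
```

<p>The squared cost lands on the mean (22), the absolute cost on the median (3), the asymmetric cost on a lower quantile (2), and the worst-case cost on the midrange (50.5).</p>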
<p>[1] Stephen M. Stigler, The Changing History of Robustness, American Statistician, Vol. 64, No. 4. November 2010. (Written before Verizon announced it would be supporting the iPhone)</p>
<p><strong>Related posts</strong>:</p>
<p><a href="http://www.johndcook.com/blog/2011/01/24/more-theoretical-power-less-real-power/">More theoretical power, less real power</a><br />
<a href="http://www.johndcook.com/blog/2009/01/28/cost-benefit-analysis-versus-benefit-only-analysis/">Cost-benefit analysis versus benefit-only analysis</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/01/27/when-it-works-it-works-really-well/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>More theoretical power, less real power</title>
		<link>http://www.johndcook.com/blog/2011/01/24/more-theoretical-power-less-real-power/</link>
		<comments>http://www.johndcook.com/blog/2011/01/24/more-theoretical-power-less-real-power/#comments</comments>
		<pubDate>Mon, 24 Jan 2011 15:31:19 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=7646</guid>
		<description><![CDATA[Suppose you&#8217;re deciding between two statistical methods. You pick the one that has more power. This increases your chances of making a correct decision in theory while possibly lowering your chances of actually concluding the truth. The subtle trap is that the meaning of &#8220;in theory&#8221; changes because you have two competing theories.
When you compare [...]]]></description>
			<content:encoded><![CDATA[<p>Suppose you&#8217;re deciding between two statistical methods. You pick the one that has more power. This increases your chances of making a correct decision <em>in theory</em> while possibly lowering your chances of actually concluding the truth. The subtle trap is that the meaning of &#8220;in theory&#8221; changes because you have two competing theories.</p>
<p>When you compare the power of two methods, you&#8217;re evaluating each method&#8217;s probability of success <em>under its own assumptions</em>. In other words, <strong>you&#8217;re picking the method that has the better opinion of itself</strong>. Thus the more powerful method is not necessarily the method that has the better chance of leading you to a correct conclusion.</p>
<p>Comparing power alone is not enough. You also need to evaluate whether a method makes realistic assumptions and whether it is robust to deviations from its assumptions.</p>
<p><strong>Related posts</strong>:<br />
<a href="http://www.johndcook.com/blog/2008/02/07/most-published-research-results-are-false/">Most published research results are false</a><br />
<a href="http://www.johndcook.com/blog/2009/03/11/robust-statistics/">Canonical examples from robust statistics</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/01/24/more-theoretical-power-less-real-power/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>A couple preprints</title>
		<link>http://www.johndcook.com/blog/2011/01/20/a-couple-preprints/</link>
		<comments>http://www.johndcook.com/blog/2011/01/20/a-couple-preprints/#comments</comments>
		<pubDate>Thu, 20 Jan 2011 14:47:51 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[Clinical trials]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Bayesian]]></category>
		<category><![CDATA[Biostatistics]]></category>
		<category><![CDATA[Probability and Statistics]]></category>

		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=7626</guid>
		<description><![CDATA[Here are a couple new preprints.
Block-adaptive randomization.
A proposed method for limiting the size of runs in a response-adaptive clinical trial.
Skeptical and optimistic robust priors for clinical trials.
Joint work with Jairo Fúquene and Luis Pericchi from the University of Puerto Rico.
]]></description>
			<content:encoded><![CDATA[<p>Here are a couple new preprints.</p>
<p><a href="http://www.bepress.com/mdandersonbiostat/paper63/">Block-adaptive randomization</a>.<br />
A proposed method for limiting the size of runs in a response-adaptive clinical trial.</p>
<p><a href="http://www.johndcook.com/SkepticalOptimistic.pdf">Skeptical and optimistic robust priors for clinical trials</a>.<br />
Joint work with Jairo Fúquene and Luis Pericchi from the University of Puerto Rico.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johndcook.com/blog/2011/01/20/a-couple-preprints/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>


