<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: The cult of significance testing</title>
	<atom:link href="http://www.johndcook.com/blog/2008/12/03/the-cult-of-significance-testing/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.johndcook.com/blog/2008/12/03/the-cult-of-significance-testing/</link>
	<description>The blog of John D. Cook</description>
	<lastBuildDate>Thu, 18 Mar 2010 03:02:57 -0400</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: John Venier</title>
		<link>http://www.johndcook.com/blog/2008/12/03/the-cult-of-significance-testing/comment-page-1/#comment-10569</link>
		<dc:creator>John Venier</dc:creator>
		<pubDate>Fri, 05 Dec 2008 19:13:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=885#comment-10569</guid>
		<description>It seems that a lot of lucrative drugs are found to have &#039;better&#039; replacements just as their patents are expiring.  The manufacturer holds the patent on the replacement, naturally.  Often these replacements are metabolites of the original drug which are found to have the same action as the original drug.  I suspect that the studies which demonstrate a (statistically) significant increase in effectiveness are extremely overpowered and that the clinical significance is nil.  

Another &#039;improvement&#039; strategy is simply using the more active stereoisomer (enantiomer) exclusively instead of both.  The non-racemic drug can be patented, named, and marketed as a new drug, resulting in literally billions of dollars of profit.  Prilosec and Nexium are a great example of this.  I would imagine that with billions of dollars at stake AstraZeneca tried every way imaginable to demonstrate any statistically significant improvement, however meager.  From what I understand (and I am no specialist) there is actually no clinically meaningful difference between the drugs, but Nexium has a shiny new patent and Prilosec is now available as a generic OTC.</description>
		<content:encoded><![CDATA[<p>It seems that a lot of lucrative drugs are found to have &#8216;better&#8217; replacements just as their patents are expiring.  The manufacturer holds the patent on the replacement, naturally.  Often these replacements are metabolites of the original drug which are found to have the same action as the original drug.  I suspect that the studies which demonstrate a (statistically) significant increase in effectiveness are extremely overpowered and that the clinical significance is nil.  </p>
<p>Another &#8216;improvement&#8217; strategy is simply using the more active stereoisomer (enantiomer) exclusively instead of both.  The non-racemic drug can be patented, named, and marketed as a new drug, resulting in literally billions of dollars of profit.  Prilosec and Nexium are a great example of this.  I would imagine that with billions of dollars at stake AstraZeneca tried every way imaginable to demonstrate any statistically significant improvement, however meager.  From what I understand (and I am no specialist) there is actually no clinically meaningful difference between the drugs, but Nexium has a shiny new patent and Prilosec is now available as a generic OTC.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Johnson</title>
		<link>http://www.johndcook.com/blog/2008/12/03/the-cult-of-significance-testing/comment-page-1/#comment-10567</link>
		<dc:creator>John Johnson</dc:creator>
		<pubDate>Fri, 05 Dec 2008 18:18:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=885#comment-10567</guid>
		<description>What amazes me is all the Phase 2 studies that &quot;fail to meet endpoint&quot; or &quot;miss statistical significance&quot; when those studies aren&#039;t even powered in the first place. Or, even worse, the practice of &quot;p-value fishing,&quot; i.e. the attempt to cover up a worthless drug by showing a p-value less than 0.05.

This &quot;cult of statistical significance&quot; has led to many bizarre behaviors and habits in the scientific community, ones I think we would do well to expunge.</description>
		<content:encoded><![CDATA[<p>What amazes me is all the Phase 2 studies that &#8220;fail to meet endpoint&#8221; or &#8220;miss statistical significance&#8221; when those studies aren&#8217;t even powered in the first place. Or, even worse, the practice of &#8220;p-value fishing,&#8221; i.e. the attempt to cover up a worthless drug by showing a p-value less than 0.05.</p>
<p>This &#8220;cult of statistical significance&#8221; has led to many bizarre behaviors and habits in the scientific community, ones I think we would do well to expunge.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ekzept</title>
		<link>http://www.johndcook.com/blog/2008/12/03/the-cult-of-significance-testing/comment-page-1/#comment-10511</link>
		<dc:creator>ekzept</dc:creator>
		<pubDate>Thu, 04 Dec 2008 04:31:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=885#comment-10511</guid>
		<description>There is a heartening discussion of this in Kraemer and Thiemann, &lt;em&gt;How Many Subjects? Statistical Power Analysis In Research&lt;/em&gt;. The other context it is dealt with is in the careful comparison of Bayesian vs Frequentist decision theory, per J.O.Berger&#039;s treatment, &lt;em&gt;Statistical Decision Theory and Bayesian Analysis&lt;/em&gt; which I found thorough but difficult.  Finally, a recent one: D.J.Murdoch, Y.-L. Tsai, J.Adcock, &quot;P-values are random variables&quot;, &lt;em&gt;The American Statistician&lt;/em&gt;, 62(3), 242-245.</description>
		<content:encoded><![CDATA[<p>There is a heartening discussion of this in Kraemer and Thiemann, <em>How Many Subjects? Statistical Power Analysis In Research</em>. The other context it is dealt with is in the careful comparison of Bayesian vs Frequentist decision theory, per J.O.Berger&#8217;s treatment, <em>Statistical Decision Theory and Bayesian Analysis</em> which I found thorough but difficult.  Finally, a recent one: D.J.Murdoch, Y.-L. Tsai, J.Adcock, &#8220;P-values are random variables&#8221;, <em>The American Statistician</em>, 62(3), 242-245.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Venier</title>
		<link>http://www.johndcook.com/blog/2008/12/03/the-cult-of-significance-testing/comment-page-1/#comment-10489</link>
		<dc:creator>John Venier</dc:creator>
		<pubDate>Wed, 03 Dec 2008 18:28:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=885#comment-10489</guid>
		<description>Amen to that.  Some examples come to mind -- Linus Pauling&#039;s testing of the benefits of vitamin C showed statistical significance which was clinically trivial -- his study had a large sample size.

Every time you see an ad on TV which says something like &quot;no toothpaste was shown to be better than ours&quot; mentally add &quot;in our test with no power&quot;.

One professor I know did some consulting for an unnamed hospital which was being sued by patients who contracted infections after hip replacement surgery.  The professor was hired as an expert witness to testify whether the frequency of such infections at that hospital was worse than expected or not.  When he analyzed the data as a whole there was no significant evidence that their rate was higher than expected.  But the hospital had three surgical teams which performed those surgeries.  When the data were grouped according to teams, one team was clearly much more likely to leave patients with an infection.  As it turned out, that team was relatively new compared to the other two, and evidently it makes a big difference how long a team has been working together.

I&#039;ve seen other cutoffs in other fields -- 0.10 and 0.20 in fields such as psychology, sociology, and biology.   But the papers I saw still used cutoffs to assess significance.  I recall that they did publish the actual p-values that they obtained.</description>
		<content:encoded><![CDATA[<p>Amen to that.  Some examples come to mind &#8212; Linus Pauling&#8217;s testing of the benefits of vitamin C showed statistical significance which was clinically trivial &#8212; his study had a large sample size.</p>
<p>Every time you see an ad on TV which says something like &#8220;no toothpaste was shown to be better than ours&#8221; mentally add &#8220;in our test with no power&#8221;.</p>
<p>One professor I know did some consulting for an unnamed hospital which was being sued by patients who contracted infections after hip replacement surgery.  The professor was hired as an expert witness to testify whether the frequency of such infections at that hospital was worse than expected or not.  When he analyzed the data as a whole there was no significant evidence that their rate was higher than expected.  But the hospital had three surgical teams which performed those surgeries.  When the data were grouped according to teams, one team was clearly much more likely to leave patients with an infection.  As it turned out, that team was relatively new compared to the other two, and evidently it makes a big difference how long a team has been working together.</p>
<p>I&#8217;ve seen other cutoffs in other fields &#8212; 0.10 and 0.20 in fields such as psychology, sociology, and biology.   But the papers I saw still used cutoffs to assess significance.  I recall that they did publish the actual p-values that they obtained.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Scott</title>
		<link>http://www.johndcook.com/blog/2008/12/03/the-cult-of-significance-testing/comment-page-1/#comment-10484</link>
		<dc:creator>Scott</dc:creator>
		<pubDate>Wed, 03 Dec 2008 15:35:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=885#comment-10484</guid>
		<description>I could not agree more with this, John. It amazes me that everyone &quot;knows&quot; that effect size is important and the .05 cutoff is arbitrary, but published research continues to center around significance anyway. As you know, not only does decreasing the sample size remove significance, even the tiniest effect becomes significant if your sample size is large enough. In the field of education this means that almost anything is significant when thousands of children are tested. (Just about anything in a classroom has at least a tiny effect on just about anything else.) I am currently distressed over certain questionable practices that are being pushed because research has shown their &quot;significance&quot; even though effect sizes are clearly very small. Unfortunately the currently published research on these practices is not talking about the effect sizes at all.

John, how can academia&#039;s knowledge of statistics be so sophisticated and their research be so ignorant of these basics?</description>
		<content:encoded><![CDATA[<p>I could not agree more with this, John. It amazes me that everyone &#8220;knows&#8221; that effect size is important and the .05 cutoff is arbitrary, but published research continues to center around significance anyway. As you know, not only does decreasing the sample size remove significance, even the tiniest effect becomes significant if your sample size is large enough. In the field of education this means that almost anything is significant when thousands of children are tested. (Just about anything in a classroom has at least a tiny effect on just about anything else.) I am currently distressed over certain questionable practices that are being pushed because research has shown their &#8220;significance&#8221; even though effect sizes are clearly very small. Unfortunately the currently published research on these practices is not talking about the effect sizes at all.</p>
<p>John, how can academia&#8217;s knowledge of statistics be so sophisticated and their research be so ignorant of these basics?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
