<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: John Tukey&#8217;s median of medians</title>
	<atom:link href="http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/</link>
	<description>The blog of John D. Cook</description>
	<lastBuildDate>Sat, 11 Feb 2012 01:10:06 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: CogitoErgoCogitoSum</title>
		<link>http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/comment-page-1/#comment-36502</link>
		<dc:creator>CogitoErgoCogitoSum</dc:creator>
		<pubDate>Fri, 16 Apr 2010 17:55:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=2535#comment-36502</guid>
		<description>You didn&#039;t discuss the effectiveness of the ninther at estimating the actual mean.  How accurate is it?  You seem only concerned with computer efficiency and not at all with the entire point of doing it in the first place.  The point was to estimate the mean, right? So how close does a ninther get?</description>
		<content:encoded><![CDATA[<p>You didn&#8217;t discuss the effectiveness of the ninther at estimating the actual mean.  How accurate is it?  You seem only concerned with computer efficiency and not at all with the entire point of doing it in the first place.  The point was to estimate the mean, right? So how close does a ninther get?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim Walker</title>
		<link>http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/comment-page-1/#comment-19887</link>
		<dc:creator>Tim Walker</dc:creator>
		<pubDate>Thu, 25 Jun 2009 03:16:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=2535#comment-19887</guid>
		<description>A tangent, since I&#039;m by no means a mathematician . . . You may already be familiar with this, but Richard Hamming talked about Tukey&#039;s personal characteristics and work methods in his outstanding lecture, &lt;a href=&quot;http://www.paulgraham.com/hamming.html&quot; rel=&quot;nofollow&quot;&gt;&quot;You and Your Research,&quot;&lt;/a&gt; which I&#039;ve cited many times on my own blog.</description>
		<content:encoded><![CDATA[<p>A tangent, since I&#8217;m by no means a mathematician . . . You may already be familiar with this, but Richard Hamming talked about Tukey&#8217;s personal characteristics and work methods in his outstanding lecture, <a href="http://www.paulgraham.com/hamming.html" rel="nofollow">&#8220;You and Your Research,&#8221;</a> which I&#8217;ve cited many times on my own blog.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Phil H</title>
		<link>http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/comment-page-1/#comment-19818</link>
		<dc:creator>Phil H</dc:creator>
		<pubDate>Wed, 24 Jun 2009 08:06:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=2535#comment-19818</guid>
		<description>There are 2 useful things in the ninther concept: approximations to bulk statistics and the calculation of a median without any sorting.

Generalising from 3 groups of 3 to n groups of m, we could still calculate a median from a series of chunks of the dataset, but we would need to sort.

This would suggest problems when working on Very Large Datasets, but consider the case of the most annoying dataset - randomised data. If we take a chunk of data from this dataset, we can approximate the statistics of the bulk with the statistics of the chunk, or a series of chunks. 

The answer may be, then, to randomly pick out a series of 9-value chunks, and calculate a series of ninthers. That way the number of comparisons per value in the full dataset can be less than 1.

O(1) to O(N), depending on how accurate you require your statistics to be.</description>
		<content:encoded><![CDATA[<p>There are 2 useful things in the ninther concept: approximations to bulk statistics and the calculation of a median without any sorting.</p>
<p>Generalising from 3 groups of 3 to n groups of m, we could still calculate a median from a series of chunks of the dataset, but we would need to sort.</p>
<p>This would suggest problems when working on Very Large Datasets, but consider the case of the most annoying dataset &#8211; randomised data. If we take a chunk of data from this dataset, we can approximate the statistics of the bulk with the statistics of the chunk, or a series of chunks. </p>
<p>The answer may be, then, to randomly pick out a series of 9-value chunks, and calculate a series of ninthers. That way the number of comparisons per value in the full dataset can be less than 1.</p>
<p>O(1) to O(N), depending on how accurate you require your statistics to be.</p>
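<p>A minimal Python sketch of the sampling idea above, offered as an illustration rather than a tested recipe. The helper names <code>median3</code>, <code>ninther</code> and <code>sampled_ninthers</code> are hypothetical, and <code>sorted</code> on three items stands in for the two or three comparisons a hand-written median of three would need.</p>
<pre>
```python
import random

def median3(a, b, c):
    # Middle value of three items.
    return sorted((a, b, c))[1]

def ninther(values):
    # Tukey's ninther: median of the medians of three groups of three.
    if len(values) != 9:
        raise ValueError("ninther needs exactly 9 values")
    return median3(
        median3(*values[0:3]),
        median3(*values[3:6]),
        median3(*values[6:9]),
    )

def sampled_ninthers(data, chunks, seed=None):
    # Hypothetical helper: one ninther per randomly drawn 9-value chunk.
    # data is assumed to be a list (or other sequence) of at least 9 values.
    rng = random.Random(seed)
    return [ninther(rng.sample(data, 9)) for _ in range(chunks)]
```
</pre>
<p>The number of chunks is the accuracy knob: a fixed number of chunks costs the same however large the dataset is, while a number that grows with the dataset moves the cost towards O(N).</p>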
]]></content:encoded>
	</item>
	<item>
		<title>By: Karl Ove Hufthammer</title>
		<link>http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/comment-page-1/#comment-19812</link>
		<dc:creator>Karl Ove Hufthammer</dc:creator>
		<pubDate>Wed, 24 Jun 2009 06:47:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=2535#comment-19812</guid>
		<description>It’s also worth noting that you actually &lt;em&gt;don’t&lt;/em&gt; have to sort the entire data set to calculate the median; you only have to do a partial sort ensuring that the number at position (n+1)/2 is correct.

This is the way the median calculation is implemented in R. But for larger data sets, the partial sort algorithm (in R) actually isn’t any faster than a full sort, and a full sort is used.</description>
		<content:encoded><![CDATA[<p>It’s also worth noting that you actually <em>don’t</em> have to sort the entire data set to calculate the median; you only have to do a partial sort ensuring that the number at position (n+1)/2 is correct.</p>
<p>This is the way the median calculation is implemented in R. But for larger data sets, the partial sort algorithm (in R) actually isn’t any faster than a full sort, and a full sort is used.</p>
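<p>To illustrate the partial sort idea, here is a rough Python sketch. It is not R&#8217;s implementation: it uses <code>heapq.nsmallest</code> from the Python standard library as a stand-in for a partial sort, the function name <code>median_by_partial_sort</code> is hypothetical, and it assumes an odd number of values so that position (n+1)/2 is a single element.</p>
<pre>
```python
import heapq

def median_by_partial_sort(data):
    # Assumes an odd number of values, so the median sits at
    # 1-based position (n + 1) / 2 in sorted order.
    n = len(data)
    if n % 2 == 0:
        raise ValueError("this sketch handles odd-length data only")
    k = (n + 1) // 2
    # heapq.nsmallest orders only the k smallest values;
    # the last of them is the median.
    return heapq.nsmallest(k, data)[-1]
```
</pre>
<p>For example, <code>median_by_partial_sort([5, 1, 4, 2, 3])</code> returns 3 without fully ordering the two largest values.</p>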
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/comment-page-1/#comment-19779</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Tue, 23 Jun 2009 20:56:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=2535#comment-19779</guid>
		<description>I wrote a somewhat related paper (with max-min instead of the median):

Daniel Lemire, Streaming Maximum-Minimum Filter Using No More than Three Comparisons per Element. Nordic Journal of Computing, 13 (4), pages 328-339, 2006.
http://arxiv.org/abs/cs.DS/0610046

I wonder how these approaches compare.</description>
		<content:encoded><![CDATA[<p>I wrote a somewhat related paper (with max-min instead of the median):</p>
<p>Daniel Lemire, Streaming Maximum-Minimum Filter Using No More than Three Comparisons per Element. Nordic Journal of Computing, 13 (4), pages 328-339, 2006.<br />
<a href="http://arxiv.org/abs/cs.DS/0610046" rel="nofollow">http://arxiv.org/abs/cs.DS/0610046</a></p>
<p>I wonder how these approaches compare.</p>
]]></content:encoded>
	</item>
</channel>
</rss>


