How to calculate percentiles in memory-bound applications

by John on April 28, 2008

I just published a new article on CodeProject: Calculating percentiles in memory-bound applications.

Finding the percentiles of a list of samples is trivial if you can read all the samples into memory and sort the list. If the list is too big to fit into memory, it’s still not terribly difficult, but there are a couple subtleties if you haven’t thought about it before.

The application that motivated the code presented in the article generates a large amount of data. You could think of the software as generating an enormous matrix one row at a time, then computing percentiles of each column. For example, in one problem the matrix had 800,000 columns and 2,000 rows of floating point numbers.

{ 1 comment… read it below or add one }

1

Chronos Phenomena 10.19.09 at 16:59

Hi John,

I see you have a lot of interest in fast computations. Your way of doing percentiles is very advanced.

I’m facing the problem at the moment, which is to calculate rolling min and max, without looping trough the whole window. Say I have 1 milion numbers, and I want to calculate min and max for each window of 1000 numbers?

Leave a Comment

You can use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>