I just published a new article on CodeProject: Calculating percentiles in memory-bound applications.
Finding the percentiles of a list of samples is trivial if you can read all the samples into memory and sort the list. If the list is too big to fit into memory, it’s still not terribly difficult, but there are a couple subtleties if you haven’t thought about it before.
The application that motivated the code presented in the article generates a large amount of data. You could think of the software as generating an enormous matrix one row at a time, then computing percentiles of each column. For example, in one problem the matrix had 800,000 columns and 2,000 rows of floating point numbers.
One thought on “How to calculate percentiles in memory-bound applications”
I see you have a lot of interest in fast computations. Your way of doing percentiles is very advanced.
I’m facing the problem at the moment, which is to calculate rolling min and max, without looping trough the whole window. Say I have 1 milion numbers, and I want to calculate min and max for each window of 1000 numbers?