I just published a new article on CodeProject: Calculating percentiles in memory-bound applications.
Finding the percentiles of a list of samples is trivial if you can read all the samples into memory and sort the list. If the list is too big to fit into memory, it’s still not terribly difficult, but there are a couple subtleties if you haven’t thought about it before.
The application that motivated the code presented in the article generates a large amount of data. You could think of the software as generating an enormous matrix one row at a time, then computing percentiles of each column. For example, in one problem the matrix had 800,000 columns and 2,000 rows of floating point numbers.