I recently learned about the Linux command line utility
shuf from browsing The Art of Command Line. This could be useful for random sampling.
Given just a file name,
shuf randomly permutes the lines of the file.
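Here is a small sketch of the permutation behavior, assuming GNU coreutils is installed. The temporary file and the variable names are made up for illustration. Sorting the original and the shuffled output shows that no line was added or dropped.

```shell
# Build a throwaway file with five distinct lines (file name is hypothetical).
tmpfile=$(mktemp)
printf '%s\n' one two three four five > "$tmpfile"

# Shuffle it: the same five lines come back in some random order.
shuffled=$(shuf "$tmpfile")
printf '%s\n' "$shuffled"

# Sort both versions so they can be compared regardless of order.
original_sorted=$(sort "$tmpfile")
shuffled_sorted=$(printf '%s\n' "$shuffled" | sort)
[ "$original_sorted" = "$shuffled_sorted" ] && echo "same set of lines"

rm -f "$tmpfile"
```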
With the option
-n you can specify how many lines to return. So it’s doing sampling without replacement. For example,
shuf -n 10 foo.txt
would select 10 lines from
foo.txt. Actually, it would select at most 10 lines. You can’t select 10 lines without replacement from a file with fewer than 10 lines. If you ask for more lines than the file contains, the
-n option is effectively ignored and you get every line of the file in random order.
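To see what happens when you ask for too many lines, here is a quick check, again assuming GNU shuf. The three-line file and the variable names are invented for the example.

```shell
# A file with only three lines (hypothetical temporary file).
small=$(mktemp)
printf '%s\n' alpha beta gamma > "$small"

# Ask for 10 lines without replacement; only 3 are available.
count=$(shuf -n 10 "$small" | wc -l)
echo "asked for 10, got $count"

rm -f "$small"
```

The count that comes back is the number of lines in the file, not the number requested.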
You can also sample with replacement using the
-r option. In that case you can select more lines than are in the file since lines may be reused. For example, you could run
shuf -r -n 10 foo.txt
to select 10 lines drawn with replacement from
foo.txt, regardless of how many lines
foo.txt has. For example, when I ran the command above on a file containing
alpha
beta
gamma
I got the output
beta
gamma
gamma
beta
alpha
alpha
gamma
gamma
beta
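One way to see the reuse of lines is to tally how often each line is drawn. This is only a sketch, assuming GNU shuf; the temporary file and variable names are hypothetical, and the counts will differ from run to run.

```shell
# Three-line input file (hypothetical temporary file).
words=$(mktemp)
printf '%s\n' alpha beta gamma > "$words"

# Draw 10 lines with replacement from the 3 available.
sample=$(shuf -r -n 10 "$words")

# Tally how many times each line was drawn.
printf '%s\n' "$sample" | sort | uniq -c

# The sample always has 10 lines, even though the file has only 3.
total=$(printf '%s\n' "$sample" | wc -l)
distinct=$(printf '%s\n' "$sample" | sort -u | wc -l)
echo "drew $total lines, $distinct distinct"

rm -f "$words"
```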
I don’t know how
shuf seeds its random number generator. Maybe from the system time. But if you run it twice you will get different results. Probably.
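If you want the same sample twice, GNU shuf has a --random-source=FILE option that reads its random bytes from FILE instead of the default source. The sketch below assumes GNU coreutils and a readable /dev/urandom. It saves some random bytes to a file and reuses them, so both runs draw the same lines. The file names, variable names, and the choice of 1024 bytes are mine, picked to be more than enough for a small input.

```shell
# Hypothetical temporary files: one for data, one for saved random bytes.
data=$(mktemp)
seed=$(mktemp)
seq 1 20 > "$data"

# Save a block of random bytes to act as a reusable "seed".
head -c 1024 /dev/urandom > "$seed"

# Two runs fed the same bytes should make the same choices.
first=$(shuf -n 5 --random-source="$seed" "$data")
second=$(shuf -n 5 --random-source="$seed" "$data")

printf '%s\n' "$first"
[ "$first" = "$second" ] && echo "both runs drew the same sample"

rm -f "$data" "$seed"
```

Keep the seed file around if you need to reproduce a sample later. Generate a new one to get a fresh sample.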