How long computer operations take

The following table is from Peter Norvig's essay "Teach Yourself Programming in Ten Years." All times are in nanoseconds.

execute typical instruction            1
fetch from L1 cache memory             0.5
branch misprediction                   5
fetch from L2 cache memory             7
mutex lock/unlock                      25
fetch from main memory                 100
send 2K bytes over 1 Gbps network      20,000
read 1 MB sequentially from memory     250,000
fetch from new disk location (seek)    8,000,000
read 1 MB sequentially from disk       20,000,000
send packet US to Europe and back      150,000,000
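One way to build intuition for these gaps is to rescale them: a minimal sketch (not from the essay) that treats one typical instruction, 1 ns, as one second and prints each latency at human scale.

```python
# Rescale the table above so that 1 ns = 1 second.
# Latency values are copied from the table; names are abbreviated here.
LATENCY_NS = {
    "execute typical instruction": 1,
    "fetch from L1 cache memory": 0.5,
    "branch misprediction": 5,
    "fetch from L2 cache memory": 7,
    "mutex lock/unlock": 25,
    "fetch from main memory": 100,
    "send 2K bytes over 1 Gbps network": 20_000,
    "read 1 MB sequentially from memory": 250_000,
    "fetch from new disk location (seek)": 8_000_000,
    "read 1 MB sequentially from disk": 20_000_000,
    "send packet US to Europe and back": 150_000_000,
}

def human_scale(ns):
    """If 1 ns were 1 s, return a human-readable duration."""
    seconds = ns  # 1 ns -> 1 s
    for unit, size in (("years", 365 * 24 * 3600), ("days", 24 * 3600),
                       ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:g} seconds"

for op, ns in LATENCY_NS.items():
    print(f"{op}: {human_scale(ns)}")
```

At this scale a main-memory fetch takes well over a minute, and the transatlantic round trip takes years.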


2 thoughts on "How long computer operations take"

  1. And here we see one limitation of current algorithm analysis techniques: They don’t take the differences between varying kinds of storage into account at all. They assume that memory is one flat plane, as opposed to the hierarchy our caching makes it into, leading to the (potential) heartbreak of cache-oblivious algorithms.

    So… what? Does anyone know about good work done to remedy this in the context of abstract algorithm analysis?

  2. @Chris: That’s not true. Everyone doing any kind of practical algorithm analysis cares about the costs of operations. For simple tutorial examples, check out Jon Bentley’s awesome book Programming Pearls. Are you just thinking of asymptotic analyses like you learn in intro CS classes?

    If you look at, say, matrix algorithm analyses, they’re all over the problem of memory locality to overcome the register/L1/L2/(L3)/RAM lags. As are the database and text search algorithms.
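The locality effect described above can be sketched by summing the same matrix in two traversal orders. This is a minimal illustration, not code from the comment; note that CPython lists of boxed integers only approximate the effect you would see with contiguous arrays in C, where the gap is far more dramatic.

```python
# Row-major vs. column-major traversal of a nested-list "matrix".
# The column-major inner loop jumps from row to row on every step,
# so it touches memory in a much less cache-friendly pattern.
import time

N = 1000
matrix = [[i * N + j for j in range(N)] for i in range(N)]

def sum_row_major(m):
    total = 0
    for row in m:            # walk each row's elements in storage order
        for x in row:
            total += x
    return total

def sum_col_major(m):
    total = 0
    for j in range(len(m[0])):     # fix a column...
        for i in range(len(m)):    # ...and jump between rows
            total += m[i][j]
    return total

for f in (sum_row_major, sum_col_major):
    start = time.perf_counter()
    s = f(matrix)
    print(f"{f.__name__}: sum={s}, {time.perf_counter() - start:.3f}s")
```

Both functions compute the same sum; only the access pattern differs.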

    I think the biggest surprise to most people is the cost of a trip to RAM. Everyone knows disk is slow (though the table could use intermediate SSD seek numbers), but people don't usually understand that main memory is really, really slow compared to core ops.

    This matters for large statistical models (and large matrix ops). For instance, in our multi-gigabyte language models, a significant fraction of context lookups cause cache misses, which dominate the processing time. I remember thinking I was going crazy the first time I profiled this effect: I kept adding arithmetic operations after the memory fetch and found they took no additional time. All the time was spent waiting for data to trickle in from memory.
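The effect described above follows directly from the table's numbers. A back-of-envelope sketch, assuming every lookup is a dependent fetch from main memory (a hypothetical worst case, not the commenter's actual workload):

```python
# Numbers from the table above.
MAIN_MEMORY_FETCH_NS = 100   # fetch from main memory
INSTRUCTION_NS = 1           # execute typical instruction

# Upper bound on dependent, cache-missing lookups per second:
lookups_per_sec = 1e9 / MAIN_MEMORY_FETCH_NS
print(f"at most {lookups_per_sec:,.0f} cache-missing lookups/second")

# Roughly how many ordinary instructions fit in the shadow of one miss,
# which is why adding arithmetic after the fetch cost no extra time:
hidden_instructions = MAIN_MEMORY_FETCH_NS / INSTRUCTION_NS
print(f"~{hidden_instructions:.0f} instructions per miss")
```

On these numbers, memory latency caps a miss-dominated workload at about ten million lookups per second, with room for roughly a hundred instructions of "free" arithmetic per miss.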
