# The science of waiting in line

There’s a branch of math that studies how people wait in line: queueing theory. It’s not just about people standing in line, but about any system with clients and servers.

An introduction to queueing theory, about what you’d learn in one or two lectures, is very valuable for understanding how the world around you works. But after a brief introduction, you quickly hit diminishing return on your effort, unless you want to specialize in the theory or you have a particular problem you’re interested in.

The simplest models in queueing theory have nice closed-form solutions. They’re designed to be easy to understand, not to be highly realistic in application. They make simplifying assumptions, and minor variations no longer have nice solutions. But they make very useful toy models.

One of the insights from basic queueing theory is that variability in arrival and service times matters a great deal. This is unlike revenue, for example, where you can estimate total revenue simply by multiplying the estimated number of customers by the estimated average revenue per customer.

A cashier who can handle a stream of customers on average is in big trouble. If customers arrived at regular intervals and all required the same amount of time to serve, then if things are OK on average, they’re OK. But if there’s not much slack, and there’s the slightest bit of variability in arrival or service times, long lines will form.

I work through a specific example here. The example also shows that when a system is near capacity, adding a new server can make a huge improvement. In that example, going from one server to two doesn’t just cut the waiting time in half, but reduces it by a couple orders of magnitude.

As I mentioned at the beginning, queueing theory doesn’t just apply to shoppers and cashiers. It also applies to diverse situations such as cars entering a highway and page requests hitting a web server. See more on the latter here.

# Server utilization: Joel on queuing

Joel Spolsky of Joel on Software fame mentions queuing theory in the latest StackOverflow podcast. He mentions a rule of thumb that wait times go up quickly as server utilization exceeds 80%. The same principle applies whether you’re talking about computer servers or human servers. I hadn’t heard of that rule of thumb, though I knew that you don’t want anywhere near 100% utilization. Here’s the graph that justifies Joel’s remark.

The vertical axis is wait time. The horizontal access is utilization.

Here are the details. Basic queuing theory assumes customer arrivals are Poisson distributed with rate λ. Service times are exponentially distributed with rate μ. The ratio λ/μ is called utilization ρ. If this ratio is greater than 1, that says customers are arriving faster than they can be served, and so the line will grow without bound. If the ratio is less than 1, the line will reach some steady state on average.

The average waiting time is W = 1/(μ-λ). Now assume the service time μ is fixed and the arrival rate λ = ρ μ. Then W = 1/μ(1-ρ) and so the wait time is proportional to 1/(1-ρ). As the utilization ρ approaches 1, the wait time goes to infinity. The graph above plots 1/(1-ρ). As Joel said, the curve does go up quickly when ρ exceeds 0.8.

Related post: What happens when you add a new teller?

# What happens when you add a new teller?

Suppose a small bank has only one teller. Customers take an average of 10 minutes to serve and they arrive at the rate of 5.8 per hour. What will the expected waiting time be? What happens if you add another teller?

We assume customer arrivals and customer service times are random (details later). With only one teller, customers will have to wait nearly five hours on average before they are served. But if you add a second teller, the average waiting time is not just cut in half; it goes down to about 3 minutes. The waiting time is reduced by a factor of 93x.

Why was the wait so long with one teller? There’s not much slack in the system. Customers are arriving every 10.3 minutes on average and are taking 10 minutes to serve on average. If customer arrivals were exactly evenly spaced and each took exactly 10 minutes to serve, there would be no problem. Each customer would be served before the next arrived. No waiting.

The service and arrival times have to be very close to their average values to avoid a line, but that’s not likely. On average there will be a long line, 28 people. But with a second teller, it’s not likely that even two people will arrive before one of the tellers is free.

Here are the technical footnotes. This problem is a typical example from queuing theory. Customer arrivals are modeled as a Poisson process with λ = 5.8/hour. Customer service times are assumed to be exponential with mean 10 minutes. (The Poisson and exponential distribution assumptions are common in queuing theory. They simplify the calculations, but they’re also realistic for many situations.) The waiting times given above assume the model has approached its steady state. That is, the bank has been open long enough for the line to reach a sort of equilibrium.

Queuing theory is fun because it is often possible to come up with surprising but useful results with simple equations. For example, for a single server queue, the expected waiting time is λ/(μ(μ – λ)) where λ the the arrival rate and μ is the service rate. In our example, λ = 5.8 per hour and μ = 6 per hour. Some applications require more complicated models that do not yield such simple results, but even then the simplest model may be a useful first approximation.

Related post: Server utilization: Joel on queuing