Yesterday my wife and I watched our daughter’s junior varsity soccer game. Several statistical questions came to mind.

Larger schools tend to have better sports teams. If the talent distributions of a large school and a small school are the same, the larger school will have a better team because its players are the best from a larger population. If one school is twice as big as another, its team may consist of the top 5% while the other school’s team consists of its top 10%.

Does size benefit a school’s top (varsity) team or its second (junior varsity) team more? Would you expect more variability in varsity or junior varsity scores? Does your answer depend on whether you assume a thin-tailed (e.g. normal) or thick-tailed (e.g. Cauchy) distribution on talent?

What if two schools are the same size, but one has a better athletic program, say due to better coaching? Suppose this shifts the center of the talent distribution. Does such a shift benefit varsity or junior varsity teams more?

Suppose both the varsity and junior varsity teams from two schools are playing each other, as was the case last night. If you know the outcome of the junior varsity game, how much should that influence your prediction of the outcome of the varsity game? Has anyone looked at this, either as an abstract model or by analyzing actual scores?

**Related post**: Probability of winning the World Series

A quick R simulation:

small = replicate(10000, sort(rnorm(100), decreasing = TRUE)[1:10])

big = replicate(10000, sort(rnorm(200), decreasing = TRUE)[1:10])

rowMeans(small)

rowMeans(big)

rowMeans(big) - rowMeans(small)

[1] 0.2473696 0.2636497 0.2827461 0.2968154 0.3096685 0.3205282 0.3300471 0.3381803 0.3475136 0.3571561

seems to suggest that, with a normal distribution, the lower ranks (hence the JV team) benefit more from size. With a Cauchy distribution, the reverse seems to be true, although I wouldn’t trust the average when using a distribution that has no expected value. I also tried the t distribution with varying degrees of freedom. With more degrees of freedom, the lower ranks benefit more from size relative to the higher ranks, with different ranks getting approximately the same benefit from size somewhere around 9 degrees of freedom. This value may depend on the specific sizes and ranks being considered, so take it with a grain of salt.
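The degrees-of-freedom comparison could be reproduced along the following lines. This is only a sketch, not the code actually used for the comment: the helper names `top_means` and `size_benefit`, the seed, the number of replications, and the particular df values (3, 9, 30) are assumptions chosen for illustration. The school sizes (100 and 200) and the top-10 ranks match the normal simulation.

```r
# Sketch: how much does doubling school size help each talent rank
# when talent follows a t distribution with `df` degrees of freedom?
set.seed(1)  # arbitrary seed, only for reproducibility

# Average talent at each of the top `k` ranks in a school of `n` students,
# estimated over `reps` simulated schools. `rdist` draws n talent values.
top_means <- function(n, k, reps, rdist) {
  draws <- replicate(reps, sort(rdist(n), decreasing = TRUE)[1:k])
  rowMeans(draws)  # row i = mean talent of the i-th best student
}

# Benefit of size at each rank: big-school mean minus small-school mean.
size_benefit <- function(df, small_n = 100, big_n = 200, k = 10, reps = 2000) {
  rdist <- function(n) rt(n, df = df)
  top_means(big_n, k, reps, rdist) - top_means(small_n, k, reps, rdist)
}

for (df in c(3, 9, 30)) {
  cat("df =", df, ":", round(size_benefit(df), 2), "\n")
}
```

If the printed differences increase from rank 1 to rank 10, the lower ranks benefit more from size; if they decrease, the top ranks benefit more. For small df the estimates are noisy, so more replications (or a median instead of a mean) may be needed before reading much into the pattern.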

I don’t need a simulation to say that adding a constant to the talent distribution will benefit all order statistics by the same amount. Of course, in real life a better coaching program is probably not accurately modeled by adding a constant, and the result probably depends heavily on the specifics of the coaching program.
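For anyone who wants to see the shift claim numerically anyway, here is a minimal check. The pool size of 100, the top-10 cutoff, and the shift of 0.5 are arbitrary values picked for illustration, not estimates of any real coaching effect.

```r
# Adding a constant to every player's talent moves every order
# statistic by that same constant, so all ranks benefit equally.
set.seed(1)             # arbitrary seed
talent <- rnorm(100)    # hypothetical baseline talent pool
shift  <- 0.5           # hypothetical coaching effect (assumed value)

top_before <- sort(talent, decreasing = TRUE)[1:10]
top_after  <- sort(talent + shift, decreasing = TRUE)[1:10]

# Compare the per-rank gain with the shift, up to floating-point tolerance
same_gain <- isTRUE(all.equal(top_after - top_before, rep(shift, 10)))
print(same_gain)
```

The ranking of players is unchanged by the shift, which is why no distributional assumption is needed for this result.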

The last question is difficult to answer via a simulation as it isn’t clear how to accurately simulate the outcome of a game, but would be easy to answer given a database of such games.