Suppose you have a function *f* of *n* variables. The *k*th derivative of *f* is a *k*th order tensor [1] with *n*^{k} components.

Not all those tensor components are unique. Assuming our function *f* is smooth, the order in which partial derivatives are taken doesn’t matter. It only matters which variables you differentiate with respect to and how many times.

There is a potentially unique component of the *k*th order derivative for every choice of *k* items (the variables you’re differentiating with respect to) from a set of *n* items (the variables) with replacement (because you can differentiate with respect to the same variable multiple times). Richard Stanley introduced the notation

$$\left(\!\!\binom{n}{k}\!\!\right)$$

for this quantity, and it turns out that

$$\left(\!\!\binom{n}{k}\!\!\right) = \binom{n+k-1}{k}$$
See Counting selections with replacement.
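As a quick sketch in Python, the multiset coefficient can be computed with the standard library’s `math.comb`; the function name `multichoose` is just an illustrative choice:

```python
from math import comb

def multichoose(n, k):
    """Number of ways to choose k items from n with replacement,
    i.e. the multiset coefficient ((n choose k)) = C(n + k - 1, k)."""
    return comb(n + k - 1, k)

# For a smooth function of 3 variables, the 2nd derivative (Hessian)
# has 3**2 = 9 components, but only multichoose(3, 2) = 6 distinct ones.
print(multichoose(3, 2))  # → 6
```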

If *n* or *k* is large, this is a very large number. Suppose you have a function of *n* = 1,000 variables and you want to take the 5th derivative. The derivative has 10^{15} components, but “only” about 10^{13} distinct components. 8,416,958,750,200 to be exact. A floating point number (IEEE 754 double) takes up 8 bytes, so storing the distinct components would require 62,711 gigabytes. So you could store the distinct components if you had a thousand computers with 64 GB of RAM each.
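These numbers can be checked with a few lines of Python, using only the standard library:

```python
from math import comb

n, k = 1000, 5

total = n**k                   # all tensor components: 10**15
distinct = comb(n + k - 1, k)  # distinct components under symmetry

print(distinct)  # → 8416958750200

bytes_needed = 8 * distinct    # IEEE 754 doubles, 8 bytes each
gib = bytes_needed / 2**30     # gigabytes in the binary (GiB) sense
print(round(gib))  # → 62711
```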

An application that depends on high-order derivatives of functions of many variables has to exploit some structure, such as sparsity or symmetry, and not directly compute and store the derivatives.

By the way, if both *n* and *k* are large, applying Stirling’s approximation to each factorial gives the approximate number of distinct components

$$\binom{m+k}{k} \approx \sqrt{\frac{m+k}{2\pi mk}}\, \frac{(m+k)^{m+k}}{m^m\, k^k}$$

where *m* = *n* − 1.
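A sketch of this approximation in Python, evaluated in log space so the intermediate powers don’t overflow (the helper name `multichoose_stirling` is just illustrative):

```python
from math import comb, exp, log, pi

def multichoose_stirling(n, k):
    """Approximate the multiset coefficient C(m + k, k), m = n - 1,
    using Stirling's approximation, computed in log space."""
    m = n - 1
    log_approx = (0.5 * log((m + k) / (2 * pi * m * k))
                  + (m + k) * log(m + k) - m * log(m) - k * log(k))
    return exp(log_approx)

exact = comb(1004, 5)                       # 1000 multichoose 5
approx = multichoose_stirling(1000, 5)
print(approx / exact)  # close to 1
```

Even with *k* as small as 5, the approximation is within a couple of percent of the exact count.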

***

[1] The value of the function is a real number, a scalar, a 0th order tensor. The first derivative, the gradient, is a vector, a 1st order tensor. The second derivative, the Hessian, is a matrix, a 2nd order tensor. There aren’t common names for tensors of order three and higher. You could think of a 3rd order tensor as a three-dimensional box of numbers, a stack of matrices.