Statistical functions in Excel

Depending on your expectations, you may have different reactions to the statistical function support in Excel. If you expect anything similar to a statistical package, you’ll be sorely disappointed. But if you think of Excel as a spreadsheet for everybody that sometimes lets you do statistical tasks right there without having to open up a statistical package, you’ll be pleased.

I was looking into the functions in Excel 2007 while preparing for a class I taught yesterday. I wanted to emphasize that certain functions are everywhere, not only in mathematical packages like Mathematica and R, but also in Python and even Excel.

Excel’s set of functions is inconsistent, both in the functionality provided and in the names it uses. Having an asymmetric API makes it harder to remember what is available and how to use it. On the other hand, the most commonly needed functions are available. The functions are individually reasonable even though they do not fit together into a simple pattern.

For details, see my notes Probability distributions in Excel 2007.

I discovered along the way that Excel has a GAMMALN function to compute the logarithm of the Gamma function Γ(x). This is a very useful function to have, even more useful than the Gamma function itself for reasons explained here.

Related links:

Comparison of data analysis packages from Brendan O’Connor

R, Excel, and the Windows clipboard (good tips in the comments)

Tagged with:
Posted in Computing
10 comments on “Statistical functions in Excel
  1. Wei says:

    Excel is not a statistical package, no matter what it has. You could use a pair of scissors to cut your nails,but you can only do it comfortably using a nail clipper. Teaching your students R instead of M$ Excel will benefit them in long run.

  2. John says:

    I agree that Excel is not a statistical package, but it can be useful in statistics. It’s a reasonable tool for light-weight statistics, ideal for people who know just enough statistics to be dangerous. :-)

    I use lots of tools: Mathematica, R, Python, … and even Excel. But I’ve done more statistical work with C++ than any other language.

  3. Josh Hemann says:

    I have been meaning to check out the Resolver Systems guys:

    You have all of the convenience of Excel but with an IronPython interface for macro work.

  4. John says:

    Josh: I’d like to try that too. The Ironclad project that they developed along the way is useful if you want to call SciPy from IronPython.

  5. Josh Hemann says:

    Yeah, I think a lot of tests for Scipy and Numpy already pass, but of course there are other important C extensions to doing analytical work in Python and I don’t know what their status is, and I don’t know enough about .NET yet to take the plunge.

    By the way, not to start a language war, but to students I talk to (and I am not a teacher, so this is not too common) I advocate Python. It is increasingly used in analytical work, and it is a far more expressive, general purpose language than R or SAS or…, so you’ll be able to use it in a variety of settings. Ideally, learn as many languages as you can, but it would be hard to go wrong focusing on Python.

  6. Wei says:

    LOL. On the ground of languages for statistical, mathematical and general-purpose computing, we see a blossom of all kinds. But bear it in your mind that Open source ones will always be the ones growing most prosperously.

  7. John says:

    Josh: you won’t get an argument from me. Python is an elegant, general-purpose language with powerful numerical libraries. That’s better than a numerical library with a language tacked on that isn’t suitable for general programming.

    Wei: I’m not such a fan of open source. There are some great open source projects — LaTeX, Python, etc. — but I’ve had a lot of disappointments with open source software.

  8. Matt says:

    My experience is kinda similar to John’s — I know and use Mathematica (like it for symbolic computation), MATLAB (simple / non-time-critical matrix manipulation tasks), R (trying out fancy statistics packages), had a short episodes with Ox (C-like syntax, specialized for econometrics) and STATA (for traditional statistics/econometrics models on small data). Oh, yeah, there was SAS too, but the language is way too ugly (kinda tempted to try out commercial version of R, S-PLUS, which can handle very large data). Also used JAVA and C# in academic setting, currently interested in learning Haskell or OCaml (just to get the feeling for purer FP).

    Ultimately though, for most of the “serious” task (incl. my research) I’ve chosen C++ and it’s proven excellent so far — very fast, quite portable (easy to share & exchange ideas), and actually pleasant to program [though it took a while -- I'm talking about "modern C++" here, and that includes clean C++ code with STL, Boost, STLSoft libraries, making heavy use of generic programming and functional programming and modern C++ idioms (RAII, Smart Pointers, etc. -- with C++'s automatic memory mgmt. you don't really need garbage collection)]. I’ve found that most people who complain about C++ being “hard” often also think it’s an object-oriented programming language like Java where you have to wrap everything in classes (thankfully, it’s not) and/or have been exposed to pre-1998 C-with-classes style standard (which can lead to code that, admittedly, is quite ugly). The problem is that there is still a lot of resources/books/websites/code around using the bad coding practices, etc., which may give a bad (and wrong) impression and it takes some time/experience to get to the good stuff (so there is some truth behind steep-learning-curve argument). Still, having done that, I wouldn’t change C++ for anything else — it’s very expressive and feels “just right.”

  9. Priyanka Jain says:

    Is there any in built function for uniform distribution, erlang distribution, bernoulli trial of statistics in ms excel

  10. John says:

    Priyanka: There may be functions for the three distributions you mention, but they’re all special cases of distributions in my notes. A uniform distribution is a beta(1, 1). An Erlang distribution is a beta distribution with an integer shape parameter. A Bernoulli distribution is a binomial distribution with n = 1.