Statistical dead end

I get suspicious when I hear people ask about third and fourth moments (skewness and kurtosis). I’ve heard these terms far more often from people who don’t understand statistics than from people who do.

People who bring up skewness and kurtosis often have one of two errors in mind.

First, they implicitly believe that distributions can be boiled down to three or four numbers. Maybe they had an elementary statistics course in which everything boiled down to two moments — mean and variance — and they suspect that’s not enough, that advanced statistics extends elementary statistics by looking at third or fourth moments. “There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy.” The path forward is not considering higher and higher moments.

This leads to a second and closely related problem. Interest in third and fourth moments sounds like hearkening back to the moment-matching approach to statistics. Moment matching was a simple idea for estimating distribution parameters:

  1. Set population means equal to sample means.
  2. Set population variances equal to sample variances.
  3. Solve the resulting equations for distribution parameters.

There’s more to moment matching than that, but that’s enough for this discussion. It’s a very natural approach, which is probably why it still persists. But it’s also a statistical dead end.
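To make the recipe concrete, here’s a minimal sketch of moment matching for a gamma distribution (the function name and sample sizes are my own illustration). A gamma(k, θ) distribution has mean kθ and variance kθ², so solving the two equations gives k = mean²/variance and θ = variance/mean.

```python
import random
import statistics

def gamma_moment_match(xs):
    """Estimate gamma shape and scale by matching the first two moments.
    Solving mean = k*theta and variance = k*theta**2 gives the formulas below."""
    m = statistics.mean(xs)
    v = statistics.pvariance(xs)
    shape = m * m / v
    scale = v / m
    return shape, scale

# Simulate gamma(2, 3) data and recover the parameters approximately.
random.seed(42)
xs = [random.gammavariate(2.0, 3.0) for _ in range(100_000)]
shape, scale = gamma_moment_match(xs)
print(shape, scale)  # roughly 2 and 3
```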

Moment matching is the most convenient approach to finding estimators in some cases. However, there is another approach to statistics that has largely replaced moment matching, and that’s maximum likelihood estimation: find the parameters that make the data most likely.

Both moment matching and maximum likelihood are intuitively appealing ideas. Sometimes they lead to the same conclusions but often they do not. They competed decades ago and maximum likelihood won. One reason is that maximum likelihood estimators have better theoretical properties. Another reason is that maximum likelihood estimation provides a unified approach that isn’t thwarted by difficulties in solving algebraic equations.
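A classic case where the two approaches genuinely disagree is estimating θ from a Uniform(0, θ) sample: moment matching sets θ/2 equal to the sample mean, giving twice the mean, while maximum likelihood gives the sample maximum. A quick sketch:

```python
import random

# For Uniform(0, theta), moment matching and maximum likelihood differ:
# matching the mean gives 2 * sample mean; the likelihood is maximized
# by the smallest theta consistent with the data, i.e. the sample max.
random.seed(1)
theta = 10.0
xs = [random.uniform(0, theta) for _ in range(1000)]

mm_estimate  = 2 * sum(xs) / len(xs)   # method of moments
mle_estimate = max(xs)                 # maximum likelihood

print(mm_estimate, mle_estimate)  # both near 10, by different routes
```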

There are good reasons to be concerned about higher moments (including fractional moments) though these are primarily theoretical. For example, higher moments are useful in quantifying the error in the central limit theorem. But there are not a lot of elementary applications of higher moments in contemporary statistics.

Diagram of gamma function identities

The gamma function has a large number of identities relating its values at one point to values at other points. By focusing on just the function arguments and not the details of the relationships, a simple pattern emerges. Most of the identities can be derived from just four fundamental identities:

  • Conjugation
  • Addition
  • Reflection
  • Multiplication

The same holds for functions derived from the gamma function: log gamma, digamma, trigamma, etc.

For the details of the relationships, see Identities for gamma and related functions.
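Two of the fundamental identities, reflection and multiplication, are easy to spot-check numerically for real arguments. A quick sketch using only the standard library (conjugation would require complex arguments, which `math.gamma` doesn’t support):

```python
import math

z = 0.3

# Reflection: Gamma(z) * Gamma(1 - z) = pi / sin(pi * z)
lhs = math.gamma(z) * math.gamma(1 - z)
rhs = math.pi / math.sin(math.pi * z)
assert abs(lhs - rhs) < 1e-12 * abs(rhs)

# Multiplication (Legendre duplication, the n = 2 case):
# Gamma(z) * Gamma(z + 1/2) = 2**(1 - 2z) * sqrt(pi) * Gamma(2z)
lhs = math.gamma(z) * math.gamma(z + 0.5)
rhs = 2 ** (1 - 2 * z) * math.sqrt(math.pi) * math.gamma(2 * z)
assert abs(lhs - rhs) < 1e-12 * abs(rhs)

print("identities check out at z =", z)
```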


Business literature

Lately I’ve read several writers critical of popular business books. One oft-repeated criticism is that some of the companies featured in Good to Great aren’t doing so great and therefore the book was wrong.

I’ve never looked at business books this way. I see them as literature. They have stories that may provoke your thinking, but they’re not providing scientific laws. I wouldn’t say Good to Great was “wrong” any more than I’d say To Kill a Mockingbird was “wrong.”

The difference, of course, is that novels don’t aspire to an aura of scientific certainty while business books do. Business books often presume to offer universal laws when they only offer anecdotes. That doesn’t mean these books are not valuable. Anecdotes can be quite valuable. However, the value of an anecdote lies not in what it literally conveys but in the thoughts it stirs in your mind.

Business authors sometimes analyze reams of data. I wish they wouldn’t bother. I prefer business writers who don’t pretend to be scientific. “Here’s what I think. Here’s a story that illustrates my point. Your mileage may vary.”

If someone does produce a high-quality study of some class of companies at some point in time, the study is still an anecdote to a reader in different circumstances. A statistically rigorous study of Fortune 500 companies is not directly applicable to someone running a taco stand. It may not even be directly applicable to someone running a Fortune 500 company a few years later.

The taco stand owner may get as much insight from someone’s memoir of running a single large company as from a rigorous study of hundreds of large companies. (He may also get valuable insight from To Kill a Mockingbird.)

P.S. Although I’m saying business books are like literature, I must add that I hate business parables. The ones I’ve read are just terrible. No one would ever read one of these books for its literary merit, and when you strip away the campy prose there isn’t much content left.

How to visualize Bessel functions

Bessel functions are sometimes called cylindrical functions because they arise naturally from physical problems stated in cylindrical coordinates. Bessel functions have a long list of special properties that make them convenient to use. But because so much is known about them, introductions to Bessel functions can be intimidating. Where do you start? My suggestion is to start with a few properties that give you an idea what their graphs look like.

A textbook introduction would define the Bessel functions and carefully develop their properties, eventually getting to the asymptotic properties that are most helpful in visualizing the functions. Maybe it would be better to learn how to visualize them first, then go back and define the functions and show they look as promised. This way you can have a picture in your head to hold onto as you go over definitions and theorems.

I’ll list a few results that describe how Bessel functions behave for large and small arguments. By putting these two extremes together, we can have a fairly good idea what the graphs of the functions look like.

There are two kinds of Bessel functions, J(z) and Y(z). Actually, there are more, but we’ll just look at these two. These functions have a parameter ν (Greek nu) written as a subscript. The functions Jν(z) are called “Bessel functions of the first kind” and the functions Yν are called … wait for it … “Bessel functions of the second kind.” (Classical mathematics is like classical music as far as personal names followed by ordinal numbers: Beethoven’s fifth symphony, Bessel functions of the first kind, etc.)

Roughly speaking, you can think of the J’s and Y’s like cosines and sines. In fact, for large values of z, J(z) is very nearly a cosine and Y(z) is very nearly a sine. However, both are shifted by a phase φ that depends on ν and damped by a factor of 1/√z. That is, for large values of z, J(z) is roughly proportional to cos(z − φ)/√z and Y(z) is roughly proportional to sin(z − φ)/√z.

More precisely, as z goes to infinity, we have

$$J_\nu(z) \sim \sqrt{\frac{2}{\pi z}} \, \cos\left(z - \frac{1}{2}\nu\pi - \frac{\pi}{4}\right)$$

and

$$Y_\nu(z) \sim \sqrt{\frac{2}{\pi z}} \, \sin\left(z - \frac{1}{2}\nu\pi - \frac{\pi}{4}\right)$$

These statements hold as long as |arg(z)| < π. The error in each is on the order of 1/|z|.

Now let’s look at how the functions behave as z goes to zero. For small z, Jν behaves like z^ν and, for positive ν, Yν behaves like z^(−ν). Specifically,

$$J_\nu(z) \sim \frac{z^\nu}{2^\nu \, \Gamma(1 + \nu)}$$

and for ν > 0,

$$Y_\nu(z) \sim -\frac{2^\nu \, \Gamma(\nu)}{\pi z^\nu}$$

If ν = 0, we have

$$Y_0(z) \sim -\frac{2}{\pi} \log\frac{2}{z}$$
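These limiting forms are easy to check numerically. The sketch below is my own convenience implementation: it computes Jn for integer order from Bessel’s integral by quadrature, rather than calling a library routine, and compares it against the large- and small-argument approximations above.

```python
import math

def bessel_j(n, z, steps=2000):
    """Bessel function J_n for integer n via Bessel's integral,
    J_n(z) = (1/pi) * integral_0^pi cos(n*t - z*sin(t)) dt,
    computed with the composite trapezoid rule."""
    f = lambda t: math.cos(n * t - z * math.sin(t))
    h = math.pi / steps
    total = 0.5 * (f(0.0) + f(math.pi))
    for k in range(1, steps):
        total += f(k * h)
    return total * h / math.pi

# Large-argument asymptotic: J_nu(z) ~ sqrt(2/(pi z)) cos(z - nu*pi/2 - pi/4)
z = 50.0
for nu in (0, 1, 5):
    approx = math.sqrt(2 / (math.pi * z)) * math.cos(z - nu * math.pi / 2 - math.pi / 4)
    print(nu, bessel_j(nu, z), approx)

# Small-argument behavior: J_1(z) ~ z/2, so the graph leaves the origin at a slope of 1/2.
print(bessel_j(1, 0.01))  # roughly 0.005
```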

Now let’s use the facts above to visualize a couple plots.

First we plot J1 and J5. For large values of z, we expect J1(z) to behave like cos(z – φ) / √z where φ = 3π/4 and we expect J5(z) to behave like cos(z – φ) / √z where φ = 11π/4. The two phases differ by 2π and so the two functions should be nearly in phase.

For small z, we expect J1(z) to be roughly proportional to z, and so its graph comes into the origin at an angle. We expect J5(z) to be roughly proportional to z^5, and so its graph should be flat near the origin.

The graph below shows that the function graphs look like we might have imagined from the reasoning above. Notice that the amplitude of the oscillations is decreasing like 1/√z.

Next we plot J1 and Y1. For large arguments we would expect these functions to be a quarter period out of phase, just like cosine and sine, since asymptotically J1 is proportional to cos(z – 3π/4) / √z and Y1 is proportional to sin(z – 3π/4) / √z. For small arguments, J1(z) is roughly proportional to z and Y1(z) is roughly proportional to -1/z. The graph below looks as we might expect.


Fall Twitter giveaway

I have seven daily tip accounts on Twitter. These accounts post once a day, Monday through Friday, plus occasional unscheduled posts.

  • SansMouse: Windows keyboard shortcuts
  • RegexTip: regular expression tips
  • TeXtip: TeX and LaTeX tips
  • ProbFact: probability
  • AlgebraFact: algebra and number theory
  • TopologyFact: topology and geometry
  • AnalysisFact: real and complex analysis


Please help new people find out about these accounts by linking to this post or by recommending your favorite account on Twitter.

I’m going to do another giveaway like the one I did last April. I’ll draw a winner from the tweets that mention one of these accounts. The winner gets a choice of a T-shirt or coffee mug with one of the Twitter account logos. The winner from the Spring drawing chose a coffee mug with the RegexTip logo.

The beauty of windmills and power lines

Windmills were considered eyesores in 17th century Holland. Now we believe they are beautiful. And so they are. But there is a prejudice to presume that industrial things are not beautiful. We learned to see the beauty in windmills after artists painted them.

Alain de Botton discusses the beauty of windmills and power lines in his interview on EconTalk.

Many of the industrial things in the world are considered ugly, not because they are ugly, but because nobody has come along to point out that they might be beautiful. … A lot of times we call things beautiful or ugly because artists have been there and shaped our sensibilities. … and in a small way, that’s what my book is about: finding beauty where genuinely there is beauty but it gets missed.


Applied topology and Dante: an interview with Robert Ghrist

A few weeks ago I discovered Robert Ghrist via his website. Robert is a professor of mathematics and electrical engineering. He describes his research as applied topology, something I’d never heard of. (Topology has countless applications to other areas of mathematics, but I’d not heard of much work directly applying topology to practical physical problems.) In addition to his work in applied topology, I was intrigued by Robert’s interest in old books.

The following is a lightly-edited transcript of a phone conversation Robert and I had September 9, 2010.

* * *

JC: When I ran across your website one thing that grabbed my attention was your research in applied topology. I’ve studied applied math and I’ve studied topology, but the two are very separate in my mind. I was intrigued to hear you combine them.

RG: Those two are separate in a lot of people’s minds, but not for long. It’s one of those things whose time has come, and it’s clear that the tools that were developed for very abstract, esoteric problems have really concrete value with respect to modern challenges in data and systems analysis.

JC: Could you give some examples of that?

RG: Certainly. One of the first groups of people who do full-scale applied algebraic topology were Gunnar Carlsson’s group at Stanford doing applications to data analysis. The setup is you have a collection of data points in a space, a point cloud, that is a discrete representation of some interesting structure. For example, you might want to know how many connected components does this data set have. That might correspond to different features. For example, this might come from customer surveys that a corporation has out. It’s trying to cluster these customers. Or if it is medical data, say they are trying to discern different types of cancers. Then you might look at what statisticians call clustering, grouping data sets into connected components.

Well, topologists know that’s just the first step in a larger program of finding global structure. Besides having connectivity properties, spaces can have holes in them of various types. There are formal algebraic methods for finding and classifying those holes. That’s homology, cohomology, homotopy theory. And applying that to data turns out to give some really revolutionary techniques that don’t rely on projecting the data set down to a two-dimensional picture and trying to visualize what’s happening. It’s automatic.

There are similarly themed applications in the work I do in engineering systems, where you take data that comes from, say, a network of sensors, or a communications network, or a networked collection of computers, or a networked collection of autonomous robots. You try to take all that local information (say, in the context of sensor networks, collections of local data) and patch it together to give you a global understanding of an environment. That kind of local-to-global transition is what the techniques of topology were built to do. And they are surprisingly efficacious in these very applied problems.

JC: I’m familiar with Van Kampen’s theorem in homotopy. Is that the sort of thing you’re talking about?

RG: Yeah, homotopy tends to be less computable than homology. Homology is much more natural in these contexts. The corresponding principle is the Mayer-Vietoris principle, the homological analog of Van Kampen. And the Mayer-Vietoris principle is really telling you something about integration, how you stitch together local bits of data, how you integrate it into a global understanding of your network. That is a very deep idea that is very important to transitioning from local to global data.

JC: One mental block I have when I’m thinking about these sort of things is that in my mind topology seems sorta fragile in the sense that one random connection can change something. Connect just one pair of dots and suddenly a disconnected space becomes a connected space. It seems that would be a problem in these settings where you make some sort of topological statement, but any missing data or erroneous data invalidates your conclusions. But I remember you said something that was sort of the opposite on your website, that topological methods can be more robust. I could see that, but I’m having trouble resolving these two views.

RG: There are different types of robustness that are critical in different types of applications. Because the constructs of algebraic topology are invariant under homotopy or deformation, it turns out to be very robust with respect to, for example, coordinate changes. That is extremely useful when you’re dealing with data that has some anchor to the physical world.

Let’s say data that’s being collected by our cell phones. Put a couple sensors on a cell phone and we have lots and lots of interesting data streams. And that data is tied to physical locations. But you might not know exactly where it is. GPS doesn’t work so well when you’re inside a building, for example. In contexts like that, you want the kind of robustness that doesn’t depend on having a carefully laid-out coordinate system.

Now robustness with respect to noise, especially robustness with respect to errors, is a much more difficult problem to solve in general. But even in that case there are some topological tools that in specific examples can be deployed. This gets into slightly more technical stuff involving persistent homology and topological properties that persist over a range of samplings.

JC: Could you give an example of how knowing the homology of a data set might tell you something about a physical phenomenon?

RG: One example that I’ve worked with a lot has to do with coverage problems in sensor networks. Let’s say we’re talking about cell phone coverage, because everyone’s familiar with that. If you get into a hole in the cell phone network where you’ve dropped coverage, that’s frustrating. You’d like to know whether you have full coverage over an area or not, whether you have holes.

This gets much more critical when you’re talking about a security setting. You have video cameras or satellites that cover a region, and you want to know whether you’ve covered everything, or whether there are holes where you are missing information. One of the things I’ve used homology theory for is to give criteria that guarantee coverage, that guarantee no holes in your sensor network, based exclusively on coordinate-free data. So even though the sensors may not know where they are, and only know the identities of the sensors nearby, it’s still possible to verify coverage based on homological criteria.

JC: Interesting. I suppose especially if you’re looking at higher-dimensional data you can’t just draw a lot of circles on a map and see whether they overlap. You have to do something more computational.

RG: Exactly. Especially if those circles are in motion and you want to know what’s happening over time. Especially if there are no coordinates and you don’t know where to draw the circles to see how things overlap.

One thing I want to get across when I’m talking with people is that I view a mathematics library the same way an archaeologist views a prime digging site. There are all these wonderful treasures that are buried there and hidden from the rest of the world. If you pick up a typical book on sheaf theory, for example, it’s unreadable. But it’s full of stuff that is very, very important to solving really difficult problems. And I have this vision of digging through the obscure text and finding these gems and exporting them over to the engineering college and other domains where these tools can find utility.

Now, a lot of esoteric mathematics has already crossed the fence. No one will claim any more that number theory is useless. But topology is the place where you have the most-useful, least-used tools. So that’s my vision for what I want to see happen in mathematics and what I’m trying to accomplish.

JC: Switching gears a little bit, another thing on your site that caught my eye was your quote about old books.

Reading anything less than 50 years old is like drinking new wine: permissible once or twice a year and usually followed by regret and a headache.

I thought about C. S. Lewis’ exhortation to mix in old books with your reading of new books because each age has its own blind spots. Old authors have their own blind spots, but maybe they’re complementary to the ones we have.

RG: Exactly. I definitely have followed that dictum. Maybe a little too much so, in that I rarely read anything modern at all, when it comes to books. I don’t follow that rule when it comes to music or movies or blogs. But on the level of books, there is so much good stuff out there that has stood the test of time that I don’t run out of interesting things to read. I had the wonderful experience as a college student of taking a great books-type course that involved a lot of reading and a lot of discussion. It really changed my outlook and got me loving the classics and really living inside a lot of those books.

JC: Were you a math major when you took this great books class?

RG: No, I was an engineering major. My undergraduate degree was in engineering. I came to math a little late in life.

JC: You said you were particularly fond of Dante.

RG: That’s correct. Yeah, I’ve lived in that book [The Divine Comedy] a long time and I still find new and very engaging ideas in it every time I crack it open. Most people don’t get very far past the Inferno. The Inferno is the exciting, action-movie part of the story. But the later parts of the story — purgatory, paradise — those are really nice places to live.

JC: Have you had to do a lot of historical research to be able to read Dante?

RG: If you get a good translation with a good set of notes, that makes it much easier. I find the translation by Dorothy Sayers has excellent notes. She got turned on to Dante late in life. There’s an interesting story. During the bombings in the early 1940s, she was going down to the bomb shelter to spend a few hours, and she decided to grab a book off the shelf on the way down. She saw a copy of Dante and said, “You know, I’ve never really read Dante.” She pulled it off the shelf, and for the next two days she didn’t eat or sleep. She was engrossed with the story and how masterful it was. And she wound up devoting the rest of her life to mastering Italian and producing her own translation.

Infinite is easier than big

Here are two common but unhelpful ways to think about infinity.

  1. Infinity makes things harder.
  2. Infinity is a useless academic abstraction.

Neither of these is necessarily true. Problems are often formulated in terms of infinity to make things easier and to solve realistic problems. Infinity is usually a simplification. Think of infinity as “so big I don’t have to worry about how big it is.” Here are three examples.

In computer science, a Turing machine is an idealization of a computer. It is said to have an infinite tape that it reads back and forth. You could think of that as saying you can have as long a tape as you need.

Physics homework problems deal with infinite capacitors. Don’t think of that as meaning a capacitor bigger than the universe. Interpret the problem as saying that the width of the capacitor is so large relative to its thickness that you don’t have to worry about edge effects.

In calculus, you could think of infinite series as a sequence of finite approximations. A Taylor series, for example, is a compact way of expressing an unlimited sequence of approximations of a function. You can get as close to the function as you’d like by including enough terms in the sequence.

The infinite case often guides our thinking about the big case. Take the Taylor series example. A Taylor series isn’t just a formal series of polynomials. A Taylor series converges in some region. That says the infinite sequence of terms doesn’t behave arbitrarily: the partial sums get close to something as you include more terms. Knowing that the infinite sum converges tells you how to think about the finite approximations you select from the series.
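For example, the partial sums of the Taylor series for e^x converge quickly at x = 1. A small sketch:

```python
import math

def exp_partial_sum(x, n_terms):
    """Sum the first n_terms terms of the Taylor series for exp(x)."""
    total, term = 0.0, 1.0
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)  # next term x**(k+1)/(k+1)!
    return total

# Each truncation is a finite approximation; convergence of the infinite
# series tells us the errors shrink as we include more terms.
x = 1.0
for n in (2, 5, 10, 15):
    print(n, abs(exp_partial_sum(x, n) - math.exp(x)))
```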

When I went to grad school, my intention was to study functional analysis. Essentially this means infinite dimensional vector spaces. That sounds terribly abstract and useless, but it can be quite practical. My background in functional analysis served me well when I went on to study partial differential equations and numerical analysis.

Infinite dimensional spaces guide our thinking about large finite dimensional spaces. If you want to solve a practical problem in high dimensions, the infinite dimensional case may be a better guide than ordinary three dimensional space. Continuity in infinite dimensional spaces requires structure that may not be apparent in low dimensions. Thinking about the infinite case may prepare you to exploit that structure in a large finite dimensional problem.
