Disappearing data projections

Suppose you have data in an N-dimensional space where N is large, and consider the cube [-1, 1]^N. The coordinate basis vectors start at the center of the cube and poke out through the middle of the faces. The diagonal directions run from the center out to the corners.

If your points cluster along one of the coordinate axes, then projecting them onto that axis will show the full width of the data. But if your points cluster along one of the diagonal directions, the projection onto every coordinate axis will be a tiny smudge near the origin. There are a lot more diagonal directions than coordinate directions, 2^N versus N, so there are a lot of orientations of your points that could be missed by every coordinate projection.

Here’s the math behind the loose statements above. The diagonal directions have the form (±1, ±1, …, ±1). A unit vector in one of these directions has the form (1/√N)(±1, ±1, …, ±1), so its inner product with any coordinate basis vector is ±1/√N, which goes to zero as N gets large. Said another way, taking a set of points along a diagonal and projecting it onto a coordinate axis divides its width by √N.
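
Here’s a quick numerical sketch in Python (made-up points, not from the original post): spread points along one diagonal of the cube and compare their full width to the width of a coordinate projection.

        import numpy as np

        N = 1000
        diagonal = np.ones(N) / np.sqrt(N)        # unit vector along one diagonal
        t = np.linspace(-1, 1, 500)               # points spread along that diagonal
        points = np.outer(t, diagonal)            # 500 points in N-dimensional space

        width_along_diagonal = np.ptp(points @ diagonal)   # full width, about 2
        width_on_first_axis  = np.ptp(points[:, 0])        # about 2/sqrt(N), i.e. 0.063
        print(width_along_diagonal, width_on_first_axis)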

Confidence

Zig Ziglar said that if you increase your confidence, you increase your competence. I think that’s generally true. Of course you could be an idiot and become a more confident idiot. In that case confidence just makes things worse [1]. But otherwise when you have more confidence, you explore more options, and in effect become more competent.

There are some things you may need to learn not for the content itself but for the confidence boost. Maybe you need to learn them so you can confidently say you didn’t need to. Also, some things you need to learn before you can see uses for them. (More on that theme here.)

I’ve learned several things backward in the sense of learning the advanced material before the elementary. For example, I studied PDEs in graduate school before having mastered the typical undergraduate differential equation curriculum. That nagged at me. I kept thinking I might find some use for the undergrad tricks. When I had a chance to teach the undergrad course a couple of times, it increased my confidence. I also convinced myself that I didn’t need that material after all.

My experience with statistics was similar. I was writing research articles in statistics before I learned some of the introductory material. Once again the opportunity to teach the introductory material increased my confidence. The material wasn’t particularly useful, but the experience of having taught it was.

Related post: Psychological encapsulation


[1] See Yeats’ poem The Second Coming:

The best lack all conviction, while the worst
Are full of passionate intensity.

 

Probability approximations

This week’s resource post lists notes on probability approximations.

Do we even need probability approximations anymore? They’re not as necessary for numerical computation as they once were, but they remain vital for understanding the behavior of probability distributions and for theoretical calculations.

Textbooks often leave out details such as quantifying the error when discussing approximations. The following pages are notes I wrote to fill in some of these details when I was teaching.
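
For example, here’s one quick way to quantify approximation error numerically (a sketch with illustrative parameters, not taken from the notes): measure the total variation distance between a binomial(n, p) distribution and its Poisson(np) approximation.

        import numpy as np
        from scipy import stats

        for n, p in [(20, 0.1), (100, 0.02), (1000, 0.002)]:
            k = np.arange(n + 1)   # Poisson mass beyond n is negligible here
            tv = 0.5 * np.sum(np.abs(stats.binom.pmf(k, n, p) - stats.poisson.pmf(k, n * p)))
            print(n, p, tv)        # the distance shrinks as p decreases with np fixed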

See also blog posts tagged Probability and statistics and the Twitter account ProbFact.

Last week: Numerical computing resources

Next week: Miscellaneous math notes

More data, less accuracy

Statistical methods should do better with more data. That’s essentially what the technical term “consistency” means. But with improper numerical techniques, the numerical error can increase with more data, overshadowing the decreasing statistical error.

There are three ways Bayesian posterior probability calculations can degrade with more data:

  1. Polynomial approximation
  2. Missing the spike
  3. Underflow

Elementary numerical integration algorithms, such as Gaussian quadrature, are based on polynomial approximations. The method aims to exactly integrate a polynomial that approximates the integrand. But likelihood functions are not approximately polynomial, and they become less like polynomials when they contain more data. They become more like a normal density, asymptotically flat in the tails, something no polynomial can do. With better integration techniques, the integration accuracy will improve with more data rather than degrade.
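
Here’s a sketch of the first problem in Python, using a made-up binomial likelihood rather than anything from a real analysis. A fixed 10-point Gauss-Legendre rule on [0, 1] is essentially exact for small n, but its relative error grows as more data sharpen the likelihood. The exact value is available here through the beta function for comparison.

        import numpy as np
        from scipy.special import beta

        def likelihood(p, y, n):
            return p**y * (1 - p)**(n - y)        # unnormalized binomial likelihood

        nodes, weights = np.polynomial.legendre.leggauss(10)   # 10-point rule on [-1, 1]
        p = 0.5 * (nodes + 1)                                  # map nodes to [0, 1]

        for n in (10, 50, 200):
            y = n // 2
            exact = beta(y + 1, n - y + 1)                     # integral of p^y (1-p)^(n-y) over [0, 1]
            approx = 0.5 * np.dot(weights, likelihood(p, y, n))
            print(n, abs(approx - exact) / exact)              # relative error grows with n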

With more data, the posterior distribution becomes more concentrated. This means that a naive approach to integration might entirely miss the part of the integrand where nearly all the mass is concentrated. You need to make sure your integration method is putting its effort where the action is. Fortunately, it’s easy to estimate where the mode should be.
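
Here’s a sketch of the second problem with made-up numbers. The likelihood for a normal mean concentrates near the sample mean as n grows, and an adaptive routine applied to a wide fixed interval returns essentially zero because none of its initial sample points land on the spike. Centering the integration range at the mode fixes this.

        import numpy as np
        from scipy import integrate

        n, xbar = 100_000, 3.0                          # hypothetical sample size and mean
        like = lambda mu: np.exp(-0.5 * n * (mu - xbar)**2)
        exact = np.sqrt(2 * np.pi / n)                  # closed form for comparison

        naive, _ = integrate.quad(like, -50, 50)        # wide fixed interval: misses the spike
        se = 1 / np.sqrt(n)                             # scale of the spike
        smart, _ = integrate.quad(like, xbar - 10*se, xbar + 10*se)
        print(exact, naive, smart)                      # naive is essentially zero; smart matches exact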

The third problem is that software calculating the likelihood function can underflow with even a moderate amount of data. The usual solution is to work with the logarithm of the likelihood function, but with numerical integration the solution isn’t quite that simple. You need to integrate the likelihood function itself, not its logarithm. I describe how to deal with this situation in Avoiding underflow in Bayesian computations.
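
Here’s a sketch of one common workaround, with illustrative binomial data rather than the code from that post: subtract the maximum of the log likelihood before exponentiating, integrate the rescaled function, and add the subtracted constant back on the log scale.

        import numpy as np
        from scipy import integrate

        n, y = 5_000, 3_000                                     # hypothetical binomial data

        def log_lik(p):
            return y * np.log(p) + (n - y) * np.log(1 - p)      # exponentiating this directly underflows

        m = log_lik(y / n)                                      # the log likelihood peaks at the MLE y/n
        scaled, _ = integrate.quad(lambda p: np.exp(log_lik(p) - m), 0, 1)
        log_integral = m + np.log(scaled)                       # log of the integrated likelihood
        print(log_integral)                                     # finite, even though exp(m) underflows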

***

If you’d like help with statistical computation, let’s talk.

Managing Emacs windows

When you have Emacs split into multiple windows, how do you move your cursor between windows? How do you move the windows around?

Update (27 Jan 2015): You may want to skip the post below and use ace-window.  I plan to use it rather than my solution below. Also, see Sacha Chua’s video on window management, including using ace-window.

Moving the cursor between windows

You can use C-x o to move the cursor to the “other” window. That works great when you only have two windows: C-x o toggles between them.

When you have more windows, C-x o moves to the next window in top-to-bottom, left-to-right order. So if you have five windows, for example, you have to go to the “other” window up to four times to move to each of the windows.

Imagine a frame split into four windows, labeled A through D, with the cursor in the A window. Repeatedly executing C-x o will move the cursor to C, then B, then D.

There are more direct commands:

  • windmove-up
  • windmove-down
  • windmove-left
  • windmove-right

You might map these to Control or Alt plus the arrow keys to have a memorable key sequence to navigate windows. However, such keybindings may conflict with existing keybindings, particularly in org-mode.

Moving windows

You can move windows around with the buffer-move package. (Technically the windows stay fixed and the buffers move around.)

Super and Hyper keys

Since the various Control arrow keys and Alt arrow keys are spoken for in my configuration, I thought I’d create Super and Hyper keys so that Super-up would map to windmove-up etc.

Several articles recommend mapping the Windows key to Super and the Menu key to Hyper on Windows. The former is problematic because so many of the Windows key combinations are already used by the OS. I did get the latter to work by adding the following to the branch of my init.el that runs on Windows.

        ;; Keep the Menu (apps) key away from Windows and treat it as Hyper
        (setq w32-pass-apps-to-system nil
              w32-apps-modifier 'hyper)
        ;; Hyper plus an arrow key moves the cursor to the window in that direction
        (global-set-key (kbd "<H-up>")    'windmove-up)
        (global-set-key (kbd "<H-down>")  'windmove-down)
        (global-set-key (kbd "<H-left>")  'windmove-left)
        (global-set-key (kbd "<H-right>") 'windmove-right)

I haven’t figured out yet how to make Super or Hyper keys work on Linux. Suggestions for setting up Super and Hyper on Linux or Windows would be much appreciated.

Advice for going solo

Two years ago I left my job at MD Anderson to become an independent consultant. When people ask me what I learned or what advice I’d give, here are some of the things I usually say.

You can’t transition gradually

I’ve done consulting on the side throughout my career, and I planned to ramp up my consulting to the point that I could gradually transition into doing it full time. That never happened. I had to quit my day job before I had the time and credibility to find projects.

When you have a job and you tell someone you can work 10 hours a week on their project, working evenings and weekends, you sound like an amateur. But when you have your own business and tell someone you can only allocate 10 hours a week to their project, you sound like you’re in high demand and they’re glad you could squeeze them in.

When I left MD Anderson, I had one small consulting project lined up. I had been working on a second project, but it ended sooner than expected. (I was an expert witness on a case that settled out of court.) The result was that I started my consulting career with little work, and I imagine that’s common.

Things move slowly

As soon as I announced on my blog that I was going out on my own, someone from a pharmaceutical company contacted me saying he was hoping I’d quit my job because he had a project for me, helping improve his company’s statistical software development. First day, first lead. This is going to be easy! Even though he was eager to get started, it was months before the project started and months more before I got paid.

In general, the larger a client is, the longer it takes to get projects started, and the longer they sit on invoices. (Google is an exception to the latter; they pay invoices fairly quickly.) Small clients negotiate projects quickly and pay quickly. They can help with your cash flow while you’re waiting on bigger clients.

Build up savings

This is a corollary to the discussion above. You might not have much income for the first few months, so you need several months’ living expenses in the bank before you start.

Other lessons

If you’re thinking about going out on your own and would like to talk about it, give me a call or send me an email. My contact information is listed here.

 

 

Numerical computing resources

This week’s resource post: some numerical computing pages on this site.

See also the Twitter account SciPyTip and numerical programming articles I’ve written for Code Project.

Last week: Regular expressions

Next week: Probability approximations

Striving for simplicity, arriving at complexity

This post is a riff on a line from Mathematics without Apologies, the book I quoted yesterday.

In an all too familiar trade-off, the result of striving for ultimate simplicity is intolerable complexity; to eliminate too-long proofs we find ourselves “hopelessly lost” among the too-long definitions. [emphasis added]

It’s as if there’s some sort of conservation of complexity, but not quite in the sense of a physical conservation law. Conservation of momentum, for example, means that if one part of a system loses 5 units of momentum, other parts of the system have to absorb exactly 5 units of momentum. But perceived complexity is psychological, not physical, and the accounting is not the same. By moving complexity around we might increase or decrease the overall complexity.

The opening quote suggests that complexity is an optimization problem, not an accounting problem. It also suggests that driving the complexity of one part of a system to its minimum may disproportionately increase the complexity of another part. Striving for the simplest possible proofs, for example, could make the definitions much harder to digest. There’s a similar dynamic in programming languages and programs.

Larry Wall said that Scheme is a beautiful programming language, but every Scheme program is ugly. Perl, on the other hand, is ugly, but it lets you write beautiful programs. Scheme can be simple because it requires libraries and applications to implement functionality that is part of more complex languages. I had similar thoughts about COM. It was an elegant object system that led to hideous programs.

Scheme is a minimalist programming language, and COM is a minimalist object framework. By and large the software development community prefers complex languages and frameworks in hopes of writing smaller programs. Additional complexity in languages and frameworks isn’t felt as strongly as additional complexity in application code. (Until something breaks. Then you might have to explore parts of the language or framework that you had blissfully ignored before.)

The opening quote deals specifically with the complexity of theorems and proofs. In context, the author was saying that the price of Grothendieck’s elegant proofs was a daunting edifice of definitions. (More on that here.) Grothendieck may have taken this to extremes, but many mathematicians agree with the general approach of pushing complexity out of theorems and into definitions. Michael Spivak defends this approach in the preface to his book Calculus on Manifolds.

… the proof of [Stokes’] theorem is, in the mathematician’s sense, an utter triviality — a straight-forward calculation. On the other hand, even the statement of this triviality cannot be understood without a horde of definitions … There are good reasons why the theorems should all be easy and the definitions hard. As the evolution of Stokes’ theorem revealed, a single simple principle can masquerade as several difficult results; the proofs of many theorems involve merely stripping away the disguise. The definitions, on the other hand, serve a twofold purpose: they are rigorous replacements for vague notions, and machinery for elegant proofs. [emphasis added]

Mathematicians like to push complexity into definitions, just as software developers like to push it into languages and frameworks. Both strategies can make life easier on professionals while making it harder on beginners.

Related post: A little simplicity goes a long way

Regular expression resources

Continuing the series of resource posts each Wednesday, this week we have notes on regular expressions:

See also blog posts tagged regular expressions and the RegexTip Twitter account.

Last week: Probability resources

Next week: Numerical computing resources

Problem solvers and theory builders

From Mathematics without Apologies:

It’s conventional to classify mathematicians as “problem solvers” or “theory builders,” depending on temperament. My experiences and the sources I consulted in writing this book convince me that curiosity about problems guides the growth of theories, rather than the other way around. Alexander Grothendieck and Robert Langlands … count among the most ambitious of all builders of mathematical theories, but everything they built was addressed to specific problems with ancient roots.

Related post: Examples bring a subject to life

New development in cancer research scandal

My interest in the Anil Potti scandal started when my former colleagues could not reproduce the analysis in one of Potti’s papers. (Actually, they did reproduce the analysis, at great effort, in the sense of forensically determining the erroneous steps that were carried out.) Two years ago, the story was on 60 Minutes. The straw that broke the camel’s back was not bad science but résumé padding.

It looks like the story is a matter of fraud rather than sloppiness. This is unfortunate because sloppiness is much more pervasive than fraud, and this could have made a great case study of bad analysis. However, one could look at it as a case study in how good analysis (by the folks at MD Anderson) can uncover fraud.

Now there’s a new development in the Potti saga. The latest issue of The Cancer Letter contains letters by whistle-blower Bradford Perez, who warned officials at Duke about problems with Potti’s research.

Blog seventh anniversary

Seven years

This blog is seven years old today. I’ve written 2,273 posts so far, a little less than one per day.

Over the holidays I combed through older posts looking for typos, broken links, etc. I fixed a lot of little things, but I’m sure I missed a few. If you find any problems, please let me know.

Most popular posts

Here are some of the most popular posts for the last three years.

In 2011 I split my most popular post list into three parts:

In 2010 I split the list into two parts:

I didn’t post a list of most popular posts for 2009, but the most popular post that year was Why programmers are not paid in proportion to their productivity.

Finally, my most popular posts for 2008.

Other writing

Here are a few other places you can find things I write:

Probability resources

Each Wednesday I post a list of notes on some topic. This week it’s probability.

See also posts tagged probability and statistics and the Twitter account ProbFact.

Last week: Python resources

Next week: Regular expression resources

Photo review

Here are some of the photos I took on my travels last year.

Bicycles on the Google campus in Mountain View, California:

Sunrise at Isla Vista, California:

View from University of California Santa Barbara:

Reflection of the Space Needle in the EMP museum in Seattle, Washington:

Paradise Falls, Thousand Oaks, California:

Bed and breakfast in Geldermalsen, The Netherlands:

Amsterdam, The Netherlands:

Chinatown in San Francisco, California:

Heidelberg, Germany:

Bringing back Coltrane

In the novel Chasing Shadows the bad guys have built a time machine named Magog.

“The bottom line is this. And it is hard for me to believe. They are going to use Magog to bring someone back from the past.”

Jack did not blink or move. His heart was beating very quickly now, but he merely said wryly, “Yes? Who are they going to bring back? Tell me it is John Coltrane.”

“You are never really serious are you?”

“I’m very serious about my music. Why are bad guys always bringing back people that everyone was glad to see go the first time? We could use more Coltrane …”

Related post: Nunc dimittis