
Some frequently asked questions

I don’t have an FAQ page per se, but I’ve written a few blog posts where I answer some questions, and here I’ll answer a few more.

Should I get a PhD?

See my answer here and take a look at some of the other answers on the same site.

Do you have any advice for people going out on their own?

Yes. See my post Advice for going solo.

Shortly after I went out on my own, I wrote this post responding to questions people had about my particular situation. My answers there remain valid, except for one. I said that I planned to do anything I can do well that also pays well. That was true at the time, but I’ve gotten a little more selective since then.

Can you say more about the work you’ve been doing?

Only in general terms. For example, I did some work with psychoacoustics earlier this year, and lately I’ve been working with medical device startups and giving expert testimony.

Nearly all the work I do is covered under NDA (non-disclosure agreement). Occasionally a project will be public, such as the white paper I wrote for Hitachi Data Systems comparing replication and erasure coding. But usually a project is confidential, though I hope to be able to say more about some projects after they come to market.

Miscellaneous other questions

I wrote an FAQ post of sorts a few years ago. Here are the questions from that post that people still ask fairly often.

Any more questions?

You can use this page to send me a question or to find my contact information. The page also has a link to a vCard you can import into your contact manager.

A different kind of network book

Yesterday I got a review copy of The Power of Networks. There’s some math inside, but not much, and what’s there is elementary.

I’d say it’s not a book about networks per se but a collection of topics associated with networks: cell phone protocols, search engines, auctions, recommendation engines, etc. It would be a good introduction for non-technical people who are curious about how these things work. More technically inclined folks probably already know much of what’s here.

Speeding up R code

People often come to me with R code that’s running slower than they’d like. It’s not unusual to make the code 10 or even 100 times faster by rewriting it in C++.

Not all that speed improvement comes from changing languages. Some of it comes from better algorithms, eliminating redundancy, etc.

Why bother optimizing?

If code is running 100 times slower than you’d like, why not just run it on 100 processors? Sometimes that’s the way to go. But maybe the code doesn’t split up easily into pieces that can run in parallel. Or maybe you’d rather run the code on your laptop than send it off to the cloud. Or maybe you’d like to give your code to someone else and you want them to be able to run it conveniently.

Optimizing vs rewriting R

It’s sometimes possible to tweak R code to make it faster without rewriting it, especially if it is naively using loops for things that could easily be vectorized. And it’s possible to use better algorithms without changing languages.
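As a small illustration (the function names here are made up), here is the kind of loop that shows up in practice and its vectorized equivalent. The two compute the same thing, but the second pushes the loop down into compiled code:

```r
# Sum of squared differences, written with an explicit loop
slow_ss <- function(x, y) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + (x[i] - y[i])^2
  }
  total
}

# The same computation, vectorized
fast_ss <- function(x, y) sum((x - y)^2)
```

On vectors with a million elements the difference is easy to see with system.time().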

Beyond these high-level changes, there are a number of low-level changes that may give you a small speed-up. This way madness lies. I’ve seen blog posts to the effect “I rewrote this part of my code in the following non-obvious way, and for reasons I don’t understand, it ran 30% faster.” Rather than spending hours or days experimenting with such changes and hoping for a small speed-up, I use a technique fairly sure to give a 10x speed-up, and that is rewriting (part of) the code in C++.

If the R script is fairly small, and if I have C++ libraries to replace all the necessary R libraries, I’ll rewrite the whole thing in C++. But if the script is long, or has dependencies I can’t replace, or only has a small section where nearly all the time is spent, I may just rewrite that portion in C++ and call it from R using Rcpp.
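Here is a minimal sketch of the second approach, assuming the hot spot is a made-up function that sums a slowly converging series. The R loop and the C++ loop compute the same thing; Rcpp compiles the C++ and makes it callable from R.

```r
library(Rcpp)

# Pure R version of a hypothetical hot spot
slow_series <- function(n, a) {
  s <- 0
  for (i in 1:n) s <- s + 1 / (i^2 + a)
  s
}

# The same loop rewritten in C++ and exposed to R via Rcpp
cppFunction('
double fast_series(int n, double a) {
  double s = 0.0;
  for (int i = 1; i <= n; i++) {
    double di = i;
    s += 1.0 / (di * di + a);
  }
  return s;
}
')

# The two should agree, with the C++ version much faster for large n
slow_series(1e6, 2)
fast_series(1e6, 2)
```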

Simulation vs analysis

The R programs I’ve worked on often compute something approximately by simulation that could be calculated exactly much faster. This isn’t because the R language encourages simulation, but because the language is used by statisticians who are more inclined to use simulation than analysis.

Sometimes a simulation amounts to computing an integral. It might be possible to compute the integral in closed form with some pencil-and-paper work. Or it might be possible to recognize the integral as a special function for which you have efficient evaluation code. Or maybe you have to approximate the integral, but you can do it more efficiently by numerical analysis than by simulation.
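Here’s a small example of the difference (not from any particular project, just an illustration): estimating the probability that a standard normal random variable exceeds 2.

```r
# By simulation: approximate, and expensive if you need several digits
set.seed(1)
sim_estimate <- mean(rnorm(1e7) > 2)

# By analysis: recognize the quantity as a normal tail probability
exact_value <- pnorm(2, lower.tail = FALSE)

c(simulation = sim_estimate, analysis = exact_value)
```

The same pattern shows up with integrals: a Monte Carlo estimate may be one line of R, but a closed form or a good quadrature rule can be both faster and more accurate.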

Redundancy vs memoization

Sometimes it’s possible to speed up code, written in any language, simply by not calculating the same thing unnecessarily. This could be something simple like moving code out of inner loops that doesn’t need to be there, or it could be something more sophisticated like memoization.
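The simple case first. In the sketch below (the variables are made up), the quantity in the denominator is the same on every pass through the loop, so it can be computed once outside:

```r
x <- runif(1e6)
v <- runif(10)

# Before: the norm of v is recomputed on every iteration
y <- numeric(length(x))
for (i in seq_along(x)) y[i] <- x[i] / sqrt(sum(v^2))

# After: the loop-invariant quantity is computed once
v_norm <- sqrt(sum(v^2))
for (i in seq_along(x)) y[i] <- x[i] / v_norm
```

Memoization handles a subtler version of the same problem: the same function being called repeatedly with the same arguments.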

The first time a memoized function is called with a new set of arguments, it computes the result, saves it, and records the association between the arguments and the result in some sort of look-up table, such as a hash. The next time the function is called with the same arguments, the result is retrieved from memory rather than recomputed.

Memoization works well when the set of unique arguments is fairly small and the calculation is expensive relative to the cost of looking up results. Sometimes the set of potential arguments is very large, and it looks like memoization won’t be worthwhile, but the set of actual arguments is small because some arguments are used over and over.
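Here is a minimal memoization sketch in base R, using an environment as the look-up table. (The CRAN package memoise does the same thing more carefully; the functions below are just illustrations.)

```r
memoize <- function(f) {
  cache <- new.env(hash = TRUE, parent = emptyenv())
  function(x) {
    key <- as.character(x)
    if (exists(key, envir = cache, inherits = FALSE)) {
      return(get(key, envir = cache, inherits = FALSE))
    }
    result <- f(x)
    assign(key, result, envir = cache)
    result
  }
}

# Stand-in for an expensive calculation
slow_square <- function(x) { Sys.sleep(1); x^2 }
fast_square <- memoize(slow_square)

fast_square(3)  # takes about a second
fast_square(3)  # returns immediately from the cache
```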


Turning math inside-out

Here’s one of the things about category theory that takes a while to get used to.

Mathematical objects are usually defined internally. For example, the Cartesian product P of two sets A and B is defined to be the set of all ordered pairs (a, b) where a comes from A and b comes from B. The definition of P depends on the elements of A and B but it does not depend on any other sets.

Category theory turns this inside-out. Operations such as taking products are not defined in terms of elements of objects. Category theory makes no use of elements or subobjects [1]. It defines things by how they act, not their inner workings. People often stress what category theory does not depend on, but they less often stress what it does depend on. The definition of the product of two objects in any category depends on all objects in that category: The definition of the product of objects A and B contains the phrase “such that for any other object X …” [More on categorical products].
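Spelled out in the usual notation, the definition is this: an object P together with projection morphisms π_A : P → A and π_B : P → B is a product of A and B if the following holds.

```latex
\text{For every object } X \text{ and morphisms } f : X \to A,\; g : X \to B, \\
\text{there exists a unique } h : X \to P \text{ such that }
\pi_A \circ h = f \text{ and } \pi_B \circ h = g.
```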

The payoff for this inside-out approach to products is that you can say something simultaneously about everything that acts like a product, whether it’s products of sets, products of fields (i.e. that they don’t exist), products of groups, etc. You can’t say something valid across multiple categories if you depend on details unique to one category.

This isn’t unique to products. Universal properties are everywhere. That is, you see definitions containing “such that for any other object X …” all the time. In this sense, category theory is extremely non-local. The definition of a widget often depends on all widgets.

There’s a symmetry here. Traditional definitions depend on the internal workings of objects, but only on the objects themselves. There are no third parties involved in the definition. Categorical definitions have zero dependence on internal workings, but depend on the behavior of everything in the category. There are an infinite number of third parties involved! [2] You can have a definition that requires complete internal knowledge but zero external knowledge, or a definition that requires zero internal knowledge and an infinite amount of external knowledge.

Related: Applied category theory

* * *

[1] Category theory does have notions analogous to elements and subsets, but they are defined the same way everything else is in category theory, in terms of objects and morphisms, not by appealing to the inner structure of objects.

[2] You can have a category with a finite number of objects, but usually categories are infinite. In fact, they are usually so large that they are “classes” of objects rather than sets.

Mathematical modeling for medical devices

We’re about to see a lot of new, powerful, inexpensive medical devices come out. And to my surprise, I’ve contributed to a few of them.

Growing compute power and shrinking sensors open up possibilities we’re only beginning to explore. Even when the things we want to observe elude direct measurement, we may be able to infer them from other things that we can now measure accurately, inexpensively, and in high volume.

In order to infer what you’d like to measure from what you can measure, you need a mathematical model. Or if you’d like to make predictions about the future from data collected in the past, you need a model. And that’s where I come in. Several companies have hired me to help them create medical devices by working on mathematical models. These might be statistical models, differential equations, or a combination of the two. I can’t say much about the projects I’ve worked on, at least not yet. I hope that I’ll be able to say more once the products come to market.

I started my career doing mathematical modeling (partial differential equations) but wasn’t that interested in statistics or medical applications. Then through an unexpected turn of events, I ended up spending a dozen years working in the biostatistics department of the world’s largest cancer center.

Since I left MD Anderson and started my consultancy, several companies have approached me for help with mathematical problems associated with their ideas for medical devices. These are ideal projects because they combine my earlier experience in mathematical modeling with my more recent experience with medical applications.

If you have an idea for a medical device, or know someone who does, let’s talk. I’d like to help.

 

Solar power and applied math

The applied math featured here tends to be fairly sophisticated, but there’s a lot you can do with the basics as we’ll see in the following interview with Trevor Dawson of Borrego Solar, a company specializing in grid-connected solar PV systems.

[Photo: Trevor Dawson]

JC: Can you say a little about yourself?

TD: I’m Trevor Dawson, I’m 25, born in the California Bay Area. I enjoy woodworking, ceramics, soccer, and travelling. I consider myself an environmentalist.

JC: What is your role at Borrego Solar?

TD: I am a Cost Reduction Analyst and I focus on applying Lean principles to identify and remove waste from both our internal processes and construction in the field. I use data to identify problems, prioritize them, and to verify the effectiveness of our solutions. I work with a variety of teams to reduce the cost or time of our projects.

Solar is a very fast-paced industry. Policy changes and technological improvements are being developed quickly and we have to respond quickly. A key function of my job is to assign measurable cost benefits to new practices and help ensure Borrego Solar continues to be an industry leader.

JC: What is your technical background and education?

TD: I graduated with a Bachelor of Science in Industrial & Systems Engineering (IE) from the University of Washington. I spent 3.5 years as an IE implementing process improvements on Boeing’s 777 Manufacturing Wing Line in Seattle, WA. I gained valuable experience in Lean, schedule optimization, design of experiments, and big data efforts. At Borrego, I get to apply those skills to help accelerate the adoption of the most time-tested, renewable energy source of all: the sun.

JC: What math, physics, or technical skills do you use every day?

TD: Addition, algebra, and simple statistics. I like to think I’ve mastered the basics. I also use a lot of my industrial engineering training, like design of experiments, time studies, and lean problem-solving methodology, to help gather and analyze data.

I mostly work in Excel and use Power Pivot to drive large, cumbersome data into neat summary tables. Although the analysis can be a challenge, the hard work is rolling it up and presenting it in a way that is meaningful and convincing. When you’re suggesting a business decision, especially when it challenges the norm, your internal customers want to know the answer but they are equally interested in your process. For example, how does the business case change if a defined constraint or coefficient changes? The solar industry is dynamic and still maturing, so we have to be especially poised in our decision-making.

JC: What do you use much less than you expected?

TD: Calculus. I spent so much time learning calculus and even other things like differential equations but haven’t had much opportunity to apply them. However, I do think calculus taught me important practical problem solving skills and I put that to use now tackling large problems that span multiple pages.

JC: What math or technical skill do you wish you had more of or understood better?

TD: Excel programming and design. Excel rules the world, and although I was introduced to it at school, I think more intense courses should be commonplace. Regarding design, execution is the hardest part of any business decision, and design would help communicate results and suggestions much more effectively. A business needs verifiable proof that the suggested change is real and if executed will perform as predicted. This stage of verifying the effectiveness of a project could be improved with better design skills and may even reduce the amount of touch time and communications all the way through from inception to completion of a project.

JC: Anything else you’d like us to know?

TD: Go solar!

Amistics

Neal Stephenson coins a useful word, Amistics, in his novel Seveneves:

… it was a question of Amistics, which was a term that had been coined ages ago by a Moiran anthropologist to talk about the choices that different cultures made as to which technologies they would, and would not, make part of their lives. The word went all the way back to the Amish people … All cultures did this, frequently without being consciously aware that they had made collective choices.

Related post by Kevin Kelly: Amish Hackers

Retronyms and Backronyms

[Photo: gear shift for a car with an automatic transmission]

A retronym is a new name created for an old thing, often made necessary by technological changes. For example, we have terms like “postal mail” or “snail mail” for what used to simply be “mail” because email has become the default. What was once called a “transmission” is now called a “manual transmission” since most cars (at least in the US) now have an automatic transmission.

A backronym is sort of a fictional etymology, such as a meaning retrofitted to an acronym. Richard Campbell explains that Structured Query Language is a backronym for SQL.

IBM’s first database was not relational. Its second database, DB2, was a sequel to its first database, and so they wanted to call its query language SEQUEL, but they were unable to copyright the name. So they dropped the vowels and shortened it to SQL. Later someone came up with the backronym “Structured Query Language.”

The APGAR score for newborns is a mnemonic backronym. Virginia Apgar came up with her scoring system ten years before someone came up with the backronym Appearance, Pulse, Grimace, Activity and Respiration.

Flood control parks

[Photo: flooded park]

The park in the photo above flooded. And that’s a good thing. It’s designed to flood so that homes don’t.

It’s not really a park that flooded. It’s a flood control project that most of the time doubles as a park. Ordinarily the park has a lake, but a few days a year the park is a lake.

Harris County, Texas has an unusually large amount of public recreational land. One reason the county can afford this is that some of the recreational land serves two purposes.


Kalman filters and bottom-up learning

[Photo: radio antennae]

Kalman filtering is a mixture of differential equations and statistics. Kalman filters are commonly used in tracking applications, such as tracking the location of a space probe or tracking the amount of charge left in a cell phone battery. Kalman filters provide a way to synthesize theoretical predictions and actual measurements, accounting for error in both.

Engineers naturally emphasize the differential equations and statisticians naturally emphasize the statistics. Both perspectives are valuable, but in my opinion/experience, the engineering perspective must come first.

From an engineering perspective, a Kalman filtering problem starts as a differential equation. In an ideal world, one would simply solve the differential equation and be done. But the experienced engineer realizes his or her differential equations don’t capture everything. (Unlike the engineer in this post.) Along the road to the equations at hand, there were approximations, terms left out, and various unknown unknowns.

The Kalman filter accounts for some level of uncertainty in the process dynamics and in the measurements taken. This uncertainty is modeled as randomness, but this doesn’t mean that there’s necessarily anything “random” going on. It simply acknowledges that random variables are an effective way of modeling miscellaneous effects that are unknown or too complicated to account for directly. (See Random is as random does.)

The statistical approach to Kalman filtering is to say that it is simply another estimation problem. You start from a probability model and apply Bayes’ theorem. That probability model has a term inside that happens to come from a differential equation in practice, but this is irrelevant to the statistics. The basic Kalman filter is a linear model with normal probability distributions, and this makes a closed-form solution for the posterior possible.
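For reference, here is the standard discrete-time form of that linear-Gaussian model and its closed-form update, in generic textbook notation. Nothing here is specific to any one application; in practice the matrices A and H come from discretizing the underlying differential equation.

```latex
% State-space model: x_k is the hidden state, y_k the measurement
x_k = A\, x_{k-1} + w_k, \qquad w_k \sim N(0, Q) \\
y_k = H\, x_k + v_k, \qquad v_k \sim N(0, R) \\[1ex]
% Predict step
\hat{x}_{k \mid k-1} = A\, \hat{x}_{k-1 \mid k-1}, \qquad
P_{k \mid k-1} = A\, P_{k-1 \mid k-1} A^{\top} + Q \\[1ex]
% Update step, with Kalman gain K_k
K_k = P_{k \mid k-1} H^{\top} \bigl( H P_{k \mid k-1} H^{\top} + R \bigr)^{-1} \\
\hat{x}_{k \mid k} = \hat{x}_{k \mid k-1} + K_k \bigl( y_k - H\, \hat{x}_{k \mid k-1} \bigr), \qquad
P_{k \mid k} = (I - K_k H)\, P_{k \mid k-1}
```

The posterior at each step is again normal, which is why the update has a closed form.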

You’d be hard pressed to start from a statistical description of Kalman filtering, such as that given here, and have much appreciation for the motivating dynamics. Vital details have simply been abstracted away. As a client told me once when I tried to understand his problem from the top down, “You’ll never get here from there.”

The statistical perspective is complementary. Some things are clear from the beginning with the statistical formulation that would take a long time to see from the engineering perspective. But while both perspectives are valuable, I believe it’s easier to start on the engineering end and work toward the statistics end rather than the other way around.

History supports this claim. The Kalman filter from the engineering perspective came first and its formulation in terms of Bayesian statistics came later. Except that’s not entirely true.

Rudolf Kálmán published his seminal paper in 1960 and four years later papers started to come out making the connection to Bayesian statistics. But while Kálmán and others were working in the US starting from the engineering end, Ruslan Stratonovich was working in Russia starting from the statistical end. Still, I believe it’s fair to say that most of the development and application of Kalman filters has proceeded from the engineering to the statistics rather than the other way around.

More on Kalman filters

 

Top tweets

I had a couple tweets this week that were fairly popular. The first was a pun on the musical Hamilton and the Hamiltonian from physics. The former is about Alexander Hamilton (1755–1804) and the latter is named after William Rowan Hamilton (1805–1865).

The second was a sort of snowclone, a variation on the line from the Bhagavad Gita that J. Robert Oppenheimer famously quoted in reference to the atomic bomb: