Mean residual time

If something has survived this far, how much longer is it expected to survive? That’s the question answered by mean residual time.

For a positive random variable X, the mean residual time for X is a function e_X(t) given by

e_X(t) = E(X - t \mid X > t) = \int_t^\infty \frac{1 - F_X(x)}{1-F_X(t)} \, dx

provided the expectation and integral converge. Here F_X is the CDF of X, so F_X(t) is the probability that X is at most t.

For an exponential distribution, the mean residual time is constant. For a Pareto (power law) distribution, the mean residual time is proportional to t. This has an interesting consequence, known as the Lindy effect.
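Both claims are easy to check numerically. The sketch below is only an illustration: the helper name `mean_residual` and the parameter choices (rate 1 for the exponential, minimum 1 and shape 3 for the Pareto) are assumptions made for the example. It evaluates the integral formula above with base R's `integrate`.

```r
# Mean residual time computed from a survival function S(x) = 1 - F(x):
# e(t) = (integral of S from t to infinity) / S(t)
mean_residual <- function(surv, t0) {
    integrate(surv, lower = t0, upper = Inf, rel.tol = 1e-8)$value / surv(t0)
}

# Exponential with rate 1 (illustrative choice): e(t) should be 1 for every t
surv_exp <- function(x) pexp(x, rate = 1, lower.tail = FALSE)

# Pareto with minimum 1 and shape 3 (illustrative choice):
# S(x) = x^(-3) for x >= 1, so e(t) should be t/2 for t >= 1
surv_pareto <- function(x) pmax(x, 1)^(-3)

ts <- c(1, 2, 5)
exp_mrt    <- sapply(ts, function(t0) mean_residual(surv_exp, t0))
pareto_mrt <- sapply(ts, function(t0) mean_residual(surv_pareto, t0))

print(rbind(t = ts, exponential = exp_mrt, pareto = pareto_mrt))
```

The exponential row should stay flat while the Pareto row grows in proportion to t, up to numerical integration error.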

Now let’s turn things around. Given a function e(t), can we find a density function for a positive random variable with that mean residual time? Yes.

The equation above yields a differential equation for F, the CDF of the distribution.

If we differentiate both sides of

e(t) (1 - F(t)) = \int_t^\infty (1 - F(x))\, dx

with respect to t and rearrange, we get the first order differential equation

F'(t) + g(t)\, F(t) = g(t)

where

g(t) = \frac{e'(t) + 1}{e(t)}

The initial condition must be F(0) = 0 because we’re looking for the distribution of a positive random variable, i.e. the probability of X being less than zero must be 0. The solution is then

F(t) = 1 - \frac{e(0)}{e(t)} \exp\left( -\int_0^t \frac{dx}{e(x)} \right)

This means that for a desired mean residual time, you can use the equation above to construct a CDF to match. The derivative of the CDF is the PDF, so differentiate both sides to get the density.
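As a worked example, take a linear mean residual time e(t) = a + bt. The constants a > 0 and b > 0 are introduced here only for illustration. Plugging into the solution for F gives

```latex
\int_0^t \frac{dx}{a + bx} = \frac{1}{b} \log\frac{a + bt}{a}

F(t) = 1 - \frac{a}{a + bt} \left( \frac{a + bt}{a} \right)^{-1/b}
     = 1 - \left( 1 + \frac{bt}{a} \right)^{-(1 + 1/b)}

f(t) = F'(t) = \frac{b + 1}{a} \left( 1 + \frac{bt}{a} \right)^{-(2 + 1/b)}
```

This is a power law (a Pareto distribution shifted to start at 0), consistent with the remark that Pareto distributions have linear mean residual time. Letting b go to 0 gives F(t) = 1 - exp(-t/a), the exponential distribution with constant mean residual time a.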

No use for old things

From Brave New World:

“But why is [Shakespeare] prohibited?” asked the Savage. …

The Controller shrugged his shoulders. “Because it’s old; that’s the chief reason. We haven’t any use for old things here.”

“Even when they’re beautiful?”

“Particularly when they’re beautiful. Beauty’s attractive, and we don’t want people to be attracted by old things. We want them to like the new ones.”

Related: Chronological snobbery

Pure math and physics

From Paul Dirac, 1938:

Pure mathematics and physics are becoming ever more closely connected, though their methods remain different. One may describe the situation by saying that the mathematician plays a game in which he himself invents the rules while the physicist plays a game in which the rules are provided by Nature, but as time goes on it becomes increasingly evident that the rules which the mathematician finds interesting are the same as those which Nature has chosen.

Example of unit testing R code with testthat

Here’s a little example of using Hadley Wickham’s testthat package for unit testing R code.

The function below computes the real roots of a quadratic. All that really matters for our purposes is that the function can return 0, 1, or 2 numbers and it could raise an error.

    real.roots <- function(a, b, c)
    {
        if (a == 0.)
            stop("Leading term cannot be zero")

        d = b*b - 4*a*c # discriminant

        if (d < 0)
           rr = c()        # no real roots
        else if (d == 0)
           rr = c( -b/(2*a) )
        else
            rr = c( (-b - sqrt(d))/(2*a), 
                    (-b + sqrt(d))/(2*a)  )

        return(rr)
    }

To test this code with testthat we create another file for tests. The name of the file should begin with test so that testthat can recognize it as a file of test code. So let’s name the file containing the code above real_roots.R and the file containing its tests test_real_roots.R.

The test file needs to read in the file being tested, for example by calling source("real_roots.R") at the top.

Now let’s write some tests for the case of a quadratic with two real roots.

    test_that("Distinct roots", {

        roots <- real.roots(1, 7, 12)

        expect_that( roots, is_a("numeric") )
        expect_that( length(roots), equals(2) )
        expect_that( roots[1] < roots[2], is_true() )
    })

This tests that we get back two numbers and that they are sorted in increasing order.

Next we find the roots of (x + 3000)^2 = x^2 + 6000x + 9000000. We’ll test whether we get back -3000 as the only root. In general you can’t expect to get an exact answer, though in this case we do since the root is an integer. But we’ll show in the next example how to test for equality with a given tolerance.

    test_that("Repeated root", {

        roots <- real.roots(1, 6000, 9000000)

        expect_that( length(roots), equals(1) )

        expect_that( roots, equals(-3000) )

        # Test whether ABSOLUTE error is within 0.1.
        # all.equal compares relative differences by default when the target
        # is not near zero; setting 'scale' to 1 makes the comparison absolute.
        expect_that( roots, equals(-3000.01, tolerance  = 0.1, scale = 1) )

        # Test whether RELATIVE error is within 0.1
        # To test relative error, set 'scale' to the magnitude of the expected value.
        # See base R function all.equal for optional argument documentation.
        expect_equal( roots, -3001, tolerance  = 0.1, scale = 3001) 
    })

To show how to test code that should raise an error, we’ll find the roots of 2x + 3, which isn’t a quadratic. Notice that you can test whether any error is raised or you can test whether the error message matches a given regular expression.

    test_that("Polynomial must be quadratic", {

        # Test for ANY error
        expect_that( real.roots(0, 2, 3), throws_error() )

        # Test specifically for an error string containing "zero"
        expect_that( real.roots(0, 2, 3), throws_error("zero") )

        # Test specifically for an error string containing "zero" or "Zero" using regular expression
        expect_that( real.roots(0, 2, 3), throws_error("[zZ]ero") )
    })

Finally, here are a couple tests that shouldn’t pass.

    test_that("Bogus tests", {

        x <- c(1, 2, 3)

        expect_that( length(x), equals(2.7) )
        expect_that( x, is_a("data.frame") )
    })

To run the tests, you can run test_dir or test_file. If you are at the R command line and your working directory is the directory containing the two files above, you could run the tests with test_dir("."). In this case we have only one file of test code, but if we had more test files test_dir would find them, provided the file names begin with test.
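The tests above use the expect_that() style. Later versions of testthat favor calling expect_*() functions directly, so here is a sketch of the passing tests rewritten that way. It assumes a testthat version that provides expect_type and expect_length, and it repeats the function definition only so the block runs on its own; in a real test file you would read it from real_roots.R instead.

```r
library(testthat)

# Same function as in real_roots.R, repeated so this sketch is self-contained
real.roots <- function(a, b, c)
{
    if (a == 0.)
        stop("Leading term cannot be zero")
    d <- b*b - 4*a*c # discriminant
    if (d < 0)
        rr <- c()
    else if (d == 0)
        rr <- c( -b/(2*a) )
    else
        rr <- c( (-b - sqrt(d))/(2*a),
                 (-b + sqrt(d))/(2*a) )
    return(rr)
}

test_that("Distinct roots", {
    roots <- real.roots(1, 7, 12)
    expect_type(roots, "double")       # numeric vector of doubles
    expect_length(roots, 2)
    expect_true(roots[1] < roots[2])
})

test_that("Repeated root", {
    roots <- real.roots(1, 6000, 9000000)
    expect_length(roots, 1)
    expect_equal(roots, -3000)
})

test_that("Polynomial must be quadratic", {
    expect_error(real.roots(0, 2, 3))             # any error
    expect_error(real.roots(0, 2, 3), "[zZ]ero")  # message matches a regex
})
```

If these tests were saved in test_real_roots.R, test_file("test_real_roots.R") would run just that file.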


Singular Value Consulting, LLC

The name of my business is Singular Value Consulting, LLC.

Math people may catch the allusion to singular value decomposition (SVD). I hope that non-math folks will interpret “singular value” to mean something like “singularly valuable.”

One way to think of an SVD is a pair of coordinate systems that give a linear transformation the simplest representation. So metaphorically, SVD is getting to the core of a problem and producing a simple solution.
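To make the "pair of coordinate systems" idea concrete, here is a small sketch using base R's svd function. The matrix A is an arbitrary choice for illustration. Written in the coordinates given by the columns of U and V, the transformation becomes diagonal.

```r
# A 2x2 matrix chosen only for illustration
A <- matrix(c(3, 1,
              1, 2), nrow = 2, byrow = TRUE)

s <- svd(A)   # A = U %*% diag(d) %*% t(V)

# Expressed in the coordinate systems given by the columns of U and V,
# the transformation is diagonal: it just scales each axis.
D <- t(s$u) %*% A %*% s$v

print(round(D, 10))
print(s$d)
```

The diagonal of D should match the singular values in s$d, with the off-diagonal entries zero up to rounding.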

For some less serious mathematical company names, see this list.

See this page for some ideas of the kinds of things Singular Value Consulting could do for your company.

Computing skewness and kurtosis in one pass

If you compute the standard deviation of a data set by directly implementing the definition, you’ll need to pass through the data twice: once to find the mean, then a second time to accumulate the squared differences from the mean. But there is an equivalent algorithm that requires only one pass and that is more accurate than the most direct method. You can find the code for implementing it here.

You can also compute the higher sample moments in one pass. I’ve extended the previous code to compute skewness and kurtosis in one pass as well.
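To give a flavor of how a one-pass update can work, here is a sketch in R. It is not the posted code. It uses the standard incremental update formulas for sums of centered powers, and the names (new_stats, push) and the skewness and kurtosis conventions (population-style, with excess kurtosis) are assumptions made for this example.

```r
# State: count n, running mean M1, and running sums M2, M3, M4
# of the 2nd, 3rd, and 4th powers of deviations from the mean
new_stats <- function() list(n = 0, M1 = 0, M2 = 0, M3 = 0, M4 = 0)

# Fold one new observation x into the state s
push <- function(s, x) {
    n1 <- s$n
    n  <- n1 + 1
    delta    <- x - s$M1
    delta_n  <- delta / n
    delta_n2 <- delta_n^2
    term1    <- delta * delta_n * n1
    # M4 and M3 are updated from the OLD values of M3 and M2
    M4 <- s$M4 + term1 * delta_n2 * (n^2 - 3*n + 3) +
          6 * delta_n2 * s$M2 - 4 * delta_n * s$M3
    M3 <- s$M3 + term1 * delta_n * (n - 2) - 3 * delta_n * s$M2
    list(n = n, M1 = s$M1 + delta_n, M2 = s$M2 + term1, M3 = M3, M4 = M4)
}

variance <- function(s) s$M2 / (s$n - 1)
skewness <- function(s) sqrt(s$n) * s$M3 / s$M2^1.5
kurtosis <- function(s) s$n * s$M4 / s$M2^2 - 3   # excess kurtosis

# Sample data chosen only for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
s <- Reduce(push, x, new_stats())

print(c(mean = s$M1, variance = variance(s),
        skewness = skewness(s), kurtosis = kurtosis(s)))
```

Each call to push touches one observation and a fixed amount of state, so the data never has to be stored or revisited.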

The new code also lets you split your data, say to process it in parallel on different threads, and then combine the statistics, in the spirit of map-reduce.
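Here is a sketch of what the combining step can look like, using the standard pairwise formulas for merging sums of centered powers. It is an illustration in R rather than the posted code, and the function names summarize and combine are assumptions for this example. Each chunk is summarized directly for brevity.

```r
# Per-chunk summary: count, mean, and sums of centered powers.
# (Computed directly here for brevity.)
summarize <- function(x) {
    m <- mean(x)
    list(n = length(x), M1 = m,
         M2 = sum((x - m)^2), M3 = sum((x - m)^3), M4 = sum((x - m)^4))
}

# Merge the summaries of two disjoint chunks without revisiting the data
combine <- function(a, b) {
    n     <- a$n + b$n
    delta <- b$M1 - a$M1
    M1 <- (a$n * a$M1 + b$n * b$M1) / n
    M2 <- a$M2 + b$M2 + delta^2 * a$n * b$n / n
    M3 <- a$M3 + b$M3 +
          delta^3 * a$n * b$n * (a$n - b$n) / n^2 +
          3 * delta * (a$n * b$M2 - b$n * a$M2) / n
    M4 <- a$M4 + b$M4 +
          delta^4 * a$n * b$n * (a$n^2 - a$n * b$n + b$n^2) / n^3 +
          6 * delta^2 * (a$n^2 * b$M2 + b$n^2 * a$M2) / n^2 +
          4 * delta * (a$n * b$M3 - b$n * a$M3) / n
    list(n = n, M1 = M1, M2 = M2, M3 = M3, M4 = M4)
}

# Sample data chosen only for illustration, split into two chunks
x <- c(2, 4, 4, 4, 5, 5, 7, 9, 1, 12)
merged <- combine(summarize(x[1:4]), summarize(x[5:10]))
whole  <- summarize(x)

print(all.equal(merged, whole))
```

Because combine only needs the five summary numbers from each chunk, the chunks can be processed on separate threads or machines and merged in any order.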

Lastly, I’ve posted analogous code for simple linear regression.

A statistical problem with “nothing to hide”

One problem with the nothing-to-hide argument is that it assumes innocent people will be exonerated certainly and effortlessly. That is, it assumes that there are no errors, or if there are, they are resolved quickly and easily.

Suppose the probability of correctly analyzing an email or phone call is not 100% but 99.99%. In other words, there’s one chance in 10,000 of an innocent message being flagged as incriminating. Imagine authorities analyzing one message each from 300,000,000 people, roughly the population of the United States. Then around 30,000 innocent people will have some ‘splaining to do. They will have to interrupt their dinner to answer questions from an agent knocking on their door, or maybe they’ll spend a few weeks in custody. If the legal system is 99.99% reliable, then three of them will go to prison.

Now suppose false positives are really rare, one in a million. If you analyze 100 messages from each person rather than just one, you’re approximately back to the scenario above.
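Spelled out, the arithmetic for both scenarios is as follows, treating each message as an independent trial (an assumption made to keep the numbers simple).

```latex
% One message per person, false positive rate 10^{-4}
300{,}000{,}000 \times 10^{-4} = 30{,}000 \text{ innocent people flagged}

% Legal system wrong one time in 10,000
30{,}000 \times 10^{-4} = 3 \text{ innocent people imprisoned}

% 100 messages per person, false positive rate 10^{-6}
300{,}000{,}000 \times 100 \times 10^{-6} = 30{,}000 \text{ false positives}
```

In the last case the count is of flagged messages, but since the chance of any one person being flagged more than once is tiny, it is nearly the same as the number of people flagged.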

Scientists call indiscriminately looking through large amounts of data “a fishing expedition” or “data dredging.” One way to mitigate the problem of massive false positives from data dredging is to demand a hypothesis: before you look through the data, say what you’re hoping to prove and why you think it’s plausible.

The legal analog of a plausible hypothesis is a search warrant. In statistical terms, “probable cause” is a judge’s estimation that the prior probability of a hypothesis is moderately high. Requiring scientists to have a hypothesis and requiring law enforcement to have a search warrant both dramatically reduce the number of false positives.

The weight of code

From Bjorn Freeman-Benson’s talk Airplanes, Spaceships, and Missiles: Engineering Lessons from Famous Projects

Bjorn is discussing the ferrite core memory of the Apollo guidance system.

These are very, very robust memory systems. … But the problem is that they actually have weight to them. Core memory actually weighs a bunch, so when you’re writing your program for the lunar module … every line of code that you wrote had a consequence in weight. And you could measure how heavy your code was at the end of a compile line. … It’s an interesting analogy to keep in mind because in fact even today our code has weight. It doesn’t really have physical weight … Our code has psychological weight because every line of code we write has to be maintained. It has to be supported. It has to be operated.

Bottom-up exposition

I wish more authors followed this philosophy:

The approach I have taken here is to try to move always from the particular to the general, following through the steps of the abstraction process until the abstract concept emerges naturally. … at the finish it would be quite appropriate for the reader to feel that (s)he had just arrived at the subject, rather than reached the end of the story.

From the preface here (ISBN 0486450260).

When books start at the most abstract point, I feel like saying to the author “Thank you for the answer, but what was the question?”

Orwellian vs Huxleyian

Orwell and Huxley wrote contrasting dystopian books. In Orwell’s 1984, people are controlled by overt totalitarian power. In Huxley’s Brave New World, people are lulled into submission.

Orwellian became a common adjective, but Huxleyian didn’t, and neither did Huxleyan. Orwellian gets about 55 times as many hits as Huxleyian and 104 times as many hits as Huxleyan.

Google’s Ngram viewer gives similar results. Apparently we are concerned about overt power, but we don’t care about a world in which people don’t care.