Travel

This week I spent two days in Los Angeles visiting a client and two days in Austin for the SciPy conference.

The project in LA is very interesting, but not something I’m free to talk about. The conference was fun, especially because I got to see in person several people I had previously only talked with online. Both trips stirred up a lot of ideas that will take a while to process.

I may be going to Seattle before too long, but other than that I’m done traveling for now.

Clever visual proof

There are as many yellow dots above the bottom row of the triangle as there are pairs of purple dots on the bottom row. To see this, note that every yellow dot determines a pair of purple dots: project it down to the left and down to the right. Conversely, from any pair of purple dots you can go up to a unique yellow dot.
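In symbols, if the bottom row has n dots, the rows above it contain 1 + 2 + ⋯ + (n − 1) dots, and the bottom row has C(n, 2) pairs. Here’s a quick numerical check (the code is mine, just verifying that the two counts agree):

from math import comb

for n in range(2, 20):
    dots_above = sum(range(1, n))   # 1 + 2 + ... + (n-1) dots above the bottom row
    pairs_below = comb(n, 2)        # pairs of dots on the bottom row
    assert dots_above == pairs_below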

Via sark

Related post: Tetrahedral numbers

Statistical evidence versus legal evidence

This is the job of a juror in the US legal system described in statistical terms:

Compute the posterior probability of a defendant’s guilt conditioned on the admissible evidence, starting with a prior biased toward innocence. Report “guilty” if the posterior mean probability of guilt is above a level referred to as “beyond reasonable doubt.”

A juror is not to compute a probability conditioned on all evidence, only admissible evidence. One of the purposes of voir dire is to identify potential jurors who do not understand this concept and strike them from the jury pool. Very few jurors would understand or use the language of conditional probability, but a competent juror must understand that some facts are not to be taken into consideration in reaching a verdict.

For example, the fact that someone has been arrested, indicted by a grand jury, and brought to trial is not itself to be considered evidence of guilt. It is not legal evidence, but it certainly is statistical evidence: People on trial are more likely to be guilty of a crime than people who are not on trial.

This sort of schizophrenia is entirely proper. Statistical tendencies apply to populations, but trials are about individuals. The goal of a trial is to make a correct decision in an individual case, not to make correct decisions on average.[1]  Also, the American legal system embodies the belief that false positives are much worse than false negatives. [2]

Thinking of a verdict as a conditional probability allows a juror to simultaneously believe personally that someone is probably guilty while remaining undecided for legal purposes.

Related: A statistical problem with “nothing to hide”

Footnotes:

[1] Jury instructions are implicitly Bayesian rather than frequentist in the sense that jurors are asked to come up with a degree of belief. They are not asked to imagine an infinite sequence of similar trials etc.

[2] For example, Benjamin Franklin said  “That it is better 100 guilty Persons should escape than that one innocent Person should suffer, is a Maxim that has been long and generally approved.” In decision theory vocabulary, this is a highly asymmetric loss function.

How to convert frequency to pitch

I saw somewhere that James Earl Jones’ speaking voice is around 85 Hz. What musical pitch is that?

Let P be the frequency of some pitch you’re interested in and let C = 261.626 be the frequency of middle C. If h is the number of half steps from C to P then

P / C = 2^(h/12).

Taking logs,

h = 12 log(P / C) / log 2.

If P = 85, then h = -19.46. That is, James Earl Jones’ voice is about 19 half-steps below middle C, around the F an octave and a half below middle C.

More details on the derivation above here.
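Here’s the calculation above as a short Python sketch (the function name is mine):

from math import log2

def half_steps_from_middle_C(freq):
    # Number of half steps from middle C (261.626 Hz) to the given frequency;
    # negative values are below middle C.
    return 12 * log2(freq / 261.626)

print(half_steps_from_middle_C(85))   # about -19.46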

 

Photo credit Wikipedia
Music image created using LilyPond.

Leading digits and quadmath

My previous post looked at a problem that requires repeatedly finding the first digit of k^n where k is a single digit but n may be on the order of millions or billions.

The most direct approach would be to first compute k^n as a very large integer, then find its first digit. That approach is slow, and gets slower as n increases. A faster way is to look at the fractional part of log k^n = n log k and see which leading digit it corresponds to.

If n is not terribly big, this can be done in ordinary precision. But when n is large, the integer part of n log k consumes significant digits, leaving fewer accurate digits for the fractional part. So for very large n, you need extra precision. I first did this in Python using SymPy, then switched to C++ for more speed, using the quadmath library for gcc. (If n is big enough, even quadruple precision isn’t enough. An advantage of SymPy over quadmath is that the former has arbitrary precision. You could, for example, set the precision to 10 more than the number of decimal digits in n in order to retain 10 significant figures in the fractional part of n log k.)
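Here’s a sketch of the arbitrary precision approach in SymPy, following the precision rule just described (the code and function name are mine, for illustration):

from sympy import log, floor

def leading_digit(k, n, extra=10):
    # Evaluate n*log10(k) with enough significant digits that the
    # fractional part still has about `extra` accurate figures.
    prec = len(str(n)) + extra
    x = (n * log(k, 10)).evalf(prec)
    return int(10**(x - floor(x)))   # first digit of k^n

print(leading_digit(2, 10**9))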

The quadmath.h header file needs to be wrapped in an extern C declaration. Otherwise gcc will give you misleading error messages.

The 128-bit floating point type __float128 has twice as many bits as a double. The quadmath functions have the same name as their standard math.h counterparts, but with a q added on the end, such as log10q and fmodq below.

Here’s code for computing the leading digit of k^n that illustrates using quadmath.

#include <cmath>
extern "C" {
#include <quadmath.h>
}

// logs[i] holds log10(i) for i = 2, ..., 10.
__float128 logs[11];

// Call once before calling first_digit.
void fill_logs()
{
    for (int i = 2; i <= 10; i++)
        logs[i] = log10q(i + 0.0);
}

int first_digit(int base, long long exponent)
{
    // Fractional part of exponent * log10(base)
    __float128 t = fmodq(exponent*logs[base], 1.0);
    // The leading digit is i-1 when log10(i-1) <= t < log10(i).
    for (int i = 2; i <= 10; i++)
        if (t < logs[i])
            return i - 1;
    return 9; // never reached, since t < 1 = logs[10]
}

The loop always returns before the final return statement is reached, because t is less than 1 = log10(10).

Caching the values of log10q saves repeated calls to a relatively expensive function. So does using the search at the bottom rather than computing powq(10, t).

The linear search at the end is more efficient than it may seem. First, it only searches a list of length 9. Second, because of Benford’s law, the leading digits are searched in order of decreasing frequency, so most inputs cause first_digit to return early in the search.

When you compile code using quadmath, be sure to add -lquadmath to the compile command.


Gelfand’s question

Gelfand’s question asks whether there is a positive integer n such that the first digits of j^n base 10 are all the same for j = 2, 3, 4, …, 9. (Thanks to @republicofmath for pointing out this problem.) This post will explore Gelfand’s question via probability.

The MathWorld article on Gelfand’s question says that the answer is no for values of n less than 100,000. That range seemed small to me. My intuition was that you’d need to try larger values of n to have a reasonable chance of finding a solution.

So how big a value of n should you explore? To answer that question, consider picking n at random. If the leading digits of j^n were randomly distributed, what would the chances be that n would satisfy Gelfand’s question? Leading digits are not random, but they do often have regular distributions.

Suppose the leading digits are uniformly distributed among 1 to 9, and that the leading digits for each base are independent. Then the probability of j^n beginning with 1 (or any other chosen digit) for all j from 2 to 9 would be (1/9)^8, and the probability of all leading digits matching any digit would be 9 times larger. That would say the probability of all leading digits matching would be on the order of 2 × 10^-7, so we should try on the order of 10^7 or 10^8 values of n to have a decent chance of finding one.

But there’s a problem with the above argument: leading digits are not uniformly distributed. Benford’s law says that leading digits are often distributed very unevenly. When a system follows Benford’s law, the probability of a leading digit being d is log10(1 + 1/d). An uneven distribution of leading digits raises the probability that all the digits will match. If the leading digits of j^n follow Benford’s law, and are independent, then the probability of all matching is 6.84 × 10^-5, two orders of magnitude higher than assuming a uniform distribution.

Do the leading digits of j^n follow Benford’s law? Yes, at least approximately, based on testing values of n from 1 to 1,000.

Now suppose the probability of a randomly selected value of n satisfying Gelfand’s question is p = 6.84 × 10^-5. If we tried 100,000 random values of n, the probability of having no successes would be about 0.00107. So my intuition was wrong. If the random heuristic given here were correct, we’d stand a good chance of seeing a solution if we tried values of n up to 100,000.
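These probabilities are easy to check numerically. Here’s a quick sketch:

from math import log10

# Uniform leading digits: all eight bases show the same chosen digit
p_uniform = 9 * (1/9)**8
print(p_uniform)                  # about 2.1e-7

# Benford-distributed leading digits, assumed independent across bases
benford = [log10(1 + 1/d) for d in range(1, 10)]
p_benford = sum(b**8 for b in benford)
print(p_benford)                  # about 6.84e-5

# Probability of no successes in 100,000 independent tries
print((1 - p_benford)**100000)    # about 0.00107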

Where does the probabilistic heuristic go wrong? In the assumption that the distributions of the leading digits are independent.

Let x_j be the sequence of leading digits of j^n for n running from 1 to 1,000. Some of the x_j are correlated and some are not.

Some of the correlations are easy to see. For example, the sequences x_2 and x_4 have correlation 0.493. That’s because 4^n = (2^n)^2. So if you know the first digit of 2^n, you can often guess the first digit of 4^n. In light of this, it’s not surprising that x_2 is also correlated with x_8, and x_3 is correlated with x_9.

You might speculate that x_j and x_k are uncorrelated if and only if j and k are relatively prime. Some sequences, such as x_2 and x_3, support that guess. But x_4 and x_5 are negatively correlated, r = -0.433, for reasons that are not obvious. I suspect that the negative correlations are the key to understanding the question.
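Here’s a sketch of how these correlations can be computed (the code is mine; it builds the leading digit sequences directly from the powers):

import numpy as np

def leading_digits(j, N=1000):
    # Leading digit of j^n for n = 1, ..., N
    return np.array([int(str(j**n)[0]) for n in range(1, N+1)])

x = {j: leading_digits(j) for j in range(2, 10)}
print(np.corrcoef(x[2], x[4])[0, 1])   # correlation between x_2 and x_4
print(np.corrcoef(x[4], x[5])[0, 1])   # correlation between x_4 and x_5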

***

Update: I have verified that the answer to Gelfand’s question is no for n up to 10^10.

What have you been doing?

Several people have asked me what kind of work I’ve been doing since I went out on my own earlier this year. So far I’ve done a lot of fairly small projects, though I have one large project that’s just getting started. (The larger the project and client, the longer it takes to get rolling.)

Here are some particular things I’ve been doing.

  • Helped a company improve their statistical software development process
  • Modeled the efficiency and reliability of server configurations
  • Analyzed marketing and sales data
  • Coached someone in technical professional development
  • Wrote an article for an online magazine
  • Helped a company integrate R into their software product
  • Reviewed the mathematical code in a video game
  • Researched and coded up some numerical algorithms

If something on this list sounds like something you’d like for me to do with your company, please let me know.

 

Continuous quantum

David Tong argues that quantum mechanics is ultimately continuous, not discrete.

In other words, integers are not inputs of the theory, as Bohr thought. They are outputs. The integers are an example of what physicists call an emergent quantity. In this view, the term “quantum mechanics” is a misnomer. Deep down, the theory is not quantum. In systems such as the hydrogen atom, the processes described by the theory mold discreteness from underlying continuity. … The building blocks of our theories are not particles but fields: continuous, fluid-like objects spread throughout space. … The objects we call fundamental particles are not fundamental. Instead they are ripples of continuous fields.

Source: The Unquantum Quantum, Scientific American, December 2012.

Mean residual time

If something has survived this far, how much longer is it expected to survive? That’s the question answered by mean residual time.

For a positive random variable X, the mean residual time for X is a function e_X(t) given by

e_X(t) = E(X - t \mid X > t) = \int_t^\infty  \frac{1 - F_X(x)}{1-F_X(t)} \, dx

provided the expectation and integral converge. Here F(t) is the CDF, the probability that X is less than t.

For an exponential distribution, the mean residual time is constant. For a Pareto (power law) distribution, the mean residual time is proportional to t. This has an interesting consequence, known as the Lindy effect.

Now let’s turn things around. Given a function e(t), can we find a density function for a positive random variable with that mean residual time? Yes.

The equation above yields a differential equation for F, the CDF of the distribution.

If we differentiate both sides of

e(t) (1 - F(t)) = \int_t^\infty (1 - F(x))\, dx

with respect to t and rearrange, we get the first order differential equation

F'(t) + g(t)\, F(t) = g(t)

where

g(t) = \frac{e'(t) + 1}{e(t)}

The initial condition must be F(0) = 0 because we’re looking for the distribution of a positive random variable, i.e. the probability of X being less than zero must be 0. The solution is then

F(t) = 1 - \frac{e(0)}{e(t)} \exp\left( -\int_0^t \frac{dx}{e(x)} \right)

This means that for a desired mean residual time, you can use the equation above to create a CDF function to match. The derivative of the CDF function gives the PDF function, so differentiate both sides to get the density.
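For example, as a quick check of the formula: if the mean residual time is constant, e(t) = m, then e'(t) = 0 and g(t) = 1/m, and the expression above reduces to

F(t) = 1 - \exp(-t/m)

the CDF of an exponential distribution with mean m. This is consistent with the earlier remark that the exponential distribution has constant mean residual time.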

No use for old things

From Brave New World:

“But why is [Shakespeare] prohibited?” asked the Savage. …

The Controller shrugged his shoulders. “Because it’s old; that’s the chief reason. We haven’t any use for old things here.”

“Even when they’re beautiful?”

“Particularly when they’re beautiful. Beauty’s attractive, and we don’t want people to be attracted by old things. We want them to like the new ones.”

Related: Chronological snobbery

Pure math and physics

From Paul Dirac, 1938:

Pure mathematics and physics are becoming ever more closely connected, though their methods remain different. One may describe the situation by saying that the mathematician plays a game in which he himself invents the rules while the physicist plays a game in which the rules are provided by Nature, but as time goes on it becomes increasingly evident that the rules which the mathematician finds interesting are the same as those which Nature has chosen.

Example of unit testing R code with testthat

Here’s a little example of using Hadley Wickham’s testthat package for unit testing R code. You can read more about testthat here.

The function below computes the real roots of a quadratic. All that really matters for our purposes is that the function can return 0, 1, or 2 numbers and it could raise an error.

real.roots <- function(a, b, c)
{
    if (a == 0.)
        stop("Leading term cannot be zero")

    d = b*b - 4*a*c # discriminant

    if (d < 0)
        rr = c()
    else if (d == 0)
        rr = c( -b/(2*a) )
    else
        rr = c( (-b - sqrt(d))/(2*a), 
                (-b + sqrt(d))/(2*a) )

    return(rr)
}

To test this code with testthat we create another file for tests. The name of the file should begin with test so that testthat can recognize it as a file of test code. So let’s name the file containing the code above real_roots.R and the file containing its tests test_real_roots.R.

The test file needs to read in the file being tested.

source("real_roots.R")

Now let’s write some tests for the case of a quadratic with two real roots.

test_that("Distinct roots", {

    roots <- real.roots(1, 7, 12)

    expect_that( roots, is_a("numeric") )
    expect_that( length(roots), equals(2) )
    expect_that( roots[1] < roots[2], is_true() )
})

This tests that we get back two numbers and that they are sorted in increasing order.

Next we find the roots of (x + 3000)^2 = x^2 + 6000x + 9000000. We’ll test whether we get back -3000 as the only root. In general you can’t expect to get an exact answer, though in this case we do since the root is an integer. But we’ll show in the next example how to test for equality with a given tolerance.

test_that("Repeated root", {

    roots <- real.roots(1, 6000, 9000000)

    expect_that( length(roots), equals(1) )

    expect_that( roots, equals(-3000) )

    # Test whether ABSOLUTE error is within 0.1 
    expect_that( roots, equals(-3000.01, tolerance  = 0.1) )

    # Test whether RELATIVE error is within 0.1
    # To test relative error, set 'scale' equal to expected value.
    # See base R function all.equal for optional argument documentation.
    expect_equal( roots, -3001, tolerance  = 0.1, scale=-3001) 
})

To show how to test code that should raise an error, we’ll find the roots of 2x + 3, which isn’t a quadratic. Notice that you can test whether any error is raised or you can test whether the error message matches a given regular expression.

test_that("Polynomial must be quadratic", {

    # Test for ANY error                     
    expect_that( real.roots(0, 2, 3), throws_error() )

    # Test specifically for an error string containing "zero"
    expect_that( real.roots(0, 2, 3), throws_error("zero") )

    # Test specifically for an error string containing "zero" or "Zero" using regular expression
    expect_that( real.roots(0, 2, 3), throws_error("[zZ]ero") )
})

Finally, here are a couple tests that shouldn’t pass.

test_that("Bogus tests", {

    x <- c(1, 2, 3)

    expect_that( length(x), equals(2.7) )
    expect_that( x, is_a("data.frame") )
})

To run the tests, you can run test_dir or test_file. If you are at the R command line and your working directory is the directory containing the two files above, you could run the tests with test_dir("."). In this case we have only one file of test code, but if we had more test files test_dir would find them, provided the file names begin with test.

Singular Value Consulting, LLC

The name of my business is Singular Value Consulting, LLC.

Math people may catch the allusion to singular value decomposition (SVD). I hope that non-math folks will interpret “singular value” to mean something like “singularly valuable.”

One way to think of an SVD is as a pair of coordinate systems that give a linear transformation its simplest representation. So metaphorically, SVD means getting to the core of a problem and producing a simple solution.

For some less serious mathematical company names, see this list.