Expert witness experiences

The world of expert testimony seems mysterious from the outside, so I thought some readers would find it interesting to have a glimpse inside.

Kinds of experts

There are two kinds of experts, consulting experts and testifying experts. These names mean what they say: consulting experts consult with their clients, and testifying experts testify. Usually a lawyer will retain an expert with the intention of having this person testify, but the expert starts out as a de facto consulting expert.

Working with lawyers

Working with lawyers is quite pleasant. The division of labor is crystal clear: you are hired to be an expert on some topic, they are the experts in matters of law, and the streams don’t cross. You’re treated with deference and respect. Even if a lawyer knows something about your field of expertise, it’s not their role to opine on it.

I’ve never had a lawyer try to twist my arm. It’s not in their interests to do so. I’ve told lawyers things they were disappointed to hear, but I’ve never had a lawyer argue with me.

Turning down projects

I’ve turned down engagements when it was immediately apparent that the client didn’t have a statistical case. (They may have a legal case, but that’s not my bailiwick.) Sometimes lawyers are grasping at straws, and they may try a statistical argument as a last resort.

One person approached me to do a statistical analysis of one data point. Not to be outdone, someone once asked me to do a statistical analysis based on absolutely no data. I told both that I’d need a little more data to go on.

Nature of the work

John Tukey said that the best part of being a statistician is that you get to play in everyone else’s back yard. I’d expand that to applied math more generally. You can’t be expected to be an expert in everything, but you are expected to come up to speed quickly on the basics of problem domain.

Work on legal cases is confidential, but so is almost everything else I do. However, an intellectual property case I worked on took this to a higher level. I was only allowed to work at opposing counsel’s office, on their laptop, without an internet connection, and without a phone. That was an interesting exercise.

There’s a lot of hurry-up and wait with legal work. A project can be dormant and presumably dead, then suddenly pop back up. This isn’t unique to legal work, but it seems more frequent or more extreme with legal work.


Law firms do everything by the hour. I mostly work by the project, but I’ll work by the hour for lawyers. There are occasional exceptions, but hourly billing is firmly ingrained in legal culture. And reasonably so: it’s hard to say in advance how much work something will take. Sometimes when you can reasonably anticipate the scope of a task you can do it fixed bid.

Law firms typically pass through all expenses. So even if a firm hires you, their client is responsible for paying you. You don’t get paid until the law firm gets paid, which can sometimes take a while.


A few years ago I had to fly around a fair amount. That was fun for a while but it got old. I haven’t had to travel for work since the pandemic and I’m OK with that.

Eliminating terms from higher-order differential equations

This post ties together two earlier posts: the previous post on a change of variable to remove a term from a polynomial, and an older post on a change of variable to remove a term from a differential equation. These are different applications of the same idea.

A linear differential equation can be viewed as a polynomial in the differential operator D applied to the function we’re solving for. More on this idea here. So it makes sense that a technique analogous to the technique used for “depressing” a polynomial could work similarly for differential equations.

In the differential equation post mentioned above, we started with the equation

y'' + p(x) y' + q(x) y = 0

and reduced it to

u'' + r(x) u = 0

using the change of variable

u(x) = \exp\left( \frac{1}{2} \int^x p(t)\,dt\right ) y(x)

So where did this change of variables come from? How might we generalize it to higher-order differential equations?

In the post on depressing a polynomial, we started with a polynomial

p(x) = ax^n + bx^{n-1} + cx^{n-2} + \cdots

and use the change of variables

x = t - \frac{b}{na}

to eliminate the xn-1 term. Let’s do something analogous for differential equations.

Let P be an nth degree polynomial and consider the differential equation

P(D) y = 0

We can turn this into a differential

Q(D) u = 0

where the polynomial

Q(D) = P\left(D - \frac{p}{n}\right)

has no term involving Dn-1 by solving

\left(D - \frac{p}{n}\right) u = D y

which leads to

u(x) = \exp\left( \frac{1}{n} \int^x p(t)\,dt\right ) y(x)

generalizing the result above for second order ODEs.

How to depress a general polynomial

This post showed how to do a change of variables to remove the quadratic term from a cubic equation. Here we will show that the technique works more generally to remove the xn-1 term from an nth degree polynomial.

We will use big-O notation O(xk) to mean terms involving x to powers no higher than k. This is slightly unusual, because typically big-O notation is used when some variable is tending to a limit, and we’re not taking limits here.

Let’s start with an nth degree polynomial

p(x) = ax^n + bx^{n-1} + cx^{n-2} + \cdots

Here a is not zero, or else we wouldn’t have an nth degree polynomial.
The following calculation shows that the change of variables

x = t - \frac{b}{na}

results in an nth degree polynomial in t with no term involving xn – 1.

 \begin{align*} p\left(t - \frac{b}{na}\right) &= a\left(t - \frac{b}{na}\right)^n + b\left(t - \frac{b}{na}\right)^{n-1} + {\cal O}(t^{n-2}) \\ &= a\left(t^n - \frac{b}{a}t^{n-1} + {\cal O}(t^{n-2})\right) + b\left( t^{n-1} + {\cal O}(t^{n-2})\right) + {\cal O}(t^{n-2}) \\ &= a t^n + {\cal O}(t^{n-2}) \end{align*}

Finite fields

This approach works over real or complex numbers. It even works over finite fields too, if you can divide by na.

I’ve mentioned a couple times that the Weierstrass form of an elliptic curve

y^2 = x^3 + a x + b

is the most general except when working over a field of characteristic 2 or 3. The technique above breaks down because 3a may not be invertible in a field of characteristic 2 or 3.

Related posts

How to solve a cubic equation

The process for solving a cubic equation seems like a sequence of mysterious tricks. I’d like to try to make the steps seem a little less mysterious.

Depressed cubic

The previous post showed how to reduce a general cubic equation to one in the form

x^3 + cx + d = 0

which is called a “depressed cubic.” In a nutshell, you divide by the leading coefficient then do a simple change of variables that removes the quadratic term.

Now what? This post will give a motivated but completely ahistorical approach for removing the linear term cx.


Suppose we don’t know how to solve cubic equations. What do we know how to solve? Quadratic equations. So a natural question to ask is how we might find a quadratic equation that has the same roots as our cubic equation. Well, how can you tell in general whether two polynomials have a common root? Resultants.

This is the point where we completely violate historical order. Tartaglia discovered a general solution to depressed cubic equations in the 16th century [1], but Sylvester introduced the resultant in the 19th century. Resultants were a great idea, but not a rabbit out of a hat. It’s not far fetched that some sort of determinant could tell you whether two polynomials have a common factor since this is analogous to two sets of vectors having overlapping spans. I found the idea of using resultants in this context in [2].

Tschirnhaus transformation

In 1683, Tschirnhaus published the transform that in modern terminology amounts to finding a polynomial T(x, y) that has zero resultant with a depressed cubic.

Tschirnhaus assumed his polynomial T has the form

T(x, y) = x^2 + a x + \frac{2}{3} c + y

Let’s take the resultant of our cubic and Tschirnhaus’ quadratic using Mathematica.

    Resultant[x^3 + c x + d, x^2 + a x + 2 c/3 + y, x]

This gives us

y^3 + \frac{1}{27} \left(27 a^2 c+81 a d-9 c^2\right)y + \frac{1}{27} \left(-27 a^3 d+18 a^2 c^2+27 a c d+2 c^3+27 d^2\right)

which is a cubic equation in y. If the coefficient of y were zero, then we could solve the cubic equation for y by simply taking a cube root. But we can make that happen by our choice of a, i.e. we pick a to solve the quadratic equation

27 a^2 c+81 a d-9 c^2 = 0

So we solve this equation for a, plug either root for a into the expression for the resultant, then solve for y. Then we take that value of y and find where Tschirnhaus’ polynomial is zero by solving the quadratic equation

x^2 + a x + \frac{2}{3} c + y = 0

We solved for a value of y that makes the resultant zero, so our original polynomial and Tschirnhaus’ polynomial have a common root. So one of the roots of the equation above is a root of our original cubic equation.


[1] In this blog post, we first reduced the general quadratic to the depressed form, then solved the depressed form. This isn’t the historical order. Tartaglia came up with a general solution to the depressed cubic equation, but was not able to solve equations containing a quadratic term.

[2] Victor Adamchik and David Jeffrey. Polynomial Transformations of Tschirnhaus, Bring and Jerrard. ACM SIGSAM Bulletin, Vol 37, No. 3, September 2003.

How to depress a cubic

The title of this post sounds like the opening to a bad joke: How do you depress a cubic? … Insert your own punch line.

A depressed cubic is a simplified form of a cubic equation. The odd-sounding terminology suggests that this is a very old idea, older than the current connotation of the word depressed. That is indeed the case. According to Etymonline the term depress originally meant “put down by force, conquer” and the psychological use of the of the word came later. To this day you’ll occasionally hear of a button being depressed.

A depressed cubic equation is depressed in the sense that the quadratic term has been removed. Such an equation has the form

x^3 + cx + d = 0

Once you’ve put the equation in depressed form you’ve conquered the quadratic term.

So how do you put a cubic equation in depressed form? First, divide by the leading coefficient, then use the change of variables [1]

x = t - \frac{b}{3}.

For example, suppose we start with the equation

11x^3 + 19x^2 + 20x + 22 = 0.

We first turn this into

x^3 + \frac{19}{11} x^2 + \frac{20}{11}x + 2 = 0

Then we set x = t – 19/33. This gives us

t^3 + \frac{299}{363} t + \frac{47972}{35937} = 0

which has no quadratic term.

We can use Mathematica to show that this works in general:

    Simplify[x^3 + b x^2 + c x + d /. x -> (t - b/3)]

This returns

t^3 + \left(c-\frac{t b^2}{3}\right) t + \left(d + \frac{2 b^3}{27}-\frac{c b}{3}\right)
This post shows that an analogous change of variables works for higher-order polynomials as well.

Application to elliptic curves

If you look into elliptic curves, you’ll often see them defined as a set of points satisfying

y^2 = x^3 + a x + b

Why no quadratic term? Because you can always remove it using the process above. Well, not quite always. The depression trick doesn’t work for elliptic curves over finite fields of characteristic 2 or 3. If you’d like to read more about this exception, see this post.

How to solve a depressed cubic equation

That is the subject of the next post.


[1] You could do the change of variables first, using x = tb/a. This removes the quadratic term, but leaves the leading coefficient a.

New SI prefixes and their etymology

Five months ago I wrote a post speculating about new SI prefixes. I said in this post “There’s no need for more prefixes; this post is just for fun.”

Well, truth is stranger than fiction. There are four new SI prefixes. These were recently approved at the 27th General Conference on Weights and Measures. Here is the resolution (in French).

The new prefixes are:

  • 1030 quetta (Q)
  • 1027 ronna (R)
  • 10-27 ronto (r)
  • 10-30 quecto (q)

The names were the suggestion of Richard J. C. Brown. He gives seven desirable properties of new names:

  1. The names should be simple and, if possible, meaningful and memorable.
  2. The names should have some connection to the powers of 103 that they represent.
  3. The names should be based on either Latin or Greek as the most used languages previously.
  4. Multiples should end ‘-a’ and sub-multiples should end ‘-o’.
  5. The symbols used should be the same letter for a given power of ten, in upper case for multiples and in lower case for sub-multiples.
  6. Letters already in use for SI prefixes, SI units, other common units, or symbols that may otherwise cause confusion, should be avoided.
  7. Following the precedent set recently, letters should be used in reverse English alphabetical order, suitably modifying chosen names, and skipping letters as appropriate.

OK, so how does that lead to the new prefixes? Point #4 explains the last letter of each prefix.

Brown says that the etymology of ronna and ronto is

Greek & Latin, derived from ‘ennea’ and ‘novem’, suggesting 9 (ninth power of 103)

and that the etymology of quetta and quecto is

Latin, derived from ‘decem’, suggesting 10 (tenth power of 103).

That’s quite a stretch.

The largest prefix had been zetta and yotta, so Brown wanted letters that came before Y in the alphabet. P was already used (peta and pico) and the next two unused letters were Q and R. So the prefixes for 1030 and 1027 begin with Q and R.

Presumably ronna uses an O because yotta had an O for the second letter. And the next letter N comes from the N’s in ennea and novem.

It seems quetta used the Q sound because Q was the next letter available, and an allusion to the hard C in decem. The “etta” part is reminiscent of zetta.

Calculating sine to an absurd number of digits

Suppose you wanted to calculate sin(x) to a million decimal places using a Taylor series. How many terms of the series would you need?

You can use trig identities to reduce the problem to finding sin(x) for |x| ≤ 1. Let’s take the worst case and assume we want to calculate sin(1).

The series for sine alternates, and so by the alternating series theorem we need the first term left out of our Taylor approximation to be less than our error tolerate ε. If we keep the terms of the Taylor series up to x2m – 1 then we need

x2m + 1 / (2m + 1)! < ε.

Since we’re interested in x = 1 and Γ(n + 1) = n!, we need

1/ε < Γ(2m + 2).

That means we need

2m + 2 > Γ-1(1/ε).

But how do you compute the inverse of the gamma function? This is something I wrote about a few years ago.

Here is a function for approximately computing the inverse of the gamma function. See the earlier post for details.

    from numpy import pi, e, sqrt, log
    from scipy.special import lambertw

    def inverse_gamma(x):
        c = 0.03653381448490056
        L = log((x+c)/sqrt(2*pi))
        return L / (lambertw(L/e)) + 0.5

Suppose we want to compute sin(1) to 100 decimal places. We need 2m + 2 to be larger than Γ-1(10100), and the code above tells us that Γ-1(10100) is something like 70.9. This tells us we can choose m = 35.

If we want to compute sin(1) to thousands of digits, the code above will fail because we cannot represent 101000 as a floating point number. I will assume for this post that we will use an extended precision library for summing the series for sin(1), but we’re going to use ordinary precision to plan this calculation, i.e. to decide how many terms to sum.

If we look closely at the function inverse_gamma above we see that it only depends on x via log(x + c). Since we’re interested in large x, we can ignore the difference between log(x + c) and log(x). This lets us write a new version of inverse_gamma that takes the log of x rather than x as an argument.

    def inverse_gamma2(logx):
        L = logx - log(sqrt(2*pi))
        return L/lambertw(L/e) + 0.5

Calling inverse_gamma with x = 10100 gives the same result, down to the last decimal place, as calling inverse_gamma2 with 100 log(10).

We asked at the top of the post about computing sine to a million decimal places. If we call inverse_gamma2(1e6*log(10)) we get 205023.17… and so m = 102,511 would be large enough.


If you enjoyed reading this post, you may like reading this post on planning a world record calculation for computing ζ(3).

Simple example of Kleisli composition

Mars Climate Orbiter, artist conception, via NASA

When a program needs to work with different systems of units, it’s best to consistently use one system for all internal calculations and convert to another system for output if necessary. Rigidly following this convention can prevent bugs, such as the one that caused the crash of the Mars Climate Orbiter.

For example, maybe you need to work in degrees and radians. It would be sensible to do all calculations in radians, because that’s what software libraries expect, and output results in degrees, because that’s what humans expect.

Now suppose you have a function that takes in a length and doubles it, and another function takes in a length and triples it. Both functions take in length in kilometers but print the result in miles.

You would like the composition of the two functions to multiply a length by six. And as before, the composition would take in a speed in kilometers and return a speed in miles.

Here’s how we could implement this badly.

    miles_per_km = 5/8 # approx

    def double(length_km): 
        return 2*length_km*miles_per_km

    def triple(length_km): 
        return 3*length_km*miles_per_km

    length_km = 8
    d = double(length_km)
    print("Double: ", d)
    t = triple(d)
    print("Triple: ", t)

This prints

    Double: 10.0
    Triple: 18.75

The second output should be 30, not 18.5. The result is wrong because we converted from kilometers to miles twice. The correct implementation would be something like the following.

    miles_per_km = 0.6213712

    def double(length_km): 
        d = 2*length_km
        print("Double: ", d*miles_per_km)
        return d

    def triple(length_km): 
        t = 3*length_km
        print("Triple: ", t*miles_per_km)
        return t

    length_km = 8
    d = double(length_km)
    t = triple(d)

This prints the right result.

    Double: 10.0 
    Triple: 30.0

In abstract terms, we don’t want the composition of f and g to be simply gf.

We have a function f from X to Y that we think of as our core function, and a function T that translates the output. Say f doubles its input and T translates from kilometers to miles. Let f* be the function that takes X to TY, i.e. the combination of f and translation.

Now take another function g from Y to Z and define g* as the function that takes Y to TZ. We want the composition of f* and g* to be

g* ∘ f* = T ∘ g ∘ f.

In the example above, we only want to convert from kilometers to miles once. This is exactly what Kleisli composition does. (“Kleisli” rhymes with “highly.”)

Kleisli composition is conceptually simple. Once you understand what it is, you can probably think of times when it’s what you wanted but you didn’t have a name for it.

Writing code to encapsulate Kleisli composition takes some infrastructure (i.e. monads), and that’s a little complicated, but the idea of what you’re trying to achieve is not. Notice in the example above, what the functions print is not what they return; the print statements are a sort of side channel. That’s the mark of a monad.

Kleisli categories

The things we’ve been talking about are formalized in terms of Kleisli categories. You start with a category C and define another category that has the same objects as C does but has a different notion of composition, i.e. Kleisli composition.

Given a monad T on C, the Kleisli category CT has the same objects as C. An arrow f* from X to Y in CT corresponds to an arrow f from X to TY in C. In symbols,

HomCT(X, Y) = HomC(X, TY).

Mr. Kleisli’s motivation for defining his categories was to answer a more theoretical question—whether all monads arise from adjunctions—but more practically we can think of Kleisli categories as a way of formalizing a variation on function composition.

Related posts

Getting pulled back in

“Just when I thought I was out, they pull me back in.” — Michael Corleone, The Godfather, Part 3

My interest in category theory goes in cycles. Something will spark my interest in it, and I’ll dig a little further. Then I reach my abstraction tolerance and put it back on the shelf. Then sometime later something else comes up and the cycle repeats. Each time I get a little further.

A conversation with a client this morning brought me back to the top of the cycle: category theory may be helpful in solving a concrete problem they’re working on.

I’m skeptical of applied category theory that starts with categories. I’m more bullish on applications that start from the problem domain, a discussion something like this.

“Here’s a pattern that we’re trying to codify and exploit.”

“Category theory has a name for that, and it suggests you might also have this other pattern or constraint.”

“Hmm. That sounds plausible. Let me check.”

I think of category theory as a pattern description language, a way to turn vague analogies into precise statements. Starting from category theory and looking for applications is less likely to succeed.

When I left academia the first time, I got a job as a programmer. My first assignment was to make some change to an old Fortran program, and I started asking a lot of questions about context. My manager cut me off saying “You’ll never get here from there.” I had to work bottom-up, starting from the immediate problem. That lesson has stuck with me ever since.

Sometimes you do need to start from the top and work your way down, going from abstract to concrete, but less often that I imagined early in my career.

Exploring bad passwords

If your password is in the file rockyou.txt then it’s a bad password. Password cracking software will find it instantly. (Use long, randomly generated passwords; staying off the list of worst passwords is necessary but not sufficient for security.)

The rockyou.txt file currently contains 14,344,394 bad passwords. I poked around in the file and this post reports some things I found.

To make things more interesting, I made myself a rule that I could only use command line utilities.

Pure numeric passwords

I was curious how many of these passwords consisted only of digits so I ran the following.

    grep -P '^\d+$' rockyou.txt | wc -l

This says 2,346,744 of the passwords only contain digits, about 1 in 6.

Digit distribution

I made a file of digits appearing in the passwords

    grep -o -P '\d' rockyou.txt > digits

and looked at the frequency of digits.

    for i in 0 1 2 3 4 5 6 7 8 9
        grep -c $i digits

This is what I got:


The digits are distributed more evenly than I would have expected. 1’s are more common than other digits, but only about twice as common as the least common digits.

Longest bad passwords

How long is the longest bad password? The command

    wc -L rockyou.txt

shows that one line in the file is 285 characters long. What is this password? The command

    grep -P '.{285}' rockyou.txt

shows that it’s some HTML code. Nice try whoever thought of that, but you’ve been pwned.

A similar search for all-digit passwords show that the longest numeric passwords are 255 digits long. One of these is a string of 255 zeros.

Dictionary words

A common bit of advice is to not choose passwords that can be found in a database. That’s good advice as far as it goes, but it doesn’t go very far.

I used the comm utility to see how many bad passwords are not in the dictionary by running

    comm -23 sorted dict | wc -l

and the answer was 14,310,684. Nearly all the bad passwords are not in a dictionary!

(Here sorted is a sorted version of the rockyou.txt file; I believe the file is initially sorted by popularity, worst passwords first. The comm utility complained that my system dictionary isn’t sorted, which I found odd, but I sorted it to make comm happy and dict is the sorted file.)

Curiously, the command

    comm -13 sorted dict | wc -l

shows there are 70,624 words in the dictionary (specifically, the american-english file on my Linux box) that are not on the bad password list.

Smallest ‘good’ numeric password

What is the smallest number not in the list of pure numeric passwords? The following command strips leading zeros from purely numeric passwords, sorts the results as numbers, removes duplicates, and stores the results in a file called nums.

    grep -P '^\d+$' rockyou.txt | sed 's/^0\+//' | sort -n | uniq > nums

The file nums begins with a blank. I removed this with sed.

    sed -i 1d nums

Next I used awk to print instances where the line number does not match the line in the file nums.

    awk '{if (NR-$0 < 0) print $0 }' nums | less

The first number this prints is 61. This means that the first line is 1, the second line is 2, and so on, but the 60th line is 61. That means 60 is missing. The file rockyou.txt does not contain 60. You can verify this: the command

    grep '^60$' rockyou.txt

returns nothing. 60 is the smallest number not in the bad password file. There are passwords that contain ’60’ as a substring, but just 60 as a complete password is not in the file.

Related posts