Log-ish

I saw a post online this morning that recommended the transformation

f(x) = \left\{ \begin{array}{ll} \log(1+x) & \mbox{if } x > 0 \\ x & \mbox{if } x \leq 0 \end{array} \right.

I could see how this could be very handy. Often you want something like a logarithmic scale, not for the exact properties of the logarithm but because it brings big numbers closer in. And for big values of x there’s little difference between log(x) and log(1 + x).

The function above is linear near the origin, literally linear for negative values and approximately linear for small positive values.

I’ve occasionally needed something like a log scale, but one that would handle values that dip below zero. This transformation would be good for that. If data were equally far above and below zero, I’d use something like arctangent instead.
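Here’s a minimal sketch of the transformation in Python (NumPy assumed; the name logish is my own label for it, nothing official):

    import numpy as np

    def logish(x):
        """log(1 + x) for positive x, the identity for x <= 0."""
        x = np.asarray(x, dtype=float)
        # clip the argument of log1p so the unused branch never sees x < -1
        return np.where(x > 0, np.log1p(np.clip(x, 0, None)), x)

    print(logish([-3, -0.5, 0, 0.5, 10, 1000]))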

Uniformity increases entropy

Suppose you have a system with n possible states. The entropy of the system is maximized when all states are equally likely to occur. The entropy is minimized when one outcome is certain to occur.

You can say more. Starting from any set of probabilities, as you move in the direction of more uniformity, you increase entropy. And as you move in the direction of less uniformity, you decrease entropy.

These statements can be quantified and stated more precisely. That’s what the rest of this post will do.

***

Let pᵢ be the probability of the ith state and let p be the vector of the pᵢ.

\mathbf{p} = (p_1, p_2, \ldots, p_n)

Then the entropy of p is defined as

H(\mathbf{p}) = -\sum_{i=1}^n p_i \log_2 p_i

If one of the p’s is 1 and the rest of the p’s are zero, then H(p) = 0. (In the definition of entropy, 0 log₂ 0 is taken to be 0. You could justify this as the limit of x log₂ x as x goes to zero.)

If all of the pᵢ are equal, pᵢ = 1/n, then H(p) = log₂ n. The fact that this is the maximum entropy, and that compromises between the two extremes always decrease entropy, comes from the fact that the entropy function H is concave (proof). That is, if p₁ is one list of probabilities and p₂ another, then

H((1-\lambda) \mathbf{p}_1 + \lambda\mathbf{p}_2) \geq (1 - \lambda) H( \mathbf{p}_1) + \lambda H(\mathbf{p}_2)

When we speak informally of moving from p₁ in the direction of p₂, we mean we increase the parameter λ from 0 to some positive amount no more than 1.

Because entropy is concave, there are no local maxima. As you approach the location of global maximum entropy, i.e. equal state probabilities, from any direction, entropy increases monotonically.
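Here’s a quick numerical illustration of the concavity inequality in Python, using an arbitrary non-uniform distribution for p₁ and the uniform distribution for p₂:

    import numpy as np

    def entropy(p):
        """Shannon entropy in bits, with the convention 0 log2 0 = 0."""
        p = np.asarray(p, dtype=float)
        nz = p > 0
        return -np.sum(p[nz] * np.log2(p[nz]))

    p1 = np.array([0.7, 0.2, 0.1])   # non-uniform
    p2 = np.full(3, 1/3)             # uniform

    for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
        mix = (1 - lam)*p1 + lam*p2
        lhs = entropy(mix)
        rhs = (1 - lam)*entropy(p1) + lam*entropy(p2)
        print(f"{lam:.2f}  {lhs:.4f}  {rhs:.4f}")

The second column, the entropy of the mixture, is always at least as large as the third, the weighted average of the entropies, and it increases monotonically toward log₂ 3 as λ goes to 1.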

Sinc function approximation

The sinc function

sinc(x) = sin(x) / x

comes up continually in signal processing. If x is moderately small, the approximation

sinc(x) ≈ (2 + cos(x))/3

is remarkably good, with an error on the order of x⁴/180. This could be useful in situations where you’re working with the sinc function and the x in the denominator is awkward to deal with and you’d rather have a pure trig function.
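A quick numerical check of the error term: for small x, the error divided by x⁴ should be close to 1/180 ≈ 0.00556.

    from math import cos, sin

    for x in [0.5, 0.1, 0.01]:
        err = (2 + cos(x))/3 - sin(x)/x
        print(x, err / x**4)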

Here’s a plot:
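A sketch to reproduce the comparison, assuming NumPy and matplotlib:

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(-10, 10, 1000)
    sinc = np.sinc(x / np.pi)           # NumPy's sinc(t) is sin(pi t)/(pi t)
    approx = (2 + np.cos(x)) / 3

    plt.plot(x, sinc, label="sinc(x) = sin(x)/x")
    plt.plot(x, approx, "--", label="(2 + cos(x))/3")
    plt.legend()
    plt.show()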

Of course the approximation is only good for small x. For large x the sinc function approaches zero while (2 + cos(x))/3 oscillates with constant amplitude forever.

When the approximation is good, it is very, very good, which reminds me of this nursery rhyme.

There was a little girl,
Who had a little curl,
Right in the middle of her forehead.
When she was good,
She was very, very good,
But when she was bad, she was horrid.


Arithmetic for fun and profit

Four years ago I wrote a blog post about simple solutions to client problems. The post opens by recounting a conversation with a friend that ended with my friend saying “So, basically you’re recommending division.”

That conversation came back to me recently when I had a similar conversation during a deposition. I had to explain that the highly sophisticated method I used to arrive at a number in my report was to take the ratio of two other numbers.

I don’t always do division for clients. Sometimes I do multiplication, as when a lawyer asked me to compute the momentum of a car. There I had to multiply mass and velocity. (I actually turned down that project because I had a sense there was a lot more to the problem than I was being told.)

Of course real-world problems do not always have trivial solutions. Sometimes getting to the solution is complicated even though the solution itself is simple. And sometimes the solution is complicated because that’s just how things are. You might need something complicated, like a cepstrum analysis, or you might just need arithmetic.

Why use hash puzzles for proof-of-work?

A couple days ago I wrote about the problem that Bitcoin requires to be solved as proof-of-work. In a nutshell, you need to tweak a block of transactions until the SHA256 double hash of its header is below a target value [1]. Not all cryptocurrencies use proof of work, but those that do mostly use hash-based puzzles.

Other cryptocurrencies use different hashing problems, but they still use hashes. Litecoin and Dogecoin use a proof-of-work problem similar to the one Bitcoin uses, but with the scrypt (pronounced S-crypt) hash function. Several cryptocurrencies use a hashing problem based on Equihash. Monero uses its RandomX algorithm for proof-of-work; although this algorithm has multiple components, it ultimately solves a hashing problem. [2]

Why hash puzzles?

Why do cryptocurrencies use hashing problems for proof of work? In principle they could use any computational problem that is hard to solve but easy to verify, such as numerous problems in computational number theory.

One reason is that computer scientists are confident that quantum computing would not reduce the difficulty of solving hash puzzles, even though it would reduce the difficulty of factoring-based puzzles. Also, there is general agreement that it’s unlikely a mathematical breakthrough will find a weakness in hashing functions.

Ian Cassels said “Cryptography is a mixture of mathematics and muddle, and without the muddle the mathematics can be used against you.” Hashing is much more muddle than mathematics.

Why not do something useful?

Hash puzzles work well for demonstrating work done, but they’re otherwise useless. They keep the wheels of cryptocurrencies turning, but the solutions themselves are intrinsically worthless.

Wouldn’t it be nice if crypto miners were solving useful problems like protein folding? You could do that. In fact there is a cryptocurrency FoldingCoin that does just that. But FoldingCoin has a market cap seven orders of magnitude smaller than Bitcoin, on the order of $200,000 compared to Bitcoin’s market cap of $2T.

Cryptocurrencies that use proof of useful work have not taken off. This might necessarily be the case. Requiring practical work creates divergent incentives. If you base a currency on the difficulty of protein folding computations, for example, it would cause major disruption if a pharmaceutical company decided to get into mining protein folding cryptocurrency at a loss because it values the results.

Going back to Cassels’ remark about mathematics and muddle, practical real-world problems often have a mathematical structure. Which is a huge blessing, except when you’re designing problems to be hard. Hash-based problems have gradually become easier to solve over time, and cryptocurrencies have adjusted. But a mathematical breakthrough for solving a practical problem would have the potential to disrupt a currency faster than the market could adapt.


[1] You don’t change the transaction amounts, but you may change the order in which the transactions are arranged into a Merkle tree so that you get different hash values. You can also change a 32-bit nonce, and a timestamp, but most of the degrees of freedom you need in order to find an acceptable hash come from rearranging the tree.

[2] Both scrypt and Equihash were designed to be memory intensive and to thwart the advantage of custom ASIC mining hardware. However, people have found a way to use ASIC hardware to solve scrypt and Equihash problems. RandomX requires running a randomly generated program before hashing the output, in an attempt to frustrate efforts to develop specialized mining hardware.

Minimize squared relative error

Suppose you have a list of positive data points y₁, y₂, …, yₙ and you want to find a value α that minimizes the sum of squared distances to the y’s.

\sum_{i=1}^n (y_i - \alpha)^2

Then the solution is to take α to be the mean of the y’s:

\alpha = \frac{1}{n} \sum_{i=1}^n y_i

This result is well known [1]. The following variation is not well known.

Suppose now that you want to choose α to minimize the sum of squared relative distances to the y’s. That is, you want to minimize the following.

\sum_{i=1}^n \left( \frac{y_i - \alpha}{\alpha} \right)^2

The value of α that minimizes this expression is the contraharmonic mean of the y’s [2].

\alpha = \frac{\sum_{i=1}^n y_i^2}{\sum_{i=1}^n y_i}
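Here’s a quick numerical sanity check in Python, using arbitrary random data; the objective is larger at values slightly below or above the contraharmonic mean:

    import numpy as np

    rng = np.random.default_rng(42)
    y = rng.uniform(1, 10, size=20)       # positive data points

    alpha = np.sum(y**2) / np.sum(y)      # contraharmonic mean

    def sq_rel_error(a):
        return np.sum(((y - a) / a)**2)

    for a in [0.9*alpha, alpha, 1.1*alpha]:
        print(f"{a:.4f}  {sq_rel_error(a):.6f}")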


[1] Aristotle says in the Nicomachean Ethics “The mean is in a sense an extreme.” This is literally true: the mean minimizes the sum of the squared errors.

[2] E. F. Beckenbach. A Class of Mean Value Functions. The American Mathematical Monthly. Vol. 57, No. 1 (Jan., 1950), pp. 1–6

What is the Bitcoin proof-of-work problem?

In order to prevent fraud, anyone wanting to add a block to the Bitcoin blockchain must prove that they’ve put in a certain amount of computational work. This post will focus on what problem must be solved in order to produce proof of work.

You’ll see the proof of work function described as finding strings whose SHA256 hash value begins with a specified number of 0s. That’s sort of a zeroth-level approximation of the problem.

The string s you’re trying to find has the form data + nonce where the data comes from the block you’re wanting to add and the nonce is a value you concatenate on the end. You try different values until you get an acceptable hash value.

You’re not computing SHA256(s) but rather the double hash:

SHA256²(s) = SHA256( SHA256(s) )

The only way to find such a string s is by brute force [1], and applying the hash function twice doubles the amount of brute force work needed.

And you’re not exactly trying to produce leading zeros; you’re trying to produce a value less than a target T. This is roughly the same thing, but not quite.

To illustrate this, suppose you have a 2FA fob that generates six-digit random numbers. You’ve been asked to wait until your fob generates a number less than 2025. Waiting until you have three leading zeros would be sufficient, but that would be making the task harder than it needs to be. You’d be waiting for a number less than 1000 when you’re only asked to wait for a number less than 2025.

A SHA256 hash value is a 256-bit number. If your target T is a power of 2

T = 2²⁵⁶⁻ⁿ

then finding a value of s such that

SHA256²(s) < T

really is finding an s whose (double) hash begins with n zeros, though T is not required to be a power of 2.
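Here’s a small Python check of that equivalence: compute a double SHA256 hash, interpret it as a 256-bit integer, and test the two conditions side by side (the input string and the value of n are arbitrary):

    import hashlib

    def double_sha256(data: bytes) -> int:
        """SHA256(SHA256(data)) interpreted as a 256-bit integer."""
        inner = hashlib.sha256(data).digest()
        return int.from_bytes(hashlib.sha256(inner).digest(), "big")

    h = double_sha256(b"some block header bytes")
    n = 4
    below_target  = h < 2**(256 - n)
    leading_zeros = format(h, "0256b").startswith("0" * n)
    print(below_target, leading_zeros)   # the two conditions always agree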

Finding a value of s with

SHA256²(s) < 2²⁵⁶⁻ⁿ

will require, on average, testing 2ⁿ values of s.
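A toy version of the search illustrates this. With n = 16 the loop below needs about 2¹⁶ ≈ 65,000 hashes on average (this sketch ignores Bitcoin’s actual header format; the data string is made up):

    import hashlib
    from itertools import count

    def double_sha256(data: bytes) -> int:
        inner = hashlib.sha256(data).digest()
        return int.from_bytes(hashlib.sha256(inner).digest(), "big")

    n = 16                               # toy difficulty: n leading zero bits
    target = 2**(256 - n)
    data = b"toy block header"

    for nonce in count():
        if double_sha256(data + nonce.to_bytes(8, "little")) < target:
            print(f"found nonce {nonce} after {nonce + 1} hashes")
            break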

The value of T is adjusted over time in order to keep the amount of necessary work roughly constant. As miners have become more efficient, the value of T has gone down and the amount of work has gone up. But the value of T can go up as well. It is currently fluctuating around 2¹⁷⁶, i.e. hashes must have around 80 leading zero bits.

Now here’s where things get a little more complicated. I said at the beginning that the string s has the form

s = data + nonce

where the data comes from the block you’re trying to add and the nonce is a number you twiddle in order to get the desired hash value. But the nonce is a 32-bit integer. If you need to hash on the order of 2⁸⁰ strings in order to find one with 80 leading zeros, you can’t do that just by adjusting a 32-bit nonce.

In practice you’re going to have to twiddle the contents of what I’ve called data. The data contains a Merkle tree of transactions, and you can change the hash values by adjusting the order in which transactions are arranged in the tree, in addition to adjusting the nonce.


[1] Unless someone finds a flaw in SHA256, which cryptographers have been trying to do for years and have not been able to do. And if a significant weakness is found in SHA256, it may not translate into a significant flaw in SHA256².

Deleting vs Replacing Names

This post looks at whether you should delete names or replace names when deidentifying personal data. With structured data, generating synthetic names does not increase or decrease privacy. But with unstructured data, replacing real names with randomly generated names increases privacy protection.

Structured data

If you want to deidentify structured data (i.e. data separated into columns in a database) then an obvious first step is to remove a column containing names. This is not sufficient—the reason deidentification is subtle is that it is possible to identify people after the obvious identifiers have been removed—but clearly removing names is necessary.

Instead of removing the names, you could replace them with randomly generated names. This neither hurts nor helps privacy. But this might be useful, say, when creating test data.

Say you’re developing software that handles patient data. You want to use real patient data because you want the test data to be realistic, but that may not be acceptable (or legal) due to privacy risks. You can’t just remove names because you need to test whether software displays names correctly, so you replace real names with randomly generated names. No harm, no foul.

Unstructured data

Now let’s switch contexts and look at unstructured data, such as text transcriptions of doctors’ notes. You still need to remove names, but it’s not as simple as removing the name column. Instead of the database structure telling you what’s a name, what’s a diagnosis, etc. you have to infer these things statistically [1], which means there will always be errors [2].

When working with unstructured text, replacing names with randomly generated names is better for privacy. If you could identify names with 100% accuracy, then it would make no difference whether you removed or replaced names, but you can’t. Software will always miss some names [3].

Suppose your software replaces suspected names with “NAME REDACTED.” When your software fails to identify a name, it’s obvious that it failed. For example,

NAME REDACTED and his wife Margaret came into the clinic today to be tested for …

But if instead your software replaced suspected names with synthetic names, it is not obvious when the software fails. When you see a reference to Margaret, you can’t know whether the patient’s name was Virginia and was replaced with Margaret, or whether the software made an error and skipped over Margaret’s name.

All else being equal, it’s better to synthesize names than remove names. But that doesn’t mean that just any process for synthesizing names will be adequate. The error rate doesn’t need to be zero, but it can’t be too high either. And the process should not be biased. If the software consistently let Hispanic names slip through, for example, Hispanic patients would not appreciate that.


[1] “But what if I use a large language model?” That’s a particular kind of statistical inference.

[2] Any statistical test, such as testing whether a string of text represents a name, will have false positives (type I error) and false negatives (type II error). Software to remove names will remove some text that isn’t a name, and fail to recognize some text as names.

[3] We’ve tested a lot of deidentification software packages for clients, and the error rate is never zero. But privacy regulations such as HIPAA don’t require the error rate to be zero, only sufficiently small.

Typesetting Sha and Bitcoin

I went down a rabbit hole this week using two symbols in LaTeX. The first was the Russian letter Sha (Ш, U+0428), and the second was the currency symbol for Bitcoin (₿, U+20BF).

Sha

I thought there would be a LaTeX package that would include Ш as a symbol rather than as a Russian letter, just as \pi produces π as a symbol rather than as a Greek letter per se, but apparently there isn’t. I was surprised, since Ш is used in math for at least three different things [1].

When I post on @TeXtip how to produce various symbols in LaTeX, I often get a reply telling me I should simply paste in the Unicode character and use XeTeX. That’s what I ended up doing, except one does not simply use XeTeX.

You have to set the font to one that contains a glyph for the character you want, and you have to use a font encoding package. I ended up adding these two lines to my file header:

    \usepackage[T2A]{fontenc}
    \usepackage{eulervm}

That worked, but only when I compiled with pdflatex, not xelatex.

Bitcoin

I ended up using a different but analogous tactic for the Bitcoin symbol. I used fontspec, Liberation Sans, and xelatex rather than fontenc, Euler, and pdflatex. These were the lines I added to the header:

    \usepackage{fontspec}
    \setmainfont{Liberation Sans}

Without these two lines I get the error message

    Missing character: There is no ₿ (U+20BF) in font ...

I didn’t need to use ₿ and Ш in the same document, but the approach in this section works for both. The approach in the previous section will not work for ₿ because the Euler font does not contain a glyph for ₿.


[1] The three mathematical uses of Ш that I’m aware of are the shuffle product, the Dirac comb distribution, and the Tate–Shafarevich group.

Golden powers revisited

Years ago I wrote a post Golden powers are nearly integers. The post was picked up by Hacker News and got a lot of traffic. The post was commenting on a line from Terry Tao:

The powers φ, φ², φ³, … of the golden ratio lie unexpectedly close to integers: for instance, φ¹¹ = 199.005… is unusually close to 199.

In the process of writing my recent post on base-φ numbers I came across some equations that explain exactly why golden powers are nearly integers.

Let φ be the golden ratio and ψ = −1/φ. That is, φ and ψ are the larger and smaller roots of

x² − x − 1 = 0.

Then powers of φ reduce to an integer and an integer multiple of φ. This is true for negative powers of φ as well, and so powers of ψ also reduce to an integer and an integer multiple of ψ. And in fact, the integers alluded to are Fibonacci numbers.

φⁿ = Fₙ φ + Fₙ₋₁
ψⁿ = Fₙ ψ + Fₙ₋₁

These equations can be found in TAOCP 1.2.8 exercise 11.

Adding the two equations leads to [1]

φⁿ = Fₙ₊₁ + Fₙ₋₁ − ψⁿ

So yes, φⁿ is nearly an integer. In fact, it’s nearly the sum of the (n + 1)st and (n − 1)st Fibonacci numbers. The error in this approximation is −ψⁿ, and so the error decreases exponentially with alternating signs.
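The identity is easy to check numerically. A short Python sketch:

    from math import sqrt

    phi = (1 + sqrt(5)) / 2
    psi = -1 / phi

    # Fibonacci numbers F_0, F_1, ..., F_12
    F = [0, 1]
    while len(F) < 13:
        F.append(F[-1] + F[-2])

    for n in range(2, 12):
        nearest = F[n + 1] + F[n - 1]
        print(f"{n:2d}  {phi**n:12.6f}  {nearest:4d}  {phi**n - nearest:+.6f}  {-psi**n:+.6f}")

The last two columns agree: the difference between φⁿ and Fₙ₊₁ + Fₙ₋₁ is −ψⁿ, up to floating point error.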


[1] φⁿ + ψⁿ = Fₙ(φ + ψ) + 2Fₙ₋₁ = Fₙ + 2Fₙ₋₁ = Fₙ + Fₙ₋₁ + Fₙ₋₁ = Fₙ₊₁ + Fₙ₋₁