Inequalities for inequality: Gini coefficient lower bounds

The Gini coefficient, a.k.a. Gini index, of a set of numbers is the average of the absolute differences between all pairs of numbers, divided by twice the mean. Specifically, let

{\cal X} = \{x_1, x_2, x_3, \ldots, x_n\}

Then the Gini coefficient of the set is defined to be

G({\cal X}) = \frac{1}{2\mu n^2} \sum_{i=1}^n \sum_{j=1}^n |x_i - x_j|
where μ is the mean of the set. The Gini coefficient is often used in economics to measure inequalities in wealth.

Now suppose the data is divided into r disjoint groups:

{\cal X} = \bigcup_{i = 1}^r {\cal X}_i

We would like to estimate the Gini coefficient of the entire set from the Gini coefficients of the subgroups. The individual Gini coefficients alone are not enough information for the task, but if we also know the size and sum of each subgroup, we can compute lower bounds on G. The paper [1] gives five such lower bounds.

We will present the five lower bounds and see how well each does in a simulation.

Zagier’s lower bounds

Here are Zagier’s five lower bounds, listed in Theorem 1 of [1].

\begin{align*} G({\cal X}) &\geq \sum_{i=1}^r \frac{n_i}{n} G({\cal X}_i) \\ G({\cal X}) &\geq \sum_{i=1}^r \frac{X_i}{X} G({\cal X}_i) \\ G({\cal X}) &\geq \left(\sum_{i=1}^r \sqrt{\frac{n_i}{n} \frac{X_i}{X} G({\cal X}_i)} \right)^2 \\ G({\cal X}) &\geq 1 - \left(\sum_{i=1}^r \sqrt{\frac{n_i}{n} \frac{X_i}{X} (1- G({\cal X}_i))} \right)^2 \\ G({\cal X}) &\geq G_0 + \sum_{i=1}^r \frac{n_i}{n} \frac{X_i}{X} G({\cal X}_i) \end{align*}

Here n_i is the size of the ith subgroup and X_i is the sum of the elements in the ith subgroup. Also, n is the sum of the n_i and X is the sum of the X_i.

G_0 is the Gini coefficient we would get if we replaced every element of each subgroup with that subgroup's mean, eliminating all variation within subgroups.
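
Here is a minimal Python sketch of how these five bounds might be computed from the subgroup sizes, sums, and Gini coefficients. The function name zagier_bounds and its interface are mine, chosen for illustration; only the formulas come from [1].

    from math import sqrt

    def zagier_bounds(sizes, sums, ginis):
        # sizes[i] = n_i, sums[i] = X_i, ginis[i] = G(X_i) for the ith subgroup
        n, X = sum(sizes), sum(sums)
        p = [n_i/n for n_i in sizes]  # population shares n_i/n
        q = [X_i/X for X_i in sums]   # sum shares X_i/X

        b1 = sum(p_i*g for p_i, g in zip(p, ginis))
        b2 = sum(q_i*g for q_i, g in zip(q, ginis))
        b3 = sum(sqrt(p_i*q_i*g) for p_i, q_i, g in zip(p, q, ginis))**2
        b4 = 1 - sum(sqrt(p_i*q_i*(1 - g)) for p_i, q_i, g in zip(p, q, ginis))**2

        # G_0: Gini coefficient after replacing each element by its subgroup mean
        means = [X_i/n_i for X_i, n_i in zip(sums, sizes)]
        G0 = sum(n_i*n_j*abs(m_i - m_j)
                 for n_i, m_i in zip(sizes, means)
                 for n_j, m_j in zip(sizes, means)) / (2*(X/n)*n**2)
        b5 = G0 + sum(p_i*q_i*g for p_i, q_i, g in zip(p, q, ginis))

        return [b1, b2, b3, b4, b5]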

Simulation

I drew 102 samples from a uniform random variable and computed the Gini coefficient with

    def gini(x):
        n = len(x)
        mu = sum(x)/n    
        s = sum(abs(a-b) for a in x for b in x)
        return s/(2*mu*n**2)

I split the sample evenly into three subgroups. I then sorted the list of samples and divided it into three even groups again.
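
In outline, the splitting and the bound computations might look like the sketch below, which reuses the gini function above and the hypothetical zagier_bounds helper sketched earlier. The numbers reported below come from one particular random draw, so a rerun will produce different values.

    import random

    x = [random.random() for _ in range(102)]

    unsorted_groups = [x[0:34], x[34:68], x[68:102]]
    xs = sorted(x)
    sorted_groups = [xs[0:34], xs[34:68], xs[68:102]]

    def bounds_for(groups):
        sizes = [len(g) for g in groups]
        sums  = [sum(g) for g in groups]
        ginis = [gini(g) for g in groups]
        return zagier_bounds(sizes, sums, ginis)

    print(gini(x))                           # Gini of the whole sample
    print([gini(g) for g in unsorted_groups], bounds_for(unsorted_groups))
    print([gini(g) for g in sorted_groups],   bounds_for(sorted_groups))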

The Gini coefficient of the entire data set was 0.3207. The Gini coefficients of the three unsorted subgroups were 0.3013, 0.2798, and 0.3603. When I divided the sorted data into three groups, the Gini coefficients were 0.3060, 0.0937, and 0.0502. The variation within each sorted group is about the same, but the group containing the smallest values has the smallest mean and thus the largest Gini coefficient.

When I tested Zagier’s lower bounds on the unsorted partition into three subgroups, I got estimates of

[0.3138, 0.3105, 0.3102, 0.3149, 0.1639]

for the five estimators.

When I repeated this exercise with the sorted groups, I got

[0.1499, 0.0935, 0.0933, 0.1937, 0.3207]

The first four lower bounds were much better for the unsorted partition, but the last bound was better for the sorted partition; in fact for the sorted partition it equals the Gini coefficient of the full data set.

[1] Don Zagier. Inequalities for the Gini coefficient of composite populations. Journal of Mathematical Economics 12 (1983) 102–118.

Economics, power laws, and hacking

Increasing costs impact some players more than others. Those who know about power laws and know how to prioritize are impacted less than those who naively believe everything is equally important.

This post will look at economics and power laws in the context of password cracking. Increasing the cost of verifying a password does not impact attackers proportionately because the attackers have a power law on their side.

Key stretching

In an earlier post I explained how key stretching increases the time required to verify a password. The idea is that if authentication systems can afford to spend more computation time verifying a password, they can force attackers to spend more time as well.

The stretching algorithm increases the time required to test a single salt and password combination by a factor of r where r is a parameter in the algorithm. But increasing the amount of work per instance by a factor of r does not decrease the number of users an attacker can compromise by a factor of r.

Power laws

If an attacker is going through a list of compromised passwords, the number of user passwords recovered for a fixed amount of computing power decreases by less than a factor of r because passwords follow a power law distribution [1]. The nth most common password appears with frequency proportional to n^(-s) for some constant s. Studies suggest s is between 0.5 and 0.9. Let’s say s = 0.7.

Suppose an attacker has a sorted list of common passwords. He starts with the most common password and tries it for each user. Then he tries the next most common password, and keeps going until he runs out of resources. If you increase the difficulty of testing each password by r, the attacker will have to stop after trying fewer passwords.

Example

Suppose the attacker had time to test the 1000 most common passwords, but then you set r to 1000, increasing the resources required for each individual test by a factor of 1000. Now the attacker can only test the most common password. The most common password accounts for about 1/24 of the users who use one of the 1000 most common passwords. So the number of passwords the attacker can test went down by a factor of 1000, but the number of users affected only went down by a factor of 24.

It would take 1000 times longer if the attacker still tested the same number of passwords. But the attacker hits diminishing returns after the first password tested. By testing passwords in order of popularity, his return on resources does not need to go down by a factor of 1000. In this example it only went down by a factor of 24.
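
Here is a minimal back-of-the-envelope check of the 1/24 figure, assuming s = 0.7 and taking the power law as exact over the 1000 most common passwords.

    s = 0.7
    N = 1000

    # Unnormalized power law weights for the N most common passwords
    weights = [n**-s for n in range(1, N + 1)]

    # Fraction of these users covered by the single most common password
    share = weights[0] / sum(weights)
    print(f"1/{1/share:.0f}")   # prints 1/24, approximately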

How much the hacker’s ROI decreases depends on several factors. It never decreases by more than a factor of r, and often it decreases less.

[1] Any mention of power laws brings out two opposite reactions, one enthusiastic and one pedantic. The naive reaction is “Cool! Another example of how power laws are everywhere!” The pedantic reaction is “Few things actually follow a power law. When people claim that data follow a power law distribution the data usually follow a slightly different distribution.”

The pedantic reaction is technically correct but not very helpful. The enthusiastic reaction is correct with a little nuance: many things approximately follow a power law distribution over a significant range. Nothing follows a power law distribution exactly and everywhere, but the same is true of any probability distribution.

It doesn’t matter for this post whether data exactly follow a power law. This post is just a sort of back-of-the-envelope calculation. The power law distribution has some empirical basis in this context, and it’s a convenient form for calculations.

Window dressing for ideological biases

From Russ Roberts on the latest EconTalk podcast:

… this is really embarrassing as a professional economist — but I’ve come to believe that there may be no examples … of where a sophisticated multivariate econometric analysis … where important policy issues are at stake, has led to a consensus. Where somebody says: Well, I guess I was wrong. Where somebody on the other side of the issue says: Your analysis, you’ve got a significant coefficient there — I’m wrong. No. They always say: You left this out, you left that out. And they’re right, of course. And then they can redo the analysis and show that in fact — and so what that means is that the tools, instead of leading to certainty and improved knowledge about the usefulness of policy interventions, are merely window dressing for ideological biases that are pre-existing in the case of the researchers.

Emphasis added.

Related post: Numerous studies have confirmed …

Square root of people

How do you infer the economic well-being of individuals from household income? At one extreme, you could just divide household income by the number of people in the household. This is naive because there are some economies of scale. It doesn’t take twice as much energy to heat a house for two people as a house for one person, for example.

The other extreme is the two-can-live-as-cheaply-as-one philosophy. All that matters is household income; the number of people in the house is irrelevant. That’s too simplistic: while there are fixed costs to running a home, there are also marginal costs per person.

So if you have a household of n people, do you divide the total income by n or by 1? Said another way, do you divide by n^1 or n^0? Some economists split the difference and use an exponent of 0.5, i.e. divide by the square root of n. This would say, for example, that the members of a family of four with a total income of $100,000 have roughly the same standard of living as a single person with an income of $50,000.

Dividing by √n is a crude solution. For one thing, it makes no distinction between adults and children. But for something so simple, it makes some sense. It makes more sense than dividing by n or not taking n into account.
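
To make the arithmetic concrete, here is a tiny sketch of the general rule: divide household income by household size raised to an exponent e between 0 and 1. The function name is mine, just for illustration.

    # Equivalized income: household income divided by (household size)**e
    def equivalized_income(income, size, e=0.5):
        return income / size**e

    for e in (0, 0.5, 1):
        print(e, equivalized_income(100_000, 4, e))
    # e = 0   -> 100000.0  (two can live as cheaply as one)
    # e = 0.5 ->  50000.0  (square root rule)
    # e = 1   ->  25000.0  (naive per-person division)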

Source: Burkhauser on the Middle Class

Convention versus compulsion

An alternate title for this post could be “Software engineering wisdom from a lecture on economics given in 1945.”

F. A. Hayek gave a lecture on December 17, 1945 entitled “Individualism: True and False.” A transcript of the talk is published in his book Individualism and Economic Order. In this talk Hayek argues that societies must decide between convention and compulsion as means to coordinate activity. The former is preferable, in part because it is more flexible. Individualism depends on

… traditions and conventions which evolve in a free society and which, without being enforceable, establish flexible but normally observed rules that make behavior of other people predictable in a high degree.

Of course Hayek wasn’t thinking of software development, but his comments certainly apply to it. Software engineers are fond of flexibility, but suspicious of rules that cannot be enforced by a machine. And yet there are some kinds of flexibility that require traditions and conventions rather than enforceable rules. Hayek looks beyond the letter of the law to its spirit: the purpose of rules in software engineering is to make the behavior of software (and software engineers) “predictable in a high degree.”

I’ve written a couple blog posts on this theme. One was Software architecture as a function of trust:

If you trust that your developers are highly competent and self-disciplined, you’ll organize your software differently than if you assume developers have mediocre skill and discipline. One way this shows up is the extent that you’re willing to rely on convention to maintain order. … In general, I see more reliance on convention in open source projects than in enterprise projects.

Another was a post on the architecture of Emacs:

In short, Emacs expects developers to be self-disciplined and does not enforce a great deal of external discipline. However, because the software is so light on bureaucracy, it is easy to customize and to contribute to.

The quotation from Hayek above continues:

The willingness to submit to such rules, not merely so long as one understands the reason for them but so long as one has no definite reason to the contrary, is an essential condition for the gradual evolution and improvement of rules of social intercourse … an indispensable condition if it is to be possible to dispense with compulsion.

Imagine a rookie programmer who joins a new team and only follows those conventions he fully understands. That’s not much better than the rookie doing whatever he pleases. The real benefit comes from his following the conventions he doesn’t yet understand (provided he “has no definite reason to the contrary”) because these distill the ideas of more experienced developers.

It takes time to pass on a set of traditions and conventions, especially to convey the rationale behind them. Machine-enforceable rules are a shortcut to establishing a culture.

Every project will be somewhere along a continuum between total reliance on convention and total reliance on rules a computer can check. Emacs is pretty far toward the conventional end of the spectrum, and enterprise Java projects are near the opposite end. If you want to move away from the compulsion end of the spectrum, you need more emphasis on convention.

Related post: Style and understanding

How much time do scientists spend chasing grants?

Computer scientist Matt Welsh said that one reason he left Harvard for Google was that he was spending 40% of his time chasing grants. At Google, he devotes all his time to doing computer science. Here’s how he describes it in his blog post The Secret Lives of Professors:

The biggest surprise is how much time I have to spend getting funding for my research. Although it varies a lot, I guess that I spent about 40% of my time chasing after funding, either directly (writing grant proposals) or indirectly (visiting companies, giving talks, building relationships). It is a huge investment of time that does not always contribute directly to your research agenda—just something you have to do to keep the wheels turning.

According to this Scientific American editorial, 40% is typical.

Most scientists finance their laboratories (and often even their own salaries) by applying to government agencies and private foundations for grants. The process has become a major time sink. In 2007 a U.S. government study found that university faculty members spend about 40 percent of their research time navigating the bureaucratic labyrinth, and the situation is no better in Europe.

Not only do scientists on average spend a large amount of time pursuing grants, they tend to spend more time on grants as their career advances. (This has an element of tautology: you advance your career in part by obtaining grants, so the most successful are the ones who have secured the most grant money.)

By the time scientists are famous, they may no longer spend much time actually doing science. They may spend nearly all their research time chasing grants either directly or, as Matt Welsh describes, indirectly by traveling, speaking, and schmoozing.

Pseudo-commons and anti-commons

Here are a couple variations on the tragedy of the commons, the idea that shared resources can be exhausted by people acting in their individual best interests.

The first is a recent podcast by Thomas Gideon discussing the possibility of a tragedy of the pseudo-commons. His idea of a pseudo-commons is a creative commons with some barriers. He gives the example of open core companies.

The other is Michael Heller’s idea of the tragedy of the anti-commons. If too many people own a resource, the difficulties in coordination may keep the resource from being used effectively. Having too many owners can create problems similar to those caused by having no owners.

If you’re looking for something to blog about, it would be interesting to compare the pseudo-commons and the anti-commons in depth.

Three P’s and three I’s of economics

In the December 27 episode of EconTalk, Pete Boettke summarizes basic economics as follows: If you don’t have the three P’s, you can’t have the three I’s.

The three P’s are

  • Property
  • Prices
  • Profit and loss

The three I’s are

  • Information
  • Incentive
  • Innovation
