Divisibility rules in hex

We all learn in elementary school that a number is divisible by 2 if the last digit is even. A number is divisible by 3 if the sum of the digits is divisible by 3. A number is divisible by 5 if its last digit is 0 or 5. Etc.

But imagine we were born with 8 fingers on each hand and did arithmetic in base 16, also called hexadecimal. Or suppose schools decided to teach children hexadecimal instead of decimal to give them a head start in computer science. We would count 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10, 11, …

What would divisibility rules look like? Here are the simplest rules.

  • n is divisible by 2 if its last digit is even.
  • n is divisible by 3 if its digit sum is divisible by 3.
  • n is divisible by 4 if its last digit is 0, 4, 8, or C.
  • n is divisible by 5 if its digit sum is divisible by 5.
  • n is divisible by 6 if it is divisible by 2 and 3.
  • n is divisible by 8 if it ends in 0 or 8.
  • n is divisible by A if it is divisible by 2 and 5.
  • n is divisible by F if its digit sum is divisible by F.
  • n is divisible by 10 (i.e. sixteen) if its last digit is 0.

In this list “digit” means hexadecimal digit.
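These rules are easy to check by brute force. Here’s a small Python sketch of my own (not from the original discussion) that verifies the digit-sum rules for 3, 5, and F; they work because 16 leaves a remainder of 1 when divided by 3, 5, or 15, so a number and its hex digit sum leave the same remainder.

def hex_digit_sum(n):
    return sum(int(d, 16) for d in format(n, 'x'))

for n in range(1, 10000):
    for d in (3, 5, 15):    # 3, 5, and F
        assert (n % d == 0) == (hex_digit_sum(n) % d == 0)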

There are also rules for checking divisibility by 7 and 11 (i.e. seventeen) in hexadecimal analogous to the rules for divisibility by 7 and 11 in decimal.

To test divisibility by 7, split off the last digit from the rest of the number. Triple it and subtract from what’s left of the original number. Repeat this process until you get something you recognize as being divisible by 7 or not.

For example, suppose you start with F61. Split this into F6 and 1. Subtract 3 from F6 to get F3. Now split F3 into F and 3. Subtract 9 from F to get 6. F61 is not divisible by 7 because 6 is not divisible by 7. The explanation for how this works is completely analogous to the corresponding rule in base 10.
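Here’s a quick Python sketch of that procedure (my own illustration, not from the post):

def divisible_by_7_hex(h):
    n = int(h, 16)
    while n >= 16:                 # more than one hex digit left
        rest, last = divmod(n, 16)
        n = abs(rest - 3*last)     # preserves divisibility by 7, though not the remainder
    return n % 7 == 0

print(divisible_by_7_hex("F61"))   # False, as in the example above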

You can test for divisibility by 11 in base 16 (i.e. by seventeen) just as you’d test divisibility by 11 in base 10 (or any other base). Add up the digits in odd positions and subtract the sum of the digits in even positions. For example, suppose you start with 135B8. The digits in odd positions sum to 1 + 5 + 8. The digits in even positions sum to 3 + B. Both sums are equal, so their difference is 0, which is divisible by 11, so 135B8 is divisible by 11.
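A corresponding sketch for the base-16 rule for 11 (again my own illustration):

def divisible_by_11_hex(h):
    digits = [int(d, 16) for d in h]
    alternating_sum = sum(d if i % 2 == 0 else -d for i, d in enumerate(digits))
    return alternating_sum % 17 == 0   # 0x11 = seventeen

print(divisible_by_11_hex("135B8"))    # True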

There’s nothing special about base 16 or base 10. You could come up with analogous divisibility rules in any base.


Divisibility by 7

How can you tell whether a number is divisible by 7? Most everyone knows how to easily tell whether a number is divisible by 2, 3, 5, or 9. Fewer people know tricks for testing divisibility by 4, 6, 8, or 11. But not many people have ever seen a trick for testing divisibility by 7.

Here’s the trick. Remove the last digit from the number, double it, and subtract it from the first part of the number. Do this repeatedly until you get something you recognize as being divisible by 7 or not.

For example, start with 432. Split into 43 and 2. Subtract 4 from 43 to get 39. Since 39 isn’t divisible by 7, neither is 432.

For another example, start with 8631. Split into 863 and 1. Subtract 2 from 863 to get 861.

Now split 861 into 86 and 1. Subtract 2 from 86. Maybe you recognize 84 as a multiple of 7. If not, double 4 and subtract from 8 to get 0, which is divisible by 7. Either way, we conclude that 8631 is divisible by 7.
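Here is the same procedure as a short Python sketch (my own, not from the original post):

def divisible_by_7(n):
    n = abs(n)
    while n >= 10:                 # more than one digit left
        rest, last = divmod(n, 10)
        n = abs(rest - 2*last)     # preserves divisibility by 7, though not the remainder
    return n % 7 == 0

print(divisible_by_7(432), divisible_by_7(8631))   # False True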

Why does this work? Let b be the last digit of a number n and let a be the number we get when we split off b. That says n = 10a + b. Now n is divisible by 7 if and only if n – 21b is divisible by 7. But n – 21b = 10(a – 2b) and this is divisible by 7 if and only if a – 2b is divisible by 7.

What about the remainder when you divide a number by 7? Here’s where the rule for 7 differs from the more familiar divisibility rules. For example, a number is divisible by 3 if its digit sum is divisible by 3, and furthermore the remainder when a number is divided by 3 is the remainder when its digit sum is divided by 3. But the divisibility rule for 7 does not give the remainder when a number is divided by 7. For a simple example, applying the rule to 31 terminates in 1, but the remainder when 31 is divided by 7 is 3.

Why doesn’t the divisibility rule for 7 give the remainder? It is true that 10a + b and (10a + b) – 21b have the same remainder when divided by 7. But then we factored this into 10(a – 2b). It’s true that 10(a – 2b) is divisible by 7 if and only if a – 2b is divisible by 7, but if neither is divisible by 7 then they will leave different remainders.


Why a computer buffer is called a buffer

Why is a chunk of working memory called a “buffer”?

The word ‘buffer’, by the way, comes from the meaning of the word as a cushion that deadens the force of a collision. In early computers, a buffer cushioned the interaction between files and the computer’s central processing unit. The drums or tapes that held a file and the central processing unit were pieces of equipment that were very different from each other, working at their own speeds, in spurts. The buffer made it possible for them to work together effectively. Eventually, the buffer grew from being an intermediary, a temporary holding place, to being the place where work is done. This transformation is rather like that of a small seaport that grew into a great city: once it was merely the place where cargo was warehoused temporarily before being loaded onto ships; then it became a business and cultural center in its own right.

From An Introduction to Programming in Emacs Lisp. The same book explains two meanings of “argument”.

Approximations with Pythagorean triangles

A few days ago someone asked the following on math.stackexchange:

Is it possible to get arbitrarily near any acute angle with Pythagorean triangles?

A Pythagorean triangle is a right triangle with all integer sides. The sides of such a triangle are called Pythagorean triples. The most famous example is a 3-4-5 triangle. A little less known example is 5-12-13. Since not many Pythagorean triples come to mind, you might think there aren’t that many of them. Or maybe they’re unevenly distributed so that you can approximate some angles as well as you’d like but there’s a limit to how well could you approximate others.

It turns out you can indeed approximate any acute angle with that of a Pythagorean triangle. The answer on math.stackexchange explains why this is possible and here I give an algorithm. I wrote a little online calculator for finding Pythagorean triangle approximations based on the algorithm below.

A result going all the way back to Euclid says that (a, b, c) is a Pythagorean triple if and only if (a, b, c) = (m^2 – n^2, 2mn, m^2 + n^2) where m and n are positive integers with m > n. The tangent of one of the acute angles of such a triangle is 2mn/(m^2 – n^2) = 2t/(1 – t^2) where t = n/m. Given an acute angle θ, we need to find t such that tan(θ) = 2t/(1 – t^2) and then find a rational approximation to t.

We can solve for t using the quadratic formula and find that t = (1 – cos(θ))/sin(θ). We can then find a rational approximation to t using Farey’s algorithm.
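For the record, here’s the algebra behind that formula. Clearing the denominator in tan(θ) = 2t/(1 – t^2) gives a quadratic in t, and the quadratic formula gives

\[ \tan\theta \, t^2 + 2t - \tan\theta = 0 \quad\Longrightarrow\quad t = \frac{-1 + \sec\theta}{\tan\theta} = \frac{1 - \cos\theta}{\sin\theta}, \]

where we take the positive root so that 0 < t < 1 for an acute angle. (This is just the half-angle identity t = tan(θ/2).)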

We’d like to not use larger numbers than necessary in our Pythagorean triangles. Suppose we don’t want the hypotenuse to be much more than H. If t ≈ n/m and m^2 + n^2 = H, then (1 + t^2) m^2 ≈ H. Therefore we’d like to look for rational approximations to t with denominator m no more than sqrt(H/(1 + t^2)). This is approximate, however, since t is only approximately equal to n/m.

If we use the Python function farey from a previous post, the code for finding a Pythagorean approximation to an acute angle θ is simply the following.

from math import sin, cos, sqrt

def pythagorean_triple(acute_angle, max_hypotenuse):
    # farey() comes from the "Best rational approximation" post below
    t = (1 - cos(acute_angle))/sin(acute_angle)
    max_denominator = sqrt(max_hypotenuse/(1 + t*t))
    n, m = farey(t, max_denominator)
    return m*m - n*n, 2*m*n, m*m + n*n

If the angle is small and the maximum hypotenuse is too small, the best rational approximation to t might be zero. In that case the algorithm returns a degenerate triangle, i.e. one of the sides has zero length. You could increase the maximum hypotenuse and try again until you get a valid triangle.

Update: The procedure above will return a Pythagorean triple, but not necessarily a primitive Pythagorean triple, i.e. the three numbers may have a common factor as Douglas pointed out in the comments. The triple is only primitive if m and n have opposite parity. You could guarantee a primitive triple by dividing by the greatest common divisor of the first two terms. (Anything that divides a and b must divide c.)

Here’s revised code that returns only primitive triples.

def gcd(a, b):
    if b == 0:
        return a
    return gcd(b, a%b)

def pythagorean_triple(acute_angle, max_hypotenuse):
    # farey() as before
    t = (1 - cos(acute_angle))/sin(acute_angle)
    max_denominator = sqrt(max_hypotenuse/(1 + t*t))
    n, m = farey(t, max_denominator)
    a = m*m - n*n
    b = 2*m*n
    c = m*m + n*n
    g = gcd(a, b)
    # integer division keeps the result an integer triple
    a //= g
    b //= g
    c //= g
    return a, b, c
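Here’s a hypothetical usage sketch. The 30° angle and the hypotenuse bound are made-up inputs, and farey comes from the “Best rational approximation” post below.

from math import radians

a, b, c = pythagorean_triple(radians(30), 1000)
print(a, b, c, a*a + b*b == c*c)   # a triple such as (209, 120, 241), and True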

Related post: What’s so hard about finding a hypotenuse?

Paleolithic nonsense

The “paleolithic diet” has gotten more press lately. Paleolithic diet advocates say our ancestors lived in a state of gastronomic innocence, eating mostly meat, before they were seduced by agriculture and fell into eating grain. Some go further and say that not only should we eat like cavemen, we should live like cavemen as well. For example, we should have random bursts of exercise, as if fleeing a saber-toothed tiger, followed by long periods of leisure.

I am amused by how much some people believe they know about paleolithic life. Most of us don’t know that much about how our great grandparents lived, and yet others make confident detailed claims about the lifestyles of pre-historic ancestors. This is convenient since their claims are unlikely to be proven wrong, given how little we know or are ever likely to know about ancient lifestyles. Of course those making the boldest claims are not scientists but popularizers who take a hint from the scientists and run with it.

I have no opinion on the actual recommendations of the fans of paleolithic culture. Maybe we would be better off eating more meat or having random bursts of intense exercise; I have no idea. However, I object to the pseudo-scientific rhetoric used to support the recommendations. I also object to the implicit assumption that it would necessarily be good to emulate the lives of paleolithic humans even if we did know how they lived.

Even the little we think we know about ancient cuisine should be called into question. A paper entitled Thirty thousand-year-old evidence of plant food processing was posted online this week which suggests people were making flour 10,000 years earlier than previously thought. Of course this doesn’t mean that people all over the world were living on pasta, but it does underscore how little we know about the real paleolithic diet.

Best rational approximation

Suppose you have a number x between 0 and 1. You want to find a rational approximation for x, but you only want to consider fractions with denominators below a given limit.

For example, suppose x = 1/e = 0.367879…  Rational approximations with powers of 10 in the denominator are trivial to find: 3/10, 36/100, 367/1000, etc. But say you’re willing to have a denominator as large as 10. Could you do better than 3/10? Yes, 3/8 = 0.375 is a better approximation. What about denominators no larger than 100? Then 32/87 = 0.36781… is the best choice, much better than 36/100.

How do you find the best approximations? You could do a brute force search. For example, if the maximum denominator size is N, you could try all fractions with denominators less than or equal to N. But there’s a much more efficient algorithm. The algorithm is related to the Farey sequence named after John Farey, though I don’t know whether he invented the algorithm.

The idea is to start with two fractions, a/b = 0/1 and c/d = 1/1. We update either a/b or c/d at each step so that a/b will be the best lower bound of x with denominator no bigger than b, and c/d will be the best upper bound with denominator no bigger than d. At each step we do a sort of binary search by introducing the mediant of the upper and lower bounds. The mediant of a/b and c/d is the fraction (a+c)/(b+d) which always lies between a/b and c/d.

Here is an implementation of the algorithm in Python. The code takes a number x between 0 and 1 and a maximum denominator size N. It returns the numerator and denominator of the best rational approximation to x using denominators no larger than N.

def farey(x, N):
    # a/b is the best lower bound so far, c/d the best upper bound
    a, b = 0, 1
    c, d = 1, 1
    while (b <= N and d <= N):
        mediant = float(a+c)/(b+d)
        if x == mediant:
            if b + d <= N:
                return a+c, b+d
            elif d > b:
                return c, d
            else:
                return a, b
        elif x > mediant:
            a, b = a+c, b+d
        else:
            c, d = a+c, b+d

    if (b > N):
        return c, d
    else:
        return a, b

In Python 3.0, the conversion to float could be removed since the division operator does floating point division of integers.
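For example, this recovers the 32/87 approximation to 1/e discussed above:

from math import e

print(farey(1/e, 100))   # (32, 87)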

Read more about rational approximation in Breastfeeding, the golden ratio, and rational approximation.

Here’s an example of a situation in which you might need rational approximations. Suppose you’re designing an experiment which will randomize subjects between two treatments A and B. You want to randomize in blocks of size no larger than N and you want the probability of assigning treatment A to be p. You could find the best rational approximation a/b to p with denominator b no larger than N and use the denominator as the block size. Each block would be a permutation of a A’s and b – a B’s.
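Here’s a sketch of that idea, with a made-up treatment probability and block size limit, reusing the farey function above:

import random

p, N = 0.37, 50                  # desired probability of treatment A, maximum block size
a, b = farey(p, N)               # use block size b with a slots assigned to treatment A
block = ["A"]*a + ["B"]*(b - a)
random.shuffle(block)            # one randomized block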

Update: Eugene Wallingford wrote a blog post about implementing the algorithm in Klein, a very restricted language used for teaching compiler writing.

Update: There’s a bug in the code. See discussion below.

Good old regular expressions

Here are two examples that persuaded me long ago that regular expressions could be powerful. Both come from The Unix Programming Environment by Kernighan and Pike (1984).

The first problem is to produce a list of all English words that contain all five vowels exactly once and in alphabetical order.

The book creates a regular expression named alphavowels

^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$

then uses it to filter a dictionary file

egrep -f alphavowels /usr/dict/web2

This produced 16 words ranging from abstemious to majestious.

The second problem is to produce a list of all English words of at least six letters with letters appearing in increasing alphabetical order.

The book creates a regular expression named monotonic

^a?b?c?d?e?f?g?h?i?j?k?l?m?n?o?p?q?r?s?t?u?v?w?x?y?z?$

then uses it to filter a dictionary file as before, except there is an additional filter stage.

egrep -f monotonic /usr/dict/web2 | grep '......'

This produced 17 words including common words such as almost and ghosty. Some of the more interesting results were bijoux, chintz, and egilops. Kernighan and Pike explain that egilops is a disease that attacks wheat.


The regular expressions above are fairly long, but shorter and more transparent than a procedural program to solve the same problem. The solutions may look mysterious at first sight, but they are entirely straightforward once you know the most basic features of regular expressions.

In the first problem, the pattern [^aeiou] says to look for anything that isn’t a vowel, i.e. is a consonant (assuming entries in the dictionary file contain only letters). So the regular expression says to start at the beginning of each line and look for zero or more consonants, followed by an ‘a’, followed by zero or more consonants, followed by an ‘e’, and so on down to a ‘u’ optionally followed by consonants at the end of the line.

In the second problem, the question mark matches zero or one instances of a character, i.e. the character is optional. The regular expression says to start at the beginning of each line, look for an optional ‘a’, followed by an optional ‘b’, and so forth to the end of the line. Then the output is filtered by another regular expression, ‘......’. Since a period matches any character, a sequence of six periods selects only words that contain at least six characters.
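If you’d rather experiment in Python than with egrep, here’s a quick sketch of the same two patterns, using a few sample words rather than a dictionary file:

import re

alphavowels = re.compile(r'^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$')
monotonic   = re.compile(r'^a?b?c?d?e?f?g?h?i?j?k?l?m?n?o?p?q?r?s?t?u?v?w?x?y?z?$')

for word in ["facetious", "education", "almost", "chintz"]:
    print(word, bool(alphavowels.match(word)), bool(monotonic.match(word)) and len(word) >= 6)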

Related links:

Tips for learning regular expressions


Buggy code is biased code

As bad as corporate software may be, academic software is usually worse. I’ve worked in industry and in academia and have seen first-hand how much lower the quality bar is in academia. And I’m not the only one who has noticed this.

Why does this matter? Because buggy code is biased code. Bugs that cause the software to give unwanted results are more likely to be noticed and fixed. Bugs that cause software to produce expected results are more likely to remain in place.

If your software simulates some complex phenomena, you don’t know what it’s supposed to do; that’s why you’re simulating. Errors are easier to spot in consumer software. A climate model needs a higher level of quality assurance than a word processor because bugs in the latter are more obvious. Genomic analysis may contain egregious errors and no one would ever know, but a bug in an MP3 player is sure to annoy users.

You have to test simulation software carefully. You have to test special cases and individual components to have any confidence in the final output. You can’t just look at the results and say “Yeah, that’s about what I expected.”


The Titanic Effect

Gerald Weinberg’s book Secrets of Consulting is filled with great aphorisms. One of these he calls the Titanic Effect:

The thought that disaster is impossible often leads to an unthinkable disaster.

If your model says disaster is extremely unlikely, the weakest link may be your model.

In The Black Swan, Nassim Taleb looks at the risks facing a casino. The biggest risks have not been lucky gamblers. The actuaries working for casinos understand the risks of lucky customers very well and put policies into place to protect against these risks. But the actuaries didn’t account for the possibility that a tiger might maul an irreplaceable performer, costing the casino $100 million. Neither did they account for the possibility that an employee might forget to file tax paperwork or that someone might kidnap a casino owner’s daughter. No one could have foreseen these events, and that’s the point: there are always risks outside your model.

Related post: Feasibility studies

Kibi, mebi, gibi

Computers like powers of 2, people like powers of 10, and the fact that 2^10 is approximately 10^3 makes it easy to convert between the two powers.

A kilobyte is 1000 bytes like a kilogram is 1000 grams. But the former is only approximate and the latter is exact. A kilobyte is actually 2^10 = 1024 bytes. Similarly a megabyte is 2^20 or approximately 10^6 bytes and a gigabyte is 2^30 or approximately 10^9 bytes. About 10 years ago the International Electrotechnical Commission (IEC) introduced new units to eliminate this ambiguity. The prefix “kibi” means exactly 2^10, “mebi” means 2^20, and “gibi” means 2^30. (The IEC also introduced tebi, pebi, exbi, zebi, and yobi as binary analogs for tera, peta, exa, zetta, and yotta.)

I’ll give arguments for and against using the new units and conclude with a note about their implementation in PowerShell.


The number of bytes in a kilobyte is about 2.5% larger than the number of grams in a kilogram. But the number of bytes in a megabyte is about 5% larger than the number of grams in a megagram. Every time you go up by a factor of 2^10, the approximation degrades about another 2.5%. So a terabyte is almost 1.1 trillion bytes. At larger sizes, the traditional prefixes become progressively more misleading and hence the need for more precise alternatives is greater.
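A quick way to see the drift:

for k, prefix in enumerate(["kibi", "mebi", "gibi", "tebi", "pebi"], start=1):
    print(prefix, 2.0**(10*k) / 10**(3*k))    # 1.024, 1.049, 1.074, 1.100, 1.126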

Another advantage to the new prefixes is that “kibi, mebi, gibi” is reminiscent of “veni, vidi, vici.”

(Why does the quality of the approximation (2^10)^k ≈ 10^(3k) degrade by about 2.5% for each increment in k? Because 2^10 = 1.024 × 10^3, and by the binomial theorem, (1 + 0.024)^k ≈ 1 + 0.024 k.)


The only thing special about decimal powers of 2 — 2^10, 2^20, 2^30, etc. — is that they are approximately round numbers in base 10 that have traditional prefixes associated with them. From a computer hardware perspective, numbers like 2^16, 2^32, and 2^64 are more natural. If you’re not going to use the traditional prefixes, abandon them rather than create near copies. Just state the number of bytes, using scientific notation if necessary.


In Windows PowerShell, the units kb, mb, and gb mean 2^10, 2^20, and 2^30 respectively. For example, if you type 2mb + 5kb at the PowerShell command prompt, the result is 2102272. If you really want to compute 2×10^6 + 5×10^3 and can’t do it in your head, you could type 2e6 + 5e3.

Bruce Payette commented on this feature of PowerShell in his book PowerShell in Action.

Yes, the PowerShell team is aware that these notations are not consistent with the IEC recommendations (kibabyte, and so on). Since the point of this notation is convenience and most people in the IT space are more comfortable with Kb than with Ki, we choose to err on the side of comfort over conformance in this one case. Sorry. This particular issue generated easily the second most heated debate on the PowerShell internal and external beta tester lists.

Pi seconds equals one nanocentury

Duff’s rule says that a nanocentury is about π seconds. Assuming a year is 365.25 days, there are 3,155,760,000 seconds in a century. So a nanocentury, one billionth of a century, is 3.15576 seconds, roughly π seconds.
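A quick sanity check:

from math import pi

seconds_per_century = 100 * 365.25 * 24 * 60 * 60
print(seconds_per_century * 1e-9, pi)    # 3.15576 versus 3.14159...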

This odd fact is surprisingly useful in back-of-the-envelope calculations. In order to determine whether something is computationally feasible, you have to go from how much work can be done in a second to how much work could be done on calendar time scales. For example, suppose some operation is going to take 10^15 steps and you can carry out 10^6 operations per second. How long would that take? Obviously 10^9 seconds, but how long is that in familiar units of time? According to Duff’s rule it’s about a third of a century so about 30 years.

Since the square root of 10 is approximately π, you could say there are about square root of 10 seconds in a nanocentury. In fact, the square root of 10 is 3.162 and so it’s closer to the number of seconds in a nanocentury than π is. The advantage to square root of 10 is that it is exactly half an order of magnitude, 10^(1/2). So you could say a century is 9.5 orders of magnitude longer than a second, or a year is 7.5 orders of magnitude longer than a second. Or perhaps more memorably,

A year is about 30 megaseconds.

If you interpret “mega” as 2^20 rather than 10^6 this approximation gets even better. (Technically, this would be 30 mebiseconds. SI distinguishes “mega” = 10^6 = 1,000,000 from “mebi” = 2^20 = 1,048,576, though the latter isn’t widely used. Most people have either never heard of the new prefixes like “mebi” or think they sound silly and prefer the ambiguity of using “mega” to mean two slightly different things. See Kibi, mebi, gibi.)

I wrote briefly about Duff’s rule a while back in the post Three rules of thumb. That post also includes a great video of Grace Hopper explaining to David Letterman her rule of thumb that light travels about one foot in a nanosecond.

There isn’t a googol of anything

Before Google, there was googol, the number 10^100, written as a 1 followed by 100 zeroes.

There are about 4 × 10^79 atoms in the universe. (Here’s a derivation of that number.) You could bump that number up a little by counting particles rather than atoms, but not by much. There’s not a googol of anything physical in the universe.

On the other hand, numbers larger than a googol routinely arise in application. When you’re counting potential things rather than physical things you can run into numbers much larger than a googol. This happens all the time in probability calculations. Inconceivably large numbers pop up in intermediate steps on the way to moderate-sized results.
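For example, the number of ways to order a list of just 70 items already exceeds a googol:

from math import factorial

googol = 10**100
print(factorial(70) > googol)    # True: 70! is about 1.2 × 10^100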
