At first “discrete logarithm” sounds like a contradiction in terms. Logarithms aren’t discrete, not as we usually think of them. But you can define and compute logarithms in modular arithmetic.

What is a logarithm? It’s the solution to an exponential equation. For example, the logarithm base 10 of 2 is the solution to the equation 10^{x} = 2, so *x* ≈ 0.30103. Similarly, you could look for the logarithm base 10 of 2 modulo 19. This is an integer value of *x* such that 10^{x} = 2 mod 19, and you can show that 17 is a solution.
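You can check this with Python’s three-argument `pow`, which does modular exponentiation (a quick illustrative snippet, not part of the original post):

```python
from math import log10

# Ordinary logarithm base 10 of 2, rounded to five places.
print(round(log10(2), 5))   # 0.30103

# Discrete logarithm check: pow(base, exponent, modulus)
# computes base**exponent mod modulus efficiently.
assert pow(10, 17, 19) == 2
```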

If we work modulo an integer *n*, the discrete logarithm base *a* of a number *y* is an integer *x* such that *a*^{x} = *y* mod *n*. For some values of *a* there may not be a solution, but there will be a solution if *a* is a generator of the integers mod *n*.

Brute force is the simplest algorithm for finding discrete logarithms: try *x* = 1, 2, 3, …, *n* until you find a value of *x* satisfying *a*^{x} = *y* mod *n*. The problem with brute force is that the expected run time is on the order of *n*, and *n* is often very large in applications.
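As a sketch (hypothetical code, not from the post), brute force might look like this in Python, keeping a running power so each step is one modular multiplication:

```python
def brute_force_dlog(a, y, n):
    """Find x with a**x = y mod n by trying x = 1, 2, ..., n."""
    value = 1
    for x in range(1, n + 1):
        value = (value * a) % n   # value is now a**x mod n
        if value == y:
            return x
    return None                   # no solution

print(brute_force_dlog(10, 2, 19))   # 17
```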

Discrete logarithms are expensive to compute, but we can do better than brute force. (Cryptographic applications take advantage of the fact that discrete logarithms are hard to compute.) There’s a simple algorithm by Daniel Shanks, known as the baby-step giant-step algorithm, that reduces the run time from order *n* to order roughly √*n*. (Actually O(√*n* log *n*) for reasons we’ll see soon.)

Let *s* be the ceiling of the square root of *n*. The idea is to search for *x* by taking giant steps forward (multiples of *s*) and baby (single) steps backward.

Let *G* be the set of pairs (*a*^{gs} mod *n*, *gs*) where *g* ranges from 1 to *s*. These are the giant steps.

Let *B* be the set of pairs (*ya*^{b} mod *n*, *b*) where *b* ranges from 0 to *s*-1. These are the baby steps.

Now look for a pair in *G* and a pair in *B* with matching first components, i.e. *ya*^{b} = *a*^{gs} mod *n*. Then *x* = *gs* – *b* is the required solution.

Constructing the sets *G* and *B* requires O(*s*) operations, and matching the two sets takes O(*s* log *s*) operations.
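Putting the pieces together, here is one way the whole algorithm might look in Python. This is a sketch, not code from the post: the function name is my own, and the matching step sorts the giant steps and binary-searches them, which is one way to get the O(*s* log *s*) cost described above.

```python
from bisect import bisect_left
from math import isqrt

def bsgs(a, y, n):
    """Find x with a**x = y mod n by baby-step giant-step."""
    s = isqrt(n - 1) + 1   # ceiling of the square root of n
    # Giant steps: pairs (a**(g*s) mod n, g*s) for g = 1, ..., s,
    # sorted by first component so we can binary-search them.
    giant = sorted((pow(a, g * s, n), g * s) for g in range(1, s + 1))
    keys = [pair[0] for pair in giant]
    # Baby steps: y * a**b mod n for b = 0, ..., s-1.
    for b in range(s):
        target = (y * pow(a, b, n)) % n
        i = bisect_left(keys, target)
        if i < len(keys) and keys[i] == target:
            return giant[i][1] - b   # x = g*s - b
    return None

print(bsgs(10, 2, 19))   # 17
```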

Here’s an example, going back to the problem above of finding the discrete log base 10 of 2 mod 19, using a little Python code to help with the calculations.

The square root of 19 is between 4 and 5 so *s* = 5.

>>> [(2*10**r % 19, r) for r in range(5)]
[(2, 0), (1, 1), (10, 2), (5, 3), (12, 4)]

>>> [(10**(4*t) % 19, 4*t) for t in range(1,6)]
[(6, 4), (17, 8), (7, 12), (4, 16), (5, 20)]

The number 5 appears as the first element of a pair in both *B* and *G*. We have (5, 3) in *B* and (5, 20) in *G* so *x* = 20 – 3 = 17.

**Related**: Applied number theory

Interesting, thank you for sharing. I am not sure I follow the matching (intersection) in O(s log s), though: my understanding is that this can be done in O(s), for instance with a hash table implementation. Can you clarify this?
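For illustration, the expected O(s) matching described in this comment could be sketched with a Python dictionary; this is hypothetical code, and `bsgs_dict` is my own name, not anything from the post or the comment:

```python
def bsgs_dict(a, y, n, s):
    """Baby-step giant-step with hash-table matching.

    Building the dict is O(s) and each baby-step probe is
    expected O(1), so matching is expected O(s) overall.
    """
    giant = {pow(a, g * s, n): g * s for g in range(1, s + 1)}
    for b in range(s):
        key = (y * pow(a, b, n)) % n
        if key in giant:
            return giant[key] - b   # x = g*s - b
    return None

print(bsgs_dict(10, 2, 19, 5))   # 17
```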

The purpose of the Baby-Step-Giant-Step algorithm is to make the calculation of the discrete-logarithm more efficient. With that in mind, and since I was not familiar with this scheme, I coded BSGS in Haskell twice, once using an array for the precomputed list, the second time hashing the precomputed list. It seems clear that with the array, the program runs in O(n) time, and with the hash the run time is O(sqrt(n)). The tradeoff is that using the hash takes more space than with the array.

For anyone interested, my Haskell code, along with other number-theoretical routines is available on my website http://www.jimshapiro.com . Comments, criticism always welcomed.

> Let G be the set of pairs (a^(gs) mod n, gs) where g ranges from 1 to s.

> The square root of 19 is between 4 and 5 so s = 5.

>>> [(10**(4*t) % 19, 4*t) for t in range(1,6)]

These seem to be the giant steps, since the range goes from 1 to 5 (inclusive), but where does the number 4 come from? I thought at first 10 would be raised to the power gs here, but that would make the constant in the exponent equal to 5 (the value of s), not 4.