Log semiring

Here’s a strange way to do arithmetic on the real numbers.

First, we’ll need to include +∞ and −∞ with the reals.

We define the new addition of two elements x and y to be −log (exp(−x) + exp(−y) ).

We define the new multiplication to be ordinary addition. (!)

In this new arithmetic +∞ is the additive identity and 0 is the multiplicative identity.

This new algebraic structure is called the log semiring. It’s called a semiring because it satisfies all the properties of a ring except that elements don’t necessarily have additive inverses. We’ll get into the details of the definition below.

* * *

Let’s put a subscript S on everything associated with our semiring in order to distinguish them from their more familiar counterparts. Then we can summarize the definitions above as

a +_S b = −log (exp(−a) + exp(−b) )
a *_S b = a + b
0_S = +∞
1_S = 0

Note that if we define

f(a, b) = a +_S b

then

f(a, b) = −g(−a, −b)

where g(a, b) is the soft maximum of a and b.

* * *

Finally, we list the axioms of a semiring. Note that these equations all hold when we interpret +, *, 0, and 1 in the context of S, i.e. we imagine that each has a subscript S and is as defined above.

(a + b) + c = a + (b + c)
0 + a = a + 0 = a
a + b = b + a
(a * b) * c = a * (b * c)
1 * a = a * 1 = a
a * (b + c) = (a * b) + (a * c)
(a + b) * c = (a * c) + (b * c)
0 * a = a * 0 = 0

Each of these follows immediately from writing out the definitions.

11 thoughts on “Log semiring”

Andrew

26 February 2014 at 22:33

But as a certain GSV would say, what are the civilian applications?

John

26 February 2014 at 22:40

I wouldn’t be surprised if it has applications to convex analysis and optimization.

Sjoerd Visscher

27 February 2014 at 04:05

This haskell package provides this semiring: http://hackage.haskell.org/package/logfloat-0.12.1/docs/Data-Number-LogFloat.html

It explains why it exists: it helps with dealing with very big or very small numbers, which would not fit in double-precision floating point.

Frédéric Grosshans

27 February 2014 at 08:06

About “civilian” applications of the logarithmic semiring :

1°/ I thhink that the 1984 videogame Elite (https://en.wikipedia.org/wiki/Elite_%28video_game%29) used the logarithmic semiring to speed up multiplications and division on the 8-bit processor of the BBC Micro. This processor had no multiplication and division instruction, but knew how to add and substract. I sadly haven’t been able to find a reference for this (and would be happy to have one !).

2°/ As explained at http://blog.smola.org/post/987977550/log-probabilities-semirings-and-floating-point-numbers , some computations involving probabilities and/or likelihood can quickly overflow and underflow. Replacing P with log(P), i.e working in the logarithmic semiring is a natural way to keep the numbers within a natural range and avoids the overflows.

Snowball

27 February 2014 at 08:38

A similar trick is used for computations over small finite fields like GF(2^8) or GF(2^16): make a logarithm/exponential look-up-table, then do multiplication like ab = exp(log(a) + log(b)) (taking a=0 or b=0 as a special case).

Aaron

27 February 2014 at 11:13

The log and tropical semirings are very common in speech recognition and natural language processing, again because of the very small probabilities involved (as mentioned by Sjoerd and Frédéric regarding other domains). See for example, the semiring options in http://openfst.org/. The alternative in language applications is usually a manually-scaled high dynamic range representation; in my experience, the choice between those two comes down to the throughput of logsumexp vs. that of the scaling operation.

NLP applications also occasionally incorporate more esoteric semirings (e.g. in http://aclweb.org/anthology//P/P11/P11-2001.pdf and http://aclweb.org/anthology-new/W/W11/W11-2920.pdf). But those lexicographic semirings are probably harder to apply outside of the domains they were designed for.

None

4 March 2014 at 11:44

Just want to say thanks for your blog entries (in both blogs). I genuinely appreciate the consistently interesting and frequent posts.

Barak A. Pearlmutter

10 March 2014 at 06:00

If you actually want to use this, you need to code “+” to avoid overflow, by taking out the max as a common factor. Let x>=y w.l.o.g.,

log(exp x + exp y)
= log((exp x)(1 + exp(y-x)))
= x + log(1 + exp(y-x))
= x + log1p(exp(y-x))

Pascal Giard

26 March 2014 at 08:28

It’s used quite a bit in many modern error correction decoding¹ mainly for two reasons:

1. hardware complexity:

1.1 log(e^x + e^y) = max(x,y) + ln(1 + exp(-|x-y|)) which is often time approximated directly with max(x,y), but more often a correction term is added using a lookup-table (LUT) with a small quantity of entries (see e.g. [1] where it is shown that for the application of Turbo decoding using the log-map algorithm, a 2 value LUT is enough).

1.2 additions are much simpler than multiplications

2. numerical stability

¹ Many as in: pretty much _every_ cellphone out there as it’s used in CDMA, WiMAX and LTE just to name one application.

[1] Gross, W.J.; Gulak, P.G., “Simplified MAP algorithm suitable for implementation of turbo decoders,” Electronics Letters , vol.34, no.16, pp.1577,1578, 6 Aug 1998

Edward Kmett

20 September 2016 at 10:31

Frédéric Grosshans:

David Braben, the creator of Elite, talks about the logarithm trick he used in his Post Mortem talk at GDC: http://www.gdcvault.com/play/1014628/ The relevant bit is in the “Maths” section.

Dan Piponi

20 November 2019 at 18:33

What’s (-∞)*∞ in the semiring?

Also, does (2*∞)*(-∞) equal 2*(∞*(-∞)) ?

Comments are closed.