Here’s a strange way to do arithmetic on the real numbers.
First, we’ll need to include +∞ and −∞ with the reals.
We define the new addition of two elements x and y to be −log (exp(−x) + exp(−y) ).
We define the new multiplication to be ordinary addition. (!)
In this new arithmetic +∞ is the additive identity and 0 is the multiplicative identity.
This new algebraic structure is called the log semiring. It’s called a semiring because it satisfies all the properties of a ring except that elements don’t necessarily have additive inverses. We’ll get into the details of the definition below.
* * *
Let’s put a subscript S on everything associated with our semiring in order to distinguish them from their more familiar counterparts. Then we can summarize the definitions above as
- a +S b = −log (exp(−a) + exp(−b) )
- a *S b = a + b
- 0S = +∞
- 1S = 0
Note that if we define
f(a, b) = a +S b
then
f(a, b) = −g(−a, −b)
where g(a, b) is the soft maximum of a and b.
* * *
Finally, we list the axioms of a semiring. Note that these equations all hold when we interpret +, *, 0, and 1 in the context of S, i.e. we imagine that each has a subscript S and is as defined above.
- (a + b) + c = a + (b + c)
- 0 + a = a + 0 = a
- a + b = b + a
- (a * b) * c = a * (b * c)
- 1 * a = a * 1 = a
- a * (b + c) = (a * b) + (a * c)
- (a + b) * c = (a * c) + (b * c)
- 0 * a = a * 0 = 0
Each of these follows immediately from writing out the definitions.
But as a certain GSV would say, what are the civilian applications?
I wouldn’t be surprised if it has applications to convex analysis and optimization.
This haskell package provides this semiring: http://hackage.haskell.org/package/logfloat-0.12.1/docs/Data-Number-LogFloat.html
It explains why it exists: it helps with dealing with very big or very small numbers, which would not fit in double-precision floating point.
About “civilian” applications of the logarithmic semiring :
1°/ I thhink that the 1984 videogame Elite (https://en.wikipedia.org/wiki/Elite_%28video_game%29) used the logarithmic semiring to speed up multiplications and division on the 8-bit processor of the BBC Micro. This processor had no multiplication and division instruction, but knew how to add and substract. I sadly haven’t been able to find a reference for this (and would be happy to have one !).
2°/ As explained at http://blog.smola.org/post/987977550/log-probabilities-semirings-and-floating-point-numbers , some computations involving probabilities and/or likelihood can quickly overflow and underflow. Replacing P with log(P), i.e working in the logarithmic semiring is a natural way to keep the numbers within a natural range and avoids the overflows.
A similar trick is used for computations over small finite fields like GF(2^8) or GF(2^16): make a logarithm/exponential look-up-table, then do multiplication like ab = exp(log(a) + log(b)) (taking a=0 or b=0 as a special case).
The log and tropical semirings are very common in speech recognition and natural language processing, again because of the very small probabilities involved (as mentioned by Sjoerd and Frédéric regarding other domains). See for example, the semiring options in http://openfst.org/. The alternative in language applications is usually a manually-scaled high dynamic range representation; in my experience, the choice between those two comes down to the throughput of logsumexp vs. that of the scaling operation.
NLP applications also occasionally incorporate more esoteric semirings (e.g. in http://aclweb.org/anthology//P/P11/P11-2001.pdf and http://aclweb.org/anthology-new/W/W11/W11-2920.pdf). But those lexicographic semirings are probably harder to apply outside of the domains they were designed for.
Just want to say thanks for your blog entries (in both blogs). I genuinely appreciate the consistently interesting and frequent posts.
If you actually want to use this, you need to code “+” to avoid overflow, by taking out the max as a common factor. Let x>=y w.l.o.g.,
log(exp x + exp y)
= log((exp x)(1 + exp(y-x)))
= x + log(1 + exp(y-x))
= x + log1p(exp(y-x))
It’s used quite a bit in many modern error correction decoding¹ mainly for two reasons:
1. hardware complexity:
1.1 log(e^x + e^y) = max(x,y) + ln(1 + exp(-|x-y|)) which is often time approximated directly with max(x,y), but more often a correction term is added using a lookup-table (LUT) with a small quantity of entries (see e.g. [1] where it is shown that for the application of Turbo decoding using the log-map algorithm, a 2 value LUT is enough).
1.2 additions are much simpler than multiplications
2. numerical stability
¹ Many as in: pretty much _every_ cellphone out there as it’s used in CDMA, WiMAX and LTE just to name one application.
[1] Gross, W.J.; Gulak, P.G., “Simplified MAP algorithm suitable for implementation of turbo decoders,” Electronics Letters , vol.34, no.16, pp.1577,1578, 6 Aug 1998
Frédéric Grosshans:
David Braben, the creator of Elite, talks about the logarithm trick he used in his Post Mortem talk at GDC: http://www.gdcvault.com/play/1014628/ The relevant bit is in the “Maths” section.
What’s (-∞)*∞ in the semiring?
Also, does (2*∞)*(-∞) equal 2*(∞*(-∞)) ?