<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Soft maximum</title>
	<atom:link href="http://www.johndcook.com/blog/2010/01/13/soft-maximum/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/</link>
	<description>The blog of John D. Cook</description>
	<lastBuildDate>Sat, 31 Jul 2010 07:45:27 -0400</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: think again! &#187; Blog Archive &#187; Soft maximum</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-32933</link>
		<dc:creator>think again! &#187; Blog Archive &#187; Soft maximum</dc:creator>
		<pubDate>Sun, 14 Feb 2010 17:03:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-32933</guid>
		<description>[...] Problem source: The blog of John D Cook. [...]</description>
		<content:encoded><![CDATA[<p>[...] Problem source: The blog of John D Cook. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: links for 2010-01-21 &#171; Blarney Fellow</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-31335</link>
		<dc:creator>links for 2010-01-21 &#171; Blarney Fellow</dc:creator>
		<pubDate>Fri, 22 Jan 2010 01:32:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-31335</guid>
		<description>[...] Soft maximum — The Endeavour ~1min (tags: math optimization machine-learning) [...]</description>
		<content:encoded><![CDATA[<p>[...] Soft maximum — The Endeavour ~1min (tags: math optimization machine-learning) [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tomasz Wegrzanowski</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-31310</link>
		<dc:creator>Tomasz Wegrzanowski</dc:creator>
		<pubDate>Thu, 21 Jan 2010 13:46:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-31310</guid>
		<description>(someone asked about it on reddit, I may as well answer it here)

If you need a guarantee that softmax(x1,...,xn) &lt;= xmax, where xmax = max(x1,...,xn), use this function instead: softmax(x1,...,xn) = log(exp(x1) + ... + exp(xn)) - log(n)

Proof: log(exp(x1) + ... + exp(xn)) - log(n) &lt;= log(n * exp(xmax)) - log(n) = log(n) + log(exp(xmax)) - log(n) = log(exp(xmax)) = xmax

The derivatives are identical to those of the function from the OP. The correction term changes depending on your choice of logarithm and exponentiation base.

And yes, the function will easily overflow when naively implemented in floating point. Computing it as xmax + softmax(x1-xmax,...,xn-xmax) trivially solves this problem, as all arguments of exponentiation will be nonpositive, and the argument of the first logarithm (the sum of exponentials) will be between 1 and n.</description>
		<content:encoded><![CDATA[<p>(someone asked about it on reddit, I may as well answer it here)</p>
<p>If you need a guarantee that softmax(x1,&#8230;,xn) &lt;= xmax, where xmax = max(x1,&#8230;,xn), use this function instead: softmax(x1,&#8230;,xn) = log(exp(x1) + &#8230; + exp(xn)) &#8211; log(n)</p>
<p>Proof: log(exp(x1) + &#8230; + exp(xn)) &#8211; log(n) &lt;= log(n * exp(xmax)) &#8211; log(n) = log(n) + log(exp(xmax)) &#8211; log(n) = log(exp(xmax)) = xmax</p>
<p>The derivatives are identical to those of the function from the OP. The correction term changes depending on your choice of logarithm and exponentiation base.</p>
<p>And yes, the function will easily overflow when naively implemented in floating point. Computing it as xmax + softmax(x1-xmax,&#8230;,xn-xmax) trivially solves this problem, as all arguments of exponentiation will be nonpositive, and the argument of the first logarithm (the sum of exponentials) will be between 1 and n.</p>
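<p>A minimal Python sketch of the shifted computation described in this comment may help. The function name <code>soft_max_normalized</code> is an illustrative assumption, not something from the post or the comment, and natural logarithms are assumed.</p>

```python
import math

def soft_max_normalized(xs):
    """Hypothetical helper: log(exp(x1) + ... + exp(xn)) - log(n), shifted by the hard max."""
    xmax = max(xs)
    # Every exponent below is <= 0, so exp() cannot overflow,
    # and the sum of the shifted exponentials lies between 1 and n.
    total = sum(math.exp(x - xmax) for x in xs)
    return xmax + math.log(total) - math.log(len(xs))

# Inputs this large would overflow a direct exp(x) in double precision.
print(soft_max_normalized([1000.0, 1001.0]))
```

<p>With this normalization the result should never exceed the hard maximum (up to rounding), and it equals x when all arguments equal x.</p>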
]]></content:encoded>
	</item>
	<item>
		<title>By: Stephan Schmidt</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-31296</link>
		<dc:creator>Stephan Schmidt</dc:creator>
		<pubDate>Thu, 21 Jan 2010 06:01:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-31296</guid>
		<description>@Tim: You can also use Wolfram Alpha

http://www.wolframalpha.com/input/?i=g(x,+y)+%3D+log(+exp(x)+%2B+exp(y)+)</description>
		<content:encoded><![CDATA[<p>@Tim: You can also use Wolfram Alpha</p>
<p><a href="http://www.wolframalpha.com/input/?i=g(x,+y)+%3D+log(+exp(x)+%2B+exp(y)+)" rel="nofollow">http://www.wolframalpha.com/input/?i=g(x,+y)+%3D+log(+exp(x)+%2B+exp(y)+)</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-31283</link>
		<dc:creator>John</dc:creator>
		<pubDate>Wed, 20 Jan 2010 23:33:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-31283</guid>
		<description>Andrew: You&#039;re right that you&#039;d need to be careful implementing g(x, y). The intermediate computations could easily overflow even though the final value would be a moderate-sized number. A robust implementation of g(x, y) would not just turn the definition into source code. See &lt;a href=&quot;http://www.johndcook.com/blog/2010/01/20/how-to-compute-the-soft-maximum/&quot; rel=&quot;nofollow&quot;&gt;this post&lt;/a&gt; for one way to compute soft maximum while avoiding overflow.</description>
		<content:encoded><![CDATA[<p>Andrew: You&#8217;re right that you&#8217;d need to be careful implementing g(x, y). The intermediate computations could easily overflow even though the final value would be a moderate-sized number. A robust implementation of g(x, y) would not just turn the definition into source code. See <a href="http://www.johndcook.com/blog/2010/01/20/how-to-compute-the-soft-maximum/" rel="nofollow">this post</a> for one way to compute soft maximum while avoiding overflow.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Dalke</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-31282</link>
		<dc:creator>Andrew Dalke</dc:creator>
		<pubDate>Wed, 20 Jan 2010 23:19:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-31282</guid>
		<description>Yes. That&#039;s the same approach as we did in high school to have &quot;the equation for a square: x**N + y**N = R**N where N is very large.&quot; But then there are numerical problems: for example, with k=8, log(exp(98*k) + exp(101.1*k))/k overflows an IEEE double. In any case, this is a trick I&#039;m going to keep in mind for the future, so thanks for writing about it!</description>
		<content:encoded><![CDATA[<p>Yes. That&#8217;s the same approach as we did in high school to have &#8220;the equation for a square: x**N + y**N = R**N where N is very large.&#8221; But then there are numerical problems: for example, with k=8, log(exp(98*k) + exp(101.1*k))/k overflows an IEEE double. In any case, this is a trick I&#8217;m going to keep in mind for the future, so thanks for writing about it!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-31281</link>
		<dc:creator>John</dc:creator>
		<pubDate>Wed, 20 Jan 2010 22:48:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-31281</guid>
		<description>Andrew: You can get a better approximation by using a different kind of normalization. See the extra parameter k in g(x, y; k) defined near the end of the post.</description>
		<content:encoded><![CDATA[<p>Andrew: You can get a better approximation by using a different kind of normalization. See the extra parameter k in g(x, y; k) defined near the end of the post.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: The &#8220;Soft Maximum&#8221; function</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-31280</link>
		<dc:creator>The &#8220;Soft Maximum&#8221; function</dc:creator>
		<pubDate>Wed, 20 Jan 2010 22:44:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-31280</guid>
		<description>[...] full post on Hacker News [...]</description>
		<content:encoded><![CDATA[<p>[...] full post on Hacker News [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Dalke</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-31279</link>
		<dc:creator>Andrew Dalke</dc:creator>
		<pubDate>Wed, 20 Jan 2010 22:38:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-31279</guid>
		<description>Ahh, of course. But if you keep y small, then my g(x,y) ~ log(exp(x)/2) = x - log(2) ~ x. In either case it&#039;s a constant difference, and it&#039;s more a question of where you want the error to go. I&#039;m thinking more, though, about the region where things are sharp. In that case x and y are about the same.</description>
		<content:encoded><![CDATA[<p>Ahh, of course. But if you keep y small, then my g(x,y) ~ log(exp(x)/2) = x &#8211; log(2) ~ x. In either case it&#8217;s a constant difference, and it&#8217;s more a question of where you want the error to go. I&#8217;m thinking more, though, about the region where things are sharp. In that case x and y are about the same.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-31275</link>
		<dc:creator>John</dc:creator>
		<pubDate>Wed, 20 Jan 2010 22:14:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-31275</guid>
		<description>Troy: one reason to replace the hard maximum with the soft maximum would be to use optimization methods (e.g. Newton&#039;s method) that require objective functions to be twice-differentiable. I&#039;ve also heard that the soft maximum function comes up in electrical engineering problems with power calculations.

Tim: I made my plots using Mathematica. The colors were the defaults for contour plots.

Andrew: The normalization you propose would make g(x, x) = x, which sounds like a nice property to have. But the more important property is that g(x, y) approaches max(x, y) as either argument gets large. That property would no longer hold if you included the normalization constant.</description>
		<content:encoded><![CDATA[<p>Troy: one reason to replace the hard maximum with the soft maximum would be to use optimization methods (e.g. Newton&#8217;s method) that require objective functions to be twice-differentiable. I&#8217;ve also heard that the soft maximum function comes up in electrical engineering problems with power calculations.</p>
<p>Tim: I made my plots using Mathematica. The colors were the defaults for contour plots.</p>
<p>Andrew: The normalization you propose would make g(x, x) = x, which sounds like a nice property to have. But the more important property is that g(x, y) approaches max(x, y) as either argument gets large. That property would no longer hold if you included the normalization constant.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Troy Gilbert</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-31274</link>
		<dc:creator>Troy Gilbert</dc:creator>
		<pubDate>Wed, 20 Jan 2010 22:10:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-31274</guid>
		<description>Very interesting. I&#039;m curious to know some examples of where and why you&#039;d replace the hard maximum with the soft maximum?</description>
		<content:encoded><![CDATA[<p>Very interesting. I&#8217;m curious to know some examples of where and why you&#8217;d replace the hard maximum with the soft maximum?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Dalke</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-31271</link>
		<dc:creator>Andrew Dalke</dc:creator>
		<pubDate>Wed, 20 Jan 2010 21:53:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-31271</guid>
		<description>You&#039;ve left out a normalization, I think, since g(x,x) = log(exp(x)+exp(x)) = log(2*exp(x)) = log(2) + x, which is not f(x,x)= x. You likely want g(x,y) = log((exp(x)+exp(y))/2), which is the Generalized f-mean that Peter Turney mentions.</description>
		<content:encoded><![CDATA[<p>You&#8217;ve left out a normalization, I think, since g(x,x) = log(exp(x)+exp(x)) = log(2*exp(x)) = log(2) + x, which is not f(x,x)= x. You likely want g(x,y) = log((exp(x)+exp(y))/2), which is the Generalized f-mean that Peter Turney mentions.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-31270</link>
		<dc:creator>Tim</dc:creator>
		<pubDate>Wed, 20 Jan 2010 21:49:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-31270</guid>
		<description>What software did you use to create this graph?

http://www.johndcook.com/contoursoft.png

I really like the color palette</description>
		<content:encoded><![CDATA[<p>What software did you use to create this graph?</p>
<p><a href="http://www.johndcook.com/contoursoft.png" rel="nofollow">http://www.johndcook.com/contoursoft.png</a></p>
<p>I really like the color palette</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter Turney</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-30714</link>
		<dc:creator>Peter Turney</dc:creator>
		<pubDate>Wed, 13 Jan 2010 15:02:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-30714</guid>
		<description>The family of power means includes the arithmetic mean, the geometric mean, the harmonic mean, minimum, and maximum:

http://en.wikipedia.org/wiki/Power_means

By varying the exponent, you can make the &quot;maximum&quot; as soft or as hard as you want.

The soft maximum is a special case of the generalized f-mean:

http://en.wikipedia.org/wiki/Generalized_f-mean</description>
		<content:encoded><![CDATA[<p>The family of power means includes the arithmetic mean, the geometric mean, the harmonic mean, minimum, and maximum:</p>
<p><a href="http://en.wikipedia.org/wiki/Power_means" rel="nofollow">http://en.wikipedia.org/wiki/Power_means</a></p>
<p>By varying the exponent, you can make the &#8220;maximum&#8221; as soft or as hard as you want.</p>
<p>The soft maximum is a special case of the generalized f-mean:</p>
<p><a href="http://en.wikipedia.org/wiki/Generalized_f-mean" rel="nofollow">http://en.wikipedia.org/wiki/Generalized_f-mean</a></p>
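<p>To illustrate how the exponent controls the softness, here is a small Python sketch of the power mean. The function name <code>power_mean</code> is an illustrative assumption; the sketch assumes positive inputs and a nonzero exponent p.</p>

```python
def power_mean(xs, p):
    """Hypothetical helper: ((x1**p + ... + xn**p) / n) ** (1/p) for positive xs and p != 0."""
    return (sum(x ** p for x in xs) / len(xs)) ** (1.0 / p)

values = [1.0, 2.0, 3.0]
for p in (-1, 1, 2, 10, 50):
    # p = -1 is the harmonic mean and p = 1 the arithmetic mean;
    # larger p pushes the result toward max(values).
    print(p, power_mean(values, p))
```

<p>As p grows, the result approaches the hard maximum from below; as p goes to minus infinity, it approaches the minimum.</p>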
]]></content:encoded>
	</item>
	<item>
		<title>By: Divye</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-30710</link>
		<dc:creator>Divye</dc:creator>
		<pubDate>Wed, 13 Jan 2010 13:58:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-30710</guid>
		<description>Nice! 

Found via your tweet.</description>
		<content:encoded><![CDATA[<p>Nice! </p>
<p>Found via your tweet.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tomas Olsson</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-30708</link>
		<dc:creator>Tomas Olsson</dc:creator>
		<pubDate>Wed, 13 Jan 2010 13:47:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-30708</guid>
		<description>Great post!
/Tomas</description>
		<content:encoded><![CDATA[<p>Great post!<br />
/Tomas</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-30707</link>
		<dc:creator>John</dc:creator>
		<pubDate>Wed, 13 Jan 2010 13:29:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-30707</guid>
		<description>Thanks, Douglas. I fixed the typo.</description>
		<content:encoded><![CDATA[<p>Thanks, Douglas. I fixed the typo.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Douglas</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-30705</link>
		<dc:creator>Douglas</dc:creator>
		<pubDate>Wed, 13 Jan 2010 13:26:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-30705</guid>
		<description>I think the line:
g(x, y) = log( exp(x), exp(y) ).
should be
g(x, y) = log( exp(x) + exp(y) ).
?</description>
		<content:encoded><![CDATA[<p>I think the line:<br />
g(x, y) = log( exp(x), exp(y) ).<br />
should be<br />
g(x, y) = log( exp(x) + exp(y) ).<br />
?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Slavomir Kaslev</title>
		<link>http://www.johndcook.com/blog/2010/01/13/soft-maximum/comment-page-1/#comment-30704</link>
		<dc:creator>Slavomir Kaslev</dc:creator>
		<pubDate>Wed, 13 Jan 2010 13:18:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4210#comment-30704</guid>
		<description>Nice find. =-)

This is actually the core of the Exponential Shadow Maps technique: www-flare.cs.ucl.ac.uk/staff/J.Kautz/publications/esm_gi08.pdf</description>
		<content:encoded><![CDATA[<p>Nice find. =-)</p>
<p>This is actually the core of the Exponential Shadow Maps technique: www-flare.cs.ucl.ac.uk/staff/J.Kautz/publications/esm_gi08.pdf</p>
]]></content:encoded>
	</item>
</channel>
</rss>

