<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: How to compute the soft maximum</title>
	<atom:link href="http://www.johndcook.com/blog/2010/01/20/how-to-compute-the-soft-maximum/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.johndcook.com/blog/2010/01/20/how-to-compute-the-soft-maximum/</link>
	<description>The blog of John D. Cook</description>
	<lastBuildDate>Sat, 11 Feb 2012 01:10:06 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: New tech reports &#8212; The Endeavour</title>
		<link>http://www.johndcook.com/blog/2010/01/20/how-to-compute-the-soft-maximum/comment-page-1/#comment-105297</link>
		<dc:creator>New tech reports &#8212; The Endeavour</dc:creator>
		<pubDate>Mon, 26 Sep 2011 14:31:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4291#comment-105297</guid>
		<description>[...] had a request to turn my blog posts on the soft maximum into a tech report, so here it [...]</description>
		<content:encoded><![CDATA[<p>[...] had a request to turn my blog posts on the soft maximum into a tech report, so here it [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: johnb</title>
		<link>http://www.johndcook.com/blog/2010/01/20/how-to-compute-the-soft-maximum/comment-page-1/#comment-31305</link>
		<dc:creator>johnb</dc:creator>
		<pubDate>Thu, 21 Jan 2010 10:44:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4291#comment-31305</guid>
		<description>The active ingredient in the &quot;safe&quot; version of soft maximum is log(1+exp(x)), which could be called the &quot;soft half-wave rectifier&quot; function.  It is the softness in the soft maximum!

Be aware that there is already a different function called &quot;softmax&quot;.   
I named it (although others may also have done so) so I must take the blame for a neat but misleading name.
It should have been called softargmax: it provides a soft version of indicating which of several values is greatest in value, and it has several uses in so-called neural network algorithms.  It is important to subtract off the maximum value from all the inputs before applying exp, for the reasons you explain.
See this summary: http://www.faqs.org/faqs/ai-faq/neural-nets/part2/section-12.html
It is the multiple-input generalization of the logistic: (1/(1+exp(-x))).

So to avoid confusion I suggest sticking to your name &quot;soft maximum&quot; for the function you describe, and avoiding the name &quot;softmax&quot;.</description>
		<content:encoded><![CDATA[<p>The active ingredient in the &#8220;safe&#8221; version of soft maximum is log(1+exp(x)), which could be called the &#8220;soft half-wave rectifier&#8221; function.  It is the softness in the soft maximum!</p>
<p>Be aware that there is already a different function called &#8220;softmax&#8221;.<br />
I named it (although others may also have done so) so I must take the blame for a neat but misleading name.<br />
It should have been called softargmax: it provides a soft version of indicating which of several values is greatest in value, and it has several uses in so-called neural network algorithms.  It is important to subtract off the maximum value from all the inputs before applying exp, for the reasons you explain.<br />
See this summary: <a href="http://www.faqs.org/faqs/ai-faq/neural-nets/part2/section-12.html" rel="nofollow">http://www.faqs.org/faqs/ai-faq/neural-nets/part2/section-12.html</a><br />
It is the multiple-input generalization of the logistic: (1/(1+exp(-x))).</p>
<p>So to avoid confusion I suggest sticking to your name &#8220;soft maximum&#8221; for the function you describe, and avoiding the name &#8220;softmax&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John</title>
		<link>http://www.johndcook.com/blog/2010/01/20/how-to-compute-the-soft-maximum/comment-page-1/#comment-31292</link>
		<dc:creator>John</dc:creator>
		<pubDate>Thu, 21 Jan 2010 02:34:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4291#comment-31292</guid>
		<description>Nemo: &lt;a href=&quot;http://www.johndcook.com/blog/2010/01/13/soft-maximum/#comment-31271&quot; rel=&quot;nofollow&quot;&gt;Andrew Dalke&lt;/a&gt; raised the same objection in a comment to my first post on soft maximum. 

The choice of definition depends on what properties you want the soft maximum to have. If you want softmax(x, x) = x to hold, the factor of 2 you suggest works. But if you do that, then softmax(x, y) - max(x, y) no longer goes to zero as one of the arguments grows large.</description>
		<content:encoded><![CDATA[<p>Nemo: <a href="http://www.johndcook.com/blog/2010/01/13/soft-maximum/#comment-31271" rel="nofollow">Andrew Dalke</a> raised the same objection in a comment to my first post on soft maximum. </p>
<p>The choice of definition depends on what properties you want the soft maximum to have. If you want softmax(x, x) = x to hold, the factor of 2 you suggest works. But if you do that, then softmax(x, y) &#8211; max(x, y) no longer goes to zero as one of the arguments grows large.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nemo</title>
		<link>http://www.johndcook.com/blog/2010/01/20/how-to-compute-the-soft-maximum/comment-page-1/#comment-31290</link>
		<dc:creator>Nemo</dc:creator>
		<pubDate>Thu, 21 Jan 2010 02:16:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=4291#comment-31290</guid>
		<description>Not bad, but you can do a tiny bit better using the &lt;a href=&quot;http://www.opengroup.org/onlinepubs/9699919799/functions/log1p.html&quot; rel=&quot;nofollow&quot;&gt;log1p&lt;/a&gt; function.

log1p(exp(minimum-maximum)) + maximum

This avoids unnecessary precision loss when &quot;maximum&quot; is close to zero and minimum-maximum is less than -40 or so.

On another note, I am not sure I like how the &quot;soft max&quot; is always larger than the max.  In particular, when x=y, shouldn&#039;t any &quot;max&quot; function always return x?

log((exp(x) + exp(y))/2) is always between x and y, and reduces to x when x=y.  (This is reminiscent of the &lt;a href=&quot;http://mathworld.wolfram.com/PowerMean.html&quot; rel=&quot;nofollow&quot;&gt;&quot;n-th root of the average of the n-th powers&quot;&lt;/a&gt;, which approaches the max as n approaches infinity.)

In short, I might subtract log(2) from your formula.</description>
		<content:encoded><![CDATA[<p>Not bad, but you can do a tiny bit better using the <a href="http://www.opengroup.org/onlinepubs/9699919799/functions/log1p.html" rel="nofollow">log1p</a> function.</p>
<p>log1p(exp(minimum-maximum)) + maximum</p>
<p>This avoids unnecessary precision loss when &#8220;maximum&#8221; is close to zero and minimum-maximum is less than -40 or so.</p>
<p>On another note, I am not sure I like how the &#8220;soft max&#8221; is always larger than the max.  In particular, when x=y, shouldn&#8217;t any &#8220;max&#8221; function always return x?</p>
<p>log((exp(x) + exp(y))/2) is always between x and y, and reduces to x when x=y.  (This is reminiscent of the <a href="http://mathworld.wolfram.com/PowerMean.html" rel="nofollow">&#8220;n-th root of the average of the n-th powers&#8221;</a>, which approaches the max as n approaches infinity.)</p>
<p>In short, I might subtract log(2) from your formula.</p>
]]></content:encoded>
	</item>
</channel>
</rss>


