Swish function and a Swiss mathematician

The previous post looked at the swish function and related activation functions for deep neural networks designed to address the “dying ReLU problem.”

Plot of swish(x) = \frac{x \exp(x)}{1 + \exp(x)}

Unlike many activation functions, the swish function f(x) is not monotone: it has a minimum near x0 = -1.2784. The exact location of the minimum is

x_0 = -W\left(\frac{1}{e} \right) - 1

where W is the Lambert W function, named after the Swiss mathematician Johann Heinrich Lambert [1].
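As a quick numerical check of this closed form, here is a minimal sketch assuming SciPy is available for the Lambert W function:

```python
import numpy as np
from scipy.special import lambertw

# Closed-form location of the minimum: x0 = -W(1/e) - 1
x0 = -lambertw(np.exp(-1.0)).real - 1.0
print(x0)  # roughly -1.2784645

# x0 should satisfy the critical-point equation 1 + x + exp(x) = 0
print(1.0 + x0 + np.exp(x0))  # roughly 0
```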

The minimum value of f is -0.2784. I thought maybe I had made a mistake, confusing x0 and f(x0). But if you look at more decimal places, the minimum value of f is

f(x_0) = -0.2784645427610738\ldots

and occurs at

x_0 = -1.2784645427610738\ldots
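These digits can be reproduced by minimizing swish numerically; a minimal sketch, assuming SciPy is installed:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def swish(x):
    return x * np.exp(x) / (1.0 + np.exp(x))

# Bracket the minimum in [-3, 0] and minimize
res = minimize_scalar(swish, bounds=(-3.0, 0.0), method="bounded")
print(res.x)    # location of the minimum, roughly -1.27846
print(res.fun)  # value at the minimum, roughly -0.27846
```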
That can’t be a coincidence.

It turns out you can prove that f(x0) − x0 = 1 without explicitly finding x0. Taking the derivative of f with the quotient rule gives

f'(x) = \frac{\exp(x)\,(1 + x + \exp(x))}{(1 + \exp(x))^2}

Since exp(x) is never zero, setting the numerator equal to zero shows that at the minimum,

1 + x_0 + \exp(x_0) = 0


\begin{align*} f(x_0) - x_0 &= \frac{x_0 \exp(x_0)}{1 + \exp(x_0)}  - x_0 \\ &= \frac{x_0 \exp(x_0)}{1 + \exp(x_0)} - \frac{x_0 (1+\exp(x_0))}{1 + \exp(x_0)} \\ &= \frac{-x_0}{1 + \exp(x_0)} \\ &= \frac{1 + \exp(x_0)}{1 + \exp(x_0)} \\ &= 1 \end{align*}

The fourth equality is where we use the equation satisfied at the minimum, which implies −x0 = 1 + exp(x0).
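The same algebra can be checked symbolically; a sketch assuming SymPy, substituting exp(x0) = −(1 + x0) from the equation satisfied at the minimum:

```python
import sympy as sp

x = sp.symbols('x')
f = x * sp.exp(x) / (1 + sp.exp(x))

# Impose the minimum condition 1 + x + exp(x) = 0, i.e. exp(x) = -(1 + x)
identity = (f - x).subs(sp.exp(x), -(1 + x))
print(sp.simplify(identity))  # prints 1
```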

[1] Lambert is sometimes considered Swiss and sometimes French. The plot of land he lived on belonged to Switzerland at the time, but now belongs to France. I wanted him to be Swiss so I could use “swish” and “Swiss” together in the title.

One thought on “Swish function and a Swiss mathematician”

  1. I see that the Republic of Mulhouse didn’t become part of France until well after Lambert died, so I’d say you would be justified in describing him as “Swiss”.
