Computing smooth max without overflow

Erik Erlandson sent me a note saying he found my post on computing the soft maximum helpful. (If you’re unfamiliar with the soft maximum, here’s a brief description of what it is and how you might use it.) Erik writes

I used your post on practical techniques for computing smooth max, which will probably be the difference between being able to actually use my result and not. I wrote up what I’m planning to do here.

His post extends the idea in my post to computing the gradient and Hessian of the soft max.