Activation functions and Iverson brackets

Neural network activation functions transform the output of one layer of a neural net into the input of the next layer. These functions are nonlinear because the universal approximation theorem, the theorem that says, roughly, that a two-layer neural net can approximate any continuous function, requires the activation functions to be nonlinear.

Heaviside function plot

Activation functions often have two-part definitions, one for negative inputs and another for positive inputs, and so they’re ideal for Iverson notation. For example, the Heaviside function plotted above is defined to be

f(x) = \left\{ \begin{array}{ll} 1 & \mbox{if } x > 0 \\ 0 & \mbox{if } x \leq 0 \end{array} \right.

Kenneth Iverson’s bracket notation, first developed for the APL programming language but adopted more widely, uses brackets around a Boolean expression to indicate the function that is 1 when the expression is true and 0 otherwise. With this notation, the Heaviside function can be written simply as

f(x) = [x > 0]

Iverson notation is fairly common, but not quite so common that I feel like I can use it without explanation. I find it very handy and would like to popularize it. The rest of the post gives more examples.
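
For example, here is a minimal Python sketch (assuming NumPy, with the function name chosen just for illustration) of the Heaviside function written Iverson-style. A comparison like x > 0 produces a Boolean, and Booleans act as 0 and 1 in arithmetic, so the bracket translates into code almost literally.

    import numpy as np

    def heaviside(x):
        """Heaviside step function, f(x) = [x > 0]."""
        # (x > 0) is a Boolean (or Boolean array), which arithmetic
        # treats as 0 or 1 -- a built-in Iverson bracket.
        return (x > 0) * 1.0

    print(heaviside(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 1.]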

ReLU

The popular ReLU (rectified linear unit) function is defined as

f(x) = \left\{ \begin{array}{ll} x & \mbox{if } x > 0 \\ 0 & \mbox{if } x \leq 0 \end{array} \right.

and with Iverson bracket notation as

f(x) = x[x > 0]

The ReLU activation function is the identity function multiplied by the Heaviside function. It’s not the best example of the value of bracket notation since it could be written simply as max(0, x). The next example is better.
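
In code the bracket form is still a one-liner. Continuing the NumPy sketch above (again, just an illustration, not a recommended implementation):

    def relu(x):
        """ReLU, f(x) = x [x > 0]: the identity times the Heaviside function."""
        return x * (x > 0)

Equivalently, np.maximum(0, x).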

ELU

ELU function plot

The ELU (exponential linear unit) is a variation on the ReLU that, unlike the ReLU, is differentiable at 0.

f(x) = \left\{ \begin{array}{ll} x & \mbox{if } x > 0 \\ e^x - 1 & \mbox{if } x \leq 0 \end{array} \right.

The ELU can be described succinctly in bracket notation.

f(x) = (e^x - 1)[x \leq 0] + x[x > 0]
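
Translated directly into the NumPy sketch, each bracket becomes a Boolean factor that switches a branch on or off. (This toy version evaluates both branches everywhere, so np.exp can overflow for very large x; a more careful implementation would use something like np.where.)

    import numpy as np

    def elu(x):
        """ELU, f(x) = (e^x - 1) [x <= 0] + x [x > 0]."""
        return (np.exp(x) - 1) * (x <= 0) + x * (x > 0)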

PReLU

PReLU (parametric rectified linear unit) graph

The PReLU (parametric rectified linear unit) depends on a small positive parameter a. This parameter must not equal 1, because then the function would be linear and the universal approximation theorem would not apply.

f(x) = \left\{ \begin{array}{ll} x & \mbox{if } x > 0 \\ ax & \mbox{if } x \leq 0 \end{array} \right.

In Iverson notation:

f(x) = ax[x \leq 0] + x[x > 0]
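
And in code, continuing the same sketch (the default a = 0.01 below is only illustrative; in practice a is a tunable or learned parameter):

    def prelu(x, a=0.01):
        """PReLU, f(x) = a x [x <= 0] + x [x > 0]."""
        return a * x * (x <= 0) + x * (x > 0)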


Redoing images in Midjourney

My son-in-law was playing around with Midjourney v5, and I asked him to try to redo some of the images I’ve made with DALL-E 2.

Back in August I wrote a post about using DALL-E 2 to generate mnemonic images for memorizing the US presidents using the Major mnemonic system.

To memorize that Rutherford B. Hayes was the 19th president, you might visualize Hayes playing a tuba, because you can encode 19 as tuba. The image I created last year was cartoonish and showed Hayes playing something more like a sousaphone than a tuba. Midjourney created a photorealistic image of Hayes playing some weird instrument that is something like a tuba.

Rutherford B. Hayes playing something like a tuba

Franklin Delano Roosevelt was the 32nd president. If we use an image of the moon as the peg for 32, we could imagine FDR looking up at the moon. The previous image of FDR was really creepy and looked nothing like him. The image Midjourney created of FDR was really good.

FDR looking up at the moon

Midjourney choked on the request to create an image of Grover Cleveland holding an onion and a wiener dog, just as DALL-E had. It didn’t do any better substituting Grover the Muppet for Grover Cleveland.

A few weeks ago I wrote a blog post about a thought experiment involving an alien astronomer with 12 fingers working in base 12. I could only get DALL-E to draw a hint of an extra finger. We weren’t able to get Midjourney to put 12 fingers on a hand either, but we did get an interesting image of an alien astronomer.

Alien astronomer

The valley of medium reliability

Last evening my electricity went out, and this morning it was restored. This got me thinking about systems that fail occasionally [1]. Electricity goes out often enough that we prepare for it: we have candles and flashlights, my desktop computer is on a UPS, etc.

A residential power outage is usually just an inconvenience, especially if the power comes back on within a few hours. A power outage at a hospital could be disastrous, and so hospitals have redundant power systems. The problem is in between: when power is reliable enough that you don’t expect it to go out, but the consequences of an outage are serious [2].

If a system fails occasionally, you prepare for that. And if it never fails, that’s great. In between is the problem, a system just reliable enough to lull you into complacency.

Dangerously reliable systems

For example, GPS used to be unreliable. It made useful suggestions, but you wouldn’t blindly trust it. Then it got a little better and became dangerous as people trusted it when they shouldn’t. Now it’s much better. Not perfect, but less dangerous.

For another example, people who live in flood plains have flood insurance; their mortgage company requires it. And people who live on top of mountains don’t need flood insurance. The people at most risk are in the middle: they live in an area that could flood, but since it hasn’t flooded yet, they don’t buy flood insurance.

 

So safety is not an increasing function of reliability, not always. It might dip down before going up. There’s a valley between unreliable and highly reliable where people are tempted to take unwise risks.

Artificial intelligence risks

I expect we’ll see a lot of this with artificial intelligence. Clumsy AI is not dangerous; pretty good AI is dangerous. Moderately reliable systems in general are dangerous, but this especially applies to AI.

As in the examples above, the better AI becomes, the more we rely on it. But there’s something else going on. As AI failures become less frequent, they also become weirder.

Adversarial attacks

You’ll see stories of someone putting a tiny sticker on a stop sign and now a computer vision algorithm thinks the stop sign is a frog or an ice cream sundae. In this case, there was a deliberate attack: someone knew how to design a sticker to fool the algorithm. But strange failures can also happen unprompted.

Unforced errors

Amazon’s search feature, for example, is usually very good. Sometimes I’ll get every word in a book title wrong and yet it will figure out what I meant. But one time I was searching for the book Universal Principles of Design.

I thought I remembered a “25” in the title. The subtitle turns out to be “125 ways to enhance reliability …” I searched on “25 Universal Design Principles” and the top result was a massage machine that will supposedly increase the size of a woman’s breasts. I tried the same search again this morning. The top result is a book on design. The next five results are

  1. a clip-on rear view mirror
  2. a case of adult diapers
  3. a ratchet adapter socket
  4. a beverage cup warmer, and
  5. a folding bed.

The book I was after, and whose title I remembered pretty well, was nowhere in the results.

Because AI is literally artificial, it makes mistakes no human would make. If I went to a brick-and-mortar book store and told a clerk, “I’m looking for a book. I think the title is something like ’25 Universal Design Principles,’” the clerk would not say, “Would you like to increase your breast size? Or maybe buy a box of diapers?”

In this case, the results were harmless, even entertaining. But unexpected results in a mission-critical system would not be so entertaining. Our efforts to make systems fool-proof have been based on experience with human fools, not artificial ones.

[1] This post is an elaboration on what started as a Twitter thread.

[2] I’m told that in Norway electrical power is very reliable. But Norway is also very dependent on electricity, including for heating. Alternative sources of fuel such as propane are hard to find.