Curvature and automatic differentiation

Curvature is tedious to calculate by hand because it involves calculating first and second order derivatives. Of course other applications require derivatives too, but curvature is the example we’ll look at in this post.

Computing derivatives

It would be nice to write programs that only explicitly implement the original function and let software take care of finding the derivatives.

Numerical differentiation

Finite difference approximations for derivatives are nothing new. For example, Euler (1707–1783) used finite differences to numerically solve differential equations. But numerical differentiation can be inaccurate or unstable, especially for higher order derivatives.

Symbolic differentiation

Symbolic differentiation is another approach, having the computer manipulate expressions much as a person would do by hand. It works well for many problems, though it scales poorly for large problems. It also requires functions to be presented in traditional mathematical form, not in the form of source code.

Automatic differentiation

Automatic differentiation is a third way. Like numerical differentiation, it works with floating point numbers, not symbolic expressions. But unlike numerical differentiation, the result does not have approximation error.

As someone put it, automatic differentiation applies the chain rule to floating point numbers rather than to symbolic expressions.

Python implementation

I’ll use the Python library autograd to compute curvature and illustrate automatic differentiation. autograd is not the most powerful automatic differentiation library for Python, but it is the simplest I’ve seen.

We will compute the curvature of a logistic curve.

y = \frac{1}{1 + e^{-x}}

The curvature of the graph of a function is given by

\kappa(x) = \frac{|y''|}{(1 + y'^2)^{3/2}}

Here’s Python code using autograd to compute the curvature.

    import autograd.numpy as np
    from autograd import grad

    def f(x):
        return 1/(1 + np.exp(-x))

    f1 = grad(f)  # 1st derivative of f
    f2 = grad(f1) # 2nd derivative of f

    def curvature(x):
        return abs(f2(x))*(1 + f1(x)**2)**-1.5

Curvature plots

The graph is relatively flat in the middle and at the far ends. In between, the graph bends creating two regions of higher curvature.

    import matplotlib.pyplot as plt

    x = np.linspace(-5, 5, 300)
    plt.plot(x, f(x))
    plt.title("Logisitic curve")

Now let’s look at the curvature.

    y = [curvature(t) for t in x]
    plt.plot(x, y)

curvature for logistic curve

As we should expect, the curvature is small at the ends and in the middle, with local maxima in between.

We can also look at the signed curvature, the same expression as curvature but without the absolute value.

k(x) = \frac{y''}{(1 + y'^2)^{3/2}}

We plot this with the following code.

    def signed_curvature(x):
        return f2(x)*(1 + f1(x)**2)**-1.5

    y = [signed_curvature(t) for t in x]
    plt.plot(x, y)
    plt.title("Signed curvature")

The result looks more like a sine wave.

The positive values mean the curve is bending counterclockwise, and the negative values mean the curve is bending clockwise.

Related post: Squircles and curvature

Apple design, squircles, and curvature

A “squircle” is a sort of compromise between a square and circle, but one that differs from a square with rounded corners. It’s a shape you’ll see, for example, in some of Apple’s designs.


A straight line has zero curvature, and a circle with radius r has curvature 1/r. So in a rounded square the curvature jumps from 0 to 1/r where the flat side meets the circular corner. But in the figure above there’s no abrupt change in curvature but instead a smooth transition. More on that in just a bit.

To get a smoother transition from the corners to the flat sides, designers use what mathematicians would call Lp balls, curves satisfying

|x|^p + |y|^p = 1

in rectangular coordinates or

r = \frac{1}{\left(|\cos \theta|^p + |\sin \theta|^p \right)^{1/p}}in polar coordinates.

When p = 2 we get a circle. When p = 2.5, we get square version of the superellipse. As p increases the corners get sharper, pushing out toward the corner of a square. Some sources define the squircle to be the case p = 4 but others say p = 5. The image at the top of the post uses p = 4. [*]

To show how the curvature changes, let’s plot the curvature on top of the squircle. The inner curve is the squircle itself, and radial distance to the outer curve is proportional to the curvature.

Here’s the plot for p = 4.

curvature p = 4

And here’s the plot for p = 5.

curvature p = 5

If we were to make the same plot for a rounded square, the curvature would be zero over the flat sides and jump to some constant value over the circular caps. We can see above that the curvature is largest over the corners but continuously approaches zero toward the middle of the sides.

Related: Swedish Superellipse

[*] Use whatever value of p you like. The larger p is, the closer the figure becomes to a square. The closer p is to 2, the closer the figure is to a circle.

You can even use smaller values of p. The case p = 1 gives you a diamond. If p is less than 1, the sides bend inward. The plot below shows the case p = 0.5.

Lp ball p = 0.5

As p gets smaller, the sides pull in more. As p goes to zero, the figure looks more and more like a plus sign.

Easter eggs and yellow pigs

An Easter egg is a hidden feature, a kind of joke. The term was first used in video games but the idea is broader and older than that. For example, Alfred Hitchcock made a brief appearance in all his movies. And I recently heard that there’s a pineapple or reference to a pineapple in every episode of the television show Psych.

Michael Spivak put references to “yellow pig” in some of his books. I’ve heard that he put allusions to yellow pigs in all his books, but I don’t have all his books, and I haven’t been able to find yellow pigs in two of his books that I do own.

Spivak’s calculus text is dedicated to the memory of Y. P.

Dedicated to the Memory of Y. P.

If you look up yellow pig in the index of the book, it will take you to a page that makes a passing reference to “whole hog.”

Spivak’s publishing company, Publish or Perish Press, originally used a pig as part of its logo.

Publish or Perish old logo

The web site now has no logo. His most recent book, Physics for Mathematicians: Mechanics I, uses a different logo.

The cover of Spivak’s Differential Geometry, Volume 1, second edition, has two yellow drawings of a pig.

Cover of Spivak's Differential Geometry, Volume 1, second edition

If you look up yellow pig in the index, it takes you to a page that doesn’t mention pigs, but does include a drawing that looks something like a ham.

Ham-like illustration from Spivak's Differential Geometry, volume 1

I do not see a reference to yellow pig in Spivak’s first book, Calculus on Manifolds. It was published by Benjamin Cummings. Maybe they would not allow Easter eggs, or maybe the idea of including Easter eggs didn’t occur to Spivak until he had his own publishing company. I also do not see a reference to yellow pigs in his recent Physics for Mathematicians book.

Summary of books mentioned above:

One of my favorite proofs: Lagrange multipliers

One of my lightbulb moments in college was when my professor, Jim Vick, explained the Lagrange multiplier theorem. The way I’d seen it stated in a calculus text gave me no feel for why it should be true, but his explanation made sense immediately.

Suppose f(x) is a function of several variables, i.e. x is a vector, and g(x) = c is a constraint. Then the Lagrange multiplier theorem says that at the maximum of f subject to the constraint g we have ∇f = λ ∇g.

Where does this mysterious λ come from? And why should the gradient of your objective function be related to the gradient of a constraint? These seem like two different things that shouldn’t even be comparable.

Here’s the geometric explanation. The set of points satisfying g(x) = c is a surface. And for any k, the set of points satisfying f(x) = k is also surface. Imagine k very large, larger than the maximum of f on the surface defined by g(x) = c. You could think of the surface g(x) = c being a balloon inside the larger balloon  f(x) = k.

Now gradually decrease k, like letting the air out of the outer balloon, until the surfaces g(x) = c and f(x) = k first touch. At that point, the two surfaces will be tangent, and so their normal vectors, given by their gradients, point in the same direction. That is, ∇f and ∇g are parallel, and so ∇f is some multiple of ∇g. Call that multiple λ.

I don’t know how well that explanation works when written down. But when I heard Jim Vick explain it, moving his hands in the air, it was an eye-opener.

This is not a rigorous proof, and it does not give the most general result possible, but it explains what’s going on. It’s something to keep in mind when reading proofs that are more rigorous or more general. As I comment on here,

Proofs serve two main purposes: to establish that a proposition is true, and to show why it is true.

The literally hand-wavy proof scores low on the former criterion and high on the latter.


Jim Vick was a great teacher. Some of us affectionately called him The Grinning Demon because he was always smiling, even while he gave devilishly hard homework. He was Dean of Natural Sciences when I took a couple classes from him. He later became Vice President for Student Affairs and kept teaching periodically. He has since retired but still teaches.

After taking his topology course, several of us asked him to teach a differential geometry course. He hesitated because it’s a challenge to put together an undergraduate differential geometry course. The usual development of differential geometry uses so much machinery that it’s typically a graduate-level course.

Vick found a book that starts by looking only at manifolds given by level sets of smooth functions, like the surfaces discussed above. Because these surfaces sit inside a Euclidean space, you can quickly get to some interesting geometric results using elementary methods. We eventually got to the more advanced methods, but by then we had experience in a more tangible setting. As Michael Atiyah said, abstraction should follow experience, not precede it.

Covariant and contravariant

The terms covariant and contravariant come up in many contexts. An earlier post discussed how the terms are used in programming and category theory. The meaning in programming is an instance of the general use in category theory.

Vector fields can be covariant or contravariant too. This is also an instance of the categorical usage, except the terminology is backward.

Michael Spivak explains:

Nowadays such situations are always distinguished by calling the things which go in the same direction “covariant” and the things which go in the opposite direction “contravariant.” Classical terminology used these same words, and it just happens to have reversed this: a vector field is called a contravariant vector field, while a section of T*M is called a covariant vector field. And no one had the gall or authority to reverse terminology sanctified by years of usage. So it’s very easy to remember which kind of vector field is covariant, and which is contravariant — it’s just the opposite of what it logically ought to be.

Emphasis added.

In defense of classical nomenclature, it was established decades before category theory. And as Spivak explains immediately following the quotation above, the original terminology made sense in its original context.

From Spivak’s Differential Geometry, volume 1. I own the 2nd edition and quoted from it. But it’s out of print so I linked to the 3rd edition. I doubt the quote changed between editions, but I don’t know.

Related: Applied category theory