Curvature and automatic differentiation

Curvature is tedious to calculate by hand because it involves calculating first and second order derivatives. Of course other applications require derivatives too, but curvature is the example we’ll look at in this post.

Computing derivatives

It would be nice to write programs that only explicitly implement the original function and let software take care of finding the derivatives.

Numerical differentiation

Finite difference approximations for derivatives are nothing new. For example, Euler (1707–1783) used finite differences to numerically solve differential equations. But numerical differentiation can be inaccurate or unstable, especially for higher order derivatives.

Symbolic differentiation

Symbolic differentiation is another approach, having the computer manipulate expressions much as a person would do by hand. It works well for many problems, though it scales poorly for large problems. It also requires functions to be presented in traditional mathematical form, not in the form of source code.

Automatic differentiation

Automatic differentiation is a third way. Like numerical differentiation, it works with floating point numbers, not symbolic expressions. But unlike numerical differentiation, the result does not have approximation error.

As someone put it, automatic differentiation applies the chain rule to floating point numbers rather than to symbolic expressions.

Python implementation

I’ll use the Python library autograd to compute curvature and illustrate automatic differentiation. autograd is not the most powerful automatic differentiation library for Python, but it is the simplest I’ve seen.

We will compute the curvature of a logistic curve.

y = \frac{1}{1 + e^{-x}}

The curvature of the graph of a function is given by

\kappa(x) = \frac{|y''|}{(1 + y'^2)^{3/2}}

Here’s Python code using autograd to compute the curvature.

    import autograd.numpy as np
    from autograd import grad

    def f(x):
        return 1/(1 + np.exp(-x))

    f1 = grad(f)  # 1st derivative of f
    f2 = grad(f1) # 2nd derivative of f

    def curvature(x):
        return abs(f2(x))*(1 + f1(x)**2)**-1.5

Curvature plots

The graph is relatively flat in the middle and at the far ends. In between, the graph bends creating two regions of higher curvature.

    import matplotlib.pyplot as plt

    x = np.linspace(-5, 5, 300)
    plt.plot(x, f(x))
    plt.xlabel("$x$")
    plt.ylabel("$y$")
    plt.title("Logistic curve")
    plt.savefig("logistic_curve.svg")

Now let’s look at the curvature.

    y = [curvature(t) for t in x]
    plt.plot(x, y)
    plt.xlabel("$x$")
    plt.ylabel(r"$\kappa(x)$")
    plt.title("Curvature")
    plt.savefig("plot_logistic_curvature.svg")

curvature for logistic curve

As we should expect, the curvature is small at the ends and in the middle, with local maxima in between.

We can also look at the signed curvature, the same expression as curvature but without the absolute value.

k(x) = \frac{y''}{(1 + y'^2)^{3/2}}

We plot this with the following code.

    def signed_curvature(x):
        return f2(x)*(1 + f1(x)**2)**-1.5

    y = [signed_curvature(t) for t in x]
    plt.plot(x, y)
    plt.xlabel("$x$")
    plt.ylabel(r"$k(x)$")
    plt.title("Signed curvature")
    plt.savefig("graph_signed_curvature.svg")

The result looks more like a sine wave.

The positive values mean the curve is bending counterclockwise, and the negative values mean the curve is bending clockwise.

Related post: Squircles and curvature

7 thoughts on “Curvature and automatic differentiation

  1. This reminds me of some of the math done to control a nuclear fission reactor. In particular, when changing reactor power from very low (shutdown) levels to normal levels (for thermal power production), we want to do so in a very controlled manner, first to avoid a runaway nuclear event, but also (and more pragmatically) to manage thermal stress.

    In the log domain, an ideal upward reactor power transition looks very much like the Logistic Curve, where the initial curvature models the rate of “adding reactivity” (pulling rods), and the slope of the linear portion models the thermal stress. The final curvature models the “feathering” of the reactivity, a surprisingly complex process due to the presence of increasing levels of fission products (called fission “poisons”) that add negative reactivity.

    Since we’re feeding a chain reaction, exponential growth must be strictly constrained. We need to keep the rate of power increase low enough to ensure the reactor safety systems will have time to act before a reaction could escalate beyond their ability to control it without damage to the reactor or its subsystems. Yet we want the rate high enough to get the job done in a reasonable amount of time (primarily to minimize reactor self-poisoning at lower power levels).

    Neutron flux is the proxy measurement for low-level reactor power. This flux is measured in multiple places (for redundancy and to get a profile across the core) and in multiple ways (for both “thermal” and “fast” neutrons).

    Note: Fast neutrons don’t cause many fissions in U235: They must first be “thermalized” (slowed down) in a “moderator” (something containing lots of small nuclei, such as the Hydrogen in H2O). This is why water is used in most reactors, as it serves both as a moderator and as a coolant. Better yet, once a reactor is at normal operating levels it can become self-stabilized, as power increases will heat the water more, which reduces its density, which then reduces it’s ability to thermalize neutrons, which restricts the thermal neutron supply, which in turn reduces the rate of fission and thus the rate of power production. The problem, then, is to safely get the reactor from a “cold iron” state into this sweet zone of operational thermal stability.

    The first question we ask of our reactor power monitoring system is: Will it trigger a timely SCRAM (emergency shutdown) under all operating conditions? What this really means is limiting our operating transients to what the SCRAM triggering system can handle. The mechanical SCRAM itself, which in its first stage uses compressed air or linear actuators to fire the control rods into the core (a “rapid removal of reactivity”), can require about 100ms, depending on the reactor size and design. The power monitoring system must predict the need for a SCRAM far enough in advance so the SCRAM will complete well before any damage or injury can occur.

    But it gets worse: A small number of neutrons from fission are “born” with energies in the thermal range, and are called “prompt thermal neutrons” to distinguish them from the “delayed” thermal neutrons produced by the thermal moderation process undergone by fast neutrons. For a significantly large power transient, the rate of prompt neutron production alone can further accelerate the power increase to uncontrollable levels, a state known as “prompt criticality”.

    Under the worst conditions, a “prompt criticality event” can take a reactor from “cold iron” to a steam explosion in a fraction of a second. Such events are triggered by the too-rapid “addition of reactivity”, which is normally done by pulling the control rods, but can also be done by injecting a slug of cold (dense) water into a warm reactor.

    Nearly all nuclear reactor accidents have been caused by the “prompt criticality” runaway mechanism (https://en.wikipedia.org/wiki/Prompt_criticality). While modern reactors are designed specifically to prevent and limit such accidents, they still are a risk that must always be closely monitored.

    From a nuclear physics perspective, the potential for prompt criticality must be detected early enough so that the race between the exponential rise in reactor power and the SCRAM system will be won by the SCRAM system, well before any damage or injury can occur.

    This is especially important at low power levels, below the level where core heating appreciably affects the coolant temperature. For all but the very end of the reactor startup process, the reactor coolant pumps heat the coolant far more than the reactor itself. As power levels enter the “thermal power zone”, the reactor becomes self-stabilizing, since the water, the main neutron moderator, undergoes beneficial density changes that independently stabilize the reactor power level.

    The rate of neutron production, and thus the rate of neutron detection, is inherently random and chaotic, especially at very low power levels. A first intuition would be to add filtering to manage the chaos, but the time constant of the filter would delay the detection of excessive rates. However, too little filtering could confuse normal randomness with a prompt criticality event, something that can be tolerated (better to be safe than sorry), but should be avoided (to limit the wear and delays caused by activating the SCRAM system).

    If the low-level neutron detections themselves are random, then the first derivative (rate of power change) will be worse, and the second derivative (power level acceleration or runaway rate) will be next to useless.

    To make the second derivative more useful, the curvature of the reactor power is a key parameter to measure and monitor during reactor startup. This is one of several parameters that serve as inputs to the SCRAM trigger, as well as to other alarms and operator displays.

    One of my first contracts during my initial foray into being an independent contractor (in the early 1990’s) was to diagnose and remedy the cause of inappropriate alarms that were causing far too many “false positive” SCRAMs on a research nuclear reactor. I was provided with the source code and schematics for the neutron monitor, and the recordings of several inappropriate SCRAMs it had caused.

    Since I wasn’t provided access to a research reactor to reproduce these SCRAMs under carefully controlled test conditions, I had to reverse-engineer the cause by building a simulator to wrap the monitor’s firmware. The initial worry was that the simulator would not be able to mimic the events, meaning the cause was either outside the neutron monitor, or the simulation was too imprecise.

    Fortunately, I was able to re-create each of the recorded events, but only after learning a bit more about the mathematical details of low-level fission processes, way more about how to generate random numbers for specific distributions, and a ton more about doing fast and precise math on an 8-bit embedded real-time processor (sans FPU).

  2. Hi John,

    I think automatic differentiation is a great idea, but I just have a hard time believing it can deal with complicated functions. Looks at the beta function implementation in boost.math: https://github.com/boostorg/math/blob/develop/include/boost/math/special_functions/beta.hpp

    It has variable aliasing, #ifdef confs, tons of branches, based on both compile-time and run-time knowledge. How do you autodiff that? Do you autodiff after the preprocessor runs, or before? How much effort is it to maintain a C++ automatic differentiation program?

  3. As a matter of style, you are computing the radius of curvature of one particular function. It might be clearer to define an operator that computes the radius of curvature of any function, and then apply it to the function at hand. In Haskell:

    — Calculate the radius of curvature of a function at a point.
    {-# LANGUAGE RankNTypes #-}
    import Numeric.AD

    curvature :: Floating a => (forall b. Floating b => b -> b) -> a -> a
    curvature f x = abs y” / (1 + y’*y’)**1.5
    where (y’, y”) = diff’ (diff f) x

    Examples:

    *Main> curvature cos 0
    1.0

    *Main> sigmoid x = 1 / (1 + exp (- x))
    *Main> curvature sigmoid 4
    1.7019370977116022e-2

    *Main> circle x = sqrt (1 – x^2)
    *Main> map (curvature circle) [-1,-0.9..1::Float]
    [NaN,0.99999994,1.0,0.99999994,1.0,0.99999994,1.0,0.9999999,1.0,1.0,1.0,1.0,1.0000001,1.0,1.0,0.99999994,0.99999994,1.0000001,0.9999999,1.0,NaN]

    This is possible, albeit a bit more awkward, in Python

    def curvature(f):
    def g(x):
    return abs(grad(grad(f))(x)) / (1 + grad(f)(x)**2)**(3/2)
    return g

    and

    curvature(cos)(0.0)

  4. Since the curvature involves second derivatives, specifying the curvature gives you a 2nd order differential equation. It should be solvable over some range.

Comments are closed.