Lagrange multiplier setup: Now what?

Suppose you need to optimize, i.e. maximize or minimize, a function f(x). If this is a practical problem and not a textbook exercise, you probably need to optimize f(x) subject to some constraint on x, say g(x) = 0.

Hmm. Optimize one function subject to a constraint given by another function. Oh yeah, Lagrange multipliers! You just need to solve

\nabla f = \lambda \nabla g

Or maybe you have two constraints: g(x) = 0 and h(x) = 0. No big deal. Just introduce another Lagrange multiplier:

\nabla f = \lambda_1 \nabla g + \lambda_2 \nabla h

OK. Now what?

Suppose your x is an n-dimensional vector, and you have k constraints. Now you have a system of n + k equations in n + k unknowns: one equation for each component of the Lagrange multiplier equation and one for each constraint, with the n components of x and the k multipliers as unknowns.
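
To make this concrete, here's a small sympy sketch (my toy example, not from the post): maximize f(x, y) = xy subject to x² + y² = 1, so n = 2 and k = 1, giving three equations in three unknowns.

```python
import sympy as sp

# Toy example: maximize f(x, y) = x*y subject to g(x, y) = x^2 + y^2 - 1 = 0.
x, y, lam = sp.symbols("x y lambda", real=True)
f = x * y
g = x**2 + y**2 - 1

# n + k = 3 equations: the two components of grad f = lambda * grad g,
# plus the constraint g = 0.
eqs = [sp.diff(f, v) - lam * sp.diff(g, v) for v in (x, y)] + [g]
solutions = sp.solve(eqs, [x, y, lam], dict=True)

for s in solutions:
    print(s, "f =", f.subs(s))
```

The four critical points are (±1/√2, ±1/√2), where f takes the values ±1/2. Note that even this tiny system is nonlinear.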

If your system of equations is linear, hurrah! Maybe it won’t be hard to find where your function takes on its maximum and minimum values [1].

But in general you have a system of nonlinear equations, potentially a lot of nonlinear equations in a lot of unknowns. Maybe your new problem is harder than your original problem. That is, it might be easier to optimize your original function, subject to its constraints, using a different method than to solve the nonlinear system of equations that applying Lagrange multipliers gives you.

How do you solve a system of nonlinear equations? That’s a hard question because “nonlinear” is not a hypothesis; it’s the negation of a hypothesis. Suppose I tell you I’m thinking of an animal, and it’s not an elephant. That’s not much to go on. Non-linear is sorta like non-elephantine: you’ve said what a thing is not, but you haven’t said what it is.

One way for functions to be nonlinear is to be convex; convexity is often a tractable generalization of linearity. If your original optimization problem is convex, i.e. both the objective function and the constraints are convex functions, then directly solving your optimization problem may be easier than introducing Lagrange multipliers.

Another class of nonlinear functions is polynomials. If Lagrange multipliers give you a system of several polynomial equations in several unknowns, you may be able to solve your system of equations using techniques from algebraic geometry.

If your equations do not involve polynomial functions, but do involve functions that can be adequately approximated by polynomial functions, maybe the algebraic geometry route will work. But “adequate” here means that the solution to the approximate problem, the polynomial problem, gives a sufficiently accurate solution to the original problem. That may be hard to assess.

Your initial exposure to Lagrange multipliers in a calculus class may have left the wrong impression. Calculus textbooks are filled with carefully contrived exercises that can be solved in a moderate amount of time, and for good reason. The problems you solve for exercises are not intrinsically important; you’re solving them to learn a technique. But in a real application, where the problem is intrinsically important (and the solution technique is not) things might be a lot harder.


[1] If the system of equations is linear, then your optimization problem must have a very special form: optimizing a quadratic objective function subject to linear constraints, i.e. a quadratic programming problem.

Avoid having to integrate by parts twice

Suppose f(x) and g(x) are functions that are each proportional to their second derivative. These include exponential, circular, and hyperbolic functions. Then the integral of f(x) g(x) can be computed in closed form with a moderate amount of work.

The first time you see how such integrals are computed, it’s an interesting trick. I wrote up an example here. It may seem like you’re going in circles—and if you’re not careful you will go in circles—but the integral pops out.

After you’ve done this kind of integration a few times, the novelty wears off. You know how the calculation is going to end, and it’s a bit tedious and error-prone to get there.

There’s a formula that can compute all these related integrals in one fell swoop [1]. Suppose


f''(x) = hf(x)


g''(x) = kg(x)

for constants h and k. All the functions mentioned at the top of the post are of this form. Then

\int f(x)g(x)\,dx = \frac{1}{h-k} \bigg(f'(x)g(x) - f(x)g'(x)\bigg) + C.

So, for example, let

f(x) = e^{20x}


g(x) = \sin(23x).

Then h = 400, k = −529, and

\int e^{20x} \sin(23x) \, dx = \frac{1}{929} \bigg(20 e^{20x} \sin (23x) - 23e^{20x}\cos(23x) \bigg) + C.
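
A quick way to double-check such a result is to differentiate the closed form. Here’s a sympy sketch (mine, not from the original article) verifying this first example:

```python
import sympy as sp

x = sp.symbols("x")
integrand = sp.exp(20*x) * sp.sin(23*x)

# Closed form from the formula, with h = 400, k = -529, so h - k = 929.
F = (20*sp.exp(20*x)*sp.sin(23*x) - 23*sp.exp(20*x)*sp.cos(23*x)) / 929

# The derivative of F should recover the integrand.
assert sp.simplify(sp.diff(F, x) - integrand) == 0
print("verified")
```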

Here’s another example.


f(x) = \cos(x)


g(x) = \sinh(30x).

Then h = -1, k = 900, and

\int \cos(x) \sinh(30x) \,dx = \frac{1}{901}\bigg(\sin(x) \sinh(30x) + 30 \cos(x)\cosh(30x)\bigg) + C

Implicit in the formula above is the requirement h ≠ k. If h does equal k then the integral can be done by more basic techniques.


[1] Donald K. Pease. A useful integral formula. American Mathematical Monthly. December 1959. p. 908.

Good autocomplete


I’m not sure whether automatic text completion on a mobile device is a net good. It sometimes saves a few taps, but it seems like it’s at least as likely to cause extra work.

Although I’m ambivalent about autocomplete on my phone, I like it in my text editor. The difference is that in my editor autocomplete is

  • configurable,
  • transparent,
  • intentional.

I value these characteristics, but I understand that not everyone does, especially on a mobile device.


Emacs abbrevs (abbreviations) are completely configurable. The only abbrevs the editor knows about are the ones you ask for, and you can see (and edit) your set of abbrevs in an ordinary text file. You can easily toggle in and out of abbrev-mode, turning all abbrevs on or off.

You can also determine whether an abbrev applies to all modes or just some modes. For example, maybe you’d like imp to automatically expand to

    import matplotlib.pyplot as plt

when writing a Python file, but not when composing an email in which you might wish to call someone an imp.

Abbrevs are not saved by default, but Emacs will ask you when you exit if you’d like to save abbrevs created during your session. This encourages experimentation. You can try out an abbrev before deciding whether to keep it, or define an abbrev for a single task with no intention of keeping it.

Emacs users often use some naming convention so that all their abbrevs begin with a key sequence they’re unlikely to type unless they want it to be expanded. I use “jc” as a prefix, both because these are my initials and because I’m unlikely to type these two letters for any other purpose. (Here’s a grid of other rare two-letter combinations.)

For example, I have an abbrev jciff that expands to the HTML sequence

&#x2194;&#xfe0e;

for the symbol ↔︎ to denote iff (if and only if). The first character is the symbol itself, and the second character is a variation selector asking browsers to render the symbol as text and not as an emoji.

Dynamic abbrevs

An advantage of autocomplete on a mobile device is that you don’t have to go to the effort of programming completions that you might want. Emacs has something similar in what it calls dynamic abbrevs. These are completions based on the text in the file you’re editing.

For example, suppose you’re writing about politician Hery Rajaonarimampianina. After having typed his name once, the second time you could type a few letters of his name and then type Alt-/ to have Emacs fill in the rest. If you haven’t typed enough letters to uniquely determine his name, you can type Alt-/ repeatedly to cycle through the possibilities. Note that you have to initiate the text completion. Emacs will not insert a single character unless explicitly asked to do so.

What to abbreviate

Of course abbreviations are a matter of personal choice. Some people use text expander programs with hundreds of abbreviations. I reserve editor abbreviations for long blocks of text, such as boilerplate HTML, or text like the example above that is difficult to remember or error-prone to type. I don’t find it worthwhile to have lots of minor shortcuts because I won’t remember that I’ve created them. (See the previous post for a discussion of remembering what you’ve automated.)

If you abbreviate something because you find it hard to remember, you trade the task of remembering the thing itself for the task of remembering the abbreviation. I would be cautious about accepting this trade-off. It’s often better to simply memorize the thing that’s hard to remember, because then you can use it in more contexts.

In the example above I abbreviate two Unicode character code points. I abbreviate these rather than memorizing them because I don’t need to use them very often, but often enough that I’ve had to look them up a few times. I’ve memorized several Unicode code points because I need them often enough to justify memorizing them.


Small-scale automation


Saving keystrokes is overrated, but maintaining concentration is underrated.

This post is going to look at automating small tasks in order to maintain concentration, not to save time.

If a script lets you easily carry out some ancillary task without taking your concentration off your main task, that’s a big win. Maybe the script only saves you five seconds, but it could save you from losing a train of thought.

If your goal in writing a script is to preserve concentration, that script has to be effortless to run. It’s worth taking a few minutes to search for a script that is going to save you an hour. But if the purpose of a script is to preserve your state of flow, having to search for it defeats the purpose.

Remembering what you’ve written

I’ve often said to myself “I’ve had to do this task several times. I should automate it!” Good idea, except I couldn’t quickly find the code later when I needed it.

I’ve heard many people say you should automate repetitive tasks; I’ve never heard anyone discuss the problem of remembering what you’ve automated and how to invoke it. Maybe there’s less discussion of the memory problem because the solution is personal. It depends, for instance, on what tools you use and what you find memorable.

One suggestion would be to periodically review what you’ve written, say once a month [1]. Maybe you’ve put useful aliases in your shell configuration file. Skimming that config file occasionally could help you remember what’s there. If you have a directory where you keep custom scripts, it could help to browse that directory once in a while. It helps if there aren’t too many places you need to look, which leads to the next section.

Tool priorities

It would also help to minimize the number of tools you use, or at least the number of tools you customize.

And even with a very minimal tool set, it helps to have a primary emphasis on one of those tools. For example, maybe your work environment consists mostly of a shell, a programming language, and an editor. When it’s not obvious which tool to pick, are you going to write a shell script, a program, or an editor extension? By picking one tool as your default, you get better at that tool, accumulate more sample code for that tool, and have fewer contexts to explore when you’re looking for something you’ve written.


[1] A long time ago I heard someone say he reads documentation every Friday afternoon. I did that for a while and recommend it. Maybe set aside a few minutes each Friday afternoon to review and tweak config files. If you don’t get through everything, pick up next week where you left off.

Remove algorithmic filters from what you read

I typically announce new blog posts from my most relevant twitter account: data science from @DataSciFact, algebra and miscellaneous math from @AlgebraFact, TeX and typography from @TeXtip, etc.

If you’d like to be sure that you’re notified of each post, regardless of what algorithms Twitter applies to your feed, you can subscribe to this blog via email or RSS.

RSS and email logos

If you subscribe via RSS you’ll see each post in your RSS stream as it is published. If you subscribe by email you’ll get one email each day around 11:00 AM Central Time.

You can also get an email once a month with highlights from the blog.

If you’d like to follow one or more of my Twitter accounts without going through Twitter, you can subscribe via RSS using BazQux.


I highly recommend RSS in general. It’s a simple way to keep up with what you choose to keep up with, unfiltered by any corporate algorithms. To get started with RSS, you need an RSS client. There are many to choose from. I use Inoreader.

Your RSS client will probably be able to find the RSS feed of a site from its URL. If not, you can usually find the RSS feed of a site by appending /rss or /feed to its URL. Or look for the orange icon above.

Number of bits in a particular integer

When I think of bit twiddling, I think of C. So I was surprised to read Paul Khuong saying he thinks of Common Lisp (“CL”).

As always when working with bits, I first doodled in SLIME/SBCL: CL’s bit manipulation functions are more expressive than C’s, and a REPL helps exploration.

I would not have thought of Common Lisp being more expressive for bit manipulation than C, though in hindsight perhaps I should have. Common Lisp is a huge language, and a lot of thought went into it. It’s a good bet that if CL supports something it supports it well.

One of the functions Khuong uses is integer-length. I looked it up in Guy Steele’s book. Here’s what he says about the function.

This function performs the computation

ceiling(log2(if integer < 0 then -integer else integer + 1))

… if integer is non-negative, then its value can be represented in unsigned binary form in a field whose width is no smaller than (integer-length integer). …

Steele also describes how the function works for negative arguments and why this is useful. I’ve cut these parts out because they’re not my focus here.

I was curious how you’d implement integer-length in C, and so I turned to Hacker’s Delight. This book doesn’t directly implement a counterpart to integer-length, but it does implement the function nlz (number of leading zeros), and in fact implements it many times. Hacker’s Delight points out that for a 32-bit unsigned integer x,

⌊log2(x)⌋ = 31 – nlz(x)


⌈log2(x)⌉ = 32 – nlz(x – 1).

So nlz(x) corresponds to 32 − (integer-length x).
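
In Python, a sketch of integer-length for non-negative arguments might look like this (my code, not from the book; Python’s built-in int.bit_length computes the same thing):

```python
def integer_length(n: int) -> int:
    """Bits needed to write non-negative n in unsigned binary.
    Mirrors Common Lisp's integer-length for n >= 0."""
    length = 0
    while n > 0:
        n >>= 1
        length += 1
    return length

def nlz(x: int) -> int:
    """Number of leading zeros of x viewed as a 32-bit unsigned integer."""
    return 32 - integer_length(x)

x = 2023
print(integer_length(x))   # 11, since 2^10 <= 2023 < 2^11
print(31 - nlz(x))         # floor(log2(2023)) = 10
print(32 - nlz(x - 1))     # ceil(log2(2023)) = 11
```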

Hacker’s Delight implements nlz at least 10 times. I say “at least” because it’s unclear whether a variation of sample code discussed in commentary remarks counts as a separate implementation.

Why so many implementations? Typically when you’re doing bit manipulation, you’re concerned about efficiency. Hacker’s Delight gives a variety of implementations, each of which may have advantages in different hardware. For example, one implementation is recommended in the case that your environment has a microcode implementation of popcount. The book also gives ways to compute nlz that involve casting an integer to a floating point number. The advisability of such a technique will be platform-dependent.

If you’re looking for C implementations of integer-length you can find a few on Sean Anderson’s Bit Twiddling Hacks page.


Lemniscate of Bernoulli

The lemniscate of Bernoulli came up in a post a few days ago. This shape is a special case of a Cassini oval:

((x + a)² + y²) ((x − a)² + y²) = a⁴.

Here’s another way to arrive at the lemniscate. Draw a hyperbola (blue in the figure below), then draw circles centered at points on the hyperbola and passing through the center of the hyperbola. The figure-eight that is traced out in the middle is a lemniscate.

Van Aubel’s theorem

Van Aubel’s theorem is analogous to Napoleon’s theorem, though not a direct generalization of it.

Napoleon’s theorem says to start with any triangle and draw equilateral triangles on each side. Connect the centers of the three new triangles, and you get an equilateral triangle.

Illustration of Napoleon's theorem

Now suppose you start with a quadrilateral and draw squares on each side. Connect the centers of the squares. Do you get a square? No, but you do get something interesting.

Van Aubel’s theorem says that if you connect the centers of squares on opposite sides of the quadrilateral (dashed lines below), the two line segments are the same length and perpendicular to each other.

The solid gray lines show that if you connect the centers of the squares you do not get a square.

Further development

Van Aubel’s theorem dates to 1878 [1], and there have been numerous extensions ever since. Recently [2] Pellegrinetti found that six points related to Van Aubel’s theorem all lie on a circle. The six points are the midpoints of the two Van Aubel line segments (the dashed line segments above), the intersection of these two line segments, the midpoints of the two diagonals of the quadrilateral, and one more point.

The final point in Pellegrinetti’s theorem comes from flipping each of the squares over, connecting the centers, and noting where the connecting lines intersect. This is analogous to the messy case of Napoleon’s theorem: When you draw the squares on the inside of the quadrilateral rather than on the outside, the picture is hard to take in.

[1] Van Aubel, H., Note concernant les centres des carrés construits sur les côtés d’un polygone quelconque, Nouv. Corresp. Math., 4 (1878), 40–44.

[2] Dario Pellegrinetti. The Six-Point Circle for the Quadrangle. International Journal of Geometry. 4 (2019) No 2, pp. 5–13.

Pythagorean triangles with side 2023

Can a Pythagorean triangle have one side of length 2023? Yes, one possibility is a triangle with sides (2023, 6936, 7225).

right triangle with sides 2023, 6936, and 7225

Where did that come from? And can we be more systematic, listing all Pythagorean triangles with a side of length 2023?

Euclid’s formula generates Pythagorean triples by sticking integers m and n into the formula

(m² – n², 2mn, m² + n²).

If one of these three numbers equals 2023, it’s going to have to be the first one. Why? The second number is even, and 2023 is odd, so that one is ruled out.

It’s less obvious but true that the hypotenuse m² + n² cannot be 2023. We could verify this by brute force, trying integer values of m between 1 and √2023, subtracting m² from 2023, and seeing whether the result is a square. But there is a more elegant approach.

There is a theorem that says an integer N is the sum of two squares exactly when every prime that divides N is either congruent to 1 mod 4 or appears an even number of times in the factorization of N. The prime factorization of 2023 is 7 × 17 × 17. The factor 7 appears to an odd power, and 7 ≡ 3 (mod 4), so 2023 is not the sum of two squares.
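
Here’s a short brute-force confirmation in Python (my sketch, not from the post):

```python
from math import isqrt

n = 2023
# Look for a^2 + b^2 = n with 1 <= a <= b.
reps = [(a, isqrt(n - a*a)) for a in range(1, isqrt(n // 2) + 1)
        if isqrt(n - a*a) ** 2 == n - a*a]
print(reps)  # [] -- 2023 is not a sum of two squares
```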

Is 2023 a difference of squares? Yes, every odd number is a difference of squares.

To write a number N as a difference of squares, we have to solve

m² – n² = (m + n)(m – n) = N.

If N is odd, we can always set m − n = 1. In the case N = 2023, this means m = 1012 and n = 1011.

The solution above corresponds to factoring 2023 as 2023 × 1. We could also factor 2023 as 289 × 7 or 119 × 17. (The first factor has to be larger than the second for n to be positive.) To find more possible values of m and n we solve the equations

m + n = 289
m − n = 7


m + n = 119
m − n = 17.

These have solutions (148, 141) and (68, 51).

To summarize, we can find three Pythagorean triples with one component equal to 2023 using Euclid’s formula. These are

(2023, 2046264, 2046265)
(2023, 41736, 41785)
(2023, 6936, 7225)

corresponding to (m, n) = (1012, 1011), (148, 141), and (68, 51) respectively. The first two are primitive Pythagorean triples because the three numbers are relatively prime in both cases. The last triple is not primitive because each number is divisible by 289.

Euclid’s formula generates all primitive Pythagorean triples, but it doesn’t generate all possible Pythagorean triples. To find all Pythagorean triples we have to consider solutions of the form

(km² – kn², 2kmn, km² + kn²)

for some integer k. If 2023 is k times a difference of squares, k must be a factor of 2023, and so k = 1, 7, 17, 119, 289, or 2023. We’ve already considered the case k = 1.

For k = 7, we express 289 as a difference of squares. The only solution in positive integers is (m, n) = (145, 144). This yields the Pythagorean triple

(2023, 292320, 292327).

For k = 17, we express 119 as a difference of squares. The solutions are (m, n) = (60, 59) and (12, 5). These yield

(2023, 120360, 120377)


(2023, 2040, 2873).

For k = 119, we express 17 as a difference of squares. This can only be done one way, (m, n) = (9, 8). This yields

(2023, 17136, 17255).

For k = 289, we express 7 as a difference of squares. The unique solution (4, 3) yields the triple

(2023, 6936, 7225)

that we had found previously.

For k = 2023, we have to write 1 as a difference of squares. This has no solution if we require both squares to be positive.

This takes care of every way 2023 can be a leg, i.e. the component km² − kn² (it cannot be the even component 2kmn). But 2023 could also be the hypotenuse km² + kn², with k a factor of 2023 and m² + n² the complementary factor. We saw above that 2023 itself is not the sum of two squares, but two of its divisors are: 17 = 4² + 1² and 289 = 15² + 8². These correspond to the primitive triples (8, 15, 17) and (161, 240, 289), and scaling them by k = 119 and k = 7 respectively gives two more triples,

(952, 1785, 2023)

and

(1127, 1680, 2023).

Now we’re done. In all there are nine Pythagorean triples containing 2023.
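
As a sanity check, here’s a brute-force Python search (my sketch, not from the post) that enumerates every triple containing 2023, including any where 2023 is the hypotenuse:

```python
from math import isqrt

N = 2023
triples = set()  # stored as (leg, leg, hypotenuse) with legs sorted

# Case 1: 2023 is a leg. Then N^2 = c^2 - b^2 = (c - b)(c + b).
# Every divisor d = c - b of N^2 with d < N gives a solution; since N is
# odd, all divisors are odd and b, c come out integral.
for d in range(1, N):
    if (N * N) % d == 0:
        e = N * N // d               # e = c + b
        b, c = (e - d) // 2, (e + d) // 2
        triples.add((min(N, b), max(N, b), c))

# Case 2: 2023 is the hypotenuse. Try each smaller leg directly.
for a in range(1, N):
    b2 = N * N - a * a
    b = isqrt(b2)
    if b * b == b2 and a <= b:
        triples.add((a, b, N))

for triple in sorted(triples):
    print(triple)
print(len(triples), "triples")
```

The search prints nine triples: the seven with 2023 as a leg, plus (952, 1785, 2023) and (1127, 1680, 2023) with 2023 as the hypotenuse.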


Heat equation and the normal distribution

The density function of a normal distribution with mean 0 and standard deviation √(2kt) satisfies the heat equation. That is, the function

u(x, t) = \frac{1}{2\sqrt{\pi kt}} \exp\left(-\frac{x^2}{4kt}\right)

satisfies the partial differential equation

u_t = k\,u_{xx}

You could verify this by hand, or if you’d like, here’s Mathematica code to do it.

      u[x_, t_] := PDF[NormalDistribution[0, Sqrt[2 k t]], x]
      Simplify[ D[u[x, t], {t, 1}] - k D[u[x, t], {x, 2}] ]

This returns 0 as expected.
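
If you’d rather check it in Python, here’s a sympy translation of the same computation (mine, not from the post):

```python
import sympy as sp

x = sp.symbols("x", real=True)
t, k = sp.symbols("t k", positive=True)

# Normal density with mean 0 and standard deviation sqrt(2 k t).
u = sp.exp(-x**2 / (4*k*t)) / (2*sp.sqrt(sp.pi*k*t))

# u_t - k u_xx should simplify to zero.
residual = sp.simplify(sp.diff(u, t) - k * sp.diff(u, x, 2))
print(residual)
```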

Solutions to the heat equation are unique if we specify initial conditions. So if we start with a very concentrated heat source at time t = ε, say

u(x, \varepsilon) = \frac{1}{2\sqrt{\pi k\varepsilon}} \exp\left(-\frac{x^2}{4k\varepsilon}\right)

where ε is a tiny but positive number, then the solution at the top of the post holds for all t > 0 and is unique.

Alternatively, we could say that if our initial condition is u(x, 0) = δ where the right hand side is the Dirac delta function, then the solution at the top of the post holds for all t > 0 and is unique. You could justify this rigorously using distribution theory (in the analytic sense, not the probability sense, though they’re related). Or you could justify it heuristically by letting ε go to zero and thinking of δ as the limit.

A constant also satisfies the heat equation. So we can account for a constant ambient temperature by adding it to the initial condition: the solution is then that constant plus the solution above.

The temperature at the center will decay in proportion to 1/√t. A point x away from the center will initially get warmer, reach a maximum temperature at time t = x²/2k, and then cool off roughly in proportion to 1/√t.

Here’s a plot of the temperature at x = 10 with k = 5.
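
Here’s a matplotlib sketch (mine, not from the post) that reproduces such a plot with x = 10 and k = 5:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # render off-screen
import matplotlib.pyplot as plt

k, x = 5.0, 10.0
t = np.linspace(0.01, 100, 2000)
u = np.exp(-x**2 / (4*k*t)) / (2*np.sqrt(np.pi*k*t))

plt.plot(t, u)
plt.xlabel("t")
plt.ylabel("temperature at x = 10")
plt.savefig("heat_at_x10.png")

# The peak should occur near t = x**2 / (2*k) = 10.
print(t[np.argmax(u)])
```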
