Münchausen numbers

Baron Münchausen

The number 3435 has the following curious property:

3435 = 3³ + 4⁴ + 3³ + 5⁵.

It is called a Münchausen number, an allusion to the fictional Baron Münchausen. When each digit is raised to its own power and the results are summed, you get the original number back. The only other Münchausen number is 1.

At least in base 10. You could look at Münchausen numbers in other bases. If you write out a number n in base b, raise each of its “digits” to its own power, take the sum, and get n back, you have a Münchausen number in base b. For example 28 is a Münchausen number in base 9 because

28₁₀ = 31₉ = 3³ + 1¹

Daan van Berkel proved that there are only finitely many Münchausen numbers in any given base. In fact, he shows that a Münchausen number in base b cannot be greater than 2bᵇ, and so you could do a brute-force search to find all the Münchausen numbers in any base.

The upper bound 2bᵇ grows very quickly with b and so brute force becomes impractical for large b. If you wanted to find all the hexadecimal Münchausen numbers you’d have to search 2 · 16¹⁶ = 36,893,488,147,419,103,232 numbers. How could you do this more efficiently?
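
Here’s a sketch of one approach. The digit-power sum depends only on the multiset of digits, not on their order, so instead of testing every number up to 2bᵇ you can enumerate digit multisets, compute each sum, and check whether the sum’s digits match the multiset. For base 16 that cuts the search from about 3.7 × 10¹⁹ numbers to roughly a billion multisets. A minimal Python sketch, assuming the convention that 0⁰ counts as 1:

        from itertools import combinations_with_replacement

        def to_digits(n, base):
            # digits of n in the given base, least significant first
            digits = []
            while n:
                n, r = divmod(n, base)
                digits.append(r)
            return digits

        def munchausen_numbers(base):
            # van Berkel's bound: a Münchausen number in base b is at most 2*b**b,
            # so it has at most max_digits digits in that base
            max_digits = len(to_digits(2 * base**base, base))
            # digit-to-its-own-power table; 0**0 is taken to be 1 here
            power = [1] + [d**d for d in range(1, base)]
            found = set()
            for length in range(1, max_digits + 1):
                for digits in combinations_with_replacement(range(base), length):
                    total = sum(power[d] for d in digits)
                    if sorted(to_digits(total, base)) == sorted(digits):
                        found.add(total)
            return sorted(found)

        print(munchausen_numbers(10))  # [1, 3435]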

Less likely to get half, more likely to get near half

I was catching up on Engines of our Ingenuity episodes this evening when the following line jumped out at me:

If I flip a coin a million times, I’m virtually certain to get 50 percent heads and 50 percent tails.

Depending on how you understand that line, it’s either imprecise or false. The more times you flip the coin, the more likely you are to get nearly half heads and half tails, but the less likely you are to get exactly half of each. I assume Dr. Lienhard knows this and that by “50 percent” he meant “nearly half.”

Let’s make the fuzzy statements above more quantitative. Suppose we flip a coin 2n times for some large number n. Then a calculation using Stirling’s approximation shows that the probability of n heads and n tails is approximately

1/√(πn)

which goes to zero as n goes to infinity. If you flip a coin a million times, there’s less than one chance in a thousand that you’d get exactly half heads.
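
As a quick numerical check, here is the exact binomial probability, computed via log-gamma, next to the Stirling estimate:

        from math import exp, lgamma, log, pi, sqrt

        def prob_exactly_half(n):
            # P(exactly n heads in 2n fair flips) = C(2n, n) / 2^(2n)
            return exp(lgamma(2*n + 1) - 2*lgamma(n + 1) - 2*n*log(2))

        for n in (100, 10_000, 500_000):
            print(n, prob_exactly_half(n), 1/sqrt(pi*n))

For n = 500,000, i.e. a million flips, both numbers are about 0.0008.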

Next, let’s quantify the statement that nearly half the tosses are likely to be heads. The normal approximation to the binomial tells us that for large n, the number of heads out of 2n tosses is approximately distributed like a normal distribution with the same mean and variance, i.e. mean n and variance n/2. The proportion of heads is thus approximately normal with mean 1/2 and variance 1/8n. This means the standard deviation is 1/√(8n). So, for example, about 95% of the time the proportion of heads will be 1/2 plus or minus 2/√(8n). As n goes to infinity, the width of this interval goes to 0. Alternatively, we could pick some fixed interval around 1/2 and show that the probability of the proportion of heads being outside that interval goes to 0.
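
Plugging in a million flips: 2n = 1,000,000, so n = 500,000 and the half-width of the approximate 95% interval is 2/√(8n) = 0.001. In other words, about 95% of the time the proportion of heads will fall between 0.499 and 0.501.

        from math import sqrt

        n = 500_000                    # a million flips is 2n tosses
        half_width = 2 / sqrt(8 * n)   # approximate 95% half-width for the proportion
        print(0.5 - half_width, 0.5 + half_width)   # 0.499 to 0.501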

What is calculus?

When people ask me what calculus is, my usual answer is “the mathematics of change,” studying things that change continually. Algebra is essentially static, studying things frozen in time and space. Calculus studies things that move, shapes that bend, etc. Algebra deals with things that are exact and consequently can be fragile. Calculus deals with approximation and is consequently more robust.

I’m happier with the paragraph above if you replace “calculus” with “analysis.” Analysis certainly seeks to understand and model things that change continually, but calculus per se is the mechanism of analysis.

I used to think it oddly formal for people to say “differential and integral calculus.” Is there any other kind? Well yes, yes there is, though I didn’t know that at the time. A calculus is a system of rules for computing things. Differential and integral calculus is a system of rules for calculating derivatives and integrals. Lately I’ve thought about other calculi more than differential calculus: propositional calculus, lambda calculus, calculus of inductive constructions, etc.

In my first career I taught (differential and integral) calculus and was frustrated with students who would learn how to calculate derivatives but never understood what a derivative was or what it was good for. In some sense though, they got to the core of what a calculus is. It would be better if they knew what they were calculating and how to apply it, but they still learn something valuable by knowing how to carry out the manipulations. A computer science major, for example, who gets through (differential) calculus knowing how to calculate derivatives without knowing what they are is in a good position to understand lambda calculus later.

Duality in spherical trigonometry

This evening I ran across an unexpected reference to spherical trigonometry: Thomas Hales’ lecture on lessons learned from the formal proof of the Kepler conjecture. He mentions at one point a lemma that was awkward to prove in its original form, but that became trivial when he looked at its spherical dual.

The sides of a spherical triangle are formed by great circular arcs through the vertices. Since the sides are portions of a circle, they can be measured as angles. So in spherical trig you have this interesting interplay of two kinds of angles: the angles formed at the intersections of the sides, and the angles describing the sides themselves.

Here’s how you form the dual of a spherical triangle. Suppose the vertices of the triangle are A, B, and C. Think of the arc connecting A and B as an equator, and let C′ be the corresponding pole that lies on the same side of the arc as the original triangle ABC. Do the analogous process to find the points A′ and B′. The triangle A′B′C′ is the dual of the triangle ABC. (This idea goes back to the Persian mathematician Abu Nasr Mansur circa 1000 AD.)

The sides in A′B′C′ are the supplementary angles of the corresponding intersection angles in ABC, and the intersection angles in A′B′C′ are the supplementary angles of the corresponding sides in ABC.

In his paper “Duality in the formulas of spherical trigonometry,” published in American Mathematical Monthly in 1909, W. A. Granville gives the following duality principle:

If the sides of a spherical triangle be denoted by the Roman letters a, b, c and the supplements of the corresponding opposite angles by the Greek letters α, β, γ, then from any given formula involving any of these six parts, we may write down a dual formula simply by interchanging the corresponding Greek and Roman letters.
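
For example, written in terms of these six parts, the spherical law of cosines for sides is

\cos a = \cos b \cos c - \sin b \sin c \cos \alpha

Interchanging the Greek and Roman letters gives

\cos \alpha = \cos \beta \cos \gamma - \sin \beta \sin \gamma \cos a

and substituting α = π − A, β = π − B, γ = π − C turns this into the familiar law of cosines for angles, cos A = −cos B cos C + sin B sin C cos a.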

Related: Notes on Spherical Trigonometry

Some ways linear algebra is different in infinite dimensions

There’s no notion of continuity in linear algebra per se. It’s not part of the definition of a vector space. But a finite dimensional vector space over the reals is isomorphic to a Euclidean space of the same dimension, and so we usually think of such spaces as Euclidean. (We’re only going to consider real vector spaces in this post.) And there we have a notion of distance, a norm, and hence a topology and a way to say whether a function is continuous.

Continuity

In finite dimensional Euclidean space, linear functions are continuous. You can put a different norm on a Euclidean space than the one it naturally comes with, but all norms give rise to the same topology and hence the same continuous functions. (This is useful in numerical analysis where you’d like to look at a variety of norms. The norms give different analytical results, but they’re all topologically equivalent.)

In an infinite dimensional normed space, linear functions are not necessarily continuous. If the dimension of a space is only a trillion, all linear functions are continuous, but when you jump from high dimension to infinite dimension, you can have discontinuous linear functions. But if you look at this more carefully, there isn’t a really sudden change.

If a linear function is discontinuous, its finite dimensional approximations are continuous, but the degree of continuity degrades as dimension increases. For example, suppose a linear function stretches the nth basis vector by a factor of n. The bigger n gets, the more the function stretches in the nth dimension. As long as n is bounded, this is continuous, but in a sense it is less continuous as n increases. The fact that the infinite dimensional version is discontinuous tells you that the finite dimensional versions, while technically continuous, scale poorly with dimension. (See practical continuity for more discussion along these lines.)
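
To make that example concrete: using the norm of square-summable sequences, define a linear function on sequences with only finitely many nonzero terms by

T e_n = n \, e_n

Restricted to the span of e_1 through e_N, the operator norm of T is N, finite for every N but growing without bound. That growth is the finite dimensional shadow of the fact that T is unbounded, and hence discontinuous, on its whole domain.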

Completeness

A Banach space is a complete normed linear space. Finite dimensional normed spaces are always complete (i.e. every Cauchy sequence in the space converges to a point in the space) but this might not happen in infinite dimensions.

Duals and double duals

In basic linear algebra, the dual of a vector space V is the space of linear functionals on V, i.e. the set of linear maps from V to the reals. This space is denoted V*. If V has dimension n, then V* also has dimension n, and all n-dimensional spaces are isomorphic, so the distinction between a space and its dual seems pedantic. But in general a Banach space and its dual are not isomorphic and so it’s easier to tell them apart.

The second dual of a vector space, V**, is the dual of the dual space. In finite dimensional spaces, V** is naturally isomorphic to V. In Banach spaces, V is isomorphic to a subspace of V**. And even when V is isomorphic to V**, it might not be naturally isomorphic to V**. (Here “natural” means natural in the category theory sense of natural transformations.)

Related

Area of a triangle and its projections

Let S be the area of triangle T in three-dimensional space. Let A, B, and C be the areas of the projections of T onto the xy, yz, and xz planes respectively. Then

S² = A² + B² + C².

There’s an elegant proof of this theorem here using differential forms. Below I’ll sketch a less elegant but more elementary proof.

You could prove the identity above by using the fact that the area of a triangle spanned by two vectors is half the length of their cross product. Suppose a, b, and c are the locations of the three corners of T. Then

S² = v²/4,

where

v = (a – b) × (c – b)

and by v² we mean the dot product of v with itself.

Write out the components of v² and you get three squared terms, one for each coordinate of v. Notice that when you set the x coordinates of the corners to zero, i.e. project onto the yz plane, the first of the three terms is unchanged and the other two become zero. In other words, the first term, divided by 4, is B², the squared area of the yz projection. A similar argument shows that the other two terms, divided by 4, are C² and A², and adding everything up gives S² = A² + B² + C².
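
Here’s a quick numerical check of the identity, computing all four areas with the cross product formula for a randomly chosen triangle:

        import numpy as np

        def tri_area(p, q, r):
            # area of the triangle with vertices p, q, r via the cross product
            return 0.5 * np.linalg.norm(np.cross(q - p, r - p))

        def project(v, axis):
            # zero out one coordinate, i.e. project onto the plane of the other two
            w = v.copy()
            w[axis] = 0.0
            return w

        rng = np.random.default_rng(42)
        a, b, c = rng.random((3, 3))

        S = tri_area(a, b, c)
        A = tri_area(*(project(v, 2) for v in (a, b, c)))  # xy plane (drop z)
        B = tri_area(*(project(v, 0) for v in (a, b, c)))  # yz plane (drop x)
        C = tri_area(*(project(v, 1) for v in (a, b, c)))  # xz plane (drop y)

        print(S**2, A**2 + B**2 + C**2)  # the two values agree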

 

How many ways can you tile a chessboard with dominoes?

Suppose you have an n by m chessboard. How many ways can you cover the chessboard with dominoes?

It turns out there’s a remarkable closed-form solution:

 

\sqrt{\prod_{k=1}^m \prod_{\ell=1}^n \left( 2\cos\left(\frac{\pi k}{m+1} \right) + 2i \cos\left(\frac{\pi \ell}{n+1} \right) \right)}

 

Here are some questions you may have.

But what if n and m are both odd? You can’t tile such a board with dominoes.

Yes, in that case the formula evaluates to zero.

Do you need an absolute value somewhere? Or a floor or ceiling?

No. It looks like the double product could be a general complex number, but it’s real. In fact, it’s always the square of an integer.

Update: Apparently the formula does need an absolute value, not to turn complex values into real values but to turn negative integers into positive ones. See Aaron Meurer’s example below.

Does it work numerically?

Apparently so. If you evaluated the product in a package that could symbolically manipulate the cosines, the result would be exact. In floating point it cannot be, but at least in my experiments the result is correct when rounded to the nearest integer. For example, there are 12,988,816 ways to tile a standard 8 by 8 chessboard with dominoes, and the following Python script returns 12988816.0. For sufficiently large arguments the result will not always round to the correct answer, but for moderate-sized arguments it should.

        
        from numpy import pi, cos, sqrt
        
        def num_tilings(m, n):
            # evaluate the closed-form product formula above in floating point
            prod = 1
            for k in range(1, m+1):
                for l in range(1, n+1):
                    prod *= 2*cos(pi*k/(m+1)) + 2j*cos(pi*l/(n+1))
            # abs() handles both a tiny spurious imaginary part from floating point
            # and a possible negative sign (see the update above)
            return sqrt(abs(prod))
        
        print(num_tilings(8,8))

The code looks wrong. Shouldn’t the ranges go up to m and n?

No, Python ranges are half-open intervals. range(a, b) goes from a to b-1. That looks unnecessarily complicated, but it makes some things easier.

You said that there was no need for absolute values, but your code has one.

Yes, because while in theory the imaginary part will be exactly zero, in floating point arithmetic the imaginary part might be small but not zero.

Where did you find this formula?

Thirty-three Miniatures: Mathematical and Algorithmic Applications of Linear Algebra

Tensors 5: Scalars

There are two uses of the word scalar, one from linear algebra and another from tensor calculus.

In linear algebra, vector spaces have a field of scalars. This is where the coefficients in linear combinations come from. For real vector spaces, the scalars are real numbers. For complex vector spaces, the scalars are complex numbers. For vector spaces over any field K, the elements of K are called scalars.

But there is a more restrictive use of scalar in tensor calculus. There a scalar is not just a number, but a number whose value does not depend on one’s choice of coordinates. For example, the temperature at some location is a scalar, but the first coordinate of a location depends on your choice of coordinate system. Temperature is a scalar, but x-coordinate is not. Scalars are numbers, but not all numbers are scalars.

The linear algebraic use of scalar is more common among mathematicians, the coordinate-invariant use among physicists. The two uses of scalar are a special case of the two uses of tensor described in the previous post. Linear algebra thinks of tensors simply as things that take in vectors and return numbers. The physics/tensor analysis view of tensors includes behavior under changes of coordinates. You can think of a scalar as a 0th order tensor, one that behaves as simply as possible under a change of coordinates, i.e. doesn’t change at all.

Tensors 4: Behavior under change of coordinates

In the first post in this series I mentioned several apparently unrelated things that are all called tensors, one of these being objects that behave a certain way under changes of coordinates. That’s what we’ll look at this time.

In the second post we said that a tensor is a multilinear functional. A k-tensor takes k vectors and returns a number, and it is linear in each argument if you hold the rest constant. We mentioned that this relates to the “box of numbers” idea of a tensor. You can describe how a k-tensor acts by writing out k nested sums. The terms in these sums are called the components of the tensor.

Tensors are usually defined in a way that has more structure. They vary from point to point in a space, and they do so in a way that in some sense is independent of the coordinates used to label these points. At each point you have a tensor in the sense of a multilinear functional, but the emphasis is usually on the changes of coordinates.

Components, indexes, and coordinates

Tensors in the sense that we’re talking about here come in two flavors: covariant and contravariant. They also come in mixtures; more on that later.

We consider two coordinate systems, one denoted by x‘s and another by x‘s with bars on top. The components of a tensor in the x-bar coordinate system will also have bars on top. For a covariant tensor of order one, the components satisfy

\bar{T}_i =T_r \frac{\partial x^r}{\partial \bar{x}^i}

First of all, coordinates are written with superscripts. So xʳ is the rth coordinate, not x raised to the power r. Also, this uses Einstein summation notation: there is an implicit sum over repeated indexes, in this case over r.

The components of a contravariant tensor of order one satisfy a similar but different equation:

\bar{T}^i =T^r \frac{\partial \bar{x}^i}{\partial x^r}

The components of a covariant tensor are written with subscripts, and the components of a contravariant tensor with superscripts. In the equation for covariant components, the partial derivatives are with respect to the new coordinates, the x bars. In the equation for contravariant components, the partial derivatives are with respect to the original coordinates, the x‘s. Mnemonic: when the indexes go down (covariant tensors) the new coordinates go down (in the partial derivatives). When the indexes go up, the new coordinates go up.
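
Here’s a small numerical illustration of the two order-one rules for the special case of a linear change of coordinates x̄ = Mx, where the Jacobians are just the constant matrices M and M⁻¹. (The matrix M and the component values below are made up for the example.)

        import numpy as np

        M = np.array([[2.0, 1.0],
                      [0.0, 3.0]])        # change of coordinates: x_bar = M x
        M_inv = np.linalg.inv(M)          # dx/dx_bar, since dx_bar/dx = M

        # covariant example: components of the gradient of f(x) = c . x
        c = np.array([5.0, -1.0])                   # T_r in the x coordinates
        T_bar_cov = np.einsum('r,ri->i', c, M_inv)  # T_bar_i = T_r dx^r / dx_bar^i
        # check: f = c . (M_inv x_bar), so its gradient in the new coordinates is M_inv^T c
        print(np.allclose(T_bar_cov, M_inv.T @ c))  # True

        # contravariant example: components of a velocity vector v = dx/dt
        v = np.array([1.0, 4.0])                    # T^r in the x coordinates
        T_bar_con = np.einsum('r,ir->i', v, M)      # T_bar^i = T^r dx_bar^i / dx^r
        print(np.allclose(T_bar_con, M @ v))        # True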

For covariant tensors of order two, the change of coordinate formula is

\bar{T}_{ij} = T_{rs} \frac{\partial x^r}{\partial\bar{x}^i} \frac{\partial x^s}{\partial \bar{x}^j}

Here the summation convention says that there are two implicit sums, one over r and one over s.

The contravariant counterpart says

 \bar{T}^{ij} = T^{rs} \frac{\partial\bar{x}^i}{\partial x^r} \frac{\partial\bar{x}^j}{\partial x^s}

In general you could have tensors that are a mixture of covariant and contravariant. A tensor with covariant order p and contravariant order q has p subscripts and q superscripts. The partial derivatives have x-bars on bottom corresponding to the covariant components and x-bars on top corresponding to contravariant components.

Relation to multilinear functionals

We initially said a tensor was a multilinear functional. A tensor of order k takes k vectors and returns a number. Now we’d like to refine that definition to take two kinds of vectors. A tensor with covariant order p and contravariant order q takes p contravariant vectors and q covariant vectors. In linear algebra terms, instead of simply taking k elements of a vector space V, we say our tensor takes p vectors from V and q vectors from the dual space V*.

Relation to category theory

You may be familiar with the terms covariant and contravariant from category theory, or its application to object oriented programming. The terms are related. As Michael Spivak explains, “It’s very easy to remember which kind of vector field is covariant, and which is contravariant — it’s just the opposite of what it logically ought to be [from category theory].”

Tensors 3: Tensor products

In the previous post, we defined the tensor product of two tensors, but you’ll often see tensor products of spaces. How are these tensor products defined?

Tensor product splines

For example, you may have seen tensor product splines. Suppose you have a function over a rectangle that you’d like to approximate by patching together polynomials so that the interpolation has the specified values at grid points, and the patches fit together smoothly. In one dimension, you do this by constructing splines. Then you can bootstrap your way from one dimension to two dimensions by using tensor product splines. A tensor product spline in x and y is a sum of terms consisting of a spline in x and a spline in y. Notice that a tensor product spline is not simply a product of two ordinary splines, but a sum of such products.

If X is the vector space of all splines in the x-direction and Y the space of all splines in the y-direction, the space of tensor product splines is the tensor product of the spaces X and Y. Suppose a set sᵢ, for i running from 1 to n, is a basis for X. Similarly, suppose tⱼ, for j running from 1 to m, is a basis for Y. Then the products sᵢtⱼ form a basis for the tensor product of X and Y, the tensor product splines over the rectangle. Notice that if X has dimension n and Y has dimension m then their tensor product has dimension nm. Notice also that if we only allowed products of splines, not sums of products of splines, we’d get a much smaller set, one of dimension n + m.
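
Here’s a small numerical illustration of the dimension count, using monomials in x and y as a stand-in for spline basis functions:

        import numpy as np

        n, m = 4, 3
        x = np.linspace(0, 1, 50)
        y = np.linspace(0, 1, 50)

        S = np.stack([x**i for i in range(n)])   # n basis functions of x, sampled
        T = np.stack([y**j for j in range(m)])   # m basis functions of y, sampled

        # tensor product basis: all products s_i(x) t_j(y), sampled on the grid
        basis = np.stack([np.outer(S[i], T[j]).ravel()
                          for i in range(n) for j in range(m)])

        print(basis.shape[0])                    # n*m = 12 functions
        print(np.linalg.matrix_rank(basis))      # 12, so they are linearly independent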

Tensor products of vector spaces

We can use the same process to define the tensor product of any two vector spaces. A basis for the tensor product consists of all products of basis elements in one space with basis elements in the other. There’s a more general definition of tensor products, one that doesn’t involve bases, sketched below.

Tensor products of modules

You can also define tensor products of modules, a generalization of vector spaces. You could think of a module as a vector space where the scalars come from a ring instead of a field. Since rings are more general than fields, modules are more general than vector spaces.

The tensor product of two modules over a commutative ring is defined by taking the Cartesian product and modding out by the relations necessary to make things bilinear. (This description is very hand-wavy. A detailed presentation needs its own blog post or two.)

Tensor products of modules hold some surprises. For example, let m and n be two relatively prime integers. You can think of the integers mod m or mod n as a module over the integers. The tensor product of these two modules is zero because you end up modding out by everything. This kind of collapse doesn’t happen with vector spaces.
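
Here’s the standard one-line argument. Since m and n are relatively prime, there are integers s and t with sm + tn = 1. Then for any elementary tensor x ⊗ y,

x \otimes y = (sm + tn)(x \otimes y) = s(mx \otimes y) + t(x \otimes ny) = s(0 \otimes y) + t(x \otimes 0) = 0

because mx = 0 in the integers mod m and ny = 0 in the integers mod n.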

Past and future

The first two posts in this series:

I plan to leave the algebraic perspective aside for a while, though as I mentioned above there’s more to come back to.

Next I plan to write about the analytic/geometric view of tensors. Here we get into things like changes of coordinates and it looks at first as if a tensor is something completely different.

Update: Tensors 4: Behavior under changes of coordinates

Tensors 2: Multilinear operators

The simplest definition of a tensor is that it is a multilinear functional, i.e. a function that takes several vectors, returns a number, and is linear in each argument. Tensors over real vector spaces return real numbers, tensors over complex vector spaces return complex numbers, and you could work over other fields if you’d like.

A dot product is an example of a tensor. It takes two vectors and returns a number. And it’s linear in each argument. Suppose you have vectors u, v, and w, and a real number a. Then the dot product (u + v, w) equals (u, w) + (v, w), and (au, w) = a(u, w). This shows that the dot product is linear in its first argument, and you can show similarly that it is linear in the second argument.

Determinants are also tensors. You can think of the determinant of an n by n matrix as a function of its n rows (or columns). This function is linear in each argument, so it is a tensor.
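
Here’s a quick numerical illustration of linearity in the first row, with the other rows held fixed:

        import numpy as np

        rng = np.random.default_rng(1)
        A = rng.random((3, 3))
        u, v = rng.random(3), rng.random(3)

        B, C, D, E = A.copy(), A.copy(), A.copy(), A.copy()
        B[0], C[0], D[0], E[0] = u, v, u + v, 7*u

        # additivity and homogeneity in the first row
        print(np.isclose(np.linalg.det(D), np.linalg.det(B) + np.linalg.det(C)))  # True
        print(np.isclose(np.linalg.det(E), 7*np.linalg.det(B)))                   # True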

The introduction to this series mentioned the interpretation of tensors as a box of numbers: a matrix, a cube, etc. This is consistent with our definition because you can write a multilinear functional as a sum. For every vector that a tensor takes in, there is an index to sum over. A tensor taking n vectors as arguments can be written as n nested summations. You could think of the coefficients of this sum being spread out in space, each index corresponding to a dimension.

Tensor products are simple in this context as well. If you have a tensor S that takes m vectors at a time, and another tensor T that takes n vectors at a time, you can create a tensor that takes m + n vectors by sending the first m of them to S, sending the rest to T, and multiplying the results. That’s the tensor product of S and T.
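
For instance, if S(u) = c · u and T(v) = d · v are two order-one tensors, their tensor product is the bilinear form (S ⊗ T)(u, v) = S(u) T(v), and its components are the outer product of c and d:

        import numpy as np

        c = np.array([1.0, 2.0, 3.0])
        d = np.array([4.0, 5.0])

        def S(u): return c @ u
        def T(v): return d @ v
        def ST(u, v): return S(u) * T(v)      # the tensor product of S and T

        components = np.outer(c, d)           # (S tensor T)(e_i, f_j) = c_i d_j
        u = np.array([1.0, 0.0, -1.0])
        v = np.array([2.0, 1.0])
        print(np.isclose(ST(u, v), u @ components @ v))  # True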

The discussion of tensors and tensor products above still leaves a lot of questions unanswered. We haven’t considered the most general definition of tensor or tensor product. And we haven’t said anything about how tensors arise in applications, or what they have to do with geometry or changes of coordinates. I plan to address these issues in future posts. I also plan to write about other things in between posts on tensors.

Next post in series: Tensor products

 

 

Tensors 1: What is a tensor?

Riemann tensor $R^\alpha_{\beta\gamma\delta}$

The word “tensor” is shrouded in mystery. The same term is applied to many different things that don’t appear to have much in common with each other.

You might have heard that a tensor is a box of numbers. Just as a matrix is a rectangular collection of numbers, a tensor could be a cube of numbers or even some higher-dimensional set of numbers.

You might also have heard that a tensor is something that has upper and lower indices, such as the Riemann tensor above, things that have arcane manipulation rules such as “Einstein summation.”

Or you might have heard that a tensor is something that changes coordinates like a tensor. A tensor is as a tensor does. Something that behaves the right way under certain changes of variables is a tensor.

Tensor product $A \otimes B$

And then there are things that aren’t called tensors, but that have tensor products. These seem simple enough in some cases—you think “I didn’t realize that has a name. So it’s called a tensor product. Good to know.” But then in other cases tensor products seem more elusive. If you look in an advanced algebra book hoping for a simple definition of a tensor product, you might be disappointed and feel like the book is being evasive or even poetic, because it describes what a tensor product does rather than what it is. That is, the definition is behavioral rather than constructive.

What do all these different uses of the word “tensor” have to do with each other? Do they have anything to do with the TensorFlow machine learning library that Google released recently? That’s something I’d like to explore over a series of blog posts.

Next posts in the series:

From triangles to the heat equation

“Mathematics compares the most diverse phenomena and discovers the secret analogies that unite them.” — Joseph Fourier

The above quote makes me think of a connection Fourier made between triangles and thermodynamics.

Trigonometric functions were first studied because they relate angles in a right triangle to ratios of the lengths of the triangle’s sides. For the most basic applications of trigonometry, it only makes sense to consider positive angles smaller than a right angle. Then somewhere along the way someone discovered that it’s convenient to define trig functions for any angle.

Once you define trig functions for any angle, you begin to think of these functions as being associated with circles rather than triangles. More advanced math books refer to trig functions as circular functions. The triangles fade into the background. They’re still there, but they’re drawn inside a circle. (Hyperbolic functions are associated with hyperbolas the same way circular functions are associated with circles.)

Now we have functions that historically arose from studying triangles, but they’re defined on the whole real line. And we ask the kinds of questions about them that we ask about other functions. How fast do they change from point to point? How fast does their rate of change change? And here we find something remarkable. The rate of change of a sine function is proportional to a cosine function and vice versa. And if we look at the rate of change of the rate of change (the second derivative or acceleration), sine functions yield more sine functions and cosine functions yield more cosine functions. In more sophisticated language, sines and cosines are eigenfunctions of the second derivative operator.

Here’s where thermodynamics comes in. You can use basic physics to derive an equation for describing how heat in some object varies over time and location. This equation is called, surprisingly enough, the heat equation. It relates second derivatives of heat in space with first derivatives in time.

Fourier noticed that the heat equation would be easy to solve if only he could work with functions that behave very nicely with regard to second derivatives, i.e. sines and cosines! If only everything were sines and cosines. For example, the temperature in a thin rod over time is easy to determine if the initial temperature distribution is given by a sine wave. Interesting, but not practical.

However, the initial distribution doesn’t have to be a sine, or a cosine. We can still solve the heat equation if the initial distribution is a sum of sines. And if the initial distribution is approximately a sum of sines and cosines, then we can compute an approximate solution to the heat equation. So what functions are approximately a sum of sines and cosines? All of them!
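
Here’s a sketch of that idea in code, for the heat equation u_t = u_xx on a rod of length π with its ends held at temperature zero. If the initial temperature is a sum of sines, each sine mode simply decays exponentially:

        import numpy as np

        def heat_solution(coeffs, x, t):
            # initial condition: u(x, 0) = sum_k coeffs[k-1] * sin(k x)
            # solution:          u(x, t) = sum_k coeffs[k-1] * exp(-k**2 t) * sin(k x)
            return sum(a * np.exp(-k**2 * t) * np.sin(k * x)
                       for k, a in enumerate(coeffs, start=1))

        x = np.linspace(0, np.pi, 200)
        u = heat_solution([1.0, 0.5, 0.25], x, t=0.1)  # temperature profile at t = 0.1
        print(u.max())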

Well, not quite all functions. But lots of functions. More functions than people originally thought. Pinning down exactly what functions can be approximated arbitrarily well by sums of sines and cosines (i.e. which functions have convergent Fourier series) was a major focus of 19th century mathematics.

So if someone asks what use they’ll ever have for trig identities, tell them they’re important if you want to solve the heat equation. That’s where I first used some of these trig identities often enough to remember them, and that’s a fairly common experience for people in math and engineering. Solving the heat equation reviews everything you learn in trigonometry, even though there are not necessarily any triangles or circles in sight.

Contradictory news regarding ABC conjecture

“Research is what I’m doing when I don’t know what I’m doing.” — Wernher von Braun

I find Shinichi Mochizuki’s proof of the abc conjecture fascinating. Not the content of the proof—which I do not understand in the least—but the circumstances of the proof. Most mathematics, no matter how arcane it appears to outsiders, is not that original. Experts in a specific area can judge just how much or how little a new paper adds to their body of knowledge, and usually it’s not much. But Mochizuki’s work is such a departure from the mainstream that experts have been trying to understand it for four years now.

Five days ago, Nature published an article headlined Monumental Proof to Torment Mathematicians for Years to Come.

… Kedlaya says that the more he delves into the proof, the longer he thinks it will take to reach a consensus on whether it is correct. He used to think that the issue would be resolved perhaps by 2017. “Now I’m thinking at least three years from now.”

Others are even less optimistic. “The constructions are generally clear, and many of the arguments could be followed to some extent, but the overarching strategy remains totally elusive for me,” says mathematician Vesselin Dimitrov of Yale University in New Haven, Connecticut. “Add to this the heavy, unprecedentedly indigestible notation: these papers are unlike anything that has ever appeared in the mathematical literature.”

But today, New Scientist has an article with the headline Mathematicians finally starting to understand epic ABC proof. According to this article,

At least 10 people now understand the theory in detail, says Fesenko, and the IUT papers have almost passed peer review so should be officially published in a journal in the next year or so.

It’s interesting that the proof is so poorly understood that experts disagree on just how well it is understood.

Related posts:

Practical continuity

I had a discussion recently about whether things are really continuous in the real world. Strictly speaking, maybe not, but practically yes. The same is true of all mathematical properties. There are no circles in the real world, not in the Platonic sense of a mathematical circle. But a circle is a very useful abstraction, and plenty of things are circles for practical purposes. In this post I’ll explain the typical definition of continuity and a couple of modifications for application.

A function f is continuous if nearby points go to nearby points. A discontinuity occurs when some points are close together but their images are far apart, such as when you have a threshold effect. For example, suppose you pay $3 to ship packages that weigh less than a pound and $5 to ship packages that weigh a pound or more. Two packages can weigh almost the same, one a little less than a pound and one a little more, but not cost almost the same to ship. The difference in their shipping cost is $2, no matter how close together they are, as long as they’re on opposite sides of the one-pound threshold.

A practical notion of continuity has some idea of resolution. Suppose in our example that packages below one pound ship for $3.00 and packages that weigh a pound or more ship for $3.05. You might say “I don’t care about differences of a nickel.” And so at that resolution, the shipping costs are continuous.

The key to understanding continuity is being precise about the two uses of “nearby” in the statement that a continuous function sends nearby points to nearby points. What do you mean by nearby? How close is close enough? In the pure mathematical definition of continuity, the answer is “as close as you want.” You specify any tolerance you like, no matter how small, and call it ε. If, no matter what ε someone picks, it’s always possible to specify a small enough neighborhood around x that all its points are mapped within ε of f(x), then f is continuous at x.

For applications, we modify this definition by putting a lower bound on ε. A function is continuous at x, for the purposes of a particular application, if for every ε larger than the resolution of the problem, you can find a neighborhood of x so small that all the points in that neighborhood are mapped within ε of f(x). In our shipping example, if you only care about differences in rates larger than $1, then if the rates change by $0.05 at the one-pound threshold, the rates are continuous as far as you’re concerned. But if the rates jump by $2 at one pound, then the rates are discontinuous for your purposes.

When you see a smooth curve on a high-resolution screen, it’s continuous as far as your vision is concerned. Nearby points on the curve go to points that are nearby as far as the resolution of your vision is concerned, even though strictly speaking the curve could have jump discontinuities at every pixel.

If you take a function that is continuous in the pure mathematical sense, then any multiple of that function is also continuous. If you make the function 10x larger, you just need to get closer to each point x to find a neighborhood so that all points get within ε of f(x). But in practical application, a multiple of a continuous function might not be continuous. If your resolution on shipping rates is $1, and the difference in cost between shipping a 15 ounce and a 17 ounce package is $0.05, then it’s continuous for you. But if the rates were suddenly 100x greater, now you care, because now the difference in cost is $5.

With the example of the curve on a monitor, if you were to zoom in on the image, at some point you’d see the individual pixels and so the image would no longer be continuous as far as your vision is concerned.

We’ve been focusing on what nearness means for the output, ε. Now let’s focus on nearness for the input. Introducing a restriction on ε lets us say that some functions are continuous for a particular application even though they are not continuous in the pure mathematical sense. We can also introduce a restriction on the resolution of the input, call it δ, so that the opposite is true: some functions are continuous in the pure mathematical sense but not continuous for a particular application.

The pure mathematical definition of continuity of f at x is that for every ε > 0, there exists a δ > 0 such that if |x – y| < δ, then |f(x) – f(y)| < ε. But how small does δ have to be? Maybe too small for application. Maybe points would have to be closer together than they can actually be in practice. If a function changes rapidly, but smoothly, then it’s continuous in the pure mathematical sense, but it may be discontinuous for practical purposes. The more rapidly the function changes, the smaller δ has to be for points within δ of x to end up within ε of f(x).

So an applied definition of continuity would look something like this.

A function f is continuous at x, for the purposes of a particular application, if for every ε > the resolution of the problem, there exists a δ > the lower limit for that application, such that if |x – y| < δ, then |f(x) – f(y)| < ε.
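
Here’s a toy version of that definition in code. It checks a single pair (ε, δ) by sampling, which is enough to illustrate the shipping example: a five-cent jump is continuous at the resolution of a dollar, while a two-dollar jump is not.

        def practically_continuous(f, x, eps, delta, samples=1000):
            # eps: smallest output difference we care about
            # delta: smallest input difference achievable in practice
            ys = [x - delta + 2*delta*i/samples for i in range(samples + 1)]
            return all(abs(f(y) - f(x)) < eps for y in ys)

        def nickel_jump(w):
            return 3.00 if w < 1 else 3.05

        def two_dollar_jump(w):
            return 3.00 if w < 1 else 5.00

        print(practically_continuous(nickel_jump, 1.0, eps=1.00, delta=0.1))      # True
        print(practically_continuous(two_dollar_jump, 1.0, eps=1.00, delta=0.1))  # False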