Distribution of run times for Euclidean algorithm

The worst case run time for the Euclidean algorithm occurs when you give the algorithm a pair of consecutive Fibonacci numbers. The algorithm takes n steps to compute the greatest common divisor of F_{n+1} and F_{n+2}.

The nth Fibonacci number is the nearest integer to φ^n/√5 where φ = (1 + √5)/2 is the golden ratio. You can take logs and solve for the largest n such that F_n is less than x, subtract 2, and have an upper bound on the number of steps necessary to find the gcd of two numbers less than x.
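For example, here's a quick calculation of this bound for numbers that fit in a signed 64-bit integer, i.e. numbers less than 2^63. This is only a sketch of the recipe above; it finds the largest n by iterating the Fibonacci sequence rather than by taking logs, which avoids floating point issues.

    # Upper bound on the number of Euclidean algorithm steps for numbers < x:
    # find the largest n with F_n < x, then subtract 2.
    def euclid_step_bound(x):
        a, b = 1, 1   # F_1, F_2
        n = 2         # b is always F_n
        while b < x:
            a, b = b, a + b
            n += 1
        return (n - 1) - 2

    print(euclid_step_bound(2**63))  # 90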

But what about the average run time, or the distribution of run times?

For large x, the distribution of the number of steps needed to calculate the gcd of two numbers 0 < a < b ≤ x using the Euclidean algorithm is approximately normal with mean

12 log(2) log(b) / π².

See, for example, [1].

Let’s illustrate this with a simulation. Here’s Python code to return the number of steps used in computing gcd(ab).

    def euclid(a, b):
        steps = 0
        while a != 0:
            a, b = b%a, a
            steps += 1
        return steps # gcd = b

I generated 10,000 pairs of random integers less than 2^63 (because the maximum signed 64 bit integer is 2^63 − 1) and plotted a histogram of the run times.
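Here's one way to run such a simulation using the euclid function above. This is a sketch, not necessarily the exact code behind the plot, and it assumes Matplotlib for the histogram.

    import random
    import statistics
    import matplotlib.pyplot as plt

    N = 10_000
    samples = [euclid(random.randrange(1, 2**63), random.randrange(1, 2**63))
               for _ in range(N)]

    print(statistics.mean(samples))  # theory says this should be near 36.8
    plt.hist(samples, bins=40)
    plt.show()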

I got a nice bell curve with mean 36.3736. The theoretical mean is 0.8428 log(2^63) = 36.8021.

[1] Doug Hensley. The Number of Steps in the Euclidean Algorithm. Journal of Number Theory 49. 142–182 (1994).

Adjunctions

The previous post looked at adjoints in the context of linear algebra. This post will look at adjoints in the context of category theory.

The adjoint of a linear operator T between inner product spaces V and W is the linear operator T* such that for all v in V and all w in W,

⟨ Tv, w ⟩_W = ⟨ v, T*w ⟩_V.

The subscripts are important because the left side is an inner product in W and the right side is an inner product in V. Think of the inner products as pairings. We pair together elements from two vector spaces. An adjunction in category theory will pair together objects in two categories.

Now suppose we have two categories, C and D. An adjunction between C and D is a pair of functors (F, G) with

F: D → C

and

G: C → D

such that

Hom_C(Fd, c) ≅ Hom_D(d, Gc)

for every object d in D and c in C. Here “Hom” means the set of morphisms from one object to another. The subscript reminds us what category the morphisms are in. The left hand side is a set of morphisms in C and the right hand side is a set of morphisms in D.

Note the typographical similarity between

⟨ Tv, w ⟩_W = ⟨ v, T*w ⟩_V

and

Hom_C(Fd, c) ≅ Hom_D(d, Gc).

The functors F and G are analogous to the operators T and T*. Sets of morphisms in different categories are analogous to inner products in different spaces.

Fine print

There’s a lot more going on in the definition of adjoint than is explicit above. In particular, the symbol “≅” is doing a lot of heavy lifting. It’s not saying that two sets are equal. They can’t be equal because they’re sets of different things, sets of morphisms in different categories. It means that there is a bijection between the two sets, and the bijection is “natural in d and c.” This in turn means that for every fixed d in D, there is a natural isomorphism between Hom_C(Fd, -) and Hom_D(d, G-), and for every fixed c in C there is a natural isomorphism between Hom_C(F-, c) and Hom_D(-, Gc). More on this here.

Direction

Note that there is a direction to adjunctions. In the case of linear operators T* is the adjoint of T; it is not necessarily the case that T is the adjoint of T*. In Hilbert spaces, T** = T, but in Banach spaces the two operators might be different.

Similarly, F is the left adjoint of G, and G is the right adjoint of F. This is written F ⊣ G, which implies that direction matters. Adjunctions are from one category to another. It’s possible to have F ⊣ G ⊣ F, but that’s unusual.

Note also that in the case of an adjunction from a category C to a category D, the left adjoint goes from D to C, and the right adjoint goes from C to D. This is the opposite of what you might expect, and many online sources get it wrong.

Hand-wavy examples

Left and right adjoints are inverses in some abstract sense. Very often in an adjunction pair (F, G) the functor F adds structure and the functor G takes it away. F is “free” in some sense, and G is “forgetful” in some corresponding sense.

For example, let C be the category of topological spaces and continuous functions, and D the category of sets and functions. Let F be the functor that takes sets and gives them the discrete topology, making all functions continuous functions. And let G be the functor that takes a topological space and “forgets” its topology, regarding the space simply as a set and regarding its continuous functions simply as functions. Then F ⊣ G.

Not every adjunction pair follows the free/forgetful pattern, but most examples you’ll find in references do. As noted above, sometimes F ⊣ G ⊣ F, which could not happen if you always lose structure as you move down the chain of adjunctions.

Other metaphors for left and right adjoints are that left adjoints approximate something from “below” (i.e. with less structure) and right adjoints approximate from “above” (i.e. with more structure). Some say left adjoints are “optimistic” and right adjoints are “pessimistic.” (This reminds me of last weekend’s post about kernels and cokernels: kernels are degrees of freedom and cokernels are constraints.)


Transpose and Adjoint

The transpose of a matrix turns the matrix sideways. Suppose A is an m × n matrix with real number entries. Then the transpose A^T is an n × m matrix, and the (i, j) element of A^T is the (j, i) element of A. Very concrete.

The adjoint of a linear operator is a more abstract concept, though it’s closely related. The matrix A^T is sometimes called the adjoint of A. That may be fine, or it may cause confusion. This post will define the adjoint in a more general context, then come back to the context of matrices.

This post and the next will be more abstract than usual. After indulging in a little pure math, I’ll return soon to more tangible topics such as Morse code and barbershop quartets.

Dual spaces

Before we can define adjoints, we need to define dual spaces.

Let V be a vector space over a field F. You can think of F as ℝ or ℂ. Then V* is the dual space of V, the space of linear functionals on V, i.e. the vector space of linear functions from V to F.

The distinction between a vector space and its dual seems artificial when the vector space is ℝ^n. The dual space of ℝ^n is isomorphic to ℝ^n, and so the distinction between them can seem pedantic. It’s easier to appreciate the distinction between V and V* when the two spaces are not isomorphic.

For example, let V be L^3(ℝ), the space of functions f such that |f|^3 has a finite Lebesgue integral. Then the dual space is L^{3/2}(ℝ). The difference between these spaces is not simply a matter of designation. There are functions f such that the integral of |f|^3 is finite but the integral of |f|^{3/2} is not, and vice versa.

Adjoints

The adjoint of a linear operator T: V → W is a linear operator T*: W* → V* where V* and W* are the dual spaces of V and W respectively. So T* takes a linear function from W to the field F, and returns a linear function from V to F. How does T* do this?

Given an element w* of W*, T*w* takes a vector v in V and maps it to F by

(T*w*)(v) = w*(Tv).

In other words, T* takes a functional w* on W and turns it into a function on V by mapping elements of V over to W and then letting w* act on them.
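To make the definition concrete, here is a tiny Python illustration, a sketch rather than anything canonical: V = ℝ², W = ℝ³, T is given by a matrix, and a functional w* is given as an ordinary Python function. The adjoint simply precomposes with T.

    import numpy as np

    # T : R^2 -> R^3, represented here by a 3x2 matrix
    T = np.array([[1.0, 2.0],
                  [0.0, 1.0],
                  [3.0, 1.0]])

    # A linear functional w* on W = R^3
    def wstar(w):
        return 2*w[0] - w[1] + 4*w[2]

    # T* sends the functional w* to the functional v -> w*(Tv)
    def Tstar(functional):
        return lambda v: functional(T @ v)

    v = np.array([1.0, -1.0])
    print(Tstar(wstar)(v), wstar(T @ v))  # equal by construction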

Note what this definition does not contain. There is no mention of inner products or bases or matrices.

The definition is valid over vector spaces that might not have an inner product or a basis. And this is not just a matter of perspective. It’s not as if our space has an inner product but we choose to ignore it; we might be working over spaces where it is not possible to define an inner product or a basis, such as ℓ^∞, the space of bounded sequences.

Since a matrix represents a linear operator with respect to some basis, you can’t speak of a matrix representation of an operator on a space with no basis.

Bracket notation

For a vector space V over a field F, denote a function ⟨ ·, · ⟩ that takes an element from V and an element from V* and returns an element of F by applying the latter to the former. That is, ⟨ v, v* ⟩ is defined to be the action of v* on v. This is not an inner product, but the notation is meant to suggest a connection to inner products.

With this notation, we have

⟨ Tv, w* ⟩_W = ⟨ v, T*w* ⟩_V

for all v in V and for all w* in W*, by definition. This is the definition of T* in different notation. The subscripts on the brackets are meant to remind us that the left side of the equation is an element of F obtained by applying an element of W* to an element of W, while the right side is an element of F obtained by applying an element of V* to an element of V.

Inner products

The development of the adjoint above emphasized that there is not necessarily an inner product in sight. But if there are inner products on V and W, then we can turn an element v of V into an element of V* by associating v with ⟨ ·, v ⟩, where now the brackets do denote an inner product.

Now we can write the definition of adjoint as

⟨ Tv, w ⟩_W = ⟨ v, T*w ⟩_V

for all v in V and for all w in W. This definition is legitimate, but it’s not natural in the technical sense that it depends on our choices of inner products and not just on the operator T and the spaces V and W. If we choose different inner products on V and W, then the definition of T* changes as well.

Back to matrices

We have defined the adjoint of a linear operator in a very general setting where there may be no matrices in sight. But now let’s look at the case of T: V → W where V and W are finite dimensional vector spaces, either over ℝ or ℂ. (The difference between ℝ and ℂ will matter.) And let’s define inner products on V and W. This is always possible because they are finite dimensional.

How does a matrix representation of T* correspond to a matrix representation of T?

Real vector spaces

Suppose V and W are real vector spaces and A is a matrix representation of T: V → W with respect to some choice of basis on each space. Suppose also that the bases for V* and W* are given by the duals of the bases for V and W. Then the matrix representation of T* is the transpose of A. You can verify this by showing that

⟨ Av, w ⟩_W = ⟨ v, A^T w ⟩_V

for all v in V and for all w in W.

The adjoint of A is simply the transpose of A, subject to our choice of bases and inner products.
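Here’s a quick numerical check of this identity, just a sketch, using the standard dot product as the inner product on both spaces.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 2))  # represents T where dim V = 2, dim W = 3
    v = rng.standard_normal(2)
    w = rng.standard_normal(3)

    lhs = np.dot(A @ v, w)       # <Av, w> in W
    rhs = np.dot(v, A.T @ w)     # <v, A^T w> in V
    print(np.isclose(lhs, rhs))  # True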

Complex vector spaces

Now consider the case where V and W are vector spaces over the complex numbers. Everything works as above, with one wrinkle. If A is the representation of T: V → W with respect to a given basis, and we choose bases for V* and W* as above, then the conjugate transpose of A is the matrix representation of T*. The adjoint of A is A*, the conjugate transpose of A. As before, you can verify this by showing that

⟨ Av, w ⟩_W = ⟨ v, A*w ⟩_V.

We have to take the conjugate as well as the transpose because the inner product in the complex case requires taking a conjugate of one side.
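And a numerical check for the complex case, again only a sketch. Here the inner product is ⟨x, y⟩ = Σ x_i conj(y_i), which NumPy’s vdot computes as vdot(y, x).

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 2)) + 1j*rng.standard_normal((3, 2))
    v = rng.standard_normal(2) + 1j*rng.standard_normal(2)
    w = rng.standard_normal(3) + 1j*rng.standard_normal(3)

    inner = lambda x, y: np.vdot(y, x)  # <x, y> = sum of x_i * conj(y_i)
    Astar = A.conj().T                  # conjugate transpose

    print(np.isclose(inner(A @ v, w), inner(v, Astar @ w)))  # True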


Overtones and Barbershop Quartets

I’ve heard that barbershop quartets often sing the 7th in a dominant 7th a little flat in order to bring the note closer in tune with the overtone series. This post will quantify that assertion.

The overtones of a frequency f are 2f, 3f, 4f, 5f, etc. The overtone series is a Fourier series.

Here’s a rendering of the C below middle C and its overtones.

\score { \new Staff { \clef treble <c c' g' c'' e'' bes''>1 } \layout {} \midi {} }

We perceive sound on a logarithmic scale. So although the overtone frequencies are evenly spaced, they sound like they’re getting closer together.

Overtones and equal temperament

Let’s look at the notes in the chord above and compare the frequencies between the overtone series and equal temperament tuning.

Let f be the frequency of the lowest note. The top four notes in this overtone series have frequencies 4f, 5f, 6f, and 7f. They form a C7 chord [1].

In equal temperament, these four notes have frequencies 2^{24/12} f, 2^{28/12} f, 2^{31/12} f, and 2^{34/12} f. This works out to 4, 5.0397, 5.9932, and 7.1272 times the fundamental frequency f.

The highest note, the B♭, is the furthest from its overtone counterpart. The frequency is higher than that of the corresponding overtone, so you’d need to perform it a little flatter to bring it in line with the overtone series. This is consistent with the claim at the top of the post.

Differences in cents

How far apart are 7f and 7.1272f in terms of cents, 100ths of a semitone?

The difference between two frequencies, f1 and f2, measured in cents is

1200 log2(f1 / f2).

To verify this, note that this says an octave equals 1200 cents, because log2 2 = 1.

So the difference between the B♭ in equal temperament and the 7th note of the overtone series is 31 cents.

The difference between the E and the 5th overtone is 14 cents, and the difference between the G and the 6th overtone is only 2 cents.
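Here’s the arithmetic in a few lines of Python, a sketch of the calculations above.

    from math import log2

    def cents(f1, f2):
        return 1200 * log2(f1 / f2)

    # compare equal temperament multiples of f with the overtone multiples
    for semitones, overtone in zip([24, 28, 31, 34], [4, 5, 6, 7]):
        et = 2 ** (semitones / 12)
        print(overtone, round(cents(et, overtone)))
    # second column is 0, 14, -2, and 31 cents; the sign says whether
    # equal temperament is sharp or flat relative to the overtone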


[1] The dominant 7th chord gets its name from two things. First, it’s called “dominant” because it’s often found on the dominant (i.e. V) chord of a scale, and it’s made from the 1st, 3rd, 5th, and 7th notes of the scale. It’s a coincidence in the example above that the 7th of the chord is also the 7th overtone. These are two different uses of the number 7 that happen to coincide.

Equipentatonic scale

I ran across a video that played around with the equipentatonic scale [1]. Instead of dividing the octave into 12 equal parts, as is most common in Western music, you divide the octave into 5 equal parts. Each note in the equipentatonic scale has a frequency 2^{1/5} times its predecessor.

The equipentatonic scale is used in several non-Western music systems. For example, the Javanese slendro tuning system is equipentatonic, as is Babanda music from Uganda.

In the key of C, the nearest approximants of the notes in the equipentatonic scale are C, D, F, G, A#.

Approximate equipentatonic scale

In the image above [2], the D is denoted as half sharp, 50 cents higher than D. (A cent is 1/100 of a half step.) The actual pitch is a D plus 40 cents, so the half sharp is closer, but still not exactly right.

The F should be 20 cents lower, the G should be 20 cents higher, and the A half sharp should be 10 cents higher.
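Here’s a quick check of these offsets, just a sketch. The notated pitches in the image are C, D half-sharp, F, G, and A half-sharp, i.e. 0, 250, 500, 700, and 950 cents above C.

    # 5-TET divides the octave (1200 cents) into five steps of 240 cents.
    notated = [0, 250, 500, 700, 950]  # C, D half-sharp, F, G, A half-sharp
    for k, written in enumerate(notated):
        print(240 * k - written)       # 0, -10, -20, 20, 10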

Notation

The conventional chromatic scale is denoted 12-TET, an abbreviation for 12 tone equal temperament. In general n-TET denotes the scale that results from dividing the octave into n parts. The previous discussion was looking at how 5-TET aligns with 12-TET.

A step in 5-TET corresponds to 2.4 steps in 12-TET. This is approximately 5 steps in 24-TET, the scale we’d get by adding quarter tones between all the notes in the chromatic scale.

Math

When we talk of dividing the octave evenly into n parts, we mean evenly on a logarithmic scale because we perceive music on a log scale.

The notation will be a little cleaner if we start counting from 0. Then the kth note of the n-TET scale is proportional to 2^{k/n}.

The proportionality constant is the starting pitch. So if we start on middle C, the frequencies are 262 × 2^{k/n} Hz. The nth frequency is twice the starting frequency, i.e. an octave higher.

We can measure how well an m-TET scale can be approximated by notes from an n-TET scale with the following function:

d(m, n) = \max_j \min_k \left|\frac{j}{m} - \frac{k}{n} \right|

Note that this function is asymmetric: d(mn) does not necessarily equal d(nm). For example, d(12, 24) = 0 because a quarter tone scale contains exact matches for every note in a semitone scale. But d(24, 12) = 1/24 because the quarter tone scale contains notes in between the notes of the semitone scale.

The equipentatonic scale lines up better with the standard chromatic scale than would a 7-note scale or an 11-note scale. That is, d(5, 12) is smaller than d(7, 12) or d(11, 12). Something similar holds if we replace 12 with 24: d(5, 24) is smaller than d(m, 24) for any m > 1 that is relatively prime to 24.
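Here is a straightforward implementation of d, a sketch since no code is given above, that confirms these comparisons.

    from fractions import Fraction

    def d(m, n):
        # j ranges over 0, ..., m and k over 0, ..., n
        return max(min(abs(Fraction(j, m) - Fraction(k, n))
                       for k in range(n + 1))
                   for j in range(m + 1))

    print(d(12, 24), d(24, 12))           # 0 and 1/24
    print(d(5, 12), d(7, 12), d(11, 12))  # 1/30 < 1/28 < 5/132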


[1] The video first presents 10-TET and then defines 5-TET as taking every other note from this scale.

[2] The image was created with the following Lilypond code.

\score {
  \new Staff {
    \relative c' {
      c1 dih f g aih c \bar "|."
    }
  }
  \layout {
    \context {
      \Staff
      \remove "Bar_engraver"
      \remove "Time_signature_engraver"
    }
  }
}

The multiple coupon collector problem

I’ve written about the Coupon Collector Problem and variations several times, most recently here.

Brian Beckman left a comment linking to an article he wrote, which in turn links to a paper on the Double Dixie Cup Problem [1].

The idea behind the Coupon Collector Problem is to estimate how long it will take to obtain at least one of each of n coupons when drawing coupons at random with replacement from a set of n.

The Double Dixie Cup Problem replaces coupons with dixie cups, and it asks how many cups you’d expect to have to collect until you have two of each.

The authors in [1] answer the more general question of how long to collect m of each cup, so m = 1 reduces to the Coupon Collector Problem. The punchline is Theorem 2 from the paper: the expected number of cups (or coupons) is

n \left( \log n + (m-1) \log \log n + C_m + o(1) \right)

for fixed m as n → ∞.

Here Cm is a constant that depends on m (but not on n) and o(1) is a term that goes to zero as n → ∞.

Since log log n is much smaller than log n when n is large, this says that collecting multiples of each coupon doesn’t take much longer, on average, than collecting one of each coupon. It takes substantially less time than playing the original coupon collector game m times.
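Here’s a small Monte Carlo sketch, not from [1], that estimates the expected number of draws needed to collect m copies of each of n coupons, which you can compare with the asymptotic formula above.

    import random

    def draws_needed(n, m):
        """Draw coupons uniformly at random until each of the n coupons
        has been seen at least m times; return the number of draws."""
        counts = [0] * n
        short = n  # coupons still short of m copies
        draws = 0
        while short > 0:
            c = random.randrange(n)
            counts[c] += 1
            if counts[c] == m:
                short -= 1
            draws += 1
        return draws

    n, m, reps = 100, 2, 1000
    print(sum(draws_needed(n, m) for _ in range(reps)) / reps)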


[1] Donald J. Newman and Lawrence Shepp. The Double Dixie Cup Problem. The American Mathematical Monthly, Vol. 67, No. 1 (Jan., 1960), pp. 58–61.

Morse code and the limits of human perception

Musician Adam Neely made a video asking What is the fastest music humanly possible? He emphasizes that he means the fastest music possible to hear, not the fastest to perform.

Screenshot of Bolton paper

The video cites a psychology article [1] from 1894 that found that most people can reliably distinguish an inter-onset interval (time between notes) of 100 ms [2]. It also gives examples of music faster than this, such as a performance of Vivaldi with an inter-onset interval of 83 ms [3]. The limit seems to be greater than 50 ms because a pulse train with an inter-onset interval of 50 ms starts to sound like a 20 Hz pitch.

People are able to receive Morse code faster than this implies is possible. We will explain how this works, but first we need to convert words per minute to inter-onset interval length.

Morse code timing

Morse code is made of dots and dashes, but it is also made of spaces of three different lengths: the space between the dots and dashes representing a single letter, the space between letters, and the space between words.

According to an International Telecommunication Union standard:

  • A dash is equal to three dots.
  • The space between the signals forming the same letter is equal to one dot.
  • The space between two letters is equal to three dots.
  • The space between two words is equal to seven dots.

The same timing is referred to as standard in a US Army manual from 1957.

Notice that all the numbers above are odd. Since a dot or dash is always followed by a space, the duration of a dot or dash and its trailing space is always an even multiple of the duration of a dot.

If we think of a dot as a sixteenth note, Morse code is made of notes that are either sixteenth notes or three sixteenth notes tied together. Rests are equal to one, three, or seven sixteenths, and notes and rests must alternate. All notes start on an eighth note boundary, i.e. either on a down beat or an up beat but not in between.

Words per minute

Morse code speed is measured in words per minute. But what exactly is a “word”? Words have a variable number of letters, and even words with the same number of letters can have very different durations in Morse code.

The most common standard is to use PARIS as the prototypical word. Ten words per minute, for example, means that dots and dashes are coming at you as fast as if someone were sending the word PARIS ten times per minute. Here’s a visualization of the code for PARIS with each square representing the duration of a dot.

■□■■■□■■■□■□□□■□■■■□□□■□■■■□■□□□■□■□□□■□■□■□□□□□□□

This has the duration of 50 dots.

How does this relate to inter-onset interval? If each dot duration is an interval, then n words per minute corresponds to 50n intervals per minute, or 60/(50n) = 1.2/n seconds per interval.

This would imply that 12 wpm would correspond to an inter-onset interval of around 100 ms, pushing the limit of perception. But 12 wpm is a beginner speed. It’s not uncommon for people to receive Morse code at 30 wpm. The world record, set by Theodore Roosevelt McElroy in 1939, is 75.2 wpm.
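Here’s the conversion as a tiny function, a sketch of the arithmetic above.

    def dot_ms(wpm):
        """Duration of one dot in milliseconds at a given words-per-minute
        rate, using the 50-dot PARIS standard."""
        return 1200 / wpm

    for wpm in [12, 20, 30, 75.2]:
        print(wpm, round(dot_ms(wpm), 1))
    # 12 wpm -> 100 ms, 20 wpm -> 60 ms, 30 wpm -> 40 ms, 75.2 wpm -> ~16 ms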

What’s going on?

In the preceding section I assumed a dot is an interval when calculating inter-onset interval length. In musical terms, this is saying a sixteenth note is an interval. But maybe we should count eighth notes as intervals. As noted before, every “note” (dot or dash) starts on a down beat or up beat. Still, that would say 20 wpm is pushing the limit of perception, and we know people can listen faster than that.

You don’t have to hear with the precision of the diagram above in order to recognize letters. And you have context clues. Maybe you can’t hear the difference between “E E E” and “O”, but in ordinary prose the latter is far more likely.

E E E vs O

At some skill level people quit hearing individual letters and start hearing words, much like experienced readers see words rather than letters. I’ve heard that this transition happens somewhere between 20 wpm and 30 wpm. That would be consistent with the estimate above that 20 wpm is pushing the limit of perception letter by letter. But 30 words per minute is doable. It’s impressive, but not unheard of.

What I find hard to believe is that there were intelligence officers, such as Terry Jackson, who could copy encrypted text at 50 wpm. Here a “word” is a five-letter code group. There are millions of possible code groups [4], all equally likely, and so it would be impossible to become familiar with particular code groups the way one can become familiar with common words. Maybe they learned to hear pairs or triples of letters.


[1] Thaddeus L. Bolton. Rhythm. The American Journal of Psychology. Vol. VI. No. 2. January, 1894. Available here.

[2] Interonset is not commonly hyphenated, but I’ve hyphenated it here for clarity.

[3] The movement Summer from Vivaldi’s The Four Seasons, performed at 180 bpm, which corresponds to 720 sixteenth notes per minute, each 83 ms long.

[4] If a code group consisted entirely of English letters, there are 26^5 = 11,881,376 possible groups. If a code group can contain digits as well, there would be 36^5 = 60,466,176 possible groups.

Running the Gregorian calendar backwards

Toward the end of last year I wrote several blog posts about calendars. The blog post about the Gregorian calendar began with this paragraph.

The time it takes for the earth to orbit the sun is not an integer multiple of the time it takes for the earth to rotate on its axis, nor is it a rational number with a small denominator. Why should it be? Much of the complexity of our calendar can be explained by rational approximations to an irrational number.

The post went on to say why the Gregorian calendar was designed as it was. In a nutshell, the average length of a year in the Gregorian calendar is 365 97/400 days, which matches the astronomical year much better than the Julian calendar, which has an average year length of 365 ¼ days.

In the Julian calendar, every year divisible by 4 is a leap year. The Gregorian calendar makes an exception: centuries are not leap years unless they are divisible by 400. So the year 2000 was a leap year, but 1700, 1800, and 1900 were not. Instead of having 100 leap years every 400 years, the Gregorian calendar has 97 leap years every 400 years.

Why does it matter whether the calendar year matches the astronomical year? In the short run it makes little difference, but in the long run it matters more. Under the Julian calendar, the Spring equinox occurred around March 21. If the world had remained on the Julian calendar, the date of the Spring equinox would drift later and later, moving into the summer and eventually moving through the entire year. Plants that bloom in March would eventually bloom in what we’d call July. And instead of the dog days of summer being in July, eventually they’d be in what we’d call November.

The Gregorian calendar wanted to do two things: stop the drift of the seasons and restore the Spring equinox to March 21. The former could have been accomplished with little disruption by simply using the Gregorian calendar moving forward. The latter was more disruptive since it required removing days from the calendar.

The Julian year was too long, gaining 3 days every 400 years. So between the time of Jesus and the time of Pope Gregory, the calendar had drifted by about 12 days. In order to correct for this, the calendar would have to jump forward about a dozen days. If you think moving clocks forward an hour is disruptive, imagine moving the calendar forward a dozen days.

The Gregorian calendar didn’t remove 12 days; it removed 10. In the first countries to adopt the new calendar in 1582, Thursday, October 4th in 1582 was followed by Friday, October 15th. Note that Thursday was followed by Friday as usual. The seven-day cycle of days of the week was not disrupted. That’ll be important later on.

Why did the Gregorian calendar remove 10 days and not 12?

We can think of the 10 days that were removed as corresponding to previous years that the Julian calendar considered a leap year but that the Gregorian calendar would not have: 1500, 1400, 1300, 1100, 1000, 900, 700, 600, 500, and 300. Removing 10 days put the calendar in sync astronomically with the 300’s. This is significant because the Council of Nicaea met in 325 and made decisions regarding the calendar. Removing 10 days in 1582 put the calendar in sync with the calendar at the time of the council.

Now let’s push the calendar back further. Most scholars say Jesus was crucified on Friday, April 3, 33 AD. What exactly does “April 3, 33” mean? Was that a Julian date or a Gregorian date? There’s a possible difference of two days, corresponding to whether or not the years 100 and 200 were considered leap years.

If we were to push the Gregorian calendar back to the first century, the calendar for April in 33 AD would be the same as the calendar in 2033 AD (five cycles of 400 years later). April 3, 2033 is on a Sunday. (You could look that up, or use the algorithm given here.) April 3, 33 in the Julian calendar corresponds to April 1, 33 in the Gregorian calendar, and April 1, 2033 falls two days earlier than April 3, on a Friday. So April 3, 33 was a Friday in the Julian calendar, the calendar in use at the time.
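You can check the day-of-week claims with Python’s datetime module, which uses the proleptic Gregorian calendar. This is a quick sketch, not part of the original argument.

    from datetime import date

    # weekday(): Monday is 0, ..., Sunday is 6
    print(date(2033, 4, 3).weekday())  # 6, a Sunday
    print(date(33, 4, 1).weekday())    # 4, a Friday; Gregorian April 1, 33
                                       # is Julian April 3, 33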

Some scholars date the crucifixion as Friday, April 7, 30 AD. That would also be a Julian date.


Fredholm Alternative

The Fredholm alternative is so called because it is a theorem by the Swedish mathematician Erik Ivar Fredholm that has two alternative conclusions: either this is true or that is true. This post will state a couple forms of the Fredholm alternative.

Mr. Fredholm was interested in the solutions to linear integral equations, but his results can be framed more generally as statements about solutions to linear equations.

This is the third in a series of posts, starting with a post on kernels and cokernels, followed by a post on the Fredholm index.

Fredholm alternative warmup

Given an m×n real matrix A and a column vector b, either

Ax = b

has a solution or

A^T y = 0 has a solution y with y^T b ≠ 0.

This is essentially what I said in an earlier post on kernels and cokernels. From that post:

Suppose you have a linear transformation TV → W and you want to solve the equation Tx = b. … If c is an element of W that is not in the image of T, then Tx = c has no solution, by definition. In order for Tx = b to have a solution, the vector b must not have any components in the subspace of W that is complementary to the image of T. This complementary space is the cokernel. The vector b must not have any component in the cokernel if Tx = b is to have a solution.

In this context you could say that the Fredholm alternative boils down to saying either b is in the image of A or it isn’t. If b isn’t in the image of A, then it has some component in the complement of the image of A, i.e. it has a component in the cokernel, the kernel of A^T.
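Here’s a small numerical illustration of the warmup statement, just a sketch. The matrix below has rank 1, and a vector y spanning the kernel of A^T detects whether b is in the image of A.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0]])  # rank 1
    y = np.array([2.0, -1.0])   # satisfies A^T y = 0, spans the cokernel

    b1 = np.array([1.0, 2.0])   # in the image of A (b1 = A @ [1, 0])
    b2 = np.array([1.0, 0.0])   # not in the image of A

    print(A.T @ y)  # [0. 0.]
    print(y @ b1)   # 0.0: Ax = b1 has a solution
    print(y @ b2)   # 2.0: nonzero, so Ax = b2 has no solution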

The Fredholm alternative

I’ve seen the Fredholm alternative stated several ways, and the following from [1] is the clearest. The “alternative” nature of the theorem is a corollary rather than being explicit in the theorem.

As stated above, Fredholm’s interest was in integral equations. These equations can be cast as operators on Hilbert spaces.

Let K be a compact linear operator on a Hilbert space H. Let I be the identity operator and A = I − K. Let A* denote the adjoint of A. Then:

  1. The null space of A is finite dimensional.
  2. The image of A is closed.
  3. The image of A is the orthogonal complement of the kernel of A*.
  4. The null space of A is 0 iff the image of A is H.
  5. The dimension of the kernel of A equals the dimension of the kernel of A*.

The last point says that the kernel and cokernel have the same dimension, and the first point says these dimensions are finite. In other words, the Fredholm index of A is 0.

Where is the “alternative” in this theorem?

The theorem says that there are two possibilities regarding the inhomogeneous equation

Ax = f.

One possibility is that the homogeneous equation

Ax = 0

has only the solution x = 0, in which case the inhomogeneous equation has a unique solution for all f in H.

The other possibility is that the homogeneous equation has non-zero solutions, and the inhomogeneous equation has a solution if and only if f is orthogonal to the kernel of A*, i.e. if f is orthogonal to the cokernel.

Freedom and constraint

We said in the post on kernels and cokernels that kernels represent degrees of freedom and cokernels represent constraints. We can add elements of the kernel to a solution and still have a solution. Requiring f to be orthogonal to the cokernel is a set of constraints.

If the kernel of A has dimension n, then the Fredholm alternative says the cokernel of A also has dimension n.

If solutions x to Ax = f have n degrees of freedom, then right-hand sides f must satisfy n constraints. Each degree of freedom for x corresponds to a basis element for the kernel of A. Each constraint on f corresponds to a basis element for the cokernel that f must be orthogonal to.

[1] Lawrence C. Evans. Partial Differential Equations, 2nd edition

Fredholm index

The previous post on kernels and cokernels mentioned that for a linear operator T: V → W, the index of T is defined as the difference between the dimension of its kernel and the dimension of its cokernel:

index T = dim ker T − dim coker T.

The index was first called the Fredholm index, because it came up in Fredholm’s investigation of integral equations. (More on this work in the next post.)

Robustness

The index of a linear operator is robust in the following sense. If V and W are Banach spaces and T: V → W is a continuous linear operator, then there is an open set around T in the space of continuous operators from V to W on which the index is constant. In other words, small changes to T don’t change its index.

Small changes to T may alter the dimension of the kernel or the dimension of the cokernel, but they don’t alter their difference.
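In finite dimensions this is easy to see numerically. Here’s a sketch: an operator between spaces of dimensions n and m, represented by an m × n matrix, always has index n − m, even though the kernel and cokernel dimensions individually jump under perturbation.

    import numpy as np

    def index(A, tol=1e-10):
        """dim ker A minus dim coker A for a matrix A."""
        m, n = A.shape
        rank = np.linalg.matrix_rank(A, tol)
        dim_ker = n - rank       # null space of A
        dim_coker = m - rank     # null space of A^T
        return dim_ker - dim_coker

    A = np.zeros((3, 2))         # dim ker = 2, dim coker = 3, index = -1
    B = A.copy()
    B[0, 0] = 1e-3               # perturbed: dim ker = 1, dim coker = 2
    print(index(A), index(B))    # -1 -1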

Relation to Fredholm alternative

The next post discusses the Fredholm alternative theorem. It says that if K is a compact linear operator on a Hilbert space and I is the identity operator, then the Fredholm index of I − K is zero. The post will explain how this relates to solving linear (integral) equations.

Analogy to Euler characteristic

We can make an exact sequence with the spaces V and W and the kernel and cokernel of T as follows:

0 → ker T → V → W → coker T → 0

All this means is that the image of one map is the kernel of the next.

We can take the alternating sum of the dimensions of the spaces in this sequence:

dim ker T − dim V + dim W − dim coker T.

If V and W have the same finite dimension, then this alternating sum equals the index of T.

The Euler characteristic is also an alternating sum. For a simplex, the Euler characteristic is defined by

V − E + F

where V is the number of vertices, E the number of edges, and F the number of faces. We can extend this to higher dimensions as the number of zero-dimensional objects (vertices), minus the number of one-dimensional objects (edges), plus the number of two-dimensional objects, minus the number of three-dimensional objects, etc.

A more sophisticated definition of Euler characteristic is the alternating sum of the dimensions of cohomology spaces. These also form an exact sequence.

The Atiyah-Singer index theorem says that for elliptic operators on manifolds, two kinds of index are equal: the analytical index and the topological index. The analytical index is essentially the Fredholm index. The topological index is derived from topological information about the manifold.

This is analogous to the Gauss-Bonnet theorem that says you can find the Euler characteristic, a topological invariant, by integrating Gauss curvature, an analytic calculation.

Other posts in this series

This is the middle post in a series of three. The first was on kernels and cokernels, and the next is on the Fredholm alternative.