My favorite paper: H = W

A paper came out in 1964 with the title “H = W.” The remarkably short title was not cryptic, however. The people for whom the paper was written knew exactly what it meant. There were two families of function spaces, one denoted with H and another denoted with W, that were speculated to be the same, and this paper proved that indeed they were.

This post will explain the significance of the paper. It will be more advanced and less self-contained than most of my posts. If you’re still interested, read on.

Definitions

The derivative of a function might exist in a generalized sense when it doesn’t exist in the classical sense. I give an introduction to this idea in the post How to differentiate a non-differentiable function. The generalized derivative is a linear functional on test functions [1] and may not correspond to a classical function. The delta “function” δ(x) is the classic example.

A regular distribution is a distribution whose action on a test function is equal to multiplying the test function by a locally integrable function and integrating.

Given Ω ⊂ ℝ^n, the Sobolev space W^{k,p}(Ω) consists of functions whose partial derivatives of order up to k are regular distributions that lie in the space L^p(Ω).

For example, let I be the interval (−1, 1) and let f(x) = |x|. The function f is not differentiable at 0, but the generalized derivative of f is the sign function sgn(x), which is in L^p(I) for all p. The generalized derivative of sgn(x) is 2δ(x), which is not a regular distribution [2], and so f ∈ W^{1,p}(I) but f ∉ W^{2,p}(I).

The norm on W^{k,p}(Ω) is the sum of the L^p norms of the function and each of its partial derivatives up to order k.
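
Here is a quick numerical sketch of the example above, using Python with NumPy and SciPy; the bump function φ below is my own choice of test function. The check is that ∫ |x| φ′(x) dx = −∫ sgn(x) φ(x) dx, which is what it means for sgn to be the generalized derivative of |x|, followed by the W^{1,2}(I) norm of |x| computed per the definition just given.

import numpy as np
from scipy.integrate import quad

def phi(x):
    # a smooth test function supported on (-1, 1)
    return np.exp(-1.0 / (1.0 - x**2)) if abs(x) < 1 else 0.0

def dphi(x):
    # derivative of phi
    return phi(x) * (-2.0 * x / (1.0 - x**2)**2) if abs(x) < 1 else 0.0

lhs, _ = quad(lambda x: abs(x) * dphi(x), -1, 1)
rhs, _ = quad(lambda x: -np.sign(x) * phi(x), -1, 1)
print(lhs, rhs)   # the two integrals agree

# W^{1,2}(I) norm of |x|: the L^2 norm of the function plus the L^2 norm
# of its generalized derivative
f_norm  = quad(lambda x: abs(x)**2, -1, 1)[0] ** 0.5
df_norm = quad(lambda x: np.sign(x)**2, -1, 1)[0] ** 0.5
print(f_norm + df_norm)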

The Sobolev space H^{k,p}(Ω) is the closure of the smooth functions with finite W^{k,p} norm in the norm of the space W^{k,p}(Ω). (Smooth functions here are not required to have compact support; the closure of the test functions is in general a smaller space.)

Theorem

It’s not obvious a priori that these two ways of defining a Sobolev space are equivalent, but James Serrin and Norman George Meyers [3] proved in 1964 that for all domains Ω, and for all non-negative integers k, and for all 1 ≤ p < ∞ we have

H^{k,p}(Ω) = W^{k,p}(Ω).

The proof is remarkably brief, less than a page.

Significance

Why does this theorem matter? Sobolev spaces are the machinery of the modern theory of differential equations. I spent a lot of time in my mid 20s working with Sobolev spaces.

The grand strategy of PDE research is to first search for generalized solutions to an equation, solutions belonging to a Sobolev space, then if possible prove that the generalized solution is in fact a classical solution.

This is analogous to first proving an algebraic equation has a complex solution, then proving that the complex solution is real, or proving that an equation has a real number solution, then proving that the real solution is in fact an integer. It’s easier to first find a solution in a larger space, then if possible show that the thing you found belongs to a smaller space.


[1] A test function in this context is an infinitely differentiable function of compact support. In other contexts a test function is not required to have compact support but is required to approach zero asymptotically, faster than the reciprocal of any polynomial.

[2] The classical derivative of sgn(x) is equal to zero almost everywhere. But the derivative as a distribution is not zero. The pointwise derivative may not equal the generalized derivative.

[3] Meyers, Norman G.; Serrin, James (1964). “H = W”. Proceedings of the National Academy of Sciences 51 (6): 1055–1056.

Transpose and Adjoint

The transpose of a matrix turns the matrix sideways. Suppose A is an m × n matrix with real number entries. Then the transpose A^T is an n × m matrix, and the (i, j) element of A^T is the (j, i) element of A. Very concrete.

The adjoint of a linear operator is a more abstract concept, though it’s closely related. The matrix A^T is sometimes called the adjoint of A. That may be fine, or it may cause confusion. This post will define the adjoint in a more general context, then come back to the context of matrices.

This post and the next will be more abstract than usual. After indulging in a little pure math, I’ll return soon to more tangible topics such as Morse code and barbershop quartets.

Dual spaces

Before we can define adjoints, we need to define dual spaces.

Let V be a vector space over a field F. You can think of F as ℝ or ℂ. Then V* is the dual space of V, the space of linear functionals on V, i.e. the vector space of linear functions from V to F.

The distinction between a vector space and its dual seems artificial when the vector space is ℝ^n. The dual space of ℝ^n is isomorphic to ℝ^n, and so the distinction between them can seem pedantic. It’s easier to appreciate the distinction between V and V* when the two spaces are not isomorphic.

For example, let V be L^3(ℝ), the space of functions f such that |f|^3 has a finite Lebesgue integral. Then the dual space is L^{3/2}(ℝ). The difference between these spaces is not simply a matter of designation. There are functions f such that the integral of |f|^3 is finite but the integral of |f|^{3/2} is not, and vice versa.
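
To make that last claim concrete, here is a short symbolic check using Python and SymPy. The two functions are my own illustrative choices: each is x^{−1/2}, restricted to a different interval and taken to be zero elsewhere.

from sympy import symbols, integrate, Rational, oo

x = symbols('x', positive=True)

g = x**Rational(-1, 2)                            # think of g as zero outside (1, oo)
print(integrate(g**3, (x, 1, oo)))                # 2: finite, so g is in L^3
print(integrate(g**Rational(3, 2), (x, 1, oo)))   # oo: g is not in L^{3/2}

f = x**Rational(-1, 2)                            # think of f as zero outside (0, 1)
print(integrate(f**3, (x, 0, 1)))                 # oo: f is not in L^3
print(integrate(f**Rational(3, 2), (x, 0, 1)))    # 4: finite, so f is in L^{3/2}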

Adjoints

The adjoint of a linear operator T: V → W is a linear operator T*: W* → V* where V* and W* are the dual spaces of V and W respectively. So T* takes a linear function from W to the field F, and returns a function from V to F. How does T* do this?

Given an element w* of W*, T*w* takes a vector v in V and maps it to F by

(T*w*)(v) = w*(Tv).

In other words, T* takes a functional w* on W and turns it into a function on V by mapping elements of V over to W and then letting w* act on them.
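
To see the definition in action in a setting where everything can be written down, here is a small Python sketch with V = ℝ^3 and W = ℝ^2. The particular matrix and functional are arbitrary choices for illustration; a functional on ℝ^2 is represented by a row vector.

import numpy as np

T = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # a linear map T: R^3 -> R^2

a = np.array([2.0, -1.0])            # w* as a row vector: w*(w) = a @ w

# T* w* is the functional v -> w*(Tv). As a row vector it is a @ T.
b = a @ T                            # represents T* w* on R^3

v = np.array([1.0, -1.0, 2.0])
print(a @ (T @ v))                   # w*(Tv)
print(b @ v)                         # (T* w*)(v), the same number

In other words, on row vectors T* acts by multiplying by T on the right, which is the same as multiplying column vectors by the transpose of T. This is a preview of the matrix discussion below.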

Note what this definition does not contain. There is no mention of inner products or bases or matrices.

The definition is valid over vector spaces that might not have an inner product or a basis. And this is not just a matter of perspective. It’s not as if our space has an inner product but we choose to ignore it; we might be working over spaces where it is not possible to define an inner product or a basis, such as ℓ^∞, the space of bounded sequences.

Since a matrix represents a linear operator with respect to some basis, you can’t speak of a matrix representation of an operator on a space with no basis.

Bracket notation

For a vector space V over a field F, denote a function ⟨ ·, · ⟩ that takes an element from V and an element from V* and returns an element of F by applying the latter to the former. That is, ⟨ v, v* ⟩ is defined to be the action of v* on v. This is not an inner product, but the notation is meant to suggest a connection to inner products.

With this notation, we have

⟨ Tv, w* ⟩_W = ⟨ v, T*w* ⟩_V

for all v in V and for all w* in W* by definition. This is the definition of T* in different notation. The subscripts on the brackets are meant to remind us that the left side of the equation is an element of F obtained by applying an element of W* to an element of W, while the right side is an element of F obtained by applying an element of V* to an element of V.

Inner products

The development of the adjoint above emphasized that there is not necessarily an inner product in sight. But if there are inner products on V and W, then we can turn an element v of V into an element of V* by associating v with ⟨ ·, v ⟩, where now the brackets do denote an inner product.

Now we can write the definition of adjoint as

Tv, w ⟩W = ⟨ v, T*w V

for all v in V and for all w in W. This definition is legitimate, but it’s not natural in the technical sense that it depends on our choices of inner products and not just on the operator T and the spaces V and W. If we chose different inner products on V and W then the definition of T* changes as well.

Back to matrices

We have defined the adjoint of a linear operator in a very general setting where there may be no matrices in sight. But now let’s look at the case of T: V → W where V and W are finite dimensional vector spaces, either over ℝ or ℂ. (The difference between ℝ and ℂ will matter.) And let’s define inner products on V and W. This is always possible because they are finite dimensional.

How does a matrix representation of T* correspond to a matrix representation of T?

Real vector spaces

Suppose V and W are real vector spaces and A is a matrix representation of T: V → W with respect to some choice of basis on each space. Suppose also that the bases for V* and W* are given by the duals of the bases for V and W. Then the matrix representation of T* is the transpose of A. You can verify this by showing that

Av, w ⟩W = ⟨ v, Aw V

for all v in V and for all w in W.

The adjoint of A is simply the transpose of A, subject to our choice of bases and inner products.
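
Here is a quick numerical spot check of that identity with the standard inner product on each space; the matrix and vectors below are random and purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))      # A represents T: R^4 -> R^3
v = rng.standard_normal(4)
w = rng.standard_normal(3)

print(np.dot(A @ v, w))              # <Av, w>_W
print(np.dot(v, A.T @ w))            # <v, A^T w>_V, the same number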

Complex vector spaces

Now consider the case where V and W are vector spaces over the complex numbers. Everything works as above, with one wrinkle. If A is the representation of T: V → W with respect to a given basis, and we choose bases for V* and W* as above, then the conjugate of A^T is the matrix representation of T*. The adjoint of A is A*, the conjugate transpose of A. As before, you can verify this by showing that

Av, w ⟩W = ⟨ v, A*w V

We have to take the conjugate of A because the inner product in the complex case requires taking a conjugate of one side.
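
A spot check of the complex case along the same lines. NumPy’s vdot conjugates its first argument, and that is the convention used here for the complex inner product; the random matrix and vectors are again just for illustration.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

Astar = A.conj().T                   # conjugate transpose of A

print(np.vdot(A @ v, w))             # <Av, w>_W
print(np.vdot(v, Astar @ w))         # <v, A* w>_V, the same number
print(np.vdot(v, A.T @ w))           # plain transpose: a different number in general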


Fredholm Alternative

The Fredholm alternative is so called because it is a theorem by the Swedish mathematician Erik Ivar Fredholm that has two alternative conclusions: either this is true or that is true. This post will state a couple forms of the Fredholm alternative.

Mr. Fredholm was interested in the solutions to linear integral equations, but his results can be framed more generally as statements about solutions to linear equations.

This is the third in a series of posts, starting with a post on kernels and cokernels, followed by a post on the Fredholm index.

Fredholm alternative warmup

Given an m×n real matrix A and a column vector b, either

Ax = b

has a solution or

A^T y = 0 has a solution y with y^T b ≠ 0.

This is essentially what I said in an earlier post on kernels and cokernels. From that post:

Suppose you have a linear transformation T: V → W and you want to solve the equation Tx = b. … If c is an element of W that is not in the image of T, then Tx = c has no solution, by definition. In order for Tx = b to have a solution, the vector b must not have any components in the subspace of W that is complementary to the image of T. This complementary space is the cokernel. The vector b must not have any component in the cokernel if Tx = b is to have a solution.

In this context you could say that the Fredholm alternative boils down to saying either b is in the image of A or it isn’t. If b isn’t in the image of A, then it has some component in the complement of the image of A, i.e. it has a component in the cokernel, the kernel of A^T.
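
Here is a small numerical illustration of the alternative in Python; the rank-deficient matrix below is an arbitrary choice, picked so that the second alternative can actually occur.

import numpy as np
from scipy.linalg import lstsq, null_space

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])           # rank 1, so not every b is reachable

def which_alternative(b):
    x, *_ = lstsq(A, b)              # least squares candidate solution
    if np.allclose(A @ x, b):
        return "Ax = b has a solution"
    y = null_space(A.T)[:, 0]        # a solution of A^T y = 0 ...
    assert not np.isclose(y @ b, 0)  # ... with y^T b != 0
    return "A^T y = 0 has a solution y with y^T b != 0"

print(which_alternative(np.array([1.0, 2.0])))   # b in the image of A
print(which_alternative(np.array([1.0, 0.0])))   # b not in the image of A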

The Fredholm alternative

I’ve seen the Fredholm alternative stated several ways, and the following from [1] is the clearest. The “alternative” nature of the theorem is a corollary rather than being explicit in the theorem.

As stated above, Fredholm’s interest was in integral equations. These equations can be cast as operators on Hilbert spaces.

Let K be a compact linear operator on a Hilbert space H. Let I be the identity operator and A = I − K. Let A* denote the adjoint of A. Then the following hold.

  1. The null space of A is finite dimensional.
  2. The image of A is closed.
  3. The image of A is the orthogonal complement of the kernel of A*.
  4. The null space of A is 0 iff the image of A is H.
  5. The dimension of the kernel of A equals the dimension of the kernel of A*.

The last point says that the kernel and cokernel have the same dimension, and the first point says these dimensions are finite. In other words, the Fredholm index of A is 0.

Where is the “alternative” in this theorem?

The theorem says that there are two possibilities regarding the inhomogeneous equation

Ax = f.

One possibility is that the homogeneous equation

Ax = 0

has only the solution x = 0, in which case the inhomogeneous equation has a unique solution for all f in H.

The other possibility is that the homogeneous equation has non-zero solutions, and the inhomogeneous equation has a solution if and only if f is orthogonal to the kernel of A*, i.e. if f is orthogonal to the cokernel.

Freedom and constraint

We said in the post on kernels and cokernels that kernels represent degrees of freedom and cokernels represent constraints. We can add elements of the kernel to a solution and still have a solution. Requiring f to be orthogonal to the cokernel is a set of constraints.

If the kernel of A has dimension n, then the Fredholm alternative says the cokernel of A also has dimension n.

If solutions x to Ax = f have n degrees of freedom, then right-hand sides f must satisfy n constraints. Each degree of freedom for x corresponds to a basis element for the kernel of A. Each constraint on f corresponds to a basis element for the cokernel that f must be orthogonal to.
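
A finite dimensional sketch of this bookkeeping: for the arbitrarily chosen rank-deficient square matrix below, the kernel of A and the kernel of A^T have the same dimension, and a right-hand side is reachable exactly when it is orthogonal to the cokernel.

import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 2.0, 1.0]])      # third row = first + second, so rank 2

ker   = null_space(A)                # degrees of freedom in solutions
coker = null_space(A.T)              # constraints on right-hand sides
print(ker.shape[1], coker.shape[1])  # both are 1

f_good = A @ np.array([1.0, 2.0, 3.0])   # in the image by construction
f_bad  = f_good + coker[:, 0]            # add a cokernel component
print(np.allclose(coker.T @ f_good, 0))  # True: Ax = f_good is solvable
print(np.allclose(coker.T @ f_bad, 0))   # False: Ax = f_bad has no solution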

[1] Lawrence C. Evans. Partial Differential Equations, 2nd edition

Inner product from norm

If a vector space has an inner product, it has a norm: you can define the norm of a vector to be the square root of the inner product of the vector with itself.

||v|| \equiv \langle v, v \rangle^{1/2}

You can use the defining properties of an inner product to show that

\langle v, w \rangle = \frac{1}{2}\left( || v + w ||^2 - ||v||^2 - ||w||^2 \right )

This is a form of the so-called polarization identity. It implies that you can calculate inner products if you can compute norms.
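
Here is a quick numerical check of the identity for the ordinary dot product on ℝ³; the vectors are random and purely illustrative.

import numpy as np

rng = np.random.default_rng(2)
v = rng.standard_normal(3)
w = rng.standard_normal(3)

norm = np.linalg.norm
lhs = np.dot(v, w)
rhs = 0.5 * (norm(v + w)**2 - norm(v)**2 - norm(w)**2)
print(lhs, rhs)                      # equal, up to rounding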

So does this mean you can define an inner product on any space that has a norm?

No, it doesn’t work that way. The polarization identity says that if you have a norm that came from an inner product then you can recover that inner product from norms.

What would go wrong if we tried to use the equation above to define an inner product on a space that doesn’t have one?

Take the plane R² with the max norm, i.e.

|| (x, y) || \equiv \max(|x|, |y|)

and define a function that takes two vectors and returns the right-hand side of the polarization identity.

f(v, w) = \frac{1}{2}\left( || v + w ||^2 - ||v||^2 - ||w||^2 \right )

This is a well-defined function, but it’s not an inner product. An inner product is bilinear; in particular, if you multiply one of the arguments by a constant, you multiply the inner product by the same constant.

To see that f is not an inner product, let v = (1, 0) and w = (0, 1). Then f(v, w) = −1/2, but f(2v, w) is also −1/2. Multiplying the first argument by 2 did not multiply the result by 2.
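
The same computation in Python, with max_norm and f defined to match the formulas above:

import numpy as np

def max_norm(v):
    return np.max(np.abs(v))

def f(v, w):
    return 0.5 * (max_norm(v + w)**2 - max_norm(v)**2 - max_norm(w)**2)

v = np.array([1.0, 0.0])
w = np.array([0.0, 1.0])

print(f(v, w))        # -0.5
print(f(2 * v, w))    # -0.5 again, not -1.0, so f is not bilinear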

When we say that R² with the max norm doesn’t have an inner product, it’s not simply that we forgot to define one. We cannot define an inner product that is consistent with the norm structure.

Duals and double duals of Banach spaces

The canonical examples of natural and unnatural transformations come from linear algebra, namely the relation between a vector space and its first and second duals. We will look briefly at the finite dimensional case, then concentrate on the infinite dimensional case.

Two finite-dimensional vector spaces over the same field are isomorphic if and only if they have the same dimension.

For a finite dimensional space V, its dual space V* is defined to be the vector space of linear functionals on V, that is, the set of linear functions from V to the underlying field. The space V* has the same dimension as V, and so the two spaces are isomorphic. You can do the same thing again, taking the dual of the dual, to get V**. This also has the same dimension, and so V is isomorphic to V** as well as V*. However, V is naturally isomorphic to V** but not to V*. That is, the transformation from V to V* is not natural.

Some things in linear algebra are easier to see in infinite dimensions, i.e. in Banach spaces. Distinctions that seem pedantic in finite dimensions clearly matter in infinite dimensions.

The category of Banach spaces considers linear spaces and continuous linear transformations between them. In a finite dimensional Euclidean space, all linear transformations are continuous, but in infinite dimensions a linear transformation is not necessarily continuous.

The dual of a Banach space V is the space of continuous linear functions on V. Now we can see examples of where not only is V* not naturally isomorphic to V, it’s not isomorphic at all.

For any real p > 1, let q be the number such that 1/p + 1/q = 1. The Banach space L^p is defined to be the set of (equivalence classes of) Lebesgue measurable functions f such that the integral of |f|^p is finite. The dual space of L^p is L^q. If p does not equal 2, then these two spaces are different. (If p does equal 2, then so does q; L^2 is a Hilbert space and its dual is indeed the same space.)

In the finite dimensional case, a vector space V is isomorphic to its second dual V**. In general, V can be embedded into V**, but V** might be a larger space. The embedding of V in V** is natural, both in the intuitive sense and in the formal sense of natural transformations, discussed in the previous post. We can turn an element of V into a linear functional on linear functions on V as follows.

Let v be an element of V and let f be an element of V*. The action of v on f is simply f(v). That is, v acts on linear functions by letting them act on it!
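
In finite dimensions this is easy to write out. In the sketch below V = ℝ³, a functional on V is represented by a vector of coefficients, and the embedded copy of v in V** acts on a functional by evaluating it at v. The particular vectors are arbitrary.

import numpy as np

v = np.array([1.0, 2.0, 3.0])        # an element of V
f = np.array([4.0, 0.0, -1.0])       # a functional on V: f(x) = f @ x

def v_in_double_dual(functional):    # the image of v in V**
    return functional @ v            # let the functional act on v

print(v_in_double_dual(f))           # f(v) = 4*1 + 0*2 - 1*3 = 1
print(f @ v)                         # the same thing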

This shows that some elements of V** come from evaluation at elements of V, but there could be more. Returning to the example of Lebesgue spaces above, the dual of L^1 is L^∞, the space of essentially bounded functions. But the dual of L^∞ is larger than L^1. That is, one way to construct a continuous linear functional on bounded functions is to multiply them by an absolutely integrable function and integrate. But there are other ways to construct linear functionals on L^∞.

A Banach space V is reflexive if the natural embedding of V in V** is an isomorphism. For p > 1, the spaces L^p are reflexive.

However, R. C. James proved the surprising result that there are Banach spaces that are isomorphic to their second duals, but not naturally. That is, there are spaces V where V is isomorphic to V**, but not via the natural embedding; the natural embedding of V into V** is not an isomorphism.

Related: Applied functional analysis

Some ways linear algebra is different in infinite dimensions

There’s no notion of continuity in linear algebra per se. It’s not part of the definition of a vector space. But a finite dimensional vector space over the reals is isomorphic to a Euclidean space of the same dimension, and so we usually think of such spaces as Euclidean. (We’re only going to consider real vector spaces in this post.) And there we have a notion of distance, a norm, and hence a topology and a way to say whether a function is continuous.

Continuity

In finite dimensional Euclidean space, linear functions are continuous. You can put a different norm on a Euclidean space than the one it naturally comes with, but all norms give rise to the same topology and hence the same continuous functions. (This is useful in numerical analysis where you’d like to look at a variety of norms. The norms give different analytical results, but they’re all topologically equivalent.)

In an infinite dimensional normed space, linear functions are not necessarily continuous. If the dimension of a space is only a trillion, all linear functions are continuous, but when you jump from high dimension to infinite dimension, you can have discontinuous linear functions. But if you look at this more carefully, there isn’t really a sudden change.

If a linear function is discontinuous, its finite dimensional approximations are continuous, but the degree of continuity degrades as dimension increases. For example, suppose a linear function stretches the nth basis vector by a factor of n. The bigger n gets, the more the function stretches in the nth dimension. As long as n is bounded, this is continuous, but in a sense it is less continuous as n increases. The fact that the infinite dimensional version is discontinuous tells you that the finite dimensional versions, while technically continuous, scale poorly with dimension. (See practical continuity for more discussion along these lines.)
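
A quick sketch of that example in Python: the finite dimensional truncations of the stretching operator are the diagonal matrices diag(1, 2, …, n), each continuous, but with operator norm equal to n.

import numpy as np

for n in [10, 100, 1000]:
    T = np.diag(np.arange(1, n + 1, dtype=float))  # stretch k-th basis vector by k
    op_norm = np.linalg.norm(T, ord=2)             # largest singular value
    print(n, op_norm)                              # the operator norm is n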

Completeness

A Banach space is a complete normed linear space. Finite dimensional normed spaces are always complete (i.e. every Cauchy sequence in the space converges to a point in the space) but this might not happen in infinite dimensions.
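
For a concrete sketch of how completeness can fail, take the space of polynomials on [0, 1] with the sup norm, an infinite dimensional normed space of my choosing. The Taylor partial sums of exp form a Cauchy sequence in that space, but their limit, exp itself, is not a polynomial.

import numpy as np
from math import factorial

x = np.linspace(0, 1, 1001)

def partial_sum(n):                  # n-th Taylor polynomial of exp at 0
    return sum(x**k / factorial(k) for k in range(n + 1))

for n in [5, 10, 15]:
    # sup-norm distance between consecutive partial sums shrinks like 1/(n+1)!
    print(n, np.max(np.abs(partial_sum(n + 1) - partial_sum(n))))

print(np.max(np.abs(np.exp(x) - partial_sum(15))))   # the sums converge to exp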

Duals and double duals

In basic linear algebra, the dual of a vector space V is the space of linear functionals on V, i.e. the set of linear maps from V to the reals. This space is denoted V*. If V has dimension n, then V* also has dimension n, and all n-dimensional spaces are isomorphic, so the distinction between a space and its dual seems pedantic. But in general a Banach space and its dual are not isomorphic, and so it’s easier to tell them apart.

The second dual of a vector space, V**, is the dual of the dual space. In finite dimensional spaces, V** is naturally isomorphic to V. In Banach spaces, V is isomorphic to a subspace of V**. And even when V is isomorphic to V**, it might not be naturally isomorphic to V**. (Here “natural” means natural in the category theory sense of natural transformations.)


We got the definition wrong

When I was in grad school, I had a course in Banach spaces with Haskell Rosenthal. One day he said “We got the definition wrong.” It took a while to understand what he meant.

There’s nothing logically inconsistent about the definition of Banach spaces. What I believe he meant is that the definition is too broad to permit nice classification theorems.

I had intended to specialize in functional analysis in grad school, but my impression after taking that course was that researchers in the field, at least locally, were only interested in questions of the form “Does every Banach space have the property …” In my mind, this translated to “Can you construct a space so pathological that it lacks a property enjoyed by every space that anyone cares about?” This was not for me.

I ended up studying differential equations. I found it more interesting to use Banach spaces to prove theorems about PDEs than to study them for their own sake. From my perspective there was nothing wrong with their definition.
