Cepstral alanysis vocabulary

An earlier post defined cepstrum and quefrency. This post explains some of the other quirky terms introduced in the same paper by Bogert, Healy, and Tukey. (Given Tukey’s delight in coining words, we can assume he was the member of the trio responsible for the new terms.)

The paper [1] explains the reason for the new twists on familiar words:

In general, we find ourselves operating on the frequency side in ways customary on the time side and vice versa. Experience has made it clear that “words that sound like other words,” although strange at first sight, considerably reduce confusion on balance. These parallel or “paraphrased” words are made by the interchange of consonants or consonant groups, as in “alanysis” from “analysis,” and are introduced as needed.

The magnitude and phase of a cepstrum are called gamnitude and saphe. (The latter explains the pun “saphe cracking” in the title.)

Filtering in the cepstral domain is called liftering. A high-pass filter corresponds to a long-pass lifter and a low-pass filter corresponds to a short-pass lifter.

Harmonics in spectra correspond to rahmonics in cepstra.

Some of these terms are helpful. As explained in the previous post, the independent variable in cepstral analysis, quefrency, differs enough from frequency that it helps to have a separate term for it. Using the terms long and short rather than high and low is helpful for the same reason. Using repiod for the analog of period seems gratuitous, but maybe it’s necessary for consistency. Once you introduce some new terminology, you have to keep going.

[1] Bruce P. Bogert, M. J. R. Healy, and John W. Tukey. “The Quefrency Analysis of Time Series for Echoes: Cepstrum, Pseudo-Autocovariance, Cross-Cepstrum and Saphe Cracking.” In The Collected Works of John W. Tukey, Volume 1.

Cepstrum, quefrency, and pitch


John Tukey coined many terms that have passed into common use, such as bit (a shortening of binary digit) and software. Other terms he coined are well known within their niche: boxplot, ANOVA, rootogram, etc. Some of his terms, such as jackknife and vacuum cleaner, were not new words per se but common words he gave a technical meaning to.

Cepstrum is an anagram of spectrum. It involves an unusual use of power spectra, and is roughly analogous to making anagrams of a word. A related term, one we will get to shortly, is quefrency, an anagram of frequency. Some people pronounce the ‘c’ in cepstrum hard (like ‘k’) and some pronounce it soft (like ‘s’).

Let’s go back to an example from my post on guitar distortion. Here’s a note played with a fairly large amount of distortion:


And here is its power spectrum:

[Figure: power spectrum of a single note with distortion]

There’s a lot going on in the spectrum, but the peaks are very regularly spaced. As I mentioned in the post on the sound of a leaf blower, this is the fingerprint of a sound with a definite pitch. Spikes in the spectrum alone don’t indicate a definite pitch if they are irregularly spaced.

The peaks are fairly periodic. How do you find periodic patterns in a signal? Fourier transform! But if you simply take the Fourier transform of a Fourier transform, you essentially get the original signal back (reversed in time). The key to the cepstrum is to do something else between the two Fourier transforms.

The cepstrum starts by taking the Fourier transform, then the magnitude, then the logarithm, and then the inverse Fourier transform.

When we take the magnitude, we throw away phase information, which we don’t need in this context. Taking the log of the magnitude is essentially what you do when you compute sound pressure level. Some define the cepstrum using the magnitude of the Fourier transform and some the magnitude squared. Since log |X|² = 2 log |X|, squaring only introduces a factor of 2 once we take logs, so it doesn’t affect the location of peaks, only their amplitude.

Taking the logarithm compresses the peaks, bringing them all into roughly the same range, making the sequence of peaks roughly periodic.

When we take the inverse Fourier transform, we now have something like a frequency, but inverted. This is what Tukey called quefrency.

Looking at the guitar power spectrum above, we see a sequence of peaks spaced 440 Hz apart. When we take the inverse Fourier transform of this, we’re looking at a sort of frequency of a frequency, what Tukey calls quefrency. The quefrency scale is inverted: sounds with a high-frequency fundamental have overtones that are far apart in the frequency domain, so the sequence of overtone peaks repeats slowly, i.e. it has a low quefrency.
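To make the four steps concrete, here is a minimal NumPy sketch. It is not the code behind the plots in this post: the helper names, the synthetic test tone (20 harmonics of 240 Hz), and the small `floor` offset that keeps the logarithm finite are all assumptions. A periodic Hann window is applied so that the cepstral peak at the pitch period stands out from the peaks at its multiples.

```python
import numpy as np

def real_cepstrum(signal, floor=1e-6):
    """FFT -> magnitude -> log -> inverse FFT.

    `floor` (an assumption, not part of the definition) is an offset relative
    to the largest magnitude that keeps log() away from zero.
    """
    magnitude = np.abs(np.fft.fft(signal))
    log_magnitude = np.log(magnitude + floor * magnitude.max())
    return np.fft.ifft(log_magnitude).real

def estimate_pitch(signal, sample_rate, min_quefrency=40):
    """Return the quefrency (in samples) of the largest cepstral peak and its frequency.

    Quefrencies below `min_quefrency` are skipped because the huge peak near 0
    would otherwise win.
    """
    cepstrum = real_cepstrum(signal)
    half = len(cepstrum) // 2
    q = min_quefrency + np.argmax(cepstrum[min_quefrency:half])
    return q, sample_rate / q

# Hypothetical test tone: harmonics k = 1..20 of 240 Hz with amplitudes 1/k
sample_rate = 48000
n = 4800
t = np.arange(n) / sample_rate
tone = sum(np.cos(2 * np.pi * 240 * k * t) / k for k in range(1, 21))
window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)  # periodic Hann window

q, pitch = estimate_pitch(tone * window, sample_rate)
```

With a 48000 Hz sample rate the peak lands at quefrency 200 = 48000/240, the pitch period measured in samples.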

Here’s the plot of the cepstrum for the guitar sample.

[Figure: cepstrum of the electric guitar note]

There’s a big peak at 109 on the quefrency scale. The audio clip was recorded at 48000 samples per second, so the 109 on the quefrency scale corresponds to a frequency of 48000/109 ≈ 440 Hz. The second peak is at quefrency 215, which corresponds to 48000/215 ≈ 223 Hz. The second peak corresponds to the perceived pitch of the note, A3, and the first peak corresponds to its first overtone (its second harmonic), A4. (Remember the quefrency scale is inverted relative to the frequency scale.)
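The arithmetic in this paragraph, sample rate divided by quefrency and then naming the nearest note, can be sketched as follows. The equal-temperament naming relative to A4 = 440 Hz is an assumption added for illustration, and the function names are hypothetical.

```python
import math

NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def quefrency_to_hz(q, sample_rate=48000):
    """A quefrency of q samples corresponds to a frequency of sample_rate / q."""
    return sample_rate / q

def nearest_note(freq_hz):
    """Name of the equal-tempered note closest to freq_hz, e.g. 'A3'."""
    n = round(12 * math.log2(freq_hz / 440))  # semitones relative to A4
    octave = 4 + (n + 9) // 12                # octave numbers change at C, not A
    return NOTE_NAMES[n % 12] + str(octave)

for q in (109, 215):
    f = quefrency_to_hz(q)
    print(q, round(f), nearest_note(f))
```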

I cheated a little bit in the plot above. The very highest peaks are at 0. They are so large that they make it hard to see the peaks we’re most interested in. These low quefrency peaks correspond to very high frequency noise, near the edge of the audible spectrum or beyond.

Bring out your equations!

Nice discussion from Fundamentals of Kalman Filtering: A Practical Approach by Paul Zarchan and Howard Musoff:

Often the hardest part in Kalman filtering is the subject that no one talks about—setting up the problem. This is analogous to the quote from the recent engineering graduate who, upon arriving in industry, enthusiastically says, “Here I am, present me with your differential equations!” As the naive engineering graduate soon found out, problems in the real world are frequently not clear and are subject to many interpretations. Real problems are seldom presented in the form of differential equations, and they usually do not have unique solutions.

Whether it’s Kalman filters, differential equations, or anything else, setting up the problem is the hard part, or at least a hard part.

On the other hand, it’s about as impractical to only be able to set up problems as it is to only be able to solve them. You have to know what kinds of problems can be solved, and how accurately, so you can formulate a problem in a tractable way. There’s a feedback loop: provisional problem formulation, attempted solution, revised formulation, etc. It’s ideal when one person can set up and solve a problem, but it’s enough for the formulators and solvers to communicate well and have some common ground.


Family tree numbering

When you draw a tree of your ancestors, things quickly get out of hand. There are twice as many nodes each time you go back a generation, and so the size of paper you need grows exponentially. Things also get messy because typically you know much more about some lines than others. If you know much about your ancestry, one big tree isn’t going to work.

[Figure: Ahnentafel numbering chart from 1590]

There’s a simple solution to this problem, one commonly used in genealogy: assign everyone in the tree a number, starting with yourself as 1. Then follow two simple rules:

  1. The father of person n has number 2n.
  2. The mother of person n has number 2n + 1.

You can tell where someone fits into the tree easily from their number. Apart from you (number 1), men have even numbers and women odd numbers. The number of someone’s child is half their number (rounding down if you get a fraction). For example, person 75 on your tree must be a woman. Her husband would be 74, her child 37, her father 150, etc.

Taking the logarithm base 2 tells you how many generations back someone is. That is, person n is ⌊log₂ n⌋ generations back. Going back to our example of 75, this person would be 6 generations back because log₂ 75 = 6.2288. (Here ⌊x⌋ is the “floor” of x, the largest integer less than or equal to x. Using the same notation, the child of n is ⌊n/2⌋.)

Said another way, the people m generations back have numbers 2ᵐ through 2ᵐ⁺¹ − 1. Your paternal line has numbers equal to powers of 2, and your maternal line has numbers one less than powers of 2.
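The rules so far fit in a few one-line Python functions. This is an illustrative sketch; the function names are hypothetical. Using `int.bit_length` gives ⌊log₂ n⌋ exactly, without floating-point rounding.

```python
def father(n):
    return 2 * n

def mother(n):
    return 2 * n + 1

def child(n):
    return n // 2               # floor(n / 2)

def spouse(n):
    return n ^ 1                # flip the last binary digit; meaningful only for n >= 2

def generation(n):
    return n.bit_length() - 1   # floor(log2(n)) computed exactly

print(father(75), mother(75), child(75), spouse(75), generation(75))
```

Everyone in generation m has a number in the range 2ᵐ through 2ᵐ⁺¹ − 1, which is exactly the set of integers whose binary representation has m + 1 digits.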

If you write out a person’s number in binary, you stick a 0 on the end to find their father and a 1 on the end to find their mother. So your paternal grandmother, for example, would have number 101 in binary. Start with your number: 1. Then tack on a zero for your father: 10. Then tack on a 1 for his mother: 101.

In our example of 75 above, this number is 1001011 in binary. Leave off the one on the left, then read from left to right saying “father” every time you see a 0 and “mother” every time you see a 1. So person 75 is your father’s father’s mother’s father’s mother’s mother.
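The binary reading can be automated. In this sketch (function names are illustrative), `relationship` drops the leading 1 of the binary representation and translates each remaining digit, and `number_for` runs the process in reverse.

```python
def relationship(n):
    """Describe ancestor n in words, reading its binary digits after the leading 1."""
    if n < 1:
        raise ValueError("ancestor numbers start at 1")
    bits = bin(n)[3:]   # bin(75) == '0b1001011'; drop '0b' and the leading 1 (you)
    if not bits:
        return "you"
    return "your " + "'s ".join("father" if bit == "0" else "mother" for bit in bits)

def number_for(*steps):
    """Inverse direction: append 0 for 'father' and 1 for 'mother'."""
    n = 1
    for step in steps:
        n = 2 * n + (1 if step == "mother" else 0)
    return n

print(relationship(75))
```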

This numbering system goes back to at least 1590. In that year Michaël Eytzinger published the chart in the image above, giving the genealogy of Henry III of France.


Consecutive pair magic square

The following magic square has a couple of unusual properties. For one, numbers appear in consecutive pairs. Also, you can connect the numbers 1 through 32 in a continuous path.

I found this in Before Sudoku. The authors attribute it to William Mannke, “A Magic Square.” Journal of Recreational Mathematics. 1 (3) page 139, July 1968.


Magic hexagon

The following figure is a magic hexagon: the numbers in any straight path through the figure add to 38, even though paths may have length three, four, or five.

I found this in Before Sudoku. The authors attribute it to Madachy’s Mathematical Recreations.

This is essentially the only magic hexagon filled with consecutive integers starting with one. The only others are rotations or reflections of this one, or the trivial case of a single hexagon.
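The figure isn’t reproduced here, but the hexagon can be checked in a few lines of Python. The layout below is the standard published arrangement of this hexagon (the figure in the book may show a rotation or reflection of it), and the axial-coordinate bookkeeping is an assumption of this sketch. Since 1 + 2 + … + 19 = 190 is spread across five parallel rows, each line must sum to 190/5 = 38.

```python
# Standard order-3 magic hexagon, one list per row (3, 4, 5, 4, 3 cells)
ROWS = [
    [3, 17, 18],
    [19, 7, 1, 11],
    [16, 2, 5, 6, 9],
    [12, 4, 8, 14],
    [10, 13, 15],
]

def cells():
    """Yield ((q, r), value) using axial hex coordinates with |q|, |r|, |q + r| <= 2."""
    for i, row in enumerate(ROWS):
        r = i - 2
        q_start = max(-2, -2 - r)
        for offset, value in enumerate(row):
            yield (q_start + offset, r), value

def lines():
    """Group cells into the 15 straight lines: constant r, constant q, constant q + r."""
    groups = {}
    for (q, r), value in cells():
        for key in (("r", r), ("q", q), ("s", q + r)):
            groups.setdefault(key, []).append(value)
    return list(groups.values())

print(sorted({sum(line) for line in lines()}))
```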


Personal growth and discrete harmonic functions

“You are the average of the five people you spend the most time with.”

A Google search says this quote is by Jim Rohn. I think other people have said similar things. I’ve heard it quoted many times. The implication is usually that you can improve your life by hanging around better people.

Here are three things it makes me think of.

  1. It sounds approximately true.
  2. It can’t be literally true.
  3. It reminds me of harmonic functions.

There are numbers to back up the assertion that things like your income and even your weight approximately match the average of the people around you. Cause and effect go both ways: people like to hang out with people like themselves, and we become like the people we hang around.

It’s an aphorism, not meant to be taken literally. But a moment’s thought shows that it can’t be literally true for everybody. In any social network, someone has the most money, for example. That person’s net worth cannot be the average of the net worth of others in the group, unless everyone has the exact same net worth. The same applies to the poorest person in the network.

The reason I say that this reminds me of harmonic functions is the mean value theorem. If a function satisfies Laplace’s equation in some region, then at any point in the interior of the region, the value of the function equals the average of the function over any sphere centered at the point and contained in the region. But this is only true in the interior. On the boundary, you might have a maximum or minimum. If the region is bounded and the function extends continuously to its boundary, the function will have a maximum and a minimum, and it attains them on the boundary.

I think of the continuous case first because I spent years thinking about such things. But there’s a discrete analog of harmonic functions that applies directly to the example above. If you have some network, such as a social network, and assign a number to each node, you have a discrete harmonic function if the value at every node is the average of the values at its neighboring nodes. For a finite connected network, a function cannot be harmonic at every node unless it is constant, for the reasons given above. But a function could be harmonic at all but two nodes of a graph, or approximately harmonic at all nodes.
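A small sketch of the discrete version: fix the values at two nodes, then repeatedly replace every other node’s value by the average of its neighbors until the values settle (Jacobi iteration). The five-person network and the numbers are hypothetical. The result is harmonic everywhere except at the two fixed nodes, and those two nodes hold the maximum and the minimum, just as in the continuous case.

```python
def harmonic_extension(neighbors, fixed, iterations=500):
    """Average each free node over its neighbors, repeatedly; nodes in `fixed` keep their values."""
    values = {node: fixed.get(node, 0.0) for node in neighbors}
    for _ in range(iterations):
        values = {
            node: fixed[node] if node in fixed
            else sum(values[nbr] for nbr in nbrs) / len(nbrs)
            for node, nbrs in neighbors.items()
        }
    return values

# Hypothetical network; A and E play the role of the boundary.
network = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}
values = harmonic_extension(network, fixed={"A": 100.0, "E": 0.0})
print(values)
```

Solving the three averaging equations by hand gives B = 800/13, C = 700/13, and D = 400/13, all strictly between the fixed values 0 and 100.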


Magic square rows and columns as numbers

Take any 3 by 3 magic square. For example, here’s the ancient Lo Shu square:

    4  9  2
    3  5  7
    8  1  6

If you read the rows as numbers and sum their squares, you get the same thing whether you read left to right or right to left. In this case

492² + 357² + 816² = 294² + 753² + 618².

Similarly, if you read the columns as numbers and sum their squares, you get the same thing whether you read top to bottom or bottom to top:

438² + 951² + 276² = 834² + 159² + 672².

This doesn’t depend on base 10. It’s true in any base. And the entries of the magic square do not have to be single digits, as long as you take the first to be the coefficient of b², the second the coefficient of b, and the last the coefficient of 1, where b is your base.
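Here is a small sketch that reads rows and columns as numbers in base b and checks the identity for the Lo Shu square in every base from 2 to 36. The helper `as_number` is hypothetical; it treats the entries as coefficients of b², b, and 1 even when they exceed b − 1.

```python
LO_SHU = [[4, 9, 2],
          [3, 5, 7],
          [8, 1, 6]]

def as_number(digits, base=10):
    """Evaluate the digits as a base-`base` numeral (Horner's rule); digits may be >= base."""
    value = 0
    for d in digits:
        value = value * base + d
    return value

def sum_of_squares(lines, base=10, reverse=False):
    """Sum of the squares of each line read as a number, optionally read backward."""
    return sum(as_number(line[::-1] if reverse else line, base) ** 2 for line in lines)

rows = LO_SHU
columns = [list(col) for col in zip(*LO_SHU)]
print(sum_of_squares(rows), sum_of_squares(rows, reverse=True))  # 1035369 1035369
```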

In addition to rows and columns, you can get analogous results for diagonals, including the “broken” diagonals that wrap around the edges of the square.

456² + 978² + 231² = 654² + 879² + 132²

456² + 312² + 897² = 654² + 213² + 798²

258² + 936² + 471² = 852² + 639² + 174²

258² + 714² + 693² = 852² + 417² + 396²

How would you prove this? Arthur Benjamin and Kan Yasuda give an elegant proof using permutation matrices. Or you could use brute force, starting with Édouard Lucas’ theorem that every 3 by 3 magic square has the following form.

    c-b      c+a+b    c-a
    c-a+b    c        c+a-b
    c+a      c-a-b    c+b

(For each a, b, and c there are eight variations on the magic square given by Lucas: the rotations and reflections of his square.)
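The brute-force route can at least be spot-checked by computer. This sketch builds Lucas’ square for many integer choices of a, b, and c and confirms the row, column, and diagonal identities in several bases. It checks finitely many cases, so it is evidence rather than a proof, and the helper names are illustrative.

```python
def as_number(digits, base):
    """Read digits as d0*base**2 + d1*base + d2 (Horner's rule)."""
    value = 0
    for d in digits:
        value = value * base + d
    return value

def lucas_square(a, b, c):
    return [[c - b, c + a + b, c - a],
            [c - a + b, c, c + a - b],
            [c + a, c - a - b, c + b]]

def line_families(sq):
    """Rows, columns, and the two families of (wrap-around) diagonals."""
    rows = [list(r) for r in sq]
    cols = [list(col) for col in zip(*sq)]
    diags = [[sq[i][(i + k) % 3] for i in range(3)] for k in range(3)]
    anti = [[sq[i][(2 - i + k) % 3] for i in range(3)] for k in range(3)]
    return [rows, cols, diags, anti]

def reversal_identity(lines, base):
    forward = sum(as_number(line, base) ** 2 for line in lines)
    backward = sum(as_number(line[::-1], base) ** 2 for line in lines)
    return forward == backward

def check(a_values, b_values, c_values, bases):
    return all(reversal_identity(lines, base)
               for a in a_values for b in b_values for c in c_values
               for base in bases
               for lines in line_families(lucas_square(a, b, c)))

print(lucas_square(3, 1, 5))   # the Lo Shu square
```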

Benjamin and Yasuda attribute this discovery to R. Holmes in 1970. “The magic magic square”, The Mathematical Gazette, 54(390):376.

Alphamagic square in Spanish

In a previous post I gave an example of an alphamagic square in English. This is a magic square such that if you replace each number with the number of letters in its spelled-out name, you get another magic square.


I wondered whether I could find an alphamagic square in Spanish, so I wrote a script to look. I found two. (Each comes in eight versions, counting its rotations and reflections.)

The first is the following:

     93  155  121
    151  123   95
    125   91  153

When spelled out in Spanish the numbers are:

noventa y tres | ciento cincuenta y cinco | ciento veintiuno
ciento cincuenta y uno | ciento veintitrés | noventa y cinco
ciento veinticinco | noventa y uno | ciento cincuenta y tres

And the number of letters in each cell gives:

    12  21  15
    19  16  13
    17  11  20

Here’s a second example:

     95  156  124
    154  125   96
    126   94  155

Spelled out in Spanish:

noventa y cinco | ciento cincuenta y seis | ciento veinticuatro
ciento cincuenta y cuatro | ciento veinticinco | noventa y seis
ciento veintiséis | noventa y cuatro | ciento cincuenta y cinco

Letter counts:

    13  20  18
    22  17  12
    16  14  21

If my script is correct, these are the only examples (besides their rotations and reflections) for numbers between 1 and 200.
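The search script itself isn’t shown in the post. As a hedged sketch, the helpers below spell out numbers from 1 to 200 in Spanish (following the spellings used above, such as “veintiuno” and “ciento veintitrés”), count letters with accented letters included, and re-check the two squares. A full search would loop over every magic square with entries in that range, for instance via Lucas’ parametrization; that loop is not shown here.

```python
UNITS = ["", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho", "nueve"]
TEENS = ["diez", "once", "doce", "trece", "catorce", "quince",
         "dieciséis", "diecisiete", "dieciocho", "diecinueve"]
TWENTIES = ["veinte", "veintiuno", "veintidós", "veintitrés", "veinticuatro",
            "veinticinco", "veintiséis", "veintisiete", "veintiocho", "veintinueve"]
TENS = ["", "", "", "treinta", "cuarenta", "cincuenta",
        "sesenta", "setenta", "ochenta", "noventa"]

def spanish(n):
    """Spell out 1 <= n <= 200 in Spanish."""
    if not 1 <= n <= 200:
        raise ValueError("only 1..200 supported")
    if n == 100:
        return "cien"
    if n == 200:
        return "doscientos"
    if n > 100:
        return "ciento " + spanish(n - 100)
    if n < 10:
        return UNITS[n]
    if n < 20:
        return TEENS[n - 10]
    if n < 30:
        return TWENTIES[n - 20]
    tens, units = divmod(n, 10)
    return TENS[tens] + (" y " + UNITS[units] if units else "")

def letter_count(n):
    return sum(ch.isalpha() for ch in spanish(n))   # spaces are not letters

def is_magic(square):
    lines = [list(row) for row in square] + [list(col) for col in zip(*square)]
    lines.append([square[i][i] for i in range(3)])
    lines.append([square[i][2 - i] for i in range(3)])
    return len({sum(line) for line in lines}) == 1

def is_alphamagic(square):
    counts = [[letter_count(x) for x in row] for row in square]
    return is_magic(square) and is_magic(counts)

FIRST = [[93, 155, 121], [151, 123, 95], [125, 91, 153]]
SECOND = [[95, 156, 124], [154, 125, 96], [126, 94, 155]]
print(is_alphamagic(FIRST), is_alphamagic(SECOND))
```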


Cornu’s spiral

Cornu’s spiral is the curve parameterized by

x(t) = C(t) = \int_0^t \cos \left( \frac{\pi}{2} s^2 \right) \, ds \\ y(t) = S(t) = \int_0^t \sin \left( \frac{\pi}{2} s^2 \right) \, ds

where C and S are the Fresnel functions, sometimes called the Fresnel cosine integral and Fresnel sine integral. Here’s a plot of the spiral.

[Figure: plot of Cornu’s spiral]

Both Fresnel functions approach ½ as t → ∞ and so the curve slowly spirals toward (½, ½) in the first quadrant. And by symmetry, because both functions are odd, the curve spirals toward (-½, -½) in the third quadrant.

Here’s the Python code used to make the plot.

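    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.special import fresnel

    t = np.linspace(-7, 7, 1000)
    y, x = fresnel(t)  # fresnel returns (S, C)

    plt.plot(x, y)
    plt.gca().set_aspect("equal")  # keep the spiral from looking squashed
    plt.show()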

The SciPy function fresnel returns both Fresnel functions at the same time. It returns them in the order (S, C) so the code reverses the order of these to match the Cornu curve.

One interesting feature of Cornu’s spiral is that its curvature increases linearly with t. (The curve moves with unit speed, so t is also arc length.) This is easy to verify: by the fundamental theorem of calculus, the derivatives of the Fresnel functions are cos(πt²/2) and sin(πt²/2), and from there you can show that the curvature at time t equals πt.
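Filling in that step: differentiating the Fresnel integrals gives the velocity directly, the speed is 1, and the usual curvature formula for a parametric curve reduces to πt.

```latex
\begin{aligned}
x'(t) &= \cos\left(\tfrac{\pi}{2}t^2\right), & y'(t) &= \sin\left(\tfrac{\pi}{2}t^2\right), \\
x''(t) &= -\pi t \sin\left(\tfrac{\pi}{2}t^2\right), & y''(t) &= \pi t \cos\left(\tfrac{\pi}{2}t^2\right), \\
\kappa(t) &= \frac{x'y'' - y'x''}{\left(x'^2 + y'^2\right)^{3/2}}
  = \pi t \cos^2\left(\tfrac{\pi}{2}t^2\right) + \pi t \sin^2\left(\tfrac{\pi}{2}t^2\right) = \pi t.
\end{aligned}
```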

How fast does the curve spiral toward (½, ½)? Since the curvature at time t is πt, at time t the curve is instantaneously bending like a circle of radius 1/(πt). So the radius of the spiral is decreasing like 1/(πt).
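The curvature claim can also be checked numerically. This sketch samples the spiral with `scipy.special.fresnel`, estimates derivatives with `np.gradient` (finite differences, so only approximate agreement is expected), and compares the curvature with πt. The sampling range and tolerance are arbitrary choices.

```python
import numpy as np
from scipy.special import fresnel

t = np.linspace(0.5, 3.0, 20001)
S, C = fresnel(t)                       # x(t) = C(t), y(t) = S(t)
dx, dy = np.gradient(C, t), np.gradient(S, t)
ddx, ddy = np.gradient(dx, t), np.gradient(dy, t)

# Curvature of a parametric curve
curvature = (dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5

inner = slice(5, -5)                    # one-sided differences at the ends are less accurate
relative_error = np.abs(curvature[inner] - np.pi * t[inner]) / (np.pi * t[inner])
max_rel_error = relative_error.max()
radius = 1 / curvature[inner]           # radius of the osculating circle, about 1/(pi t)
print(max_rel_error)
```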

Cornu’s spiral was actually discovered by Euler. Cornu was an engineer who independently discovered the curve much later. Perhaps because Cornu used the curve in applications, his name is more commonly associated with the curve. At least I’ve more often seen it named after Cornu. This is an example of Stigler’s law that things are usually not named after the first person to discover them.


Categorical products


There’s an odd sort of partisan spirit to discussions of category theory. They often have the flavor of “Category theory is great!” or “Category theory is a horrible waste of time!” You don’t see this sort of partisanship around, say, probability. Probability theory is what it is, and if you need it, you use it. If you don’t need it, you don’t use it. I think of category theory in a similar way. It’s good for some things and not for others.

In this post I’ll look at just one little piece of category theory, the definition of products, and use it to give a flavor of category theory in general.

Initial objections

The first time I saw category theory’s definition of a product I thought it was a bizarre complication. “The product of A and B is an object P such that for any other object X …”

What is this X doing in our definition? It’s not our product, nor is it one of the things we’re taking the product of.  And why introduce a diagram? Is the product of two mathematical objects a picture?! Why not come out and say what a product is rather than saying what it does? It’s just ordered pairs, right?

Category theory is all about how things behave rather than what they’re made of inside. So you could say that talking about pairs of elements violates the rules of the game. But that raises the question of why play this game at all. What do we get in return for placing such severe and unusual restrictions on ourselves?

The answer is that we get to see broader connections. When we focus on behavior rather than internal composition, we can see that two things behave the same even though they look different inside. Software developers should be familiar with this idea: depend on interface rather than implementation.


OK, so what is this mysterious definition of product? It’s a mouthful, but we’ll explain why it has to be what it is.

Given two objects A and B in some category, a product of A and B is an object P in that category and a pair of morphisms π₁: P → A and π₂: P → B such that for every object X with morphisms f₁: X → A and f₂: X → B, there exists a unique morphism f: X → P that makes the following diagram commute, i.e. π₁ ∘ f = f₁ and π₂ ∘ f = f₂.

[Figure: commutative diagram for the categorical product]

Whew! That’s a lot more work than saying a product is the set of ordered pairs (ab) with a from A and b from B. And it’s not the first definition of product a student should see. However, there are three reasons why it’s worth introducing later:

  1. The ordered pair definition is not complete.
  2. The categorical definition is not as complex as it seems.
  3. The categorical definition makes new connections visible.

Why not ordered pairs

Saying “a product is just ordered pairs” isn’t enough. You have to say how the product relates to the things it’s a product of. In the case of a Cartesian product of sets, the projections are so obvious that it’s hard to realize they’re there, but in general they need to be specified.

Another reason the ordered pair definition isn’t complete is that you need to say how the product is structured. If you’re taking the product of groups, for example, then you have to say how the group operation is defined on these ordered pairs. Or if you’re taking the product of two topological spaces, then you have to say what the topology is on this set whose points are the ordered pairs.

The categorical definition doesn’t tell you how to construct a product, but it tells you how to know when you’ve found something that works. That’s the trade-off: in order to have a theory that exposes wider connections, it can’t be too tied to a specific example. Whether that’s an acceptable trade-off depends on your aim.

To reach further with our theory, we have to look at how things behave rather than how they are constructed. So how does a product behave? It lets you take components: here’s the first component, here’s the second. That’s about it. The categorical definition formalizes this in terms of projections, and it says that this is a universal property of products: anything else that acts like a product factors uniquely through the product.
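For the category of sets, the universal property can be checked by brute force on a small example. In this sketch all names are illustrative and functions are stored as Python dicts: the product is the Cartesian product with its two projections, `pairing` builds the morphism f: X → P, and `count_factorizations` confirms that f is the only map with π₁ ∘ f = f₁ and π₂ ∘ f = f₂.

```python
from itertools import product as cartesian

def product_of_sets(A, B):
    """Cartesian product P together with its two projections."""
    P = set(cartesian(A, B))
    pi1 = {p: p[0] for p in P}
    pi2 = {p: p[1] for p in P}
    return P, pi1, pi2

def pairing(X, f1, f2):
    """The morphism f: X -> P determined by f1 and f2."""
    return {x: (f1[x], f2[x]) for x in X}

def compose(g, f):
    """g after f, for functions stored as dicts."""
    return {x: g[f[x]] for x in f}

def count_factorizations(X, P, pi1, pi2, f1, f2):
    """Count every map g: X -> P with pi1 after g == f1 and pi2 after g == f2."""
    xs = sorted(X)
    count = 0
    for images in cartesian(sorted(P), repeat=len(xs)):
        g = dict(zip(xs, images))
        if compose(pi1, g) == f1 and compose(pi2, g) == f2:
            count += 1
    return count

# Hypothetical objects and morphisms
A, B, X = {1, 2}, {"a", "b", "c"}, {"u", "v", "w"}
f1 = {"u": 1, "v": 2, "w": 1}
f2 = {"u": "c", "v": "a", "w": "a"}
P, pi1, pi2 = product_of_sets(A, B)
f = pairing(X, f1, f2)
print(count_factorizations(X, P, pi1, pi2, f1, f2))  # 1
```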

In general you can’t just say products are ordered pairs. Sometimes products are not pairs, and sometimes pairs are not products. So the ordered pair definition doesn’t always apply. And when it does apply, it keeps us from seeing how products relate to coproducts, limits, and other operations.

When products are not pairs

Here’s an example of a product that’s not a pair. A partially ordered set can be viewed as a category. The elements of the set are the objects of the category, and there is a morphism from a to b if a ≤ b. In that case the product of a and b is their greatest lower bound a ∧ b, if it exists. In a totally ordered set this is simply their minimum.
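A brute-force sketch of products in a finite poset: `meet` looks for the greatest lower bound directly from the definition. The divisibility example (where the meet turns out to be the gcd) and the four-element poset with no meet are illustrative assumptions; the latter shows that a product need not exist.

```python
from math import gcd

def meet(a, b, elements, leq):
    """Greatest lower bound of a and b in a finite poset, or None if there is none."""
    lower = [x for x in elements if leq(x, a) and leq(x, b)]
    greatest = [x for x in lower if all(leq(y, x) for y in lower)]
    return greatest[0] if greatest else None

def less_equal(x, y):
    return x <= y

def divides(x, y):
    return y % x == 0

divisors_36 = [d for d in range(1, 37) if 36 % d == 0]

# Four elements: c and d both lie below a and b, but c and d are incomparable.
below = {("c", "a"), ("c", "b"), ("d", "a"), ("d", "b")}

def tiny_leq(x, y):
    return x == y or (x, y) in below

print(meet(3, 7, range(10), less_equal))                  # 3, the minimum
print(meet(12, 18, divisors_36, divides), gcd(12, 18))    # 6 6
print(meet("a", "b", "abcd", tiny_leq))                   # None: no product
```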

When pairs are not products

Here’s an example of a pair that’s not a product. The category of fields does not generally have products. You can form ordered pairs of elements from two fields, but in general there is no way to define operations on these pairs that turn them into a field.

For example, the number of elements in a finite field must be a power of a prime. If you take a field of order 5 and a field of order 7, there are 35 ordered pairs of elements, but there is no field of order 35.
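The counting argument can be checked mechanically. This small helper is an illustrative sketch, not from the post: it tests whether n is a power of a single prime, which is the condition a finite field’s order must satisfy.

```python
def is_prime_power(n):
    """True if n = p**k for some prime p and some k >= 1."""
    if n < 2:
        return False
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:   # divide out the smallest prime factor completely
                n //= p
            return n == 1
        p += 1
    return True                 # no factor up to sqrt(n), so n is prime

print([n for n in range(2, 40) if not is_prime_power(n)])
```

35 = 5 · 7 appears in that list, so no field has 35 elements.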

But is it worth it?

The categorical definition of products is difficult to understand. It’s analogous to the δ-ε definition of limits: not the first thing you think of, but the rigorous definition that will generalize well into new situations.

Abstraction should follow experience, not precede it. You need to have multiple examples of products in your mind before you see any advantage to abstracting the idea of a product.

So what does the abstraction buy you? Maybe nothing! It depends on what you’re after. One thing it might do for you is help you to be more consistent. Programming language designers, for example, use category theory to make languages more consistent and easier to think about. A language might want to handle various kinds of products uniformly, even when the products look very different at first. In addition to consistently implementing what they should, category theory might guide designers to not implement what they shouldn’t. For example, above we said that it doesn’t make sense in general to take the product of two fields.

Category theory also suggests new questions. For example, duality is pervasive throughout category theory. For every concept, there’s a co-concept. So once you identify a product in some context, it’s natural to ask what coproducts are, and these tend to be less obvious than products. And going back to consistency, category theory might guide you to handle dual concepts in a dual manner.


Beats: amplitude modulation in radios and musical instruments

What do tuning a guitar and tuning a radio have in common? Both are examples of beats or amplitude modulation.


In an earlier post I wrote about how beats come up in vibrating systems, such as a mass and spring combination or an electric circuit. Here I look at examples from music and radio.


When two musical instruments play nearly the same note, they produce beats. The number of beats per second is the difference in the two frequencies. So if two flutes are playing an A, one playing at 440 Hz and one at 442 Hz, you’ll hear a pitch at 441 Hz that beats two times a second. Here’s a wave file of two pure sine waves at 440 Hz and 442 Hz.
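The wave file itself isn’t reproduced here. As a hedged sketch, this is one way to synthesize a similar file with Python’s standard-library `wave` module; the function names, the two-second length, and the 44.1 kHz sample rate are assumptions.

```python
import math
import struct
import wave

def beat_samples(f1=440.0, f2=442.0, seconds=2.0, rate=44100):
    """Average of two pure sine waves; every value lies in [-1, 1]."""
    return [0.5 * (math.sin(2 * math.pi * f1 * i / rate) +
                   math.sin(2 * math.pi * f2 * i / rate))
            for i in range(int(seconds * rate))]

def write_wav(path, samples, rate=44100):
    """Write mono 16-bit PCM audio."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)        # 2 bytes = 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"".join(struct.pack("<h", round(32000 * s)) for s in samples))
```

For example, `write_wav("beats.wav", beat_samples())` writes two seconds of audio. The loudness drops to zero at t = 0.25 s and t = 0.75 s of every second, which is the two beats per second described above.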

As the players come closer to being in tune, the beats slow down. Sometimes you don’t have two instruments but two strings on the same instrument. Guitarists listen for beats to tell when two strings are playing exactly the same pitch.

AM radio

The same principle applies to AM radio. A message is transmitted by multiplying a carrier signal by the content you want to broadcast. The beats are the content. As we’ll see below, in some ways the musical example and the AM radio example are opposites. With tuning, we start with two sources and create beats. With AM radio, we start by creating beats, then see that we’ve created two sources, the sidebands of the signal.

Mathematical explanation

Both examples above relate to the following trig identity:

cos(a − b) + cos(a + b) = 2 cos a cos b

And because we’re looking at time-varying signals, slip in a factor of 2πt:

cos(2π(a − b)t) + cos(2π(a + b)t) = 2 cos 2πat cos 2πbt


In the case of two pure tones, slightly out of tune, let a = 441 and b = 1. Playing an A 440 and an A 442 at the same time results in an A 441, twice as loud, with the amplitude going up and down like cos 2πt, i.e. beating two times a second. (Why two times and not just once? One beat for the maximum and one for the minimum of cos 2πt.)

It may be hard to hear beats because of complications we’ve glossed over. Musical instruments are never perfectly in phase, but more importantly they’re not pure tones. An oboe, for example, has strong components above the fundamental frequency. I used a flute in this example because although its tone is not simply a sine wave, it’s closer to a sine wave than other instruments, especially when playing higher notes. Also, guitarists often compare the harmonics of two strings. These are purer tones and so it’s easier to hear beats between them.


For the case of AM radio, read the equation above from right to left. Let a be the frequency of the carrier wave. For example if you’re broadcasting on AM station 700, this means 700 kHz, so a = 700,000. If this station were broadcasting a pure tone at 440 Hz, b would be 440. This would produce sidebands at 700,440 Hz and 699,560 Hz.
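The sideband arithmetic can be confirmed with an FFT. This NumPy sketch multiplies a 700 kHz carrier by a 440 Hz tone and reports the two strongest frequencies; the 2 MHz sample rate and 0.05 s duration are assumptions chosen so that both sidebands fall exactly on FFT bins.

```python
import numpy as np

rate = 2_000_000          # samples per second, above twice the highest frequency
n = 100_000               # 0.05 s of signal, so FFT bins are 20 Hz apart
t = np.arange(n) / rate
carrier, tone = 700_000, 440

signal = np.cos(2 * np.pi * carrier * t) * np.cos(2 * np.pi * tone * t)
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(n, d=1 / rate)

# The two largest peaks are the sidebands; the product has no component at the carrier.
peaks = sorted(int(round(float(f))) for f in freqs[np.argsort(spectrum)[-2:]])
print(peaks)
```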

[Figure: amplitude-modulated signal whose envelope does not pinch down to zero]

In practice, however, the carrier is not multiplied by a signal like cos 2πbt but by 1 + m cos 2πbt, where |m| < 1 to avoid over-modulation. Without this extra term 1, the signal would be 100% modulated; the envelope of the signal would pinch all the way down to zero. By including the 1 and using a modulation index m less than 1, the signal looks more like the image above, with the envelope not pinching all the way down. (Over-modulation occurs when m > 1. Instead of the envelope pinching to zero, the upper and lower parts of the envelope cross.)



Correlation of two sine waves

What is the correlation of two sine waves that differ in phase? The result itself is interesting, and the calculation along the way shows tricks to avoid calculating integrals.

[Figure: two sine waves slightly out of phase]

The correlation of two periodic signals, f and g, is

\frac{ \int \left( f - \mu_f\right) \left( g - \mu_g\right) \, dt } { \left( \int (f - \mu_f)^2\, dt \right)^{1/2} \left( \int (g - \mu_g)^2\, dt \right)^{1/2} }

where the integral is over a period of the two functions. For functions known at discrete points this would be a sum rather than an integral, but in this case we have continuous signals so we integrate.

In our case the two functions are f(t) = sin(t) and g(t) = sin(t + φ) and the integrals are over [0, 2π]. Both functions have average value 0, so the μ terms go away.

We use a trig identity to expand the numerator:

\sin\theta_1 \sin \theta_2 = \frac{1}{2} \left( \cos(\theta_1-\theta_2) - \cos(\theta_1 + \theta_2) \right)

In our case θ₁ = t and θ₂ = t + φ and so the numerator becomes

\frac{1}{2} \int_0^{2\pi} \left( \cos\phi - \cos(2t + \phi) \right) dt = \pi \cos \phi

The first part of the integral is integrating a constant (with respect to t) and so becomes the constant times the length of the integration range. The second part of the integral is zero because it is integrating a cosine over two periods.

Now for the denominator. Over a full period, sin²(t) and cos²(t) take on the same values, just shifted. So the integral of sin²(t) is half the integral of sin²(t) + cos²(t) = 1. Therefore

\int_0^{2\pi} \sin^2 t\, dt = \frac{1}{2}(2\pi) = \pi

and the same argument shows that the integral of sin²(t + φ) is also π. So our correlation is simply π cos φ / (√π √π) = cos φ: the correlation of two out-of-phase sine waves is the cosine of their phase difference. It may be a little surprising that it works out to be so simple, but the result makes sense. When φ = 0, or any multiple of 2π, the waves are identical and so the correlation should be 1. When φ = π, or an odd multiple of π, the two waves are perfectly out of phase, and so the correlation should be -1. In between these extremes the correlation oscillates; in fact, it is a cosine.
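A quick numerical check of the result: sample one period at evenly spaced points and use NumPy’s `corrcoef`. The sample count and the phase values tested are arbitrary choices.

```python
import numpy as np

def phase_correlation(phi, samples=1000):
    """Correlation of sin(t) and sin(t + phi) sampled over one full period."""
    t = np.linspace(0, 2 * np.pi, samples, endpoint=False)
    return np.corrcoef(np.sin(t), np.sin(t + phi))[0, 1]

for phi in (0.0, np.pi / 3, np.pi / 2, np.pi):
    print(round(phase_correlation(phi), 6), round(np.cos(phi), 6))
```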

Energy in frequency modulated signals

In an earlier post we proved that if you modulate a cosine carrier by a sine signal you get a signal whose sideband amplitudes are given by Bessel functions. Specifically:

\cos( 2\pi f_c t + \beta \sin(2\pi f_m t) ) = \sum_{n=-\infty}^\infty J_n(\beta) \cos(2\pi(f_c + nf_m)t)

When β = 0, we have the unmodulated carrier, cos(2π f_c t), on both sides. When β is positive but small, J₀(β) is near 1, and so the frequency component corresponding to the carrier is only slightly diminished. Also, the sideband amplitudes, the values of Jₙ(β) for n ≠ 0, are small and decay rapidly as |n| increases. As β increases, the amplitude of the carrier component decreases, the sideband amplitudes increase, and the sidebands decay more slowly.

We can be much more precise: the energy in the modulated signal is the same as the energy in the unmodulated signal. As β increases, more energy transfers to the sidebands, but the total energy stays the same. This conservation of energy result applies to more complex signals than just pure sine waves, but it’s easier to demonstrate in the case of a simple signal.
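Before the proof, a numerical sanity check. This sketch synthesizes a frequency-modulated signal, measures the amplitude of each sideband with an FFT, and compares it with |Jₙ(β)| from `scipy.special.jv`. The values f_c = 1000 Hz, f_m = 100 Hz, and β = 2 are hypothetical, chosen so that every sideband lands exactly on an FFT bin.

```python
import numpy as np
from scipy.special import jv

fc, fm, beta = 1000, 100, 2.0   # hypothetical carrier (Hz), modulating frequency (Hz), index
rate = 8000                     # samples per second
n = 8000                        # one second of samples, so FFT bins are exactly 1 Hz apart
t = np.arange(n) / rate
signal = np.cos(2 * np.pi * fc * t + beta * np.sin(2 * np.pi * fm * t))

# Amplitude of the cosine component in each 1 Hz bin
amplitudes = 2 * np.abs(np.fft.rfft(signal)) / n

for k in range(-4, 5):
    print(f"{k:+d}  FFT {amplitudes[fc + k * fm]:.6f}  |J_k(beta)| {abs(jv(k, beta)):.6f}")

# Energy in the carrier and the first eight sidebands on each side
sideband_energy = sum(amplitudes[fc + k * fm] ** 2 for k in range(-8, 9))
print(sideband_energy)
```

The squared amplitudes add up to 1 (to within the tiny energy left in the higher sidebands), the same as the single coefficient of the unmodulated carrier.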


To prove the energy stays constant, we show that the sum of the squares of the coefficients of the cosine components is the same for the modulated and unmodulated signal. The unmodulated signal is just cos(2π f_c t), and so the only coefficient is 1. That means we have to prove

 \sum_{n=-\infty}^\infty J_n(\beta)^2 = 1

This is a well-known result. For example, it is equation 9.1.76 in Abramowitz and Stegun. We’ll show how to prove it from first principles. We’ll actually prove a more general result, the Neumann-Schläfli addition formula, then show our result follows easily from that.

Neumann-Schläfli addition formula

Whittaker and Watson define the Bessel functions by their generating function:

\exp\left(\frac{z}{2}\left(t - \frac{1}{t}\right)\right) = \sum_{n=-\infty}^\infty t^n J_n(z)

This means that when you expand the expression on the left as a power series in t, whatever is multiplied by tⁿ is Jₙ(z) by definition. (There are other ways of defining the Bessel functions, but this way leads quickly to what we want to prove.)

We begin by factoring the Bessel generating function applied to z + w.

\exp\left(\frac{z+w}{2}\left(t - \frac{1}{t}\right)\right) = \exp\left(\frac{z}{2}\left(t - \frac{1}{t}\right)\right) \exp\left(\frac{w}{2}\left(t - \frac{1}{t}\right)\right)

Next we expand both sides as power series.

\sum_{n=-\infty}^\infty t^n J_n(z+w) = \sum_{j=-\infty}^\infty t^j J_j(z) \sum_{k=-\infty}^\infty t^k J_k(w)

and look at the terms involving tⁿ on both sides. On the left this is Jₙ(z + w). On the right, we multiply two power series. We will get a term containing tⁿ whenever we multiply terms tʲ and tᵏ where j and k sum to n.

 J_n(z+w) = \sum_{j+k = n} J_j(z) J_k(w) = \sum_{m=-\infty}^\infty J_m(z) J_{n-m}(w)

The equation above is the Neumann-Schläfli addition formula.

Sum of squared coefficients

To prove that the sum of the squared sideband coefficients is 1,  we apply the addition formula with n = 0, z = β, and w = -β.

1 = J_0(\beta - \beta) = \sum_{m=-\infty}^\infty J_m(\beta) J_{-m}(-\beta) = \sum_{m=-\infty}^\infty J_m(\beta)^2

This proves what we were after:

 \sum_{n=-\infty}^\infty J_n(\beta)^2 = 1

We used a couple of facts in the last step that we haven’t discussed. The first was that J₀(0) = 1. This follows from the generating function by setting z to 0: the left side becomes 1, so the coefficient of t⁰ is 1 and all the other coefficients are 0. The second was that J₋ₘ(−β) = Jₘ(β). You can also see this from the generating function, since negating z has the same effect as swapping t and 1/t, which shows that Jₙ(−z) = J₋ₙ(z).
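Both the addition formula and the sum of squares can be spot-checked numerically with `scipy.special.jv`. The infinite sums are truncated (for moderate arguments the terms decay very quickly in |m|), and the particular arguments are arbitrary choices.

```python
import numpy as np
from scipy.special import jv

def addition_rhs(n, z, w, terms=60):
    """Truncated right side of the addition formula: sum over m of J_m(z) J_{n-m}(w)."""
    m = np.arange(-terms, terms + 1)
    return np.sum(jv(m, z) * jv(n - m, w))

beta = 1.7
squares = np.sum(jv(np.arange(-40, 41), beta) ** 2)
print(squares)                                   # 1 up to rounding error
print(jv(3, 0.5 + 1.2), addition_rhs(3, 0.5, 1.2))
```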

