Linear combination of sine and cosine as phase shift

Here’s a simple calculation that I’ve done often enough that I’d like to save the result for my future reference and for the benefit of anyone searching on this.

A linear combination of sines and cosines

a sin(x) + b cos(x)

can be written as a sine with a phase shift

A sin(x + φ).

Going between {a, b} and {A, φ} is the calculation I’d like to save. For completeness I also include the case

A cos(x + ψ).



Define

f(x) = a sin(x) + b cos(x)

and

g(x) = A sin(x + φ).

Both functions satisfy the differential equation

y″ + y = 0

and so f = g if and only if f(0) = g(0) and f′(0) = g′(0).

Setting the values at 0 equal implies

b = A sin(φ)

and setting the derivatives at 0 equal implies

a = A cos(φ).

Taking the ratio of these two equations shows

b/a = tan(φ)

and adding the squares of both equations shows

a² + b² = A².


First we consider the case

a sin(x) + b cos(x) = A sin(x + φ).

Sine with phase shift

If a and b are given,

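A = √(a² + b²)

and

φ = tan⁻¹(b / a),

with the inverse tangent taken in the quadrant of the point (a, b). In most programming languages this is atan2(b, a).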

If A and φ are given,

a = A cos(φ)

and

b = A sin(φ)

from the previous section.

Cosine with phase shift

Now suppose we want

a sin(x) + b cos(x) = A cos(x + ψ)

If a and b are given, then

A = √(a² + b²)

as before and

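ψ = −tan⁻¹(a / b),

with the inverse tangent taken in the quadrant of the point (b, a). In most programming languages this is atan2(−a, b).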

If A and ψ are given then

a = −A sin(ψ)

and

b = A cos(ψ).
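
Here is a minimal Python sketch of the four conversions above. The function names are made up for illustration, and the code uses atan2 rather than a bare inverse tangent so that the phase lands in the correct quadrant for any signs of a and b.

```python
import math

def to_sine_form(a, b):
    """Return (A, phi) such that a*sin(x) + b*cos(x) == A*sin(x + phi)."""
    return math.hypot(a, b), math.atan2(b, a)

def to_cosine_form(a, b):
    """Return (A, psi) such that a*sin(x) + b*cos(x) == A*cos(x + psi)."""
    return math.hypot(a, b), math.atan2(-a, b)

def from_sine_form(A, phi):
    """Return (a, b) given amplitude A and phase phi of A*sin(x + phi)."""
    return A * math.cos(phi), A * math.sin(phi)

def from_cosine_form(A, psi):
    """Return (a, b) given amplitude A and phase psi of A*cos(x + psi)."""
    return -A * math.sin(psi), A * math.cos(psi)

# Spot check with a negative a, where a bare arctangent would pick the wrong quadrant.
a, b = -3.0, 4.0
A, phi = to_sine_form(a, b)
_, psi = to_cosine_form(a, b)
for x in (0.0, 0.7, 2.5):
    lhs = a * math.sin(x) + b * math.cos(x)
    print(lhs, A * math.sin(x + phi), A * math.cos(x + psi))
```

Each printed row should show the same value three times, up to rounding error.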

Resolving a mysterious problem with find

Suppose you want to write a shell script that searches the current directory for files that have a keyword in the name of the file or in its contents. Here’s a first attempt.

find . -name '*.py' -type f -print0 | grep -i "$1"
find . -name '*.py' -type f -print0 | xargs -0 grep -il "$1"

This works well for searching file contents but behaves unexpectedly when searching for file names.

If I have a file named in the directory, the script will return

grep: (standard input): binary file matches

Binary file matches?! I wasn’t searching binary files. I was searching files with names consisting entirely of ASCII characters. Where is a binary file coming from?

If we cut off the pipe at the end of the first line of the script and run

find . -name '*.py' -type f -print0

we get something like

with no apparent non-ASCII characters. But if we pipe the output through xxd to see a hex dump, we see that there are invisible null characters after each file name.

One way to fix our script would be to add a -a option to the call to grep, telling it to treat the input as ASCII text. But this will return the same output as above. The output of find is treated as one long (ASCII) string, which matches the regular expression.

Another possibility would be to add a -o flag to direct grep to return just the match. But this is less than ideal as well. If you were looking for file names containing a Q, for example, you’d get Q as your output, which doesn’t tell you the full file name.

There may be better solutions [1], but my solution was to insert a call to strings in the pipeline:

find . -name '*.py' -type f -print0 | strings | grep -i "$1"

This will extract the ASCII strings out of the input it receives, which has the effect of splitting the string of file names into individual names.

By default the strings command defines an ASCII string to be a string of 4 or more consecutive ASCII characters. A file with anything before the .py extension will necessarily have at least four characters, but the analogous script to search C source files would overlook a file named x.c. You could fix this by using strings -n 3 to find sequences of three or more ASCII characters.

If you don’t have the strings command installed, you could use sed to replace the null characters with newlines.

find . -name '*.py' -type f -print0 | sed 's/\x0/\n/g' | grep -i "$1"

Note that the null character is denoted \x0 rather than simply \0.
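
To make the splitting step concrete, here is a small Python sketch of what the strings and sed variants accomplish: break the null-delimited output of find into separate names before matching. This is an illustration, not the script from the post. The function name matching_names is hypothetical, and it does a case-insensitive substring match rather than the regular expression match that grep -i performs.

```python
def matching_names(raw, keyword):
    """Split NUL-delimited bytes (as produced by find -print0) into names
    and return those containing keyword, ignoring case."""
    names = [chunk.decode("utf-8", errors="replace")
             for chunk in raw.split(b"\0") if chunk]
    return [name for name in names if keyword.lower() in name.lower()]

# Simulated output of: find . -name '*.py' -type f -print0
sample = b"./foo.py\0./bar.py\0./x.py\0"
print(matching_names(sample, "BAR"))
```

Because each name is matched separately, a hit reports one file name rather than the whole list.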

[1] See the comments for better solutions. I really appreciate your feedback. I’ve learned a lot over the years from reader comments.

The Postage Stamp Problem

I recently stumbled upon the Postage Stamp Problem. Given two relatively prime positive integers a and b, show that for any sufficiently large integer N, there exist nonnegative integers x and y such that

ax + by = N.

I initially missed the constraint that x and y must be nonnegative. Without that constraint the result is well known (Bézout’s lemma) and there’s no requirement for N to be large. The nonnegativity constraint makes things more interesting.

5 cent and 21 cent stamps

The problem is called the Postage Stamp Problem because it says that given any two stamps whose values are relatively prime, say a 5¢ stamp and a 21¢ stamp, you can make any sufficiently large amount of postage using just those two stamps.

A natural question is how large is “sufficiently large,” and the answer turns out to be all integers larger than

ab − a − b.

So in our example, you cannot make 79¢ postage out of 5¢ and 21¢ stamps, but you can make 80¢ or any higher amount.
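
The claim is easy to check by brute force. The following Python sketch (the helper name representable is made up here) tests whether an amount can be written as ax + by with nonnegative integers x and y, then tries the bound ab − a − b and a range of larger amounts.

```python
def representable(n, a, b):
    """True if n == a*x + b*y for some nonnegative integers x and y."""
    # Try every feasible x; the remainder must be a nonnegative multiple of b.
    return any((n - a * x) % b == 0 for x in range(n // a + 1))

a, b = 5, 21
bound = a * b - a - b  # 79 for 5 and 21 cent stamps

print(bound, representable(bound, a, b))
print(all(representable(n, a, b) for n in range(bound + 1, bound + 200)))
```

The first line checks the bound itself and the second checks the next 199 amounts. This is a finite spot check, not a proof.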

If you’ve been reading this blog for a while, you may recognize this as a special case of the Chicken McNugget problem, which you can think of as the Postage Stamp problem with possibly more than two stamps.

Impersonating an Edwardian math professor

I’ve read some math publications from around a century or so ago, and I wondered if I could pull off being a math professor if a time machine dropped me into a math department from the time. I think I’d come across as something of an autistic savant, ignorant of what contemporaries would think of as basic math but fluent in what they’d consider more advanced.

There are two things in particular that were common knowledge at the time that I would be conspicuously ignorant of: interpolation tricks and geometry.

People from previous eras knew interpolation at a deeper level than citing the Lagrange interpolation theorem, out of necessity. They learned time-saving tricks that have since been forgotten.

The biggest gap in my knowledge would be geometry. Mathematicians a century ago had a far deeper knowledge of geometry, particularly synthetic geometry, i.e. geometry in the style of Euclid rather than in the style of Descartes.

Sometimes older math books use notation or terminology that has since changed. I imagine I’d make a few gaffes, not immediately understanding a basic term or using a term that wasn’t coined until later.

If I had to teach a class, I’d choose something like real and complex analysis. Whittaker & Watson’s book on the subject was first published in 1902 and remains a common reference today. The only thing I find jarring about that book is that “show” is spelled “shew.” Makes me think of Ed Sullivan. But I think I’d have a harder time teaching a less advanced class.

Maybe Copernicus isn’t coming

Before Copernicus promoted the heliocentric model of the solar system, astronomers added epicycle on top of epicycle, creating ever more complex models of the solar system. The term epicycle is often used derisively to mean something ad hoc and unnecessarily complex.

Copernicus’ model was simpler, but it was less accurate. The increasingly complex models before Copernicus were refinements. They were not ad hoc, nor were they unnecessarily complex, if you must center your coordinate system on Earth.

It’s easy to draw the wrong conclusion from Copernicus, and from any number of other scientists who were able to greatly simplify a previous model. One could be led to believe that whenever something is too complicated, there must be a simpler approach. Sometimes there is, and sometimes there isn’t.

If there isn’t a simpler model, the time spent searching for one is wasted. If there is a simpler model, the time searching for one might still be wasted. Pursuing brute force progress might lead to a simpler model faster than pursuing a simpler model directly.

It all depends. Of course it’s wise to spend at least some time looking for a simple solution. But I think we’re fed too many stories in which the hero comes up with a simpler solution by stepping back from the problem.

Most progress comes from the kind of incremental grind that doesn’t make an inspiring story for children. And when there is a drastic simplification, that simplification usually comes after grinding on a problem, not instead of grinding on it.

3Blue1Brown touches on this in this video. The video follows two hypothetical problem solvers, Alice and Bob, who attack the same problem. Alice is the clever thinker and Bob is the calculating drudge. Alice’s solution of the original problem is certainly more elegant, and more likely to be taught in a classroom. But Bob’s approach generalizes in a way that Alice’s approach, as far as we know, does not.

Trigonometric interpolation

Suppose you want to interpolate a set of data points with a combination of sines and cosines.

One way to approach this problem would be to set up a system of equations for the coefficients of the sines and cosines. If you have N data points, you will get a system of N equations in N unknowns. The system will have a unique solution, though this is not obvious a priori.

Another approach would be to use the discrete Fourier transform (DFT). This is the approach that would commonly be used in practice. It’s even further from obvious a priori that this would work, but it does. (The DFT is so often computed using the FFT algorithm that the transform is often referred to by the algorithm name. If you’d like, mentally substitute FFT for DFT in the rest of the post.)

There are multiple ways to motivate the DFT, and the way suggested by the name is to derive the DFT as a discrete approximation to the (continuous) Fourier transform. Why should a discrete approximation to an integral transform also solve an interpolation problem? This doesn’t sound inevitable, or even plausible, but it is the case.

Another way to motivate the DFT is as the least-squares solution to fitting a sum of sines and cosines to a set of points. Since this is phrased as an optimization problem rather than an interpolation problem, it is clear that it will have a solution. However, it is not clear that the error in the optimal fit will in fact be zero. Furthermore, the equation for the coefficients in the solution is the same as the equation for the DFT. You can find a derivation in [1].


Let’s take the vector [3, 1, 4, 1, 5, 9] and find trig functions that pass through these points. We can use the FFT as implemented in Python’s SciPy library to find a set of complex exponentials that pass through the points.

    from scipy.fft import fft
    from numpy import exp, array, pi

    x = array([3, 1, 4, 1, 5, 9])
    y = fft(x)  # DFT of the data

    # Invert the transform by hand:
    # z[n] = (1/N) * sum over k of y[k] * exp(2*pi*i*k*n/N)
    N = len(x)
    z = [sum([exp(2j*pi*k*n/N)*y[k] for k in range(N)])/N for n in range(N)]

Aside from rounding errors on the order of 10⁻¹⁵ the vector z equals the vector x.

Turning the expression for z above into a mathematical expression, we have

f(n) = (y₀ + y₁ exp(nπi/3) + y₂ exp(2nπi/3) + y₃ exp(nπi) + y₄ exp(4nπi/3) + y₅ exp(5nπi/3)) / 6

where the y’s come from the FFT above.

To find sines and cosines we need to use Euler’s formula

exp(iθ) = cos(θ) + i sin(θ)

Because we started with real data x, there will be symmetries in the FFT components y that simplify the reduction of the complex function f to a real-valued function g using sines and cosines; some of the components are complex conjugates of each other, and so the imaginary parts cancel out.

6 g(x) = y₀ + (y₁ + y₅) cos(πx/3) + (y₂ + y₄) cos(2πx/3) + y₃ cos(πx)
+ i (y₁ − y₅) sin(πx/3) + i (y₂ − y₄) sin(2πx/3)

and so

g(x) = 3.833 + 0.8333 cos(πx/3) − 1.833 cos(2πx/3) + 0.1667 cos(πx)
− 2.5981 sin(πx/3) − 2.0207 sin(2πx/3)

Here’s a plot that verifies that g(x) passes through the specified points.
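
The same check can be done numerically. The sketch below is an illustration rather than the code behind the post: it computes the DFT with a direct sum in pure Python (same sign convention as scipy.fft.fft) so that it has no dependencies, builds the real-valued g from the conjugate pairs, and evaluates it at the integers 0 through 5.

```python
import cmath
import math

x = [3, 1, 4, 1, 5, 9]
N = len(x)

# Direct O(N^2) DFT: y[k] = sum over n of x[n] * exp(-2*pi*i*k*n/N)
y = [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
     for k in range(N)]

def g(t):
    """Real trigonometric interpolant built from the DFT coefficients."""
    # Constant term and the k = N/2 term, which is a pure cosine.
    total = y[0].real + y[N // 2].real * math.cos(math.pi * t)
    # Pair y[k] with its conjugate partner y[N - k].
    for k in range(1, N // 2):
        w = 2 * math.pi * k * t / N
        total += (y[k] + y[N - k]).real * math.cos(w)
        total += (1j * (y[k] - y[N - k])).real * math.sin(w)
    return total / N

for n in range(N):
    print(n, x[n], g(n))
```

The last two columns should agree up to rounding error.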

[1] William L. Briggs and Van Emden Henson. The DFT: An Owner’s Manual for the Discrete Fourier Transform. SIAM 1995.

Moments with Laplace

This is a quick note to mention a connection between two recent posts, namely today’s post about moments and a post from a few days ago about the Laplace transform.

Let f(t) be a function on [0, ∞) and F(s) be the Laplace transform of f(t).

F(s) = \int_0^\infty e^{-st} f(t) \,dt

Then the nth moment of f,

m_n = \int_0^\infty t^n \, f(t)\, dt

is equal to the nth derivative of F, evaluated at 0, with an alternating sign:

(-1)^n F^{(n)}(0) = m_n

To see this, differentiate with respect to s inside the integral defining the Laplace transform. Each time you differentiate you pick up a factor of −t, so differentiating n times you pick up a term (−1)ⁿ tⁿ, and evaluating at s = 0 makes the exponential term go away.
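
As a quick sanity check of the identity, here is a worked example with f(t) = e^{-t}, a function chosen purely for illustration. Both sides come out to n!.

```latex
f(t) = e^{-t}, \qquad F(s) = \int_0^\infty e^{-st} e^{-t} \,dt = \frac{1}{s+1}

F^{(n)}(s) = \frac{(-1)^n \, n!}{(s+1)^{n+1}}
\quad\Longrightarrow\quad
(-1)^n F^{(n)}(0) = n!

m_n = \int_0^\infty t^n e^{-t} \,dt = \Gamma(n+1) = n!
```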

When do moments determine a function?

The use of the word “moment” in mathematics is related to its use in physics, as in moment arm or moment of inertia. For a non-negative integer n, the nth moment of a function f is the integral of xⁿ f(x) over the function’s domain.


If two continuous functions f and g have all the same moments, are they the same function? The answer is yes for functions over a finite interval, but no for functions over an unbounded interval.


Now let’s consider starting with a set of moments rather than starting with a function. Given a set of moments m₀, m₁, m₂, … is there a function that has these moments? Typically no.

A better question is what conditions on the moments are necessary for there to exist a function with these moments. This question breaks into three questions

  1. The Hausdorff moment problem
  2. The Stieltjes moment problem
  3. The Hamburger moment problem

depending on whether the function domain is a finite interval, a half-bounded interval, or the real line. For each problem there are known conditions that are necessary and sufficient, but the conditions are different for each problem.

Interestingly, each of the three names Hausdorff, Stieltjes, and Hamburger is well known. Felix Hausdorff is best known for his work in topology: Hausdorff spaces, etc. Thomas Stieltjes is best known for the Riemann-Stieltjes integral, and for his work on continued fractions. Hans Ludwig Hamburger is not as well known, though his last name is certainly familiar.

Finite moments

A practical question in probability is how well a finite number of moments determine a probability distribution. They cannot uniquely determine the distribution, but they do establish bounds for how different two distributions with the same moments can be. See this post.

Floating point: Everything old is new again

In the early days of computing hardware (and actually before) mathematicians put a lot of effort into understanding and mitigating the limitations of floating point arithmetic. They would analyze mundane tasks such as adding a list of numbers and think carefully about the best way to carry out such tasks as accurately as possible.

Now that most arithmetic is carried out in double precision, you can often get away with not thinking about such things. Except when you can’t. The vagaries of floating point computation still matter occasionally, even with double precision, though not as often as they did with single precision.

Although most computing has moved from single precision to double precision, there is increasing interest in going the opposite direction, from single precision to half precision. The main driver is neural networks. You don’t need a lot of precision in weights, and you’ve got a lot of numbers to store. So instead of taking 64 bits to store a double precision number, or 32 bits to store a single precision number, you might want to use a 16 bit or even 8 bit floating point number. That way you can fit more weights in memory at once.

However, when you move to lower precision numbers, you now have to think again about the things numerical analysts thought about a couple generations ago, such as different ways of rounding. You might think that floating point rounding could be modeled by random variables. If so, you’re in good company, because John von Neumann suggested this in 1947. But a few years later people began to realize that floating point rounding errors are not random. Or to be more precise, they began to realize that modeling rounding errors as random was inadequate; of course they knew that rounding errors weren’t literally random.

But what if rounding errors were random? This would lead to more error cancellation than we see in practice with floating point arithmetic. With stochastic rounding, the rounded values become unbiased estimators of the values they would like to represent but cannot represent exactly. Now the central limit theorem and all that come to your aid. More on applications of stochastic rounding here.
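
Here is a minimal Python sketch of the idea. As a simplifying assumption it rounds to the nearest integers rather than to the nearest representable low-precision floats, and the function name stochastic_round is made up for illustration. A value is rounded up with probability equal to its fractional part, so the expected value of the result equals the input.

```python
import math
import random

def stochastic_round(x, rng=random):
    """Round x down or up to an integer, rounding up with
    probability equal to the fractional part of x."""
    lo = math.floor(x)
    return lo + (1 if rng.random() < x - lo else 0)

rng = random.Random(0)  # seeded pseudorandom generator for repeatability
samples = [stochastic_round(2.3, rng) for _ in range(100_000)]

# Each sample is 2 or 3, but the average is close to 2.3.
print(sum(samples) / len(samples))
```

Round-to-nearest would return 2 every time, a bias of 0.3 per value that never averages out.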

(To be pedantic a moment, stochastic rounding isn’t truly random, but uses pseudorandom numbers to implement a procedure which is well modeled by randomness. Random is as random does.)

Band-limited expansion

The band-limited expansion of the function f(x) is given by

f(x) = \sum_{k=-\infty}^\infty f(kh) \, \text{sinc}\left(\frac{x - kh}{h}\right)

where sinc(x) = sin(πx)/(πx). This is also called the sinc expansion, or the Whittaker cardinal after its discoverer E. T. Whittaker [1].

This is called the band-limited expansion of f because each term in the infinite sum is band-limited, i.e. only has Fourier spectrum within a finite band, because the Fourier transform of the sinc function is a step function supported between −1/2 and 1/2. [2]

The band-limited expansion has a lot of nice mathematical properties, leading Whittaker to call it “a function of royal blood in the family of entire functions, whose distinguished properties separate it from its bourgeois brethren.”

We can find a band-limited approximation for f by taking only a finite number of terms in the sum. An advantage of the band-limited approximation over a truncated Fourier series is that the former converges faster, making it useful in numerical algorithms [3]. Here’s an example of approximating the function exp(−x²) by taking h = 1 and using three terms, i.e. k running from −1 to 1.

You can improve the accuracy of the approximation by decreasing the size of h or by increasing N. This post explains how to pick the trade-off between h and N to minimize approximation error.
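
The truncated sum is short enough to sketch in a few lines of Python. This is an illustration under stated assumptions, not the code used for the post: the names sinc and sinc_approx are made up here, and N is taken to mean that k runs from −N to N, so N = 1 gives the three-term example above.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def sinc_approx(f, x, h=1.0, N=1):
    """Truncated band-limited expansion of f at x, with k from -N to N."""
    return sum(f(k * h) * sinc((x - k * h) / h) for k in range(-N, N + 1))

def f(x):
    return math.exp(-x * x)

# Compare f with its three-term approximation (h = 1, N = 1).
for x in (0.0, 0.5, 1.0):
    print(x, f(x), sinc_approx(f, x))
```

The approximation reproduces f at the sample points kh and only approximates it in between. Passing a smaller h or a larger N to sinc_approx lets you experiment with the trade-off.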

[1] E. T. Whittaker, On the functions which are represented by the expansions of the interpolation theory, Proc. Roy. Soc. Edinburgh, 35 (1915), pp. 181–194.

[2] You may get a different interval of support if you use a different convention for defining the Fourier transform. Unfortunately there are many conventions.

[3] Frank Stenger, Numerical Methods based on Whittaker Cardinal, or Sinc Functions, SIAM Review, 23 (1981), no. 2, pp. 165–224.