
Linear combination of sine and cosine as phase shift

Here’s a simple calculation that I’ve done often enough that I’d like to save the result for my future reference and for the benefit of anyone searching on this.

A linear combination of sines and cosines

a sin(x) + b cos(x)

can be written as a sine with a phase shift

A sin(x + φ).

Going between {a, b} and {A, φ} is the calculation I’d like to save. For completeness I also include the case

A cos(x + ψ).

Derivation

Define

f(x) = a sin(x) + b cos(x)

and

g(x) = A sin(x + φ).

Both functions satisfy the differential equation

y″ + y = 0

and so f = g if and only if f(0) = g(0) and f′(0) = g′(0).

Setting the values at 0 equal implies

b = A sin(φ)

and setting the derivatives at 0 equal implies

a = A cos(φ).

Taking the ratio of these two equations shows

b/a = tan(φ)

and adding the squares of both equations shows

a² + b² = A².

Equations

First we consider the case

a sin(x) + b cos(x) = A sin(x + φ).

Sine with phase shift

If a and b are given,

A = √(a² + b²)

and

φ = tan⁻¹(b / a).

If A and φ are given,

a = A cos(φ)

and

b = A sin(φ)

from the previous section.

Cosine with phase shift

Now suppose we want

a sin(x) + b cos(x) = A cos(x + ψ)

If a and b are given, then

A = √(a² + b²)

as before and

ψ = −tan⁻¹(a / b).

If A and ψ are given then

a = − A sin(ψ)

and

b = A cos(ψ).
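
Here’s a quick numerical check of these conversions (my own addition, not from the original post). Note that arctan2 picks the correct quadrant when a or b is negative; the tan⁻¹ formulas above implicitly assume a is positive (and b positive in the cosine case).

    import numpy as np

    a, b = 3.0, 4.0

    A   = np.hypot(a, b)        # √(a² + b²)
    phi = np.arctan2(b, a)      # phase so that a sin x + b cos x = A sin(x + φ)
    psi = -np.arctan2(a, b)     # phase so that a sin x + b cos x = A cos(x + ψ)

    x = np.linspace(0, 2*np.pi, 100)
    assert np.allclose(a*np.sin(x) + b*np.cos(x), A*np.sin(x + phi))
    assert np.allclose(a*np.sin(x) + b*np.cos(x), A*np.cos(x + psi))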


The Postage Stamp Problem

I recently stumbled upon the Postage Stamp Problem. Given two relatively prime positive integers a and b, show that for any sufficiently large integer N there exist nonnegative integers x and y such that

ax + by = N.

I initially missed the constraint that x and y must be nonnegative. Without that constraint the result is well known (Bézout’s lemma) and there’s no requirement for N to be large. The nonnegativity constraint makes things more interesting.

5 cent and 21 cent stamps

The problem is called the Postage Stamp Problem because it says that given any two stamps whose values are relatively prime, say a 5¢ stamp and a 21¢ stamp, you can make any sufficiently large amount of postage using just those two stamps.

A natural question is how large is “sufficiently large,” and the answer turns out to be all integers larger than

ab − a − b.

So in our example, you cannot make 79¢ postage out of 5¢ and 21¢ stamps, but you can make 80¢ or any higher amount.
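
As a quick sanity check (my own sketch, not from the original post), a few lines of Python confirm that 79¢ can’t be made while everything from 80¢ up to, say, a thousand can:

    def representable(n, a=5, b=21):
        # Can n be written as a*x + b*y with x, y nonnegative integers?
        return any((n - a*x) % b == 0 for x in range(n // a + 1))

    assert not representable(79)
    assert all(representable(n) for n in range(80, 1000))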

If you’ve been reading this blog for a while, you may recognize this as a special case of the Chicken McNugget problem, which you can think of as the Postage Stamp problem with possibly more than two stamps.


Impersonating an Edwardian math professor

I’ve read some math publications from around a century or so ago, and I wondered if I could pull off being a math professor if a time machine dropped me into a math department from the time. I think I’d come across as something of an autistic savant, ignorant of what contemporaries would think of as basic math but fluent in what they’d consider more advanced.

There are two things in particular that were common knowledge at the time that I would be conspicuously ignorant of: interpolation tricks and geometry.

People from previous eras knew interpolation at a deeper level than citing the Lagrange interpolation theorem, out of necessity. They learned time-saving tricks that have since been forgotten.

The biggest gap in my knowledge would be geometry. Mathematicians a century ago had a far deeper knowledge of geometry, particularly synthetic geometry, i.e. geometry in the style of Euclid rather than in the style of Descartes.

Sometimes older math books use notation or terminology that has since changed. I imagine I’d make a few gaffes, not immediately understanding a basic term or using a term that wasn’t coined until later.

If I had to teach a class, I’d choose something like real and complex analysis. Whittaker & Watson’s book on the subject was first published in 1902 and remains a common reference today. The only thing I find jarring about that book is that “show” is spelled “shew.” Makes me think of Ed Sullivan. But I think I’d have a harder time teaching a less advanced class.


A strange take on the harmonic series

It is well known that the harmonic series

1 + ½ + ⅓ + ¼ + …

diverges. But if you take the denominators as numbers in base 11 or higher, the series converges [1].

I wonder what inspired this observation. Maybe Brewster was bored, teaching yet another cohort of students that the harmonic series diverges, and let his mind wander.

Proof

Let f(n) be the function that takes a positive integer n, writes it in base 10, then reinterprets the result as a number in base b where b > 10. Brewster is saying that the sum of the series 1/f(n) converges.

To see this, note that the first 10 terms are less than or equal to 1. The next 100 terms are less than 1/b. The next 1000 terms are less than 1/b², and so on. This means the series is bounded by the geometric series with terms 10 (10/b)ᵐ, which converges because b > 10.

Python

Incidentally, despite being an unusual function, f is very easy to implement in Python:

   def f(n, b): return int(str(n), b)
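
As a quick numerical illustration (my addition, not part of the post), the partial sums grow very slowly, consistent with Brewster’s remark below about slow convergence:

    # Partial sum of 1/f(n) with the denominators read in base 11.
    total = sum(1 / f(n, 11) for n in range(1, 100_001))
    print(total)   # a partial sum; Brewster's bounds for the full sum are 2.828 and 26.29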

Citation

Brewster’s note was so brief that I will quote it here in full.

The [harmonic series] is divergent. But if the denominators of the terms are read as numbers in scale 11 or any higher scale, the series is convergent, and the sum is greater than 2.828 and less than 26.29. The convergence is rather slow. I estimate that, to find the last number by direct addition, one would have to work out 10⁹⁰ terms, to about 93 places of decimals.

[1] G. W. Brewster. An Old Result in a New Dress. The Mathematical Gazette, Vol. 37, No. 322 (Dec., 1953), pp. 269–270.

Chebyshev polynomials as distorted cosines

Forman Acton’s book Numerical Methods that Work describes Chebyshev polynomials as

cosine curves with a somewhat disturbed horizontal scale, but the vertical scale has not been touched.

The relation between Chebyshev polynomials and cosines is

Tn(cos θ) = cos(nθ).

Some sources take this as the definition of Chebyshev polynomials. Other sources define the polynomials differently and prove this equation as a theorem.

It follows that if we let x = cos θ then

Tn(x) = cos(n arccos x).

Now sin x = cos(π/2 − x) and for small x, sin x ≈ x. This means

arccos(x) ≈ π/2 − x

for x near 0, and so we should expect the approximation

Tn(x) ≈ cos(n(π/2 − x))

to be accurate near the middle of the interval [−1, 1] though not at the ends. A couple plots show that this is the case.
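
Here’s a short Python sketch (mine, not from the post) that produces that kind of plot for, say, n = 5:

    import numpy as np
    import matplotlib.pyplot as plt

    n = 5
    x = np.linspace(-1, 1, 500)
    exact  = np.cos(n * np.arccos(x))     # T_n(x)
    approx = np.cos(n * (np.pi/2 - x))    # distorted-cosine approximation

    plt.plot(x, exact, label="T_n(x)")
    plt.plot(x, approx, "--", label="cos(n(π/2 − x))")
    plt.legend()
    plt.show()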


Up and down the abstraction ladder

It’s easier to go up a ladder than to come down, literally and metaphorically.

Gian-Carlo Rota made a profound observation on the application of theory.

One frequently notices, however, a wide gap between the bare statement of a principle and the skill required in recognizing that it applies to a particular problem.

This isn’t quite what he said. I made two small edits to generalize his actual statement. He referred specifically to the application of the inclusion-exclusion principle to problems in combinatorics. Here is his actual quote from [1].

One frequently notices, however, a wide gap between the bare statement of the principle and the skill required in recognizing that it applies to a particular combinatorial problem.

This post will expand a little on Rota’s original statement and give a couple of applications of the more general version of his statement.

The inclusion-exclusion principle

I don’t think Rota would have disagreed with my edited version of his statement, but it’s interesting to think of his original context. The inclusion-exclusion principle seems like a simple problem solving strategy: you may count a set of things by first over-counting, then correcting for the over-counting.

For example, a few days ago I wrote about a graph created by turning left at the nth step if n is divisible by 7 or contains a digit 7. Suppose you wanted to count how many times you turn in the first 100 steps. You could count the number of positive integers up to 100 that are divisible by 7, then the number that contain the digit 7, then subtract the number that are both divisible by 7 and contain a 7.
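
Here’s that count carried out in Python (a quick check of my own, not from the original post):

    # Inclusion-exclusion: numbers in 1..100 divisible by 7 or containing a digit 7.
    div7   = sum(1 for n in range(1, 101) if n % 7 == 0)
    has7   = sum(1 for n in range(1, 101) if '7' in str(n))
    both   = sum(1 for n in range(1, 101) if n % 7 == 0 and '7' in str(n))
    direct = sum(1 for n in range(1, 101) if n % 7 == 0 or '7' in str(n))

    assert div7 + has7 - both == direct   # 14 + 19 − 3 = 30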

You can carry this a step further by over-counting, then over-correcting for your over-counting, then correcting for your over-correction. This is the essence of the following probability theorem.

P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
             − P(A ∩ B) − P(B ∩ C) − P(A ∩ C)
             + P(A ∩ B ∩ C)

The inclusion-exclusion principle is a clever idea, but not that clever. And yet Rota discusses how this idea was developed over decades into the Möbius inversion principle, which has diverse applications, including Euler characteristic and the calculus of finite differences.

Bayes’ theorem

Bayesian statistics is a direct application of Bayes’ theorem. Bayes’ theorem is a fairly simple idea, and yet people make careers out of applying it.

When I started working in the Biostatistics Department at MD Anderson, a bastion of Bayesian statistics, I was surprised how subtle Bayesian statistics is. I probably first saw Bayes’ theorem as a teenager, and yet it was not easy to wrap my head around Bayesian statistics. I would think “This is simple. Why is this hard?” The core principle was simple, but the application was not trivial.

Newtonian mechanics

When I took introductory physics, we would get stuck on homework problems and ask our professor for help. He would almost always begin by saying “Well, F = ma.”

This was infuriating. Yes, we know F = ma. But how does that let us solve this problem?!

There’s more to Newtonian mechanics than Newton’s laws, a lot more. And most of it is never made explicit. You pick it up by osmosis after you’ve worked hundreds of exercises.


[1] G. C. Rota. On the Foundations of Combinatorial Theory I. Theory of Möbius Functions. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 2 (1964) 340–368.

Making documents with makefiles

I learned to use the command line utility make in the context of building C programs. The program make reads an input file to tell it how to make things. To make a C program, you compile the source files into object files, then link the object files together.

You can tell make what depends on what, so it will not do any unnecessary work. If you tell it that a file foo.o depends on a file foo.c, then it will rebuild foo.o if that file is older than the file foo.c that it depends on. Looking at file timestamps allows make to save time by not doing unnecessary work. It also makes it easier to dictate what order things need to be done in.

There is nothing about make that is limited to compiling programs. It simply orchestrates commands in a declarative way. Because you state the dependencies and let make figure out what needs to run and when, a makefile is more like a Prolog program than a Python script.

I have a client that needs a dozen customized reports, each of which requires a few steps to create, and the order of the steps is important. I created the reports by hand. Then, of course, something changed and all the reports needed to be remade. So then I wrote a Python script to manage the next build. (And the next, and the next, …). But I should have written a makefile. I’m about to have to revise the reports, and this time I’ll probably use a makefile.

Here’s a very simple example of using a makefile to create documents. For more extensive examples of how to use makefiles for a variety of tasks, see How I stopped worrying and loved Makefiles. [1]

Suppose you need to create a PDF and a Microsoft Word version of a report from a LaTeX file [2].

all: foo.pdf foo.docx

foo.pdf: foo.tex
	pdflatex foo.tex

foo.docx: foo.tex
	pandoc foo.tex --from latex --to docx > foo.docx

clean:
	rm foo.aux foo.log

If the file foo.pdf does not exist, or exists but is older than foo.tex, the command make foo.pdf will create the PDF file by running pdflatex. If foo.pdf is newer than foo.tex then running make foo.pdf will return the message

make: 'foo.pdf' is up to date.

If you run make with no argument, it will build both foo.pdf and foo.docx as needed.

The command make clean deletes auxiliary files created by pdflatex. This shows that a make action does not have to “make” anything; it simply runs a command.

 

[1] The blog post title is an allusion to the 1964 movie Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb.

[2] I use LaTeX for everything I can. This saves time in the long run, if not in the moment, because of consistency. I used to use Word for proposals and such, and LaTeX for documents requiring equations. My workflow became more efficient when I switched to using LaTeX for everything.

A deck of cards

One time when I was in grad school, I was a teaching assistant for a business math class that included calculus and a smattering of other topics, including a little bit of probability. I made up examples involving a deck of cards, but then learned to my surprise that not everyone was familiar with playing cards. I was young and had a lot to learn about seeing things through the eyes of other people.

Later I learned that a “standard” deck of cards is not quite as standard as I thought. The “standard 52-card deck” is indeed standard in the English-speaking world, but there are variations used in other countries.

The Unicode values for the characters representing playing cards are laid out in a nice pattern. The last digit of the Unicode value corresponds to the point value of the card (aces low), and the second-to-last digit corresponds to the suit.

The ace of spades has code point U+1F0A1, and the aces of hearts, diamonds, and clubs have values  U+1F0B1, U+1F0C1, and U+1F0D1 respectively. The three of spades is U+1F0A3 and the seven of hearts is U+1F0B7.

But there’s a surprise if you’re only aware of the standard 52-card deck: there’s a knight between the jack and the queen. The Unicode values for jacks end in B (hexadecimal B = decimal 11) as you’d expect, but queens end in D and kings end in E. The cards ending in C are knights, cards that don’t exist in the standard 52-card deck.

Here’s a little Python code that illustrates the Unicode layout.

    for i in range(4):
        for j in range(14):
            print(chr(0x1f0a1 + j + 16*i), end='')
        print()

The output is too small to read as plain text, but here’s a screen shot from running the code above in a terminal with a huge font.

[Screenshot from the original post: the four suits of Unicode playing cards rendered in a large font]
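
As a small follow-on of my own (not from the post), here’s a hypothetical helper that maps a rank and suit to the corresponding character using the layout described above. Ranks run 1 through 14, with the knight at 12, so the queen and king are 13 and 14.

    SUITS = {"spades": 0, "hearts": 1, "diamonds": 2, "clubs": 3}

    def card(rank, suit):
        # Suits are offset by 16 (0x10) starting at spades; the ace of spades is U+1F0A1.
        return chr(0x1F0A0 + 16*SUITS[suit] + rank)

    print(card(1, "spades"), card(12, "hearts"), card(14, "clubs"))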

Playing cards have their origin in tarot cards, which are commonly associated with the occult. But John C. Wright argues that tarot cards were based on Christian iconography before they became associated with the occult.

On greedy algorithms and rodeo clowns


This weekend I ran across a blog post The Rodeo Clown Theory of Personal Development. The title comes from a hypothetical example of a goal you don’t know how to achieve: becoming a rodeo clown.

Let’s say you decide you want to be a rodeo clown. And let’s say you’re me and you have no idea how to be a rodeo clown. … Can you look at the possibilities currently available to you and imagine which of them might lead to an overall increase of rodeo clowndom in your life, even infinitesimally?

Each day you ask yourself what you can do that might lead you closer to your goal. This is a greedy algorithm: at each step you make as much progress toward your goal as you can. Greedy algorithms are not always optimal, but the premise above is that you have no idea what to do. In that case, doing whatever you can imagine that might bring you a little closer to your goal is a smart thing to do.

Crossing the continent

If you’re in Los Angeles and your goal is to get to New York, you could start by walking east. You might run into a brick wall and decide that you should not take your approach too literally. Maybe you decide to buy a ticket for a bus that’s traveling east, even if you have to walk west to the bus station. Maybe the bus takes you to Houston. This isn’t the most direct route, but it’s progress.

When you don’t know what else to do, a greedy approach, especially one tempered with common sense, could be a very sensible way to start. Maybe it’ll put you in a position to learn of a more efficient approach. In the scenario above, maybe some stranger sees you walk into a wall, asks you what you’re trying to do, and tells you where the bus station is.

Sorting algorithms

The way most people sort a hand of playing cards is a sort of greedy algorithm. They scan for the highest card and put it first. Then they scan the remainder of the hand for the next highest card, and so on. This algorithm is known as selection sort. It’s not the most efficient algorithm, but it may be the most natural, and it works. And for a hand of five cards it’s about as good as any.
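
Here’s that procedure written out in Python (my sketch, not from the post):

    def sort_hand(cards):
        # Repeatedly scan the remaining cards for the highest and move it to the
        # front: the greedy step of selection sort.
        sorted_hand = []
        remaining = list(cards)
        while remaining:
            best = max(remaining)
            remaining.remove(best)
            sorted_hand.append(best)
        return sorted_hand

    print(sort_hand([7, 2, 11, 5, 9]))   # [11, 9, 7, 5, 2]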

Anyone who has taken an algorithms class will know you can beat selection sort with a more sophisticated algorithm such as heap sort. But sorting is a well-understood, one-dimensional problem. That’s why people long ago came up with solutions that are better than a greedy approach. But when a problem is not well understood and is high-dimensional, greedy algorithms are more likely to succeed, even if they’re not optimal.

Training neural nets

Training a neural network is a very high dimensional optimization problem, maybe on the order of a billion dimensions. And yet it is solved using stochastic gradient descent, which is essentially a greedy algorithm. At each iteration, you move in the direction that gives you the most improvement toward your goal. There’s a little more nuance to it than that, but that’s the basic idea.
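
To make the idea concrete, here’s a toy (full-batch) gradient descent sketch of my own, not a description of how any particular network is trained:

    import numpy as np

    def grad(w):
        # Gradient of the toy objective f(w) = ||w||² / 2.
        return w

    w  = np.array([3.0, -2.0])
    lr = 0.1
    for _ in range(100):
        w -= lr * grad(w)   # step in the locally best (steepest descent) direction
    print(w)                # approaches the minimum at the origin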

How can gradient descent work so well in practice when in theory it could get stuck in a local minimum or a saddle point? John Carmack posted a heuristic explanation on Twitter a few weeks ago. A local critical point requires the “cooperation” of every variable in the function you’re optimizing. The more variables, the less likely this is to happen.

Carmack didn’t give a full explanation—he posted a tweet, not a journal article—but he does give an intuitive idea of what’s going on. At the global minimum, the variables cooperate for a good reason. He assumes the variables are unlikely to cooperate anywhere else, which is apparently often true when training neural nets.

Usually things get harder in higher dimensions, hence the “curse of dimensionality.” But there’s a sort of blessing of dimensionality here: you’re less likely to get stuck in a local minimum in higher dimensions.

Back to the rodeo

Training a neural network can be a huge undertaking, using millions of dollars worth of electricity, but it’s a mathematically defined problem. Life goals such as becoming a rodeo clown are not cleanly quantified problems. They are very high dimensional in the sense that there are a vast number of things you could do at any given time. You might argue by analogy with neural nets that the fact that there are so many dimensions means that you’re unlikely to get stuck in a local minimum. This is a loose analogy, one that easily breaks down, but still one worth thinking about.

Seth Godin once said “Remarkable work often comes from making choices when everyone else feels as though there is no choice.” What looks like a local minimum may be revealed to not be once you consider more options.

 


Finding strings in binary files

There’s a little program called strings that searches for what appear to be strings inside binary files. I’ll refer to it as strings(1) to distinguish the program name from the common English word strings. [1]

What does strings(1) consider to be a string? By default it is a sequence of four or more bytes that correspond to printable ASCII characters. There are command options to change the sequence length and the character encoding.

There are 98 printable ASCII characters [2] and 256 possible values for an 8-bit byte, so the probability of a byte being a printable character is

p = 98/256 = 0.3828125.

This implies that the probability of strings(1) flagging a sequence of four random bytes as a string is p⁴, or about 2%.

How long a string might you find inside a binary file?

I ran strings(1) on a photograph and found a 46-character string:

   .IEC 61966-2.1 Default RGB colour space - sRGB

Of course this isn’t random. This string is part of the metadata stored inside JPEG images.

Next I encrypted the file so that no strings would be metadata and all strings would be false positives. The longest line was 12 characters:

    Z<Bq{7fH}~G9

How does this compare to what we might expect from a random file? I wrote about the probability of long runs a dozen years ago. In a file of n bytes, the expected length of the longest run of printable characters is approximately

−log(n(1 − p)) / log(p)

In my case, the file had n = 203,308 bytes. The expected length of the longest run of printable characters would then be 12.2 characters, and so the actual length of the longest run is in line with what theory would have predicted.
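
Plugging in the numbers (a quick check of my own, not from the original post):

    import math

    p = 98 / 256                 # probability that a random byte is a printable character
    print(p**4)                  # ≈ 0.0215, the "about 2%" figure above

    n = 203_308                  # size of the encrypted file in bytes
    print(math.log(n * (1 - p)) / math.log(1/p))   # ≈ 12.2, expected longest run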

 

[1] Unix documentation is separated into sections, and parentheses after a name specify the documentation section. Section 1 is for programs, Section 2 is for system calls, etc. So, for example, chmod(1) is the command line utility named chmod, and chmod(2) is the system call by the same name. Since command line utilities often have names that are common words, tacking (1) on the end helps distinguish program names from English words.

[2] p could be a little larger if you consider some of the control characters to be printable.