Permutations and tests

Suppose a test asks you to place 10 events in chronological order. Label these events A through J so that chronological order is also alphabetical order.

If a student answers BACDEFGHIJ, did they make two mistakes or just one? Two events are in the wrong position, but the student made a single transposition error. The simplest way to grade such a test would be to count the number of events that are in the correct position. Is this the fairest way to grade?

If you decide to count how many transpositions are needed to correct a student’s answer, do you count any transposition or only adjacent transpositions? For example, if someone answered JBCDEFGHIA, then transposing the A and the J is enough to put the results in order. But reversing the first and last event seems like a bigger mistake than reversing the first two events. Counting only adjacent transpositions would penalize this mistake more: you would have to bubble the J to the end one adjacent swap at a time, then bubble the A to the front, seventeen swaps in all. But it hardly seems that answering JBCDEFGHIA is seventeen times worse than answering BACDEFGHIJ.

Maybe counting transpositions is too much work. So we just go back to counting how many events are in the right place. But then suppose someone answers JABCDEFGHI. This is completely wrong since every event is in the wrong position. But the student obviously knows something, since the relative order of nearly all of the events is correct. From one perspective there was only one mistake: J comes last, not first.

What is the worst possible answer? Maybe getting the order exactly backward? If you have an odd number of events, then getting the order backward means one event is in the right place, and so that doesn’t receive the lowest possible score.
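To make the comparison concrete, here is a short Python sketch, an illustration rather than a grading recommendation, that scores an answer two ways: by events in the correct position, and by the minimum number of adjacent transpositions needed to sort the answer, which equals its number of inversions.

    def positions_correct(answer, key):
        # Count events that are in exactly the right position.
        return sum(a == k for a, k in zip(answer, key))

    def adjacent_swaps(answer, key):
        # Minimum adjacent transpositions to sort the answer = inversion count.
        rank = {event: i for i, event in enumerate(key)}
        r = [rank[a] for a in answer]
        n = len(r)
        return sum(r[i] > r[j] for i in range(n) for j in range(i + 1, n))

    key = "ABCDEFGHIJ"
    for answer in ["BACDEFGHIJ", "JBCDEFGHIA", "JABCDEFGHI", key[::-1]]:
        print(answer, positions_correct(answer, key), adjacent_swaps(answer, key))

The two metrics disagree sharply: BACDEFGHIJ scores 8 and 1, JBCDEFGHIA scores 8 and 17, JABCDEFGHI scores 0 and 9, and the full reversal scores 0 and 45.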

This is an interesting problem beyond grading exams. (As for grading exams, I’d suggest simply not using questions of this type on an exam.) In manufacturing, how serious a mistake is it to reverse two consecutive components versus two distant components? You could also ask the same question when comparing DNA sequences or other digital signals. The best way to assign a distance between the actual and desired sequence would depend entirely on context.

Reading equations forward and backward

There is no logical difference between writing A = B and writing B = A, but there is a psychological difference.

Equations are typically applied left to right. When you write A = B you imply that it may be useful to replace A with B. This is helpful to keep in mind when learning something new: the order in which an equation is written gives a hint as to how it may be applied. However, this way of thinking can also be a limitation. Clever applications often come from realizing that you can apply an equation in the opposite of the usual direction.

For example, Euler’s reflection formula says

Γ(z) Γ(1-z) = π / sin(πz).

Reading from left to right, this says that two unfamiliar/difficult things, values of the Gamma function, are related to a more familiar/simple thing, the sine function. It would be odd to look at this formula and say “Great! Now I can compute sines if I just know values of the Gamma function.” Instead, the usual reaction would be “Great! Now I can relate the value of Gamma at two different places by using sines.”
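A quick numerical check in Python shows the formula at work in its usual direction, producing Γ(1 − z) from Γ(z):

    from math import gamma, pi, sin

    for z in [0.25, 0.3, 0.9]:
        implied = pi / (sin(pi * z) * gamma(z))  # Gamma(1-z) via the reflection formula
        direct = gamma(1 - z)                    # Gamma(1-z) computed directly
        print(z, implied, direct)                # the two columns agree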

When we see Einstein’s equation

E = mc²

the first time, we think about creating energy from matter, such as the mass lost in nuclear fission. This applies the formula from left to right, relating what we want to know, an amount of energy, to what we do know, an amount of mass. But you could also read the equation from right to left, calculating the amount of energy, say in an accelerator, necessary to create a particle of a given mass.
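Here is a small Python computation showing both readings, using rounded values of the physical constants:

    c = 299792458.0          # speed of light in m/s
    m_electron = 9.109e-31   # electron rest mass in kg, approximately

    # Left to right: energy released if the mass were converted entirely to energy.
    E = m_electron * c**2
    print(E)                 # about 8.2e-14 joules, i.e. 0.511 MeV

    # Right to left: pair production must supply at least the energy of
    # the two particles created, here an electron and a positron.
    print(2 * E / c**2)      # about 1.8e-30 kg, two electron masses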

Calculus textbooks typically have a list of equations, either inside the covers or in an appendix, that relate an integral on the left to a function or number on the right. This makes sense because calculus students compute integrals. But mathematicians often apply these equations in the opposite direction, replacing a number or function with an integral. To a calculus student this is madness: why replace a familiar thing with a scary thing? But integrals aren’t scary to mathematicians. Expressing a function as an integral is often progress. Properties of a function may be easier to see in integral form. Also, the integral may lend itself to some computational technique, such as reversing the order of integration in a double integral, or reversing the order of taking a limit and an integral.
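For example, the identity

∫_0^∞ x^n e^(−x) dx = n!

is read left to right by a student evaluating the integral. Read right to left, it replaces the factorial with an integral, a move that, among other things, suggests how to extend the factorial beyond the integers.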

Calculus textbooks also have lists of equations involving infinite sums, the summation always being on the left. Calculus students want to replace the scary thing, the infinite sum, with the familiar thing, the expression on the right. Generating functions turn this around, wanting to replace things with infinite sums. Again this would seem crazy to a calculus student, but it’s a powerful problem solving technique.
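For example, a calculus text lists the geometric series

1 + x + x² + x³ + ⋯ = 1/(1 − x) for |x| < 1

so that students can sum the series on the left. Someone using generating functions reads the equation right to left, replacing the simple function 1/(1 − x) with the series whose coefficients encode the sequence 1, 1, 1, …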

Differential equation students solve differential equations. They want to replace what they find scary, a differential equation, with something more familiar, a function that satisfies the differential equation. But mathematicians sometimes want to replace a function with a differential equation that it satisfies. This is common, for example, in studying special functions. Classical orthogonal polynomials satisfy 2nd order differential equations, and the differential equation takes a different form for different families of orthogonal polynomials. Why would you want to take something as tangible and familiar as a polynomial, something you might study as a sophomore in high school, and replace it with something as abstract and mysterious as a differential equation, something you might study as a sophomore in college? Because some properties, properties that you would not have cared about in high school, are more clearly seen via the differential equations.
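For example, the Legendre polynomial P_n(x) satisfies the differential equation

(1 − x²) y″ − 2x y′ + n(n + 1) y = 0,

and properties such as the orthogonality of these polynomials on [−1, 1] follow from the form of this equation.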

Pedantic arithmetic rules

Generations of math teachers have drilled into their students that they must reduce fractions. That serves some purpose in the early years, but somewhere along the way students need to learn that reducing fractions is not only unnecessary at times, it can even be bad for communication. For example, if the fraction 45/365 comes up in discussing something that happened on 45 days out of the year, then 45/365 is clearer than 9/73. The fraction 45/365 is not simpler in a number-theoretic sense, but it is psychologically simpler, since it’s obvious where the denominator came from. In this context, writing 9/73 is not a simplification but an obfuscation.

Simplifying fractions sometimes makes things clearer, but not always. It depends on context, and context is something students don’t understand at first. So it makes sense to be pedantic at some stage, but then students need to learn that clear communication trumps pedantic conventions.

Along these lines, there is an old taboo against having radicals in the denominator of a fraction. For example, 3/√5 is not allowed and should be rewritten as 3√5/5. This is an arbitrary convention now, though there once was a practical reason for it, namely that in hand calculations it’s easier to multiply by a long decimal number than to divide by it. So, for example, if you had to reduce 3/√5 to a decimal in the old days, you’d look up √5 in a table to find it equals 2.2360679775. It would be easier to compute 0.6 × 2.2360679775 by hand than to compute 3/2.2360679775.
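A quick Python check confirms that the two forms agree and that the old shortcut computes the same number:

    from math import sqrt

    sqrt5 = 2.2360679775        # the table value of the square root of 5
    print(3 / sqrt(5))          # 3/sqrt(5) directly
    print(3 * sqrt(5) / 5)      # the rationalized form
    print(0.6 * sqrt5)          # easy by hand: multiply by the table value
    print(3 / sqrt5)            # hard by hand: divide by the table value

All four lines print approximately 1.3416407865.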

As with unreduced fractions, a radical in the denominator may be not only mathematically equivalent to the rationalized form but psychologically preferable. If there’s a 3 in some context, and a √5, then it may be clear that 3/√5 is their ratio. In that same context someone may look at 3√5/5 and ask “Where did that factor of 5 in the denominator come from?”

A possible justification for the rules above is that they provide standard forms that make grading easier. But this is only true for the simplest exercises. With moderately complicated exercises, following a student’s work is harder than determining whether two expressions represent the same number.

One final note on pedantic arithmetic rules: If the order of operations isn’t clear, make it clear. Add a pair of parentheses if you need to. Or write division operations as one thing above a horizontal bar and another below, not using the division symbol. Then you (and your reader) don’t have to worry whether, for example, multiplication has higher precedence than division or whether both have equal precedence and are carried out left to right.
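For example, 8/4×2 means (8/4)×2 = 4 under the left-to-right convention, but 8/(4×2) = 1 if multiplication binds tighter. Writing the expression with a horizontal bar removes the question entirely.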

Confidence

Zig Ziglar said that if you increase your confidence, you increase your competence. I think that’s generally true. Of course you could be an idiot and become a more confident idiot. In that case confidence just makes things worse [1]. But otherwise when you have more confidence, you explore more options, and in effect become more competent.

There are some things you may need to learn not for the content itself but for the confidence boost. Maybe you need to learn them so you can confidently say you didn’t need to. Also, some things you need to learn before you can see uses for them. (More on that theme here.)

I’ve learned several things backward in the sense of learning the advanced material before the elementary. For example, I studied PDEs in graduate school before having mastered the typical undergraduate differential equation curriculum. That nagged at me. I kept thinking I might find some use for the undergrad tricks. When I had a chance to teach the undergrad course a couple of times, my confidence increased. I also convinced myself that I didn’t need that material after all.

My experience with statistics was similar. I was writing research articles in statistics before I learned some of the introductory material. Once again the opportunity to teach the introductory material increased my confidence. The material wasn’t particularly useful, but the experience of having taught it was.

Related post: Psychological encapsulation


[1] See Yeats’ poem The Second Coming:

The best lack all conviction, while the worst
Are full of passionate intensity.


Elementary vs Foundational

Euclid’s proof that there are infinitely many primes is simple and ancient. It is given early in any course on number theory, and most students have seen it even before taking such a course.

There are also many other proofs of the infinitude of primes that use more sophisticated arguments. For example, here is such a proof by Paul Erdős. Another proof shows that there must be infinitely many primes because the sum of the reciprocals of the primes diverges. There’s even a proof that uses topology.

When I first saw one of these proofs, I wondered whether they were circular. When you use advanced math to prove something elementary, there’s a chance you could use a result that depends on the very thing you’re trying to prove. The proofs are not circular as far as I know, and this is curious: the fact that there are infinitely many primes is elementary but not foundational. It’s elementary in that it is presented early on and it builds on very little. But it is not foundational. You don’t continue to use it to prove more things, at least not right away. You can develop a great deal of number theory without using the fact that there are infinitely many primes.

The Fundamental Theorem of Algebra is an example in the other direction, something that is foundational but not elementary. It’s stated and used in high school algebra texts but the usual proof depends on Liouville’s theorem from complex analysis.

It’s helpful to distinguish which things are elementary and which are foundational when you’re learning something new so you can emphasize the most important things. But without some guidance, you can’t know what will be foundational until later.

The notion of what is foundational, however, is conventional. It has to do with the order in which things are presented and proved, and sometimes this changes. Sometimes in hindsight we realize that the development could be simplified by changing the order, treating as foundational something that wasn’t before. One example is Cauchy’s theorem. It’s now foundational in complex analysis: textbooks prove it as soon as possible, then use it to prove things for the rest of the course. But historically, Cauchy’s theorem came after many of the results it is now used to prove.

Related: Advanced or just obscure?

On replacing calculus with statistics

Russ Roberts had this to say about the proposal to replace the calculus requirement with statistics for students:

Statistics is in many ways much more useful for most students than calculus. The problem is, to teach it well is extraordinarily difficult. It’s very easy to teach a horrible statistics class where you spit back the definitions of mean and median. But you become dangerous because you think you know something about data when in fact it’s kind of subtle.

A little knowledge is a dangerous thing, more so for statistics than calculus.

This reminds me of a quote by Stephen Senn:

Statistics: A subject which most statisticians find difficult but in which nearly all physicians are expert.

Related: Elementary statistics book recommendation

Least understood bit of basic math

Logarithms may be the least understood topic in basic math. In my experience, if an otherwise math-savvy person is missing something elementary, it’s usually logarithms.

For example, I have had conversations with people with advanced technical degrees in which I’ve had to explain that logs in all bases are proportional to each other: if one thing is proportional to the natural log of another, then the former is also proportional to the log base 10, or the log base anything else, of the latter [1].
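Here is a quick Python demonstration: for a fixed base b, the ratio of log base b to the natural log is the same for every argument, so changing the base only changes a constant of proportionality.

    from math import log

    for b in [2, 10, 42]:
        ratios = [log(x, b) / log(x) for x in [2.0, 10.0, 100.0, 1e6]]
        print(b, ratios)   # constant within each line: it equals 1/log(b)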

I’ve also noticed that quite often there’s a question on the front page of math.stackexchange of the form “How do I solve …” and the solution is invariably “take logarithms of both sides.” This seems to be a secret technique.
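For example, to solve 2^x = 10, take logs of both sides: x log 2 = log 10, so x = log 10 / log 2 ≈ 3.32.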

I suspect that more people understood logarithms when they had to use slide rules. A slide rule is essentially two sticks with log-scale markings. By moving one relative to the other, you’re adding lengths, which means adding logs, which does multiplication. If you do that for a while, it seems you’d have to get a feel for logs.

Log tables also make logs more tangible. At first it seems there’s no skill required to use a table, but you often have to exercise a little bit of understanding. Because of the limitations of space, tables can’t be big enough to let you directly look up everything. You have to learn how to handle orders of magnitude and how to interpolate.
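Here is a toy Python sketch of what a table lookup involves, using just two genuine four-figure entries; a real table would have pages of them, and this function assumes the mantissa of its argument falls between the two entries.

    from math import log10

    table = {3.45: 0.5378, 3.46: 0.5391}  # four-figure entries for log10

    def log10_from_table(x):
        # Handle the order of magnitude: log10(3456) = 3 + log10(3.456).
        exponent = 0
        while x >= 10:
            x /= 10
            exponent += 1
        while x < 1:
            x *= 10
            exponent -= 1
        # Linearly interpolate between the two nearest entries.
        lo, hi = 3.45, 3.46
        frac = (x - lo) / (hi - lo)
        return exponent + table[lo] + frac * (table[hi] - table[lo])

    print(log10_from_table(3456), log10(3456))  # 3.53858 vs 3.53857...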

If the first time you see logs is when it’s time to learn to differentiate them, you have to learn two things at once. And that’s too much for many students. They make mistakes, such as assuming logs are linear functions, that they would not make if they had an intuitive feel for what they’re working with.

Maybe schools could have a retro math week each year when students can’t use calculators and have to use log tables and slide rules. I don’t think it would do as much good to just make tables or slide rules a topic in the curriculum. It’s when you have to use these things to accomplish something else, when they are not simply an isolated forgettable topic of their own, that the ideas sink in.

More on logarithms

[1] That is, log_a(x) = log_a(b) log_b(x). This says log_a(b) is the proportionality constant for converting between logs in base b and base a. To prove the equation, raise a to the power of both sides: a raised to the left side is x by definition, and a raised to the right side is (a^log_a(b))^log_b(x) = b^log_b(x) = x.

To memorize this equation, notice the up-and-down pattern of the bases and arguments: a up to x = a up to b down to b up to x. The right side squeezes an up and down b in between a and x.

Bottom-up exposition

I wish more authors followed this philosophy:

The approach I have taken here is to try to move always from the particular to the general, following through the steps of the abstraction process until the abstract concept emerges naturally. … at the finish it would be quite appropriate for the reader to feel that (s)he had just arrived at the subject, rather than reached the end of the story.

From the preface here (ISBN 0486450260).

When books start at the most abstract point, I feel like saying to the author “Thank you for the answer, but what was the question?”

Differentiating bananas and co-bananas

I saw a tweet this morning from Patrick Honner pointing to a blog post asking how you might teach derivatives of sines and cosines differently.

One thing I think deserves more emphasis is that “co” in cosine etc. stands for “complement” as in complementary angles. The cosine of an angle is the sine of the complementary angle. For any function f(x), its complement is the function f(π/2 – x).

When memorizing a table of trig functions and their derivatives, students notice a pattern. You can turn one formula into another by replacing every function with its co-function and adding a negative sign on one side. For example,

(d/dx) tan(x) = sec²(x)

and so

(d/dx) cot(x) = −csc²(x)

In words, the derivative of tangent is secant squared, and the derivative of cotangent is negative cosecant squared.

The explanation of this pattern has nothing to do with trig functions per se. It’s just the chain rule applied to f(π/2 – x).

(d/dx) f(π/2 − x) = −f′(π/2 − x).
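Here is a quick check of the pattern with sympy, assuming it is available:

    import sympy as sp

    x = sp.symbols('x')
    co = lambda expr: expr.subs(x, sp.pi/2 - x)  # the "co" of a function of x

    for f in [sp.sin(x), sp.tan(x), 1/sp.cos(x)]:
        lhs = sp.diff(co(f), x)        # derivative of the co-function
        rhs = -co(sp.diff(f, x))       # negative co of the derivative
        print(sp.simplify(lhs - rhs))  # prints 0 each time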

Suppose you have some function banana(x) and its derivative is kiwi(x). Then the cobanana function is banana(π/2 − x), the cokiwi function [1] is kiwi(π/2 − x), and the derivative of cobanana(x) is −cokiwi(x). In trig-like notation

(d/dx) ban(x) = kiw(x)

implies

(d/dx) cob(x) = – cok(x).

Now what is unique to sines and cosines is that the second derivative gives you the negative of what you started with. That is, the sine and cosine functions satisfy the differential equation y″ = −y. That doesn’t necessarily happen with bananas and kiwis. If the derivative of banana is kiwi, that doesn’t imply that the derivative of kiwi is negative banana. If the derivative of kiwi is negative banana, then kiwis and bananas must be linear combinations of sines and cosines, because all solutions to y″ = −y have the form a sin(x) + b cos(x).
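And if you ask sympy for the general solution of y″ = −y, it reports exactly this form:

    import sympy as sp

    x = sp.symbols('x')
    y = sp.Function('y')
    print(sp.dsolve(y(x).diff(x, 2) + y(x), y(x)))
    # the general solution a sin(x) + b cos(x), with constants named C1, C2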

More trig posts

[1] Authors are divided over whether the cokiwi function should be abbreviated cok or ckw.

Dual polyhedra for kids

Here are a dodecahedron (left) and icosahedron (right) made from Zometool pieces.

dodecahedron and icosahedron

These figures are duals of each other: if you put a vertex in the middle of each face of one of the shapes, and connect all the new vertices, you get the other shape. You could use these as a tangible way to introduce duality to kids.

There are lots of patterns that kids might discover for themselves. The dodecahedron has 12 faces and 20 vertices; the icosahedron has 20 faces and 12 vertices. At each vertex of the dodecahedron 3 five-sided faces come together; at each vertex of the icosahedron 5 three-sided faces come together.

The two polyhedra have the same number of edges. You can see this by taking one shape apart to make the other. A more sophisticated explanation is that Euler’s theorem says that V + F = E + 2. When you swap the roles of V and F, V+F doesn’t change, so E cannot change.
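Here is the arithmetic as a tiny Python check. Each edge borders exactly two faces, so summing sides over faces counts every edge twice.

    def edge_count(faces, sides_per_face):
        # Divide by 2 because each edge is shared by exactly two faces.
        return faces * sides_per_face // 2

    print(edge_count(12, 5))   # dodecahedron: 30 edges
    print(edge_count(20, 3))   # icosahedron: 30 edges

    # Euler's theorem V + F = E + 2 holds for both:
    print(20 + 12 == 30 + 2)   # dodecahedron: V=20, F=12
    print(12 + 20 == 30 + 2)   # icosahedron:  V=12, F=20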

Here’s a hint on making an icosahedron with Zometool. Stick the red struts with the pentagonal ends into every pentagonal hole on one of the balls. Now if you connect each of the outer balls to each other, you have an icosahedron. You can leave the red pieces inside, or you can use a few of them as a temporary scaffolding to get started, then remove them.

If you do leave the red pieces inside, it’s hard to put the last few pieces in place because the shape is so rigid.

icosahedron with struts to its center

More geometry posts