Python resources

Each Wednesday I post a list of some of the resources on this site. This week: Python notes.

See also blog posts tagged Python, SciPy, and SymPy and the Twitter account SciPyTip.

Last week: Special functions

Next week: Probability resources

Special function resources

This week’s resource post: some pages of notes on special functions.

See also blog posts tagged special functions.

I tweet about special functions occasionally on AnalysisFact.

Last week: HTML, LaTeX, and Unicode

Next week: Python resources

Counting primitive bit strings

A string of bits is called primitive if it is not the repetition of several copies of a smaller string of bits. For example, the string 101101 is not primitive because it can be broken down into two copies of the string 101. In Python notation, you could produce 101101 by "101"*2. The string 11001101, on the other hand, is primitive. (It contains substrings that are not primitive, but the string as a whole cannot be factored into multiple copies of a single string.)

For a given n, let’s count how many primitive bit strings there are of length n. Call this f(n). There are 2^n bit strings of length n, and f(n) of these are primitive. For example, there are f(12) primitive bit strings of length 12. The strings that are not primitive are made of copies of shorter primitive strings: two copies of a primitive string of length 6, three copies of a primitive string of length 4, etc. Since every bit string of length 12 is built from copies of exactly one primitive string whose length divides 12, this says

 2^{12} = f(12) + f(6) + f(4) + f(3) + f(2) + f(1)

and in general

2^n = \sum_{d \mid n} f(d)

Here the sum is over all positive integers d that divide n.

Unfortunately this formula is backward. It gives us a formula for something well known, 2^n, as a sum of the things we’re trying to calculate. The Möbius inversion formula is just what we need to turn this formula around so that the new thing is on the left and sums of old things are on the right. It tells us that

f(n) = \sum_{d \mid n} \mu\left(\frac{n}{d}\right) 2^d

where μ is the Möbius function.

We could compute f(n) with Python as follows:

from sympy.ntheory import mobius, divisors

def num_primitive(n):
    # Möbius inversion: f(n) = sum over divisors d of n of mu(n/d) * 2^d.
    # Use integer division so mobius receives an exact integer.
    return sum( mobius(n//d)*2**d for d in divisors(n) )
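
For small n you can check the output against a hand count. This usage sketch assumes the num_primitive function above is already defined:

# Expected values: f(1) = 2, f(2) = 2 ("01" and "10"),
# f(3) = 6 (everything except "000" and "111"), f(4) = 12.
for n in range(1, 5):
    print(n, num_primitive(n))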

The latest version of SymPy, version 0.7.6, comes with a function mobius for computing the Möbius function. If you’re using an earlier version of SymPy, you can roll your own mobius function:

from sympy.ntheory import factorint

def mobius(n):
    # Exponents in the prime factorization of n.
    exponents = factorint(n).values()
    lenexp = len(exponents)
    # Largest exponent; zero if n == 1.
    m = 0 if lenexp == 0 else max(exponents)
    # mu(n) is 0 if n has a squared factor,
    # otherwise (-1) raised to the number of distinct prime factors.
    return 0 if m > 1 else (-1)**lenexp

The version of mobius that comes with SymPy 0.7.6 may be more efficient. It could, for example, stop the factorization process early if it discovers a square factor.
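
Either implementation can be sanity-checked against the divisor-sum identity above: summing f(d) over the divisors d of n should recover 2^n. A minimal check along those lines, assuming num_primitive is defined as above (the upper limit on n is arbitrary):

from sympy.ntheory import divisors

# 2^n should equal the sum of f(d) over all divisors d of n.
for n in range(1, 20):
    assert 2**n == sum(num_primitive(d) for d in divisors(n))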

Thou, thee, you, and ye

Ever wonder what the rules were for when to use thou, thee, ye, or you in Shakespeare or the King James Bible?

For example, the inscription on the front of the Main Building at The University of Texas says

Ye shall know the truth and the truth shall make you free.

Why ye at the beginning and you at the end?

The latest episode of The History of English Podcast explains what the rules were and how they came to be. Regarding the UT inscription, ye was the subject form of the second person plural and you was the object form. Eventually you became used for subject and object, singular and plural.

The singular subject form was thou and the singular object form was thee. For example, the opening lines of Shakespeare’s Sonnet 18:

Shall I compare thee to a summer’s day?
Thou art more lovely and more temperate.

Originally the singular forms were intimate and the plural forms were formal. Only later did thee and thou take on an air of reverence or formality.

Notes on HTML, XML, TeX, and Unicode

This week’s resource post: some notes on typesetting, Unicode, etc.

See also blog posts tagged LaTeX, HTML, and Unicode and the Twitter account TeXtip.

Last week: C++ resources

Next week: Special functions

Why assign two characters to the same symbol?

Unicode often counts the same symbol (glyph) as two or more different characters. For example, Ω is U+03A9 when it represents the Greek letter omega and U+2126 when it represents Ohms, the unit of electrical resistance. Similarly, M is U+004D when it’s used as a Latin letter but U+216F when it’s used as the Roman numeral for 1,000.
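
If you want to see the distinction programmatically, Python’s standard unicodedata module reports different names for these identical-looking glyphs. A minimal sketch; the formatting is only for display:

import unicodedata

# Same-looking glyphs, different characters: Greek omega vs. Ohm sign,
# Latin letter M vs. Roman numeral one thousand.
for ch in [u"\u03A9", u"\u2126", u"\u004D", u"\u216F"]:
    print("U+%04X  %s" % (ord(ch), unicodedata.name(ch)))

Unicode normalization folds some of these distinctions back together: NFC maps the Ohm sign to the Greek omega, and NFKC maps the Roman numeral to the Latin letter.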

The purpose of such distinctions is to capture semantic differences. One example of how this could be useful is increased accessibility. A text-to-speech reader should pronounce things the same way people do. When such software sees “a 25 Ω resistor” it should say “a twenty five Ohm resistor” and not “a twenty five uppercase omega resistor,” just as a person would. [1]

Making text more accessible to the blind helps everyone else as well. For example, it also makes the text more accessible to search engines. As Elliotte Rusty Harold points out in Refactoring HTML:

Wheelchair ramps are far more commonly used by parents with strollers, students with bicycles, and delivery people with hand trucks than they are by people in wheelchairs. When properly done, increasing accessibility for the disabled increases accessibility for everyone.

However, there are practical limits to how many semantic distinctions Unicode can make without becoming impossibly large, and so the standard is full of compromises. It can be quite difficult to decide when two uses of the same glyph should correspond to separate characters, and no standard could satisfy everyone.

***

[1] Someone may discover that when I wrote “a 25 Ω resistor” above, I actually used an Omega (Ω, U+03A9) rather than an Ohm character (Ω, U+2126). That’s because font support for Unicode is disappointing. If I had used the technically correct Ohm character, some people would not be able to see it. Ironically, this would make the text less accessible.

On my Android phone, I can see Ω (Ohm) but I cannot see Ⅿ (Roman numeral M) because the installed fonts have a glyph for the former but not the latter.

***

This post first appeared on Symbolism, a blog that I’ve now shut down.

Updating blog posts

I’ve been going through my old blog posts and fixing a few problems. I found a few missing images, code samples that had lost their indentation, etc. Most of the errors have been my fault, but some were due to bugs in plug-ins.

If you see any problems with a post, please let me know. You could send me an email, or leave a comment on the post. (For a while I had comments automatically turn off on older posts, but I’ve disabled that. Now you can comment on any post.)

For the first couple years, this blog didn’t have many readers, and so not many people pointed out my errors. Now that there are more readers, I find out about errors more quickly. But I’ve found some egregious errors in some of the older posts.

Thanks for your contribution to this blog. I’ve been writing here for almost seven years, and I’ve benefited greatly from your input.

R resources

This is the third in my weekly series of posts pointing out resources on this site. This week’s topic is R.

See also posts tagged Rstats.

I started the Twitter account RLangTip and handed it over to the folks at Revolution Analytics.

Last week: Emacs resources

Next week: C++ resources

After a coin comes up heads 10 times

Suppose you’ve seen a coin come up heads 10 times in a row. What do you believe is likely to happen next? Three common responses:

  1. Heads
  2. Tails
  3. Equal probability of heads or tails.

Each is reasonable in its own context. The last answer is correct assuming the flips are independent and heads and tails are equally likely.

But as I argued here, if you see nothing but heads, you have reason to question the assumption that the coin is fair. So there’s some justification for the first answer.

The reasoning behind the second answer is that tails are “due.” This isn’t true if you’re looking at independent flips of a fair coin, but it could be reasonable in other settings, such as sampling without replacement.

Say there are a number of coins on a table, covered by a cloth. A fixed number are heads up and a fixed number are tails up. You reach under the cloth and slide a coin out. Every head you pull out increases the chances that the next coin will be tails. If there were an equal number of heads and tails under the cloth to begin with, then after pulling out 10 heads, tails are indeed more likely next time.
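
To make this concrete, here is a small sketch. The starting counts (50 heads and 50 tails) and the 10 heads already drawn are made-up numbers for illustration:

# Sampling without replacement: after removing some heads from a fixed
# population of coins, compute the chance that the next coin drawn is tails.
def prob_next_tails(heads, tails, heads_drawn):
    remaining_heads = heads - heads_drawn
    return tails / float(remaining_heads + tails)

# 50 heads and 50 tails to begin with; after sliding out 10 heads,
# 40 heads and 50 tails remain, so tails are more likely next.
print(prob_next_tails(50, 50, 10))   # 50/90, about 0.56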

Related post: Long runs

First two impressions of statistics

When I was a postdoc I asked a statistician a few questions and he gave me an overview of his subject. (My area was PDEs; I knew nothing about statistics.) I remember two things that he said.

  1. A big part of being a statistician is knowing what to do when your assumptions aren’t met, because they’re never exactly met.
  2. A lot of statisticians think time series analysis is voodoo, and he was inclined to agree with them.