Complex AGM

Posted on 30 July 2022 by John

The arithmetic-geometric mean (AGM) of two non-negative real numbers a and b is defined as the limit of the iteration starting with a₀ = a and b₀ = b and

a_n+1 = ½ (a_n + b_n)
b_n+1 = √(a_n b_n)

for n > 0. This sequence converges very quickly and is useful in numerical algorithms. The limit can be expressed in terms of an elliptic function, and that elliptic function can then be related to other functions. See, for example, this post for how the AGM can be used to compute logarithms to arbitrary precision.

Since the AGM is useful in computing special functions, and we’re often interested in evaluating special functions at complex values, it’s natural to want to evaluate the AGM for complex numbers.

But we immediately run into a difficulty: which square root do we pick to find the new b?

For a non-negative real number x, √x is defined to be the non-negative real number y such that y² = x. But for more general values of x we have to choose a branch of the square root function. Which branch should we use in the AGM?

Often when we need to take the square root of complex numbers we can use the “principal branch,” the branch that gives positive values for positive real inputs and extends to the rest of the complex plane with the negative axis removed. If you compute the square root of a complex number in software, as we’ll do below, this is probably the value you’ll get by default.

But it turns out we cannot simply pick the principal branch. Gauss discovered two centuries ago that the right branch to take could vary at each step [0]. What does that even mean? How do we know we’ve made the “right” choice?

For one thing, we want our iteration to converge. And for another, we’d like it to converge to something non-zero if we start with non-zero inputs [1]. The right choice will guarantee this [2].

So what is this right choice? We provisionally update a and b as above, using either square root for b, and keep the value of b if

|a − b| ≤ |a + b|

|a − b| = |a + b|

and the imaginary part of b/a is positive. In other words, chose the possible value of b that’s closer to a, and use the imaginary part of b/a as a tie breaker.

Here is the AGM implemented in Python, using the right square root at each step.

    def right(a, b):
        d = abs(a + b) - abs(a - b)
        return d > 0 or d == 0 and (b/a).imag > 0

    def agm(a, b):
        while abs(a-b) > 1e-14:
            a1 = 0.5*(a+b)
            b1 = np.sqrt(a*b)
            if not right(a1, b1):
                b1 = -b1
            a, b = a1, b1
        return a1

The test d == 0 should make you concerned. Testing floating point numbers for exact equality with zero is seldom the right thing to do, but we’ll gloss over numerical issues for this post.

Here’s an example, giving the a values of the iteration starting with 7 + 30i and 20 + 22i. The iteration converges in four steps.

    13.500000000000000 + 26.000000000000000j
    13.784944719026262 + 26.397404494892115j
    13.783557503026870 + 26.395953326186888j
    13.783557473769877 + 26.395953309190112j

In this example, the right root is the principal root every time. To find an example where the other root is chosen, I generated random starting points until I found one that took an alternate root.

Here are the values of b starting with −1.654 − 1.178i and 2.244 − 1.956i. An asterisk indicates that the principal root was not the right root.

     0.2328790598285062 - 0.728412421988127j
    -0.6829999999999998 - 0.589000000000000j
    -0.2254063569280081 - 0.799311791579126j *
    -0.2250604700857468 - 0.658706210994063j
    -0.2261796153095098 - 0.725905503054624j *
    -0.2252334135068774 - 0.729009001286595j
    -0.2257078598391289 - 0.727456168426402j *
    -0.2257065144081936 - 0.727457252170610j
    -0.2257071871240875 - 0.727456710298747j *
    -0.2257071871236612 - 0.727456710298506j
    -0.2257071871238743 - 0.727456710298626j *
    -0.2257071871238744 - 0.727456710298626j

Letter-like Unicode symbols

Posted on 16 June 2022 by John

Unicode provides a way to distinguish symbols that look alike but have different meanings.

We can illustrate this with the following Python code.

    import unicodedata as u

    for pair in [('K', 'K'), ('Ω', 'Ω'), ('ℵ', 'א')]:
        for c in pair:
            print(format(ord(c), '4X'), u.bidirectional(c), u.name(c))

This produces

      4B L LATIN CAPITAL LETTER K
    212A L KELVIN SIGN
     3A9 L GREEK CAPITAL LETTER OMEGA
    2126 L OHM SIGN
    2135 L ALEF SYMBOL
     5D0 R HEBREW LETTER ALEF

Even though K and K look similar, the former is a Roman letter and the latter is a symbol for temperature in Kelvin. Similarly, Ω and Ω are semantically different even though they look alike.

Or rather, they probably look similar. A font may or may not use different glyphs for different code points. The editor I’m using to write this post uses a font that makes no difference between ohm and omega. The letter K and the Kelvin symbol are slightly different if I look very closely. The two alefs appear substantially different.

Note that the mathematical symbol alef is a left-to-right character and the Hebrew latter alef is a right-to-left character. The former could be useful to tell a word processor “This isn’t really a Hebrew letter; it’s a symbol that looks like a Hebrew letter. Don’t change the direction of my text or switch any language-sensitive features like spell checking.”

These letter-like symbols can be used to provide semantic information, but they can also be used to deceive. For example, a malicious website could change a K in a URL to a Kelvin sign.

General tetrahedral numbers

Posted on 4 June 2022 by John

Start with a list of ones:

1, 1, 1, 1, 1, …

Taking the partial sums of this sequence gives consecutive numbers. That is, the nth number of the new series is the sum of the first n terms of the previous series.

1, 2, 3, 4, 5, …

If we take partial sums again, we get the triangular numbers.

1, 3, 6, 10, 15, …

They’re called triangle numbers because if we arrange coins in a triangle, with n on a side, the total number of coins is the nth triangular number.

If we repeat the process again, we get the tetrahedral numbers.

1, 4, 10, 20, 35, …

The nth tetrahedral number is the number of cannonballs is a tetrahedral stack, with each layer of the stack being a triangle.

We can produce the examples above in Python using the cumsum (cumulative sum) function on NumPy arrays.

    >>> import numpy as np
    >>> x = np.ones(5, int)
    >>> x
    array([1, 1, 1, 1, 1])
    >>> x.cumsum()
    array([1, 2, 3, 4, 5])
    >>> x.cumsum().cumsum()
    array([ 1, 3, 6, 10, 15])
    >>> x.cumsum().cumsum().cumsum()
    array([ 1, 4, 10, 20, 35])

We could continue this further, taking tetrahedral numbers of degree k. The sequences above could be called the tetrahedral numbers of degree k for k = 0, 1, 2, and 3.

Here’s Python code to find the nth tetrahedral number of a given dimension.

    def tetrahedral(n, dim):
        x = np.ones(n, int)
        for _ in range(dim):
             x = x.cumsum()
        return x[-1]

It turns out that

$T(n, d) = \binom{n+d-1}{d}$

and so we could rewrite the code above more simply as

    from math import comb

    def tetrahedral(n, dim):
        return comb(n + dim - 1, dim)

Sampling with replacement until you’ve seen everything

Posted on 14 May 2022 by John

Suppose you have a standard deck of 52 cards. You pull out a card, put it back in the deck, shuffle, and pull out another card. How long would you expect to do this until you’ve seen every card?

Here’s a variation on the same problem. Suppose you’re a park ranger keeping data on tagged wolves in a national park. Each day you catch a wolf at random, collect some data, and release it. If there are 100 wolves in the park, how long until you’ve seen every wolf at least once?

We’ll solve this problem via simulation, then exactly.

Simulation

The Python code to solve the problem via simulation is straight forward. Here we’ll simulate our ranger catching and releasing one of 100 wolves. We’ll repeat our simulation five times to get an idea how much the results vary.

    import numpy as np
    from numpy.random import default_rng

    rng = default_rng()
    num_runs = 5
    N = 100 # number of unique items

    for run in range(num_runs):
        tally = np.zeros(N,dtype=int)
        trials = 0
        while min(tally) == 0:
            tally[rng.integers(N)] += 1
            trials += 1
        print(trials)

When I ran this, I got 846, 842, 676, 398, and 420.

Exact solution

Suppose we have a desk of N cards, and we sample the deck with replacement M times. We can select the first card N ways. Ditto for the second, third, and all the rest of the cards. so there are N^M possible sequences of card draws.

How many of these sequences include each card at least once? To answer the question, we look at Richard Stanley’s twelvefold way. We’re looking for the number of ways to allocate M balls into N urns such that each urn contains at least one ball. The answer is

N! S(M, N)

where S(M, N) is the Stirling number of the second kind with parameters M and N [1].

So the probability of having seen each card at least once after selecting M cards randomly with replacement is

N! S(M, N) / N^M.

Computing Stirling numbers

We’ve reduced our problem to the problem of computing Stirling numbers (of the second kind), so how do we do that?

We can compute Stirling numbers in terms of binomial coefficients as follows.

$S(n,k) = \frac{1}{k!}\sum_{i = 0}^k (-1)^i {k \choose i} (k-i)^n$

Now we have a practical problem if we want to use this formula for larger values of n and k. If we work with floating point numbers, we’re likely to run into overflow. This is the perennial with probability calculations. You may need to compute a moderate-sized number as the ratio of two enormous numbers. Even though the final result is between 0 and 1, the intermediate results may be too large to store in a floating point number. And even if we don’t overflow, we may lose precision because we’re working with an alternating sum.

The usual way to deal with overflow is to work on a log scale. But that won’t work for Stirling numbers because we’re adding terms together, not multiplying them.

One way to solve our problem—not the most efficient way but a way that will work—is to work with integers. This is easy in Python because integers by default can be arbitrarily large.

There are two ways to compute binomial coefficients in Python: scipy.special.binom and math.comb. The former uses floating point operations and the latter uses integers, so we need to use the latter.

We can compute the numerator of our probability with

    from math import comb
    def k_factorial_stirling(n, k):
        return sum((-1)**i * comb(k, i)*(k-i)**n for i in range(k+1))

Then our probability is

    k_factorial_stirling(M, N) / N**M

If we compute the probability that our ranger has seen all 100 wolves after catching and releasing 500 times, we get 0.5116. So the median number of catch and release cycles is somewhere around 500, which is consistent with our simulation above.

Note that the numerator and denominator in our calculation are both on the order of 10¹⁰⁰⁰ while the largest floating point number is on the order of 10³⁰⁸, so we would have overflowed if we’d used binom instead of comb.

[1] It’s unfortunate that these were called the “second kind” because they’re the kind that come up most often in application.

Image: “Fastening a Collar” by USFWS Mountain Prairie is licensed under CC BY 2.0 .

Logarithmic spiral

Posted on 9 May 2022 by John

I’ve seen an image similar to the following many times, but I don’t recall any source going into detail regarding how the spiral is constructed. This post will do just that.

The previous post constructed iterated golden rectangles. We start with a golden rectangle and imagine chopping of first the blue, then the green, then the orange, and then the red square. We could continue this process on the inner most rectangle, over and over.

At this point, many sources would say “Hey look! We can draw a spiral that goes through two corners of each square” without explicitly stating an equation for the spiral. We’ll find spiral, making everything explicit.

In the previous post we showed how to find the limit of the iterative process at the intersection of the two diagonal lines below.

We showed that the intersection is at

x₀ = (2φ + 1)/(φ + 2)

and

y₀ = 1/(φ + 2).

Logarithmic spirals

A logarithmic spiral centered at the origin has the polar equation

r(θ) = a exp(kθ)

for some constants a and k. Our spiral will be such a spiral shifted to center at (x₀, y₀).

Imagine a spiral going from the lower left to the top left of the blue square, then to the top left of the green square, then to the top right of the orange square etc., passing through two corners of every square. Every time θ goes through a quarter turn, i.e. π/2 radians, our rectangles get smaller by a factor of φ. We can use this to solve for k because

a exp(k(θ + π/2)) = φ a exp(kθ)

and so

exp(kπ/2) = φ

and

k = 2log(φ) / π.

To summarize what we know so far, if we shift our spiral to the origin, it has equation in polar form

r(θ) = a exp(kθ)

where we now know k but don’t yet know a.

Finding θ and a

To get our actual spiral, we have to shift the origin to (x₀, y₀) which we have calculated above. Converting from polar to rectangular coordinates so we can do the shift, we have

x(θ) = x₀ + a exp(kθ) cos(θ)

y(θ) = y₀ + a exp(kθ) sin(θ)

We can find the parameter a by knowing that our spiral passes through the origin, the bottom left corner in the plots above. At what value of θ does the curve go through the origin? We have a choice; values of θ that differ by multiples of 2π will give different values of a.

The angle θ is the angle made by the line connecting (0, 0) to (x₀, y₀). We can find θ via

θ₀ = atan2(−y₀, −x₀).

Here’s Python code to find θ₀:

    import numpy as np

    phi = (1 + 5**0.5)/2
    y0 = 1/(2+phi)
    x0 = (2*phi + 1)*y0
    theta0 = np.arctan2(-y0, -x0)

Then we solve for a from the equation x(θ₀) = 0.

    k = 2*np.log(phi)/np.pi
    a = -x0 / (np.exp(k*theta0)*np.cos(theta0)))

Plots

Now we have all the parameters we need to define our logarithmic spiral. The following code draws our logarithmic spiral on top of the plot of the rectangles.

    t = np.linspace(-20, theta0, 1000)
    def x(t): return x0 + a*np.exp(k*t)*np.cos(t)
    def y(t): return y0 + a*np.exp(k*t)*np.sin(t)
    plt.plot(x(t), y(t), "k-")

The result is the image at the top of the post.

Non-associative multiplication

Posted on 1 February 2022 by John

There are five ways to parenthesize a product of four things:

((ab)c)d
(ab)(cd)
(a(b(cd))
(a(bc))d
(a((bc)d)

In a context where multiplication is not associative, the five products above are not necessarily the same. Maybe all five are different.

This post will give two examples where the products above are all different: octonions and matrix multiplication.

If you’re thinking “But wait: matrix multiplication is associative!” then read further and see in what sense I’m saying it’s not.

Octonions

Octonion multiplication is not associative. I wrote a blog post a while back asking how close the octonions come to being associative. That is, if we randomly generate unit-length octonions a, b, and c, we can calculate the norm of

(ab)c − a(bc)

and ask about its expected value. Sometimes for a triple of octonions this value is zero, but on average this expression has norm greater than 1. I estimated the average value via simulation, and later Greg Egan worked out the exact value. His write-up had gone down with the Google+ ship, but recently Greg posted a new version of his notes.

In this post I gave Python code for multiplying octonions using a function called CayleyDickson named after the originators of the algorithm. Let’s rename it m to have something shorter to work with and compute all five ways of associating a product of four octonions.

    import numpy as np

    def m(x, y):
        return CayleyDickson(x, y)

    def take_five(a, b, c, d):
        return [
            m(a, (m(b, m(c, d)))),
            m(m(a, b), m(c, d)),
            m(m(m(a, b), c), d),
            m(a, m(m(b, c), d)),
            m(m(a, m(b, c)), d)
    ]

I first tried products of basis elements, and I only got two different products out of the five ways of associating multiplication, but I only tried a few examples. However, when I tried using four random octonions, I got five different products.

    np.random.seed(20220201)

    a = np.random.rand(8)
    b = np.random.rand(8)
    c = np.random.rand(8)
    d = np.random.rand(8)

    for t in take_five(a, b, c, d):
        print(t)

This gave a very interesting result: I got five different results, but only two different real (first) components. The five vectors all differed in the last seven components, but only produced two distinct first components.

    [ 2.5180856  -2.61618184  ...]
    [ 2.5180856   0.32031027  ...]
    [ 2.5180856   1.13177500  ...]
    [ 3.0280984  -0.30169446  ...]
    [ 3.0280984  -2.36523580  ...]

I repeated this experiment a few times, and the first three results always had the same real component, and the last two results had another real component.

I suspect there’s a theorem that says

Re( ((ab)c)d ) = Re( (ab)(cd) ) = Re( a(b(cd)) )

and

Re( (a(bc))d ) = Re( a((bc)d) )

but I haven’t tried to prove it. If you come with a proof, or a counterexample, please post a link in the comments.

Matrix multiplication

Matrix multiplication is indeed associative, but the efficiency of matrix multiplication is not. That is, any two ways of parenthesizing a matrix product will give the same final matrix, but the cost of the various products are not the same. I first wrote about this here.

This is a very important result in practice. Changing the parentheses in a matrix product can make the difference between a computation being practical or impractical. This comes up, for example, in automatic differentiation and in backpropagation for neural networks.

Suppose A is an m by n matrix and B is an n by p matrix. Then AB is an m by p matrix, and forming the product AB requires mnp scalar multiplications. If C is a p by q matrix, then (AB)C takes

mnp + mpq = mp(n + q)

scalar multiplications, but computing A(BC) takes

npq + mnq = nq(m + p)

scalar multiplications, and in general these are not equal.

Let’s rewrite our multiplication function m and our take_five function to compute the cost of multiplying four matrices of random size.

We’ve got an interesting programming problem in that our multiplication function needs to do two different things. First of all, we need to know the size of the resulting matrix. But we also want to keep track of the number of scalar multiplications the product would require. We have a sort of main channel and a side channel. Having our multiplication function return both the dimension and the cost would make composition awkward.

This is kind of a fork in the road. There are two ways to solving this problem, one high-status and one low-status. The high-status approach would be to use a monad. The low-status approach would be to use a global variable. I’m going to take the low road and use a global variable. What’s one little global variable among friends?

    mults = 0

    def M(x, y):
        global mults
        mults += x[0]*x[1]*y[0]
        return (x[0], y[1])

    def take_five2(a, b, c, d):

        global mults
        costs = []

        mults = 0; M(a, (M(b, M(c, d)))); costs.append(mults)
        mults = 0; M(M(a, b), M(c, d));   costs.append(mults)
        mults = 0; M(M(M(a, b), c), d);   costs.append(mults)
        mults = 0; M(a, M(M(b, c), d));   costs.append(mults)
        mults = 0; M(M(a, M(b, c)), d);   costs.append(mults)

        return costs

Next, I’ll generate five random integers, and group them in pairs as sizes of matrices conformable for multiplication.

    dims = np.random.random_integers(10, size=5)
    dim_pairs = zip(dims[:4], dims[1:])
    c = take_five2(*[p for p in dim_pairs])

When I ran this dims was set to

[3 9 7 10 6]

and so my matrices were of size 3×9, 9×7, 7×10, and 10×6.

The number of scalar multiplications required by each way of multiplying the four matrices, computed by take_five2 was

[1384, 1090, 690, 1584, 984]

So each took a different number of operations. The slowest approach would take more than twice as long as the fastest approach. In applications, matrices can be very long and skinny, or very wide and thin, in which case one approach may take orders of magnitude more operations than another.

Planetary code golf

Posted on 18 September 2021 by John

Suppose you’re asked to write a function that takes a number and returns a planet. We’ll number the planets in order from the sun, starting at 1, and for our purposes Pluto is the 9th planet.

Here’s an obvious solution:

    def planet(n):
        planets = [
            "Mercury",
            "Venus",
            "Earth",
            "Mars",
            "Jupiter",
            "Saturn",
            "Uranus",
            "Neptune",
            "Pluto"
        ]
        # subtract 1 to correct for unit offset
        return planets[n-1]

Now let’s have some fun with this and play a little code golf. Here’s a much shorter program.

    def planet(n): return chr(0x263e+n)

I was deliberately ambiguous at the top of the post, saying the code should return a planet. The first function naturally interprets that to mean the name of a planet, but the second function takes it to mean the symbol of a planet.

The symbol for Mercury has Unicode value U+263F, and Unicode lists the symbols for the planets in order from the sun. So the Unicode character for the nth planet is the character for Mercury plus n. That would be true if we numbered the planets starting from 0, but conventionally we call Mercury the 1st planet, not the 0th planet, so we have to subtract one. That’s why the code contains 0x263e rather than 0x263f.

We could make the function just a tiny bit shorter by using a lambda expression and using a decimal number for 0x263e.

    planet = lambda n: chr(9790+n)

Display issues

Here’s a brief way to list the symbols from the Python REPL.

    >>> [chr(0x263f+n) for n in range(9)]
    ['☿', '♀', '♁', '♂', '♃', '♄', '♅', '♆', '♇']

You may see the symbols for Venus and Mars above rendered with a colored background, depending on your browser and OS.

Here’s what the lines above looked like at my command line.

Here’s what the symbols looked like when I posted them on Twitter.

For reasons explained here, some software interprets some characters as emoji rather than literal characters. The symbols for Venus and Mars are used as emoji for female and male respectively, based on the use of the symbols in biology. Incidentally, these symbols were used to represent planets long before biologists used them to represent sexes.

Spaceship operator in Python

Posted on 12 April 2021 by John

Some programming languages, such as Perl, have an infix operator <=> that returns a three-state comparison. The expression

    a <=> b

evaluates to -1, 0, or 1 depending on whether a < b, a = b, or a > b. You could think of <=> as a concatenation of <, =, and >.

The <=> operator is often called the “spaceship operator” because it looks like Darth Vader’s ship in Star Wars.

Python doesn’t have a spaceship operator, but you can get the same effect with numpy.sign(a-b). For example, suppose you wanted to write a program to compare two integers.

You could write

    from numpy import sign
    def compare(x, y):
        cmp = ["equal to", "greater than", "less than"][sign(x-y)]
        print(f"{x} is {cmp} {y}.")

Here we take advantage of the fact that an index of -1 points to the last element of a list.

The sign function will return an integer if its argument is an integer or a float if its argument is a float. The code above will break if you pass in floating point numbers because sign will return −1.0, 0.0, or 1.0. But if you replace sign(x-y) with int(sign(x-y)) it will work for floating point arguments.

Related post: Symbol pronunciation

Fraction comparison trick

Posted on 25 February 2021 by John

If you want to determine whether

a/b > c/d,

it is often enough to test whether

a + d > b + c.

Said another way a/b is usually greater than c/d when a + d is greater than b + c.

This sounds imprecise if not crazy. But it is easy to make precise and [1] shows that it is true.

Examples

Consider 4/11 vs 3/7. In this case 4 + 7 = 11 < 11 + 3 = 14, which suggests 4/11 < 3/7, which is correct.

But the rule of thumb can lead you astray. For example, suppose you want to compare 3/7 and 2/5. We have 3 + 5 = 8 < 7 + 2 = 9, but 3/7 > 2/5.

The claim isn’t that the rule always works; clearly it doesn’t. The claim is that it usually works, in a sense that will be made precise in the next section.

Rigorous statement

Let N be a large integer, and pick integers a, b, c, and d at random uniformly between 1 and N. Let

x = a/b − c/d

and

y = (a+d) − (b+c).

Then the probability x and y are both positive, or both negative, or both zero, approaches 11/12 as N goes to infinity.

Simulation

I won’t repeat the proof of the theorem above; see [1] for that. But I’ll give a simulation that illustrates the theorem.

    import numpy as np
    
    np.random.seed(20210225)
    
    N = 1_000_000
    numreps = 10_000
    
    count = 0
    for _ in range(numreps):
        (a, b, c, d) = np.random.randint(1, N+1, 4, dtype=np.int64)
        x = a*d - b*c
        y = (a+d) - (b+c)
        if np.sign(x) == np.sign(y):
            count += 1
    print(count/numreps)

This prints 0.9176, and 11/12 = 0.91666….

The random number generator randint defaults to 32-bit integers, but this could lead to overflow since 10⁶ × 10⁶ > 2³². So I used 64-bit integers.

Instead of computing a/b − c/d, I multiply by bd and compute ad − bc instead because this avoids any possible floating point issues.

In case you’re not familiar with the sign function, it returns 1 for positive numbers, 0 for 0, and −1 for negative numbers.

The code suggests a different statement of the theorem: if you generate two pairs of integers, their sums and their products are probably ordered the same way.

Psychology

This post has assumed that the numbers a, b, c, and d are all chosen uniformly at random. But the components of the fractions for which you have any doubt whether a/b is greater or less than c/d are not uniformly distributed. For example, consider 17/81 versus 64/38. Clearly the former is less than 1 and the latter is greater than 1.

It would be interesting to try to assess how often the rule of thumb presented here is correct in practice. You might try to come up with a model for the kinds of fractions people can’t compare instantly, such as proper fractions that have similar size.

[1] Kenzi Odani. A Rough-and-Ready Rule for Fractions. The Mathematical Gazette, Vol. 82, No. 493 (Mar., 1998), pp. 107-109

Python triple quote strings and regular expressions

Posted on 30 January 2021 by John

There are several ways to quote strings in Python. Triple quotes let strings span multiple lines. Line breaks in your source file become line break characters in your string. A triple-quoted string in Python acts something like “here doc” in other languages.

However, Python’s indentation rules complicate matters because the indentation becomes part of the quoted string. For example, suppose you have the following code outside of a function.

x = """\
abc
def
ghi
"""

Then you move this into a function foo and change its name to y.

def foo():
    y = """\
    abc
    def
    ghi
    """

Now x and y are different strings! The former begins with a and the latter begins with four spaces. (The backslash after the opening triple quote prevents the following newline from being part of the quoted string. Otherwise x and y would begin with a newline.) The string y also has four spaces in front of def and four spaces in front of ghi. You can’t push the string contents to the left margin because that would violate Python’s formatting rules. (Update: Oh yes you can! See Aaron Meurer’s comment below.)

We now give three solutions to this problem.

Solution 1: textwrap.dedent

There is a function in the Python standard library that will strip the unwanted space out of the string y.

import textwrap 

def foo():
    y = """\
    abc
    def
    ghi
    """
    y = textwrap.dedent(y)

This works, but in my opinion a better approach is to use regular expressions [1].

Solution 2: Regular expression with a flag

We want to remove white space, and the regular expression for a white space character is \s. We want to remove one or more white spaces so we add a + on the end. But in general we don’t want to remove all white space, just white space at the beginning of a line, so we stick ^ on the front to say we want to match white space at the beginning of a line.

import re 

def foo():
    y = """\
    abc
    def
    ghi
    """
    y = re.sub("^\s+", "", y)

Unfortunately this doesn’t work. By default ^ only matches the beginning of a string, not the beginning of a line. So it will only remove the white space in front of the first line; there will still be white space in front of the following lines.

One solution is to add the flag re.MULTILINE to the substitution function. This will signal that we want ^ to match the beginning of every line in our multi-line string.

    y = re.sub("^\s+", "", y, re.MULTILINE)

Unfortunately that doesn’t quite work either! The fourth positional argument to re.sub is a count of how many substitutions to make. It defaults to 0, which actually means infinity, i.e. replace all occurrences. You could set count to 1 to replace only the first occurrence, for example. If we’re not going to specify count we have to set flags by name rather than by position, i.e. the line above should be

    y = re.sub("^\s+", "", y, flags=re.MULTILINE)

That works.

You could also abbreviate re.MULTILINE to re.M. The former is more explicit and the latter is more compact. To each his own. There’s more than one way to do it. [2]

Solution 3: Regular expression with a modifier

In my opinion, it is better to modify the regular expression itself than to pass in a flag. The modifier (?m) specifies that in the rest of the regular the ^ character should match the beginning of each line.

    y = re.sub("(?m)^\s+", "", y)

One reason I believe this is better is that moves information from a language-specific implementation of regular expressions into a regular expression syntax that is supported in many programming languages.

For example, the regular expression

    (?m)^\s+

would have the same meaning in Perl and Python. The two languages have the same way of expressing modifiers [3], but different ways of expressing flags. In Perl you paste an m on the end of a match operator to accomplish what Python does with setting flasgs=re.MULTILINE.

One of the most commonly used modifiers is (?i) to indicate that a regular expression should match in a case-insensitive manner. Perl and Python (and other languages) accept (?i) in a regular expression, but each language has its own way of adding modifiers. Perl adds an i after the match operator, and Python uses

    flags=re.IGNORECASE

    flags=re.I

as a function argument.

More on regular expressions

[1] Yes, I’ve heard the quip about two problems. It’s funny, but it’s not a universal law.

[2] “There’s more than one way to do it” is a mantra of Perl and contradicts The Zen of Python. I use the line here as a good-natured jab at Python. Despite its stated ideals, Python has more in common with Perl than it would like to admit and continues to adopt ideas from Perl.

[3] Python’s re module doesn’t support every regular expression modifier that Perl supports. I don’t know about Python’s regex module.

Python

Complex AGM

More AGM posts

Letter-like Unicode symbols

Related posts

General tetrahedral numbers

Related posts

Sampling with replacement until you’ve seen everything

Simulation

Exact solution

Computing Stirling numbers

Related posts

Logarithmic spiral

Logarithmic spirals

Finding θ and a

Plots

Non-associative multiplication

Octonions

Matrix multiplication

Related posts

Planetary code golf

Display issues

Related posts

Spaceship operator in Python

Fraction comparison trick

Examples

Rigorous statement

Simulation

Psychology

Related posts

Python triple quote strings and regular expressions

Solution 1: textwrap.dedent

Solution 2: Regular expression with a flag

Solution 3: Regular expression with a modifier

More on regular expressions