Brace expansion tree

Here’s a crazy bash one-liner I found via an article by Peter Krumins:

echo {w,t,}h{e{n{,ce{,forth}},re{,in,fore,with{,al}}},ither,at}

This prints 30 English words:

when, whence, whenceforth, where, wherein, wherefore, wherewith, wherewithal, whither, what, then, thence, thenceforth, there, therein, therefore, therewith, therewithal, thither, that, hen, hence, henceforth, here, herein, herefore, herewith, herewithal, hither, hat

This post will explain how the one-liner works.

Bash brace expansion iterates through all possibilities listed within curly braces, with possibilities separated by a comma. Note that the comma is a separator and not a terminator. And so, for example, the expression {w,t,} is effectively {w,t,""}.

When bash sees two brace expressions, these expand to the cartesian product of the two expressions. For example,

echo {A,B}{1,2,3}

produces

A1 A2 A3 B1 B2 B3

In the expression above we have

{w,t,}h{e…,ither,at}

So the expansion will enumerate all possibilities of {w,h,} multiplied by all possibilities of {e…,ither,at} where e… is itself a brace expression.

A diagram will help a lot.

The brace expansion does a depth-first traversal of this tree.

When will the decimals in a/b repeat?

The previous post looked at how many digits are in the reduced fraction for the nth harmonic number. I was curious about how long the cycle of digits in a harmonic number might be.

I wrote about the period length for the digits of fractions almost a decade ago. This post includes code so I can apply it to harmonic demoninators.

from sympy import lcm, factorint, n_order

def period(n):
    factors = factorint(n)
    exp2 = factors.get(2, 0)
    exp5 = factors.get(5, 0)
    r = max(exp2, exp5)

    d = n // (2**exp2 * 5**exp5)
    s = 1 if d == 1 else n_order(10, d)
    return (r, s)

This function returns two numbers: r is the number of non-repeating digits at the beginning and s is the length of the repeating part.

The following code

from functools import reduce

def lcm_range(n):
    return reduce(lcm, range(1, n + 1))

print( period( lcm_range(50) ) )

prints (5, 1275120) meaning that 1/lcm(1, 2, 3, …, 49, 50) has five non-repeating digits following by 1,275,120 digits that repeat ad infinitum. And so the decimals in the expansion of H50 go have a cycle length of 1,275,120.

Height of harmonic numbers

The previous post looked at writing the harmonic numbers as reduced fractions and estimating the number of digits in the numerator and denominator based on asymptotics. This is a follow up post with plots.

We’ll choose our base b to be 2. And we’ll look at the total number of bits in both the numerator and denominator, which we will use as the height of the fractions.

First, let’s look at the actual and estimated heights, using the estimates from the previous post.

Next let’s look at the difference between the actual and estimated heights.

In the previous post I looked at n = 50, which was kind of a lucky choice, the error being smaller than usual. I had also looked at, but didn’t publish, n = 100, which would be an unlucky choice.

Finally, let’s look at the relative error in the estimates, and plot over a larger range of n.

The error goes to zero, as predicted by the asymptotic estimates. And it goes noisily, which you’d expect since the heights are related to the distribution of primes.

Writing down harmonic numbers

The nth harmonic number is the sum of the reciprocals of the first n positive integers.

Hn = 1 + 1/2 + 1/3 + 1/4 + … + 1/n

The product of all the denominators is n!, so you could write Hn as a fraction

Hn = p/q

where pn! Hn is an integer and qn!.

While p/q is a way to write Hn as a fraction, it’s not the most efficient because p and n! will have common factors.

If we write Hn as a reduced fraction, the denominator will be the least common multiple of the integers 1 through n. That number is asymptotically exp(n). That estimate follows from the prime number theorem.

So for large n the denominator will be roughly exp(n), and in base b it would have around

n/log(b)

digits.

The numerator will be exp(n) Hn, and since Hn is asymptotically log(n) + γ, the numerator for large n will be roughly

exp(n) (log(n) + γ)

and will have around

(n + log log(n) ) / log(b)

digits.

Let’s see how well our asymptotic estimates work for n = 50. The 50th harmonic number is

H50 = 13943237577224054960759 / 3099044504245996706400.

This fraction has 23 digits in the numerator and 22 in the denominator. We would have predicted around

(50 + log(log(50)))/log(10) = 22.3

digits in the numerator and

50/log(10) = 21.7

digits in the denominator.

Let’s try a larger example, looking at the 1000th harmonic number in binary. We’ll use the following Python code.

from fractions import Fraction

def bits(n):
    H = sum(Fraction(i, i+1) for i in range(1, n+1))
    p, q = H.numerator, H.denominator
    # subtract 2 because bin returns a string starting with 0b.
    return len(bin(p)) - 2, len(bin(q)) - 2

print(bits(1000))

This returns 1448 and 1438. We would have estimated

(1000 + log(log(1000)))/log(2) = 1445.4

bits in the numerator and

1000/log(2) = 1442.7

bits in the denominator.

Update: See the next post for plots as a function of n.

Related posts

Hart’s theorem

Hart’s theorem says

If a triangle be formed by the arcs of three circles, the inscribed and the three escribed circles are all tangent to a new circle or line.

Here “triangle” means a three-sided figure whose sides are portions of a circle. The inscribed circle is the largest circle that can fit inside the three-sided figure.

The “escribed” circles are analogous to the excircles in the previous post: you extend two sides and find a circle that is tangent to the triangle side and the extended side. The difference here being that the side extensions are now circles.

Incircles and Excircles of Pythagorean triangles

This post will reveal the connection between my two previous posts: one on the Star Trek lemma and one on Pythagorean triples.

In the process of writing the latter, I looked at the Wikipedia article on Pythagorean triples and noticed this curious paragraph.

In every Pythagorean triangle, the radius of the incircle and the radii of the three excircles are positive integers. Specifically, for a primitive triple the radius of the incircle is r = n(m − n), and the radii of the excircles opposite the sides m2 − n22mn, and the hypotenuse m2 + n2 are respectively m(m − n), n(m + n), and m(m + n).

The citation for the paragraph above was the book by my former officemate, which led to the post on the Star Trek lemma. The passage in Arthur Baragar’s book that Wikipedia cites is Exercise 15.3.

Let ΔABC be a right angle triangle with sides of integer length. Prove that the inradius r and the exradii ra, rb, and rc are all integers.

I don’t know whether Arthur discovered this theorem, but I’ll call it Baragar’s theorem for this post.

Incircles and excircles

To unpack Baragar’s theorem, let’s start by saying what incircles and excircles are. Incircles are more familiar. The incircle of a triangle is the largest circle that can be inscribed inside the triangle, and the radius of this circle is the inradius.

Since an incircle is an inscribed circle, you might expect an excircle to be a circumscribed circle, but that’s not it. There are three excircles, one for each side. To find the excircle for a side, extend the other two sides and find the circle tangent to the side and the two extensions. The radius of an excircle is its exradius.

Proof

Baragar’s theorem follows directly from Euclid’s formula for Pythagorean triples mentioned in the previous post

\begin{align*} a &= m^2 - n^2 \\ b &= 2mn \\ c &= m^2 + n^2 \end{align*}

and formulas for the inradius r and the exradii ra, rb, and rc.

\begin{align*} r &= \frac{K}{s} \\ r_a &= \frac{K}{s - a} \\ r_b &= \frac{K}{s - b} \\ r_c &= \frac{K}{s - c} \\ \end{align*}

Here K is the area of the triangle, which in our case is ab/2, and s is the semiperimeter, half the perimeter.

Expressing the radii in terms of m and n gives the values cited by Wikipedia above.

Illustrating the theorem

I’d like to write a Python script to illustrate the theorem, and knowing the radii of the circles help, but we also need to know the centers of the circles.

The center of the incircle is the weighted average of the vertices, with weights given by the lengths of the opposite sides. That is, if the vertices are AB, and C, and the sides opposite these vertices are ab, and c, the the incenter is

I = \frac{aA + bB + cC}{a + b + c}

The centers for the excircles have remarkably similar expressions. For the incenter of the circle opposite a vertex, flip the sign of the corresponding side.

\begin{align*} I_a = \frac{-aA + bB + cC}{-a + b + c} \\ I_a = \frac{aA - bB + cC}{a - b + c} \\ I_a = \frac{aA + bB - cC}{a + b - c} \\ \end{align*}

Python code

Putting it all together, here’s an illustrate the theorem.

And here’s the code that produced it. Note that everything in this section works for right triangles in general, not just Pythagorean triangles.

import numpy as np
import matplotlib.pyplot as plt

def connect(A, B):
    plt.plot([A[0], B[0]], [A[1], B[1]], "C0")

def draw_circle(c, r, color):
    t = np.linspace(0, 2*np.pi)
    plt.plot(r*np.cos(t) + c[0], r*np.sin(t) + c[1], color=color)

a, b, c, = 3, 4, 5

A = np.array([0, b])
B = np.array([-a, 0])
C = np.array([0, 0])

s = (a + b + c)/2
K = a*b/2
r = K/s
ra = K/(s - a)
rb = K/(s - b)
rc = K/(s - c)
I = (a*A + b*B + c*C)/(a + b + c)
Ia = (-a*A + b*B + c*C)/(-a + b + c)
Ib = (a*A - b*B + c*C)/(a - b + c)
Ic = (a*A + b*B - c*C)/(a + b - c)

draw_circle(I, r, "C1")
draw_circle(Ia, ra, "C2")
draw_circle(Ib, rb, "C3")
draw_circle(Ic, rc, "C4")

plt.plot([-2*rc, 2*rb], [0, 0], "C0")
plt.plot([0, 0], [-2*ra, 2*rc], "C0")
plt.plot([(-2*ra - b)*a/b, 2*rb], [-2*ra, 2*rb*b/a + b], "C0")

plt.gca().set_aspect("equal")
plt.show()

Consecutive Pythagorean triangle sides

In this post we find all Pythagorean triples that contain consecutive numbers, all Pythagorean triples (abc) such that a + 1 = b or b + 1 = c.

a + 1 = b

George Osborne wrote a paper [1] addressing the question of when the squares of two consecutive numbers is also a square. Geometrically this is asking for primitive Pythagorean triples for which the legs are consecutive integers.

He proved that the sequence shorter legs satisfies the recurrence relation

u_{n+2} = 6 u_{n+1} - u_{n+1} + 2
with initial conditions u0 = 0 and u1 = 1. This is OEIS sequence A001652.

The method for solving recurrences like the one above is analogous to the method for solving linear differential equations. See a solution here. This gives us the following formula for the terms:

u_n = \dfrac{1 + \sqrt{2}}{4} \left(3 + 2\sqrt{2}\right)^n + \dfrac{1 - \sqrt{2}}{4} \left(3 - 2\sqrt{2}\right)^n - \dfrac{1}{2}

b + 1 = c

It’s also possible for the longer side and hypotenuse of a Pythagorean triangle to be consecutive numbers, as in the (5, 12, 13) triangle.

All primitive Pythagorean triples are given by Euclid’s formula

\begin{align*} a &= m^2 - n^2 \\ b &= 2mn \\ c &= m^2 + n^2 \end{align*}

with integers m > n > 0. If b and c are consecutive numbers, then

c - b = 1 = m^2 + n^2 - 2mn = (m -n)^2

and so mn + 1. Therefore all possible values of b are given by 2n(n + 1) for n > 1.

[1] Geo. A. Osborne. A Problem in Number Theory. The American Mathematical Monthly, Vol. 21, No. 5 (May, 1914), pp. 148-150

The Star Trek lemma

I was reading an article this evening and saw a footnote to a book by Arthur Baragar [1]. This caught my eye because he was my officemate at UT for a year.

I found his book on Archive.org and was surprised to see “The Star Trek Lemma” in the table of contents. What could this be?

It’s a theorem that goes back to Euclid that applies to an angle formed by connecting a point to two other points on a circle. The theorem says “The measure of an inscribed angle is half the measure of the arc it subtends.” But why call it the Star Trek lemma? Quoting Arthur:

In the spirit of Euclid, we will refer to this theorem as the Star Trek lemma because of the figure associated with the statement of the theorem. … Before Star Trek, as far as I know, this theorem had no name, though some might call it Euclid III.20, which is its proposition number in Euclid’s Elements (Book III, Proposition 20).

Here is my reconstruction of the figure given in the book.

Baragar's illustration of the Star Trek lemma

The lemma says that ∠BAC is half of ∠BOC.

[1] Baragar, Arthur (2001), A Survey of Classical and Modern Geometries: With Computer Activities, Prentice Hall

 

Regular expressions that work “everywhere”

The most frustrating aspect of regular expressions is that implementations vary. Features supported in one tool may not be supported at all in another tool, or they may be supported with slightly different syntax.

I learned regular expressions in the context Perl, a maximalist regex environment. This led to frustration when features I expect to work are missing [1]. One way around this is to use Perl analogs of other tools, but this is very non-standard. I want to be able to send colleagues and clients code that works out of the box.

As I mentioned in my post on computational survivalism, I occasionally need to work on computers that I cannot install software on. So a better approach is to identify a subset of regex features that work everywhere. The stricter your definition of “everywhere” the less this includes. The strictest subset would be

  • literals
  • character classes […]
  • the special characters . * ^ $

A more relaxed definition of “everywhere” would be the tools you most care about. Currently the tools I most want to use with regular expressions are sed, awk, grep, and Emacs.

Awk as lowest common denominator

If you use the Gnu versions of sed, awk, and grep, and use the -E option with sed and grep, then the list of common features is bigger. The regular expression features of of the three tools are similar, and awk’s features are supported in the other tools, with one exception: word boundaries in awk are \< and \> rather than \b and \B.

I wrote about Awk’s regex features here.

Emacs as the oddball

Emacs supports analogs of most of awk’s regex features. However, the characters

    + ? ( ) { } |

all require a backslash in front in order to act like the awk counterparts. Also, the analog of \s and \S in awk is \s- and \S- in Emacs.

Instead of meaning space or nonspace, \s and \S in Emacs begin a (negated) character class, and one of those classes is - for space. But there are many others. For example, \s. stands for a punctuation character and \S. stands for a non-punctuation character.

What works everywhere

So for my definition of “everywhere,” with the caveats mentioned above, the following features work everywhere. YMMV.

    .
    ^, $
    […], [^…]
    *
    \w, \W, \s, \S
    \1 - \9 backreferences
    \b \B
    ? + 
    | alternation
    {n,m} for counting matches
    (...) capturing

One footnote is that gawk supports backreferences in replacement strings but not in regular expressions per se.

[1] To some extent, basic Perl features work elsewhere and advanced features do not, depending on your idea of what is basic or advanced. I think of look-around features as advanced, and that tracks. But I think of \d for digits as basic, but that’s not supported in many regex flavors.

Lobachevsky’s integral formula

Let f be an even function with period π. Then the following remarkable theorem by Lobachevsky holds.

\int_0^\infty \frac{\sin^2 x}{x^2} f(x) \, dx = \int_0^\infty\frac{\sin x} x f(x) \, dx = \int_0^{\pi/2} f(x) \, dx

This theorem is useful in Fourier analysis and signal processing. It’s useful to know even in the special case f(x) = 1.

For a “jinc” analog, see this paper.

***

Every time I see the name Lobachevsky I think of Tom Lehrer’s song about him. You can find the words here and the audio here.

Related posts