Quasi-random sequences in art and integration

Sometimes when people say they want random points, that’s not what they really want. Random points clump more than most people expect. Quasi-random sequences are not random in any mathematical sense, but they might match popular expectations of randomness better than the real thing, especially for aesthetic applications. If by “random” someone means “scattered around without a clear pattern and not clumped together” then quasi-random sequences might do the trick.

Here are the first 50 points in a quasi-random sequence of points in the unit square.

quasi-random points

By contrast, here are 50 points in a unit square whose coordinates are uniform random samples.

random points

The truly random points clump together. Notice the cluster of three points in the top right corner. There are few other instances of pairs of points being very close together. Also, there are fairly large areas that don’t contain any random points. The quasi-random points by contrast are better spread out. They have a self-avoiding property that keeps them from clustering, and they fill the space more efficiently.

Quasi-random sequences could be confused with pseudo-random sequences. They’re not at all the same thing. Pseudo-random sequences are computer-generated sequences that in many ways behave as if they were truly random, even though they were produced by deterministic algorithms. For many practical purposes, including sensitive statistical tests, pseudo-random sequences are simply random sequences. (The “truly” random points above were technically “pseudo-random” points.)

The quasi-random points above were part of a Sobol sequence, a common quasi-random sequence. Other quasi-random sequences include the Halton sequence and the Hammersley sequence. Mathematically, these sequences are defined has having low-discrepancy. Roughly speaking, this means the “discrepancy” between the number of points that actually fall in a volume and the number of points you’d expect to fall in the same volume is small. See the Wikipedia article on quasi-random sequences for more mathematical details.

Besides being aesthetically useful, quasi-random sequences are useful in applied mathematics. Because these sequences explore a space more efficiently than random sequences, quasi-random sequences sometimes lead to more efficient high-dimensional integration algorithms than Monte Carlo integration. Quasi-Monte Carlo integration, i.e. integration based on quasi-random sequences rather than random sequences, is popular in financial applications. Art Owen has written insightful papers on Quasi-Monte Carlo integration (QMC). He has classified which integration problems can be efficiently computed via QMC methods and which cannot. In a nutshell, QMC works well when the effective dimension of a problem is significantly lower than the actual dimension. For example, a financial model might ostensibly depend on 1000 variables, but 50 of those variables contribute far more to the integrand than all the other variables. The integrand might essentially be a function of only 50 variables. In that case, QMC will work well. Note that it is not necessary to identify these 50 variables or do any change of variables. QMC just magically takes advantage of the situation.

One disadvantage of QMC integration is that it doesn’t naturally lead to an estimate of its own accuracy, unlike Monte Carlo integration. Several hybrid approaches have been proposed to combine QMC integration and Monte Carlo integration to get the efficiency of the former and the error estimates of the latter. For example, one could randomly jitter the quasi-random points or randomly permute their components. Some of these results are in Art Owen’s papers.

To read more about quasi-random sequences, see the book Random Number Generation and Quasi-Monte Carlo Methods.

What happened to XSLT?

Around 2000, some people believed that nearly all programming would be a matter of transporting and transforming XML. XML would be the universal data format, and all software would be a matter of transforming XML. If that were the case, then it would be very handy to use a language designed especially for transforming XML. That language was XSLT.

I’m sure some programmers write XSLT on a daily basis, and there’s probably a large amount of machine-generated XSLT out there. But it’s safe to say XSLT never became extremely popular. Maybe because the syntax is awkward. Not many developers like to write lots of angle brackets. Or maybe the declarative/functional style of the language isn’t as comfortable as a more imperative style of programming for most developers. I don’t really know. I thought XSLT sounded like a good idea, but I never learned it. My experience with XSLT consists of a few dozen lines I wrote six or seven years ago.

Along these lines, I ran across the following picture in a blog post entitled A picture is worth a thousand words: is XML dying? containing the following photo:

Sharps and flats in HTML

Apparently there’s no HTML entity for the flat symbol, ♭. In my previous post, I just spelled out B-flat because I thought that was safer; it’s possible not everyone would have the fonts installed to display B♭ correctly.

So how do you display music symbols for flat, sharp, and natural in HTML? You can insert any symbol if you know its Unicode value, though you run the risk that someone viewing the page may not have the necessary fonts installed to view the symbol. Here are the Unicode values for flat, natural, and sharp.

Since the flat sign has Unicode value U+266D, you could enter ♭ into HTML to display that symbol.

The sharp sign raises an interesting question. I’m sure most web pages referring to G-sharp would use the number sign # (U+0023) rather than the sharp sign ♯ (U+266F). And why not? The number sign is conveniently located on a standard keyboard and the sharp sign isn’t. It would be nice if people used sharp symbols rather than number signs. It would make it easier to search on specifically musical terms. But it’s not going to happen.

Update: See this post on font support for Unicode. Most people can see all three symbols, but some, especially Android users, might not see the natural sign.

Related posts

Typesetting music with LilyPond

I tried typesetting music in LaTeX some time ago and gave up. The packages I found were hard to install, the examples didn’t work, etc. This weekend I decided to try again. I tried plowing through the MusiXTeX documentation and got no further than I did last time.

I posted a note on StackOverflow and got some good responses. Nikhil Chelliah suggested I look at LilyPond. I had looked at LilyPond before, and @jleedev explained how to integrate LaTeX and LilyPond.

Here’s some sheet music I included in my previous post, March in 7/4 time.

sheet music example

Here’s a full-sized PDF file version of the music above. And here’s the LilyPond source code used to create the music.

\relative c' {
\time 7/4
\key f \major
\clef treble
f g f \times 2/3{ c8 c c} f4 g a
g a8. bes16 a4 g f g c,
f g f \times 2/3{ c8 c c} f4 g a
g a8. bes16 a4 g f e f
}

The notation looks cryptic at first, but it makes sense after a few minutes. The command relative c' means that the following pitches will be relative to middle C. For example, the first note, F, is the F closest to middle C. Each note is the same length as the previous note by default, and the first note is a quarter note by default. The notation c8 means that the C is an eighth note, except it’s in the context of a triplet (times 2/3) and so it’s an eighth note triplet. The next F is denoted f4 to indicate that we’re back to quarter notes.

The notation a8. says that the A is a dotted eighth note. For the next note, bes16 means a B-flat sixteenth note. The suffix “es” stands for “flat” and “is” stands for “sharp.” (The documentation says it’s Dutch. I’ve never seen it before.) I don’t understand why I had to tell it that the B was flat. The code specified earlier that the key was F major, which implies B’s are flat. I suppose the code for individual notes is decoupled from the code to draw the key signature. That would make entering music painful in keys that have lots of sharps or flats. Maybe there’s a way to specify default sharps or flats.

The comma in c, gives the absolute pitch of the C. In relative mode, LilyPond assumes by default that each pitch name refers to the pitch closest to its predecessor. The C closest to the previous note, F, would have been the C up one fourth rather than down one fifth, so the comma was necessary to tell LilyPond to go down.

If I were to do a lot of music processing, I’d probably look at a commercial package such as Sibelius. But for now I’m just interested in producing small excerpts like that above, and it looks like LilyPond may be fine.

Update: I double checked the rules about flats etc. Yes, I do have to specify explicitly that the B in this example is B-flat. If I just say b rather than bes, LilyPond will add a natural sign in front of the B! It’s strange. It is aware of the key signature: when I tell it the B is flat, it says “OK, then I don’t have to mark that specially since it’s implicit in the key signature.” And if I don’t tell it the B is flat, it says “Oh, that’s an exception to the key signature. Better mark it with a natural sign.”

March in 7/4 time

After writing my post on music in 5/4 time, I remembered a march in 7/4 time that I played in band many years ago. Here’s an excerpt, about all I can remember.

sheet music example

In case the music above is too hard to read, here’s a full-sized PDF file version.

Marches are always in even meters: your left foot has to come down on the first beat of every measure when you’re marching. And yet this odd meter tune comes across as a convincing march. (It was a concert march. Actually marching to it would have been odd, pun intended.)

This march had a 4/4 + 3/4 feel, emphasis on the first and fifth beats of each 7/4 measure.

I’ve just started blogging about music recently, and I’ve got a lot to learn. I’m not set up to record audio clips. My next post will describe the software I used to post the sheet music above.

Update: Many thanks to Nikhil Chelliah for identifying the march. It’s the first movement from Third Suite by Robert Jager. The sheet music and a sound clip are available here.

Related post: Blue Rondo à la Turk

Beatbox flute

Greg Pattillo plays beatbox flute. It’s hard to imagine what that means until you hear it. Here’s a video of Pattillo playing the theme from Sesame Street.

And here’s a video of Pattillo playing Peter and the Wolf.

Fractional derivatives

Is there a way to make sense of the nth derivative of a function when n is not a positive integer?

The notation f(n) is usually introduced in calculus classes in order to make Taylor’s theorem easier to state:

f(x) = \sum_{n=0}^\infty f^{(n)}(0) \frac{x^n}{n!}

To make the above statement work, the 0th derivative is defined to be the function itself, i.e. don’t take any derivatives. This makes a modest extension of the notation f(n) from requiring n to be a positive integer to being a non-negative integer. Can we make sense of the case when n is negative or non-integer? Before answering that question, let’s think about what the fractional derivative might be in some special cases if we could define it.

When we take derivatives of powers of x, we get factorial-like coefficients. The first derivative of xm is m xm-1, the second derivative is m(m-1) xm-2, the third derivative is m(m-1)(m-2) xm-3, etc. We can use the gamma function to extend the factorial function to non-integer argument, so maybe we could do the same to compute non-integer derivatives. If m > -1, and n is a positive integer, the nth derivative of xm is (m!/(mn)!) xmn. We could rewrite this as (Γ(m+1) / Γ(mn+1)) xmn. The result holds for integer values of n, and so we could hope it holds for non-integer values of n.

If n is an integer and we take the nth derivative of ebx we get bn ebx. We might guess that for non-integer values of n the same formula holds.

It is indeed possible to define derivatives of order n for non-integer values of n, and the speculations above are correct, subject to some conditions. In fact there are several ways to define non-integer derivatives and the differences can be complicated.

What about negative derivatives? Well, it makes sense that these could be anti-derivatives, i.e. integrals. We could define, for example, the -3rd derivative of f(x) to be a function whose third derivative is f(x). However, anti-derivatives are only determined up to a constant. We could use the Fundamental Theorem of Calculus to uniquely specify anti-derivatives if we agree on a lower limit of integration c, such as c = 0 or maybe c = -∞.

f^{(-1)}(x) = \int_c^x f(t)\, dt

Here’s one way fractional derivatives could be defined. Suppose the Fourier transform of f(x) is g(ξ). Then for positive integer n, the nth derivative of f(x) has Fourier transform (2π i ξ)n g(ξ). So you could take the nth derivative of f(x) as follows: take the Fourier transform, multiply by (2π i ξ)n, and take the inverse Fourier transform. This suggests the same procedure could be used to define the nth derivative when n is not an integer.

Fractional derivatives have practical uses. The book An Atlas of Functions makes frequent use of fractional derivatives, especially derivatives of order 1/2 and –1/2, to show connections between different classes of functions.

Related article: Generalizing binomial coefficients

Shorter URLs by using Unicode

Tinyarro.ws is a service like tinyurl.com and others that shorten URLs. However, unlike similar services, Tinyarro.ws uses Unicode characters, allowing it to encode more possibilities into each character. These sub-compact URLs may contain Chinese characters, for example, or other symbols unfamiliar to many users. They’re no good for reading aloud, say over the phone or on a podcast. But they’re ideal for Twitter because you only have to click on the link, not type it into a browser.

Here’s a URL I got when I tried the Tinyarro.ws site:

screen shot from tinyarro.ws

The resulting URL may not display correctly in your browser depending on what fonts you have installed: http://➡.ws/㣸.

I pasted the URL into Microsoft Word and used Alt-x to see what the Unicode characters were. (See Three ways to enter Unicode characters in Windows.) The arrow is code point U+27A1 and the final character is code point U+38F8. I have no idea what that character means. I would appreciate someone letting me know in the comments.

Unicode character U=38F8

Related post: How to insert graphics in Twitter messages

How to grep Twitter

Twitter has an extensive search API. To build the URL for a query, start with the base http://search.twitter.com/search.atom?q=. To search for a word, just append that word to the base, such as http://search.twitter.com/search.atom?q=Coltrane to search for tweets containing “Coltrane.”

To search for a term within a particular user’s tweet stream, start with the base URL and append +from%3A and the user’s name. (The %3A is a URL- encoded colon.) See the search API page for other options, such as specifying the number of requests per page to return (look for rpp) or restricting the language (look for lang).

As far as I can tell, the API does not support regular expressions, but you could loop over the search results programmatically. Here’s how you’d do it in PowerShell.

First, grab the search results as a string. Say we want to search through the latest tweets from Hal Rottenberg, @halr9000.

$base = "http://search.twitter.com/search.atom?q="
$query = "from%3Ahalr9000"
$str = (new-object net.webclient).DownloadString($base + $query)

Now $str contains an XML string of results formatted as an Atom feed. The root element is <feed> and individual tweets are contained in <entry> tags under that. The text of the tweet is in the <title> tag and the other tags contain auxiliary data such as time stamps. The following code shows how to search for the regular expression d{4}. (Look for four-digit numbers.)

([ xml] $str).feed.entry | where-object {$_.title -match "d{4}"}

In English, the code says to cast $str to an XML document object and pipe the <entry> contents to the filter that selects those objects whose <title> strings match the regular expression.

The search API limits the number of entries it will return, so it’s best to do as much filtering as you can via the Twitter site before doing your own filtering.

Related posts

Regular expressions in PowerShell and Perl
Table-driven text munging in PowerShell