Uncategorized

Ways to connect

If you visit this blog once in a while, here are a few ways to hear from me more regularly.

Subscription

You can subscribe to the blog via RSS or email.

I often use SVG images because they look great on a variety of devices, but most email clients won’t display that format. If you subscribe by email, there’s always a link to open the article in a browser and see all the images.

I also have a monthly newsletter. It highlights the most popular posts of the month and usually includes a few words about what I’ve been up to.

Twitter

I have 17 Twitter accounts that post daily on some topic and one personal account.

Twitter icons

The three most popular are CompSciFact, AlgebraFact, and ProbFact. These accounts are a little broader than the names imply. For example, if I run across something that doesn’t fall neatly into another mathematical category, I’ll post it to AlgebraFact. Miscellaneous applied math tends to end up on AnalysisFact.

You can find a list of all accounts and their descriptions here.

I don’t keep up with replies to my topical accounts, but I usually look at replies to my personal account. If you want to be sure I get your message, please call or send me email.

Occasionally people ask whether there’s a team of people behind my Twitter accounts. Nope. Just me. I delegate other parts of my business, but not Twitter. I schedule most of the technical account tweets in advance, but I write them. My personal account is mostly spontaneous.

Contact info

Here’s my contact info.

contact info

My phone number isn’t on there. It’s 832.422.8646. If you’d like, you can import my contact info as a vCard or use the QR code below.

Integrating polynomials over a sphere or ball

Spheres and balls are examples of common words that take on a technical meaning in math, as I wrote about here. Recall that the unit sphere in n dimensions is the set of points with distance 1 from the origin. The unit ball is the set of points of distance less than or equal to 1 from the origin. The sphere is the surface of the ball.

Integrating a polynomial in several variables over a ball or sphere is easy. For example, take the polynomial xy² + 5x²z² in three variables. The integral of the first term, xy², is zero. If any variable in a term has an odd exponent, then the integral of that term is zero by symmetry. The integral over half of the sphere (ball) will cancel out the integral over the opposite half of the sphere (ball). So we only need to be concerned with terms like 5x²z².

Now in n dimensions, suppose the exponents of x_1, x_2, …, x_n are a_1, a_2, …, a_n respectively. If any of the a’s are odd, the integral over the sphere or ball will be zero, so we assume all the a’s are even. In that case the integral over the unit sphere is simply

2 B(b_1, b_2, \ldots, b_n)

where

B(b_1, b_2, \ldots, b_n) = \frac{\Gamma(b_1) \Gamma(b_2) \cdots \Gamma(b_n)}{ \Gamma(b_1 + b_2 + \cdots + b_n)}

is the multivariate beta function and for each i we define b_i = (a_i + 1)/2. When n = 2, B is the (ordinary) beta function.

Note that the integral over the unit sphere doesn’t depend on the dimension of the sphere.

The integral over the unit ball is

\frac{2 B(b_1, b_2, \ldots, b_n)}{ a_1 + a_2 + \cdots + a_n + n}

which is proportional to the integral over the sphere, where the proportionality constant depends on the sum of the exponents (the original exponents, the a’s, not the b’s) and the dimension n.

Note that if we integrate the constant polynomial 1 over the unit sphere, we get the surface area of the unit sphere, and if we integrate it over the unit ball, we get the volume of the unit ball.
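As a numerical check on the formulas above, here is a minimal Python sketch. The function names sphere_integral and ball_integral are my own choices for illustration, not anything from Folland’s paper, and each function handles a single monomial with coefficient 1. For a general polynomial, multiply by the coefficients and sum over terms.

```python
from math import gamma, pi, prod

def sphere_integral(exponents):
    """Integral of x_1^a_1 * ... * x_n^a_n over the unit sphere in n dimensions."""
    # Any odd exponent makes the integral vanish by symmetry.
    if any(a % 2 == 1 for a in exponents):
        return 0.0
    b = [(a + 1) / 2 for a in exponents]
    # 2 B(b_1, ..., b_n), where B is the multivariate beta function
    return 2 * prod(gamma(b_i) for b_i in b) / gamma(sum(b))

def ball_integral(exponents):
    """Integral of the same monomial over the unit ball."""
    n = len(exponents)
    return sphere_integral(exponents) / (sum(exponents) + n)

# Constant polynomial 1 in three dimensions
area = sphere_integral([0, 0, 0])    # compare with 4*pi
volume = ball_integral([0, 0, 0])    # compare with 4*pi/3

# The example polynomial x y^2 + 5 x^2 z^2 over the unit sphere
example = sphere_integral([1, 2, 0]) + 5 * sphere_integral([2, 0, 2])
```

Here area and volume should agree with 4π and 4π/3 up to floating point error. The xy² term contributes nothing, so example reduces to five times the integral of x²z², which also works out to 4π/3.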

You can find a derivation for the integral results above in [1]. The proof is basically Liouville’s trick for integrating the normal distribution density, but backward. Instead of going from rectangular to polar coordinates, you introduce a normal density and go from polar to rectangular coordinates.

[1] Gerald B. Folland, How to Integrate a Polynomial over a Sphere. The American Mathematical Monthly, Vol. 108, No. 5 (May, 2001), pp. 446-448.

Emacs features that use regular expressions

The syntax of regular expressions in Emacs is a little disappointing, but the ways you can use regular expressions in Emacs are impressive.

I’ve written before about the syntax of Emacs regular expressions. It’s a pretty conservative subset of the features you may be used to from other environments as summarized in the diagram below.

But there are many, many ways to use regular expressions in Emacs. I did a quick search and found that about 15% of the pages in the massive Emacs manual contain at least one reference to regular expressions. Exhaustively listing the uses of regular expressions would not be practical or very interesting. Instead, I’ll highlight a few uses that I find helpful.

Searching and replacing

One of the most frequently used features in Emacs is incremental search. You can search forward or backward for a string, searching as you type, with the commands C-s (isearch-forward) and C-r (isearch-backward). The regular expression counterparts of these commands are C-M-s (isearch-forward-regexp) and C-M-r (isearch-backward-regexp).

Note that the regular expression commands add the Alt (meta) key to their string counterparts. Also, note that Emacs consistently refers to regular expressions as regexp and never, as far as I know, as regex. (Emacs relies heavily on conventions like this to keep the code base manageable.)

A common task in any editor is to search and replace text. In Emacs you can replace all occurrences of a regular expression with replace-regexp or interactively choose which instances to replace with query-replace-regexp.

Purging lines

You can delete all lines in a file that contain a given regular expression with flush-lines. You can also invert this command, specifying which lines not to delete with keep-lines.
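For readers who don’t use Emacs, the effect of these two commands can be imitated in a few lines of Python. This is only a sketch of the behavior, not how Emacs implements it. It uses Python’s regular expression syntax rather than Emacs’s (for instance, alternation is | here but \| in Emacs), and it ignores the fact that the Emacs commands act from point to the end of the buffer or on the active region. The function names simply mirror the Emacs command names.

```python
import re

def flush_lines(lines, pattern):
    """Drop every line that contains a match for pattern."""
    regex = re.compile(pattern)
    return [line for line in lines if not regex.search(line)]

def keep_lines(lines, pattern):
    """Keep only the lines that contain a match for pattern."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Made-up log lines for illustration
log = ["INFO start", "DEBUG x=1", "ERROR failed", "DEBUG x=2"]
without_debug = flush_lines(log, r"^DEBUG")
only_errors = keep_lines(log, r"ERROR|WARN")
```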

Aligning code

One lesser-known but handy feature is align-regexp. This command will insert white space as needed so that all instances of a regular expression in a region align vertically. For example, if you have a sequence of assignment statements in a programming language you could have all the equal signs line up by using align-regexp with the regular expression consisting simply of an equal sign. Of course you could also align based on a much more complex pattern.
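To make the equal sign example concrete, here is a rough Python imitation of the alignment step. It is a sketch under simplifying assumptions, not a port of align-regexp: it aligns only the first match on each line, always leaves one space before the match, and uses Python’s regular expression syntax. The name align_on is hypothetical.

```python
import re

def align_on(lines, pattern):
    """Pad each line so the first match of pattern starts in the same column."""
    regex = re.compile(pattern)
    matches = [regex.search(line) for line in lines]
    # Text to the left of each match, with trailing white space removed
    lefts = [line[:m.start()].rstrip() for line, m in zip(lines, matches) if m]
    if not lefts:
        return list(lines)
    column = max(len(left) for left in lefts) + 1
    aligned = []
    for line, m in zip(lines, matches):
        if m is None:
            aligned.append(line)  # lines without a match are left alone
        else:
            left = line[:m.start()].rstrip()
            aligned.append(left.ljust(column) + line[m.start():])
    return aligned

code = ["x = 1", "total = 2", "n=3"]
aligned = align_on(code, r"=")
```

After this runs, every equal sign in aligned sits in the column just past the longest left-hand side, here the one following "total".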

Although this feature is probably used mostly for editing source code, I imagine you could also use it in other contexts, such as aligning poetry or ASCII art diagrams.

Directory editing

The Emacs directory editor dired is something like the Windows File Explorer or the OSX Finder, but text-based. dired has many features that use regular expressions. Here are a few of the more common ones.

You can mark files based on the file names with % m (dired-mark-files-regexp) or based on the contents of the files with % g (dired-mark-files-containing-regexp). You can also mark files for deletion with % d (dired-flag-files-regexp).

Inside dired you can search across a specified set of files by typing A (dired-do-find-regexp), and you can interactively search and replace across a set of files by typing Q (dired-do-find-regexp-and-replace).

Miscellaneous

The help apropos command (C-h a) can take a string or a regular expression.

The command to list available faces (list-faces-display) can take a string or regular expression.

Interactive highlighting commands (highlight-regexp, unhighlight-regexp, highlight-lines-matching-regexp) take a regular expression argument.

You can use a regular expression to specify which buffers to close with kill-matching-buffers.

Maybe the largest class of uses for regular expressions in Emacs is configuration. Many customizations in Emacs, such as giving Emacs hints to determine the right editing mode for a file or how to recognize comments in different languages, use regular expressions as arguments.

Resources

You can find more posts on regular expressions and on Emacs by going to my technical notes page. Note that the outline at the top has links for regular expressions and for Emacs.

For daily tips on regular expressions or Unix-native tools like Emacs, follow @RegexTip and @UnixToolTip on Twitter.

Product review policies

I’ve often reviewed books on this site and may review other products some day. I wanted to let readers and potential vendors know what my policies are regarding product reviews.

I don’t get paid for reviews. I review things that I find interesting and think that readers would find interesting.

I don’t do reviews with strings attached. Most publishers don’t try to attach strings. They simply ask me if I’d like a copy of their book, and that’s that. A couple of publishers have tried to exert more control, and I don’t review their books.

I don’t write negative reviews because they’re not interesting. There are millions of products you won’t buy this year. Who cares about another thing not to buy? A negative review could be interesting if it were for a well-known product that many people were thinking about buying, but I haven’t been asked to review anything like that. If I find something disappointing, I don’t write a review.

Books need to be on paper. Electronic files are fine for reference and for short-form reading, but I prefer paper for long-form reading.

I’m open to reviewing hardware if it’s something I would use and something that I think my readers would be interested in. I haven’t reviewed hardware to date, but someone offered me a device that I expect to review when it gets here.

Free technical books, mostly chemical engineering

Retiring professor Leonard Fabiano contacted me looking to give away a set of technical books, mostly chemical engineering books. If you’re interested please email him at lenfab@live.com.

Here are the books:

Chemical engineering books


Two titles are illegible in the photo. These are

  • Conduction of Heat in Solids by Carslaw and Jaeger
  • Molecular Thermodynamics of Fluid Phase Equilibria by Prausnitz

Scholarship versus research

One of the things about academia that most surprised and disappointed me was the low regard for scholarship. Exploration is tolerated as long as it results in a profusion of journal articles, and of course grant money, but is otherwise frowned upon. For example, I know someone who ruined his academic career by writing a massive scholarly book rather than cranking out papers.

I recently ran across an essay [1] in which C. S. Lewis expressed similar concerns sixty years ago, referring to “the incubus of Research.” In the essay he describes a young academic in the humanities who

… far from being able or anxious … to add to the sum of human knowledge, wants to acquire a good deal more of the knowledge we already have. He has lately begun to discover how many things he needs to know in order to follow up his budding interests … To head him off from these studies, to pinfold him in some small inquiry whose chief claim is that no one has made it before is cruel and frustrating. It wastes such years as he will never have again …

My favorite part of the quote is describing research as “some small inquiry whose chief claim is that no one has made it before.” As Lewis said elsewhere, striving for originality can thwart originality.

[1] “Interim Report.” First published in The Cambridge Review in 1956 and reprinted as chapter 17 of Present Concerns.

Intellectual onramps

Tyler Cowen’s latest blog post gives advice for learning about modern China. He says that “books about sequences of dynasties are mind-numbing and not readily absorbed” and recommends finding other entry points before reading about dynasties.

Find an “entry point” into China of independent intrinsic interest to you, be it basketball, artificial intelligence, Chinese opera, whatever.

In a podcast interview—sorry, I no longer remember which one—Cowen talked more generally about finding entry points or onramps for learning big topics. The blog post mentioned above applies this specifically to China, but he gave other examples of coming to a subject through a side door rather than the front entrance. If I remember correctly, he mentioned learning the politics or economics of a region by first studying its architecture or food.

I’ve stumbled upon a number of intellectual onramps through my career, but I haven’t been as deliberate as Cowen in seeking them out. I had no interest in medicine before I ended up working for the world’s largest cancer center. I learned a bit about cancer and genetics from working at MD Anderson and I’ve since learned a little about other areas of medicine working with various clients. Right now I’m working on projects in nephrology and neurology.

Applied math is my onramp to lots of things I might not pursue otherwise. As John Tukey said, you get to play in everyone else’s backyard.

There are many things I’ve tried and failed to learn via a frontal assault. For example, I’ve tried several times to learn algebraic geometry by simply reading a book on the subject. But I find all the abstract machinery mind-numbing and difficult to absorb, just as Cowen described his first exposure to Chinese history. If I’m ever to learn much algebraic geometry, it will start with an indirect entry point, such as a concrete problem I need to solve.

Hypergeometric functions are key

From Orthogonal Polynomials and Special Functions by Richard Askey:

At first the results we needed were in the literature but after a while we ran out of known results and had to learn something about special functions. This was a very unsettling experience for there were very few places to go to really learn about special functions. At least that is what we thought. Actually there were many, but the typical American graduate education which we had did not include anything about hypergeometric functions. And hypergeometric functions are the key to this subject, as I have found out after many years of fighting them.

Emphasis added.

Askey’s book was written in 1975, and he was describing his experience from ten years before that. Special functions, and in particular hypergeometric functions, went from being common knowledge among mathematicians at the beginning of the 20th century to being arcane by mid-century.

I learned little about special functions and nothing about hypergeometric functions as a graduate student. I first ran into hypergeometric functions reading in Concrete Mathematics how they are used in combinatorics and in calculating sums in closed form. Then when I started working in statistics I found that they are everywhere.

Hypergeometric functions are very useful, but not often taught anymore. Like a lot of useful mathematics, they fall between two stools. They’re considered too advanced or arcane for the undergraduate curriculum, and not a hot enough area of research to be part of the graduate curriculum.


Toxic pairs, re-identification, and information theory

Database fields can combine in subtle ways. For example, nationality is not usually enough to identify anyone. Neither is religion. But the combination of nationality and religion can be surprisingly informative.

Information content of nationality

How much information is contained in nationality? That depends on exactly how you define nations versus territories etc., but for this blog post I’ll take this Wikipedia table for my raw data. You can calculate that nationality has entropy of 5.26 bits. That is, on average, nationality is slightly more informative than asking five independent yes/no questions. (See this post for how to calculate information content.)

Entropy measures expected information content. Knowing that someone is from India (population 1.3 billion) carries only 2.50 bits of information. Knowing that someone is from Vatican City (population 800) carries 23.16 bits of information.
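The two quantities in play here, the information content of a single value and the entropy of the whole distribution, take only a few lines of Python to compute. The sketch below uses a made-up table of four populations chosen so the arithmetic is easy to follow. It does not use the Wikipedia data, and the function names are my own.

```python
from math import log2

def information_content(count, total):
    """Bits learned on finding that someone belongs to a group of `count` out of `total`."""
    return -log2(count / total)

def entropy(counts):
    """Expected information content, in bits, of a categorical variable."""
    counts = list(counts)
    total = sum(counts)
    return sum((c / total) * information_content(c, total) for c in counts if c > 0)

# Hypothetical populations, not real data
populations = {"A": 4_000_000, "B": 2_000_000, "C": 1_000_000, "D": 1_000_000}
total = sum(populations.values())

bits_for_A = information_content(populations["A"], total)  # A is half the total
bits_for_D = information_content(populations["D"], total)  # D is one eighth of the total
H = entropy(populations.values())
```

With these toy numbers, learning that someone is from A is worth 1 bit, learning that someone is from D is worth 3 bits, and the entropy is 1.75 bits. Small groups carry more information, but they contribute little to the average because they are rare.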

One way to reduce the re-identification risk of PII (personally identifiable information) such as nationality is to combine small categories. Suppose we lump all countries with a population under one million into “other.” Then we go from 240 categories down to 160. This hardly makes any difference to the entropy: it drops from 5.26 bits to 5.25 bits. But the information content for the smallest country on the list is now 8.80 bits rather than 23.16.
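The lumping step can be sketched in Python as follows. The table is a toy example with invented counts, not the Wikipedia data, and lump_small and max_information are hypothetical helper names. The point is only to show how merging small categories lowers the worst-case information content.

```python
from math import log2

def lump_small(populations, threshold, label="other"):
    """Merge every category with a count below threshold into one category."""
    lumped = {}
    for name, count in populations.items():
        key = name if count >= threshold else label
        lumped[key] = lumped.get(key, 0) + count
    return lumped

def max_information(populations):
    """Information content, in bits, of the rarest category."""
    total = sum(populations.values())
    return max(-log2(count / total) for count in populations.values())

# Hypothetical counts, chosen so the total is a power of two
toy = {"A": 512, "B": 256, "C": 224, "D": 24, "E": 8}
lumped = lump_small(toy, threshold=100)

before = max_information(toy)     # driven by E, 8 out of 1024
after = max_information(lumped)   # driven by "other", 32 out of 1024
```

In this toy table the rarest category goes from 7 bits to 5 bits once D and E are combined, while the large categories are untouched.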

Information content of religion

What about religion? This is also subtle to define, but again I’ll use Wikipedia for my data. Using these numbers, we get an entropy of 2.65 bits. The largest religion, Christianity, has an information content of 1.67 bits. The smallest religion on the list, Rastafari, has an information content of 13.29 bits.

Joint information content

So if nationality carries 5.25 bits of information and religion 2.65 bits, how much information does the combination of nationality and religion carry? At least 5.25 bits, but no more than 5.25 + 2.65 = 7.9 bits on average. For two random variables X and Y, the joint entropy H(X, Y) satisfies

max( H(X), H(Y) ) ≤ H(X, Y) ≤ H(X) + H(Y)

where H(X) and H(Y) are the entropy of X and Y respectively.
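The inequality is easy to check numerically. The following Python sketch builds a small joint distribution with made-up probabilities, computes the two marginal entropies and the joint entropy, and leaves the three numbers in H_X, H_Y, and H_XY for comparison. Nothing here comes from the nationality or religion data.

```python
from math import log2

def H(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def marginal(joint, axis):
    """Marginal distribution of one coordinate of a joint distribution."""
    m = {}
    for key, p in joint.items():
        m[key[axis]] = m.get(key[axis], 0.0) + p
    return m

# Hypothetical joint distribution of two correlated binary variables
joint = {
    ("x1", "y1"): 0.4, ("x1", "y2"): 0.1,
    ("x2", "y1"): 0.1, ("x2", "y2"): 0.4,
}

H_X = H(marginal(joint, 0).values())
H_Y = H(marginal(joint, 1).values())
H_XY = H(joint.values())
```

Each marginal here is a fair coin, so H_X and H_Y are 1 bit each, and H_XY comes out to roughly 1.72 bits. That is more than either variable alone but less than the 2 bits you would get if X and Y were independent.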

Computing the joint entropy exactly would require getting into the joint distribution of nationality and religion. I’d rather not get into this calculation in detail, except to discuss possible toxic pairs. On average, the information content of the combination of nationality and religion is no more than the sum of the information content of each separately. But particular combinations can be highly informative.

For example, there are not a lot of Jews living in predominantly Muslim countries. According to one source, there are at least five Jews living in Iraq. Other sources put the estimate as “less than 10.” (There are zero Jews living in Libya.)

Knowing that someone is a Christian living in Mexico, for example, would not be highly informative. But knowing someone is a Jew living in Iraq would be extremely informative.
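Here is a small Python sketch of a toxic pair using an invented table of counts. The labels N1, N2, R1, and R2 are placeholders for two nationalities and two religions, and the numbers are not real data. The sketch compares the information content of each attribute alone with that of a rare combination.

```python
from math import log2

# Hypothetical counts of people by (nationality, religion)
counts = {
    ("N1", "R1"): 900, ("N1", "R2"): 99,
    ("N2", "R1"): 1,   ("N2", "R2"): 9000,
}
total = sum(counts.values())

def bits(n):
    """Information content, in bits, of belonging to a group of n people."""
    return log2(total / n)

def marginal_count(axis, value):
    """Number of people with the given nationality (axis 0) or religion (axis 1)."""
    return sum(c for key, c in counts.items() if key[axis] == value)

nationality_bits = bits(marginal_count(0, "N2"))  # N2 is the common nationality
religion_bits = bits(marginal_count(1, "R1"))     # R1 is a minority religion overall
pair_bits = bits(counts[("N2", "R1")])            # but R1 is almost absent within N2
```

In this toy table, nationality N2 alone carries a fraction of a bit and religion R1 alone carries a few bits, but the combination carries more than 13 bits, because it singles out one person in ten thousand. The average bound on joint entropy says nothing about how informative an individual pair can be.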


Why don’t you simply use XeTeX?

From an FAQ post I wrote a few years ago:

This may seem like an odd question, but it’s actually one I get very often. On my TeXtip Twitter account, I include tips on how to create non-English characters such as using \AA to produce Å. Every time, someone will ask “Why not use XeTeX and just enter these characters?”

If you can “just enter” non-English characters, then you don’t need a tip. But a lot of people either don’t know how to do this or don’t have a convenient way to do so. Most English speakers only need to type foreign characters occasionally, and will find it easier, for example, to type \AA or \ss than to learn how to produce Å or ß from a keyboard. If you frequently need to enter Unicode characters, and know how to do so, then XeTeX is great.

One does not simply type Unicode characters.


Team dynamics and encouragement

When you add people to a project, the total productivity of the team as a whole may go up, but the productivity per person usually goes down. Someone suggested that as a rule of thumb, a company needs to triple its number of employees to double its productivity. Fred Brooks summarized this by saying

“Many hands make light work” — Often
But many hands make more work — Always

I’ve seen this over and over. But I think I’ve found an exception. When work is overwhelming, a lot of time is absorbed by discouragement and indecision. In that case, new people can make a big improvement. They not only get work done, but they can make others feel more like working.

Flood cleanup is like that, and that’s what motivated this note. Someone new coming by to help energizes everyone else. And with more people, you see progress sooner and make more progress, in a sort of positive feedback loop.

This is all in the context of fairly small teams. There must be a point where adding more people decreases productivity per person or even total productivity. I’ve heard reports of a highly bureaucratic relief organization that makes things worse when they show up to “help.” The ideal team size is somewhere between a couple of discouraged individuals and a bloated bureaucracy.

Related post: Optimal team size

Relearning from a new perspective

I had a conversation with someone today who said he’s relearning logic from a categorical perspective. What struck me about this was not the specifics but the pattern:

Relearning _______ from a _______ perspective.

Not relearning something forgotten, but going back over something you already know well, but from a different starting point, a different approach, etc.

Have any experiences along these lines you’d like to share in the comments? Anything you have relearned, attempted to relearn, or would like to relearn from a new angle?

Hurricane Harvey update

As you may know, I live in the darkest region of the rainfall map below.

Hurricane Harvey rainfall map

My family and I are doing fine. Our house has not flooded, and at this point it looks like it will not flood. We’ve only lost electricity for a second or two.

Of course not everyone in Houston is doing so well. Harvey has done tremendous damage. Downtown was hit especially hard, and apparently they are in for more heavy rain. But it looks like the worst may be over for my area.

Update (5:30 AM, August 28): More flooding overnight, some of it nearby. We’re still OK. It looks like the heaviest rain is over, but there’s still rain in the forecast and there’s no place for more rain to go.

Houston has two enormous reservoirs west of town that together hold about half a billion cubic meters of water. This morning they started releasing water from the reservoirs to prevent dams from breaking.

Space City Weather has been the best source of information. The site offers “hype-free forecasts for greater Houston.” It’s a shame that a news source should have to describe itself as “hype-free,” but they are indeed hype-free and other sources are not.

Update (August 29): Looks like the heavy rain is over. We’re expecting rain for a few more days, but the water is receding faster than it’s collecting, at least on the northwest side.