Liminal and subliminal

It occurred to me for the first time this morning that the words liminal and subliminal must be related, just after reading an article by Vicki Boykis that discusses liminal spaces.

I hear the two words in such different contexts (architecture versus psychology) that I hadn’t thought about the connection until now. If I were playing a word association game, my responses would be these.

Q: Liminal?

A: Spaces.

Q: Subliminal?

A: Message.


I checked Etymonline to verify that the two words are indeed cognate. Both come from the Latin word limen for threshold. Something is subliminal if it is below the threshold, typically the threshold of consciousness.


Surely the word subliminal is far more common than liminal. To verify this, I turned to Google’s Ngram Viewer. I’ve included a screenshot below, and you can find the original here.

Ngram of liminal vs subliminal

It’s not surprising that subliminal was a popular term during the career of Sigmund Freud. He published The Interpretation of Dreams in 1899 and died in 1939.

What is surprising, at least to me, is that the word liminal has been gaining popularity and passed subliminal around the turn of the century. I didn’t expect liminal to be anywhere near as common as subliminal.


Google’s Ngram data comes from books. Word frequencies in books can be very different than word frequencies in common speech or other writing, as this example shows. I can’t recall ever hearing someone use liminal in conversation. Maybe civil engineers and architects hear it all the time. As I type this, my spell checker puts a red squiggly line under every instance of liminal, showing that the word is not in its default dictionary, though it does recognize subliminal.

On this day

This morning as a sort of experiment I decided to look back at all my blog posts written on May 30 each year. There’s nothing special about this date, so I thought it might give an eclectic cross section of things I’ve written about.


Last year on this day I wrote about Calendars and continued fractions, based on a connection between the two topics I found in the book Calendrical Calculations.


Two years ago on this day I wrote about Color theory questions. I’ve been interested in color theory off and on for a while. At one point I thought I might “get to the bottom” of it and figure everything out to my satisfaction. I’ve since decided that color theory is a bottomless well: there’s no getting to the bottom of it. I might pick it back up some day with the more modest goal of learning a little more than I currently know.


I didn’t write a post on May 30 in 2014, 2015, or 2016.


On this day in 2013, I wrote a riff on a quote from Matt Briggs to the effect that there are no outliers, only measurements that don’t fit with your theory.


In 2012 on this day I posted Writing software for someone else. Most of what I’ve read about software development does not make the distinction between writing software for yourself and writing software for someone else, or at least does not emphasize the distinction.

When computer science students become professional programmers, they have to learn empathy. Or at least ideally they learn empathy. They go from completing homework assignments to writing programs that other people will work on and that other people will use. They learn “best practices,” best in this new context.

I made the opposite transition a few months after writing that post when I left MD Anderson Cancer Center to go out on my own. It took a while for me to decide what works best for me, mostly writing software for my own use. Sometimes I deliver software to clients, but more often I deliver reports that require me to write software that the client isn’t directly interested in.


My post for May 30, 2011 was just a quote from Richard Feynman speculating that in the long run, the development of Maxwell’s equations will be seen as the most important event of the 19th century.


In 2010 on this day I posted a quote from Paul Buchheit about the effect of suddenly acquiring wealth. For most people it would not be a good thing.


The post for May 30 in 2009 was called Killing too much of a tumor. You can actually make a tumor more harmful by killing off portions that were suppressing its growth. Reminds me now of how in war you want to leave enough of the enemy’s command intact that they have the ability to surrender.


Finally, on this day in 2008 I announced that I’d started a web site at reproducibleresearch.org. I later gave the URL to people who had started a similar site with the same name, but ending in .net.

Promoting reproducible research seemed like a somewhat quixotic project at the time, but fortunately it has gained traction since then.


Is there a common theme in these posts? They are all about things that interest me, but that’s necessarily the case since they’re on my blog. One thing that surprises me is that the posts are not particularly mathematical. I would have expected that a quasi-random sample of posts would have turned up more math. But I did write about cancer and software development more when I worked in a cancer center and managed software developers.

Unifiers and Diversifiers

I saw a couple tweets this morning quoting Freeman Dyson’s book Infinite in All Directions.

Unifiers are people whose driving passion is to find general principles which will explain everything. They are happy if they can leave the universe looking a little simpler than they found it.

Diversifiers are people whose passion is to explore details. They are in love with the heterogeneity of nature … They are happy if they leave the universe a little more complicated than they found it.

Presumably these categories correspond to what Dyson elsewhere calls birds and frogs, or what others call hedgehogs and foxes. I imagine everyone takes pleasure in both unification and diversification, though in different proportions. Some are closer to one end of the spectrum than the other.

The scientific heroes presented to children are nearly always unifiers like Newton or Einstein [1]. You don’t see as many books celebrating, for example, a biologist who discovered that what was thought to be one species is really 37 different species. This creates an unrealistic picture of science since not many people discover grand unifying principles, though more find unifying principles on a small scale. I imagine many are discouraged from a career in science because they believe they have to be a unifier / bird / hedgehog, when in fact there are more openings for a diversifier / frog / fox.

Dyson may be taking a subtle swipe at unifiers by saying they want to leave the world looking a little simpler than they found it. There may be an unspoken accusation that unifiers create the illusion of unity by papering over diversity. True and significant unifying theories like general relativity are hard to come by. It’s much easier to come up with unifying theories that are incomplete or trivial.

[1] Or at least scientists best known for their unifying work. Newton, for example, wasn’t entirely a unifier, but he’s best known for discovering unifying principles of gravity and motion.

Internet privacy as seen from 1975

Science fiction authors set stories in the future, but they don’t necessarily try to predict the future, and so it’s a little odd to talk about what they “got right.” Getting something right implies they were making a prediction rather than imagining a setting of a story.

However, sometimes SF authors do indeed try to predict the future. This seems to have been at least somewhat the case with John Brunner and his 1975 novel The Shockwave Rider because he cites futurist Alvin Toffler in his acknowledgement.

The Shockwave Rider derives in large part from Alvin Toffler’s stimulating study Future Shock, and in consequence I’m much obliged to him.

In light of Brunner’s hat tip to Toffler, I think it’s fair to talk about what he got right, or possibly what Toffler got right. Here’s a paragraph from the dust jacket that seemed prescient.

Webbed in a continental data-net that year by year draws tighter as more and still more information is fed to it, most people are apathetic, frightened, resigned to what ultimately will be a total abolishment of individual privacy. A whole new reason has been invented for paranoia: it is beyond doubt — whoever you are! — that someone, somewhere, knows something about you that you wanted to keep a secret … and you stand no chance of learning what it is.

Impossible to misunderstand

“The goal is not to be possible to understand, but impossible to misunderstand.”

I saw this quote at the beginning of a math book when I was a student and it stuck with me. I would think of it when grading exams. Students often assume it is enough to be possible to understand, possible for an infinitely patient and resourceful reader to reverse engineer the thought process behind a stream of consciousness.

The quote is an aphorism, and so not intended to be taken literally, but I’d like to take the last part literally for a moment. I think the quote would be better advice if it said “unlikely to misunderstand.” This ruins the parallelism and the aesthetics of the quote, but it gets to an important point: trying to be impossible to misunderstand leads to bad writing. It’s appropriate when writing for computers, but not when writing for people.

Trying to please too wide and too critical an audience leads to defensive, colorless writing.

You’ll never use an allusion for fear that someone won’t catch it.

You’ll never use hyperbole for fear that some hyper-literalist will object.

You’ll never leave a qualification implicit for fear that someone will pounce on it.

Social media discourages humor, at least subtle humor. If you say something subtle, you may bring a smile to 10% of your audience, and annoy 0.1%. The former are much less likely to send feedback. And if you have a large enough audience, the feedback of the annoyed 0.1% becomes voluminous.

Much has been said about social media driving us to become partisan and vicious, and certainly that happens. But not enough has been said about an opposite effect that also happens, driving us to become timid and humorless.

Fascination burnout

Here’s a little dialog from Anathem by Neal Stephenson that I can relate to:

“… I don’t care …”

Arsibalt was horrified. “But how can you not be fascinated by—”

“I am fascinated,” I insisted. “That’s the problem. I’m suffering from fascination burnout. Of all the things that are fascinating, I have to choose just one or two.”

Regular expression for ICD-9 and ICD-10 codes

Suppose you’re searching for medical diagnosis codes in the middle of free text. One way to go about this would be to search for each of the roughly 14,000 ICD-9 codes and each of the roughly 70,000 ICD-10 codes. A simpler approach would be to use regular expressions, though that may not be as precise.

In practice regular expressions may have some false positives or false negatives. The expressions given here have only false positives. That is, no valid ICD-9 or ICD-10 codes will go unmatched, but the regular expressions may match things that are not diagnosis codes. The latter is inevitable anyway since a string of characters could coincide with a diagnosis code but not be used as a diagnosis code. For example 1234 is a valid ICD-9 code, but 1234 in a document could refer to other things, such as a street address.

ICD-9 diagnosis code format

Most ICD-9 diagnosis codes are just numbers, but they may also start with E or V.

Numeric ICD-9 codes are at least three digits. Optionally there may be a decimal followed by one or two more digits.

An E code begins with E and three digits. These may be followed by a decimal and one more digit.

A V code begins with a V followed by two digits. These may be followed by a decimal and one or two more digits.

Sometimes the decimals are left out.

Here are regular expressions that summarize the discussion above.

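    # Raw strings keep Python from treating \d and \. as string escapes
    N = r"\d{3}\.?\d{0,2}"   # numeric codes
    E = r"E\d{3}\.?\d?"      # E codes
    V = r"V\d{2}\.?\d{0,2}"  # V codes
    icd9_regex = "|".join([N, E, V])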

Usually E and V are capitalized, but they don’t have to be, so it would be best to do a case-insensitive match.
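Here is a minimal sketch of what a case-insensitive search might look like in Python. The sample text and the names icd9_pattern and found are made up for illustration; only the three sub-patterns come from the discussion above.

```python
import re

# ICD-9 patterns as described above, written as raw strings
N = r"\d{3}\.?\d{0,2}"   # numeric codes
E = r"E\d{3}\.?\d?"      # E codes
V = r"V\d{2}\.?\d{0,2}"  # V codes
icd9_regex = "|".join([N, E, V])

# Compile once; IGNORECASE lets e812.0 and v10.5 match as well
icd9_pattern = re.compile(icd9_regex, re.IGNORECASE)

# Hypothetical sample text, invented for this example
text = "Dx 250.00 noted; also e812.0 and a history of v10.5 last year"

found = icd9_pattern.findall(text)
print(found)  # ['250.00', 'e812.0', 'v10.5']
```

Since the patterns are deliberately permissive, a search like this will also pick up any other run of three or more digits, such as a year or a street address.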

ICD-10 diagnosis code format

ICD-10 diagnosis codes always begin with a letter (except U) followed by a digit. The third character is usually a digit, but could be an A or B [1]. After the first three characters, there may be a decimal point, and up to four more alphanumeric characters, as in V97.33XA. These alphanumeric characters are never U. Sometimes the decimal is left out.

So the following regular expression would match any ICD-10 diagnosis code.
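Written out from the description above, such a pattern might look like the following. This is a reconstruction from the prose, so treat it as a sketch rather than the definitive expression. It allows up to four trailing characters because ICD-10-CM codes run to seven characters in all. The sample text and the names icd10_pattern and found are made up for illustration.

```python
import re

# Reconstructed from the prose: a letter other than U, a digit,
# a digit or A or B, an optional decimal point, then up to four
# more letters or digits, none of them U.
icd10_regex = r"[A-TV-Z][0-9][0-9AB]\.?[0-9A-TV-Z]{0,4}"
icd10_pattern = re.compile(icd10_regex, re.IGNORECASE)

# Hypothetical sample text, invented for this example
text = "Coded as V97.33XA at intake, then y92146 on follow-up"

found = icd10_pattern.findall(text)
print(found)  # ['V97.33XA', 'y92146']
```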


As with ICD-9 codes, the letters are usually capitalized, but not always, so it’s best to do a case-insensitive search.

Testing the regular expressions

As mentioned at the beginning, the regular expressions here may have false positives. However, they don’t let any valid codes slip by. I downloaded lists of ICD-9 and ICD-10 codes from the CDC and tested to make sure the regular expressions here matched every code.
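A check along those lines could be sketched as follows. The function name unmatched and the short sample lists are hypothetical stand-ins; a real test would read the full code lists from the downloaded files, whose layout isn't described here. The ICD-10 pattern is a reconstruction from the format description, not necessarily the exact expression tested.

```python
import re

# ICD-9 pattern from the discussion above
icd9_regex = r"\d{3}\.?\d{0,2}|E\d{3}\.?\d?|V\d{2}\.?\d{0,2}"
# ICD-10 pattern reconstructed from the prose description (an assumption)
icd10_regex = r"[A-TV-Z][0-9][0-9AB]\.?[0-9A-TV-Z]{0,4}"

def unmatched(codes, regex):
    """Return the codes that the regex does not match in full."""
    pattern = re.compile(regex, re.IGNORECASE)
    return [c for c in codes if pattern.fullmatch(c.strip()) is None]

# Small stand-in lists of strings in the formats described,
# not an authoritative list of codes
icd9_sample = ["250.00", "1234", "E812.0", "V10.5"]
icd10_sample = ["V95.4", "V97.33XA", "V97.33XD", "Y92.146", "Y92.253"]

# An empty list means no code slipped by
print(unmatched(icd9_sample, icd9_regex))
print(unmatched(icd10_sample, icd10_regex))
```

Using fullmatch is a slightly stricter test than searching, since it requires the pattern to account for the entire code rather than just part of it.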

Regular expression features used

Character ranges are supported everywhere, such as [A-TV-Z] for the letters A through T and V through Z.

Not every regular expression implementation supports \d to represent a digit. In Emacs, for example, you would have to use [0-9] instead since it doesn’t support \d.

I’ve used \.? for an optional decimal point. (The . is a special character in regular expressions, so it needs to be escaped to represent a literal period.) Some people would write [.]? instead on the grounds that it may be more readable. (Periods are not special characters in the context of a character class.)

I’ve used {m} for a pattern that is repeated exactly m times, and {m,n} for a pattern that is repeated between m and n times. This is supported in Perl and Python, for example, but not everywhere. You could write \d\d\d instead of \d{3} and \d?\d? instead of \d{0,2}.
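To make the substitutions concrete, here is a small sketch comparing the numeric ICD-9 pattern with a spelled-out version that uses only character classes and ?. The names compact, portable, and samples are made up for illustration, and the sample strings are arbitrary.

```python
import re

# Numeric ICD-9 pattern using \d, \.? and {m,n}
compact = r"\d{3}\.?\d{0,2}"
# The same pattern using [0-9] for \d, [.]? for \.?, and repetition written out
portable = r"[0-9][0-9][0-9][.]?[0-9]?[0-9]?"

samples = ["250", "250.0", "250.00", "25000", "1234", "25", "250.000", "25a"]

# Record whether each sample is matched in full by each spelling
results = {
    s: (re.fullmatch(compact, s) is not None,
        re.fullmatch(portable, s) is not None)
    for s in samples
}

for s, (a, b) in results.items():
    print(f"{s!r:10} compact={a} portable={b}")
```

The two spellings agree on these samples. One difference worth knowing in Python: for text (str) patterns \d also matches digits from other scripts unless you pass re.ASCII, whereas [0-9] never does.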

[1] The only ICD-10 codes with a non-digit in the third position are those beginning with C4A, C7A, C7B, D3A, M1A, O9A, and Z3A.

Rare and strange ICD-10 codes

ICD-10 is a set of around 70,000 diagnosis codes. ICD stands for International Statistical Classification of Diseases and Related Health Problems. The verbosity of the name is foreshadowing.

Some of the ICD-10 codes are awfully specific, and bizarre.

For example,

  • V95.4: Unspecified spacecraft accident injuring occupant
  • V97.33XA: Sucked into jet engine, initial encounter
  • V97.33XD: Sucked into jet engine, subsequent encounter

As I understand it, V97.33XD refers to a subsequent encounter with a health care professional, not a subsequent encounter with a jet engine. But you have to wonder how many people who have been sucked into a jet engine survive to have one, much less two, medical visits.

There is a specific code, Y92.146, for injuries in a prison swimming pool. It seems strange to combine a medical diagnosis and its location into a single code. Is a swimming injury in a prison pool medically different than a swimming injury in a YMCA pool?

I understand that the circumstance of a diagnosis is not recorded strictly for medical reasons. But while 70,000 is an unwieldily large set of codes, it’s kinda small when it has to account for both malady and circumstance. Surely there are 70,000 circumstances alone that are more common than being in a spacecraft, for instance.

Is there a code for being at the opera? Why yes there is: Y92.253. However, there are no codes that are unique to being at a Costco, Walmart, or Jiffy Lube.

State privacy laws to watch

US map with states highlighted

A Massachusetts court ruled this week that obtaining real-time cell phone location data requires a warrant.

Utah has passed a law that goes into effect next month that goes further. Police in Utah will need a warrant to obtain location data or to search someone’s electronic files. (Surely electronic files are the contemporary equivalent of one’s “papers” under the Fourth Amendment.)

Vermont passed the nation’s first data broker law. It requires data brokers to register with the state and to implement security measures, but as far as I have read it doesn’t put much restriction on what they can do.

Texas law expands HIPAA’s notion of a “covered entity” so that it applies to basically anyone handling PHI (protected health information).

California’s CCPA law goes into effect on January 1. In some ways it’s analogous to GDPR. It will be interesting to see what the law ultimately means in practice. It’s expected that the state legislature will amend the law, and we’ll have to wait on precedents to find out in detail what the law prohibits and allows.

Update: Maine passed a bill May 30, 2019 that prohibits ISPs from selling browsing data without consent.

Maybe you shouldn’t script it after all

Programmers have an easier time scaling up than scaling down. You could call this foresight or over-engineering, depending on how things work out. Scaling is a matter of placing bets.

Experienced programmers are rightfully suspicious of claims that something only needs to be done once, or that quick-and-dirty will be OK [*]. They’ve been burned by claims that something was “temporary” and they have war stories in which they did more than required and were later vindicated. These stories make good blog posts.

But some things really do need to only be done once, or only so infrequently that it might as well be only once. And it might be OK for intermediate steps to be quick-and-dirty if the deliverable is high quality.

As a small business owner and a former/occasional programmer, I think about this often. For years I had a little voice in my head say “You really should script this.” And I have automated a few common tasks. But I’ve made peace with the fact that I often do things that (1) could be done far more elegantly and efficiently, and that (2) I will likely never do again [**].

[*] “People forget how fast you did a job, but they remember how well you did it.” — Howard Newton

[**] I include in “never do again” things I might do in the future, but far enough in the future that I won’t remember where I saved the script I wrote to do the task last time, if I saved it. Or I saved the script, was able to find it, but it depends on a library that has gone away. Or the task is just different enough that I’d need to practically rewrite the script. Or …