HCPCS (“hick pics”) codes

HCPCS stands for Healthcare Common Procedure Coding System. HCPCS codes are commonly pronounced like “hick pics.” I was curious/afraid to see what DALL-E would create with the prompt “HCPCS hick pics” and was pleasantly surprised that it produced the image above. More on that latter.

Searching for medical codes

I occasionally need to search for medical codes—ICD-9 codes, ICD-10 codes, SNOMED codes, etc.—including HCPCS codes. Here’s some data that is supposed to contain a certain kind of code; does it? Or, more importantly, this text is supposed to have been scrubbed of medical codes; was it?

The most accurate way to search for HCPCS codes would be to have a complete list of codes and search for each one. There are a couple reasons why this isn’t so useful. First of all, I’m doing a quick scan, not writing data validation software. So a quick-and-dirty regular expression is fine. In fact, it’s better: I may be more interested in finding things that look like HCPCS codes than actual HCPCS codes. The former would include some typos or codes that were added after my script was written.


Another advantage of using a regular expression that approximates HCPCS codes is that HCPCS codes are copyrighted. I don’t know whether a script using a list of HCPCS codes would be fair use, but it doesn’t matter if I’m using a regular expression.

According to Wikipedia, “CPT is a registered trademark of the American Medical Association, and its largest single source of income.” CPT is part of HCPCS, namely Level I. It is controversial that CPT codes are a commercial product whose use is required by law.

Code patterns

There are three levels of HCPCS codes. Level I is synonymous with CPT and consists of five-digit numbers. It’s easy to search for five-digit numbers, say with the regular expression \d{5}, but of course many five-digit numbers are not medical codes. But if text that has supposedly been stripped of medical codes (and zip codes) has a lot of five-digit numbers, it could be something to look into.

Level II and Level III codes are five-character codes consisting of four digits and either an F or a T. So you might search on \d{4}[FT]. While a five-digit number is likely to be a false positive when searching for HCPCS codes, four-digits following by an F or T is more likely to be a hit.

About the image

I edited the image that DALL-E produced two ways. First, I cropped it vertically; the original image had a lot more grass at the bottom.

Second, I lowered the resolution. The images DALL-E generates are very large. Even after cutting off the lower half of the image, the remaining part was over a megabyte. I used Squoosh to reduce the image to 85 kb. Curiously, when I opened the squooshed output in GIMP and lowered the resolution further, the image actually got larger, so went back to the output from Squoosh.

The image looks more natural than anything I’ve made using DALL-E. I look at the image and think it would be nice to be there. But if you look at the barn closely, it has a little of that surreal quality that DALL-E images usually have. This is not a photograph or a painting.

This discussion of AI-generated images is a bit of a rabbit trail, but it does tie in with searching for medical codes. Both are a sort of forensic criticism: is this what it’s supposed to be or is something out of place? When looking for errors or fraud, you have to look at data similarly to the way you might analyze an image.

Related posts