The US Food and Drug Administration tracks drugs using an identifier called the NDC or National Drug Code. It is described as a 10-digit code, but it may be more helpful to think of it as a 12-character code.
An NDC contains 10 digits, separated into three segments by two dashes. The three segments are the labeler code, product code, and package code. The FDA assigns the labeler codes to companies, and each company assigns its own product and package codes.
Format
The segments are of variable length and so the dashes are significant. The labeler code could be 4 or 5 digits. The product code could be 3 or 4 digits, and the package code could be 1 or 2 digits. The total number of digits must be 10, so there are three possible combinations:
- 4-4-2
- 5-3-2
- 5-4-1.
There’s no way to look at just the digits and know how to separate them into three segments. My previous post looked at self-punctuating codes. The digits of NDC codes are not self-punctuating because they require the dashes. The digit combinations are supposed to be unique, but you can’t tell how to parse a set of digits from the digits alone.
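To make the ambiguity concrete, here is a minimal Python sketch that lists every documented way to punctuate a string of 10 digits. The function name `possible_splits` and the digit string `1234567890` are made up for illustration and are not taken from the FDA data.

```python
# The three documented (labeler, product, package) length combinations.
LAYOUTS = [(4, 4, 2), (5, 3, 2), (5, 4, 1)]

def possible_splits(digits: str) -> list[str]:
    """Return every documented way to insert dashes into 10 digits."""
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError("expected exactly 10 digits")
    results = []
    for labeler_len, product_len, _package_len in LAYOUTS:
        cut = labeler_len + product_len
        results.append(
            f"{digits[:labeler_len]}-{digits[labeler_len:cut]}-{digits[cut:]}"
        )
    return results

# A hypothetical digit string, not a real NDC.
print(possible_splits("1234567890"))
# ['1234-5678-90', '12345-678-90', '12345-6789-0']
```

All three results are syntactically valid, which is why the digits alone cannot tell you which one was intended.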
Statistics
I downloaded the NDC data from the FDA to verify whether the codes work as documented, and to see the relative frequency of various formats.
(The data change daily, so you may get different results if you do try this yourself.)
Format
All the codes were 12 characters long, and all had the documented format as verified by the regular expression [1]
\d{4,5}-\d{3,4}-\d{1,2}
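The same check can be sketched in Python. Note that the regular expression by itself would also accept, say, a 4-3-1 or 5-4-2 pattern, so the length test matters: only 4-4-2, 5-3-2, and 5-4-1 match the pattern and come to exactly 12 characters. The function name `is_valid_ndc` is my own, and the `re.ASCII` flag is an added assumption so that `\d` matches only the digits 0 through 9.

```python
import re

# Same pattern as above; re.ASCII restricts \d to the digits 0-9.
NDC_PATTERN = re.compile(r"\d{4,5}-\d{3,4}-\d{1,2}", re.ASCII)

def is_valid_ndc(code: str) -> bool:
    """True if code matches the pattern and has 10 digits plus 2 dashes."""
    return len(code) == 12 and NDC_PATTERN.fullmatch(code) is not None

print(is_valid_ndc("29500-907-77"))   # True: 5-3-2
print(is_valid_ndc("12345-6789-01"))  # False: 11 digits
```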
Uniqueness exception
I found one exception to the rule that the sequence of digits should be unique. The command
sed "s/-//g" ndc.txt | sort | uniq -d
returned 2950090777.
The set of NDC codes contained both 29500-907-77 and 29500-9077-7.
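For anyone who would rather do this in Python than with sed, here is a rough equivalent of the pipeline above. It assumes the codes have already been read into a list, one dashed code per entry; the function name `duplicate_digit_strings` is made up for this sketch.

```python
from collections import Counter

def duplicate_digit_strings(codes):
    """Return the digit strings that occur more than once after removing dashes."""
    counts = Counter(code.replace("-", "") for code in codes)
    return sorted(digits for digits, n in counts.items() if n > 1)

# The two colliding codes from the data, plus one made-up code for contrast.
sample = ["29500-907-77", "29500-9077-7", "1234-5678-90"]
print(duplicate_digit_strings(sample))
# ['2950090777']
```

Like `uniq -d`, this reports a digit string once no matter how many times it repeats.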
Distribution
About 60% of the codes had the form 5-3-2. About 30% had the form 5-4-1, and the remaining 10% had the form 4-4-2.
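One way to tally the formats, sketched in Python under the assumption that the codes are available as a list of dashed strings. The names `layout` and `layout_shares` are mine, and the sample list is only illustrative, so its proportions say nothing about the real data.

```python
from collections import Counter

def layout(code: str) -> str:
    """Describe a dashed code by its segment lengths, e.g. '5-3-2'."""
    return "-".join(str(len(segment)) for segment in code.split("-"))

def layout_shares(codes):
    """Map each layout to the fraction of codes that use it."""
    counts = Counter(layout(code) for code in codes)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

# Illustrative input only.
sample = ["29500-907-77", "29500-9077-7", "1234-5678-90", "12345-678-90"]
print(layout_shares(sample))
```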
There were a total of 252,355 NDC codes with 6,532 different labelers (companies).
There were 9,448 NDC codes associated with the most prolific labeler. The 1,424 least prolific labelers had only one NDC code each. In Pareto-like fashion, the top 20% of labelers accounted for about 90% of the codes.
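The labeler statistics can be computed along these lines. This is a sketch, not the exact procedure behind the numbers above: it takes the labeler to be everything before the first dash, and it rounds the "top 20%" down to a whole number of labelers, which is my own choice. The names `labeler_counts` and `top_share` are hypothetical.

```python
from collections import Counter

def labeler_counts(codes):
    """Count codes per labeler, taking the labeler as the first segment."""
    return Counter(code.split("-")[0] for code in codes)

def top_share(codes, fraction=0.2):
    """Fraction of all codes belonging to the top `fraction` of labelers."""
    counts = sorted(labeler_counts(codes).values(), reverse=True)
    if not counts:
        return 0.0
    # Round down, but always include at least one labeler (an assumption).
    k = max(1, int(fraction * len(counts)))
    return sum(counts[:k]) / sum(counts)
```

With the codes in a list called `codes`, `labeler_counts(codes).most_common(1)` gives the most prolific labeler, and `sum(1 for n in labeler_counts(codes).values() if n == 1)` counts the labelers with a single code.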
[1] Programming languages like Python or Perl will recognize this regular expression, but by default grep does not support \d for digits. The GNU implementation of grep with the -P option will. It will also understand notation like {4,5} to mean a pattern is repeated 4 or 5 times, with or without -P, but I don’t think other implementations of grep necessarily will.
I was a member of ncpdp.org, who pretty much wrote the book on the use of NDC for billing prescription claims in the US. Some of the wonks on the committee that should know such things believe that we are dangerously close to running out of NDC codes and some other system will need to be used or developed. Part of the reason is that the product identifier can be used, and is used for drug manufacturing intermediary products that would never be billed as a final drug product. In other words, NDC was probably a poor choice for billing from the get-go.
The reason it’s being raised as an issue now is that HIPAA rules promulgated and enforced by CMS limit the types of codes that can be transmitted in a billing. UPC, for example, would not be kosher. It will take something like eight or ten years to get everyone on the same page because CMS will need to adopt a new standard identifier that is allowed in a claim.
The second sentence under the first “Format” heading had “product code” twice, instead of “product code” and “package code”. (Thanks for the posts! They are worth reading.)
So, that’s what those numbers are about. Thanks.
If you’ve ever wondered about the names:
“Wonder where generic drug names come from? Two women in Chicago, that’s where”
https://www.latimes.com/business/lazarus/la-fi-lazarus-drug-names-20190719-story.html