DICOM image data

[Image: X-ray of hand and arm]

The previous post discussed EXIF data embedded in a digital photo. DICOM files are the medical analog: images with metadata embedded alongside the pixels.

You can think of a DICOM image as a JPEG with medical metadata. Strictly speaking a DICOM file is a sort of database, and one of the fields in the database contains the pixels. The pixels are usually stored in JPEG format or some variation thereof, but they don’t have to be.
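To make the "database with a pixel field" picture concrete, here is a minimal sketch in Python. It assumes the Explicit VR Little Endian encoding and handles only a flat run of elements with defined lengths. It skips the preamble and header that a real DICOM file begins with, and it knows only a subset of the value representations that use a 4-byte length. The function names and sample values are invented for illustration. For real files you would use a DICOM library rather than this toy.

```python
import struct

# Value representations (VRs) that carry two reserved bytes and a 4-byte
# length in Explicit VR Little Endian. This is a subset, enough for the sketch.
LONG_VRS = {b"OB", b"OW", b"OF", b"SQ", b"UT", b"UN"}


def encode_element(group, element, vr, value):
    """Encode one data element: tag, VR, length, value.

    Hypothetical helper for this example. Values are assumed to have
    even length already, so no padding is added.
    """
    header = struct.pack("<HH", group, element) + vr
    if vr in LONG_VRS:
        header += b"\x00\x00" + struct.pack("<I", len(value))
    else:
        header += struct.pack("<H", len(value))
    return header + value


def parse_elements(data):
    """Return {(group, element): (vr, value)} for a flat run of elements."""
    fields = {}
    pos = 0
    while pos < len(data):
        group, element = struct.unpack_from("<HH", data, pos)
        vr = data[pos + 4:pos + 6]
        if vr in LONG_VRS:
            # 2 reserved bytes, then a 4-byte length
            (length,) = struct.unpack_from("<I", data, pos + 8)
            start = pos + 12
        else:
            # 2-byte length immediately after the VR
            (length,) = struct.unpack_from("<H", data, pos + 6)
            start = pos + 8
        fields[(group, element)] = (vr.decode("ascii"), data[start:start + length])
        pos = start + length
    return fields


# Three made-up fields: Modality, Patient's Name, and Pixel Data.
blob = (
    encode_element(0x0008, 0x0060, b"CS", b"CR")
    + encode_element(0x0010, 0x0010, b"PN", b"DOE^JANE")
    + encode_element(0x7FE0, 0x0010, b"OW", bytes([0, 1, 2, 3]))
)

fields = parse_elements(blob)
for (group, element), (vr, value) in fields.items():
    print(f"({group:04X},{element:04X}) {vr} {len(value)} bytes")
```

The pixels are just one more entry in the table, keyed by the tag (7FE0,0010). Everything else is metadata.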

DICOM stands for Digital Imaging and Communications in Medicine. It is a standard created by the ACR (American College of Radiology) and NEMA (National Electrical Manufacturers Association). DICOM uses other standards, such as JPEG, and is used by standards built on top of it, such as IHE and HL7.

Consumer photos may contain a lot of EXIF metadata, but DICOM images can contain even more. The DICOM standard is huge. It comes in 16 parts, and the data dictionary part alone is 274 pages. Pages 23 through 176 of the data dictionary consist of one long table defining possible DICOM data fields. That is 154 pages, so assuming an average of 30 fields per page, there are about 4,600 data fields. To make matters worse, or rather more flexible, many of these fields can contain arbitrarily long text strings. Well, not entirely arbitrary: fields must be less than 2^32 characters. Moby Dick is about 2^20 characters, so a 2^32 character limit is essentially no practical limit.
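The back-of-the-envelope numbers are easy to check. The snippet below is only a sketch of that arithmetic, reading the length limit as 2^32 characters and the size of Moby Dick as roughly 2^20 characters. The variable names are mine.

```python
# Field-count estimate from the page range of the data dictionary table.
first_page, last_page = 23, 176
pages = last_page - first_page + 1       # inclusive page count
fields_per_page = 30                     # assumed average
estimated_fields = pages * fields_per_page

# Length limit compared with a long novel, both as powers of two.
limit = 2**32
moby_dick = 2**20                        # rough size in characters
copies = limit // moby_dick

print(pages, estimated_fields, copies)   # prints: 154 4620 4096
```

So a single field could hold about four thousand copies of Moby Dick before hitting the limit.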

I have had numerous clients send me a description of their data in a small Excel file. Then they’ll say “and we have some images” meaning DICOM images. Maybe the Excel file contains a few dozen fields, but then the DICOM images potentially contain thousands of fields. The Excel file is burying the lede: the vast majority of the data (potentially) is in the DICOM images.

The enormous number of fields, and the lack of much structure to these fields, is a widely recognized problem. According to Mustra et al. [1],

A major disadvantage of the DICOM Standard is the possibility for entering probably too many optional fields. This disadvantage is mostly showing in inconsistency of filling all the fields with the data. Some image objects are often incomplete because some fields are left blank and some are filled with incorrect data.

This is quite an understatement, saying that over four thousand fields is “probably too many optional fields.”

[1] Mario Mustra, Kresimir Delac, Mislav Grgic. Overview of the DICOM Standard. 50th International Symposium ELMAR-2008, 10-12 September 2008, Zadar, Croatia.

Photo by Cara Shelton on Unsplash