HTML entity data

It’s surprisingly hard to find a complete list of HTML entities in the form of a data file. Numerous sites give lists, often incomplete, on pages formatted for human readers rather than for machines.

Here’s an XML file from the W3C.

Here’s a two-column text file I created from the W3C data.
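
A two-column listing like this could be generated along the following lines. This is a minimal sketch, not the script used to make the file above: it reads Python's built-in `html.entities.html5` table rather than the W3C XML, and the tab-separated name/code-point layout and the function name `entity_rows` are assumptions made for illustration.

```python
from html.entities import html5  # standard-library table: "eacute;" -> "é"


def entity_rows():
    """Return one 'name<TAB>code points' string per named entity."""
    rows = []
    for name, chars in sorted(html5.items()):
        # The table also lists legacy names without a trailing semicolon
        # (e.g. "amp"); skip those so each entity appears once.
        if not name.endswith(";"):
            continue
        # A few entities expand to two code points, so join them all.
        codepoints = " ".join(f"U+{ord(c):04X}" for c in chars)
        rows.append(f"{name[:-1]}\t{codepoints}")
    return rows


if __name__ == "__main__":
    print("\n".join(entity_rows()))
```

Redirecting the output to a file gives one entity per line, which is easy to load from most languages.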

5 thoughts on “HTML entity data”

  1. The W3C is a (maybe “the”?) authoritative source for this info, so it seems pretty straightforward to get it directly from them.

    A while back I wrote an RSS feed generation library. Most named HTML entities are not valid in XML (XML predefines only `&amp;`, `&lt;`, `&gt;`, `&quot;`, and `&apos;`), so part of what the library does is convert any others to their numeric equivalents. Here’s the part that downloads the data from the W3C and converts it to a hash table: https://github.com/otherjoel/splitflap/blob/main/splitflap-lib/private/build.rkt#L23
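
The conversion described in this comment can be illustrated with a short Python sketch. It is not the linked Racket code: it uses Python's built-in `html.entities.html5` table instead of downloading the W3C data, and the function name `to_numeric` is hypothetical.

```python
import re
from html.entities import html5  # standard-library table: "eacute;" -> "é"

# The five entities XML predefines; these are already valid and are left alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named reference such as &eacute; (numeric references do not match).
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric(text):
    """Replace named HTML entities with hexadecimal numeric references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        chars = html5.get(name + ";")
        if chars is None:
            return match.group(0)  # unknown name: leave the text untouched
        # Emit one numeric reference per code point in the expansion.
        return "".join(f"&#x{ord(c):X};" for c in chars)

    return _ENTITY.sub(replace, text)
```

For example, `to_numeric("caf&eacute;")` gives `caf&#xE9;`, while `&amp;` and unrecognized names pass through unchanged.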
