It’s surprisingly hard to find a complete list of HTML entities in the form of a data file. There are numerous sites that give lists, often incomplete, in a page formatted to be human-readable but not machine-readable.
Here’s an XML file from the W3C.
Here’s a two-column text file I created from the W3C data.
The W3C is a (maybe “the”?) authoritative source for this info, so it seems pretty straightforward to get it right from them.
A while back I wrote an RSS feed generation library. Named HTML entities (apart from the five that XML predefines) are generally not valid XML, so part of what the library does is convert any of those to their numeric equivalents. Here’s the part that downloads the data from the W3C and converts it to a hash table: https://github.com/otherjoel/splitflap/blob/main/splitflap-lib/private/build.rkt#L23
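For anyone who wants to see the idea without reading the Racket, here is a rough Python sketch of the same kind of conversion. It is not the splitflap code: it assumes Python's built-in `html.entities.html5` table as a stand-in for the downloaded W3C data, and the names `to_numeric` and `XML_PREDEFINED` are made up for this example.

```python
import re
from html.entities import html5  # stdlib table: "eacute;" -> "é", etc.

# The five entities XML predefines; these can stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Only semicolon-terminated named references are handled in this sketch.
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric(text):
    """Replace named HTML entities with numeric character references."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        chars = html5.get(name + ";")
        if chars is None:
            # Unknown name: leave it untouched rather than guess.
            return match.group(0)
        # Some entities expand to more than one code point.
        return "".join(f"&#{ord(c)};" for c in chars)

    return _ENTITY_RE.sub(replace, text)


print(to_numeric("caf&eacute; &amp; bar&nbsp;&mdash;"))
```

Swapping the stdlib table for one built from the W3C files should only change where the lookup dictionary comes from.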
I agree about the W3C being authoritative, but even finding this on their site was not easy.
https://github.com/w3c/xml-entities/blob/gh-pages/characters.xsl
David Carlisle did a lot of work on entities.
There may be more on the W3C site.
The official source is
https://w3c.github.io/xml-entities/
which has links to the HTML set and the full set (which includes isogrk[1,2,4])
as both JSON and XML DTD declarations.
E.g., the XML declarations matching the entities in HTML and MathML are
https://www.w3.org/2003/entities/2007/htmlmathml-f.ent
(or the same on GitHub)
https://github.com/w3c/xml-entities/blob/gh-pages/docs/2007/htmlmathml.json
being the JSON file, which is in the GitHub repository that you linked to originally.
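If it helps, here is a minimal Python sketch of turning DTD-style entity declarations into a two-column listing like the text file mentioned earlier in the thread. The `SAMPLE` lines are hand-written in standard XML `<!ENTITY name "value">` syntax for illustration, not copied from the W3C file, and `parse_entities` and `two_column` are hypothetical helper names. It assumes each value is a run of hexadecimal character references; a real file may contain declarations (such as the one for `amp`) that need extra handling.

```python
import re

# Hand-written sample declarations (an assumption about the general shape).
SAMPLE = """
<!ENTITY eacute "&#x000E9;" >
<!ENTITY nbsp "&#x000A0;" >
<!ENTITY NotEqualTilde "&#x02242;&#x00338;" >
"""

DECL_RE = re.compile(r'<!ENTITY\s+(\S+)\s+"([^"]*)"\s*>')
CHARREF_RE = re.compile(r"&#x([0-9A-Fa-f]+);")


def parse_entities(dtd_text):
    """Map each entity name to its list of code points."""
    table = {}
    for name, value in DECL_RE.findall(dtd_text):
        codepoints = [int(h, 16) for h in CHARREF_RE.findall(value)]
        if codepoints:  # skip values this sketch does not understand
            table[name] = codepoints
    return table


def two_column(table):
    """Render the table as tab-separated 'name<TAB>code points' lines."""
    lines = []
    for name in sorted(table):
        cps = " ".join("U+%04X" % cp for cp in table[name])
        lines.append(name + "\t" + cps)
    return "\n".join(lines)


print(two_column(parse_entities(SAMPLE)))
```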