Arabic numerals and numerals that are Arabic

The characters 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 are called Arabic numerals, but there are a lot of other numerals that are Arabic.

I discovered this when reading the documentation on Perl regular expressions, perlre. Here’s the excerpt from that page that caught my eye.

Many scripts have their own sets of digits equivalent to the Western 0 through 9 ones. A few, such as Arabic, have more than one set. For a string to be considered a script run, all digits in it must come from the same set of ten, as determined by the first digit encountered.

Emphasis added.

I took some code I’d written for previous posts on Unicode numbers and modified it to search the range of Arabic Unicode characters and report all characters that represent 0 through 9.

from unicodedata import numeric, name

a = set(range(0x00600, 0x006FF+1)) | \
    set(range(0x00750, 0x0077F+1)) | \
    set(range(0x008A0, 0x008FF+1)) | \
    set(range(0x00870, 0x0089F+1)) | \
    set(range(0x0FB50, 0x0FDFF+1)) | \
    set(range(0x0FE70, 0x0FEFF+1)) | \
    set(range(0x10EC0, 0x10EFF+1)) | \
    set(range(0x1EE00, 0x1EEFF+1)) | \
    set(range(0x1EC70, 0x1ECBF+1)) | \
    set(range(0x1ED00, 0x1ED4F+1)) | \
    set(range(0x10E60, 0x10E7F+1)) 

f = open('digits.txt','w',encoding='utf8')

def uni(i):
    return "U+" + format(i, "X")

for i in sorted(a):
    ch = chr(i)
    if ch.isnumeric() and numeric(ch) in range(10):
        print(ch, uni(i), numeric(ch), name(ch), file=f)

Apparently there are two ways to write 0, eight ways to write 2, and seven ways to write 1, 3, 4, 5, 6, 7, 8, and 9. I’ll include the full results at the bottom of the post.

I first wrote my Python script to write to the command line and redirected the output to a file. This resulted in some of the Arabic characters being replaced with a blank or with 0. Then I changed the script as above to write to a file opened to receive UTF-8 text. All the characters were preserved, though I can’t see most of them because the font my editor is using doesn’t have glyphs for the characters outside the BMP (i.e. those with Unicode values above 0xFFFF).

Related posts

٠ U+660 0.0 ARABIC-INDIC DIGIT ZERO
١ U+661 1.0 ARABIC-INDIC DIGIT ONE
٢ U+662 2.0 ARABIC-INDIC DIGIT TWO
٣ U+663 3.0 ARABIC-INDIC DIGIT THREE
٤ U+664 4.0 ARABIC-INDIC DIGIT FOUR
٥ U+665 5.0 ARABIC-INDIC DIGIT FIVE
٦ U+666 6.0 ARABIC-INDIC DIGIT SIX
٧ U+667 7.0 ARABIC-INDIC DIGIT SEVEN
٨ U+668 8.0 ARABIC-INDIC DIGIT EIGHT
٩ U+669 9.0 ARABIC-INDIC DIGIT NINE
۰ U+6F0 0.0 EXTENDED ARABIC-INDIC DIGIT ZERO
۱ U+6F1 1.0 EXTENDED ARABIC-INDIC DIGIT ONE
۲ U+6F2 2.0 EXTENDED ARABIC-INDIC DIGIT TWO
۳ U+6F3 3.0 EXTENDED ARABIC-INDIC DIGIT THREE
۴ U+6F4 4.0 EXTENDED ARABIC-INDIC DIGIT FOUR
۵ U+6F5 5.0 EXTENDED ARABIC-INDIC DIGIT FIVE
۶ U+6F6 6.0 EXTENDED ARABIC-INDIC DIGIT SIX
۷ U+6F7 7.0 EXTENDED ARABIC-INDIC DIGIT SEVEN
۸ U+6F8 8.0 EXTENDED ARABIC-INDIC DIGIT EIGHT
۹ U+6F9 9.0 EXTENDED ARABIC-INDIC DIGIT NINE
 U+10E60 1.0 RUMI DIGIT ONE
 U+10E61 2.0 RUMI DIGIT TWO
 U+10E62 3.0 RUMI DIGIT THREE
 U+10E63 4.0 RUMI DIGIT FOUR
 U+10E64 5.0 RUMI DIGIT FIVE
 U+10E65 6.0 RUMI DIGIT SIX
 U+10E66 7.0 RUMI DIGIT SEVEN
 U+10E67 8.0 RUMI DIGIT EIGHT
 U+10E68 9.0 RUMI DIGIT NINE
 U+1EC71 1.0 INDIC SIYAQ NUMBER ONE
 U+1EC72 2.0 INDIC SIYAQ NUMBER TWO
 U+1EC73 3.0 INDIC SIYAQ NUMBER THREE
 U+1EC74 4.0 INDIC SIYAQ NUMBER FOUR
 U+1EC75 5.0 INDIC SIYAQ NUMBER FIVE
 U+1EC76 6.0 INDIC SIYAQ NUMBER SIX
 U+1EC77 7.0 INDIC SIYAQ NUMBER SEVEN
 U+1EC78 8.0 INDIC SIYAQ NUMBER EIGHT
 U+1EC79 9.0 INDIC SIYAQ NUMBER NINE
 U+1ECA3 1.0 INDIC SIYAQ NUMBER PREFIXED ONE
 U+1ECA4 2.0 INDIC SIYAQ NUMBER PREFIXED TWO
 U+1ECA5 3.0 INDIC SIYAQ NUMBER PREFIXED THREE
 U+1ECA6 4.0 INDIC SIYAQ NUMBER PREFIXED FOUR
 U+1ECA7 5.0 INDIC SIYAQ NUMBER PREFIXED FIVE
 U+1ECA8 6.0 INDIC SIYAQ NUMBER PREFIXED SIX
 U+1ECA9 7.0 INDIC SIYAQ NUMBER PREFIXED SEVEN
 U+1ECAA 8.0 INDIC SIYAQ NUMBER PREFIXED EIGHT
 U+1ECAB 9.0 INDIC SIYAQ NUMBER PREFIXED NINE
 U+1ECB1 1.0 INDIC SIYAQ NUMBER ALTERNATE ONE
 U+1ECB2 2.0 INDIC SIYAQ NUMBER ALTERNATE TWO
 U+1ED01 1.0 OTTOMAN SIYAQ NUMBER ONE
 U+1ED02 2.0 OTTOMAN SIYAQ NUMBER TWO
 U+1ED03 3.0 OTTOMAN SIYAQ NUMBER THREE
 U+1ED04 4.0 OTTOMAN SIYAQ NUMBER FOUR
 U+1ED05 5.0 OTTOMAN SIYAQ NUMBER FIVE
 U+1ED06 6.0 OTTOMAN SIYAQ NUMBER SIX
 U+1ED07 7.0 OTTOMAN SIYAQ NUMBER SEVEN
 U+1ED08 8.0 OTTOMAN SIYAQ NUMBER EIGHT
 U+1ED09 9.0 OTTOMAN SIYAQ NUMBER NINE
 U+1ED2F 2.0 OTTOMAN SIYAQ ALTERNATE NUMBER TWO
 U+1ED30 3.0 OTTOMAN SIYAQ ALTERNATE NUMBER THREE
 U+1ED31 4.0 OTTOMAN SIYAQ ALTERNATE NUMBER FOUR
 U+1ED32 5.0 OTTOMAN SIYAQ ALTERNATE NUMBER FIVE
 U+1ED33 6.0 OTTOMAN SIYAQ ALTERNATE NUMBER SIX
 U+1ED34 7.0 OTTOMAN SIYAQ ALTERNATE NUMBER SEVEN
 U+1ED35 8.0 OTTOMAN SIYAQ ALTERNATE NUMBER EIGHT
 U+1ED36 9.0 OTTOMAN SIYAQ ALTERNATE NUMBER NINE

2 thoughts on “Arabic numerals and numerals that are Arabic

  1. Interesting related fact: built-in `int` and `float` functions do recognize these non-ASCII numerals, e.g. `int(‘۳’) == 3`.

  2. There’s an investigation to be done on “What’s the encoding on your BASH’s “re-direct” and “pipe”?”. I suspect the answer will also depend on one’s terminal program, and I hope to never have to carry it out.

Comments are closed.