Roman numeral puzzle

I noticed an ad for Super Bowl XLVI on a pizza box this morning. The Roman numeral XLVI does not repeat any character. This brought up a couple questions.

  • How many Roman numerals are possible if you’re not allowed to repeat a character?
  • Could you write a (reasonably short) regular expression to find all such numbers?

You can post your solutions to either question in the comments.

There has never been universal agreement on the rules for constructing Roman numerals, so your solution would depend on your choice of rules. For our purposes here, assume the valid characters are I, V, X, L, C, D, and M. Also, assume any character can be subtracted from a larger character. For example, you can assume IL is a valid representation of 49.

For a more challenging problem, you can use the more restrictive subtraction rules.

  1. I can be subtracted from V and X only.
  2. X can be subtracted from L and C only.
  3. C can be subtracted from D and M only.
  4. V, L, and D can never be subtracted.

Other puzzle posts:

27 thoughts on “Roman numeral puzzle

  1. I’ll need to play with the problem a little before answering. Right now I can only say that Perl’s extended version of regular expressions can include enough logic to match such numbers (in Perl it’s possible to modify the pattern on the fly based on what has already matched, which is why Perl’s regular expressions can determine if parentheses are properly balanced but classic regular expressions can’t).

  2. Not totally sure I’ve got this right, but here goes…

    It uses the more restrictive rules, with the relaxation that it also generates the empty string.

    = "(M? D? C? | M? CD | CM D?) ((L? X? | XL) (V? I? | IV) | L? IX V?) | M? D? XC L? (V? I? | IV)"

    enumerateRegex = nub . sort . fst . go1 . filter (not . isSpace)
    go1 = go [""] [""]
    go :: [String] -> [String] -> String -> ([String], String)
    go prefix atom ('(':xs) = let (a,b) = go1 xs in go (allPairs prefix atom) a b
    go prefix atom (')':xs) = (allPairs prefix atom, xs)
    go prefix atom ('|':xs) = let (a,b) = go1 xs in (allPairs prefix atom ++ a, b)
    go prefix atom ('?':xs) = go (allPairs prefix (atom ++ [""])) [""] xs
    go prefix atom ('?':xs) = go prefix atom xs
    go prefix atom ( c :xs) = go (allPairs prefix atom) [[c]] xs
    go prefix atom [] = (allPairs prefix atom, "")
    allPairs :: [String] -> [String] -> [String]
    allPairs a b = concatMap (x -> map (x ++) b) a

    (I guess most of your readers will recognise the language, but for those that don’t, it’s Haskell)

  3. Sorry, bad formatting. You’ll have to copy to a text editor to see the layout properly and get rid of the emoticon. Also, piping through ‘sort’ was unnecessary, but I always forget how nub works and assume that it only removes equal and *adjacent* items.

  4. Could you write a (reasonably short) regular expression to find all such numbers?

    No – regular expressions can’t count and therefore would not be able to determine if you repeated a character.

  5. In Python:

    def roman_numerals():
        import itertools
        values = list('IVXLCDM')
        index = dict((j,i) for i,j in enumerate(values))
        prefix = {0: (1,2), 2: (3,4), 4: (5,6)}
        for i in xrange(2, 7):
            for perm in itertools.permutations(values[:7], i):
                for j, v in enumerate(map(index.get, perm[:-1])):
                    vn = index[perm[j+1]]
                    if vn > v and vn not in prefix.get(v, ()):
        return values
  6. Anyone want to venture a combinatorial argument, i.e. using math instead of writing a program to count the possibilities?

    Josiah: I fixed your indentation problem by changing your code tag to a pre tag.

    Mr Being: I know what you’re referring to, but regexes can count to 1. If it weren’t for the subtraction rule, you could use something like M?D?C?L?X?V?I?. And if you use more general regular expressions like the first comment alluded to, there are new possibilities. (Though these are not strictly regular expressions in the CS sense.)

  7. at least


    checks for all combinations of I, V, X with no repeats.


  8. Combinatorial:

    Rules: restrictive subtraction rules. I’ll also add a rule that a
    smaller value cannot appear before a larger value except as a subtraction.

    The alphabet contains 7 characters, which are (in order from lowest-value
    to highest-value): I, V, X, L, C, D, and M.

    V, L, D and M can never be subtracted, but may be included or not.

    I, X and C each have four possibilities: they can be skipped; they can
    appear in their positive position, or they can appear in one of two negative

    Those choices lead to 2**4 * 4**3 = 2**10 possibilities. Howeve, that assumes
    all the choices are independent, which they are not. If ‘V’ does not appear,
    then ‘I’ cannot be subtracted from it; if ‘X’ is subtracted from ‘L’ or ‘C’,
    then ‘I’ cannot be subtracted from ‘X’, and so on.

    So let’s start again. We can split the possibilities into four cases,
    based on the number of subtractions in each one. Given N subtractions,
    we have 2^(7-2N) possibilities. This comes from having 7 – 2N positive
    value characters that can (each, independently) be chosen to appear or
    not appear in the string. For the case of N = 0, we subtract 1 to remove
    the possibility of the empty string.

    0 subtractions: 2^7 – 1
    1 subtraction: 6 * 2^5
    2 subtractions: 8 * 2^3
    3 subtractions: 2 * 2
    (IV,XL,CD ; IV,XL,CM)

    Total possibilities: 2^7 + 6*2^5 + 8*2^3 + 4 – 1 = 287

    This matches the number of results emitted by the code in my previous
    comment, which gives me a lot of confidence that the result is correct.

    I’ve put the code from my first comment (plus this combinatorial explanation) into a github gist:

  9. My previous comment was ambiguous. Where I said “which gives me a lot of confidence that the result is correct,” I meant that working out that combinatorial analysis gave me confidence that my code was correct, not that my code gave me confidence that the maths was correct (though I suppose both are true to some extent).

    It took several attempts to find an approach that was clear enough for me to be confident in it, so I’m glad to have done it: a fun exercise all round!

  10. An observation: The problem is a lot easier if your regular expression language includes set intersection and difference. Regular languages are closed under these operators, so there’s really no excuse.

  11. So, a friend pointed out that the value I gave in comment #10 was incorrect:
    2^7 + 6*2^5 + 8*2^3 + 4 – 1 = 387
    Not 287 (just a typo when I wrote the result into the comment). The argument stands, just not the final number.

  12. I get 316, both mathematically and with an emacs lisp program to implement the math:

    (defun rn-MDCLXVI () (rn-union (rn-cross '("MD" "M" "D" "") (rn-CLXVI)) (rn-cross '("MCD" "CM" "CD") (rn-LXVI))))
    (defun rn-CLXVI () (rn-union (rn-cross '("CL" "C" "L" "") (rn-XVI)) (rn-cross '("CXL" "XC" "XL") (rn-VI))))
    (defun rn-LXVI () (rn-union (rn-cross '("L" "") (rn-XVI)) (rn-cross '("XL") (rn-VI))))
    (defun rn-XVI () (rn-union (rn-cross '("XV" "X" "V" "") (rn-I)) (rn-cross '("XIV" "IX" "IV") (rn-))))
    (defun rn-VI () (rn-union (rn-cross '("V" "") (rn-I)) '("IV")))
    (defun rn-I () '("I" ""))
    (defun rn- () '(""))
    (defun rn-union (left right)
      (let ((result right))
        (dolist (left-elt left)
          (if (member left-elt result)
              (princ (format "duplicate: %sn" left-elt))
              (push left-elt result)))
    (defun rn-cross (left right)
      (let ((result '()))
        (dolist (left-elt left)
          (dolist (right-elt right)
            (push (concat left-elt right-elt) result)))
    (1- (length (rn-MDCLXVI)))

    Anybody see a mistake?

  13. Ah. I think John Bartholomew is counting invalid constructs like “IXV”. At least I think they are invalid. One could argue that the restriction on subtraction was ambiguous.

  14. I agree that IXV is invalid. I haven’t checked whether John Bartholomew is including it.

    By the way, the reason I specified a reasonably short regular expression is this: there are a finite number of possibilities, so one could create a regular expression by brute force simply by listing them all: I|IV|V|IX| … A short regular expression that matches a little too much or not enough is more interesting than a correct expression created by brute force.

  15. John – The line:
    I noticed an ad for Super Bowl XLVI on a pizza box this morning.
    conjures a picture of you awakening in the middle of the opening scene from ‘The Bachelor Party’. :-)

  16. I am including IXV, since that restriction was not mentioned in the post at all: ‘I’ can only be subtracted from ‘X’ or ‘V’ does not in itself imply that ‘IX’ cannot appear with ‘V’ — that restriction comes purely external knowledge of common Roman numeral writing rules.

    I agree that excluding such constructs is a sensible rule though, and produces a “nicer” set of valid numerals.

  17. @Jacob I agree with your result of 316.

    I have updated my github gist with the adjusted regular expression and combinatorial solution for excluding CMD, XCL and IXV.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>