I noticed an ad for Super Bowl XLVI on a pizza box this morning. The Roman numeral XLVI does not repeat any character. This brought up a couple questions.

- How many Roman numerals are possible if you’re not allowed to repeat a character?
- Could you write a (reasonably short) regular expression to find all such numbers?

You can post your solutions to either question in the comments.

There has never been universal agreement on the rules for constructing Roman numerals, so your solution would depend on your choice of rules. For our purposes here, assume the valid characters are I, V, X, L, C, D, and M. Also, assume any character can be subtracted from a larger character. For example, you can assume IL is a valid representation of 49.

For a more challenging problem, you can use the more restrictive subtraction rules.

- I can be subtracted from V and X only.
- X can be subtracted from L and C only.
- C can be subtracted from D and M only.
- V, L, and D can never be subtracted.

**Other puzzle posts**:

I’ll need to play with the problem a little before answering. Right now I can only say that Perl’s extended version of regular expressions can include enough logic to match such numbers (in Perl it’s possible to modify the pattern on the fly based on what has already matched, which is why Perl’s regular expressions can determine if parentheses are properly balanced but classic regular expressions can’t).

This sounds like a fun little problem to work on while watching the Super Bowl (-:

Not totally sure I’ve got this right, but here goes…

It uses the more restrictive rules, with the relaxation that it also generates the empty string.

romanNumerals

= "(M? D? C? | M? CD | CM D?) ((L? X? | XL) (V? I? | IV) | L? IX V?) | M? D? XC L? (V? I? | IV)"

`enumerateRegex = nub . sort . fst . go1 . filter (not . isSpace)`

where

go1 = go [""] [""]

go :: [String] -> [String] -> String -> ([String], String)

go prefix atom ('(':xs) = let (a,b) = go1 xs in go (allPairs prefix atom) a b

go prefix atom (')':xs) = (allPairs prefix atom, xs)

go prefix atom ('|':xs) = let (a,b) = go1 xs in (allPairs prefix atom ++ a, b)

go prefix atom ('?':xs) = go (allPairs prefix (atom ++ [""])) [""] xs

go prefix atom ('?':xs) = go prefix atom xs

go prefix atom ( c :xs) = go (allPairs prefix atom) [[c]] xs

go prefix atom [] = (allPairs prefix atom, "")

allPairs :: [String] -> [String] -> [String]

allPairs a b = concatMap (x -> map (x ++) b) a

(I guess most of your readers will recognise the language, but for those that don’t, it’s Haskell)

Sorry, bad formatting. You’ll have to copy to a text editor to see the layout properly and get rid of the emoticon. Also, piping through ‘sort’ was unnecessary, but I always forget how nub works and assume that it only removes equal and *adjacent* items.

Could you write a (reasonably short) regular expression to find all such numbers?

No β regular expressions canβt count and therefore would not be able to determine if you repeated a character.

In Python:

Bah! Better indentation here: http://pastebin.com/ZtB9eKxx

Anyone want to venture a combinatorial argument, i.e. using math instead of writing a program to count the possibilities?

Josiah: I fixed your indentation problem by changing your code tag to a pre tag.

Mr Being: I know what you’re referring to, but regexes can count to 1. If it weren’t for the subtraction rule, you could use something like M?D?C?L?X?V?I?. And if you use more general regular expressions like the first comment alluded to, there are new possibilities. (Though these are not strictly regular expressions in the CS sense.)

at least

^([IVX])(?!.*1)([IVX])?(?!.*2)([IVX]?)$

checks for all combinations of I, V, X with no repeats.

|=

Combinatorial:

Rules: restrictive subtraction rules. I’ll also add a rule that a

smaller value cannot appear before a larger value except as a subtraction.

The alphabet contains 7 characters, which are (in order from lowest-value

to highest-value): I, V, X, L, C, D, and M.

V, L, D and M can never be subtracted, but may be included or not.

I, X and C each have four possibilities: they can be skipped; they can

appear in their positive position, or they can appear in one of two negative

positions.

Those choices lead to 2**4 * 4**3 = 2**10 possibilities. Howeve, that assumes

all the choices are independent, which they are not. If ‘V’ does not appear,

then ‘I’ cannot be subtracted from it; if ‘X’ is subtracted from ‘L’ or ‘C’,

then ‘I’ cannot be subtracted from ‘X’, and so on.

So let’s start again. We can split the possibilities into four cases,

based on the number of subtractions in each one. Given N subtractions,

we have 2^(7-2N) possibilities. This comes from having 7 – 2N positive

value characters that can (each, independently) be chosen to appear or

not appear in the string. For the case of N = 0, we subtract 1 to remove

the possibility of the empty string.

0 subtractions: 2^7 – 1

1 subtraction: 6 * 2^5

2 subtractions: 8 * 2^3

(IV,XL ; IV,XC; IV,CD; IV,CM; IX,CD ; IX,CM; XL,CD ; XL,CM)

3 subtractions: 2 * 2

(IV,XL,CD ; IV,XL,CM)

Total possibilities: 2^7 + 6*2^5 + 8*2^3 + 4 – 1 = 287

This matches the number of results emitted by the code in my previous

comment, which gives me a lot of confidence that the result is correct.

I’ve put the code from my first comment (plus this combinatorial explanation) into a github gist:

https://gist.github.com/1613780

My previous comment was ambiguous. Where I said “which gives me a lot of confidence that the result is correct,” I meant that working out that combinatorial analysis gave me confidence that my code was correct, not that my code gave me confidence that the maths was correct (though I suppose both are true to some extent).

It took several attempts to find an approach that was clear enough for me to be confident in it, so I’m glad to have done it: a fun exercise all round!

An observation: The problem is a lot easier if your regular expression language includes set intersection and difference. Regular languages are closed under these operators, so there’s really no excuse.

So, a friend pointed out that the value I gave in comment #10 was incorrect:

2^7 + 6*2^5 + 8*2^3 + 4 β 1 = 387

Not 287 (just a typo when I wrote the result into the comment). The argument stands, just not the final number.

For listing them all I rolled up http://pastebin.com/9zr5gcug which is a bit rough and ready (in particular it really ought to work from the right-hand end of $string). As a regexp I rolled http://pastebin.com/ziZ3qtsh because I happen just to have read about named backreferences.

I get 316, both mathematically and with an emacs lisp program to implement the math:

Anybody see a mistake?

Ah. I think John Bartholomew is counting invalid constructs like “IXV”. At least I think they are invalid. One could argue that the restriction on subtraction was ambiguous.

I agree that IXV is invalid. I haven’t checked whether John Bartholomew is including it.

By the way, the reason I specified a

reasonably shortregular expression is this: there are a finite number of possibilities, so one could create a regular expression by brute force simply by listing them all: I|IV|V|IX| … A short regular expression that matches a little too much or not enough is more interesting than a correct expression created by brute force.John – The line:

I noticed an ad for Super Bowl XLVI on a pizza box this morning.

conjures a picture of you awakening in the middle of the opening scene from ‘The Bachelor Party’. π

I am including IXV, since that restriction was not mentioned in the post at all: ‘I’ can only be subtracted from ‘X’ or ‘V’ does not in itself imply that ‘IX’ cannot appear with ‘V’ — that restriction comes purely external knowledge of common Roman numeral writing rules.

I agree that excluding such constructs is a sensible rule though, and produces a “nicer” set of valid numerals.

@Jacob I agree with your result of 316.

I have updated my github gist with the adjusted regular expression and combinatorial solution for excluding CMD, XCL and IXV.

My solution in python is here https://gist.github.com/1626535 . It is not the best solution but still good enough to share.

I always thought it was interesting that the Number of the Beast, 666, in Roman numerals is DCLXVI.

function ConvertTo-RomanNumeral

{

[CmdletBinding()]

[OutputType([string])]

Param

(

[Parameter(Mandatory=$true,

HelpMessage=”Enter an integer in the range 1 to 3,999″,

ValueFromPipeline=$true,

Position=0)]

[ValidateRange(1,3999)]

[int]

$Number

)

Begin

{

$DecimalToRoman = @{

Thousands = “”,”M”,”MM”,”MMM”

Hundreds = “”,”C”,”CC”,”CCC”,”CD”,”D”,”DC”,”DCC”,”DCCC”,”CM”

Tens = “”,”X”,”XX”,”XXX”,”XL”,”L”,”LX”,”LXX”,”LXXX”,”XC”

Ones = “”,”I”,”II”,”III”,”IV”,”V”,”VI”,”VII”,”VIII”,”IX”

}

$column = @{

Thousands = 0

Hundreds = 1

Tens = 2

Ones = 3

}

}

Process

{

[int[]]$digits = $Number.ToString().PadLeft(4,”0″).ToCharArray() |

ForEach-Object { [Char]::GetNumericValue($_) }

$RomanNumeral = “”

$RomanNumeral += $DecimalToRoman.Thousands[$digits[$column.Thousands]]

$RomanNumeral += $DecimalToRoman.Hundreds[$digits[$column.Hundreds]]

$RomanNumeral += $DecimalToRoman.Tens[$digits[$column.Tens]]

$RomanNumeral += $DecimalToRoman.Ones[$digits[$column.Ones]]

$RomanNumeral

}

End

{

}

}

1..3999 |

ForEach-Object { ConvertTo-RomanNumeral $_} |

ForEach-Object { if ($_ -notmatch ‘^.*(.).*\1.*$’) { $_ } }