I ran into a delightfully strange blog post today called Finnegans Ewok that edits the first few paragraphs of Finnegans Wake to make it into something like Return of the Jedi.
(Unfortunately the page has gone away since I first wrote this. Some of the text is preserved in this Python script.)
The author, Adam Roberts, said via Twitter, “What I found interesting here was how little I had to change Joyce’s original text. Tweak a couple of names and basically leave it otherwise as was.”
So I wanted to quantify just how much had to change using the Levenshtein distance, which is the minimum number of single-character insertions, deletions, or substitutions necessary to transform one string into another.
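For readers who want to see the calculation, here is a minimal sketch of Levenshtein distance using the standard dynamic programming approach. It is not necessarily the code used for the numbers in this post; the function name `levenshtein` and the test strings are my own choices for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of one-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # prev[j] is the distance between the prefix of a processed so far
    # and the first j characters of b. Start with the empty prefix of a.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb, or keep a match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Classic textbook pair: two substitutions and one insertion.
    print(levenshtein("kitten", "sitting"))
    # The opening words of the two paragraphs compared in this post.
    print(levenshtein("riverrun", "movierun"))
```

The sketch keeps only two rows of the table at a time, so memory grows with the length of the second string rather than with the product of the two lengths. That matters little for paragraphs of a thousand characters, but it keeps the code short.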
Here’s the first paragraph from James Joyce:
riverrun, past Eve and Adam’s, from swerve of shore to bend of bay, brings us by a commodius vicus of recirculation back to Howth Castle and Environs.
And here’s the first paragraph from Adam Roberts:
movierun, past new and hopes, from strike of back to bend of jeday, brings us by a commodius lucas of recirculation back to forestmoon and endor.
The original paragraph is 150 characters, the parody is 145 characters, and the Levenshtein distance is 44.
Here’s a summary of the results for the first four paragraphs: the length in characters of each author’s paragraph and the Levenshtein distance between the two.
|-------+---------+----------|
| Joyce | Roberts | Distance |
|-------+---------+----------|
|   150 |     145 |       44 |
|   700 |     727 |      119 |
|   594 |     615 |      145 |
|  1053 |     986 |      333 |
|-------+---------+----------|
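One way to read these numbers, which the analysis above does not do, is to divide each distance by the length of the longer paragraph. Levenshtein distance can never exceed the longer length, so the ratio falls between 0 and 1. The sketch below uses only the figures from the table; the name `relative_distance` and the choice of normalization are my assumptions.

```python
# (Joyce length, Roberts length, Levenshtein distance), copied from the table
rows = [
    (150, 145, 44),
    (700, 727, 119),
    (594, 615, 145),
    (1053, 986, 333),
]


def relative_distance(joyce_len: int, roberts_len: int, distance: int) -> float:
    """Distance as a fraction of the longer paragraph's length.

    This normalization is an assumption made for illustration;
    dividing by the original's length would be another reasonable choice.
    """
    return distance / max(joyce_len, roberts_len)


if __name__ == "__main__":
    for number, (joyce_len, roberts_len, distance) in enumerate(rows, start=1):
        ratio = relative_distance(joyce_len, roberts_len, distance)
        print(f"paragraph {number}: {ratio:.1%} of characters edited")
```

By this measure the edits touch roughly a sixth to a third of each paragraph, which is consistent with Roberts' remark that he had to change little.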
The fifth paragraph seems to diverge more from Joyce. I may have gotten something misaligned, and reading enough of Finnegans Wake to debug the problem made my head hurt, so I stopped.
Update: See the next post for sequence alignment applied to the two sources. This lets you see not just the number of edits but what the edits are. This shows why I was having difficulty aligning the fifth paragraphs.