A new paper in Science suggests that all human languages carry about the same amount of information per unit time. In languages with fewer possible syllables, people speak faster. In languages with more syllables, people speak slower.
Researchers quantified the information content per syllable in 17 different languages by calculating Shannon entropy. When you multiply the information per syllable by the number of syllables per second, you get around 39 bits per second across a wide variety of languages.
If a language has N possible syllables, and the probability of the ith syllable occurring in speech is pi, then the average information content of a syllable, as measured by Shannon entropy, is
For example, if a language had only eight possible syllables, all equally likely, then each syllable would carry 3 bits of information. And in general, if there were 2n syllables, all equally likely, then the information content per syllable would be n bits. Just like n zeros and ones, hence the term bits.
Of course not all syllables are equally likely to occur, and so it’s not enough to know the number of syllables; you also need to know their relative frequency. For a fixed number of syllables, the more evenly the frequencies are distributed, the more information is carried per syllable.
If ancient languages conveyed information at 39 bits per second, as a variety of modern languages do, one could calculate the entropy of the language’s syllables and divide 39 by the entropy to estimate how many syllables the speakers spoke per second.
According to this overview of the research,
Japanese, which has only 643 syllables, had an information density of about 5 bits per syllable, whereas English, with its 6949 syllables, had a density of just over 7 bits per syllable. Vietnamese, with its complex system of six tones (each of which can further differentiate a syllable), topped the charts at 8 bits per syllable.
One could do the same calculations for Latin, or ancient Greek, or Anglo Saxon that the researches did for Japanese, English, and Vietnamese.
If all 643 syllables of Japanese were equally likely, the language would convey -log2(1/637) = 9.3 bits of information per syllable. The overview says Japanese carries 5 bits per syllable, and so the efficiency of the language is 5/9.3 or about 54%.
If all 6949 syllables of English were equally likely, a syllable would carry 12.7 bits of information. Since English carries around 7 bits of information per syllable, the efficiency is 7/12.7 or about 55%.
Taking a wild guess by extrapolating from only two data points, maybe around 55% efficiency is common. If so, you could estimate the entropy per syllable of a language just from counting syllables.