Vowel sounds can be visualized in a two-dimensional space according to tongue position. The vertical axis is runs from open down to closed, and the horizontal runs from front to back. See a linguistics textbook for far more detail.
English has five vowel letters, but a lot more than five vowel sounds. Scholars argue about how many vowel sounds English and other languages have because there’s room for disagreement on how much two sounds can differ and still be considered variations on the same sound. The IPA Handbook  lists 11 vowel sounds in American English, not counting diphthongs.
When I wrote about Japanese hiragana and katakana recently, I showed how the letters are arranged into a grid with one side labeled with English vowel letters. Is that justified? Does Japanese really have just five vowel sounds, and are they similar to five English vowels? Essentially yes. This post will show how English and Japanese vowel sounds compare according to .
Here are versions of the vowel charts for the two languages that I made using Python’s matplotlib.
And now the two combined on one plot:
Four out of the five Japanese vowels have a near equivalent in English. The exception is the Japanese vowel with IPA symbol ‘a’, which is midway between the English vowels with symbols æ (U+0230) and ɑ (U+0251), somewhere between the a in had and the a in father.
Update: See the comments for a acoustic phonetician’s input regarding frequency analysis.
Update: Here is a similar post for Swedish. Swedish is interesting because it has a lot of vowel sounds, and the sounds are in tight clusters.
Analogy with KL divergence
The differences between English and Japanese vowels are asymmetric: an English speaker will find it easier to learn Japanese vowels than a Japanese speaker would find it to learn English vowels. This reminiscent of the Kullback-Leibler divergence in probability and statistics.
KL-divergence is a divergence and not a distance, even though it is often called a distance, because it’s not symmetric. The KL-divergence between two random variables X and Y, written KL(X || Y), is the average surprise in seeing Y when you expected X. If you expect English vowel sounds and hear Japanese vowel sounds you’re not as surprised as if you expect Japanese vowel sounds and hear English. The English student of Japanese hears familiar sounds shifted a bit, but the Japanese student of English hears new sounds.
 Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet. Cambridge University Press, 2021.