CAAPR - A Combined Anglo-American Pronunciation Reference

Alan Beale
17 April 2006

This page describes CAAPR, the Combined Anglo-American Pronunciation Reference.  CAAPR is a pronouncing dictionary for both British English (RP) and American English (GA).  It is written in a compact and easy-to-read notation which systematizes the differences between these two varieties of English, while still handling exceptions gracefully and accurately.  The CAAPR notation is itself called CAAPR, this time standing for Combined Anglo-American Pronunciation Representation.

CAAPR is based on two primary sources, the FEWL list for American English and the online EPD dictionary for British English.  This latter dictionary is of very high quality, and I wish I knew who collected it so I could offer them my extravagant thanks.  During the development of CAAPR, I transformed, rearranged and occasionally corrected the EPD, but I find it remarkable how few corrections were needed.  Whatever merits CAAPR may possess are in large measure derived from the labor of the unknown contributors to the EPD.

CAAPR actually consists of three word lists: CAAPR-A, CAAPR-B and CAAPR-C.  CAAPR-A(merican) is a list of words with their GA pronunciations, derived via reformatting from the FEWL list.  CAAPR-B(ritish) is a somewhat different list of words with their RP pronunciations, derived via subsetting and reformatting from the online EPD dictionary (plus some additional common words unaccountably omitted from that document).  CAAPR-C(ombined) is a combined list, comprising the words which are in both CAAPR-A and CAAPR-B, and showing both pronunciations together in a single notation.  As will be described here, the CAAPR notation is slightly different for each document, in ways that are unlikely to give the user any difficulty.  The CAAPR-A and CAAPR-B lists each contain approximately 30,000 words.  The CAAPR-C list contains the approximately 28,000 words common to both the A and B lists.
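The relationship between the three lists can be pictured as a set intersection on the headwords.  The following is only a minimal sketch of that idea, not the procedure actually used to build CAAPR-C: the function name common_entries and the placeholder pronunciation strings are hypothetical, and the step of merging the two pronunciations into a single notation is not attempted here.

```python
from typing import Dict, Tuple


def common_entries(
    american: Dict[str, str], british: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Pair the GA and RP forms of every headword found in both lists.

    Assumption: each list has already been read into a dict that maps a
    headword (including any parenthesized qualifier) to its pronunciation.
    """
    shared = sorted(american.keys() & british.keys())
    return {word: (american[word], british[word]) for word in shared}


if __name__ == "__main__":
    # Placeholder strings stand in for real pronunciations.
    a_list = {"alpha": "GA-alpha", "beta": "GA-beta"}
    b_list = {"beta": "RP-beta", "gamma": "RP-gamma"}
    print(common_entries(a_list, b_list))
```

Words that appear in only one of the two input lists simply drop out, which is consistent with the combined list being somewhat smaller than either source list.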

The CAAPR lists show a single (RP or GA) pronunciation for each word on the list.  Of course, the world is much more complicated than this, because many words have multiple acceptable pronunciations, with conflicting data about which is preferable or most prevalent.  Both the CAAPR-A and CAAPR-B lists use the technique of lexicographic consensus, that is, agreement among several dictionaries, to resolve such questions.  Note that differences between the GA and RP pronunciations listed in CAAPR may not reflect actual Anglo-American differences.  For instance, both forms may represent pronunciations which are common in both varieties, distinguished more by happenstance than by geography.

The pronunciations in CAAPR-A were determined by consensus of the following dictionaries: The Longman Pronunciation Dictionary, the Merriam-Webster Collegiate Dictionary CD-ROM, and the American Heritage Dictionary CD-ROM.  In difficult cases, the Random House Unabridged Dictionary CD-ROM was also consulted.  Similarly, the pronunciations of CAAPR-B were determined by consensus of the online EPD, the Longman Pronunciation Dictionary and the Shorter OED CD-ROM.  In difficult cases, the Cambridge Pronunciation Dictionary was also consulted.  The Longman Pronunciation Dictionary is the most precise of all these sources, and was often used to resolve issues which could not be easily settled using the other, less technical sources.

Each list has a similar format.  A typical entry from the CAAPR-A and CAAPR-B lists looks like this:

wordplay : w&'dpLE·

The entry is divided by the delimiting string " : " into the traditional spelling and the CAAPR representation of a word.  In some cases, like the following:

ill-use (n) : i'LyU's
ill-use (v) : i'LyU'z

different forms of the same word may be distinguished by a qualifying word or phrase in parentheses.
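For readers who want to process the lists by program, the entry format just described can be split apart in a few lines.  This is a minimal sketch, assuming one entry per line and assuming that a trailing parenthesized phrase in the spelling field is always a qualifier; the names Entry and parse_entry are my own and are not part of CAAPR.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    spelling: str             # traditional spelling, e.g. "ill-use"
    qualifier: Optional[str]  # e.g. "n" or "v"; None when absent
    caapr: str                # the CAAPR representation


# A trailing "(...)" after the spelling is treated as the qualifier.
_QUALIFIER = re.compile(r"^(?P<spelling>.*?)\s*\((?P<qualifier>[^()]*)\)$")


def parse_entry(line: str) -> Entry:
    """Split one CAAPR-A or CAAPR-B line at the " : " delimiter."""
    head, sep, caapr = line.rstrip("\n").partition(" : ")
    if not sep:
        raise ValueError(f"missing ' : ' delimiter: {line!r}")
    head = head.strip()
    match = _QUALIFIER.match(head)
    if match:
        return Entry(match.group("spelling"), match.group("qualifier"), caapr.strip())
    return Entry(head, None, caapr.strip())


if __name__ == "__main__":
    for sample in ("wordplay : w&'dpLE·", "ill-use (n) : i'LyU's", "ill-use (v) : i'LyU'z"):
        print(parse_entry(sample))
```

The split is made at the first occurrence of " : " only, so the CAAPR representation is passed through untouched.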

The format of the CAAPR-C list is similar.  Here are a few example lines:

combat (n) : kombat
combat (v) : k[ø|o]mbat *

Some entries may be followed by an asterisk, which indicates that the American and British stress patterns for the word are different.  The programming which determines this is still under development, and the absence of the asterisk cannot be trusted to mean that the two pronunciations are in fact similar in their stress.  (Note that the CAAPR combined notation does not directly indicate stress, due to technical difficulties.)
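A CAAPR-C line can be read in much the same way, with the trailing asterisk separated out as a flag.  The sketch below assumes that the asterisk, when present, is the last token on the line and is set off by a space, as in the examples above; the name parse_combined is hypothetical.  It makes no attempt to interpret the bracketed alternatives within the representation.

```python
from typing import Tuple


def parse_combined(line: str) -> Tuple[str, str, bool]:
    """Return (headword, combined representation, stress flag) for a CAAPR-C line.

    The flag is True when the line ends in an asterisk.  As the text warns,
    False means only that no difference was flagged, not that the American
    and British stress patterns are known to agree.
    """
    head, sep, rest = line.strip().partition(" : ")
    if not sep:
        raise ValueError(f"missing ' : ' delimiter: {line!r}")
    rest = rest.strip()
    flagged = rest.endswith(" *")
    if flagged:
        rest = rest[:-2].rstrip()
    return head.strip(), rest, flagged


if __name__ == "__main__":
    print(parse_combined("combat (n) : kombat"))
    print(parse_combined("combat (v) : k[ø|o]mbat *"))
```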

The three CAAPR lists can be downloaded using the following link:

An Example of CAAPR

CAAPR is not intended to be used for transcription of continuous text.  Nevertheless, it is useful to give you an idea of the overall appearance of CAAPR, and I know no better way to do this than to transcribe a short bit of prose.  Here is the first paragraph of H.G. Wells' "The Star" (see here for a plain English version, among others) written in CAAPR-C.  It may seem cryptic at first glance, but as one uses CAAPR, one rapidly becomes accustomed to its conventions, and soon such passages present little mystery.

it w[u/o]z on Dø f&ßst dE øv Dø n!U yïr Dat Dý ønWnsm°nt w[u/o]z mEd, ØLmOst s[Y/i]m°LtEnÿøsLý fr[u<o]m TrI øbz&ßvøtòrý$, Dat Dø mOX°n øv Dø pLanît nept!Un, Dý WtRmOst øv ØL Dø pLanît$ Dat µIL øbWt Dø sun, had bikum verý iratik. ø ritAßdEX°n in its v3Losêtý had b[i\I]n søspektîd in disembR. Den, ø fEnt, rimOt spek øv LYt w[u/o]z diskuvRþ in Dø rIj°n øv Dø pRt&ßbþ pLanît. at f&ßst Dis did not kØz ený verý grEt iksYtm°nt. sYøntifik pIp°L, hWevR, fWnd Dý inteLîj°ns rimAßkøb°L inuf, Iv°n bifØß it bikEm nOn Dat Dø n!U bodý w[u/o]z rapîdLý grOiG LAßjR and brYtR, and Dat its mOX°n w[u/o]z kwYt dif°r°nt fr[u<o]m Dý ØßdRLý pr[o|O]gr[ø<e]s øv Dø pLanît$.

Final comments

Though this version of CAAPR has been thoroughly proofread, it is still likely to contain errors and other faults.  Please let me know when you encounter errors, whether isolated or systematic.  If you discover ways in which CAAPR could be changed to improve its usefulness, I'd also like to hear of them.  Sometimes I suspect that my work on this and other similar projects is no more than talking to myself in public.  If this is not so, and there are ways I can make my forays into dictionary building more generally useful, it would be a shame if no one bothered to tell me.

