This page describes CAAPR, the Combined Anglo-American Pronunciation Reference. CAAPR is a pronouncing dictionary for both British English (RP) and American English (GA). It is written in a compact and easy-to-read notation which systematizes the differences between these two varieties of English, while still handling exceptions gracefully and accurately. The CAAPR notation is itself called CAAPR, this time standing for Combined Anglo-American Pronunciation Representation.
CAAPR is based on two primary sources: the FEWL list for American English and the online EPD dictionary for British English. The latter is of very high quality, and I wish I knew who compiled it so I could offer them my extravagant thanks. During the development of CAAPR, I transformed, rearranged and occasionally corrected the EPD, but I find it remarkable how few corrections were needed. Whatever merits CAAPR may possess are in large measure derived from the labor of the unknown contributors to the EPD.
CAAPR actually consists of three word lists: CAAPR-A, CAAPR-B and CAAPR-C. CAAPR-A(merican) is a list of words with their GA pronunciations, derived by reformatting the FEWL list. CAAPR-B(ritish) is a somewhat different list of words with their RP pronunciations, derived by subsetting and reformatting the online EPD dictionary (plus some additional common words unaccountably omitted from that document). CAAPR-C(ombined) is a combined list, comprising the words that appear in both CAAPR-A and CAAPR-B and showing both pronunciations together in a single notation. As will be described here, the CAAPR notation is slightly different for each list, in ways that are unlikely to give the user any difficulty. The CAAPR-A and CAAPR-B lists each contain approximately 30,000 words. The CAAPR-C list contains the approximately 28,000 words common to both the A and B lists.
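The relationship between the three lists can be pictured as a set intersection. The following is only a minimal sketch: the headword sets are toy data invented for illustration (they do not reflect the actual contents of either list), and the name common_headwords is hypothetical. It also does not show how the two pronunciations are merged into the combined notation.

```python
def common_headwords(a_words, b_words):
    """Return the headwords present in both collections, sorted for stable output."""
    return sorted(set(a_words) & set(b_words))


# Toy data only: which words appear in which list is made up for this example.
toy_a = ["wordplay", "combat (n)", "combat (v)", "sidewalk"]
toy_b = ["wordplay", "combat (n)", "combat (v)", "pavement"]

print(common_headwords(toy_a, toy_b))
```

Keeping the parenthesized qualifier as part of the headword means that "combat (n)" and "combat (v)" are treated as separate entries in this sketch.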
The CAAPR lists show a single (RP or GA) pronunciation for each word on the list. Of course, the world is much more complicated than this: many words have multiple acceptable pronunciations, and the data about which is preferable or most prevalent often conflict. Both lists use the technique of lexicographic consensus to resolve such questions. Note that differences between the GA and RP pronunciations listed in CAAPR may not reflect actual Anglo-American differences. For instance, both forms may represent pronunciations which are common in both varieties, distinguished more by happenstance than by geography.
The pronunciations in CAAPR-A were determined by consensus of the following dictionaries: the Longman Pronunciation Dictionary, the Merriam-Webster Collegiate Dictionary CD-ROM, and the American Heritage Dictionary CD-ROM. In difficult cases, the Random House Unabridged Dictionary CD-ROM was also consulted. Similarly, the pronunciations of CAAPR-B were determined by consensus of the online EPD, the Longman Pronunciation Dictionary and the Shorter OED CD-ROM. In difficult cases, the Cambridge Pronunciation Dictionary was also consulted. The Longman Pronunciation Dictionary is the most precise of all these sources, and was often used to resolve issues which could not be easily settled using the other, less technical sources.
Each list has a similar format. A typical entry from the CAAPR-A and CAAPR-B lists looks like this:
wordplay : w&'dpLE·
The entry is divided by the delimiting string " : " into the traditional spelling and the CAAPR representation of a word. In some cases, different forms of the same word are distinguished by a qualifying word or phrase in parentheses, as in the following:
ill-use (n) : i'LyU's
ill-use (v) : i'LyU'z
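The entry format described above is simple enough to split mechanically. The following is a minimal sketch, not part of CAAPR itself: the names Entry and parse_entry are hypothetical, and the code assumes that a qualifier, when present, is a single parenthesized phrase at the end of the spelling.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    spelling: str             # traditional spelling, without any qualifier
    qualifier: Optional[str]  # e.g. "n" or "v"; None when absent
    caapr: str                # the CAAPR representation, left uninterpreted


# Assumption: a qualifier is one trailing "(...)" group after the spelling.
_QUALIFIER = re.compile(r"^(?P<word>.*\S)\s*\((?P<qual>[^()]*)\)$")


def parse_entry(line: str) -> Entry:
    """Split one CAAPR-A or CAAPR-B line at the " : " delimiter."""
    head, sep, caapr = line.strip().partition(" : ")
    if not sep:
        raise ValueError(f"no ' : ' delimiter in line: {line!r}")
    head = head.strip()
    match = _QUALIFIER.match(head)
    if match:
        return Entry(match.group("word"), match.group("qual").strip(), caapr.strip())
    return Entry(head, None, caapr.strip())


print(parse_entry("ill-use (n) : i'LyU's"))
```

The sketch splits at the first occurrence of the delimiter only, so anything after it is kept as the CAAPR representation without further interpretation.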
The format of the CAAPR-C list is similar. Here are a few example lines:
combat (n) : kombat
combat (v) : k[ø|o]mbat *
Some entries may be followed by an asterisk, which indicates that the American and British stress patterns for the word are different. The program that determines this is still under development, and the absence of the asterisk cannot be trusted to mean that the two pronunciations are in fact similar in their stress. (Note that the CAAPR combined notation does not directly indicate stress, due to technical difficulties.)
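A CAAPR-C line can be read in much the same way as the other lists, with one extra step for the asterisk. The following is only a sketch under stated assumptions: the name parse_combined is hypothetical, the asterisk is assumed to be separated from the representation by a space (as in the example lines above), and the bracketed groups are collected as raw strings without any claim about which alternative belongs to which variety.

```python
import re

# Matches one bracketed group such as "[ø|o]" and captures its contents.
_BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def parse_combined(line):
    """Return (headword, representation, stress_flagged, bracket_groups)."""
    head, sep, rest = line.strip().partition(" : ")
    if not sep:
        raise ValueError(f"no ' : ' delimiter in line: {line!r}")
    rest = rest.strip()
    # Assumption: the stress marker is a final "*" preceded by a space.
    stress_flagged = rest.endswith(" *")
    if stress_flagged:
        rest = rest[:-2].rstrip()
    groups = _BRACKETED.findall(rest)
    return head.strip(), rest, stress_flagged, groups


print(parse_combined("combat (v) : k[ø|o]mbat *"))
```

Because the absence of the asterisk is not reliable, a False value for stress_flagged should be read as "not flagged" rather than "same stress".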
The three CAAPR lists can be downloaded using the following link:
CAAPR is not intended to be used for transcription of continuous text. Nevertheless, it is useful to give you an idea of its overall appearance, and I know no better way to do this than to transcribe a short bit of prose. Here is the first paragraph of H.G. Wells' "The Star" (see here for a plain English version, among others) written in CAAPR-C. It may seem cryptic at first glance, but as one uses CAAPR, one rapidly becomes accustomed to its conventions, and soon such passages present little mystery.
it w[u/o]z on Dø
f&ßst
dE øv Dø n!U yïr Dat Dý
ønWnsm°nt w[u/o]z mEd, ØLmOst
s[Y/i]m°LtEnÿøsLý
fr[u<o]m TrI
øbz&ßvøtòrý$,
Dat Dø
mOX°n øv Dø pLanît nept!Un,
Dý
WtRmOst øv ØL Dø pLanît$ Dat
µIL øbWt Dø sun, had bikum
verý iratik.
ø ritAßdEX°n in its
v3Losêtý
had b[i\I]n søspektîd in disembR. Den,
ø fEnt, rimOt spek øv LYt w[u/o]z
diskuvRþ in Dø rIj°n øv
Dø pRt&ßbþ pLanît. at
f&ßst Dis did not kØz ený
verý grEt
iksYtm°nt. sYøntifik pIp°L, hWevR,
fWnd Dý inteLîj°ns
rimAßkøb°L
inuf, Iv°n bifØß it bikEm nOn Dat
Dø n!U
bodý w[u/o]z rapîdLý grOiG
LAßjR and brYtR, and Dat its
mOX°n w[u/o]z kwYt dif°r°nt fr[u<o]m
Dý
ØßdRLý
pr[o|O]gr[ø<e]s
øv Dø pLanît$.
Though this version of CAAPR has been thoroughly proofread, it is still likely to contain errors and other faults. Please let me know when you encounter errors, whether isolated or systematic. If you discover ways in which CAAPR could be changed to improve its usefulness, I'd also like to hear of them. Sometimes I suspect that my work on this and other similar projects is no more than talking to myself in public. If this is not so, and there are ways I can make my forays into dictionary building more generally useful, it would be a shame if no one bothered to tell me.
To comment on this page, e-mail Alan at wyrdplay.org