This page describes CAAPR, the Combined
Anglo-American
Pronunciation Reference. CAAPR is a pronouncing dictionary
for
both British English (RP) and American English (GA). It is
written in a compact and easy-to-read notation which systematizes the
differences between these two varieties of English, while still
handling exceptions gracefully and accurately. The CAAPR
notation
is itself called CAAPR, this time standing for Combined Anglo-American
Pronunciation Representation. It is in many ways a
generalization
of my FLOSS
notation. To some
extent,
the CAAPR list resembles the FEWL
list,
but it avoids some of the complexities of that list by concerning
itself almost
exclusively with phonological rather than morphological information.
One may reasonably ask the question:
What is CAAPR good
for? It is possible that, like FEWL, CAAPR may in time come
to be
useful for computer generation of dictionaries for reformed English
orthographies. At this time, however, this seems premature,
especially since most reformers are unwilling to take on the labor
of trying to please both sides of the Atlantic at
once. I
see CAAPR mostly as a useful tool for self-education. In
compiling it, I have learned a lot about the systematic differences
between the two English varieties, and I recommend it to anyone else
who feels the need for greater insight in this area.
CAAPR is based on two primary sources,
the FEWL list for
American English and the online
EPD dictionary
for British
English. This latter dictionary is of very high quality, and
I
wish I knew who collected it so I could offer them my extravagant
thanks. During the development of CAAPR, I transformed,
rearranged and occasionally corrected the EPD, but I find it remarkable
how few corrections were needed. Whatever merits CAAPR may
possess are in large measure derived from the labor of the unknown
contributors to the EPD.
CAAPR actually consists of three word
lists: CAAPR-A,
CAAPR-B and CAAPR-C. CAAPR-A(merican) is a list of words with
their GA pronunciations, derived via reformatting from the FEWL
list. CAAPR-B(ritish) is a somewhat different list of words
with
their RP pronunciations, derived via subsetting and reformatting from
the online EPD dictionary (plus some additional common words unaccountably
omitted from that document). CAAPR-C(ombined) is a combined
list,
comprising the words which are both in CAAPR-A and CAAPR-B, showing
both pronunciations together in a single notation. As will be
described here, the CAAPR notation is slightly different for each
document, in ways that are unlikely to give the user any
difficulty. The CAAPR-A and CAAPR-B lists each contain
approximately 30,000 words. The CAAPR-C list contains the
approximately 28,000 words common to both the A and B lists.
The CAAPR lists show a single (RP or GA)
pronunciation for
each word on the list. Of course, the world is much more
complicated than this, because many words have multiple acceptable
pronunciations, with conflicting data about which is preferable or most
prevalent. Both lists use the technique of lexicographic
consensus to resolve such questions. Note that differences
between the GA and RP pronunciations listed in CAAPR may not reflect
actual Anglo-American differences. For instance, both forms
may
represent pronunciations which are common in both varieties,
distinguished more by happenstance than by geography.
The pronunciations in CAAPR-A were
determined by consensus
of the following dictionaries: The Longman Pronunciation Dictionary,
the Merriam-Webster Collegiate Dictionary CD-ROM, and the American
Heritage Dictionary CD-ROM. In difficult cases, the Random
House
Unabridged Dictionary CD-ROM was also consulted. Similarly,
the
pronunciations of CAAPR-B were determined by consensus of the online
EPD, the Longman Pronunciation Dictionary and the Shorter OED
CD-ROM. In difficult cases, the Cambridge Pronunciation
Dictionary was
also consulted. The Longman Pronunciation Dictionary is the
most
precise of all these sources, and was often used to resolve issues
which could not be easily settled using the other, less technical
sources.
Each list has a similar
format. A typical entry from
the CAAPR-A and CAAPR-B lists looks like this:
wordplay : w&'dpLE·
The entry is divided by the delimiting string " : "
into the
traditional spelling and the CAAPR representation of a word.
In
some cases, like the following:
ill-use (n) : i'LyU's
ill-use (v) : i'LyU'z
different forms of the same word may be distinguished by a qualifying
word or phrase in parentheses.
The format of the CAAPR-C list is
similar. Here are
a few example lines:
combat (n) : kombat
combat (v) : k[ø|o]mbat *
Some entries may be followed by an asterisk, which indicates
that the American and British stress patterns for the word are
different. The programming which determines this is still
under
development, and the absence of the asterisk cannot be trusted to mean
that the two pronunciations are in fact similar in their
stress.
(Note that the CAAPR combined notation does not directly indicate
stress, due to technical difficulties.)
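The entry format described above is simple enough to read mechanically. The following is a minimal Python sketch, not part of the CAAPR distribution: the names `Entry` and `parse_entry` are hypothetical, and it assumes that a stress-difference asterisk is always separated from the representation by a space (which matters, because '*' is also a CAAPR symbol).

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    spelling: str              # traditional spelling, e.g. "ill-use"
    qualifier: Optional[str]   # text in parentheses, e.g. "n", or None
    representation: str        # the CAAPR string
    stress_differs: bool       # True if a trailing asterisk was present


# Headword, optionally followed by a parenthesised qualifier.
_HEADWORD = re.compile(r"^(?P<spelling>.*?)(?:\s*\((?P<qualifier>[^()]*)\))?$")


def parse_entry(line: str) -> Entry:
    """Split one list line at the " : " delimiter."""
    headword, sep, rest = line.strip().partition(" : ")
    if not sep:
        raise ValueError("no ' : ' delimiter in line: %r" % line)
    rest = rest.strip()
    # Only a detached trailing asterisk is treated as the stress flag.
    stress_differs = rest.endswith(" *")
    if stress_differs:
        rest = rest[:-2].rstrip()
    match = _HEADWORD.match(headword.strip())
    return Entry(match.group("spelling"), match.group("qualifier"),
                 rest, stress_differs)
```

For example, `parse_entry("combat (v) : k[ø|o]mbat *")` yields the spelling `combat`, the qualifier `v`, the representation `k[ø|o]mbat`, and a true stress-difference flag.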
The three CAAPR lists can be downloaded
using the
following links:
CAAPR is not intended to be used for
transcription of
continuous text. Nevertheless, it is useful at this point to
give
you an idea of the overall appearance of CAAPR, and I know no better
way to do this than to transcribe a short bit of prose. Here
is
the first paragraph of H.G. Wells' "The Star" (see here for a
plain English
version, among others) written in CAAPR-C. It may seem
cryptic at
first glance, but as one uses CAAPR, one rapidly becomes accustomed to
its conventions, and soon such passages present little mystery.
it w[u/o]z on Dø
f&ßst
dE øv Dø n!U yïr Dat Dý
ønWnsm°nt w[u/o]z mEd, ØLmOst
s[Y/i]m°LtEnÿøsLý
fr[u<o]m TrI
øbz&ßvøtòrý$,
Dat Dø
mOX°n øv Dø pLanît nept!Un,
Dý
WtRmOst øv ØL Dø pLanît$ Dat
µIL øbWt Dø sun, had bikum
verý iratik.
ø ritAßdEX°n in its
v3Losêtý
had b[i\I]n søspektîd in disembR. Den,
ø fEnt, rimOt spek øv LYt w[u/o]z
diskuvRþ in Dø rIj°n øv
Dø pRt&ßbþ pLanît. at
f&ßst Dis did not kØz ený
verý grEt
iksYtm°nt. sYøntifik pIp°L, hWevR,
fWnd Dý inteLîj°ns
rimAßkøb°L
inuf, Iv°n bifØß it bikEm nOn Dat
Dø n!U
bodý w[u/o]z rapîdLý grOiG
LAßjR and brYtR, and Dat its
mOX°n w[u/o]z kwYt dif°r°nt fr[u<o]m
Dý
ØßdRLý
pr[o|O]gr[ø<e]s
øv Dø pLanît$.
The rest of this page uses certain
notations to add
precision to the discussion. English words, used as examples,
are
enclosed in angle brackets, like <this>. CAAPR
representations of words are enclosed in double angle quotation marks (guillemets), like
«Di's». Individual CAAPR symbols or
symbol sequences
are enclosed in apostrophes, like 'sO'. Sampa phonemic
transcriptions are enclosed in slashes, like /soU/.
Individual
letters or short sequences from traditional spelling are generally
written without any punctuation, as in "the letter t" or "the sequence
ng".
As used in the CAAPR-A and CAAPR-B
lists, the CAAPR
notation is mostly phonemic, with certain non-phonemic notations
added. Because the sound repertoires for GA and RP are
different,
the same symbol will sometimes have a distinct (but related) meaning
for the two varieties. For both varieties of English, not all
speakers have exactly the same phonemes. CAAPR-A and CAAPR-B
target idealized speakers of GA and RP respectively. The GA
pronunciations are based on an idealized American who distinguishes
<which> and <witch>, <marry>
and <merry>, and
<cot> and <caught>, and for whom the two
vowels of
<above> are distinct, as are the two vowels of
<murder>. Similarly, the RP pronunciations are
based on an
idealized Briton who distinguishes <candid> and
<candied>,
and for whom the two vowels of <murder> are
distinct.
Speakers with fewer phonemes than the ideal can merge symbols as
necessary to represent their own speech.
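Merging symbols in this way amounts to a character-for-character substitution. As a small illustrative sketch (the function name and the merger table are my own assumptions, not part of CAAPR), an American speaker who does not distinguish <which> from <witch> could fold 'µ' into 'w':

```python
from typing import Mapping


def merge_symbols(caapr: str, merges: Mapping[str, str]) -> str:
    """Replace each single-character symbol in `merges` by its merged symbol."""
    return caapr.translate(str.maketrans(dict(merges)))


# Hypothetical merger table for a speaker without the /hw/ sound.
WHICH_WITCH_MERGER = {"µ": "w"}
```

With this table, `merge_symbols("µi'C", WHICH_WITCH_MERGER)` gives `wi'C`. Other mergers would need their own tables, and some of them (those that depend on the neighbouring sounds) cannot be expressed as a simple one-to-one substitution.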
Note that CAAPR is not suitable for use as a spelling system. Quite apart from its complexity, it often requires distinct spellings for a single sound, which cannot be resolved by referring to the speech of any particular speaker. From the perspective of a learner rather than of a linguist, the distinctions would seem quite arbitrary.
CAAPR uses a large repertoire of
symbols, including upper
and lower case alphabetic characters, punctuation, and letters with
diacritics. The symbols are organized into groups so that the
members of each group are somewhat similar, making it easier to master
the entire system. As with any complex system, there are
occasional
exceptions to this organization, as described below.
The symbol groups and their significance
are as follows:
The lower-case alphabetic letters. Each symbol in this group is assigned its natural English phonemic meaning. All the vowels are short. (Note that some letters, notably c, q and x, are omitted.)
Upper-case alphabetic letters, plus a few special symbols and punctuation characters. The symbols in this group are assigned meanings that are usually related in some fashion to the corresponding letter (or, in the case of symbols and punctuation, a letter they resemble in shape). Most of the English long vowels, diphthongs, and less common short vowels fall into this group.
Letters with a dieresis (such as
ë and
Ü). These generally indicate vowels or diphthongs
that occur
primarily before the letter r. There is usually a resemblance
in
sound to the unaccented letter.
Letters with a circumflex (such as
ê and
û). These generally indicate sounds which are
different
between American and British English, except for ê and
î,
which
indicate indistinct sounds within both American and British
English. There is usually a resemblance in sound to the
unaccented letter.
Letters with a grave accent (such as
è and
ò). These indicate sounds which not only differ
between
American and British English but are also differently
stressed.
When stressed, there is generally a resemblance in sound to the
unaccented letter.
The letters ý and Ý. These ought to be written as ŷ and Ŷ, but as these are not in the standard Latin-1 character set, the acute-accented y is used instead.
The special characters ',
·, $ and
þ. The first two characters are stress marks, and
the
latter two serve the purpose of identifying plural and past tense
inflections.
| Symbol | Sampa | Example | Applies to | Notes |
|---|---|---|---|---|
| a | { | ka't (cat) | Both | |
| ã | A~ | elã' (elan) | Both | (1) |
| A | A: | fA'Døß (father) | Both | (2) |
| b | b | bE'bý (baby) | Both | |
| C | tS | Ce'LO (cello) | Both | (3) |
| d | d | de'd (dead) | Both | (4) |
| D | D | Da't (that) | Both | |
| e | E, e | e'g (egg) | Both | |
| ë | e@ | bë'ß (bear) | Brit | (5) |
| E | eI | ka'nøpE (canape) | Both | (6) |
| f | f | fY'f (fife) | Both | |
| g | g | ga'g (gag) | Both | |
| G | N | si'GiG (singing) | Both | (7) |
| h | h | hO'm (home) | Both | |
| H | ~ | u'HuH (uh-uh) | Both | (8) |
| i | I | bi'g (big) | Both | (9) |
| ï | I@ | pï'ßs (pierce) | Brit | (10) |
| I | i: | møXI'n (machine) | Both | (11) |
| j | dZ | ju'j (judge) | Both | |
| J | Z | vi'J°n (vision) | Both | (12) |
| k | k | ki'k (kick) | Both | |
| K | x | Lo'K (loch) | Both | (1) |
| L | l | Li'Lý (lily) | Both | (13) |
| m | m | me'mbøß (member) | Both | |
| n | n | nu'n (none) | Both | |
| o | Q | to'p (top) | Brit | (14) |
| õ | o~ | kõ'nsýëßJ (concierge) | Both | (1) |
| ø | @ | sO'fø (sofa) | Both | (15) |
| O | oU, @U | rO'd (road) | Both | |
| Ø | O: | pØ'z (pause) | Both | |
| p | p | po'p (pop) | Both | |
| Q | OI | kQ'n (coin) | Both | (16) |
| r | r | rØ'riG (roaring) | Both | (17) |
| & | 3`, 3 | rif&'r°l (referral) | Both | (18) |
| s | s | sØ's (sauce) | Both | (19) |
| t | t | ti'Lt (tilt) | Both | (4) |
| T | T | Ti'k (thick) | Both | |
| u | V | fu'z (fuzz) | Both | (20) |
| U | u:, u | sU'p (soup) | Both | (21) |
| Ü | U@ | øbskyÜ'ß (obscure) | Brit | |
| v | v | va'lv (valve) | Both | |
| V | U, u | gV'd (good) | Both | (20), (21) |
| w | w | wE'wøßd (wayward) | Both | (22) |
| µ | hw, W | µi'C (which) | Amer | (23) |
| W | aU | frW'n (frown) | Both | |
| X | S | no'kXøs (noxious) | Both | (24) |
| y | j | yu'mý (yummy) | Both | |
| Y | aI | Y's (ice) | Both | |
| z | z | zi'gzag (zigzag) | Both | (19) |
Notes:
(1) 'ã', 'õ' and 'K' represent non-English sounds that are used by some speakers in a few words. Unlike the FEWL list, the online EPD shows pronunciations for these words that do not use these sounds. I have resolved this difficulty by including in each CAAPR list two distinct pronunciations for these words, one using the unusual sounds, and one using only everyday English sounds. While the dictionaries clearly prefer the foreign pronunciations for these words, I've encountered very few individuals who actually use them. (But it must be admitted that most of these words are read far more commonly than they are spoken.)
(2) Most occurrences of the 'A' sound in American English are spelled with 'o' in CAAPR-A, as described here.
(3) Technically, the ch sound represented by 'C' is the juxtaposition of the two sounds 't' and 'X'. When these sounds occur in different components of a compound word (such as <potshot>), they are spelled in CAAPR with 'tX' rather than with 'C': the CAAPR representation of <potshot> is «po'tXot».
(4) When the sound of /d/ or /t/ occurs as a past tense ending (as in <walked> and <crawled>), the symbol þ is used in its place («wØ'kþ» and «krØ'Lþ»), as described here.
(5) The 'ë' sound is generally found only in British English, though it may occur in American English for Americans who pronounce <Mary> differently from <merry>. The 'ë' symbol is used in American CAAPR in many words where the British pronunciation would be represented as 'ë', as described here.
(6) At first glance, use of the symbol 'E' for the English long a may seem highly unnatural. I find it easy to get used to because of the similarity between the long a and the short e sound represented by 'e'. Use of 'E' rather than 'A' for this sound is also unsurprising for those familiar with IPA or Romance languages.
(7) Most mixed-case phonemic notations use the symbol 'N' rather than 'G' for this sound. I have chosen to use 'G' because I find the spelling «si'GiG» for <singing> easier to read correctly at first glance than «si'NiN».
(8) In my particular dialect of American English, there are three words with a nasalized short u: <huh>, <uh-huh> and <uh-uh>. In CAAPR-A, these words are represented by «hu'H», «u'HhuH» and «u'HuH». These words are not in the CAAPR-B list, but the same spellings could be used if these words are pronounced similarly in RP.
(9) In many words, the /I/ sound is represented in CAAPR by 'ê' rather than 'i', as described here. In British English, /I/ may also be represented by 'ý', as noted below.
(10) The 'ï' sound is generally found only in British English. The 'ï' symbol is used in American CAAPR in many words where the British pronunciation would be represented as 'ï', as described here.
(11) At first glance, use of the symbol 'I' for the English long e may seem highly unnatural. I find it easy to get used to because of the similarity between the long e and the short i sound represented by 'i'. Use of 'I' rather than 'E' for this sound is also unsurprising for those familiar with IPA or Romance languages. In American CAAPR, an unstressed 'I' sound will generally be written as 'ý', as noted below.
(12) Most mixed-case phonemic notations use the symbol 'Z' rather than 'J' for this sound. This is mostly a question of taste; «kØrsA'J» is probably a better notation than «kØrsA'Z» for <corsage>, but «vi'Z°n» is probably superior to «vi'J°n» for <vision>.
(13) The symbol for /l/ really ought to be 'l', but in many fonts it is impossible to tell a lower-case l from an upper-case I. Using 'L' for this sound represents a triumph of practicality over principle.
(14) The 'o' sound is found only in British English, as the standard short o sound. The 'o' symbol is used in place of 'A' in American CAAPR in most words where the British pronunciation would be represented as 'o', as described below.
(15) The 'ø' symbol designates the schwa, treated by CAAPR as distinct from both the short u ('u') and the stressed vowel of <bird> ('&'). In many words, the schwa is represented in CAAPR by 'ê' rather than 'ø', as discussed later.
(16) Using 'Q' to represent a vowel does take a little bit of getting used to. I find it helps to think of the tail of the 'Q' as an I attached to an O with an unusually placed ligature.
(17) The 'r' symbol is used for the consonant r. In British English, it is used only when an /r/ is always pronounced. Otherwise, the symbol 'ß' is used, as described here.
(18) The symbol '&' represents the sound /3`/ in American CAAPR, but /3/ in British CAAPR. This means that <bird> is represented as «b&'d» in American CAAPR, but as «b&'ßd» in British CAAPR. As will be seen, when CAAPR is used as a combined notation, the British usage prevails. The character '&' was chosen for its visual resemblance to a capital R.
(19) When the sound of /s/ or /z/ occurs as a plural ending (as in <walks> and <crawls>), the symbol $ is used in its place («wØ'k$» and «krØ'L$»), as noted below.
(20) Those familiar with IPA and Sampa might expect 'u' to represent the /U/ sound, and 'V' to represent the /V/ sound. While this would indeed be logical, I find 'u' for /V/ to be considerably more natural and readable, especially given the prevalence of this sound compared to /U/.
(21) The Longman pronunciation dictionary uses the symbol /u/ as a "neutralization" of /u:/ in many words, such as <situation> and <regulate>. In American CAAPR, I have transcribed this vowel as 'U', which is in agreement with the popular American dictionaries. But for British English, I represent it as 'V', which is consistent with popular British dictionaries. When CAAPR is used as a combined notation, the British usage prevails.
(22) Some occurrences of the 'w' sound in British English are spelled with 'µ' in CAAPR, as described here.
(23) The 'µ' sound is found only in American English, in association with the digraph wh. (Even though most Americans do not use this sound, American CAAPR is oriented towards an ideal speaker who does.) The 'µ' symbol is used in British CAAPR in most words where the American pronunciation would be represented as 'µ', as noted below.
(24) Most mixed-case phonemic notations use the symbol 'S' rather than 'X' for this sound. I have chosen to use 'X' because I find spellings such as «pre'Xøs» rather than «pre'Søs» for <precious> to be easier to read correctly at first glance.
Both of the English varieties targeted
by CAAPR have some unique sounds
whose use is generally predictable from the spelling of the words
which contain it. For instance, the sound designated by CAAPR
'µ' does not occur in British English, but one can predict,
in
almost all cases, that a word pronounced with a /w/ in British English,
but spelled with wh, will be pronounced as 'µ' in American
English (at
least by those Americans who use that sound). I call the
symbols
with this property ortho-phonemic,
as they have phonemic significance in one variety, but orthographic
significance in the other variety.
The use of the ortho-phonemic symbols in
CAAPR brings the
American and British spellings closer to one another, in a way that
makes sense even for speakers not familiar with the other variety.
The ortho-phonemic symbols are:
| Symbol | Sampa | Example | Variety | Spelling | Notes |
|---|---|---|---|---|---|
| ë | E | bë'r (bear) | Amer | air, ar, are, ear, eir | (1) |
| ï | I | pï'rs (pierce) | Amer | ear, eer, er, ere, ier | (2) |
| o | A: | to'p (top) | Amer | o, qua, wa, en | (3) |
| ß | (r) | rØ'ß (roar) | Brit | r, final or before cons. | (4) |
| µ | w | µi'C (which) | Brit | wh | (5) |
Notes:
(1) Most Americans pronounce words which in RP are pronounced with the /e@/ diphthong with a simple /E/, normally written in CAAPR as 'e'. Examples of such words are <fair>, <Mary>, <spare>, <bear> and <their>. American CAAPR represents the combination /Er/ with 'ër' when the regular spelling uses one of the forms listed in the table above.
(2) Most Americans pronounce words which in RP are pronounced with the /I@/ diphthong with a simple /I/, normally written in CAAPR as 'i'. Examples of such words are <fear>, <beer>, <serious>, <here> and <pierce>. American CAAPR represents the combination /Ir/ with 'ïr' when the regular spelling uses one of the forms listed in the table above.
(3) Most Americans pronounce words which in RP use the short o vowel /Q/ with /A:/, normally written in CAAPR as 'A'. Examples of such words are <stop>, <qualify>, <wander> and <entree>. American CAAPR represents /A:/ with 'o' when the regular spelling has one of the forms listed in the table above.
(4) RP is a non-rhotic form of English. This means that the letter r is generally not pronounced when it is followed by a consonant, or at the end of a word. (The consonant may, however, be spoken in speech at the end of a word when the next word begins with a vowel.) For British English, these suppressed r's are represented in CAAPR by the letter 'ß', as in «fAß» (far) or «sØßd» (sword). This symbol was chosen for its visual resemblance to a capital R. (In fact, the symbol 'R' could have been used, but I think an unusual character is better at communicating the unique nature of the construct.) Note that CAAPR indicates pronunciation of words, not of larger units, so the phrase "here and there" would be written in CAAPR as «hïß ønd Dëß», even though the sequence of sounds would be more like «hïrønDëß».
(5) Britons generally pronounce words which Americans might
pronounce with the /hw/ sound with a simple /w/, normally written in
CAAPR as 'w'. Examples of such words are
<where> and
<awhile>. British CAAPR represents the sound
/w/ with
'µ' when the regular spelling uses wh.
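Because each ortho-phonemic symbol has a plain phonemic reading in the variety where it is merely orthographic, a CAAPR-A or CAAPR-B string can be reduced to that reading by simple substitution. The sketch below is only an illustration of the table above: the function name and the `"A"`/`"B"` variety codes are my own assumptions, and dropping 'ß' gives the word as spoken in isolation, ignoring any linking r before a following vowel.

```python
# American readings of the ortho-phonemic symbols (per the table above).
_AMERICAN = str.maketrans({"ë": "e", "ï": "i", "o": "A"})
# British readings: 'ß' is a suppressed r (deleted here), 'µ' is a plain 'w'.
_BRITISH = str.maketrans({"ß": None, "µ": "w"})


def resolve_ortho_phonemic(caapr: str, variety: str) -> str:
    """Replace ortho-phonemic symbols by their plain phonemic equivalents.

    `variety` is "A" for a CAAPR-A string or "B" for a CAAPR-B string.
    """
    if variety == "A":
        return caapr.translate(_AMERICAN)
    if variety == "B":
        return caapr.translate(_BRITISH)
    raise ValueError("variety must be 'A' or 'B'")
```

Note that the two tables are deliberately different: 'ë', 'ï' and 'o' are genuine phonemes in British CAAPR and 'µ' is a genuine phoneme in American CAAPR, so they are left alone there.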
CAAPR uses a number of additional
non-phonemic
symbols for various purposes. These symbols are listed in the
table below, and explained in the following notes.
| Symbol | Sampa | Type | Example | Notes |
|---|---|---|---|---|
| ê | ø, I, 1 | Indef. sound | ma'gnêt (magnet) | (1) |
| ý | I, i, i: | Indef. sound | ha'pý (happy) | (2) |
| ÿ | I, i, y | Indef. sound | prI'vÿøs (previous) | (3) |
| ° | (ø) | Optional sound | ma'jik°Lý (magically) | (4) |
| * | (ø) | Optional sound | tY'*L (tile) | (5) |
| ¹ | (ø), (I), (1) | Optional sound | kri'm¹n°L (criminal) | (6) |
| þ | d, t | Morpheme | dra'gþ (dragged) | (7) |
| $ | s, z | Morpheme | dru'g$ (drugs) | (8) |
| ' | " (or ') | Stress | øLY'v (alive) | (9) |
| · | % (or ,) | Stress | do·mênE'X°n (domination) | (9) |
Notes:
(1) One of the features of both GA and RP is the "indistinct i". Consider the word <magnet>. Some people pronounce this as «ma'gnit», while others pronounce it as «ma'gnøt». Some speakers may use either pronunciation at random. In such cases, I use the term "indistinct i" to describe the vowel. It is not a distinct sound: it is always pronounced as /@/ or /I/ (or according to some authorities as /1/). The indistinct i is represented in CAAPR by the symbol 'ê'.
(2) Many English words are spelled with a final y used as a vowel, as in <many> and <quality>. American dictionaries generally show the sound of this vowel as an unstressed long e. British dictionaries, on the other hand, often show it as a short i. The Longman dictionary uses the symbol /i/ to represent it. For British English, /i/ is distinguished from /i:/ by length as well as stress, while for American English, only the stress difference is apparent. CAAPR uses the symbol 'ý' where Longman uses /i/. One difference is that, for American English only, where an unstressed /i:/ occurs, it is spelled in CAAPR as 'ý' rather than 'I'. This mostly affects words ending in /i:z/ such as <rabies>, which rhymes with <babies> for Americans. <rabies> is spelled «rE'býz» in American CAAPR, but «rE'bIz» in British CAAPR.
(3) The symbol 'ÿ' is a variant of 'ý', representing a sound which may either be one of the vowel sounds of 'ý', or the consonant 'y'. The Longman dictionary uses the symbol /i/, linked to the following sound by a tie, to represent it. The sample word <previous> is typical of the words where it occurs, as the word might be pronounced either «prI'výøs» or «prI'vyøs».
(4) The symbol '°' (a superscript 0) is an alternate form of 'ø', indicating a schwa sound which may be omitted. '°' will be followed by one of the liquid sounds 'L', 'm', 'n', 'r' or 'ß'. When '°' is the last vowel of a word, or when it is followed by two consonants, it indicates that the following consonant may be syllabic, as in «ba't°L» or American «pO'k°r». The Longman dictionary uses a raised schwa symbol, indicating a sound ordinarily omitted but sometimes spoken, in this situation. The '°' notation is unusual in that CAAPR uses this notation if any of its source dictionaries show the sound as optional rather than insisting on consensus. The notation was chosen due to its suggestion of a tiny 'ø' (or of Longman's raised schwa).
(5) The '*' symbol is an alternate form of '°', occurring in certain situations where the possible schwa sound is often considered to be an interpolation, generally not indicated by the traditional spelling. The main problem it addresses is that many speakers will insert a schwa sound between a long vowel and an 'L' or 'r', a phenomenon Longman calls "breaking". For instance, I pronounce <boil> and <royal> as a rhyme, «bQ'°L» and «rQ'°L» respectively. Others may pronounce them both without the schwa, or may pronounce only <royal> with the schwa. I feel it is useful to somehow distinguish the CAAPR notations for these two words, and so use «bQ'*L» for <boil>, but «rQ'°L» for <royal>. Use of '*' in this fashion after a vowel is the most common use, but it may also appear after a consonant, as in «k&'r*L» for <curl> (American) or «re's*LiG» for <wrestling>. The notation was chosen for its similarity to '°'. Distinguishing '*' from '°' is probably only useful when CAAPR is being used as the basis for a spelling system.
(6) The symbol '¹' (a superscript 1) is an alternate form of 'ê', indicating an indistinct i sound which may be omitted. It is probably easiest to think of it as indicating a choice between '°' and 'i'. As with '°', CAAPR will use this notation in place of 'ê' if justification is found in any of its source dictionaries. The notation was chosen due to its suggestion of a tiny 'i'.
(7) The symbol 'þ' is used in CAAPR to represent a regular past-tense inflection, spelled as -d or -ed in standard spelling. The pronunciation is either /t/ or /d/: /t/ if preceded by a voiceless consonant, or /d/ otherwise. 'þ' is also used for words derived from past tenses, such as <confusedly> («kønfyU'zêþLý»), and words where an -ed suffix is applied to a noun, as <jeweled> («jU'øLþ»). The symbol 'þ' was chosen here because of its similarity in appearance to a capital D, and its phonetic association with t.
(8) The symbol '$' is used by CAAPR to represent a regular plural or possessive inflection, spelled as -s, -es or 's in standard spelling. The pronunciation is either /s/ or /z/: /s/ if preceded by a voiceless consonant, or /z/ otherwise. '$' is used with Latin or Greek plurals ending with the /z/ sound, even though the form is irregular, as in <diagnoses> («dY·øgnO'sI$»). '$' is also used for words derived from plurals or possessives, such as <salesman> («sE'L$møn»). The symbol '$' was chosen here because of its similarity in appearance to a capital S.
(9) In both CAAPR-A and CAAPR-B, stress is marked.
The marks appear after the vowel letter. ''' indicates
primary
stress, and '·' indicates secondary stress. For
both
CAAPR-A and CAAPR-B, the placement of stress represents a lexicographic
consensus. But there is an interesting issue here.
American
dictionaries and British dictionaries use different and rather
incompatible systems for representing stress. Consider the
words
<specify>, <newspaperman>,
<predisposition> and
<everyday>. The American and British
pronunciations are
essentially the same, but the consensus American stress is
«spe'søfY·»,
«nU'zpE·p°rma·n»,
«prI·di·spøzi'X°n»
and
«e'vrýdE'», while British dictionaries
assert
«spe'sêfY»,
«nyU'zpEpøßman»,
«prI·dispøzi'X°n» and
«e'vrýdE». These systematic
differences in
representation make it more or less impossible to come up with an
accurate picture of the stress differences between the two varieties,
and for this reason, CAAPR-C omits stress marking altogether.
Exactly how to place stress marks is
controversial. Apparently, the best regarded technique is to
show
stress before the start of the syllable. I reject this for
CAAPR
simply because it makes things harder. It requires that
syllable
boundaries in words be established, which is not otherwise
required. Further, it is no small task, and one about which
various
authorities disagree. I believe that for CAAPR the only
practical
approach is to place the mark adjacent to the vowel which is stressed.
Even here, there is controversy as to whether the
mark is
best placed before or after the vowel. I prefer to place it
after, but don't regard my reasons to be so compelling as to spend time
justifying this decision. Note that CAAPR shows stress in one
syllable words, except for weak forms of words like <the>
and
<of>. This aids computer processing and
transformation of
CAAPR text.
(Just as a curiosity, I observe that the actual definition of X-Sampa calls for the use of the characters /"/ and /%/ to represent stress. I've never actually seen this done: in my experience, the symbols /'/ and /,/, which closely resemble the equivalent IPA symbols, are used instead. This explains the strange notation in the Sampa column of the table above.)
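Since stress marks always follow their vowel, and 'þ' and '$' are resolved by the voicing of the preceding sound, both conventions are easy to process by program. The following Python sketch shows one way to do so under stated assumptions: the helper names are hypothetical, and the set of voiceless consonant symbols is my own reading of the Sampa column of the main symbol table rather than a list taken from CAAPR itself.

```python
from typing import List, Tuple

STRESS_MARKS = {"'": "primary", "·": "secondary"}

# Assumed voiceless consonants: /p t k f T s S tS x h/ in CAAPR spelling.
VOICELESS = set("ptkfTsXCKh")


def stressed_vowels(caapr: str) -> List[Tuple[int, str, str]]:
    """Return (index, symbol, stress level) for each stress-marked symbol."""
    return [(i - 1, caapr[i - 1], STRESS_MARKS[ch])
            for i, ch in enumerate(caapr)
            if ch in STRESS_MARKS and i > 0]


def resolve_morphemes(caapr: str) -> str:
    """Replace 'þ' by t/d and '$' by s/z according to the preceding sound."""
    out = []
    last_sound = ""  # most recent symbol that is not a stress mark
    for ch in caapr:
        if ch in "þ$":
            voiceless = last_sound in VOICELESS
            if ch == "þ":
                ch = "t" if voiceless else "d"
            else:
                ch = "s" if voiceless else "z"
        if ch not in STRESS_MARKS:
            last_sound = ch
        out.append(ch)
    return "".join(out)
```

For instance, `resolve_morphemes("wØ'kþ")` gives `wØ'kt` while `resolve_morphemes("krØ'Lþ")` gives `krØ'Ld`, and `stressed_vowels("do·mênE'X°n")` reports a secondary stress on 'o' and a primary stress on 'E'.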
CAAPR-C is the combined CAAPR notation,
which attempts to
merge the American and British spelling for each word, producing a
reasonable composite. The process works as follows.
First, stress marks, which are not used
in CAAPR-C, are
dropped. Next, if the CAAPR-A transcription uses the
'&'
symbol, it is replaced by '&r'. Then, if the
remaining
transcriptions are identical (as for the word <soggy> -
«sogý»), this is the CAAPR-C
representation.
If the revised transcriptions are not identical, then corresponding
characters which are different are collected into a bracketed pair,
first the American version, and then the British one. For
instance, consider the word <forecast>. The
American
«fØrkast» and the British
«fØßkAst» are combined into
«fØ[r,ß]k[a,A]st».
This may possibly be
the end of it, but usually it is not. In many cases, this
combined transcription will contain pairs which are common enough that
there are rules for replacing them with a single letter. For
<forecast>, we have two pairs, [r,ß] and
[a,A].
Almost
always, a British 'ß' will be paired with an American 'r',
and
the symbol 'ß' will be used as the combined
representation.
This reduces <forecast> to the string
«fØßk[a,A]st». But
the combination [a,A]
is also very frequent, occurring in words like <bath>,
<class>, <shaft>, etc. For this
reason, the
combination is given the representation 'â' in
CAAPR-C. So
the final CAAPR-C version of <forecast> is
«fØßkâst».
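The steps just described can be sketched in a few lines of Python. This is an illustration only, not the program actually used to build CAAPR-C: the function names are hypothetical, only the two reduction rules mentioned above are included, and the alignment of the two transcriptions is delegated to the standard `difflib` module, which is an assumption of mine (the text does not say how words of different lengths are aligned). One consequence is that adjacent differing characters are grouped into a single pair.

```python
from difflib import SequenceMatcher

STRESS_MARKS = "'·"

# Only the two reduction rules described in the text; the real list is longer.
PAIR_RULES = {"[r,ß]": "ß", "[a,A]": "â"}


def strip_stress(caapr: str) -> str:
    """Drop primary and secondary stress marks."""
    return "".join(ch for ch in caapr if ch not in STRESS_MARKS)


def combine(american: str, british: str) -> str:
    """Merge a CAAPR-A and a CAAPR-B transcription into a combined form."""
    a = strip_stress(american).replace("&", "&r")  # American '&' becomes '&r'
    b = strip_stress(british)
    if a == b:
        return a
    parts = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(a[i1:i2])
        else:
            # Differing stretch: American version first, then British.
            parts.append("[%s,%s]" % (a[i1:i2], b[j1:j2]))
    combined = "".join(parts)
    for pair, symbol in PAIR_RULES.items():
        combined = combined.replace(pair, symbol)
    return combined
```

Applied to the <forecast> example, `combine("fØrkast", "fØßkAst")` first builds «fØ[r,ß]k[a,A]st» and then reduces it to «fØßkâst»; pairs for which no rule is listed are left in brackets.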
If there are any bracketed pairs that
cannot be reduced to
a single symbol in this fashion, CAAPR-C allows the comma between
the symbols of the pair to be replaced by a character indicating
whether one of the pronunciations indicated may be more generally
recognizable than the other. This process and the additional
symbols it uses are described in a later
section.
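Whatever separator is used, a bracketed pair still lists the American form first and the British form second, so the two readings can be recovered mechanically. The sketch below assumes that the separator is one of the characters `,`, `|`, `/`, `\`, `<` or `>`; this set is inferred from the sample passage earlier on this page (plus '>' by symmetry) and is not an authoritative list. It ignores what the separator means, and it does not expand single synthetic symbols such as 'â'.

```python
import re
from typing import Tuple

# "[american SEP british]"; the American side may be empty, as in "[,y]".
_PAIR = re.compile(r"\[([^\[\],|/\\<>]*)[,|/\\<>]([^\[\]]*)\]")


def expand_pairs(combined: str) -> Tuple[str, str]:
    """Return the (American, British) readings of a bracketed CAAPR-C string."""
    american = _PAIR.sub(lambda m: m.group(1).strip(), combined)
    british = _PAIR.sub(lambda m: m.group(2).strip(), combined)
    return american, british
```

For example, `expand_pairs("k[ø|o]mbat")` returns `("kømbat", "kombat")`.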
Stress information is dropped from
CAAPR-C because of the
incompatible systems used in CAAPR-A and CAAPR-B. However,
the
process of determining the composite CAAPR-C representation will
usually notice if the stress has changed in a significant way; these
words are marked in the list with an asterisk. About 1 in
every
40 words is marked like this.
This process introduces a new class of
CAAPR symbols,
which I call "synthetic" symbols, as they represent a synthesis of an
American and a British pronunciation. Some of the symbols
(such
as 'ß') are extended in meaning in a natural way, while
others,
like 'â', are new symbols introduced explicitly to represent
a
common pair.
The following table defines the CAAPR-C
synthetic symbols
in terms of the corresponding pairs of symbols they replace:
| Symbol | Replaces | Example | Notes |
| â | [a,A] | kLâs (class) | |
| 3, ê, î | A mixture of i, ê and ø, unstressed | paL3t (palate), sIkrêt (secret), bLaGkît (blanket) | (1), (2) |
| è | [e or ë, ø or ° or {no sound}] | sekrêtèrý (secretary) | (3) |
| ¹ or ³ | A mixture of i, ê, ¹, ø, ° or {no sound}, unstressed | fert¹LYz (fertilize), kuz³n (cousin) | (1) |
| ! | [,y] before U or V or Ü or ø | d!Utý (duty) | (4) |
| ô | [Ø,o] | krôs (cross) | |
| ò | [Ø,ø] | mandøtòrý (mandatory) | (3) |
| ° or * | A mixture of ø, ° (or *), or {no sound} | Epr°n (apron) | (1) |
| R | [°r, øß] | piCR (pitcher) | (5) |
| ß | [r,ß] | mAßk (mark) | (1) |
| ü | [&,u] | würý (worry) | |
| û | [ø,V] | regyûLR (regular) | |
| Ü | [V,Ü] | pyÜß (pure) | (1) |
| V | [U,V] before vowel | v&ßCVøs (virtuous) | (6) |
| ÿ | A mixture of y, ÿ or ý | yUnÿøn (union) | (1) |
| Ý | [ê or ø, Y or Y*] | ØßgønÝzEX°n (organization) | |
Notes:
(1) For these symbols, the synthetic meaning is simply a generalization of their phonemic, ortho-phonemic or morphemic meaning as described above.
(2) In CAAPR-C, the symbol used for the indistinct i depends on how indistinct it is. If one English variety (usually the British) uses 'i' and the other is indistinct, the synthetic symbol 'î' is used. If one variety (again, usually the British) uses 'ø' and the other is indistinct, the synthetic symbol '3' is used. In the remaining cases, where both varieties are indistinct, or one uses 'ø' and the other uses 'i', the symbol 'ê' is used, as in CAAPR-A and CAAPR-B. Of course, it can be argued that one symbol, 'ê', would do for all of them, and there is something to this. But one could also regard the 'î' as "almost an i", and '3' as "almost a schwa", and CAAPR allows you to choose whichever simplification you prefer. I'm not thrilled with using a digit as a representation symbol here, as CAAPR has otherwise managed to avoid this, but it suggests the IPA "turned e" (ə), and looks better than any alternative I've come up with. Similarly, the symbol ³ is used as an alternate form of ¹ when the schwa (or absent) pronunciation dominates the short i.
(3) The grave accented symbols 'è' and 'ò' are noteworthy in that their use always implies a stress change. In words like <secretary> in which they occur, the syllable is stressed in GA, but unstressed in RP. In fact, in RP, the vowel is sometimes known to disappear entirely.
(4) The '!' symbol was chosen here for its resemblance to an upside-down i, because some authorities regard the 'yU' combination as in fact an /iu:/ diphthong.
(6) As noted above, where Longman uses a representation of /u/, CAAPR-A uses the symbol 'U', while CAAPR-B uses 'V'. In CAAPR-C they are recombined as V when preceding a vowel. If [U,V] occurs before a consonant, it is possible that one or both of the vowels was different from /u/, and therefore the pair cannot be reduced.
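The rule for the indistinct i given in the notes above amounts to a small decision table, which might be sketched as follows. The function name `synthetic_indistinct_i` is hypothetical, and the sketch assumes that each variety's vowel has already been identified as one of 'i', 'ê' (the indistinct i) or 'ø'.

```python
def synthetic_indistinct_i(american, british):
    """Pick the CAAPR-C symbol for an unstressed vowel.

    Assumption: each argument is one of 'i', 'ê' (indistinct i) or 'ø'.
    The order of the two arguments does not affect the result.
    """
    pair = {american, british}
    if len(pair) == 1:
        # Both varieties agree (including both indistinct): no synthesis needed.
        return american
    if pair == {"i", "ê"}:
        return "î"  # "almost an i"
    if pair == {"ø", "ê"}:
        return "3"  # "almost a schwa"
    if pair == {"i", "ø"}:
        return "ê"  # one variety has i, the other schwa
    raise ValueError("expected symbols from 'i', 'ê', 'ø'")


print(synthetic_indistinct_i("ê", "i"))  # î
print(synthetic_indistinct_i("ê", "ø"))  # 3
print(synthetic_indistinct_i("ø", "i"))  # ê
```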
About 1 in 14 words in the CAAPR-C list contain symbol pairs that cannot be reduced to synthetic symbols. Without the use of the synthetic symbols, the percentage of differences would be very much higher.
Most English words have a CAAPR-C
representation without
any symbol pairs, meaning that their British and American
pronunciations differ only in the typical ways cataloged by the
synthetic symbols above. A small number of words, however,
have
differences whose low frequency makes it impractical to define single
symbols for them. If one is seeking the holy grail of an
orthography that will have a single workable spelling for all of
English, then one naturally asks of such words whether there is
additional information that would allow one to choose between the two
incompatible pronunciations. The answer, as it happens, is
"Maybe".
It may happen that, in one of these words, one or both of the
pronunciations has some international recognition. Here are some
simple words illustrating the possibilities:
<byproduct> - written
in CAAPR-C as
«bYprod[ø,u]kt», that is, the consensus
American
pronunciation is «bYprodøkt», and the
consensus
British pronunciation is «bYprodukt».
However, the
British Shorter OED gives the primary pronunciation
«bYprodøkt», and the American
Merriam-Webster gives
the primary pronunciation «bYprodukt». So
it would
appear that both pronunciations are commonly used on both sides of the
Atlantic. In some sense, the pronunciations are equally
recognizable, or equivalent.
<clerk> - written in
CAAPR-C as
«kL[&,A]ßk». This is
the exact opposite of
the situation above - the American dictionaries do not list the British
pronunciation, and vice versa. One might call the
alternatives
incompatible.
<version> - written in
CAAPR-C as
«v&ß[J,X]°n».
This is intermediate
between the two previous cases.
«v&X°n» is
shown as acceptable by some American dictionaries, and
«v&ßJ°n» by some British
dictionaries, but in
neither case is it shown as the primary pronunciation. In
this
situation, I refer to the pronunciations as "weakly equivalent".
<mushroom> - written
in CAAPR-C as
«muXr[U,V]m». The British EPD dictionary
shows
«muXrUm» as the primary pronunciation, but none of
the
American dictionaries I consulted do the same for
«muXrVm». In this case, the American form
dominates
the British one, which is to say that it appears that the American form
is more acceptable in British English than vice versa.
<from> - written in CAAPR-C as «fr[u,o]m». This is a weaker form of the situation above. Several American dictionaries recognize «from» as an alternate (but not primary) American pronunciation, but no British dictionary I've seen gives such acceptance to «frum». «from» is an acceptable, but minority, American form. I would say that the British pronunciation "weakly dominates" the American one in this case.
The CAAPR-C list embellishes the
representations of words
like these by using another symbol in place of the comma within pairs
to indicate equivalence or dominance of the two pronunciations, as
follows:
The symbol '=' indicates equivalence; thus, the embellished CAAPR-C form of <byproduct> is «bYprod[ø=u]kt».
The symbols '<' and '>' indicate dominance. The symbols have their mathematical sense. The word <mushroom> is represented as «muXr[U>V]m», showing the American pronunciation is dominant.
The symbol '|' indicates weak
equivalence. The
embellished representation of <version> is
«v&ß[J|X]°n».
The symbols '/' and '\' indicate weak dominance. The symbol leans to the side which dominates. Thus, the word <from> is represented as «fr[u/o]m», showing the weak dominance of the British pronunciation.
The symbol '?' indicates incompatibility. <clerk> is represented as «kL[&?A]ßk», reminding us that there's no pleasing everyone.
Of course, these embellishments can and should be ignored by those uninterested in this additional distributional information, and in general when I cite CAAPR spellings, I use comma separators except in cases where the embellishments are of interest.
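For anyone processing the list by program, the embellished pairs can be read with a short regular expression. The following Python sketch is only one way of doing it, and rests on some assumptions: the names `relations` and `strip_embellishments` are hypothetical, and the meanings given for '<' and '\' are my reading of the rules above as mirror images of '>' and '/' (the American form is always written first in a pair).

```python
import re

# Separator character -> relation between the (American, British) forms.
# The entries for '<' and '\' are inferred as mirror images of '>' and '/'.
RELATIONS = {
    ",": "unspecified",
    "=": "equivalent",
    "|": "weakly equivalent",
    ">": "American dominates",
    "<": "British dominates",
    "\\": "American weakly dominates",
    "/": "British weakly dominates",
    "?": "incompatible",
}

# A bracketed pair: American form, one separator character, British form.
PAIR = re.compile(r"\[([^\]]*?)([,=|<>\\/?])([^\]]*)\]")


def relations(word):
    """List (american, british, relation) for every bracketed pair in word."""
    return [(m.group(1), m.group(3), RELATIONS[m.group(2)])
            for m in PAIR.finditer(word)]


def strip_embellishments(word):
    """Rewrite every pair with a plain comma, discarding the extra information."""
    return PAIR.sub(lambda m: f"[{m.group(1)},{m.group(3)}]", word)


print(relations("kL[&?A]ßk"))                 # [('&', 'A', 'incompatible')]
print(strip_embellishments("bYprod[ø=u]kt"))  # bYprod[ø,u]kt
```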
Version 2 of CAAPR differs from version
1 in two important
regards. The more important of the two is that the CAAPR-B
list
has been revised to mark use of the indistinct i, as well as using the
plural and past tense symbols '$' and 'þ'. This
makes the A
list and the B list equivalent in terms of the amount and style of
information presented.
The other change consists of enhancements to the notation itself. Most of the enhancements relate to finer classification of indistinct, unstressed sounds (the symbols '°', '*', '¹', '³', 'ÿ', '3', 'î' and 'R'). Also, the embellished symbol pair notation was introduced, and the two symbols 'à' and 'Ÿ' were dropped, the former because there was no meaningful distinction from 'è', and the latter as a side-effect of the introduction of 'R' («LYR» is a better representation of <liar> than «LŸß»).
It may be questioned whether the increased precision in the marking of indistinct sounds is really a good thing. Does the difference between «E'prøn» and «E'pr°n» really matter? One answer is that some folks think it does, and will argue with you for as long as you want about whether «je'nørøL» or «je'nrøL» is more correct. But I think the real benefit of this degree of precision is an ironic one: it emphasizes how much uncertainty and variance there is in the pronunciation of unstressed sounds. I developed CAAPR in the hopes it would be useful to spelling reformers. One of the problems with many spelling reforms is that they end up reflecting the minutiae of their inventor's dialect, setting forth as certain that one of »rabit« or »rabut« is correct, and the other a blatant error. CAAPR's rather thorough demonstration of the uncertainty of English pronunciation serves as a persuasive argument for abandoning the pure phonetic principle for the spelling of unstressed syllables. I think this lesson is an important one, and that any practical reformed spelling system for world English must take it seriously.
One other area in which I have enhanced version 2 of CAAPR is that, despite the internationality of the CAAPR-C notation, the version 1 CAAPR-C list included only traditional American spellings. That is, it had an entry for <color>, but not for <colour>. This flaw has been remedied in version 2.
Note that as of March, 2007, I have changed the format of the lists slightly, to
make it easier to transfer their content into Microsoft Excel.
Version 1 of the CAAPR list included a
number of
"signature words", whose CAAPR representation deviated from the rules,
generally in order to give greater consistency to the spelling of
related words. Except as the result of errors, this version
contains no such words. CAAPR is not intended as a spelling
system, and the inconsistencies of the language itself as well as of
the sources of CAAPR should not be obscured. There are good
reasons to spell <princess> and <duchess>
with the same
ending in a practical orthography, but it seems best to leave the data
alone, and represent them as «pri'nses» and
«du'Cis» in CAAPR-B, which after all is a reference
notation and not a spelling system.
Because this version of the CAAPR
notation is more
complicated than the previous version, I will continue to make the
version 1 lists downloadable here.
I note that, in addition to its advantage of simplicity, the version 1
CAAPR-B list takes no account of the "indistinct i", which may be of
use to those who doubt its existence.
Though this version of CAAPR has been
thoroughly
proofread, it is still likely to contain errors and other
faults.
Thus, you should inform me when you encounter errors, whether
isolated or systematic. If you discover ways in which CAAPR
could
be changed to improve its usefulness, I'd also like to hear of
them. Sometimes I suspect that my work on this site is no
more
than talking to myself in public. If this is not so, and
there
are ways I can make my forays into dictionary building more generally
useful, it would be a shame if no one bothered to tell me.
To
comment on this page,
e-mail Alan at wyrdplay.org