This page is about ABCD, which stands for Alan's Basic Codes with Diacritics. I call ABCD a "notation" - it is easier to explain what it is not than to explain what it is, and why you might be interested in it. ABCD is not a spelling system: it is too complex and idiosyncratic for that. Neither is it a dictionary key: it is neither as accurate nor as regular as a dictionary key. Essentially, ABCD is a notation which elucidates the relationship between a word's usual spelling and its pronunciation. It is suitable for use both with words that conform to common English spelling patterns, like nasty, nice, terrible and benevolent, and with horridly exceptional words like women, colonel, boatswain and connoisseur.
ABCD is loosely based on my spelling system DRE.
It makes use of an extensive number of diacritics, organized much like
the DRE set of diacritics. ABCD uses both lower- and upper-case
characters, but prefers to use lower-case for the most common and
familiar patterns, and upper-case for less familiar ones.
Further, ABCD's lower-case characters always match the corresponding
traditional spellings (possibly with the addition of a diacritic),
while upper-case characters may occasionally differ from them.
(For instance, the Z in ABCD
represents an s which is pronounced as z.) Each ABCD letter or
digraph represents both a sound and an English spelling. For
instance, the digraphs sh, ti and SH
all represent the same sound, but spelled as sh (as in shoe),
ti (as in nation) and ch (as
in machine) respectively. In
addition to lower- and upper-case alphabetics, ABCD uses a few
punctuation characters, mostly to note flaws in a word's usual
spelling, and also to separate word constituents. (The @
character is an anomaly - it is treated as a special form of the
letter a rather than as punctuation.)
Like DRE, ABCD is ambiguous about certain aspects of pronunciation, though less so than DRE. An ABCD spelling does not usually indicate stress, and also does not distinguish between the schwa and the regular short vowel sounds. However, if you ignore these two areas, ABCD is quite precise. In fact, one characteristic of ABCD is that the ABCD spelling of a word is sufficient to represent both its traditional spelling (ignoring typographical issues like capitalization and hyphenation) and its pronunciation, subject to the two ambiguities of stress and schwa. (I have defined a less ambiguous form of ABCD, briefly discussed in an appendix, but the ambiguous version is easier to read, and I think more useful.) Furthermore, the ABCD representation of pronunciation and spelling is almost entirely context-free, which makes it easy to process mechanically by a computer program. The context-dependent elements of ABCD are enumerated in this appendix.
Here are a few simple examples of ABCD in action, to give you a
better idea of how it works. The list below is in the format "TS: ABCD".
(Throughout this page, I use the convention of displaying
traditionally spelled words (TS) in italics, and ABCD spellings in
boldface. Occasionally, my CAAPR
notation is also used - this is also shown in bold, and enclosed in
curly braces to identify it as CAAPR.)
abundant: abundant
alienate: álîenáte
charisma: KHariZma
handsome: han(d)som(e)
awareness: aWâre+nêss
accordion: a^ccòrdîon
demoralization: dem~Öral~Ízátion
laugh: l[au:~À][gh:f]
abundant is a word spelled entirely according to English patterns, and requiring no markings for vowel sounds. alienate also conforms to patterns, but requires some vowels be marked with diacritics to prevent misinterpretation. Note that no special marking is required for a final silent e following a long vowel. The word charisma also conforms to high-frequency patterns, but both the ch and the s need to be altered to avoid misunderstanding. (The spelling KH is used rather than CH, because CH is equally plausible as a spelling for the ch of machine.) handsome has two silent letters, and, in contrast to alienate, the e is marked as silent since the previous vowel is not long. Finally, the word awareness shows some ABCD techniques for resolving some of the subtler ambiguities of regular spelling. The W in awareness is capitalized to show that aw is not to be interpreted as a single vowel sound (as in law), while the + sign after aWâre shows that the first e is not pronounced as a short e or a schwa, but instead is silent, because it ends the root word aware.
Unlike the words above it, the word accordion does not conform to basic English patterns, because the double c follows a vowel representing a schwa. The ^ flags this situation. The word demoralization displays a different difficulty - the British and American pronunciations differ. The ~ flags a code which is interpreted differently for the two varieties of English. And finally, the word laugh is completely defiant of standard English patterns, and so the ABCD representation simply shows how the letter combinations map to sounds.
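To make the claim that an ABCD spelling carries the traditional spelling more concrete, here is a minimal Python sketch that recovers the TS form of the example words above. The function name abcd_to_ts and the short substitution list are assumptions made for illustration; the sketch covers only the symbols that occur in the example list, not the full notation.

```python
import re
import unicodedata

# Upper-case ABCD symbols whose letters differ from the traditional spelling.
# Only the two needed for the examples above are listed (an assumption of
# this sketch; the full notation has more).
SUBSTITUTIONS = (("KH", "ch"), ("Z", "s"))


def abcd_to_ts(abcd: str) -> str:
    """Recover the traditional spelling from a simple ABCD spelling."""
    # An explicit pair such as [gh:f] keeps only its spelling half.
    text = re.sub(r"\[([^:\]]*):[^\]]*\]", r"\1", abcd)
    # Drop the flags ^ ~ + and the parentheses around silent letters.
    text = re.sub(r"[()^~+]", "", text)
    for symbol, spelling in SUBSTITUTIONS:
        text = text.replace(symbol, spelling)
    # Strip diacritics: decompose, then discard the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Capitalization is a typographical issue that ABCD ignores.
    return plain.lower()


EXAMPLES = {
    "abundant": "abundant",
    "alienate": "álîenáte",
    "charisma": "KHariZma",
    "handsome": "han(d)som(e)",
    "awareness": "aWâre+nêss",
    "accordion": "a^ccòrdîon",
    "demoralization": "dem~Öral~Ízátion",
    "laugh": "l[au:~À][gh:f]",
}

for ts, abcd in EXAMPLES.items():
    print(ts, abcd_to_ts(abcd) == ts)
```

Symbols such as SH, Q or the inserted J would need further entries or rules before this approach could handle the whole dictionary.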
An ABCD dictionary is available for download here. It contains 27,000 English words, spelled in TS and ABCD. For most words, the spelling is the same for both American and British English; where they differ, the dictionary provides both of them, with the American spelling first. The whole point of ABCD is really the dictionary. It can be used as an educational tool, for increasing one's understanding of the patterns of English spelling, and the ways in which they break down. I also believe that it may be useful as the starting point for developing spelling systems which are very similar to existing spelling, by allowing easy identification of those words that fail to abide by whatever rules the designer feels are most important. One reason I developed ABCD was to help me develop a version of my system DRE which did not require the use of diacritics. I have not, at this time, actually succeeded in doing this, but there is no doubt that ABCD has made the process easier, and I consider it possible that the process might someday actually produce something satisfactory.
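As a hedged illustration of how such a dictionary might be used to pick out words that break particular rules, the sketch below scans a few (TS, ABCD) pairs for the flag characters introduced in the examples above. The in-memory list, the function name exception_report and the flag labels are all hypothetical; the layout of the downloadable dictionary file is not described on this page, so no file parsing is attempted.

```python
import re

# An explicit [spelling:sound] pair, as in l[au:~À][gh:f].
PAIR = re.compile(r"\[[^\]:]*:[^\]]*\]")


def exception_report(entries):
    """Map each flagged TS word to the kinds of irregularity its ABCD form shows."""
    report = {}
    for ts, abcd in entries:
        flags = []
        if PAIR.search(abcd):
            flags.append("explicit pair")
        if "(" in abcd:
            flags.append("silent letter")
        if "^" in abcd:
            flags.append("flagged double letter")
        if "~" in abcd:
            flags.append("regional difference")
        if flags:
            report[ts] = flags
    return report


SAMPLE = [
    ("abundant", "abundant"),
    ("handsome", "han(d)som(e)"),
    ("accordion", "a^ccòrdîon"),
    ("laugh", "l[au:~À][gh:f]"),
]

for word, flags in exception_report(SAMPLE).items():
    print(word, flags)
```

A designer of a new spelling system could swap in whatever tests matter for the rules under consideration.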
The ABCD dictionary is ultimately derived from the CAAPR dictionary; the pronunciations it uses are based on consensus of 2 American dictionaries, 2 British dictionaries and the Longman Pronunciation dictionary, which covers both varieties. See the CAAPR page for more information on this subject. The dictionary download above includes copies of both this page and the CAAPR definition for easy reference.
This version of ABCD and its dictionary differ from previous versions due to removal of the symbol L (which was completely synonymous with (l)), the use of the symbol É (which was originally part of ABCD, but was then unwisely removed), and removal of the exception for the S symbol in the sequence ôuS.
Note: The dictionary was updated in 2019 by the addition of a
significant number of additional words, most of them frequently used
capitalized words, as well as the correction of a few errors. The ABCD
notation itself was not changed, although some of the tables in this
document were clarified and/or corrected.
Before attempting to describe the ABCD notation in full detail,
it will be useful to describe the way it organizes its diacritical
markings, which is based on the conventions of my spelling system
DRE. The organization is strictly applied for letters in lower
case; some flexibility is allowed for upper-case letters to avoid
running out.
Vowels without diacritics represent either the regular short sound of the vowel (as in shack, check, chick, shock and chuck), or the schwa. The digraph oo represents the vowel of shook. An unmarked y is a rather special case, and may have either the vowel sound of misty, or the consonantal sound of yell. When followed by an r, some vowels may also be pronounced with a stressed er sound, as in fern, bird and burn.
Letters with an acute accent represent the normal long sound of the five English vowels, as in máte, méte, míte, móte and múte. The digraph oó represents the vowel of moót. An acute-accented ý, as in flý, represents the same sound as í.
Letters with a grave accent represent an alternate sound of the marked letter. These sounds are all long in length, and almost always spoken distinctly. These sounds are especially common in words of European origin. Mnemonic words are dràma, sÈànce, marÌne, bòre (as well as dòg, in American English) and crùde.
Letters with a circumflex represent an alternate sound of the marked letter. These sounds are shorter than the sounds associated with the grave accent. They may be reduced to a schwa, and may also have a slightly different meaning preceding an r than in other positions. The sounds of the circumflexed vowels when no r follows are those of vidÊo, audîó, Ôther and pÛsh. (DRE also spells âny and prêtty, but these particular forms are not used in ABCD.) Before the letter r, the circumflexed vowels represent the sounds of câre, hÊre and wÔrd. They are also used in the standard suffixes -âlly, -lêss, -nêss and -fûlly, to indicate an indistinct sound despite the following double letter. Note that ABCD, unlike DRE, only uses a lower-case ê for an unstressed sound: ênáble is spelled with one in ABCD, but prÏtty is not.
Letters with a dieresis represent the same sound as the unmarked letter, and are often used where the unmarked letter would have a different interpretation. Examples are päradox, wickËd, fúËl and sörry. The ü with a dieresis has a special meaning. It represents either the unstressed sound of Û or the schwa, preceded by a y, as in regülar or mercüry. In ABCD, ë and ï are also used to indicate the normal short sound of the vowel before an r, as in chërish and spïrit. A ÿ with a dieresis may be used in ABCD to indicate a y which is always pronounced as a vowel, as in lobbŸist, where, because the y is followed by a vowel, one might otherwise assume the consonantal sound is intended.
DRE and ABCD both utilize a number of digraphs in which one of the vowels is marked with a diacritic. In ABCD, with a few exceptions (oó, éu, éw and combinations like íË containing an Ë), the rule for interpretation of such combinations is simple - the sound is that of the marked letter, and the unmarked letter is ignored. Example words include hEÂd, thÈY, dÍE, dÔUble, nervôuS and cúe. A certain number of unmarked digraphs are also used, and they generally have the meaning you would expect. These are ai, au, aw, ay, ea, ee, eu, ew, oa, oe, oi, oo, ou, ow and oy. Note that éu and éw are exceptions to the rule for interpreting digraphs above. eu and ew are pronounced like ùe (sleutþ, brew), and éu and éw like úe (as in éuró and féw). These two combinations break the rules because of the lack of an accented w in many fonts and on most keyboards.
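The digraph rule can be sketched in a few lines of Python. The function name marked_letter is hypothetical, and the test for "marked" (a letter that decomposes into a base letter plus a combining mark) is an assumption about how one might detect a diacritic, not something ABCD prescribes.

```python
import unicodedata


def _nfc(text: str) -> str:
    """Normalize to precomposed form so accented letters are single characters."""
    return unicodedata.normalize("NFC", text)


# Digraphs that do not follow the simple rule.
EXCEPTIONS = {_nfc(s) for s in ("oó", "éu", "éw")}


def marked_letter(digraph: str):
    """Return the letter whose sound the digraph takes, or None if the
    simple rule does not apply (exceptions, or no single marked letter)."""
    d = _nfc(digraph)
    if d in EXCEPTIONS or _nfc("Ë") in d:
        return None
    # A marked letter decomposes into a base letter plus combining marks.
    marked = [ch for ch in d if len(unicodedata.normalize("NFD", ch)) > 1]
    return marked[0] if len(marked) == 1 else None


for digraph in ("EÂ", "ÈY", "ÍE", "ÔU", "ôu", "úe", "oó", "éw"):
    print(digraph, marked_letter(digraph))
```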
One of the all-too-common features of English spelling is the use of silent letters. ABCD encloses silent letters in parentheses, as in (k)nífe, í(s)land and ballÈ(t). There are a few letters and combinations, notably e and gh, whose treatment is more complicated when silent. See their descriptions below for more information.
Another confusing aspect of English spelling is the use of
double letters. A useful rule of thumb is that a double consonant
implies that the preceding vowel is short and stressed; for example,
compare filling and filing,
or matter and material.
Unfortunately, there are a great many exceptions to this so-called
rule. ABCD uses the ^
character preceding a double letter to flag a vowel which is either
unstressed or long, as in a^dditional
or gró^ss. Note that ck,
cq and dj
are treated as double letters for this purpose.
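A minimal sketch of how a program might list the double letters in an ABCD spelling and report whether each is flagged with ^ follows. The function name double_letters is hypothetical, and the sketch assumes that only lower-case consonant doubles, plus ck, cq and dj, are of interest.

```python
import re

# An optional ^ flag, then ck, cq, dj or a doubled lower-case consonant.
DOUBLE = re.compile(r"(\^?)(ck|cq|dj|([b-df-hj-np-tv-z])\3)")


def double_letters(abcd: str):
    """Return (double letter, flagged?) for each double letter found."""
    return [(m.group(2), m.group(1) == "^") for m in DOUBLE.finditer(abcd)]


for word in ("filling", "matter", "a^dditional", "gró^ss", "a^cquit"):
    print(word, double_letters(word))
```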
One might well ask of ABCD: is it oriented towards American or British English? The answer is that it is equally oriented towards both. It may be used to spell words from either regional variety. In most cases, the spelling is independent of the variety. This may happen in any of three ways. Many words are pronounced the same in both varieties, such as cat, cloudy and demonstration. Other words are pronounced differently, but with pronunciations that are related to each other according to well-defined rules, allowing a single spelling to be used for both. Examples of such words are pot, stairs and curious. A third case is that of words which have related pronunciations in American and British English, but where the relationship is not reliable for similar words. For instance, the American pronunciation of sample would be written sample in ABCD and the British pronunciation as sàmple, but the similar word ample would be spelled ample for both varieties. ABCD uses the character ~ to indicate a pronunciation which commonly differs between American and British English. For instance, sample is spelled s~Àmple in ABCD. Words such as clerk and neither with unusual differences between American and British pronunciation must have two ABCD spellings, one for each variety.
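The sample example can be worked through mechanically. The sketch below expands a handful of ~ codes into their American and British forms, using the a/à style of pairing given in the Denotes column of the symbol tables further down this page. The function name expand_regional is hypothetical, and only six codes are included; any other ~ code is left untouched.

```python
import re
import unicodedata


def _nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


# code -> (American form, British form); a subset of the ~ codes.
REGIONAL = {
    _nfc(code): (_nfc(american), _nfc(british))
    for code, american, british in (
        ("~À", "a", "à"),
        ("~Ö", "ò", "ö"),
        ("~Í", "i", "í"),
        ("~Ú", "ù", "ú"),
        ("~ÉU", "eu", "éu"),
        ("~ÉW", "ew", "éw"),
    )
}

# Try longer codes first so that ~ÉU is never split.
_CODES = re.compile(
    "|".join(re.escape(c) for c in sorted(REGIONAL, key=len, reverse=True))
)


def expand_regional(abcd: str):
    """Return the (American, British) ABCD spellings of a word."""
    text = _nfc(abcd)
    american = _CODES.sub(lambda m: REGIONAL[m.group(0)][0], text)
    british = _CODES.sub(lambda m: REGIONAL[m.group(0)][1], text)
    return american, british


print(expand_regional("s~Àmple"))
print(expand_regional("ample"))
```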
You may also be wondering what the distinction is between the
upper- and lower-case ABCD symbols. Before a lower-case symbol
could be used, there were two prerequisites. The first was
simply that the base character for a lower-case symbol had to be the
character used in regular spelling. ABCD uses the symbol Ù
for the letter o when pronounced as long oo, as in move.
A lower-case symbol could not be used unless I were willing to use a
form of the letter o for it. The other requirement was that I
would use a lower-case symbol only when it was pretty clear how you
would spell the sound in a rational spelling system. For
instance, spelling the vowel of plain
as ai is very reasonable, and
so lower-case could be used. But spelling the second vowel of machine with the letter i
is at least dubious, and so the word is denoted maSHÌne
rather than maSHìne. The
capital letter emphasizes that there's "something funny" going on
here.
I think the best approach to describing the details of ABCD is a semi-formal one. So let me start off with a description of how the ABCD spelling of a word is determined. The process starts off with a decomposition of a word into pairs. The first element of each pair is one or more letters from the spelling, and the second is from the CAAPR representation of the pronunciation (see Endnote 1). (CAAPR is described here. Note that the remainder of this page assumes familiarity with CAAPR - so if it is new to you, you may want to keep the CAAPR writeup open for reference.)
As an example, the word charisma
is originally decomposed as:
[ch:k][ar:ør][i:i][s:z][m:m][a:ø]
The process of deriving the ABCD spelling then proceeds in three
steps:
1. High-frequency pairs are replaced by ABCD symbols or symbol combinations. (It seems remarkable that there are few enough of these pairs that one can find readable representations of all of them.)
2. Certain symbols may be modified or added based on special circumstances of individual words. This is done either to avoid ambiguity (e.g., to distinguish the th of worth from that of porthole) or to note unexpected violations of English patterns (like the double t in attend or the s at the end of the non-plural atlas).
3. Any remaining pairs have the second element modified to contain an ABCD code rather than a CAAPR code, except that the CAAPR symbols {ø} and {&}, which do not have an unambiguous ABCD representation, are retained.
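The first step can be sketched as a table lookup over the decomposed pairs. The Python below is only an illustration: the names parse_pairs and encode are hypothetical, the lookup table holds just the six pairs of charisma (taken from the symbol tables on this page), step 2 is omitted, and unmatched pairs are simply left in brackets without the CAAPR-to-ABCD conversion of step 3.

```python
import re

# One [spelling:sound] pair of a decomposition.
PAIR = re.compile(r"\[([^:\]]+):([^\]]+)\]")

# (spelling, CAAPR sound) -> ABCD symbol, for the pairs of "charisma" only.
STEP1 = {
    ("ch", "k"): "KH",
    ("ar", "ør"): "ar",
    ("i", "i"): "i",
    ("s", "z"): "Z",
    ("m", "m"): "m",
    ("a", "ø"): "a",
}


def parse_pairs(decomposition: str):
    """Split a decomposition string into (spelling, sound) tuples."""
    return PAIR.findall(decomposition)


def encode(decomposition: str, table=STEP1) -> str:
    """Replace known pairs by their symbols; keep unknown pairs in brackets."""
    parts = []
    for spelling, sound in parse_pairs(decomposition):
        symbol = table.get((spelling, sound))
        parts.append(symbol if symbol is not None else f"[{spelling}:{sound}]")
    return "".join(parts)


print(encode("[ch:k][ar:ør][i:i][s:z][m:m][a:ø]"))
```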
Step 1, and aspects of step 2, can be summarized easily by simply listing the pairs to which they apply, and how they are represented (which I will do below). But some additional notations are more conveniently described here:
In a number of cases, pairs at the end of a word are handled differently from the same pair within a word. This is especially true for the silent e, and the letter s when used to indicate a plural or possessive. Because of English's fondness for compound and derived words, these letters can sometimes occur within a word with the end-of-word interpretation. In ABCD, a plus sign is used to indicate the end of a word within a word. Examples are scâre+crÓW and státe+ment. The plus sign is also used to separate double letters when both are sounded, as in un+nótiCed or mis+státe.
Silent letters are enclosed in parentheses, as noted above. (Other notations are sometimes used for silent e and gh, as described below.)
The symbol ~ always indicates that what follows is pronounced differently in British and American English. Individual letter combinations beginning with ~ are discussed below, together with the notations with no dependence on English variety.
One property of ABCD is that it is very easily parsed by software - while some letter combinations, such as ch, have meanings distinct from those of their components, there is never (so far as I can determine) any ambiguity in how a word is divided into meaningful units. I note that this property is preserved even if all the ~'s are removed. Which is to say, the ~'s are there to assist the human reader, but are unnecessary for accurate algorithmic decomposition.
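One simple way to exploit this property is a greedy longest-match tokenizer, sketched below in Python. Greedy matching is an assumption of the sketch (the text above only asserts that the division into units is unambiguous), the function name tokenize is hypothetical, and SYMBOLS is a small hand-picked subset of the inventory rather than the full set of ABCD symbols.

```python
import unicodedata


def _nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


# A small subset of the symbol inventory, enough for the examples below.
SYMBOLS = {_nfc(s) for s in (
    "KH", "SH", "ch", "sh", "th", "ar", "ai", "Z", "Ì",
    "a", "e", "i", "o", "c", "h", "m", "n", "r", "s", "t",
)}
MAX_LEN = max(len(s) for s in SYMBOLS)


def tokenize(abcd: str):
    """Split an ABCD spelling into symbols, flags, silent groups and pairs."""
    text = _nfc(abcd)
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch in "^+~":                      # flags are tokens of their own
            tokens.append(ch)
            pos += 1
            continue
        if ch in "([":                       # (silent) or [spelling:sound]
            close = text.index(")" if ch == "(" else "]", pos)
            tokens.append(text[pos:close + 1])
            pos = close + 1
            continue
        # Otherwise take the longest known symbol starting here.
        for length in range(min(MAX_LEN, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in SYMBOLS:
                tokens.append(candidate)
                pos += length
                break
        else:
            raise ValueError(f"no known symbol at position {pos} of {abcd!r}")
    return tokens


print(tokenize("KHariZma"))
print(tokenize("han(d)som(e)"))
```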
Having said all that, I am now ready to run down the alphabet, and produce a complete list of the ABCD phonograms. Though the list is quite long and detailed, it is highly structured and organized, notably by the diacritical conventions given above, and for that reason is not hard to grasp and master. For symbols beginning with a ~, the Denotes column of the tables gives both the American and the British meaning for the symbol: in a/à, the a is the American form, and the à the British form.
a -
Symbol | Denotes | Example | ABCD Example |
a | [a:a] or [a:ø] | cat, about | cat, about |
á | [a:E] | late | láte |
à | [a:A] | father | fàther |
â | -âlly | locally | lócâlly |
@ | [a:i] or [a:ê] | message | mess@G(e) |
ai | [ai:E] | rain | rain |
air | [air:ër] | fair | fair |
ar | [ar:ør] | awkward | awkward |
âr | [ar:ër] | care | câre |
är | [ar:ar] | paradox | päradox |
ärr | [arr:ar] | arrow | ärrÓW |
au | [au:Ø] | pause | pauZe |
aw | [aw:Ø] | claw | claw |
ay | [ay:E] | play | play |
Å | [a:Ø] | water | wÅter |
AÉ | [ae:I] | algae | alGAÉ |
~À | a/à | bath | b~Àtþ |
~Âr | âr/[ar:ør] | secretary | secrêt~Âry |
See below for [a:o], as in watch (ABCD wOtch).
b -
Symbol | Denotes | Example | ABCD Example |
b | [b:b] | big | big |
bb | [bb:b] | rubble | rubble |
c -
Symbol | Denotes | Example | ABCD Example |
c (Note 1) | [c:k] or [c:s] | cat, city | cat, city |
cc (Note 1) | [cc:k] | accord | accòrd |
ck | [ck:k] | luck | luck |
cqu | [cqu:kw] | acquit | a^cquit |
cQ | [cqu:k] | lacquer | lacQer |
ch | [ch:C] | chill | chill |
ci (Note 2) | [ci:X] | vicious | viciôuS |
Ce, C(e) (Note 3) | [ce:s] | advance, furnace | advanCe, furn@C(e) |
Notes:
c denotes [c:s] if followed, in the traditional spelling, by e, i or y, and otherwise [c:k]. The few words which do not conform to this pattern must be spelled in ABCD with an explicit [c:k] or [c:s], as in [c:k]eltic or fa[c:s]àd(e). cc denotes [cc:k] unless followed by e, i or y. When it is followed by e, i or y, the pronunciation is ks - this is regarded as 2 c's in succession, rather than a single occurrence of cc.
ci denotes [ci:X] only when followed by a vowel. Otherwise, the c and the i are distinct symbols.
Ce and C(e) represent [c:s] followed by a silent e, in situations where the silent e is not a magic e, as in advanCe and furn@C(e). In the case of furnace, the e is misleading about the preceding vowel, and so is parenthesized. In the case of advance, the previous vowel is too distant in the word to be affected by the e, which serves the useful purpose of defining the pronunciation of the preceding c.
See below for [ch:k],
as in chrome (ABCD KHróme), and for [ch:X],
as in machine (ABCD maSHÌne). Also see n below for information on the
combinations ñc and ñKH as in uncle
and anchor.
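The rule in Note 1 for a bare c and for cc is simple enough to state as code. This is a minimal Python sketch with hypothetical function names; it looks only at the letter that follows in the traditional spelling, and does not deal with ch, ck, ci before a vowel, or the explicitly marked exceptions.

```python
# Letters that give a preceding c its s sound.
SOFTENERS = frozenset("eiy")


def c_sound(next_letter: str) -> str:
    """Sound of a single c, given the following TS letter ('' at word end)."""
    return "s" if next_letter.lower() in SOFTENERS else "k"


def cc_sound(next_letter: str) -> str:
    """Sound of cc: k as one symbol, or ks when treated as two c's in succession."""
    return "ks" if next_letter.lower() in SOFTENERS else "k"


print(c_sound("a"), c_sound("i"))   # the c of cat, the c of city
print(cc_sound("o"))                # the cc of accord
```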
d -
Symbol | Denotes | Example | ABCD Example |
d | [d:d] or [d:þ] | dog, wanted | d~Ög, wOntêd |
dd | [dd:d] | add | add |
dG (see G) | [dg:j] | judge | judG(e) |
dj | [dj:j] | adjust | a^djust |
dJ (see J) | [d:j] | procedure | procédJur(e) |
ed (Note 1) | [ed:þ] | missed | missed |
At the end of a word, ed represents [ed:þ], that is, a past tense in which the e is silent, and in which the d is pronounced either as t or d, depending on the previous letter. There are some exceptional words ending with -ed in which the e is surprisingly not silent, such as beloved and wicked - these words are spelled with Ëd in ABCD to prevent ambiguity.
Note that words like hunted and raided are regular, represented by [e:i][d:þ], and unambiguously spelled with -êd in ABCD. Also note that the Ëd spelling is unnecessary in one-syllable words, and so bed is bed and not bËd in ABCD. Words compounded from a one-syllable word ending with ed will use Ëd, as in sickbËd, unless the one-syllable word is separated from the rest by a +, as in fòrce+fed.
e -
Symbol | Denotes | Example | ABCD Example |
e | [e:e] or [e:ø] | ten, rivet | ten, rivet |
e (Note 1) | [e:-] | late | láte |
é | [e:I] | medium | médîum |
ê (Note 2) | [e:i] or [e:ê] or -lêss, -nêss | enable, erupt, lifeless, fitness | ênáble, êrupt, lífe+lêss, fitnêss |
ea | [ea:I] | feast | feast |
ear | [ear:ïr] | fear | fear |
ed (see d) | [ed:þ] | missed | missed |
ee | [ee:I] | feet | feet |
eer | [eer:ïr] | beer | beer |
er | [er:ør] or [er:&r] | river, revert | river, rêvert |
ër | [er:er] | cherish | chërish |
ërr | [err:er] | terrible | tërrible |
es (see s) | [es:$] | miles | míles |
eu | [eu:U] | sleuth | sleutþ |
eur (Endnote 2) | [eur:Ür] | pleurisy | pleurisy |
éu | [eu:yU] | feud | féud |
éur (Endnote 2) | [eur:yÜr] | Europe | éurop(e) |
ew | [ew:U] | drew | drew |
éw | [ew:yU] | few | féw |
É (Note 3) | [e:I] | me, crises, museum | mÉ, crísÉs, múZÉum |
È | [e:E] | ballet, cafe | ballÈ(t), cafÈ |
Èe | [ee:E] | matinee | matinÈe |
Ê | [e:ý] | apostrophe, video | apostroPHÊ, vidÊó |
Ë (Note 4) | [e:e] or [e:ê] or [e:ø] | duet, wicked, duel | d~ÚËt, wickËd, d~ÚËl |
EÀr | [ear:àr] | heart | hEÀrt |
EÂ | [ea:e] | head, measure | hEÂd, mEÂZJur(e) |
EÂr | [ear:ër] | bear | bEÂr |
EÄ | [ea:ë] | yeah | yEÄ(h) |
ËA (Note 5) | [ea:ï] | idea (Brit) | ídËA |
ÉI | [ei:I] | seize | sÉIze |
ÉIr | [eir:ïr] | weird | wÉIrd |
ÈI | [ei:E] | reign | rÈI(g)n |
ÈIr | [eir:ër] | their | thÈIr |
ER | [ear:&r] | earth | ERtþ |
Êr | [er:ïr] | here | hÊre |
Ër (Note 6) | [er:ør] | supplier | su^pplíËr |
ÈY | [ey:E] | survey | survÈY |
ÊY | [ey:ý] | money | mÔnÊY |
~Er (Note 7) | ër/[er:ør] | cemetery | cemet~Ery |
~ÉU (Endnote 2) | eu/éu | neutral | n~ÉUtral |
~ÉUr | eur/éur | neurotic | n~ÉUrotic |
~ÉW | ew/éw | news | n~ÉWZ |
Notes:
The handling of silent e in ABCD is complicated. There are two functions that silent e commonly performs. It indicates that the previous vowel sound is long, in which case the e is commonly called magic. Alternately, in many words, such as mice, savage and tense, it changes the sound of the previous consonant. (Note that without the final e, tens would be a plural, and the s would be pronounced as z.) When both functions are taken into account, we can classify words ending with a silent e into 4 categories. We say a final e is magic if the previous vowel (separated from the e by a single consonant sound) is long. (If the consonant is an r, the sounds of â, Ê and ò are also treated as long.) We say a final e is misleading if there is a vowel preceding it which ought to be long, but is not. In vice, the e is magic, but in service, it is misleading. In words in which a final e is not magic, we call it useful if it is preceded by c, g or s, and otherwise useless. An e can be both useful and misleading, as in garbage, or both useless and misleading, as in festive.
When a silent e occurs at the end of a word, it is
enclosed in parentheses if it is misleading
or if it is useless. Also,
when
a useful (but not magic) e follows the letter c or s, ABCD
capitalizes the consonant to show what the e is accomplishing.
Some example words are míne,
pláce, festiv(e),
sav@G(e) and tenSe.
When a magic e occurs within a word and is not parenthesized, it
is followed by a +,
usually indicating the end of an internal word, as in bâre+ly, lífe+boat,
or minCe+meat.
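The marking rules for a final silent e can be summarized in a short Python sketch. The function name render_final_e is hypothetical. Whether the e is magic or misleading depends on the pronunciation, so the caller is assumed to supply those two facts; the consonant is assumed to be given as its ABCD symbol (so a soft g arrives already written as G).

```python
def render_final_e(consonant: str, magic: bool, misleading: bool) -> str:
    """Return the ABCD form of a final consonant followed by a silent e."""
    if magic:
        return consonant + "e"              # as in the ne of míne
    # A non-magic e is useful after c, g or s, and otherwise useless.
    useful = consonant.lower() in ("c", "g", "s")
    if consonant in ("c", "s"):
        consonant = consonant.upper()       # C or S shows what the e is doing
    # Parenthesize the e if it is misleading or useless.
    e = "(e)" if misleading or not useful else "e"
    return consonant + e


print(render_final_e("n", magic=True, misleading=False))    # míne
print(render_final_e("v", magic=False, misleading=True))    # festiv(e)
print(render_final_e("G", magic=False, misleading=True))    # sav@G(e)
print(render_final_e("s", magic=False, misleading=False))   # tenSe
```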
ê is used only when [e:i] is unstressed. Ï is used instead when stressed, as in Ïñglish.
É is used only when [e:I] appears where a silent e might be expected, at the end of a word (bÉ) or before s (parentþesÉs). Note that É is used even in words with no other vowels, such as be, even though it would be impossible for the e to be silent. É is also used in words like museum, where use of the usual é would seem to be part of the éu digraph.
Ë is used for the regular sound of e when a bare e would be misinterpreted, such as wicked, which looks like a past tense, and duet, where d~Úet would appear to be a one-syllable word whose vowel is ~Úe.
The sound of ËA is an RP diphthong represented in SAMPA as /I@/, which usually occurs before r in words like pier.
Ër is used like Ë, to prevent ambiguity, as in flýËr, where a bare e would be treated as part of the composite vowel symbol ýe.
Note that the distinction between ~Âr and ~Er is only orthographic - both are pronounced the same in either variety of English.
See below for [le:øL], as in double (ABCD dÔUble).
f -
Symbol | Denotes | Example | ABCD Example |
f | [f:f] | free | free |
ff | [ff:f] | stuff | stuff |
g -
Symbol | Denotes | Example | ABCD Example |
g | [g:g] | good | good |
gg | [gg:g] | egg | egg |
G (Notes 1, 2) | [g:j] | germ | Germ |
GG | [gg:j] | veggie | veGGÎE |
GH | [gh:-] | high, taught | híGH, tauGHt |
GJ | [g:J] | mirage, genre | miràGJ(e), GJ[e:o]nrË |
Notes:
Note that the spelling G is used even if the letter following g is unusual, as in margarine (American ABCD màrGarin(e)).
The combination dG, as in edge (ABCD edG(e)), is treated as a double letter.
h -
Symbol | Denotes | Example | ABCD Example |
h | [h:h] | hot | hot |
H (Note 1) | [h:h] | mishap | misHap |
Notes:
Because the letter h is used in a number of digraphs, it is frequently ambiguous when it follows a consonant, as in the words porthole, mishap and rawhide. ABCD uses a capital H for [h:h] if confusion might be possible, as in pòrtHóle, misHap and rawHíde.
i -
Symbol | Denotes | Example | ABCD Example |
i | [i:i] or [i:ê] or [i:ø] | pig, acid, devil | pig, acid, devil |
í | [i:Y] | item | ítem |
î (Endnote 3) | [i:ý] or [i:ÿ] | radio | rádîó |
ir | [ir:ør] or [ir:êr] or [ir:&r] | admiral, direct, bird | admiral, direct, bird |
ïr | [ir:ir] | miracle | mïr@cle |
ïrr | [irr:ir] | mirror | mïrror |
Ì | [i:I] | marine | marÌne |
Ï (Note 1) | [e:i] | pretty | prÏtty |
IÉ | [ie:I] | brief | brIÉf |
IÉr | [ier:ïr] | pier | pIÉr |
ÍE | [ie:Y] | pie | pÍE |
ÎE | [ie:ý] | cookie | cookÎE |
~Í (Note 2) | i/í | missile, civilization | miss~Íle, civil~Ízátion |
Notes:
Ï is used for [e:i] only when stressed; when unstressed, ê is used.
Note that the ending e in miss~Íle is not parenthesized - it is misleading in American English, but magic in British English.
The letter i also occurs in the combinations ci, si, sci, ssi, ti, and Zi, where it has no sound of its own, but modifies the sound of the preceding consonant.
See below for [i:y], as in billion (ABCD billYon).
j -
Symbol | Denotes | Example | ABCD Example |
j | [j:j] | jam | jam |
jj | [jj:j] | hajj | hajj |
J (Note 1) | see note | capture | captJur(e) |
Notes:
The capital J is inserted as a sign of palatalization in the combinations dJ (in procedure), sJ (in insure), ssJ (in pressure), tJ (in capture and question), and ZJ (in measure). More precisely, it is used in representing the pairs [d:j] (dJ), [s:X] (sJ), [ss:X] (ssJ), [t:C] and [ti:C] (tJ) and [s:J] (ZJ). The symbol J also appears in the combination GJ, described under g.
(Note that there is no ambiguity between the t and ti spellings corresponding to tJ - an i was present in the original spelling exactly when the letter after the J is not a u.)
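The rule for recovering t or ti from tJ can be sketched in Python as follows. The function name expand_tj is hypothetical, and the sketch assumes that an accented u (such as ù) also counts as "a u" for this purpose, which the note above does not state explicitly.

```python
import re
import unicodedata


def expand_tj(abcd: str) -> str:
    """Replace each tJ by the t or ti of the traditional spelling."""
    # Decompose so that an accented u begins with a plain u.
    text = unicodedata.normalize("NFD", abcd)
    # tJ not followed by u stood for ti in the original spelling ...
    text = re.sub(r"tJ(?!u)", "ti", text)
    # ... and any remaining tJ (before u) stood for a bare t.
    text = text.replace("tJ", "t")
    return unicodedata.normalize("NFC", text)


print(expand_tj("captJur(e)"))
print(expand_tj("questJon"))
```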
k -
Symbol | Denotes | Example | ABCD Example |
k | [k:k] | skin | skin |
KH | [ch:k] | school | sKHoól |
The combination ck is treated as a double k - see c above.
See n below for information on the combination ñk.
l -
Symbol | Denotes | Example | ABCD Example |
l | [l:L] | leg | leg |
ll | [ll:L] | pill | pill |
le (Note 1) | [le:øL] | purple | purple |
Notes:
le has the meaning given in the table only at the end of a word, before a past tense or plural marker (d or s), or when followed by +. In any other position, as in sled, it represents the normal sound of l followed by the normal sound of e. I now regret this context dependency, as slËd would be considerably easier for a program to process correctly, especially due to the simultaneous -ed ending.
m -
Symbol | Denotes | Example | ABCD Example |
m | [m:m] | mud | mud |
m | [m:øm] | spasm | spaZm |
mm | [mm:m] | hammer | hammer |
n -
Symbol | Denotes | Example | ABCD Example |
n | [n:n] | nice | níce |
n | [n:øn] | didn't | didnt |
nn | [nn:n] | sunny | sunny |
ng | [ng:G] | song | s~Öng |
ñ (Note 1) | [n:G] | finger, sink | fiñger, siñk |
N (Note 2) | [n:n] | ungrateful | uNgráte+ful |
Notes:
ñ can be used before any of the various symbols representing or starting with the k sound, as in uñcle, añKHor, bañquet, coñQer and jiñx.
N represents [n:n] when the regular n sound is followed by g, as in ungrateful. N is not needed preceding k sounds - unclean is simply spelled unclean in ABCD.
o -
Symbol | Denotes | Example | ABCD Example |
o | [o:o] or [o:ø] | pot, lemon | pot, lemon |
ó | [o:O] | zero | zéró |
ò | [o:Ø] | coral, sloth (Amer) | còral, slòtþ |
oa | [oa:O] | boat | boat |
oar | [oar:Ør] | boar | boar |
oe | [oe:O] | toe | toe |
oer | [oer:Ør] | Boer | boer |
oi | [oi:Q] | boil | boil |
oo | [oo:V] | book | book |
oó | [oo:U] | boot | boót |
oór (Endnote 2) | [oor:Ür] | poor | poór |
or | [or:ør] | motor, decorate | mótor, decoráte |
ör | [or:or] | laboratory (Brit) | laböratory |
örr | [orr:or] | sorry | sörry |
ou | [ou:W] | house | house |
ôu | -ôuS | vicious | viciôuS |
ow | [ow:W] | allow | a^llow |
oy | [oy:Q] | boy | boy |
O | [a:o] | squash | squOsh |
Ô | [o:u] | mother | mÔther |
OR (Note 1) | [our:ør] | favour | fávOR |
Ôr | [or:&r] | word | wÔrd |
OÙ | [ou:U] | soup | sOÙp |
OÙr (Endnote 2) | [our:Ür] | tour | tOÙr |
ÒUr | [our:Ør] | court | cÒUrt |
ÔU | [ou:u] | trouble | trÔUble |
ÓW | [ow:O] | blow | blÓW |
~Ö | ò/ö | cross, forest | cr~Öss, f~Örêst |
~Òr | òr/[or:ør] | category | catêg~Òry |
Notes:
p -
Symbol | Denotes | Example | ABCD Example |
p | [p:p] | pink | piñk |
pp | [pp:p] | happy | happy |
PH | [ph:f] | photo | PHótó |
q -
Symbol | Denotes | Example | ABCD Example |
qu | [qu:kw] | queen | queen |
Q | [qu:k] | unique | únÌQe |
See n above for the combinations ñqu and ñQ, as in bañquêt and coñQer.
r -
Symbol | Denotes | Example | ABCD Example |
r (Note 1) | [r:r] | red | red |
r (Note 2) | [r:-] | arrive | a^rríve |
Notes:
s -
Symbol | Denotes | Example | ABCD Example |
s (Note 1) | [s:s] or [s:$] | sad, cries | sad, crÍEs |
ss | [ss:s] | guess | g(u)ess |
sc, sC (Note 2) | [sc:s] | scent, acquiesce | scent, acquîesC(e) |
sci (Note 3) | [sci:X] | luscious | lusciôuS |
sh | [sh:X] | ship | ship |
si (Notes 3, 4) | [si:X] | mansion | mansion |
sJ (see J) | [s:X] | insure | insJùre |
ssi | [ssi:X] | mission | mission |
ssJ (see J) | [ss:X] | pressure | pressJur(e) |
S (Note 5) | [s:s] | atlas, cactus, tense | atlaS, cactuS, tenSe |
SH | [ch:X] | machine | maSHÌne |
Notes:
Note 1: At the end of a word (or before a +) s is assumed to indicate a plural, in which case, depending on the preceding sound, it may be pronounced as z. The plural s often follows a silent e - however, in contrast to the past tense, where the d is always preceded by e, a silent e in the plural generally implies its presence in the singular as well.
Note 2: sc denotes [sc:s] preceding e, i or y. In any other position, it is simply the juxtaposition of the regular s and c (pronounced as k) symbols. The C may be capitalized to indicate a following non-magic e.
Note 3: si, sci, ssi and ti have the sound of {X} only when followed by a vowel. Otherwise, the i is a separate symbol.
Note 4: When si or ti follows n, there are two common pronunciations: nch and nsh. The CAAPR dictionary, from which the ABCD dictionary is derived, uses nsh as the recognized pronunciation, which is more in line with the pronunciation of si and ti in other positions.
Note 5: S represents [s:s] at the end of a word, where it might be mistaken for a plural. S is also used before a silent e, where the e prevents the word from being interpreted as a plural. See e note 1 above for more details.
See z below for [s:z] (except in plurals) as in hose (ABCD hóZe).
t -
Symbol | Denotes | Example | ABCD Example |
t | [t:t] | top | top |
tt | [tt:t] | kitten | kitten |
tch | [tch:C] | catch | catch |
th | [th:D] | that leather | that lEÂther |
tþ | [th:T] | think truth | tþiñk trùtþ |
ti (see s Notes 3, 4) | [ti:X] | vocation | vócátion |
tJ (see J) | [t:C] or [ti:C] | capture question | captJur(e) questJon |
u -
Symbol | Denotes | Example | ABCD Example |
u | [u:u] or [u:ø] | sun circus | sun circus |
ú (Note 1) | [u:yU] or [u:yV] | puny annual | púny annúal |
ù (Note 1) | [u:U] or [u:V] | lunar gradual | lùnar gradJùal |
û | -fûlly | awfully | awfûlly |
ü | [u:yV] or [u:yû] or [u:yø] | refugee regular volume (Amer) | refügee regülar volüm(e) |
úe | [ue:yU] | cue | cúe |
úer (Endnote 2) | [uer:yÜr] | puerile (Brit) | púer~Íle |
ùe | [ue:U] | true | trùe |
ur | [ur:ør] or [ur:&r] | Arthur burn | àrtþur burn |
urr (Note 2) | [urr:ür] | hurry | hurry |
úr (Endnote 2) | [ur:yÜr] | purity | púrity |
ùr (Endnote 2) | [ur:Ür] or [ur:Vr] | plural brochure (Amer) | plùral bróSHùr(e) |
ür | [ur:yûr] or [ur:yør] | accurate mercury | accür@t(e) mercüry |
Ù | [o:U] | move | mÙve |
Û | [u:V] or [u:û] | push prejudice | pÛsh prejÛdiC(e) |
~Ú | ù/ú | student | st~Údent |
~Úe | ùe/úe | Tuesday | t~ÚeZd[ay:y] |
~Úr (Endnote 2) | ùr/úr | duration manure | d~Úrátion man~Úre |
~Ü | Û/yÛ | insulation | ins~Ülátion |
Notes:
Note 1: The symbols ú and ù ordinarily represent the long vowel /u:/, but they represent /u/ (which is rendered in CAAPR as {V}) before a vowel.
Note 2: urr is the only instance of an ABCD notation without a ~ which is interpreted differently for American and British English, but this seems reasonable, since TS exhibits this variance itself.
v -
Symbol | Denotes | Example | ABCD Example |
v | [v:v] | very | vëry |
vv | [vv:v] | savvy | savvy |
w -
Symbol | Denotes | Example | ABCD Example |
w | [w:w] | way | way |
wh | [wh:µ] | which | which |
W (Note 1) | [w:w] | away | aWay |
Wh (Note 1) | [wh:µ] | awhile | aWhíle |
Notes:
x -
Symbol | Denotes | Example | ABCD Example |
x | [x:ks] | fix | fix |
xc (Note 1) | [xc:ks] | except | êxcept |
X | [x:gz] | exist | êXist |
Notes:
Note 1: xc stands for [xc:ks] only preceding e, i or y. Otherwise, it is simply an x followed by a c, as in excavate.
See n above for information on the combination ñx, as in jiñx.
y -
Symbol | Denotes | Example | ABCD Example |
y (Note 1) | [y:y] or [y:ÿ] | yes Tokyo | yeS tókyó |
y (Note 1) | [y:ý] | happy everything | happy ev(e)rytþing |
ý | [y:Y] | fly qualify | flý quOlifý |
ýe | [ye:Y] | dye | dýe |
ÿ | [y:i] | myth | mÿtþ |
Y | [i:y] | million | millYon |
Ÿ (Note 1) | [y:ý] | lobbyist | lobbŸist |
Notes:
Note 1: The ABCD symbol y may indicate either a consonant or vowel sound. As a consonant, it denotes [y:y]. As a vowel, it denotes [y:ý]. The vowel sound occurs at the end of a word or before a consonant, and the consonantal sound occurs at the beginning of a word. Before a vowel, either sound may occur. Usually, when y is found after a consonant and before a vowel, the corresponding pair is [y:ÿ], indicating that both the consonant and the vowel pronunciation are possible. In this position, a consonantal pronunciation is assumed - if only a vowel pronunciation is used, then the spelling should be Ÿ. See Endnote 3 for further discussion of the ambiguous letter y and its sounds.
z -
Symbol | Denotes | Example | ABCD Example |
z | [z:z] | zoo | zoó |
zz | [zz:z] | buzz | buzz |
Z | [s:z] | hose | hóZe |
Zi (Note 1) | [si:J] | vision | viZion |
ZJ (see J) | [s:J] | measure | mEÂZJur(e) |
Notes:
Unusual sounds -
As noted, the ABCD spelling notation provides unique codes for high-frequency spelling patterns. Of course, as we all know, English is afflicted with a sizable number of words that break these patterns. ABCD handles these words by means of bracketed symbol pairs, for instance, [eau:éw] in beautiful. The eau is the letter sequence in the usual spelling, and the éw defines the sound (but not the spelling). Obviously, this representation is not unique: [eau:ú] or [eau:yoó] could have been written instead.
Almost all sounds of English have at least one high-frequency spelling, and so there is at least one ABCD spelling that can be used in such pairs for those sounds. But a few sounds, mostly from words of foreign origin, are so low-frequency that there is no standard ABCD notation for them. An example is the final sound of the word loch, when pronounced in the authentic Scottish way. ABCD therefore must assign representations to these sounds, so that these words can be rendered sensibly. For instance, the /x/ sound of loch is given the ABCD spelling of QH, and so the word is written lo[ch:QH] in ABCD.
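Since a bracketed pair keeps the traditional letters before the colon and the sound after it, the two halves can be pulled apart mechanically. The following is only a minimal sketch, not part of ABCD itself: the names `PAIR` and `split_pairs` are hypothetical, and it assumes that pairs are never nested. Text outside the brackets is left in ABCD notation, so both results are only partly resolved.

```python
import re

# One bracketed pair such as [ch:QH]: letters before the colon, sound after it.
# Assumption: pairs are never nested and the spelling half contains no colon.
PAIR = re.compile(r"\[([^\[\]:]*):([^\[\]]*)\]")

def split_pairs(abcd_word):
    """Return (spelling_view, sound_view), replacing each pair by one of its halves."""
    spelling_view = PAIR.sub(lambda m: m.group(1), abcd_word)
    sound_view = PAIR.sub(lambda m: m.group(2), abcd_word)
    return spelling_view, sound_view

print(split_pairs("lo[ch:QH]"))  # ('loch', 'loQH')
```

A word without any pairs passes through unchanged in both views.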
This table catalogs the representations of unusual sounds (and one uncommon American/British difference):
Symbol | Denotes (SAMPA) | Example | ABCD Example |
ã | /A~/ | melange | mÈl[an:ã]GJ(e) |
õ | /O~/ | concierge | c[on:õ]cî[er:air]GJe |
QH | /x/ | loch | lo[ch:QH] |
UH | /V~/ | uh-huh | UHhUH |
& (Note 1) | /3/ | masseuse (Brit) | mass[eu:&]Z(e) |
~OOr (Note 2) | oòr/oor | courier | c[our:~OOr]îer |
Notes:
Completely pure CAAPR is not used here. Certain simplifications have been introduced to remove distinctions not relevant to this project. In particular,
The indistinct i, CAAPR {ê}, is treated as identical to the short i ({i}).
The CAAPR symbol {°} is treated the same as {ø}, and the symbols {î}, {3}, {¹} and {³} are treated as synonymous with {ê}, and therefore with {i}.
The symbol {ß} is treated as identical to {r}, and {R} as identical to {ør}.
The {*} symbol is removed.
Also, some aspects of ABCD depend on stress. When stress differs between British and American English, the ABCD spelling is sometimes based on a compromise between the two. A good example is the word electronic. The American CAAPR for this word is {iLe·ktro'nik}, while the British CAAPR is {i·Lektro'nik}. The conversion to ABCD is done on the composite form {i·Le·ktro'nik}, leading to the ABCD spelling Ïlectronic, which does not accurately reflect the American pronunciation. I have edited the ABCD dictionary to correct this particular instance, but it is likely that other examples of the same problem still exist.
ABCD utilizes a number of spellings that imply the equivalence of a short sound followed by an r to a related long sound followed by r. Examples are the spellings air, eer and oar, which logically ought to be pronounced as ár, ér and ór, but are actually pronounced as âr, Êr and òr respectively. This implied equivalence is also reflected in the common use of the magic e in words like care, sphere and sore.
The most difficult case has to do with the vowels represented in CAAPR as {Vr} and {Ür}. In American English, both {Vr} and {Ür} symbolize the same sound, represented in SAMPA as /Ur/, while for British English {Ür} represents the diphthong /U@(r)/. I note that {Ür} is quite common in RP, while {Vr} occurs in only a few words, notably guru and courier. It turns out to be extraordinarily convenient to represent {yÜr}/{Ür} by the long vowel symbols úr and ùr, as in cúre and plùral. Furthermore, though American and British dictionaries quite consistently show this sound as {Vr}, most of the participants in the Saundspel group feel that {Ur} (SAMPA /u:r/) is more accurate. For these reasons, {Ür} is consistently shown with a long vowel. For instance, poór is used rather than poor. However, when the sound is understood as {Vr} in British English, it is represented as a short sound there. The word guru is spelled g[ur:~OOr]ù in ABCD, representing gùrù in American English, but gÛrù in British English.
CAAPR utilizes the symbol {y} for the consonant sound of the letter y (as in young), and {ý} for the vowel sound (as in happy). But there is a third possibility, a quite common one, represented by {ÿ}. {ÿ} represents a sound that can be either {y} or {ý}, varying by speaker. Most words like champion and warrior, in which i is followed by an unstressed vowel, are of this sort. Some words in which y is followed by a vowel, such as Tokyo and Libyan, are also of this sort. The ABCD approach for dealing with words containing this ambiguity is to spell them with the existing letter. Thus, champion is spelled champîon, implying a vowel sound, even though the consonant sound is no doubt more common, and similarly, the spelling libyan is used, implying a consonant sound for the y, even though the word is probably more commonly pronounced with a vowel there. The symbols Y and Ÿ can be used for words like spanYard and lobbŸist, where the pronunciation is unequivocally different from what one might expect.
ABCD represents pronunciation and traditional spelling in an almost context-free way, which is to say that the interpretation of its symbols usually does not depend on their context. For instance, the sequence SH always represents the sound of {X} and the spelling ch, regardless of where it occurs in a word, or what other symbols are adjacent. For a computer program to understand ABCD, it is mostly necessary simply to divide the text into symbols. Some letters are used in more than one symbol (for instance, the letter H occurs in the symbols H, GH, KH, PH, QH, SH and UH), but the rule is that each letter is contained in the longest possible symbol, so that SH will always represent SH, and never S followed by H.
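The longest-match rule described above can be sketched as a small tokenizer. This is only an illustration under stated assumptions, not the program used to build the ABCD dictionary: `SYMBOLS` is a hypothetical and deliberately tiny subset of the real symbol inventory, and `tokenize` is a made-up name. Characters that are not in the inventory are simply passed through one at a time.

```python
import unicodedata

# Hypothetical, deliberately tiny symbol inventory; the real ABCD set is much larger.
SYMBOLS = {unicodedata.normalize("NFC", s)
           for s in ("S", "H", "SH", "PH", "QH", "a", "e", "m", "n", "Ì")}

def tokenize(word, symbols=SYMBOLS):
    """Split an ABCD word into symbols, always taking the longest match."""
    word = unicodedata.normalize("NFC", word)  # keep letter + diacritic as one character
    longest = max(len(s) for s in symbols)
    tokens, i = [], 0
    while i < len(word):
        for size in range(min(longest, len(word) - i), 0, -1):
            if word[i:i + size] in symbols:
                tokens.append(word[i:i + size])
                i += size
                break
        else:
            # Not in the inventory: pass the character through unchanged.
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("maSHÌne"))  # ['m', 'a', 'SH', 'Ì', 'n', 'e']
```

Because the longer candidate is tried first, SH is never split into S followed by H, while H followed by S still yields two separate symbols.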
There are, however, a small number of symbols whose interpretation is dependent on context. These context dependencies are found in regular English spelling, and the familiarity benefits of adopting them in ABCD more than offset the additional complexity of context dependence. The context-dependent elements of ABCD are of two sorts, positional and general. The positional elements are as follows:
The sequence le represents the sounds {øL} when preceded by a consonant at the end of a word, or before a +. Otherwise, it represents the regular sounds of l and e. The end-of-word interpretation also applies when le is followed by d (indicating a past tense) or s (indicating a plural) in the same positions. Examples: battle, trÔUbled.
The sequence ed represents the sound of either {d} or {t}, depending on the preceding sound, when at the end of a word or preceding a +. Anywhere else, it represents the regular sounds of e and d. Words like bed, which have no vowel preceding the ed, are an exceptional case, in which the ed is obviously not a past tense marker, and the non-end-of-word interpretation of ed applies. Examples: missed, filled.
The letter e (when not part of le as described above) is silent at the end of a word or before a +, and also before the letter s in these positions. Anywhere else, it is interpreted as a short e or a schwa. Note that some silent e's at the end of a word are represented instead by (e). This is a context dependency for generation of ABCD, but not for interpretation. Examples: shíne, fenCe, híde+out.
The letter m indicates {øm} after a consonant at the end of a word, possibly with a following s or ed. Otherwise, it is simply interpreted as {m}. Example: priZm.
The letter n indicates {øn} when preceded by a consonant and followed by t at the end of a word. Otherwise, it is simply interpreted as {n}. Example: didnt.
The letter s represents the sound of either {z} or {s}, depending on the preceding sound, when at the end of a word or preceding a +. Anywhere else, it represents the regular sound of s. Note that s's pronounced as {z} at the end of a word are represented instead by Z when the word is not plural, as with sÊrIÉZ (series). This is the only place in ABCD where word meaning intrudes on its definition, but it affects only the generation of ABCD, not its interpretation. Examples: cats, d~Ögs.
The symbol c is pronounced as {s} before any form of e, i or y, and as {k} otherwise. The same principle applies to symbols compounded from c, notably cc (either {ks} or {k}), sc (either {s} or {sk}) and xc (either {ks} or {ksk}). Examples: cent, coat, accent, account, scíËnce, screen, êxcíte, êxclaim.
The letter i appears in a number of symbols where, when followed by a vowel, the i is silent, and the sound of the previous letter or letters is changed. For instance, ci represents the sound {X} when followed by a vowel, and otherwise represents the regular sounds of c and i (which is to say {si} or {sø}). Similarly, the sequences si, sci, ssi and ti all represent {X} when followed by a vowel, and Zi represents {J}. Examples: dêficient, pension, lusciôuS, mission, initial, viZion.
The symbol tJ has context dependencies not for its pronunciation, which is always {C}, but for the corresponding spelling. If tJ is followed by a form of the letter o, the corresponding spelling is ti; otherwise the spelling is t. Examples: questJon, nátJur(e).
The symbol y may represent either the consonant {y}, the vowel {ý}, or the indeterminate hybrid {ÿ}. The rules are as follows: If the y is the first character of a word, or follows +, it represents {y}. If it is the last character of a word, or precedes +, it represents {ý}. Within a word, if it precedes a consonant, it represents {ý}. Otherwise, it represents either {y} or {ÿ}. That is, when y is followed by a vowel, the consonant pronunciation is always legitimate, and a vowel pronunciation may be valid as well. Examples: yes, happy, copycat, canyon, libyan. For further discussion of the handling of y, see Endnote 3.
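The c, tJ and y rules above look only at neighbouring characters, so they can be sketched in a few lines. This is a rough illustration with hypothetical function names, not a complete interpreter. It assumes that "any form of" a letter means the letter with any diacritic or case, that @ counts as a vowel, and it works character by character, so prefix marks such as ~ and ( are not handled.

```python
import unicodedata

def base_letter(ch):
    """Strip diacritics and case, e.g. 'Ë' -> 'e' (how 'any form of' is read here)."""
    return unicodedata.normalize("NFD", ch)[0].lower()

def c_sound(following):
    """Sound of c given the character after it ('' at the end of a word)."""
    return "s" if following and base_letter(following) in "eiy" else "k"

def tj_spelling(following):
    """Traditional spelling behind tJ given the character after it."""
    return "ti" if following and base_letter(following) == "o" else "t"

def y_sound(word, i):
    """Classify the plain y at word[i] as 'consonant', 'vowel' or 'either'."""
    if word[i] != "y":
        raise ValueError("position does not hold a plain y")
    if i == 0 or word[i - 1] == "+":
        return "consonant"   # {y}: first character, or just after +
    if i == len(word) - 1 or word[i + 1] == "+":
        return "vowel"       # {ý}: last character, or just before +
    if base_letter(word[i + 1]) not in "aeiou@":
        return "vowel"       # {ý}: before a consonant
    return "either"          # {y} or {ÿ}: before a vowel

for w in ("yes", "happy", "copycat", "canyon", "libyan"):
    print(w, y_sound(w, w.index("y")))
```

For the five example words this prints consonant, vowel, vowel, either and either, in that order. A fuller version would first divide the word into symbols, so that digraphs such as cc, sc and xc are each handled as a unit.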
As I mentioned earlier, ABCD is an ambiguous system. The five unmarked vowel letters, as well as ü and Û, may denote either the schwa or a short vowel. This ambiguity can be remedied without losing the readability of ABCD. I'm not sure this is a change for the good, as it requires many more diacritics, while the benefits are small unless one considers this distinction important even in an orthography intended to be very similar to TS. Nevertheless, here's how it is done.
The short vowel sounds of a, e, i and o are denoted by the vowel with a dieresis, in the way in which the dieresis is already used preceding r. This gives rise to very precise spellings like ämbidëxtrôuS, hïppopötamuS and sêlëctïvity. The sounds of u require a more serious reorganization, due to the use of ü for both the {yø} and {yV} sounds. The table below shows how it could be done.
Sound | Ambiguous ABCD | Unambiguous ABCD | Ambiguous Example | Unambiguous Example |
{ø} | u | u | campus | cämpus |
{u} | u | ü | cut | cüt |
{V} | Û | Ü | pÛsh | pÜsh |
{yø} | ü | û | accür@t(e) | äccûr@t(e) |
{yV} | ü | Û | refüGee | rëfÛGee |
{U}/{yU} | ~Ú | |Ù | d~Úty | d|Ùty |
{V}/{yV} | ~Ü | |Ü | d~Ürátion | d|Ürátion |
{ø}/{V} | Û | ~U | instrÛment | ïnstr~Ument |
{yø}/{yV} | ü | µ | monüment | mönµment |
One other ambiguity that must be resolved is between the unstressed {ør} and the stressed {&r}, which can both be spelled by er, ir or ur. An obvious fix here is to use eR, iR and uR for the stressed sound, leading to spellings such as fiRst, êmeRGency and muRder. (And also, Ôr should be changed to ÔR, for consistency, as in wÔRtþ.)
In some ways, the unambiguous system is a better arrangement, since ü is compatible with the other uses of dieresis, and the resemblance of the symbol | to the letter I may be mnemonic. Nevertheless, I think the number of diacritics required in the unambiguous system makes it inferior to the slightly simpler ambiguous one. Certainly, the ambiguity of ABCD is not an issue for my planned uses of it.
The same process that generates the ambiguous ABCD dictionary could equally well generate an unambiguous version. I am not at this time offering it for download, but if you have some use for it, please contact me (Alan at wyrdplay.org), and I'll be happy to provide a copy.
To comment on this page, e-mail Alan at wyrdplay.org