This page is about ABCD, which stands for Alan's Basic Codes with Diacritics. I call ABCD a "notation" - it is easier to explain what it is not than to explain what it is, and why you might be interested in it. ABCD is not a spelling system: it is too complex and idiosyncratic for that. Neither is it a dictionary key: it is neither as accurate nor as regular as a dictionary key. Essentially, ABCD is a notation which elucidates the relationship between a word's usual spelling and its pronunciation. It is suitable for use both with words that conform to common English spelling patterns, like nasty, nice, terrible and benevolent, and with horridly exceptional words like women, colonel, boatswain and connoisseur.
ABCD is loosely based on my spelling system DRE. It makes use of an extensive number of diacritics, organized much like the DRE set of diacritics.
ABCD uses both lower- and upper-case characters, but prefers to use lower-case for the most common and familiar patterns, and upper-case for less familiar ones. Further, ABCD's lower-case characters always match the corresponding traditional spellings (possibly with the addition of a diacritic), while upper-case characters may occasionally differ from them. (For instance, the Z in ABCD represents an s which is pronounced as z.) Each ABCD letter or digraph represents both a sound and an English spelling. For instance, the digraphs sh, ti and SH all represent the same sound, but spelled as sh (as in shoe), ti (as in nation) and ch (as in machine) respectively. In addition to lower- and upper-case alphabetics, ABCD uses a few punctuation characters, mostly to note flaws in a word's usual spelling, and also to separate word constituents. (The @ character is an anomaly - it is treated as a special form of the letter a rather than as punctuation.)
Like DRE, ABCD is ambiguous about certain aspects of pronunciation, though less so than DRE. An ABCD spelling does not usually indicate stress, and also does not distinguish between the schwa and the regular short vowel sounds. However, if you ignore these two areas, ABCD is quite precise. In fact, one characteristic of ABCD is that the ABCD spelling of a word is sufficient to represent both its traditional spelling (ignoring typographical issues like capitalization and hyphenation) and its pronunciation, subject to the two ambiguities of stress and schwa. (I have defined a less ambiguous form of ABCD, briefly discussed in an appendix, but the ambiguous version is easier to read, and I think more useful.) Furthermore, the ABCD representation of pronunciation and spelling is almost entirely context-free, which makes it easy to process mechanically by a computer program. The context dependent elements of ABCD are enumerated in this appendix.
Here are a few simple examples of ABCD in action, to give you a better idea of how it works. The list below is in the format "TS: ABCD". (Throughout this page, I use the convention of displaying traditionally spelled words (TS) in italics, and ABCD spellings in boldface. Occasionally, my CAAPR notation is also used - this is also shown in bold, and enclosed in curly braces to identify it as CAAPR.)
abundant: abundant
alienate: álîenáte
charisma: KHariZma
handsome: han(d)som(e)
awareness: aWâre+nêss
accordion: a^ccòrdîon
demoralization: dem~Öral~Ízátion
laugh: l[au:~À][gh:f]
abundant is a word spelled entirely according to English patterns, and requiring no markings for vowel sounds. alienate also conforms to patterns, but requires some vowels be marked with diacritics to prevent misinterpretation. Note that no special marking is required for a final silent e following a long vowel. The word charisma also conforms to high-frequency patterns, but both the ch and the s need to be altered to avoid misunderstanding. (The spelling KH is used rather than CH, because CH is equally plausible as a spelling for the ch of machine.) handsome has two silent letters, and, in contrast to alienate, the e is marked as silent since the previous vowel is not long. Finally, the word awareness shows some ABCD techniques for resolving some of the subtler ambiguities of regular spelling. The W in awareness is capitalized to show that aw is not to be interpreted as a single vowel sound (as in law), while the + sign after aWâre shows that the first e is not pronounced as a short e or a schwa, but instead is silent, because it ends the root word aware.
Unlike the words above it, the word accordion does not conform to basic English patterns, because the double c follows a vowel representing a schwa. The ^ flags this situation. The word demoralization displays a different difficulty - the British and American pronunciations differ. The ~ flags a code which is interpreted differently for the two varieties of English. And finally, the word laugh is completely defiant of standard English patterns, and so the ABCD representation simply shows how the letter combinations map to sounds.
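The claim that an ABCD spelling carries its own traditional spelling can be illustrated with a small sketch. The function name `abcd_to_ts` and the `RESPELLINGS` table are hypothetical, and the table covers only the few upper-case codes mentioned so far (KH, SH, Z and @); codes such as the inserted J, or þ, are not handled, so this is an approximation rather than a full converter.

```python
import re
import unicodedata

# Hypothetical table: upper-case codes whose base letters differ from the
# traditional spelling. Only the codes mentioned so far on this page are listed.
RESPELLINGS = {"KH": "ch", "SH": "ch", "Z": "s", "@": "a"}

def abcd_to_ts(abcd: str) -> str:
    """Recover the traditional spelling from a simple ABCD spelling."""
    # Explicit pairs such as [gh:f] keep the traditional letters before the colon.
    text = re.sub(r"\[([^:\]]*):[^\]]*\]", r"\1", abcd)
    for code, letters in RESPELLINGS.items():
        text = text.replace(code, letters)
    # Remove the punctuation marks used as flags and separators.
    text = re.sub(r"[()+^~]", "", text)
    # Decompose accented letters and drop the combining diacritics.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

for word in ["álîenáte", "KHariZma", "han(d)som(e)", "aWâre+nêss",
             "a^ccòrdîon", "dem~Öral~Ízátion", "l[au:~À][gh:f]"]:
    print(word, "->", abcd_to_ts(word))
```

Applied to the example list above, the sketch returns the traditional spellings alienate, charisma, handsome, awareness, accordion, demoralization and laugh.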
An ABCD dictionary is available for download here. It contains 27,000 English words, spelled in TS and ABCD. For most words, the spelling is the same for both American and British English; where they differ, the dictionary provides both of them, with the American spelling first. The whole point of ABCD is really the dictionary. It can be used as an educational tool, for increasing one's understanding of the patterns of English spelling, and the ways in which they break down. I also believe that it may be useful as the starting point for developing spelling systems which are very similar to existing spelling, by allowing easy identification of those words that fail to abide by whatever rules the designer feels are most important. One reason I developed ABCD was to help me develop a version of my system DRE which did not require the use of diacritics. I have not, at this time, actually succeeded in doing this, but there is no doubt that ABCD has made the process easier, and I consider it possible that the process might someday actually produce something satisfactory.
The ABCD dictionary is ultimately derived from the CAAPR dictionary; the pronunciations it uses are based on consensus of 2 American dictionaries, 2 British dictionaries and the Longman Pronunciation dictionary, which covers both varieties. See the CAAPR page for more information on this subject. The dictionary download above includes copies of both this page and the CAAPR definition for easy reference.
This version of ABCD and its dictionary differ from previous versions in the use of the symbol É (which was originally part of ABCD, but was then unwisely removed), and by removal of the exception for the S symbol in the sequence ôuS.
Before attempting to describe the ABCD notation in full detail, it will be useful to describe the way it organizes its diacritical markings, which is based on the conventions of my spelling system DRE. The organization is strictly applied for letters in lower case; some flexibility is allowed for upper-case letters to avoid running out.
Vowels without diacritics represent either the regular short sound of the vowel (as in pat, pet, pit, pot and putt), or the schwa. The digraph oo represents the vowel of took. An unmarked y is a rather special case, and may have either the vowel sound of misty, or the consonantal sound of yell. When followed by an r, some vowels may also be pronounced with a stressed er sound, as in fern, bird and burn.
Letters with an acute accent represent the normal long sound of the five English vowels, as in máte, méte, míte, móte and múte. The digraph oó represents the vowel of moót. An acute-accented ý, as in flý, represents the same sound as í.
Letters with a grave accent represent an alternate sound of the marked letter. These sounds are all long in length, and almost always spoken distinctly. These sounds are especially common in words of European origin. Mnemonic words are dràma, sÈànce, marÌne, bòre (as well as dòg, in American English) and crùde.
Letters with a circumflex represent an alternate sound of the marked letter. These sounds are shorter than the sounds associated with the grave accent. They may be reduced to a schwa, and may also have a slightly different meaning preceding an r than in other positions. The sounds of the circumflexed vowels when no r follows are those of vidÊo, audîó, Ôther and pÛsh. (DRE also spells âny and prêtty, but these particular forms are not used in ABCD.) Before the letter r, the circumflexed vowels represent the sounds of câre, hÊre and wÔrd. They are also used in the standard suffixes -âlly, -lêss, -nêss and -fûlly, to indicate an indistinct sound despite the following double letter. Note that ABCD, unlike DRE, only uses a lower-case ê for an unstressed sound: ênáble is spelled with one in ABCD, but prÏtty is not.
Letters with a dieresis represent the same sound as the unmarked letter, and are often used where the unmarked letter would have a different interpretation. Examples are päradox, wickËd, fúËl and sörry. The ü with a dieresis has a special meaning. It represents either the unstressed sound of Û or the schwa, preceded by a y, as in regülar or mercüry. In ABCD, ë and ï are also used to indicate the normal short sound of the vowel before an r, as in chërish and spïrit. A ÿ with a dieresis may be used in ABCD to indicate a y which is always pronounced as a vowel, as in lobbŸist, where, because the y is followed by a vowel, one might otherwise assume the consonantal sound is intended.
DRE and ABCD both utilize a number of digraphs in which one of the vowels is marked with a diacritic. In ABCD, with a few exceptions (oó, éu, éw and combinations like íË containing an Ë), the rule for interpretation of such combinations is simple - the sound is that of the marked letter, and the unmarked letter is ignored. Example words include hEÂd, thÈY, dÍE, dÔUble, nervôuS and cúe. A certain number of unmarked digraphs are used also, and they generally have the meaning you would expect. These are ai, au, aw, ay, ea, ee, eu, ew, oa, oe, oi, oo, ou, ow and oy. Note that éu and éw are exceptions to the rule for interpreting digraphs above. eu and ew are pronounced like ùe (sleutþ, brew), and éu and éw like úe (as in éuró and féw). These two combinations break the rules because of the lack of an accented w in many fonts and on most keyboards.
One of the all-too-common features of English spelling is the use of silent letters. ABCD encloses silent letters in parentheses, as in (k)nífe, í(s)land and ballÈ(t). There are a few letters and combinations, notably e, gh and l, whose treatment is more complicated when silent. See their descriptions below for more information.
One might well ask of ABCD: is it oriented towards American or British English? The answer is that it is equally oriented towards both. It may be used to spell words from either regional variety. In most cases, the spelling is independent of the variety. This may happen in any of three ways. Many words are pronounced the same in both varieties, such as cat, cloudy and demonstration. Other words are pronounced differently, but with pronunciations that are related to each other according to well-defined rules, allowing a single spelling to be used for both. Examples of such words are pot, stairs and curious. A third case is that of words which have related pronunciations in American and British English, but where the relationship is not reliable for similar words. For instance, the American pronunciation of sample would be written sample in ABCD and the British pronunciation as sàmple, but the similar word ample would be spelled ample for both varieties. ABCD uses the character ~ to indicate a pronunciation which commonly differs between American and British English. For instance, sample is spelled s~Àmple in ABCD. Words such as clerk and neither with unusual differences between American and British pronunciation must have two ABCD spellings, one for each variety.
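As a rough illustration of how a ~ code could be resolved for one variety, here is a minimal sketch. The table `TILDE_CODES` and the function `resolve_variety` are hypothetical names; the table lists only four of the ~ codes, with the American and British forms taken from the Denotes column of the symbol tables further down this page (American form first).

```python
import re

# Hypothetical subset of the ~ codes: code -> (American form, British form).
TILDE_CODES = {
    "~À": ("a", "à"),
    "~Ö": ("ò", "ö"),
    "~Í": ("i", "í"),
    "~Ú": ("ù", "ú"),
}

# Try longer codes first so a longer code is never shadowed by a shorter one.
_PATTERN = re.compile(
    "|".join(re.escape(code) for code in sorted(TILDE_CODES, key=len, reverse=True))
)

def resolve_variety(abcd: str, variety: str) -> str:
    """Replace each known ~ code with its American or British form."""
    index = {"american": 0, "british": 1}[variety]
    return _PATTERN.sub(lambda match: TILDE_CODES[match.group(0)][index], abcd)

print(resolve_variety("s~Àmple", "american"))  # sample
print(resolve_variety("s~Àmple", "british"))   # sàmple
```

Codes that are not in the table are left untouched, so a fuller version would need the complete list of ~ notations.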
You may also be wondering what the distinction is between the upper- and lower-case ABCD symbols. Before a lower-case symbol could be used, there were two prerequisites. The first was simply that the base character for a lower-case symbol had to be the character used in regular spelling. ABCD uses the symbol Ù for the letter o when pronounced as long oo, as in move. A lower-case symbol could not be used unless I were willing to use a form of the letter o for it. The other requirement was that I would use a lower-case symbol only when it was pretty clear how you would spell the sound in a rational spelling system. For instance, spelling the vowel of plain as ai is very reasonable, and so lower-case could be used. But spelling the second vowel of machine with the letter i is at least dubious, and so the word is denoted maSHÌne rather than maSHìne. The capital letter emphasizes that there's "something funny" going on here.
I think the best approach to describing the details of ABCD is a semi-formal one. So let me start off with a description of how the ABCD spelling of a word is determined. The process starts off with a decomposition of a word into pairs. The first element of each pair is one or more letters from the spelling, and the second is from the CAAPR representation of the pronunciation (see Endnote 1). (CAAPR is described here. Note that the remainder of this page assumes familiarity with CAAPR - so if it is new to you, you may want to keep the CAAPR writeup open for reference.)
As an example, the word charisma is originally decomposed as:
[ch:k][ar:ør][i:i][s:z][m:m][a:ø]
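The bracketed pair notation is regular enough to read mechanically. The following is a minimal sketch (the names `parse_pairs` and `spelling_of` are hypothetical) that splits a decomposition into (letters, CAAPR code) pairs and rebuilds the traditional spelling from the first elements.

```python
import re

# One pair looks like [letters:code]; the letters never contain ':' or ']'.
PAIR = re.compile(r"\[([^:\]]*):([^\]]*)\]")

def parse_pairs(decomposition: str) -> list[tuple[str, str]]:
    """Return the (letters, CAAPR code) pairs of a decomposition string."""
    return PAIR.findall(decomposition)

def spelling_of(pairs: list[tuple[str, str]]) -> str:
    """Join the first element of every pair to recover the spelling."""
    return "".join(letters for letters, _ in pairs)

pairs = parse_pairs("[ch:k][ar:ør][i:i][s:z][m:m][a:ø]")
print(pairs)
print(spelling_of(pairs))  # charisma
```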
The process of deriving the ABCD spelling then proceeds in three steps:
1. High frequency pairs are replaced by ABCD symbols or symbol combinations. (It seems remarkable that there are few enough of these pairs that one can find readable representations of all of them.)
2. Certain symbols may be modified or added based on special circumstances of individual words. This is done either to avoid ambiguity (e.g., to distinguish the th of worth from that of porthole) or to note unexpected violations of English patterns (like the double t in attend or the s at the end of the non-plural atlas).
3. Any remaining pairs have the second element modified to contain an ABCD code rather than a CAAPR code, except that the CAAPR symbols {ø} and {&}, which do not have an unambiguous ABCD representation, are retained.
Step 1, and aspects of step 2, can be summarized easily by simply listing the pairs to which they apply, and how they are represented (which I will do below). But some additional notations are more conveniently described here:
In a number of cases, pairs at the end of a word are handled differently from the same pair within a word. This is especially true for the silent e, and the letter s when used to indicate a plural or possessive. Because of English's fondness for compound and derived words, these letters can sometimes occur within a word with the end-of-word interpretation. In ABCD, a plus sign is used to indicate the end of a word within a word. Examples are scâre+crÓW and státe+ment. The plus sign is also used to separate double letters when both are sounded, as in un+nótiCed or mis+státe.
Silent letters are enclosed in parentheses, as noted above. (Other notations are sometimes used for silent e, l and gh, as described below.)
The symbol ~ always indicates that what follows is pronounced differently in British and American English. Individual letter combinations beginning with ~ are discussed below, together with the notations with no dependence on English variety.
One property of ABCD is that it is very easily parsed by software - while some letter combinations, such as ch, have meanings distinct from those of their components, there is never (so far as I can determine) any ambiguity in how a word is divided into meaningful units. I note that this property is preserved even if all the ~'s are removed. Which is to say, the ~'s are there to assist the human reader, but are unnecessary for accurate algorithmic decomposition.
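To give a feel for what such a parse might look like, here is a sketch of a tokenizer that splits an ABCD spelling into symbols. Two things are assumptions of the sketch rather than part of the ABCD definition: the greedy longest-match strategy, and the tiny `SYMBOLS` inventory, which is just enough for two example words. The real symbol set is the full list given in the tables below.

```python
import unicodedata

# Hypothetical, deliberately tiny symbol inventory (NFC-normalized).
SYMBOLS = {unicodedata.normalize("NFC", s)
           for s in ["KH", "SH", "ar", "a", "e", "i", "Ì", "m", "n", "Z"]}
LONGEST = max(len(symbol) for symbol in SYMBOLS)

def tokenize(abcd: str) -> list[str]:
    """Split an ABCD spelling into symbols, preferring the longest match."""
    word = unicodedata.normalize("NFC", abcd)
    tokens = []
    pos = 0
    while pos < len(word):
        for size in range(min(LONGEST, len(word) - pos), 0, -1):
            piece = word[pos:pos + size]
            if piece in SYMBOLS:
                tokens.append(piece)
                pos += size
                break
        else:
            # No symbol in the inventory starts at this position.
            raise ValueError(f"unknown symbol at position {pos} in {abcd!r}")
    return tokens

print(tokenize("KHariZma"))
print(tokenize("maSHÌne"))
```

For KHariZma the tokens line up with the pairs of the decomposition shown earlier: KH, ar, i, Z, m, a.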
Having said all that, I am now ready to run down the alphabet, and produce a complete list of the ABCD notations. Though the list is quite long and detailed, it is highly structured and organized, notably by the diacritical conventions given above, and for that reason is not hard to grasp and master. For symbols beginning with a ~, the Denotes column of the tables gives both the American and the British meaning for the symbol: in a/à, the a is the American form, and the à the British form.
a -
| Symbol | Denotes | Example | ABCD Example |
| a | [a:a] or [a:ø] | cat, about | cat, about |
| á | [a:E] | late | láte |
| à | [a:A] | father | fàther |
| â | -âlly | locally | lócâlly |
| @ | [a:i] | message | mess@G(e) |
| ai | [ai:E] | rain | rain |
| air | [air:ër] | fair | fair |
| ar | [ar:ør] | awkward | awkward |
| âr | [ar:ër] | care | câre |
| är | [ar:ar] | paradox | päradox |
| ärr | [arr:ar] | arrow | ärrÓW |
| au | [au:Ø] | pause | pauZe |
| aw | [aw:Ø] | claw | claw |
| ay | [ay:E] | play | play |
| Å | [a:Ø] | water | wÅter |
| AÉ | [ae:I] | algae | alGAÉ |
| ~À | a/à | bath | b~Àtþ |
| ~Âr | âr/[ar:ør] | secretary | secrêt~Âry |
See below for [a:o], as in watch (ABCD wOtch).
b -
| Symbol | Denotes | Example | ABCD Example |
| b | [b:b] | big | big |
| bb | [bb:b] | rubble | rubble |
c -
| Symbol | Denotes | Example | ABCD Example |
| c (Note 1) | [c:k] or [c:s] | cat, city | cat, city |
| cc (Note 1) | [cc:k] | accord | accòrd |
| ck | [ck:k] | luck | luck |
| cqu | [cqu:kw] | acquit | a^cquit |
| cQ | [cqu:k] | lacquer | lacQer |
| ch | [ch:C] | chill | chill |
| ci (Note 2) | [ci:X] | vicious | viciôuS |
| Ce, C(e) (Note 3) | [ce:s] | advance, furnace | advanCe, furn@C(e) |
Notes:
c denotes [c:s] if followed, in the traditional spelling, by e, i or y, and otherwise [c:k]. The few words which do not conform to this pattern must be spelled in ABCD with an explicit [c:k] or [c:s], as in [c:k]eltic or fa[c:s]àd(e). cc denotes [cc:k] unless followed by e, i or y. When it is followed by e, i or y, the pronunciation is ks - this is regarded as 2 c's in succession, rather than a single occurrence of cc.
ci denotes [ci:X] only when followed by a vowel. Otherwise, the c and the i are distinct symbols.
Ce and C(e) represent [c:s] followed by a silent e, in situations where the silent e is not a magic e, as in advanCe and furn@C(e). In the case of furnace, the e is misleading about the preceding vowel, and so is parenthesized. In the case of advance, the previous vowel is too distant in the word to be affected by the e, which serves the useful purpose of defining the pronunciation of the preceding c.
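The default reading of a bare c described in the first note can be written as a very small function. This is only a sketch of that one rule: the name `default_c_sound` is hypothetical, it works on the traditional spelling, and it does not handle the digraphs ch, ck and ci or exceptional words like Celtic, which ABCD spells with an explicit pair.

```python
def default_c_sound(ts_word: str, index: int) -> str:
    """Return 's' or 'k' for the letter c at ts_word[index], by the default rule."""
    if ts_word[index].lower() != "c":
        raise ValueError("the letter at this index is not a c")
    # The slice is empty (and so matches nothing) when c is the last letter.
    following = ts_word[index + 1:index + 2].lower()
    return "s" if following in {"e", "i", "y"} else "k"

print(default_c_sound("cat", 0))   # k
print(default_c_sound("city", 0))  # s
```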
See below for [ch:k], as in chrome (ABCD KHróme), and for [ch:X], as in machine (ABCD maSHÌne).
Also see n below for information on the combinations ñc and ñKH as in uncle and anchor.
d -
| Symbol | Denotes | Example | ABCD Example |
| d | [d:d] or [d:þ] | dog, wanted | d~Ög, wOntêd |
| dd | [dd:d] | add | add |
| dG (see G) | [dg:j] | judge | judG(e) |
| dJ (see J) | [d:j] | procedure | procédJur(e) |
| ed (Note 1) | [ed:þ] | missed | missed |
At the end of a word, ed represents [ed:þ], that is, a past tense in which the e is silent, and in which the d is pronounced either as t or d, depending on the previous letter. There are some exceptional words ending with -ed in which the e is surprisingly not silent, such as beloved and wicked - these words are spelled with Ëd in ABCD to prevent ambiguity.
Note that words like hunted and raided are regular, represented by [e:i][d:þ], and unambiguously spelled with -êd in ABCD. Also note that the Ëd spelling is unnecessary in one-syllable words, and so bed is bed and not bËd in ABCD.
e -
| Symbol | Denotes | Example | ABCD Example |
| e | [e:e] or [e:ø] | ten, rivet | ten, rivet |
| e (Note 1) | [e:-] | late | láte |
| é | [e:I] | medium | médîum |
| ê (Note 2) | [e:i] or -lêss, -nêss | enable, erupt, lifeless, fitness | ênáble, êrupt, lífe+lêss, fitnêss |
| ea | [ea:I] | feast | feast |
| ear | [ear:ïr] | fear | fear |
| ed (see d) | [ed:þ] | missed | missed |
| ee | [ee:I] | feet | feet |
| eer | [eer:ïr] | beer | beer |
| er | [er:ør] or [er:&r] | river, revert | river, rêvert |
| es (see s) | [es:$] | miles | míles |
| eu | [eu:U] | sleuth | sleutþ |
| eur (Endnote 2) | [eur:Ür] | pleurisy | pleurisy |
| éu | [eu:yU] | feud | féud |
| éur (Endnote 2) | [eur:yÜr] | Europe | éurop(e) |
| ew | [ew:U] | drew | drew |
| éw | [ew:yU] | few | féw |
| É (Note 3) | [e:I] | me, crises, museum | mÉ, crísÉs, múZÉum |
| È | [e:E] | ballet, cafe | ballÈ(t), cafÈ |
| Èe | [ee:E] | matinee | matinÈe |
| Ê | [e:ý] | apostrophe, video | apostroPHÊ, vidÊó |
| Ë (Note 4) | [e:e] or [e:i] or [e:ø] | wicked, duet, diet | wickËd, d~ÚËt, díËt |
| EÀr | [ear:àr] | heart | hEÀrt |
| EÂ | [ea:e] | head, measure | hEÂd, mEÂZJur(e) |
| EÂr | [ear:ër] | bear | bEÂr |
| ËA (Note 5) | [ea:ï] | idea (Brit) | ídËA |
| ÉI | [ei:I] | seize | sÉIze |
| ÉIr | [eir:ïr] | weird | wÉIrd |
| ÈI | [ei:E] | reign | rÈI(g)n |
| ÈIr | [ei:ër] | their | thÈIr |
| ER | [ear:&r] | earth | ERtþ |
| Êr | [er:ïr] | here | hÊre |
| Ër (Note 6) | [er:ør] | supplier | su^pplíËr |
| ÈY | [ey:E] | survey | survÈY |
| ÊY | [ey:ý] | money | mÔnÊY |
| ~Er (Note 7) | ër/[er:ør] | cemetery | cemet~Ery |
| ~ÉU (Endnote 2) | eu/éu | neutral | n~ÉUtral |
| ~ÉUr | eur/éur | neurotic | n~ÉUrotic |
| ~ÉW | ew/éw | news | n~ÉWZ |
Notes:
The handling of silent e in ABCD is complicated. There are two functions that silent e commonly performs. It indicates that the previous vowel sound is long, in which case the e is commonly called magic. Alternately, in many words, such as mice, savage and tense, it changes the sound of the previous consonant. (Note that without the final e, tens would be a plural, and the s would be pronounced as z.) When both functions are taken into account, we can classify words ending with a silent e into 4 categories. We say a final e is magic if the previous vowel (separated from the e by a single consonant sound) is long. (If the consonant is an r, the sounds of â, Ê and ò are also treated as long.) We say a final e is misleading if there is a vowel preceding it which ought to be long, but is not. In vice, the e is magic, but in service, it is misleading. In words in which a final e is not magic, we call it useful if it is preceded by c, g or s, and otherwise useless. An e can be both useful and misleading, as in garbage, and both useless and misleading, as in festive.
When a silent e occurs at the end of a word, it is enclosed in parentheses if it is misleading or if it is useless. Also, when a useful (but not magic) e follows the letter c or s, ABCD capitalizes the consonant to show what the e is accomplishing. Some example words are míne, pláce, festiv(e), sav@G(e) and tenSe. When a magic e occurs within a word and is not parenthesized, it is followed by a +, usually indicating the end of an internal word, as in bâre+ly, lífe+boat, or minCe+meat.
ê is used only when [e:i] is unstressed. Ï is used instead when stressed, as in Ïñglish.
É is used only when [e:I] appears where a silent e might be expected, at the end of a word (bÉ) or before s (parentþesÉs). Note that É is used even in words with no other vowels, such as be, even though it would be impossible for the e to be silent. É is also used in words like museum, where use of the usual é would seem to be part of the éu digraph.
Ë is used for the regular sound of e when a bare e would be misinterpreted, such as wicked, which looks like a past tense, and duet, where d~Úet would appear to be a one-syllable word whose vowel is ~Úe.
The sound of ËA is an RP diphthong represented in SAMPA as /I@/, which usually occurs before r in words like pier.
Ër is used like Ë, to prevent ambiguity, as in flýËr, where a bare e would be treated as part of the composite vowel symbol ýe.
Note that the distinction between ~Âr and ~Er is only orthographic - both are pronounced the same in either variety of English.
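The marking of a final silent e described in the first note above can be summarized in a short sketch. The function `mark_final_e` and its parameters are hypothetical: the caller supplies the ABCD spelling of the word up to the last consonant, that consonant, and whether the e is magic or misleading, since deciding those facts requires the pronunciation. A soft g is assumed to arrive already written as G.

```python
def mark_final_e(stem: str, consonant: str, magic: bool, misleading: bool) -> str:
    """Attach a final silent e to stem + consonant using the rules of Note 1."""
    if magic:
        # A magic e needs no special marking.
        return stem + consonant + "e"
    base = consonant.lower()
    useful = base in {"c", "g", "s"}
    if base in {"c", "s"}:
        # A useful, non-magic e capitalizes the c or s that it affects.
        consonant = consonant.upper()
    # Parenthesize the e when it is misleading or useless.
    ending = "(e)" if misleading or not useful else "e"
    return stem + consonant + ending

print(mark_final_e("mí", "n", magic=True, misleading=False))      # míne
print(mark_final_e("ten", "s", magic=False, misleading=False))    # tenSe
print(mark_final_e("festi", "v", magic=False, misleading=True))   # festiv(e)
print(mark_final_e("furn@", "c", magic=False, misleading=True))   # furn@C(e)
```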
See below for [le:øL], as in double (ABCD dÔUble).
f -
| Symbol | Denotes | Example | ABCD Example |
| f | [f:f] | free | free |
| ff | [ff:f] | stuff | stuff |
g -
| Symbol | Denotes | Example | ABCD Example |
| g | [g:g] | good | good |
| gg | [gg:g] | egg | egg |
| G (Notes 1, 2) | [g:j] | germ | Germ |
| GH | [gh:-] | high, taught | híGH, tauGHt |
| GJ | [g:J] | mirage, genre | miràGJ(e), GJ[e:o]nrË |
Notes:
Note that the spelling G is used even if the letter following g is unusual, as in margarine (American ABCD màrGarin(e)).
The combination dG, as in edge (ABCD edG(e)), is treated as a double letter.
h -
| Symbol | Denotes | Example | ABCD Example |
| h | [h:h] | hot | hot |
| H (Note 1) | [h:h] | mishap | misHap |
Notes:
Because the letter h is used in a number of digraphs, it is frequently ambiguous when it follows a consonant, as in the words porthole, mishap and rawhide. ABCD uses a capital H for [h:h] if confusion might be possible, as in pòrtHóle, misHap and rawHíde.
i -
| Symbol | Denotes | Example | ABCD Example |
| i | [i:i] or [i:ø] | pig, devil | pig, devil |
| í | [i:Y] | item | ítem |
| î (Endnote 3) | [i:ý] or [i:ÿ] | radio | rádîó |
| ir | [ir:ør] or [ir:&r] | direct, bird | direct, bird |
| ïr | [ir:ir] | miracle | mïr@cle |
| ïrr | [irr:ir] | mirror | mïrror |
| Ì | [i:I] | marine | marÌne |
| Ï (Note 1) | [e:i] | pretty | prÏtty |
| IÉ | [ie:I] | brief | brIÉf |
| IÉr | [ier:ïr] | pier | pIÉr |
| ÍE | [ie:Y] | pie | pÍE |
| IÊ | [ie:ý] | cookie | cookIÊ |
| ~Í (Note 2) | i/í | missile, civilization | miss~Íle, civil~Ízátion |
Notes:
Ï is used for [e:i] only when stressed; when unstressed, ê is used.
Note that the ending e in miss~Íle is not parenthesized - it is misleading in American English, but magic in British English.
The letter i also occurs in the combinations ci, si, sci, ssi, ti, and Zi, where it has no sound of its own, but modifies the sound of the preceding consonant.
See below for [i:y], as in billion (ABCD billYon).
j -
| Symbol | Denotes | Example | ABCD Example |
| j | [j:j] | jam | jam |
| jj | [jj:j] | hajj | hajj |
| J (Note 1) | see note | capture | captJur(e) |
Notes:
The capital J is inserted as a sign of palatalization in the combinations dJ (in procedure), sJ (in insure), ssJ (in pressure), tJ (in capture and question), and ZJ (in measure). More precisely, it is used in representing the pairs [d:j] (dJ), [s:X] (sJ), [ss:X] (ssJ), [t:C] and [ti:C] (tJ) and [s:J] (ZJ). The symbol J also appears in the combination GJ, described under g.
(Note that there is no ambiguity between the t and ti spellings corresponding to tJ - an i was present in the original spelling exactly if the letter after the J is not a u.)
k -
| Symbol | Denotes | Example | ABCD Example |
| k | [k:k] | skin | skin |
| KH | [ch:k] | school | sKHoól |
The combination ck is treated as a double k - see c above.
See n below for information on the combination ñk.
l -
| Symbol | Denotes | Example | ABCD Example |
| l | [l:L] | leg | leg |
| ll | [ll:L] | pill | pill |
| le | [le:øL] | purple | purple |
| L (Note 1) | [l:-] | calm | càLm |
Notes:
L represents a silent l following the letter a, as in talk, salmon and calm. This has a special representation for no reason other than that it is surprisingly frequent.
m -
| Symbol | Denotes | Example | ABCD Example |
| m | [m:m] | mud | mud |
| m | [m:øm] | spasm | spaZm |
| mm | [mm:m] | hammer | hammer |
n -
| Symbol | Denotes | Example | ABCD Example |
| n | [n:n] | nice | níce |
| n | [n:øn] | didn't | didnt |
| nn | [nn:n] | sunny | sunny |
| ng | [ng:G] | song | s~Öng |
| ñ (Note 1) | [n:G] | finger, sink | fiñger, siñk |
| N (Note 2) | [n:n] | ungrateful | uNgráte+ful |
Notes:
ñ can be used before any of the various symbols representing or starting with the k sound, as in uñcle, añKHor, bañquet, coñQer and jiñx.
N represents [n:n] when the regular n sound is followed by g, as in ungrateful. N is not needed preceding k sounds - unclean is simply spelled unclean in ABCD.
o -
| Symbol | Denotes | Example | ABCD Example |
| o | [o:o] or [o:ø] | pot, lemon | pot, lemon |
| ó | [o:O] | zero | zéró |
| ò | [o:Ø] | coral, sloth (Amer) | còral, slòtþ |
| oa | [oa:O] | boat | boat |
| oar | [oar:Ør] | boar | boar |
| oe | [oe:O] | toe | toe |
| oer | [oer:Ør] | Boer | boer |
| oi | [oi:Q] | boil | boil |
| oo | [oo:V] | book | book |
| oó | [oo:U] | boot | boót |
| oór (Endnote 2) | [oor:Ür] | poor | poór |
| or | [or:ør] | motor, decorate | mótor, decoráte |
| ör | [or:or] | laboratory (Brit) | laböratory |
| örr | [orr:or] | sorry | sörry |
| ou | [ou:W] | house | house |
| ôu | -ôuS | vicious | viciôuS |
| ow | [ow:W] | allow | a^llow |
| oy | [oy:Q] | boy | boy |
| O | [a:o] | squash | squOsh |
| Ô | [o:u] | mother | mÔther |
| OR (Note 1) | [our:ør] | favour | fávOR |
| Ôr | [or:&r] | word | wÔrd |
| OÙ | [ou:U] | soup | sOÙp |
| OÙr (Endnote 2) | [our:Ür] | tour | tOÙr |
| ÒUr | [our:Ør] | court | cÒUrt |
| ÔU | [ou:u] | trouble | trÔUble |
| ÓW | [ow:O] | blow | blÓW |
| ~Ö | ò/ö | cross, forest | cr~Öss, f~Örêst |
| ~Òr | òr/[or:ør] | category | catêg~Òry |
Notes:
p -
| Symbol | Denotes | Example | ABCD Example |
| p | [p:p] | pink | piñk |
| pp | [pp:p] | happy | happy |
| PH | [ph:f] | photo | PHótó |
q -
| Symbol | Denotes | Example | ABCD Example |
| qu | [qu:kw] | queen | queen |
| Q | [qu:k] | unique | únÌQe |
See n above for the combinations ñqu and ñQ, as in bañquêt and coñQer.
r -
| Symbol | Denotes | Example | ABCD Example |
| r (Note 1) | [r:r] | red | red |
Notes:
s -
| Symbol | Denotes | Example | ABCD Example |
| s (Note 1) | [s:s] or [s:$] | sad, cries | sad, crÍEs |
| ss | [ss:s] | guess | g(u)ess |
| sc, sC (Note 2) | [sc:s] | scent, acquiesce | scent, acquîesC(e) |
| sci (Note 3) | [sci:X] | luscious | lusciôuS |
| sh | [sh:X] | ship | ship |
| si (Notes 3, 4) | [si:X] | mansion | mansion |
| sJ (see J) | [s:X] | insure | insJùre |
| ssi | [ssi:X] | mission | mission |
| ssJ (see J) | [ss:X] | pressure | pressJur(e) |
| S (Note 5) | [s:s] | atlas, cactus, tense | atlaS, cactuS, tenSe |
| SH | [ch:X] | machine | maSHÌne |
Notes:
At the end of a word (or before a +) s is assumed to indicate a plural, in which case, depending on the preceding sound, it may be pronounced as z. The plural s often follows a silent e - however, in contrast to the past tense, where the d is always preceded by e, a silent e in the plural generally implies its presence in the singular as well.
sc denotes [sc:s] preceding e, i or y. In any other position, it is simply the juxtaposition of the regular s and c (pronounced as k) symbols. The C may be capitalized to indicate a following non-magic e.
si, sci, ssi and ti have the sound of {X} only when followed by a vowel. Otherwise, the i is a separate symbol.
When si or ti follows n, there are two common pronunciations: nch and nsh. The CAAPR dictionary, from which the ABCD dictionary is derived, uses nsh as the recognized pronunciation, which is more in line with the pronunciation of si and ti in other positions.
S represents [s:s] at the end of a word, where it might be mistaken for a plural. S is also used before a silent e, where the e prevents the word from being interpreted as a plural. See e note 1 above for more details.
See z below for [s:z] (except in plurals) as in hose (ABCD hóZe).
t -
| Symbol | Denotes | Example | ABCD Example |
| t | [t:t] | top | top |
| tt | [tt:t] | kitten | kitten |
| th | [th:D] | that, leather | that, lEÂther |
| tþ | [th:T] | think, truth | tþiñk, trùtþ |
| ti (see s Notes 3, 4) | [ti:X] | vocation | vócátion |
| tJ (see J) | [t:C] or [ti:C] | capture, question | captJur(e), questJon |
u -
| Symbol | Denotes | Example | ABCD Example |
| u | [u:u] or [u:ø] | sun, circus | sun, circus |
| ú (Note 1) | [u:yU] or [u:yV] | puny, annual | púny, annúal |
| ù (Note 1) | [u:U] or [u:V] | lunar, gradual | lùnar, gradJùal |
| û | -fûlly | awfully | awfûlly |
| ü | [u:yV] or [u:yø] | regular | regülar |
| úe | [ue:yU] | cue | cúe |
| úer (Endnote 2) | [uer:yÜr] | puerile (Brit) | púer~Íle |
| ùe | [ue:U] | true | trùe |
| ur | [ur:ør] or [ur:&r] | Arthur, creature, burn | àrtþur, creatJur(e), burn |
| urr (Note 2) | [urr:&r]/[urr:ur] | hurry | hurry |
| úr (Endnote 2) | [ur:yÜr] | purity | púrity |
| ùr (Endnote 2) | [ur:Ür] | plural | plùral |
| ür | [ur:yVr] or [ur:yør] | accurate | accür@t(e) |
| Ù | [o:U] | move | mÙve |
| Û | [u:V] or [u:ø] | push, prejudice | pÛsh, prejÛdiC(e) |
| ~Ú | ù/ú | student | st~Údent |
| ~Úe | ùe/úe | Tuesday | t~ÚeZd[ay:y] |
| ~Úr (Endnote 2) | ùr/úr | durable | d~Úrable |
| ~Ü | Û/yÛ | insulation | ins~Ülátion |
Notes:
The symbols ú and ù ordinarily represent the long vowel /u:/, but they represent /u/ (which is rendered in CAAPR as {V}) before a vowel.
urr is the only instance of an ABCD notation without a ~ which is interpreted differently for American and British English, but this seems reasonable, since TS exhibits this variance itself.
v -
| Symbol | Denotes | Example | ABCD Example |
| v | [v:v] | very | vëry |
| vv | [vv:v] | savvy | savvy |
w -
| Symbol | Denotes | Example | ABCD Example |
| w | [w:w] | way | way |
| wh | [wh:µ] | which | which |
| W (Note 1) | [w:w] | away | aWay |
| Wh (Note 1) | [wh:µ] | awhile | aWhíle |
Notes:
x -
| Symbol | Denotes | Example | ABCD Example |
| x | [x:ks] | fix | fix |
| xc (Note 1) | [xc:ks] | except | êxcept |
| X | [x:gz] | exist | êXist |
Notes:
xc stands for [xc:ks] only preceding e, i or y. Otherwise, it is simply an x followed by a c, as in excavate.
See n above for information on the combination ñx, as in jiñx.
y -
| Symbol | Denotes | Example | ABCD Example |
| y (Note 1) | [y:y] or [y:ÿ] | yes Tokyo | yeS tókyó |
| y (Note 1) | [y:ý] | happy everything | happy ev(e)rytþing |
| ý | [y:Y] | fly qualify |
flý quOlifý |
| ýe | [ye:Y] | dye | dýe |
| ÿ | [y:i] | myth | mÿtþ |
| Y | [i:y] | million | millYon |
| Ÿ (Note 1) | [y:ý] | lobbyist | lobbŸist |
Notes:
The ABCD symbol y may indicate either a consonant or vowel sound. As a consonant, it denotes [y:y]. As a vowel, it denotes [y:ý]. The vowel sound occurs at the end of a word or before a consonant, and the consonantal sound occurs at the beginning of a word. Before a vowel, either sound may occur. Usually, when y is found after a consonant and before a vowel, the corresponding pair is [y:ÿ], indicating that both the consonant and the vowel pronunciation are possible. In this position, a consonantal pronunciation is assumed - if only a vowel pronunciation is used, then the spelling should be Ÿ. See Endnote 3 for further discussion of the ambiguous letter y and its sounds.
z -
| Symbol | Denotes | Example | ABCD Example |
| z | [z:z] | zoo | zoó |
| zz | [zz:z] | buzz | buzz |
| Z | [s:z] | hose | hóZe |
| Zi (Note 1) | [si:J] | vision | viZion |
| ZJ (see J) | [s:J] | measure | mEÂZJur(e) |
Notes:
Unusual sounds -
As noted, the ABCD spelling notation provides unique codes for high-frequency spelling patterns. Of course, as we all know, English is afflicted with a sizable number of words that break these patterns. ABCD handles these words by means of bracketed symbol pairs, for instance, [eau:éw] in beautiful. The eau is the letter sequence in the usual spelling, and the éw defines the sound (but not the spelling). Obviously, this representation is not unique: [eau:ú] or [eau:yoó] could have been written instead.
Almost all sounds of English have at least one high-frequency spelling, and so there is at least one ABCD spelling that can be used in such pairs for those sounds. But a few sounds, mostly from words of foreign origin, are so low-frequency that there is no standard ABCD notation for them. An example is the final sound of the word loch, when pronounced in the authentic Scottish way. ABCD therefore must assign representations to these sounds, so that these words can be rendered sensibly in ABCD. For instance, the /x/ sound of loch is given the ABCD spelling of QH, and so the word is spelled lo[ch:QH] in ABCD.
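As a rough illustration of how a program might pick bracketed pairs apart, the Python sketch below extracts each [spelling:sound] pair from an ABCD string and substitutes the letters of the usual spelling for the pair. The names PAIR, bracket_pairs and expand_spelling are hypothetical, chosen only for this example, and the pattern assumes that neither half of a pair contains a colon or a closing bracket.

```python
import re

# Hypothetical helper names; assumes a pair looks like [letters:sound]
# and that neither half contains ':' or ']'.
PAIR = re.compile(r"\[([^:\]]+):([^\]]+)\]")

def bracket_pairs(abcd):
    """Return (usual-spelling letters, ABCD sound) for each bracketed pair."""
    return PAIR.findall(abcd)

def expand_spelling(abcd):
    """Replace each bracketed pair with the letters of its usual spelling.

    Symbols outside the brackets are left untouched, so the result is
    not necessarily the complete traditional spelling.
    """
    return PAIR.sub(lambda m: m.group(1), abcd)

print(bracket_pairs("lo[ch:QH]"))             # [('ch', 'QH')]
print(expand_spelling("lo[ch:QH]"))           # loch
print(bracket_pairs("c[on:õ]cî[er:air]GJe"))  # [('on', 'õ'), ('er', 'air')]
```

Only the bracketed part is expanded here. Diacritics and upper-case symbols elsewhere in the word would still have to be mapped back to plain letters by a separate step.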
This table catalogs the representations of unusual sounds (and one uncommon American/British difference):
| Symbol | Denotes (SAMPA) | Example | ABCD Example |
| ã | /A~/ | melange | mÈl[an:ã]GJ(e) |
| õ | /O~/ | concierge | c[on:õ]cî[er:air]GJe |
| QH | /x/ | loch | lo[ch:QH] |
| UH | /V~/ | uh-huh | UHhUH |
| & (Note 1) | /3/ | masseuse (Brit) | mass[eu:&]Z(e) |
| ~OOr (Note 2) | oòr/oor | courier | c[our:~OOr]îer |
Notes:
Completely pure CAAPR is not used here. Certain simplifications have been introduced to remove distinctions not relevant to this project. In particular,
The indistinct i, CAAPR {ê}, is treated as identical to the short i ({i}).
The CAAPR symbol {°} is treated the same as {ø}, and the symbols {î}, {3}, {¹} and {³} are treated as synonymous with {ê}, and therefore with {i}.
The symbol {ß} is treated as identical to {r}, and {R} as identical to {ør}.
The {*} symbol is removed.
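The simplifications above amount to a character-by-character substitution, which might be sketched in Python as follows. The names SIMPLIFY and simplify_caapr are hypothetical, and the sketch assumes that each CAAPR symbol is a single character and that the input is given without the surrounding braces.

```python
import unicodedata

# Substitutions taken from the list above; the names are hypothetical.
SIMPLIFY = {
    "ê": "i",                                  # indistinct i -> short i
    "°": "ø",
    "î": "i", "3": "i", "¹": "i", "³": "i",   # synonyms of ê, hence of i
    "ß": "r",
    "R": "ør",
    "*": "",                                   # removed entirely
}

def simplify_caapr(caapr):
    """Apply the simplifications to a CAAPR string, one character at a time."""
    # Compose accented letters first so that e.g. 'ê' is a single character.
    text = unicodedata.normalize("NFC", caapr)
    return "".join(SIMPLIFY.get(ch, ch) for ch in text)

print(simplify_caapr("iLe·ktro'nik"))  # no affected symbols, so unchanged
```

Characters not named in the list, including the stress marks, pass through unchanged.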
Also, some aspects of ABCD depend on stress. Sometimes, when stress differs between British and American English, it will happen that the ABCD spelling is based on a compromise between the two. A good example is the word electronic. The American CAAPR for this word is {iLe·ktro'nik}, while the British CAAPR is {i·Lektro'nik}. The conversion to ABCD is done on the composite form {i·Le·ktro'nik}, leading to the ABCD spelling Ïlectronic, which does not accurately reflect the American pronunciation. I have edited the ABCD dictionary to correct this particular instance, but it is likely that other examples of the same problem still exist.
ABCD utilizes a number of spellings that imply the equivalence of a short sound followed by an r to a related long sound followed by r. Examples are the spellings air, eer and oar, which logically ought to be pronounced as ár, ér and ór, but are actually pronounced as âr, Êr and òr respectively. This implied equivalence is also reflected in the common use of the magic e in words like care, sphere and sore.
The most difficult case has to do with the vowels represented in CAAPR as {Vr} and {Ür}. In American English, both {Vr} and {Ür} symbolize the same sound, represented in SAMPA as /Ur/, while for British English {Ür} represents the diphthong /U@(r)/. I note that {Ür} is quite common in RP, while {Vr} occurs in only a few words, notably guru and courier. It turns out to be extraordinarily convenient to represent {yÜr}/{Ür} by the long vowel symbols úr and ùr, as in cúre and plùral. Furthermore, though American and British dictionaries quite consistently show this sound as {Vr}, most of the participants in the Saundspel group feel that {Ur} (SAMPA /u:r/) is more accurate. For these reasons, {Ür} is consistently shown with a long vowel. For instance, poór is used rather than poor. However, when the sound is understood as {Vr} in British English, it is represented as a short sound there. The word guru is spelled g[ur:~OOr]ù in ABCD, representing gùrù in American English, but gÛrù in British English.
CAAPR utilizes the symbol {y} for the consonant sound of the letter y (as in young), and {ý} for the vowel sound (as in happy). But there is a third possibility, a quite common one, represented by {ÿ}. {ÿ} represents a sound that can be either {y} or {ý}, varying by speaker. Most words like champion and warrior, in which i is followed by an unstressed vowel, are of this sort. Some words in which y is followed by a vowel, such as Tokyo and Libyan, are also of this sort. The ABCD approach for dealing with words containing this ambiguity is to spell them with the existing letter. Thus, champion is spelled champîon, implying a vowel sound, even though the consonant sound is no doubt more common, and similarly, the spelling libyan is used, implying a consonant sound for the y, even though the word is probably more commonly pronounced with a vowel there. The symbols Y and Ÿ can be used for words like spanYard and lobbŸist, where the pronunciation is unequivocally different from what one might expect.
ABCD represents pronunciation and traditional spelling in an almost context-free way, which is to say that the interpretation of its symbols usually does not depend on their context. For instance, the sequence SH always represents the sound of {X} and the spelling ch, regardless of where it occurs in a word, or what other symbols are adjacent. For a computer program to understand ABCD, it is mostly necessary simply to divide the text into symbols. Some letters are used in more than one symbol (for instance, the letter H occurs in the symbols H, GH, KH, PH, QH, SH and UH), but the rule is that each letter is contained in the longest possible symbol, so that SH will always represent SH, and never S followed by H.
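The longest-symbol rule can be sketched as a greedy tokenizer. The Python below is a minimal illustration, not the program actually used to process the ABCD dictionary: the function name tokenize is hypothetical, and SYMBOLS is a small illustrative subset of multi-letter symbols rather than the full ABCD inventory. Bracketed pairs and the ~ prefix are not handled.

```python
import unicodedata

# A small illustrative subset of ABCD symbols, not the full inventory.
SYMBOLS = {"H", "GH", "KH", "PH", "QH", "SH", "UH", "S", "sh", "th", "tþ", "tt"}

def tokenize(word, symbols=SYMBOLS):
    """Split an ABCD word into symbols, always taking the longest match."""
    # Compose accented letters so each one is a single character.
    word = unicodedata.normalize("NFC", word)
    longest = max(len(s) for s in symbols)
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(longest, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if size == 1 or piece in symbols:
                tokens.append(piece)
                i += size
                break
    return tokens

print(tokenize("maSHÌne"))  # ['m', 'a', 'SH', 'Ì', 'n', 'e']
print(tokenize("kitten"))   # ['k', 'i', 'tt', 'e', 'n']
```

Because the longest candidate is tried first, SH is never split into S followed by H, even though both S and H are symbols in their own right.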
There are, however, a small number of symbols whose interpretation is dependent on context. These context dependencies are found in regular English spelling, and the familiarity benefits of adopting them in ABCD more than offset the additional complexity of context dependence. The context-dependent elements of ABCD are of two sorts, positional and general. The positional elements are as follows:
The sequence le represents the sounds {øL} when preceded by a consonant at the end of a word, or before a +. Otherwise, it represents the regular sounds of l and e. The end-of-word interpretation also applies when le is followed by d (indicating a past tense) or s (indicating a plural) in the same positions. Examples: battle, trÔUbled.
The sequence ed represents the sound of either {d} or {t}, depending on the preceding sound, when at the end of a word or preceding a +. Anywhere else, it represents the regular sounds of e and d. Words like bed, which have no vowel preceding the ed, are an exceptional case, in which the ed is obviously not a past tense marker, and the non-end-of-word interpretation of ed applies. Examples: missed, filled.
The letter e (when not part of le as described above) is silent at the end of a word or before a +, and also before the letter s in these positions. Anywhere else, it is interpreted as a short e or a schwa. Note that some silent e's at the end of a word are represented instead by (e). This is a context dependency for generation of ABCD, but not for interpretation. Examples: shíne, fenCe, híde+out.
The letter m indicates {øm} after a consonant at the end of a word, possibly with a following s or ed. Otherwise, it is simply interpreted as {m}. Example: priZm.
The letter n indicates {øn} when preceded by a consonant and followed by t at the end of a word. Otherwise, it is simply interpreted as {n}. Example: didnt.
The letter s represents the sound of either {z} or {s}, depending on the preceding sound, when at the end of a word or preceding a +. Anywhere else, it represents the regular sound of s. Note that s's pronounced as {z} at the end of a word are represented instead by Z when the word is not plural, as with sÊrIÉZ (series). This is the only place in ABCD where word meaning intrudes on its definition, but it affects only the generation of ABCD, not its interpretation. Examples: cats, d~Ögs.
The symbol c is pronounced as {s} before any form of e, i or y, and as {k} otherwise. The same principle applies to symbols compounded from c, notably cc (either {ks} or {k}), sc (either {s} or {sk}) and xc (either {ks} or {ksk}). Examples: cent, coat, accent, account, scíËnce, screen, êxcíte, êxclaim.
The letter i appears in a number of symbols where, when followed by a vowel, the i is silent, and the sound of the previous letter or letters is changed. For instance, ci represents the sound {X} when followed by a vowel, and otherwise represents the regular sounds of c and i (which is to say {si} or {sø}). Similarly, the sequences si, sci, ssi and ti all represent {X} when followed by a vowel, and Zi represents {J}. Examples: dêficient, pension, lusciôuS, mission, initial, viZion.
The symbol tJ has context dependencies not for its pronunciation, which is always {C}, but for the corresponding spelling. If tJ is followed by a form of the letter o, the corresponding spelling is ti; otherwise the spelling is t. Examples: questJon, nátJur(e).
The symbol y may represent either the consonant {y}, the vowel {ý}, or the indeterminate hybrid {ÿ}. The rules are as follows: If the y is the first character of a word, or follows +, it represents {y}. If it is the last character of a word, or precedes +, it represents {ý}. Within a word, if it precedes a consonant, it represents {ý}. Otherwise, it represents either {y} or {ÿ}. That is, when y is followed by a vowel, the consonant pronunciation is always legitimate, and a vowel pronunciation may be valid as well. Examples: yes, happy, copycat, canyon, libyan. For further discussion of the handling of y, see Endnote 3.
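Two of the rules above, those for c and for y, are simple enough to sketch in Python. This is only an illustration under stated assumptions: the word is assumed to be already divided into symbols, a "form of" a letter is taken to mean that letter with any diacritic (or prefix such as ~) removed, and a vowel is taken to be any form of a, e, i, o or u, with @ counted as a. The function names are hypothetical.

```python
import unicodedata

VOWELS = set("aeiou")

def base_letter(symbol):
    """First letter of a symbol, lower-cased and without its diacritic."""
    stripped = symbol.lstrip("~|(")
    if not stripped:
        return ""
    if stripped[0] == "@":
        return "a"  # @ is treated as a special form of the letter a
    # NFD splits an accented letter into base letter + combining mark.
    return unicodedata.normalize("NFD", stripped[0])[0].lower()

def c_sound(next_symbol):
    """Sound of the symbol c, given the symbol that follows it (or None)."""
    if next_symbol is not None and base_letter(next_symbol) in {"e", "i", "y"}:
        return "{s}"
    return "{k}"

def y_sound(symbols, i):
    """Sound of the plain symbol y at position i of a tokenized word."""
    if i == 0 or symbols[i - 1] == "+":
        return "{y}"          # first character of a word, or after +
    if i == len(symbols) - 1 or symbols[i + 1] == "+":
        return "{ý}"          # last character of a word, or before +
    if base_letter(symbols[i + 1]) not in VOWELS:
        return "{ý}"          # before a consonant
    return "{y} or {ÿ}"       # before a vowel

print(c_sound("e"), c_sound("o"))                       # {s} {k}
print(y_sound(["c", "a", "n", "y", "o", "n"], 3))       # {y} or {ÿ}
print(y_sound(["c", "o", "p", "y", "c", "a", "t"], 3))  # {ý}
```

The sketch covers only the standalone symbols c and y. Compound symbols such as cc, sc, xc and ci would each need their own entry.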
As I mentioned earlier, ABCD is an ambiguous system. The five unmarked vowel letters, as well as ü and Û, may denote either the schwa or a short vowel. This ambiguity can be remedied without losing the readability of ABCD. I'm not sure this is a change for the good, as it requires many more diacritics, while the benefits are small unless one considers this distinction important even in an orthography intended to be very similar to TS. Nevertheless, here's how it is done.
The short vowel sounds of a, e, i and o are denoted by the vowel with a dieresis, in the way in which the dieresis is already used preceding r. This gives rise to very precise spellings like ämbidëxtrôuS, hïppopötamuS and sêlëctïvity. The sounds of u require a more serious reorganization, due to the use of ü for both the {yø} and {yV} sounds. The table below shows how it could be done.
| Sound | Ambiguous ABCD | Unambiguous ABCD | Ambiguous Example | Unambiguous Example |
| {ø} | u | u | campus | cämpus |
| {u} | u | ü | cut | cüt |
| {V} | Û | Ü | pÛsh | pÜsh |
| {yø} | ü | û | accür@t(e) | äccûr@t(e) |
| {yV} | ü | Û | refüGee | rëfÛGee |
| {U}/{yU} | ~Ú | |Ù | d~Úty | d|Ùty |
| {V}/{yV} | ~Ü | |Ü | d~Ürátion | d|Ürátion |
| {ø}/{V} | Û | ~U | instrÛment | ïnstr~Ument |
| {yø}/{yV} | ü | µ | monüment | mönµment |
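To check that the reorganized u symbols really do remove the ambiguity, the table can be transcribed into a small Python dictionary and the sounds grouped by symbol. This is only a sketch: the names U_SYMBOLS and sounds_by_symbol are hypothetical, and the entries |Ù and |Ü assume that the leading | in those rows is part of the symbol rather than a column separator.

```python
from collections import defaultdict

# Rows of the table above: sound -> (ambiguous symbol, unambiguous symbol).
U_SYMBOLS = {
    "{ø}": ("u", "u"),
    "{u}": ("u", "ü"),
    "{V}": ("Û", "Ü"),
    "{yø}": ("ü", "û"),
    "{yV}": ("ü", "Û"),
    "{U}/{yU}": ("~Ú", "|Ù"),
    "{V}/{yV}": ("~Ü", "|Ü"),
    "{ø}/{V}": ("Û", "~U"),
    "{yø}/{yV}": ("ü", "µ"),
}

def sounds_by_symbol(column):
    """Group the sounds by the symbol in the given column (0 or 1)."""
    groups = defaultdict(list)
    for sound, symbols in U_SYMBOLS.items():
        groups[symbols[column]].append(sound)
    return dict(groups)

ambiguous = sounds_by_symbol(0)
unambiguous = sounds_by_symbol(1)

print(ambiguous["ü"])  # ['{yø}', '{yV}', '{yø}/{yV}']
print(all(len(sounds) == 1 for sounds in unambiguous.values()))  # True
```

In the ambiguous column, u, Û and ü each stand for more than one row of the table, while every unambiguous symbol corresponds to exactly one row.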
One other ambiguity that must be resolved is between the unstressed {ør} and the stressed {&r}, which can both be spelled by er, ir or ur. An obvious fix here is to use eR, iR and uR for the stressed sound, leading to spellings such as fiRst, êmeRGency and muRder. (And also, Ôr should be changed to ÔR, for consistency, as in wÔRtþ.)
In some ways, the unambiguous system is a better arrangement, since ü is compatible with the other uses of dieresis, and the resemblance of the symbol | to the letter I may be mnemonic. Nevertheless, I think the number of diacritics required in the unambiguous system makes it inferior to the slightly simpler ambiguous one. Certainly, the ambiguity of ABCD is not an issue for my planned uses of it.
The same process that generates the ambiguous ABCD dictionary could equally well generate an unambiguous version. I am not at this time offering it for download, but if you have some use for it, please contact me (Alan at wyrdplay.org), and I'll be happy to provide a copy.
To comment on this page, e-mail Alan at wyrdplay.org