<html>

<head>
<title>Hou tu pranownse Inglish</title>
<style>
<!--
 /* Font Definitions */
@font-face
	{font-family:Times;}
@font-face
	{font-family:Courier;}
 /* Style Definitions */
h1
	{margin:0in;
	margin-bottom:10.0pt;
	page-break-after:avoid;
	font-size:16.0pt;
	color:#000080;
	font-family:"Times New Roman";}
h2
	{margin:0in;
	margin-top:12.0pt;
	margin-bottom:10.0pt;
	page-break-after:avoid;
	color:#000080;
	font-size:12.0pt;}
h3
	{margin-top:12.0pt;
	margin-right:0in;
	margin-bottom:10.0pt;
	margin-left:0in;
	page-break-after:avoid;
	color:#000080;
	font-size:11.0pt;
	font-family:"Times New Roman";
	font-style:italic;}
h4
	{margin:0in;
	margin-bottom:10.0pt;
	page-break-after:avoid;
	mso-outline-level:4;
	font-size:10.0pt;
	font-family:"Times New Roman";
	font-weight:normal;
	font-style:italic;}
.RuleNo
	{font-size:12.0pt;
	font-family:Times;
	color:red;
	font-weight:bold;}
ol
	{margin-bottom:0in;}
ul
	{margin-bottom:0in;}
cite
	{color:teal;
	font-style:italic;}
tt
	{color:blue;
	font-family:"Courier New";}
-->
</style>
</head>

<body lang=EN-US link=blue vlink=purple style='tab-interval:.5in'>

<img src="process.gif">

<h1>Hou tu pranownse Inglish</h1>

&copy; 2000 by Mark Rosenfelder

<hr>

<font size=+1>Everybody
agrees that <b>English spelling is horrible</b>.</font>

<p>There have been almost as many proposals for <b>spelling reform</b> as there are rewrites
of Esperanto. (Tellingly, there has
been precisely one success in each category-- Noah Webster and Ido-- and neither
caught on universally.) Most of these proposals spend their energy<b> fixing
what isn't broken</b>. For instance,
they search hard for clever new ways of spelling the <b>ch</b> sound-- even though <cite>ch</cite> does the job just fine in hundreds of
languages. Or, they insist on 'correcting' the Great Vowel Shift, using Italian
values for the vowels. 

<p>Whenever the subject comes up, someone is sure to bring up
all the words in <cite>-ough</cite>, 
or George Bernard Shaw's <cite>ghoti</cite>-- a word which illustrates only Shaw's
wiseacre ignorance. English spelling
may be a nightmare, but it does have rules, and by those rules, <cite>ghoti</cite>
can only be pronounced like <cite>goatee</cite>.


<p><font size=+1>The purpose of this page is to describe those rules-- to <b>explain the system behind English spelling</b>, the rules that tell you
how to pronounce a written word correctly over <b>85% of the time</b>.</font>

<p>Many people expect the opposite as well-- to predict the
spelling from the pronunciation-- not realizing that few orthographies meet this
goal. It's far from true of Spanish,
for instance, which is often held up as an example of a good orthography.
I stopped fervently admiring Spanish
orthography when I saw a sign in a Mexican bakery with about one spelling
mistake every third word.


<p>Several different types of people might be interested in this
page:

<ul>
 <li>foreign learners of English
 <li>native speakers who never quite mastered English spelling
 <li>spelling reformers who care to understand the system they want to replace
 <li>linguists interested in how an inadequate alphabet is manhandled to fit an unruly language.
</ul>

<p>I've also included a sample lexicon and a set of spelling
rules which you can use with my <a href="sounds.htm">Sound
Change Applier</a> to automatically derive the pronunciation.

<hr>

<blockquote><i>Thanks to &Eacute;amonn McManus, Aaron J. Dinkin, Dennis Paul Himes, 
Geoff Eddy, Hirofumi Nagamura, and John Cowan for useful comments and ideas, which I've tried to incorporate here.
</i></blockquote>
<hr>

<h2>The sounds of General American</h2>

<p>If we're discussing spelling, we have to discuss sounds as
well; and this means choosing a reference dialect.  I'll use my own, of course-- a version of General American that's
unexcitingly close to the standard.  I'll call it GA below.


<p>Here are the vowels and consonants of my dialect. For each I give the IPA, 
the representation in the eccentric phonemic transcription I use in this document, 
and a couple of sample words. 

<p>The IPA is given in Unicode; 
if it doesn't look right you have a nasty old non-Unicode-compliant browser.<p>

<blockquote><table border=1 cellspacing=0 cellpadding=0>
<tr style='background-color:#C0C0C0'>
  <td colspan=3><b><center>Vowels</center></b>
  <td colspan=3><b><center>Consonants</center></b>
</tr><tr style='background-color:#C0C0C0'>
  <td><b>IPA</b>
  <td><b>Phoneme</b>
  <td><b>Samples</b>
  <td><b>IPA</b>
  <td><b>Phoneme</b>
  <td><b>Samples</b>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">e</font>
  <td>&nbsp;<tt>&auml;</tt>  
  <td>&nbsp;<cite>r<u>a</u>te</cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">p</font>
  <td>&nbsp;<tt>p</tt>
  <td>&nbsp;<cite><u>p</u>a<u>p</u>er</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">&#230;</font>
  <td>&nbsp;<tt>&acirc;</tt>  
  <td>&nbsp;<cite>r<u>a</u>t</cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">b</font>
  <td>&nbsp;<tt>b</tt>
  <td>&nbsp;<cite><u>b</u>ook</cite>
</tr><tr>	
  <td>&nbsp;<font face="Lucida Sans Unicode">i</font>
  <td>&nbsp;<tt>&euml;</tt>  
  <td>&nbsp;<cite>m<u>ee</u>t, mach<u>i</u>ne</cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">t</font>
  <td>&nbsp;<tt>t</tt>
  <td>&nbsp;<cite><u>t</u>ake</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">&#603;</font>
  <td>&nbsp;<tt>&ecirc;</tt>  
  <td>&nbsp;<cite>m<u>e</u>t, dr<u>ea</u>d</cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">d</font>
  <td>&nbsp;<tt>d</tt>
  <td>&nbsp;<cite><u>d</u>ead</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">aj</font>
  <td>&nbsp;<tt>&iuml;</tt>  
  <td>&nbsp;<cite>b<u>i</u>te, c<u>y</u>cle</cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">g</font>
  <td>&nbsp;<tt>g</tt>
  <td>&nbsp;<cite><u>g</u>et</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">&#618;</font>
  <td>&nbsp;<tt>&icirc;</tt>  
  <td>&nbsp;<cite>b<u>i</u>t, l<u>i</u>ck</cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">k</font>
  <td>&nbsp;<tt>k</tt>
  <td>&nbsp;<cite><u>c</u>ape, tal<u>k</u>, <u>q</u>uite</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">o</font>
  <td>&nbsp;<tt>&ouml;</tt>  
  <td>&nbsp;<cite>n<u>o</u>te, s<u>ow</u></cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">m</font>
  <td>&nbsp;<tt>m</tt>
  <td>&nbsp;<cite><u>m</u>oon</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">a</font>
  <td>&nbsp;<tt>&ocirc;</tt>  
  <td>&nbsp;<cite>n<u>o</u>t, cl<u>o</u>ck</cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">n</font>
  <td>&nbsp;<tt>n</tt>
  <td>&nbsp;<cite><u>n</u>ew</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">ju</font>
  <td>&nbsp;<tt>&uuml;</tt>  
  <td>&nbsp;<cite>c<u>u</u>te, <u>you</u></cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">&#331;</font>
  <td>&nbsp;<tt>&ntilde;</tt>
  <td>&nbsp;<cite>si<u>ng</u>, thi<u>n</u>k</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">&#652;</font>
  <td>&nbsp;<tt>&ucirc;</tt>  
  <td>&nbsp;<cite>c<u>u</u>t, c<u>o</u>me</cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">f</font>
  <td>&nbsp;<tt>f</tt>
  <td>&nbsp;<cite><u>f</u>our, <u>ph</u>ysics</cite>
</tr><tr>
  <td colspan=3>&nbsp;
  <td>&nbsp;<font face="Lucida Sans Unicode">v</font>
  <td>&nbsp;<tt>v</tt>  
  <td>&nbsp;<cite><u>v</u>ine</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">u</font>
  <td>&nbsp;<tt>u</tt>  
  <td>&nbsp;<cite>c<u>oo</u>t</cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">&#952;</font>
  <td>&nbsp;<tt>+</tt>
  <td>&nbsp;<cite><u>th</u>in</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">&#596;</font>
  <td>&nbsp;<tt>&ograve;</tt>  
  <td>&nbsp;<cite>c<u>au</u>ght, d<u>o</u>g</cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">&#240;</font>
  <td>&nbsp;<tt><u>+</u></tt>
  <td>&nbsp;<cite><u>th</u>is</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">&#650;</font>
  <td>&nbsp;<tt>&ugrave;</tt>  
  <td>&nbsp;<cite>c<u>oo</u>k, p<u>u</u>t</cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">s</font>
  <td>&nbsp;<tt>s</tt>
  <td>&nbsp;<cite><u>s</u>o</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">&#601;</font>
  <td>&nbsp;<tt>@</tt>  
  <td>&nbsp;<cite><u>a</u>bove, cyn<u>i</u>c, <u>u</u>ntil</cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">z</font>
  <td>&nbsp;<tt>z</tt>
  <td>&nbsp;<cite><u>z</u>oo</cite>
</tr><tr>
  <td colspan=3>&nbsp;
  <td>&nbsp;<font face="Lucida Sans Unicode">&#643;</font>
  <td>&nbsp;<tt>$</tt>
  <td>&nbsp;<cite><u>sh</u>ack</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">aw</font>
  <td>&nbsp;<tt>&ocirc;w</tt>  
  <td>&nbsp;<cite>cr<u>ow</u>d, l<u>ou</u>d</cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">&#658;</font>
  <td>&nbsp;<tt><u>$</u></tt>
  <td>&nbsp;<cite>mea<u>s</u>ure</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">oj</font>
  <td>&nbsp;<tt>&ouml;y</tt>  
  <td>&nbsp;<cite>b<u>oy</u>, dr<u>oi</u>d</cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">t&#643;</font>
  <td>&nbsp;<tt>&ccedil;</tt>
  <td>&nbsp;<cite><u>ch</u>ew</cite>
</tr><tr>
  <td colspan=3>&nbsp;
  <td>&nbsp;<font face="Lucida Sans Unicode">d&#658;</font>
  <td>&nbsp;<tt>j</tt>
  <td>&nbsp;<cite><u>j</u>u<u>dg</u>e</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">j</font>
  <td>&nbsp;<tt>y</tt>  
  <td>&nbsp;<cite><u>y</u>ou, mill<u>i</u>on</cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">r</font>
  <td>&nbsp;<tt>r</tt>
  <td>&nbsp;<cite><u>r</u>an</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">w</font>
  <td>&nbsp;<tt>w</tt>  
  <td>&nbsp;<cite><u>w</u>ait, co<u>w</u></cite>
    
  <td>&nbsp;<font face="Lucida Sans Unicode">l</font>
  <td>&nbsp;<tt>l</tt>
  <td>&nbsp;<cite><u>l</u>ate</cite>
</tr><tr>
  <td colspan=3>&nbsp;
  <td>&nbsp;<font face="Lucida Sans Unicode">h</font>
  <td>&nbsp;<tt>h</tt>
  <td>&nbsp;<cite><u>h</u>ang</cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">&#602;</font>
  <td>&nbsp;<tt>@r</tt>  
  <td>&nbsp;<cite>s<u>ear</u>ch, man<u>or</u>, b<u>ir</u>d</cite>
  <td colspan=3 rowspan=3>&nbsp;
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">n&#809;</font>
  <td>&nbsp;<tt>@n</tt>  
  <td>&nbsp;<cite>butt<u>on</u>, happ<u>en</u></cite>
</tr><tr>
  <td>&nbsp;<font face="Lucida Sans Unicode">l&#809;</font>
  <td>&nbsp;<tt>@l</tt>  
  <td>&nbsp;<cite>batt<u>le</u>, fin<u>al</u></cite>
 </tr>
</table></blockquote>



<h3>Who cares about dialects?</h3>


<p>Ideally you shouldn't have to worry about my dialect at all:
you could simply take (say) <tt>&ecirc;</tt> to represent whatever <i>you</i> pronounce 
as the vowel in <cite>met</cite>.  Unfortunately, English dialects are not
uniform enough to share a single phonology. There are many words that are not only 
<i>pronounced</i> differently in different dialects-- that is, they have a
distinct <i>phonetic</i> realization-- but
also have their own <i>phonemic</i>
representation. 


<p>Some examples: 

<ul>
<li>GA is rhotic-- we pronounce the post-vocalic r's-- while
other important dialects are not, notably the British standard, RP. 

<li>I distinguish <cite>cot</cite> and <cite>caught</cite>, <cite>Don</cite> and <cite>Dawn</cite>; 
these vowels (<tt>&ocirc;, &ograve;</tt>) merge in the US West.

<li>On the other hand, I merge the vowel sounds in <cite>Mary, merry,</cite>
and <cite>marry</cite>, which are distinguished in Eastern US dialects and in RP.

<li>I pronounce <cite>w</cite> and <cite>wh</cite> the same.
</ul>

<h3>Notational conventions</h3>

<p><b>Spellings</b> are in <cite>teal italics</cite>;
<b>pronunciations</b> are in <tt>blue Courier</tt>. This
convention avoids cluttering the text with brackets and quotation marks.

<p>Thus <cite>g</cite> refers to the letter &lt;g&gt;, while <tt>g</tt> refers to 
the sound /g/, and I will write that <cite>laugh</cite> is pronounced <tt>l&acirc;f</tt>. 


<p>Linguists can take the 'pronunciations' as <b>phonemic</b>;
e.g. I haven't attempted to indicate aspiration, the flapping of medial <tt>t</tt> 
and <tt>d</tt>, the appearance of clear and dark <tt>l</tt>, etc. I indicate
some but not all vowel reductions (basically, those that are reduced in all
forms of the morpheme).


<p><cite>#</cite> represents the beginning or end of a word.
For instance, <cite>#rh</cite> represents an <cite>rh</cite> that
begins a word; <cite>g#</cite> refers to a final <cite>g</cite>.

<p>Capital letters represent variables; e.g. <cite>V</cite> represents any vowel. 



<h2>The computer simulation</h2>


<p>Along with this explanatory page, I've put up

<ul>
<li>a <A HREF="english.lex">sample lexicon</a> of over 5000
English words

<li>a <A HREF="english.sc">sound change file</a> giving the
spelling rules 

<li><A HREF="english.out">sample output</a> from the <a href="sounds.html">Sound
Change Applier</a>

</ul>

<p>The lexicon includes the target pronunciation in GA; I
modified the program to compare the results of the rule application with the
target. <b>The results</b>:

<ul>
<li>3079 (or <b>59%</b>) of the pronunciations are generated <b>perfectly</b>.

<li>4389 (or <b>85%</b>) are generated perfectly or with only <b>minor errors</b>: 
vowel length errors, failure to reduce vowels to <tt>@</tt>, or failure to
voice an <i>s</i>.
</ul>

<p>This is impressive; but it <b>understates the systematicity</b> of English spelling:

<ul>
<li>Many of the errors are off in only one segment.
(E.g. the rules predict everything about <cite>bachelor </cite>except the loss of the middle
vowel. Shouldn't they get some credit
for getting six segments correct?)

<li>Many of the pronunciations are really predictable using
rules beyond the scope of the Sound Change Applier.
I haven't by any means found every possible rule, or stated them
in the best, most general form.

<li>The worst offenders in the language are already
included in the sample; a larger vocabulary would include a higher percentage
of well-behaved spellings.
</ul>


<p>There is <a href="#irregular">a fuller discussion of the mispredictions</a> at
the end of the document.



<p>The <b>odd phonetic transcription</b>, by the way, derives from the dual need to easily represent
sounds both in html and in the sound change file.  
I'm restricted to characters that html supports; and I can't use
capital letters, because I need them for variable definitions in the
rules. As a mnemonic, think of the umlauts
as colons, so that <tt>&ouml;</tt> is short for <tt>o:</tt>,
'long o'.

<p>The wacky spellings I used for the <b>vowels</b>, however, are inherent in the logic of English
spelling. It would only obscure how the
system works if I represented the long and short vowels with IPA forms.


<h2>The rules</h2>

<p>The bulk of this page is basically a <b>human-readable restatement of the rules</b> in the sound change file.

<p>The <b>order</b> of the
rules is important. The rules can be
thought of as a <b>recipe</b>: to pronounce
a word, you go down the list of rules, seeing if each one in turn applies, and
applying it if it does. 


<p>The result is sometimes a little backwards in terms
of explaining the system, because <b>exceptions
come first</b>, before the general rules. 
That's the best way to teach the computer; but humans tend to do best by
learning the most general rule first.
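
<p>Here is a minimal sketch of the recipe idea, written in Python rather than in the Sound Change Applier's own file format. The names <tt>RULES</tt> and <tt>apply_rules</tt> are made up for this illustration, and the three patterns are only a tiny stand-in for the real rule file. It shows why the exception has to come before the general rule.

```python
import re

# Hypothetical mini rule list. Order matters: the exception
# (wh before o) is listed before the general rule (wh elsewhere).
RULES = [
    (r"wh(?=o)", "h"),   # exception first: who, whole
    (r"wh", "w"),        # general rule: when, which
    (r"ph", "f"),
]

def apply_rules(word, rules=RULES):
    """Go down the list once, applying each rule wherever it matches."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

print(apply_rules("whole"))   # hole
print(apply_rules("when"))    # wen
print(apply_rules("phone"))   # fone

# With the first two rules swapped, the general rule eats the exception.
print(apply_rules("whole", [RULES[1], RULES[0]]))   # wole
```

<p>The output is still a half-processed spelling, not a finished pronunciation; the vowels are left for later rules.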



<p>I'll warn you: some of these rules are going to seem <b>mondo
obscure</b>. That's because I've tried
to find every regularity I could, even if it only explains half a dozen
words. The yield of some rules may be small
enough that some people would rather just learn the affected words as
irregularities. But if anything I'm <i>more</i>
interested in the minor regularities; they're puzzles, often unfamiliar ones,
and many are the fossils of minor sound changes.

<p>To head off another likely reaction: yes, <b>you can find
exceptions</b> to the rules. I'm
perfectly aware that <cite>ough</cite> is not <i>always</i> pronounced <tt>&ouml;</tt>. 
The point is, what follows are the <i>default</i> rules that work 85% of the time.
Think of <tt>&ouml;</tt> as the default pronunciation of <cite>ough</cite>;
any other pronunciation of <cite>ough</cite> is an irregularity.

<p>And finally: I'm aware that some linguists (e.g. Edward
Carney) have also worked on these problems; unfortunately, I've only seen their
work in summaries. I've tried to be
careful and linguistically informed, but I don't claim to have committed a work
of scholarship.



<h3>Some rewrites</h3>

<p>English has more phonemes than the alphabet has available
symbols; the usual expedient of the orthography for solving this problem is to
use digraphs. (Both the problem and the
solution are inherited from Latin, which had hardly finished tossing out the
Greek letters it didn't think it needed when it started to borrow Greek words
that needed them.)

<p><span class="RuleNo">1</span>. Make the following unconditional
replacements:


<table border=1 cellspacing=0 cellpadding=0>
<tr>
  <td>&nbsp;<cite>ch</cite> &nbsp;&nbsp;&nbsp;&nbsp;
  <td>&nbsp;<tt>&ccedil;</tt>&nbsp;&nbsp;&nbsp;&nbsp;
</tr><tr>
  <td>&nbsp;<cite>sh</cite>
  <td>&nbsp;<tt>$</tt>
</tr><tr>
  <td>&nbsp;<cite>ph</cite>
  <td>&nbsp;<tt>f</tt>
</tr><tr>
  <td>&nbsp;<cite>th</cite>
  <td>&nbsp;<tt>+</tt>
</tr><tr>
  <td>&nbsp;<cite>qu</cite>
  <td>&nbsp;<tt>kw</tt>
</tr><tr>
  <td>&nbsp;<cite>wr</cite>
  <td>&nbsp;<tt>r</tt>
</tr><tr>
  <td>&nbsp;<cite>wh</cite>
  <td>&nbsp;<tt>w</tt>
</tr><tr>
  <td>&nbsp;<cite>xh</cite>
  <td>&nbsp;<tt>x</tt>
</tr><tr>
  <td>&nbsp;<cite>rh</cite>
  <td>&nbsp;<tt>r</tt>
</tr>
</table>

<p>Before an <cite>o</cite>, replace <cite>wh</cite> with <tt>h</tt> instead: 
<cite>who, whore, whole</cite>.

<p>If you're one of those fossils who still use a voiceless w
or another strange contortion to distinguish <cite>wh </cite>and
<cite>w</cite>, you'd modify this rule.

<p>We can do significantly better than the program if we don't do
these substitutions when the digraph spans a morpheme boundary.
In other words, we shouldn't do the
replacement in compound words like <cite>bosshood, flathead, 
uphill</cite>, or <cite>perhaps</cite>. 

<p>We can also do better if we replace 
<cite>ch</cite> with <tt>k</tt> in words of Greek and Hebrew origin-- 
that is, in two-dollar words like <cite>archaism</cite> or 
<cite>trochaic </cite>or 
<cite>Malachi</cite>.

<p>The program actually replaces only initial <cite>rh</cite>,
since medial <cite>rh </cite>is
so likely to be found in a compound (and it 
doesn't occur finally in the sample lexicon).

<p>(<cite>xh</cite> isn't really a digraph; the rule just reflects the fact that an 
initial <cite>h</cite> isn't pronounced after a prefix ending in <cite>x</cite>, 
as in <cite>exhibit</cite>.)



<p><span class="RuleNo">2</span>. Replace 
<cite>x</cite> with <tt>ks</tt>; but
after <cite>e</cite> and before another vowel, use <tt>
gz</tt> instead. (This is
not an allophonic rule: compare the near-minimal pair 
<cite>exist </cite>and <cite>excite</cite>.)
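
<p>Rule 2 can be sketched in a couple of lines of Python. The function name <tt>rewrite_x</tt> is hypothetical, and "vowel" is taken here to mean one of the five vowel letters, which is an assumption of the sketch.

```python
import re

def rewrite_x(word):
    """Rule 2 sketch: x is gz between e and a vowel letter, ks otherwise."""
    word = re.sub(r"ex(?=[aeiou])", "egz", word)   # the special case first
    return word.replace("x", "ks")                 # then every remaining x

print(rewrite_x("exist"))    # egzist
print(rewrite_x("excite"))   # ekscite
print(rewrite_x("box"))      # boks
```

<p>The <cite>c</cite> in <tt>ekscite</tt> and the vowels are left untouched; later rules deal with them.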


<p><span class="RuleNo">3</span>. Ignore apostrophes (<cite>can't, cop's,
o'clock</cite>). Hyphens, however, can
be treated as word separators 
(<cite>mother-in-law</cite> is pronounced like <cite>mother in law</cite>).


<h3>The notorious gh</h3>

<p><span class="RuleNo">4</span>. Before a vowel, 
<cite>gh </cite>becomes <tt>g</tt>: <cite>ghost</cite> = <tt>g&ouml;st</tt>.

<p><span class="RuleNo">5</span>. <cite>gh </cite>turns a preceding single vowel long:
<cite>right</cite> = <tt>r&iuml;t</tt>.

<p><span class="RuleNo">6</span>. 
<cite>aught</cite> and <cite>ought</cite> become <tt>&ograve;t</tt>: 
<cite>daughter</cite> = <tt>d&ograve;t@r</tt>, <cite>sought</cite> = <tt>s&ograve;t</tt>.

<p><span class="RuleNo">7</span>. Any other 
<cite>ough </cite>becomes <tt>&ouml;</tt>: 
<cite>dough</cite> = <tt>d&ouml;</tt>.

<p><span class="RuleNo">8</span>. Elsewhere, 
<cite>gh </cite>is simply dropped: <cite>freight</cite> = <tt>fr&auml;t</tt>.
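
<p>Rules 4 to 8 can be sketched together in Python, applied in the order given. The names <tt>LONG</tt>, <tt>lengthen</tt> and <tt>rewrite_gh</tt> are invented for the illustration. As a shortcut, the sketch drops the <cite>gh</cite> in the same step that lengthens the vowel for rule 5, instead of leaving it for rule 8.

```python
import re

LONG = {"a": "ä", "e": "ë", "i": "ï", "o": "ö", "u": "ü"}

def lengthen(match):
    # Keep whatever preceded the vowel, make the vowel long, drop the gh.
    return match.group(1) + LONG[match.group(2)]

def rewrite_gh(word):
    word = re.sub(r"gh(?=[aeiou])", "g", word)                  # rule 4
    word = re.sub(r"(^|[^aeiou])([aeiou])gh", lengthen, word)   # rule 5
    word = re.sub(r"(au|ou)ght", "òt", word)                    # rule 6
    word = word.replace("ough", "ö")                            # rule 7
    return word.replace("gh", "")                               # rule 8

for w in ["ghost", "right", "daughter", "sought", "dough", "freight"]:
    print(w, rewrite_gh(w))
```

<p>This gives <tt>gost, rït, dòter, sòt, dö, freit</tt>. The remaining plain vowel letters (as in <tt>gost</tt> or <tt>freit</tt>) are the business of later rules.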

<p>People usually trot out <cite>gh</cite> when they bitch about English spelling.
The culprit is sound change: <cite>gh</cite> used
to do nicely for the <tt>x</tt> sound (now usually represented <cite>kh</cite> when
we transcribe foreign words), but the sound disappeared in everything but
Scots. It usually went quietly, but sometimes, word-finally 
(<cite>laugh, cough, enough, rough, tough,</cite> and not much more) 
it was transformed to <tt>f</tt> instead.

<p><cite>ough</cite> is also notorious, but the usual sound (as
seen in rule 7) is <tt>&ouml;</tt>.  
<cite>Through </cite>is a notable exception.

<p>Initial <cite>gh</cite> is sometimes used to keep the <cite>g </cite>from
softening (<cite>ghetto</cite>);
but generally it's a meaningless variant on <cite>g</cite>, 
said to be introduced by Dutch typesetters in
the early days of printing. In any case
it's no problem, since it's always <tt>g</tt>.  This is one reason Shaw's 
<cite>ghoti</cite> is such a fraud: initial <cite>gh</cite> can <i>never</i> 
be pronounced <tt>f</tt>.


<h3>Unpronounceable initials</h3>

<p><span class="RuleNo">9</span>. In initial <cite>gn, kn, mn, pt, ps, tm</cite>, pronounce
the second letter only: <cite>gnostic</cite> = <tt>n&ocirc;st&icirc;k</tt>, 
<cite>psycho </cite>= <tt>s&iuml;k&ouml;</tt>, 
<cite>knight</cite> = <tt>n&iuml;t</tt>.
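
<p>As a sketch, rule 9 is a single anchored replacement. The function name <tt>simplify_initial</tt> is hypothetical; the <tt>^</tt> in the pattern plays the role of the <cite>#</cite> word-boundary mark used on this page.

```python
import re

def simplify_initial(word):
    """Rule 9 sketch: in initial gn, kn, mn, pt, ps, tm keep only the second letter."""
    return re.sub(r"^(?:gn|kn|mn|pt|ps|tm)", lambda m: m.group(0)[1], word)

print(simplify_initial("gnostic"))   # nostic
print(simplify_initial("psycho"))    # sycho
print(simplify_initial("signal"))    # signal (the gn is not initial)
```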

<p>Most of these are Greek borrowings-- Greek is much freer with
initial clusters than English is-- but <cite>kn </cite>derives from Old English.



<h3>Replacing y</h3>

<p><span class="RuleNo">10</span>. Replace 
<cite>y</cite> with <tt>&iuml;</tt> if it ends a one-syllable word: 
<cite>ply</cite> = <tt>pl&iuml;</tt>.

<p><span class="RuleNo">11</span>. <cite>ey</cite> is pronounced <tt>&euml;</tt>; 
<cite>ay</cite> is <tt>&auml;</tt>; and <cite>oy</cite> is <tt>&ouml;y</tt>: 
<cite>say, monkey boy</cite> = <tt>s&auml; m&ucirc;nk&euml; b&ouml;y</tt>.

<p><span class="RuleNo">12</span>. Replace 
<cite>y</cite> with <cite>i </cite>if it's not adjacent to a
vowel-- we'll worry later about how to pronounce the <cite>i</cite>. 

<p>Thus, <cite>system</cite> = <tt>s&icirc;st@m</tt>  but 
<cite>you</cite>, where the <cite>y</cite> adjoins a vowel, is <tt>yu</tt>.
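
<p>Rules 10 to 12 can be sketched as follows. The names are invented for the illustration, and two simplifications are assumptions of the sketch rather than part of the rules: "one-syllable word" is approximated as "no other vowel letter in the word", and rule 11 is applied wherever the letters occur.

```python
import re

V = "aeiouäëïöü"   # vowel letters plus the long vowels already introduced

def rewrite_y(word):
    # Rule 10: final y in a word with no other vowel letter becomes long i.
    word = re.sub(r"^([^aeiouy]+)y$", r"\1ï", word)
    # Rule 11: ey, ay, oy.
    for spelling, sound in (("ey", "ë"), ("ay", "ä"), ("oy", "öy")):
        word = word.replace(spelling, sound)
    # Rule 12: a y with no vowel on either side becomes the letter i.
    word = re.sub(rf"(^|[^{V}])y(?![{V}])", r"\1i", word)
    return word

for w in ["ply", "say", "monkey", "boy", "system", "you"]:
    print(w, rewrite_y(w))
```

<p>This gives <tt>plï, sä, monkë, böy, sistem, you</tt>. The plain vowel letters that remain are handled by later rules.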



<h3>Simplification of stl</h3>

<p><span class="RuleNo">13</span>. The <cite>t </cite>in 
<cite>stl</cite> is lost before a final vowel: 
<cite>bustle</cite> = <tt>b&ucirc;s@l</tt>, <cite>bristly</cite> = <tt>br&icirc;sl&euml;</tt>.

<p>This could perhaps be generalized; but in slow speech I
leave the <tt>t</tt> in (say) <cite>coastline</cite> or <cite>Christlike</cite>.  
I'm also tempted to generalize to all stops, but the only instance in
the sample lexicon is <cite>muscle</cite>, and it's pretty silly to have a rule that
applies to a single word.



<h3><a name="16">(Af)frication before i</a></h3>

<p><span class="RuleNo">14</span>.  <cite>ci</cite> or <cite>ti</cite> becomes <tt>$</tt> 
before a vowel: <cite>gracious</cite> = <tt>gr&auml;$@s</tt>, 
<cite>nation</cite> = <tt>n&auml;$@n</tt>.

<p><span class="RuleNo">15</span>.  <cite>tu</cite> becomes <tt>&ccedil;u</tt> 
before a vowel, or before a liquid (<cite>r, l</cite>) followed by a vowel: 
<cite>mutual</cite> = <tt>m&uuml;&ccedil;u@l</tt>,
<cite>mature</cite> = <tt>m@&ccedil;ur</tt>.

<p><span class="RuleNo">16</span>. <cite>s</cite> becomes <tt>$</tt> (or <tt><u>$</u></tt> 
if it's preceded by a vowel):

<ul>
<li>before <cite>io</cite>--  <cite>passion</cite> = <tt>p&acirc;$@n</tt>, 
<cite>vision</cite> = <tt>v&icirc;<u>$</u>@n</tt>.
Note that the <cite>i</cite> is lost.

<li>before <cite>ur</cite>-- <cite>assure </cite>= <tt>@$ur</tt>; 
<cite> leisure</cite> = <tt>l&euml;<u>$</u>@r</tt>.

<li>after <cite>k</cite> and before a vowel: 
<cite>sexual</cite> = <tt>s&ecirc;k$u@l</tt>.
</ul>

<p>At some point English affricated a number of consonants
before an <tt>i</tt> or <tt>y</tt> that preceded another vowel, including
the [<tt>y</tt>] sound that begins <tt>&uuml;</tt>.
Sometimes the <tt>y</tt> has been lost since. 
This process seems to be no longer productive-- compare <cite>costume, Casio</cite>.  
(Or is it? In quick speech I do say <tt>k&ocirc;s&ccedil;&ugrave;m</tt>.)

<p>Rule 14 shows another reason <cite>ghoti</cite> is a fraud: 
<cite>ti </cite>only fricativizes when it's followed by a vowel.



<h3>Voicing of s</h3>

<p><span class="RuleNo">17</span>.  <cite>s</cite> is voiced between two vowels 
(<cite>amuse, design, prison</cite>), except after 
<cite>a</cite> (<cite>base, parasite</cite>).
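
<p>Rule 17 as stated is a one-line replacement. In this sketch the function name <tt>voice_s</tt> is hypothetical and "vowel" means one of the five vowel letters. It implements only the default rule, so it goes wrong on the same exceptions the rule itself does.

```python
import re

def voice_s(word):
    """Rule 17 sketch: s between two vowel letters becomes z, unless the first is a."""
    return re.sub(r"([eiou])s(?=[aeiou])", r"\1z", word)

print(voice_s("amuse"))     # amuze
print(voice_s("prison"))    # prizon
print(voice_s("base"))      # base
print(voice_s("mosses"))    # mosses (doubled s stays voiceless)
```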

<p>It's easy to find exceptions to this rule: <cite>disagree, opposite, analysis</cite>-- 
there are even words where the rule applies only for verbs (<cite>abuse, house</cite>).
The rule as stated has more successes than
failures, and I haven't been able to find merely lexical rules that do much
better. A better rule might take the
language of origin into account: the voicing tends to occur in French and Latin
words (<cite>resent, please, reason, miserable</cite>),
but not if they're from Greek (<cite>analysis, isosceles</cite>)
or more exotic languages (<cite>papoose, Osaka</cite>).

<p>The voicing of <cite>s</cite> is so almost predictable that there are
orthographic conventions (borrowed from French) to indicate that we really do
want an <tt>s</tt>: double the <cite>s</cite> (cf. <cite>Moses</cite> vs. <cite>mosses</cite>), or use 
<cite>c</cite> instead (<cite>race</cite> vs. <cite>rase</cite>).
Annoyingly, there are a few cases of unexpectedly voiced <cite>ss</cite> 
(<cite>dessert, dissolve</cite>).

<p>As a corollary of this rule, the American use of <cite>-ize</cite> for British <cite>-ise</cite> 
was unnecessary, although of course it is more foolproof.


<h3>You know me, al</h3>

<p><span class="RuleNo">18</span>. <cite>al </cite>is pronounced <tt>&ograve;l</tt> before 
<cite>r, s, m</cite>, a dental stop, or final <cite>ll</cite>: 
<cite>also, already, wall, bald, although, almost</cite>.

<p><span class="RuleNo">19</span>. <cite>alk</cite> becomes <tt>&ograve;k</tt>, except initially:
<cite>walk </cite>= <tt>w&ograve;k</tt>.

<p>I suspect this is a sound change, obscured by later borrowings
like <cite>alcohol</cite>.



<h3><a name="20">Softening of velars</a></h3>

<p><span class="RuleNo">20</span>. <cite>c</cite> becomes <tt>s</tt> before a front
vowel, <tt>k</tt> elsewhere: <cite>cell</cite> = <tt>s&ecirc;l</tt>, <cite>acid</cite> =
<tt>&acirc;s&icirc;d</tt>, but <cite>cow</cite> = <tt>k&ocirc;w</tt>, 
<cite>backer</cite> = <tt>b&acirc;k@r</tt>, <cite>clear</cite> = <tt>kl&euml;r</tt>.

<p><span class="RuleNo">21</span>. Similarly, 
<cite>g</cite> becomes <tt>j</tt>
before a front vowel, <tt>g</tt> elsewhere: 
<cite>gel</cite> = <tt>j&ecirc;l</tt>, 
<cite>turgid </cite> = <tt>t@rj&icirc;d</tt>, but <cite>got</cite> = <tt>g&ocirc;t</tt>, 
<cite>twig</cite> = <tt>tw&icirc;g</tt>, <cite>gleam</cite> = <tt>gl&euml;m</tt>.

<p><span class="RuleNo">22</span>. If the 
<cite>g </cite>doesn't begin the word, and the triggering 
<cite>e</cite> precedes <cite>o</cite> or <cite>a</cite>, 
the <cite>e</cite> is lost: <cite>changeable</cite> = <tt>&ccedil;&auml;nj@b@l</tt>; 
<cite>dungeon </cite> = <tt>d&ucirc;nj@n</tt> (but <cite>geology </cite>= <tt>j&euml;&ocirc;l@j&euml;</tt>).

<p><span class="RuleNo">23</span>. Initial <cite>gu</cite> or
final <cite>gue</cite> is pronounced <tt>g</tt>:
<cite>guest</cite> = <tt>g&ecirc;st</tt>, <cite>plague</cite> = <tt>pl&auml;g</tt>. 
(Medially, it tends to be <tt>gw</tt> instead: <cite>language, anguish</cite>.)

<p>Front vowels are <cite>i </cite>and <cite>e</cite>; note that <cite>y</cite> 
was changed to <cite>i</cite> by rule 12.
We owe these rules to a sound change, and
not even our own-- it derives from the history of French.
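
<p>Rules 20 and 21 can be sketched like this, taking <cite>e</cite> and <cite>i</cite> as the front vowels and assuming that <cite>ch</cite> and <cite>y</cite> have already been dealt with by the earlier rules. The function name <tt>soften_velars</tt> is invented for the illustration.

```python
import re

def soften_velars(word):
    word = re.sub(r"c(?=[ei])", "s", word)   # rule 20: c before a front vowel
    word = word.replace("c", "k")            # rule 20: every other c
    word = re.sub(r"g(?=[ei])", "j", word)   # rule 21: g before a front vowel
    return word                              # any other g is left as g

for w in ["cell", "acid", "cow", "gel", "turgid", "got"]:
    print(w, soften_velars(w))
```

<p>This gives <tt>sell, asid, kow, jel, turjid, got</tt>. Being a sketch of the default rules only, it will also happily turn <cite>get</cite> into <tt>jet</tt>.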

<p>The last two rules allow <cite>g</cite> to be used for two sounds: 

<ul>
<li><tt>ga ge gi go gu</tt> can be written <cite>ga gue gui go gu</cite>
<li><tt>ja je ji jo ju</tt> can be written <cite>gea ge gi geo geu.</cite>
</ul>

<p>The inserted <cite>e</cite> or <cite>u</cite> are orthographic only; 
they make sure rule 21 applies or doesn't apply, as desired. 

<p>In French, there's a parallel with c: 

<ul>
<li><tt>ka ke ki ko ku</tt> can be written <cite>ca que qui co cu</cite>
<li><tt>sa se si so su</tt> can be written <cite>cea ce ci ceo ceu</cite>
(but it's more usual to write <cite>&ccedil;a ce ci &ccedil;o &ccedil;u</cite>)
</ul>

<p>but it doesn't work so well in English, since our <cite>qu</cite> is still <tt>kw</tt>. 
The inserted <cite>e</cite> is found in just a few words (e.g. 
<cite>placeable</cite>), due to compounding.



<h3>Untangle reverse-written final liquids</h3>

<p><span class="RuleNo">24</span>. <cite>le</cite> and 
<cite>re</cite> (after a consonant, and ending the word)
should be rewritten <tt>@l, @r</tt>.

<p>To be precise, they become syllabic consonants: the final
sound in <cite>bottle</cite>
is a prolonged dark <tt>l</tt>.  
I think this is an allophonic detail, however: if you like, just add a
rule at the end to turn all instances of <tt>@r</tt> into syllabic <tt><u>r</u></tt>.


<h3><a name="25">Short and long vowels</a></h3>

<p>OK, listen up, because these are the <b>two most important
rules</b> of English spelling.

<p><span class="RuleNo">25</span>. Vowels are pronounced long before
an intervocalic consonant (<cite>rate, mete, fine, rote, cute</cite> = 
<tt>r&auml;t m&euml;t f&iuml;n r&ouml;t k&uuml;t</tt>).

<p><span class="RuleNo">26</span>. They're short before two consonants
(<cite>baffle, held, children, rotten, butler</cite>), or before a final consonant 
(<cite>pat, pet, pit, pot, but </cite>= <tt>p&acirc;t p&ecirc;t p&icirc;t p&ocirc;t b&ucirc;t</tt>).


<p>English has a dozen or so vowel phonemes, and this silly
alphabet we inherited from the Romans has just five vowel symbols 
(<cite>y</cite> is sometimes used as a vowel, but as we've
seen, it pointlessly duplicates <cite>i</cite>). The
five symbols can represent ten sounds, thanks to these rules.

<p>Each vowel letter has two basic interpretations, which by
convention are called <b>long</b> and <b>short</b>.  
(Phonetically they're <i>not</i> distinguished by length; <i>tense </i>and <i>lax</i> 
would be more accurate.  But I think the more familiar terms will be
more readable, and remind readers that their old English teachers were onto
something after all.)

<p>In my transcription, <b>long</b>
vowels are marked with a diaeresis, since html doesn't supply a macron (<tt>&auml;&euml;&iuml;&ouml;&uuml;</tt>), 
and <b>short </b>vowels with a circumflex (<tt>&acirc;&ecirc;&icirc;&ocirc;&ucirc;</tt>).  
Now you can see why I chose those odd representations-- they come from the
basic logic of English spelling. (Think of the diaeresis as the IPA <tt>:</tt> long mark.)

<p>Note that the names of the letters 
<cite>A E I O U</cite> are simply the 'long' vowels.

<p>And where did <i>that </i>come
from? 

<ul>
<li>The spelling of the long vowels is the fault of the
Great Vowel Shift of early modern times.  
Middle English speakers pronounced the vowels with their 'proper' values, so that
(say) <cite>mate</cite> would have been pronounced <tt>m&ocirc;t@</tt>.  

<li>The short vowels are simply laxed versions of the <i>original</i> sounds of the 
long vowels. <tt>&ecirc;</tt>, for instance, is a
lazy version of <tt>&auml;</tt> (the original sound of long <cite>e</cite>)-- closer
to the muddy center of the vowel space.
</ul>


<p>The above rules work in conjunction with rule 54, which
means that <b>doubling a consonant</b>
changes a medial vowel from long to short: <cite>later/latter,
Peter/petter, biter/bitter, hoping/hopping, cuter/cutter</cite>.
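<p>The interaction of vowel length and consonant doubling can be sketched in Python. This is only an illustration, not the program's rule file: the name <tt>mark_first_vowel</tt> is made up, and the sketch looks only at the first vowel letter of a word, ignoring digraphs, final vowels, and every other rule on this page.

```python
import re

# Long and short transcriptions, in the notation used on this page.
LONG = dict(zip("aeiou", "äëïöü"))
SHORT = dict(zip("aeiou", "âêîôû"))

def mark_first_vowel(word):
    """Mark the first vowel letter of `word` as long or short (hypothetical helper)."""
    match = re.match(r"([^aeiou]*)([aeiou])(.*)", word)
    if match is None:
        return word
    onset, vowel, rest = match.groups()
    # Long only when exactly one consonant separates the vowel from the next vowel;
    # a doubled consonant, a cluster, or a final consonant leaves it short.
    is_long = re.match(r"[^aeiou][aeiou]", rest) is not None
    marks = LONG if is_long else SHORT
    return onset + marks[vowel] + rest
```

<p>Under these assumptions <cite>later</cite> comes out as <tt>l&auml;ter</tt> and <cite>latter</cite> as <tt>l&acirc;tter</tt>; the doubled <cite>t</cite> is removed later by rule 54.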



<h3>Exceptions, but general ones</h3>

<p><span class="RuleNo">27</span>. Final <cite>ind</cite> is <tt>&iuml;nd</tt>,
final <cite>oss</cite> is <tt>&ograve;s</tt>; final <cite>og</cite> is <tt>&ograve;g</tt>:
<cite>mind, boss, dog</cite> = <tt>m&iuml;nd b&ograve;s d&ograve;g</tt>.

<p><span class="RuleNo">28</span>. <cite>o</cite> also becomes <tt>&ograve;</tt> 
before <cite>f</cite> and another consonant (<cite>offer</cite> = <tt>&ograve;f@r</tt>, 
<cite>soften</cite> = <tt>s&ograve;f@n</tt>).

<p><span class="RuleNo">29</span>. <cite>wa</cite> is pronounced <tt>w&ocirc;</tt> 
before a dental or alveolar consonant (<tt>t d n s +</tt>): 
<cite>want, wander, swan, Rwanda, swat, wad, wasp</cite>;
and as <tt>w&ograve;</tt> before <tt>(t)$</tt>: <cite>wash, squash, watch</cite>
= <tt>w&ograve;$ skw&ograve;$ w&ograve;&ccedil;</tt>.

<p><span class="RuleNo">29a</span>. <cite>u</cite> is pronounced <tt>u</tt> before <cite>l</cite>, 
or after a labial stop (<tt>pb</tt>) and before a sibilant (<tt>s$&ccedil;</tt>): <cite>adult, push, butch</cite>.  
(This doesn't apply if the u is long: <cite>mule</cite>.)

<p>I don't think I ever noticed these generalizations till I started working out the rules 
for this page.  At least some of these, such as 29a, are sound changes from Shakespeare's time.

<p>Rules such as 6, 18, 19, 27, 28, and 51 introduce <tt>&ograve;</tt>, 
a vowel which (as signalled by the odd diacritic in my
transcription) doesn't fit well into English phonology.
The fact that a velar occurs in many of the
rule conditions suggests that it was originally an allophonic variant of /&ocirc;/
and /&acirc;/ in this environment-- compare <cite>dog, ought,
long, walk</cite> with <cite>dot, out, lot, wad</cite>.
But it's now phonemic in GA, as can be seen
in the minimal triad <cite>caught, cot, cat</cite>.
These rules would have to be modified (and
some could be eliminated) in dialects that merge <tt>&ograve;</tt> and <tt>&ocirc;</tt>.

<p>For some speakers, rule 29a only applies after labials, so that <cite>pull</cite> and <cite>dull</cite> don't rhyme.

<h3>Softening of gn</h3>

<p><span class="RuleNo">30</span>. Except before a vowel, the vowel in
<cite>ign</cite> or <cite>igm</cite> lengthens, and the <cite>g</cite> is lost: 
<cite>alignment paradigm</cite> = <tt>@l&iuml;nm@nt, p&auml;r@d&iuml;m</tt>, but 
<cite>igneous</cite> = <tt>&icirc;gn&euml;@s</tt>.

<p><span class="RuleNo">31</span>. The 
<cite>g</cite> is simply lost in <cite>eign</cite>: <cite>feign</cite> = <tt>f&auml;n</tt>.


<h3>Handling of -ous</h3>

<p><span class="RuleNo">32</span>. Except before a vowel, 
<cite>ous</cite> reduces to <tt>@s</tt>: 
<cite>jealous</cite> = <tt>j&ecirc;l@s</tt>.

<p>I'm ambivalent about rules that relate to a particular
suffix, since arguably the pronunciation is simply a fact about the suffix in
the mental lexicon. But a suffix can apply to dozens of words, so there was a
large gain from including some such rules in the file.

<p>Note the importance of order: this rule has to be ordered before 
silent <cite>e</cite> deletion, or it will apply to words like <cite>arouse</cite>.
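<p>The ordering point can be sketched in a few lines of Python. This is not the program's actual rule file: the function names are made up for illustration, and for simplicity the rules here work on plain spelling rather than on the partly transcribed forms the real rules see.

```python
import re

def reduce_ous(word):
    # Rule 32: 'ous' becomes '@s' unless a vowel letter follows.
    return re.sub(r"ous(?![aeiou])", "@s", word)

def drop_final_e(word):
    # Rule 33 (silent e): remove a final 'e' unless it is the only vowel in the word.
    if word.endswith("e") and re.search(r"[aeiou@]", word[:-1]):
        return word[:-1]
    return word

def apply_in_order(word, rules):
    # Apply each rule once, in the order given.
    for rule in rules:
        word = rule(word)
    return word

right = apply_in_order("arouse", [reduce_ous, drop_final_e])
wrong = apply_in_order("arouse", [drop_final_e, reduce_ous])
```

<p>With rule 32 first, the <cite>ous</cite> in <cite>arouse</cite> is still followed by a vowel and is left alone, giving <tt>arous</tt>; with the order reversed, the word is wrongly reduced to <tt>ar@s</tt>.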



<h3>Removal of silent e</h3>

<p><span class="RuleNo">33</span>. Remove final <cite>e</cite>: 
<cite>rate mike cute</cite> = <tt>r&auml;t m&iuml;k k&uuml;t</tt> 
(unless it's the only vowel in the word, as in <cite>he</cite>).

<p>This and rules 25 and 26 (on long and short vowels) are
the guts of the English spelling system.  
They allow the five vowel symbols to represent ten vowel phonemes.

<p>English orthography tends to preserve the spelling of
morphemes in derived words, including their final <cite>e</cite>.  
The program is too stupid to handle this, since it has no way of
recognizing compounds. But of course in
words like <cite>safety, lovely, changeable, careful, warehouse, jukebox, placement, placeholder</cite>
the <cite>e</cite> in the first morpheme should be deleted by this rule.

<p>People pay tribute to these rules every time they make
up words-- whether for marketing purposes (<cite>Nite-Lite,
Cold-Eeze, Unix</cite>), slang (<cite>reefer, dweeb,
doofus</cite>), a created world (<cite>hobbit, Leela,
Oz, Alley Oop, Naboo, Mr. Magoo, Morlock</cite>), or for borrowings (
<cite>thuggee, kangaroo, tycoon, igloo, tepee</cite>).
Words that don't fit the pattern, like <cite>Linux</cite>, can cause confusion.



<h3>Add shortening; stir</h3>

<p>Some vowels that are orthographically long are pronounced
short, and frankly I haven't put my finger on the pattern.
In the file I did add this rule:

<p><span class="RuleNo">34</span>. Shorten a vowel that precedes a
simple, final CV syllable (and is not in the first syllable of the word).

<p>This handles words like <cite>anomaly, cinema, sanity, biology, century</cite>; but it
fails on other words, like <cite>patina, tuxedo, agora.</cite>  
Obviously the shortened vowels are all unstressed; but the
idea here is to predict pronunciations from the spelling, and the spelling
doesn't indicate the stress. 

<p>(We've already removed silent <cite>e</cite>, so this rule isn't triggered by
words like <cite>phoneme</cite>.)

<p>Somewhere I read that long vowels can't occur earlier than
the antepenult; but obvious counterexamples are 
<cite><u>i</u>solating</cite>
or <cite><u>u</u>nification</cite>.  
I'll see if I can improve the generalization, however.



<h3>Vowel digraphs</h3>

<p>Besides the long/short trick, English expands its repertoire
of vowel representations with digraphs.  
Quite a few of these are redundant, and there are lots of
exceptions-- this, and not <cite>ch</cite> or 
<cite>ough</cite>, is the real weak point of English spelling.

<p><span class="RuleNo">35</span>. <cite>iV</cite> (that is, <cite>i</cite> plus another vowel) becomes 
<tt>&iuml;@</tt> in the initial syllable: <cite>bias, diagram</cite> = <tt>b&iuml;@s, d&iuml;@gr&acirc;m</tt>.

<p><span class="RuleNo">36</span>. Exceptions to the following rule:

<ul>
<li>Final <cite>ow</cite> is pronounced <tt>&ouml;</tt>: <cite>slow, rainbow, overthrow</cite>.

<li><cite>oo</cite> is pronounced <tt>&ugrave;</tt> before a <cite>k</cite>: 
<cite>book, crook, look</cite>.

<li><cite>ei</cite> is pronounced <tt>&euml;</tt> after <tt>s</tt>: <cite>perceive, ceiling, seize</cite>.

<li><cite>ie</cite> is pronounced <tt>&iuml;</tt> finally: <cite>dye, necktie</cite>.

<li><cite>oul</cite> becomes <tt>&ugrave;</tt> before a final <cite>d</cite>.
</ul>

<p><span class="RuleNo">37</span>. Make the following substitutions:

<table border=1 cellspacing=0 cellpadding=0>
<tr>
  <td>&nbsp;<cite>eau</cite>  &nbsp;&nbsp;&nbsp;&nbsp;
  <td>&nbsp;<tt>&ouml;</tt>  &nbsp;&nbsp;&nbsp;&nbsp;
</tr><tr>
  <td>&nbsp;<cite>ai</cite>
  <td>&nbsp;<tt>&auml;</tt>
</tr><tr>
  <td>&nbsp;<cite>au, aw</cite>
  <td>&nbsp;<tt>&ograve;</tt>
</tr><tr>
  <td>&nbsp;<cite>ee</cite>
  <td>&nbsp;<tt>&euml;</tt>
</tr><tr>
  <td>&nbsp;<cite>ea</cite>
  <td>&nbsp;<tt>&euml;</tt>
</tr><tr>
  <td>&nbsp;<cite>ei</cite>
  <td>&nbsp;<tt>&auml;</tt>
</tr><tr>
  <td>&nbsp;<cite>eo</cite>
  <td>&nbsp;<tt>&euml;@</tt>
</tr><tr>
  <td>&nbsp;<cite>eu, ew</cite>
  <td>&nbsp;<tt>&uuml;</tt>
</tr><tr>
  <td>&nbsp;<cite>ie</cite>
  <td>&nbsp;<tt>&euml;</tt>
</tr><tr>
  <td>&nbsp;<cite>iV</cite>
  <td>&nbsp;<tt>&euml;@</tt>
</tr><tr>
  <td>&nbsp;<cite>oa</cite>
  <td>&nbsp;<tt>&ouml;</tt>
</tr><tr>
  <td>&nbsp;<cite>oe</cite>
  <td>&nbsp;<tt>&ouml;</tt>
</tr><tr>
  <td>&nbsp;<cite>oo</cite>
  <td>&nbsp;<tt>u</tt>
</tr><tr>
  <td>&nbsp;<cite>ou, ow</cite>
  <td>&nbsp;<tt>&ocirc;w</tt>
</tr><tr>
  <td>&nbsp;<cite>oi</cite>
  <td>&nbsp;<tt>&ouml;y</tt>
</tr><tr>
  <td>&nbsp;<cite>ua</cite>
  <td>&nbsp;<tt>&uuml;@</tt>
</tr><tr>
  <td>&nbsp;<cite>ue</cite>
  <td>&nbsp;<tt>u</tt>
</tr><tr>
  <td>&nbsp;<cite>ui</cite>
  <td>&nbsp;<tt>u</tt>
</tr>
</table>

<p>Again, the program is not smart enough to recognize when the
digraph spans a morpheme boundary, and thus should be treated as two separate
vowels: <cite>goer</cite> = <tt>g&ouml;@r</tt>, <cite>coaxial</cite> = <tt>k&ouml;&acirc;ks&euml;@l</tt>.
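<p>The table in rule 37 amounts to a lookup, which can be sketched in Python as follows. This is an assumption about how such a table might be applied, not the program's own code: the names <tt>DIGRAPHS</tt> and <tt>substitute_digraphs</tt> are made up, the generic <cite>iV</cite> row is left out, and the exceptions of rule 36 are ignored.

```python
import re

# Rule 37 substitutions, copied from the table (the generic iV row is omitted).
DIGRAPHS = {
    "eau": "ö", "ai": "ä", "au": "ò", "aw": "ò", "ee": "ë", "ea": "ë",
    "ei": "ä", "eo": "ë@", "eu": "ü", "ew": "ü", "ie": "ë", "oa": "ö",
    "oe": "ö", "oo": "u", "ou": "ôw", "ow": "ôw", "oi": "öy",
    "ua": "ü@", "ue": "u", "ui": "u",
}

# Try longer spellings first, so that 'eau' wins over 'ea' and 'au'.
PATTERN = re.compile("|".join(sorted(DIGRAPHS, key=len, reverse=True)))

def substitute_digraphs(word):
    """Replace each vowel digraph with its most common value (hypothetical helper)."""
    return PATTERN.sub(lambda m: DIGRAPHS[m.group(0)], word)
```

<p>A blind lookup like this shows the morpheme-boundary problem directly: it turns <cite>goer</cite> into <tt>g&ouml;r</tt> rather than <tt>g&ouml;@r</tt>, because it cannot tell that the <cite>o</cite> and <cite>e</cite> belong to different morphemes.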

<p>Annoyingly, some of these digraphs have at least two
values: cf. <cite>wool, fool; mead, dread; fief, friend; reign, seize; ground, group</cite>.
The values in the table are those that occur
most often. (The alternatives are
generally just a step or two apart phonetically, e.g. <tt>u/&ugrave;, &euml;/&ecirc;, &auml;/&euml;</tt>.)

<p>For ease of exposition I've put the final 
<cite>ie </cite>rule here, but it really goes before rule 14
(affrication); otherwise terrible things happen to words like <cite>untie</cite>.



<h3>Those pesky final syllabics</h3>

<p><span class="RuleNo">38</span>. Any vowel reduces to <tt>@</tt> before final <cite>l</cite>: 
<cite>battle, final, hovel, evil, symbol</cite>.

<p><span class="RuleNo">39</span>. Any short vowel reduces to <tt>@</tt> 
before a final <cite>n</cite>: <cite>human, frighten, cabin, button</cite>.

<p>These rules don't apply to monosyllables (<cite>pal, can</cite>),
nor to vowels that have already been assigned a particular value by an earlier
rule (e.g. <cite>meal</cite> to <tt>m&euml;l</tt> by rule 37).

<p>These rules could probably be refined; they don't apply to
stressed finals, but again, the orthography doesn't indicate stress.

<p>You can take <tt>@l</tt> as a phonemic representation, or add a rule at the end to 
replace it with vocalic <u><tt>l</tt></u>. Ditto for <tt>@n</tt>.



<h3>Suffix simplifications</h3>

<p><span class="RuleNo">40</span>. The following suffixes are reduced
as follows:

<table border=1 cellspacing=0 cellpadding=0>
 <tr>
  <td>&nbsp;<cite>-able, -ible</cite>  &nbsp;&nbsp;&nbsp;&nbsp;
  <td>&nbsp;<tt>@b@l</tt>  &nbsp;&nbsp;&nbsp;&nbsp;
 </tr>
 <tr>
  <td>&nbsp;<cite>-lion</cite>
  <td>&nbsp;<tt>ly@n</tt>
 </tr>
 <tr>
  <td>&nbsp;<cite>-nion</cite>
  <td>&nbsp;<tt>ny@n</tt>
 </tr>
</table>

<p>Again, we really shouldn't have 'rules' for single lexical
entries. But these suffixes are common,
so the rule has a large yield.



<h3>Unpronounceable finals</h3>

<p><span class="RuleNo">41</span>. A final 
<cite>b</cite> or <cite>n</cite> is not pronounced if preceded by an <cite>m</cite>:
<cite>damn bomb</cite> = <tt>d&acirc;m b&ocirc;m</tt>.


<h3>Final vowel coloration</h3>

<p><span class="RuleNo">42</span>. Pronounce any remaining final vowel
as follows:

<table border=1 cellspacing=0 cellpadding=0>
 <tr>
  <td>&nbsp;<cite>-a</cite>  &nbsp;&nbsp;&nbsp;&nbsp;
  <td>&nbsp;<tt>@</tt>  &nbsp;&nbsp;&nbsp;&nbsp;
 </tr>
 <tr>
  <td>&nbsp;<cite>-i</cite>
  <td>&nbsp;<tt>&euml;</tt>
 </tr>
 <tr>
  <td>&nbsp;<cite>-o</cite>
  <td>&nbsp;<tt>&ouml;</tt>
    
 </tr>
 <tr>
  <td>&nbsp;<cite>-u</cite>
  <td>&nbsp;<tt>u</tt>
 </tr>
</table>

<p>A final vowel is usually the mark of a foreign word, which
is why final vowels tend to have the 'continental' values: 
<cite>sushi, cello, haiku</cite>.
Earlier borrowings were nativized, meaning
that final vowels had to be written as diphthongs (e.g. <cite>Munsee</cite>, <cite>Hindoo</cite>).

<p>Since final <cite>-e</cite> is already in use as a silent letter, we used to mark with a diaeresis one that
was supposed to be pronounced (<cite>Chlo&euml; </cite>= <tt>kl&ouml;&euml;</tt>), or, if we were
borrowing from French, we retained the accent (<cite>caf&eacute;</cite> = <tt>k&acirc;f&auml;</tt>).
But English seems to be so allergic to
diacritics that these helpful conventions have largely been lost.



<h3>Vowels before r </h3>

<cite>r</cite> is hell on English vowels; it tends to color
the vowels, and in many dialects, disappear.  
In GA there are 12 monophthongal vowels, but only 6 can appear before <tt>r</tt>--  
<tt>&auml; &euml; &ocirc; &ouml; &ograve; u</tt>-- plus <tt>@r</tt>,
which is really just a prolonged vocalic <tt><u>r</u></tt>.

<p><span class="RuleNo">43</span>. An <tt>&ocirc;w, &ocirc;</tt>, or <tt>&ograve;</tt> 
resulting from the previous rules changes to <tt>&ouml;</tt> before an <tt>r</tt>: 
<cite>course</cite> = <tt>k&ouml;rs</tt>, <cite>for</cite> = <tt>f&ouml;r</tt>.

<p><span class="RuleNo">44</span>. <cite>war </cite>is pronounced <tt>w&ouml;r</tt>, 
except before a vowel: 
<cite>warlock, war, dwarf</cite> = <tt>w&ouml;rl&ocirc;k, w&ouml;r, dw&ouml;rf</tt>;
and <cite>wor</cite> is pronounced <tt>w@r</tt>: <cite>word, worst, worry</cite>.

<p><span class="RuleNo">45</span>. <tt>&ecirc;</tt> or <tt>&acirc;</tt> before a double 
<cite>r</cite> (and <tt>&ecirc;</tt> before <cite>ri</cite>) become <tt>&auml;</tt>: 
<cite>terror, marry, merit</cite> = <tt>t&auml;r@r, m&auml;r&euml;, m&auml;r&icirc;t</tt>.

<p><span class="RuleNo">46</span>. <tt>&acirc;</tt> before any other <cite>r</cite>
becomes <tt>&ocirc;</tt>: <cite>mark, star </cite>= <tt>m&ocirc;rk, st&ocirc;r</tt>.

<p><span class="RuleNo">47</span>. <tt>&ecirc;, &icirc;, &ucirc;</tt> before <cite>r</cite> 
are reduced to schwa: <cite>perk, fir, fur</cite> = <tt>p@rk, f@r, f@r</tt>.

<p>Thanks to the infamous rule 45, I pronounce <cite>Mary, merry, marry</cite> 
the same. If you left
this rule out, the rules would probably correctly predict the pronunciation of
Easterners and Britons who distinguish them.



<h3>The velar nasal ng</h3>

<p>The careful reader may wonder why <cite>ng</cite> was not handled earlier, with
the other consonantal digraphs. The reason is that orthographically, it acts as a 
double consonant-- e.g. <cite>singer</cite>
has a short not a long <b>i</b>.  But now it's time to handle it.

<p>For lack of an eng, I represent the velar nasal as <tt>&ntilde;</tt>; 
don't confuse it with a palatalized <tt>ny</tt>.


<p><span class="RuleNo">48</span>.  <cite>ng</cite> becomes <tt>&ntilde;g</tt> 
before a liquid (<tt>r, l</tt>) or semivowel (<tt>y, w</tt>): 
<cite>angry, England, singular, anguish</cite> = 
<tt>&auml;&ntilde;gr&euml;, &icirc;&ntilde;gl&acirc;nd, s&icirc;&ntilde;g&uuml;l@r, &auml;&ntilde;gw&icirc;$</tt>. 

<p><span class="RuleNo">49</span>. <cite>ng</cite> becomes <tt>&ntilde;</tt> finally, or before
another consonant: 
<cite>hung</cite> = <tt>h&ucirc;&ntilde;</tt>, <cite>length</cite> = <tt>l&auml;&ntilde;+</tt>.

<p><span class="RuleNo">50</span>. <cite>n</cite> becomes <tt>&ntilde;</tt> before a velar stop
(<tt>k, g</tt>): <cite>anger</cite> = <tt>&auml;&ntilde;g@r</tt>, <cite>think</cite> = <tt>+&icirc;&ntilde;k</tt>.

<p><span class="RuleNo">51</span>. <tt>&ocirc;</tt> becomes <tt>&ograve;</tt>, and 
<tt>&acirc;</tt> becomes <tt>&auml;</tt> before <tt>&ntilde;</tt>: 
<cite>song</cite> = <tt>s&ograve;&ntilde;</tt>; <cite>hang</cite> = <tt>h&auml;&ntilde;</tt>.

<p>Note that rule 50 doesn't apply to words like <cite>hung</cite>,
because rule 49 already removed the <cite>g </cite>in those words.

<p>Rule 50 is arguably merely allophonic, but since it's completely
consistent I treated it as a spelling rule.  
You could certainly say that a word like <cite>ungrateful</cite>
'really' has an underlying /ng/, because it's composed of 
<cite>un</cite> plus <cite>grateful</cite>;
then this, as in most languages, will get pronounced <tt>&ntilde;g</tt>.
But if you go that route, you can't actually show that English allows 
/&ntilde;g/ as well as /ng/-- how do we know that 
<cite>wrong</cite> isn't actually /r&ograve;ng/, modified by the
allophonic rule? The important thing is
not to pretend that we have a contrast of /ng/ and /&ntilde;g/.
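<p>Rules 48-51 can be sketched as an ordered list of substitutions. This is a minimal Python illustration under my own assumptions, not the program's rule file: the inputs are made-up intermediate forms in which the vowels have already been transcribed, and the names <tt>NG_RULES</tt> and <tt>apply_ng_rules</tt> are hypothetical.

```python
import re

# Vowel symbols of the transcription, used to tell vowels from consonants.
VOWELS = "aeiou@äëïöüâêîôûòù"

# (pattern, replacement) pairs, applied in this order.
NG_RULES = [
    (r"ng(?=[rlyw])", "ñg"),        # rule 48: before a liquid or semivowel
    (rf"ng(?![{VOWELS}])", "ñ"),     # rule 49: finally, or before another consonant
    (r"n(?=[kg])", "ñ"),            # rule 50: n before a velar stop
    (r"ô(?=ñ)", "ò"),               # rule 51: ô before the velar nasal
    (r"â(?=ñ)", "ä"),               # rule 51: â before the velar nasal
]

def apply_ng_rules(word):
    """Run rules 48-51 once each, in order, over an intermediate form."""
    for pattern, replacement in NG_RULES:
        word = re.sub(pattern, replacement, word)
    return word
```

<p>Because rule 49 has already removed the <cite>g</cite> of <tt>h&ucirc;ng</tt>, rule 50 finds nothing to do there; in <tt>&acirc;ng@r</tt>, where a vowel follows, rules 48 and 49 do not match and rule 50 supplies the nasal instead.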



<h3>Voicing of s</h3>

<p><span class="RuleNo">52</span>. <cite>s</cite> is voiced finally, after a voiced oral stop: 
<cite>dogs</cite> = <tt>d&ograve;gz</tt>.

<p><span class="RuleNo">53</span>. It's also voiced before final <cite>m</cite>: 
<cite>prism</cite> = <tt>pr&icirc;zm</tt>.

<p>The first of these rules is really morphophonemic: the
plural, possessive, and 3p singular inflections of English are spelled <cite>s</cite> even
when, by assimilation, they're pronounced <tt>z</tt>. This rule is not
phonological, as can be seen by a word like <cite>chance</cite> = <tt>&ccedil;&acirc;ns</tt>; 
compare <cite>fans</cite> = <tt>f&acirc;nz</tt>.



<h3>Double consonants</h3>

<p><span class="RuleNo">54</span>. A double consonant is pronounced
singly: <cite>dinner, buzzard, hassle</cite> = <tt>d&icirc;n@r, b&ucirc;z@rd, h&acirc;s@l</tt>.

<p><span class="RuleNo">55</span>. A <tt>t</tt> disappears before 
<tt>&ccedil;</tt>, and a <tt>d</tt> before <tt>j</tt>: 
<cite>batch</cite> = <tt>b&acirc;&ccedil;</tt>, <cite>judge</cite> = <tt>j&ucirc;j</tt>.

<p><span class="RuleNo">56</span>. An <tt>s</tt> disappears before <tt>$</tt>: 
<cite>pressure</cite> = <tt>pr&ecirc;$r</tt>.

<p>Rule 54 works hand in hand with <a href="#25">rule 25</a>: a consonant is
doubled to show that the preceding vowel is short: <cite>redder</cite> = <tt>r&ecirc;d@r</tt> 
(compare <cite>red</cite>, where the <cite>d</cite> doesn't need to be doubled
because a vowel preceding a final consonant is already short).

<p>Rule 55 is something of a corollary: to 'double' <tt>&ccedil;</tt>, we write 
<cite>tch</cite> rather than <cite>chch</cite>; and to double a <tt>j</tt>, 
we write <cite>dg</cite> rather than <cite>jj</cite> or <cite>gg</cite>.

<p>Rule 56 goes with <a href="#16">rule 16</a>, which changed <cite>s</cite> to
<tt>$</tt> before some instances of <cite>u</cite>. 
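<p>Rules 54-56 are simple cleanups, and a short Python sketch shows how little they need. As before this is only an illustration: <tt>simplify_consonants</tt> is a made-up name, and the inputs are assumed intermediate forms in which earlier rules have already produced <tt>&ccedil;</tt>, <tt>j</tt> and <tt>$</tt>.

```python
import re

def simplify_consonants(word):
    """Apply rules 54-56 to an intermediate form (hypothetical helper)."""
    # Rule 54: a doubled consonant is pronounced singly.
    word = re.sub(r"([bcdfgklmnprstvz])\1", r"\1", word)
    # Rule 55: t disappears before ç, and d before j.
    word = re.sub(r"t(?=ç)", "", word)
    word = re.sub(r"d(?=j)", "", word)
    # Rule 56: s disappears before $.
    word = re.sub(r"s(?=\$)", "", word)
    return word
```

<p>The doubling has done its job by this point (it kept the preceding vowel short), so the extra letter can simply be thrown away.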



<h2><a name="almost">Almost but not quite regular</a></h2>

<p>In the rule list there's <b>almost</b> a rule that changes <cite>o</cite> to 
<tt>&ucirc;</tt> before certain fricatives or nasals. Here's a list of affected words, as well as
counterexamples:

<table border=1 cellspacing=0 cellpadding=0>
 <tr>
  <td>&nbsp;<tt>_v</tt>&nbsp;&nbsp;&nbsp;
  <td>&nbsp;<cite>above, cover, dove, glove, govern, hovel, hover, love, oven, shovel,
  of</cite>
  <td>&nbsp;<cite>clover, prove, drover, jovial, move, novel, over, poverty,
  proverb, province, sovereign, stove, bovine</cite>
</tr><tr>
  <td>&nbsp;<tt>_l</tt>
  <td>&nbsp;<cite>color</cite>
  <td>&nbsp;<cite>apology, polo</cite>
</tr><tr>
  <td>&nbsp;<tt>_+</tt>
  <td>&nbsp;<cite>other, another, mother, brother, nothing</cite>
  <td>&nbsp;<cite>both, bother, broth, brothel, cloth, clothes, moth</cite>
</tr><tr>
  <td>&nbsp;<tt>_n</tt>
  <td>&nbsp;<cite>onion, none, money, monk, monkey, month, wonder, front, son,
  sponge, honey, Monday, one</cite>
  <td>&nbsp;<cite>alone, bone, honest, honor, tonight, pond, beyond, conk</cite>
</tr><tr>
  <td>&nbsp;<tt>_m</tt>
  <td>&nbsp;<cite>come, become, from, some, stomach</cite>
  <td>&nbsp;<cite>bomb, comb, dome, home, gnome, Mom, whom, womb</cite>
</tr>
</table>

<p>Most of these turn out to be due to an orthographic or even a calligraphic rule: medieval English scribes wrote <cite>o</cite> instead of <cite>u</cite> before <cite>m, n, v</cite>, apparently because in the medieval hand, the verticals of the <cite>u</cite> ran confusingly together with those of the following consonant.  



<h2><a name="irregular">So what's irregular?</a></h2>

<p>The biggest source of errors is what I
considered <b>near-misses</b>: instances where the rules get the length of a vowel
wrong, or don't predict a reduction to schwa, or don't predict a voiced <cite>s</cite>.

<p>The first two of these are <b>a feature not a bug</b>,
since they make word roots recognizable, despite predictable differences in
pronunciation. For instance, the root <cite>pedant</cite>
is spelled identically in <cite>pedant</cite> (<tt>p&ecirc;d@nt</tt>) and 
<cite>pedantic</cite> (<tt>p@d&acirc;nt&icirc;k</tt>).  This underlines the relationship between the
two words, despite the fact that neither root vowel is pronounced the
same. Similarly, <cite>sanity</cite> has a short a (<tt>s&acirc;n&icirc;t&euml;</tt>), 
although a vowel preceding a single consonant is
normally long; this is an 'error', but it keeps the same spelling of the root
as in <cite>sane</cite>.

<p>Putting these near-misses aside, my program gets 791
words wrong in a 5180-word sample vocabulary.  

<p>Many of these are really stupidities of the program,
not the language. There are:

<ul>
<li>188 simple variations of other errors-- e.g. since <cite>busy</cite> is
wrongly predicted to have a <tt>&uuml;</tt>, so is <cite>business</cite>

<li>52 borrowings using foreign spelling conventions (e.g. 
<cite>aficionado, bourgeois, cello, stein</cite>).  
Borrowings are common enough in English that writers can learn the
patterns for each source language.

<li>18 instances of final <cite>-ed</cite> taken as <tt>&ecirc;d</tt>

<li>45 words (mostly Greek) where <cite>ch</cite> = <tt>k</tt> not <tt>&ccedil;</tt>

<li>45 silent <cite>e</cite>'s not recognized as such due to compounding

<li>20 over-enthusiastic vowel reductions (usually due to
stress falling where, statistically, it doesn't occur much: 
<cite>amen, violin</cite>;
or to vowels that unexpectedly don't turn to schwa before 
<cite>r</cite>: <cite>m<u>i</u>rror, s<u>e</u>rgeant</cite>).

<li>6 instances of consonant combinations taken as single
sounds despite crossing a morpheme boundary (e.g. 
<cite>dishonor, shepherd</cite>)
</ul>

<p>That leaves about 420 words wrong, less than 10%; the
major categories are as follows:

<ul>
<li>195 misinterpretations of vowel digraphs; some of these are
genuine ambiguities in English spelling 
(cf. <cite>dead, mead, real; die, sieve, science, fief</cite>); 
others are due to insufficient analysis 
(e.g. <cite>poet</cite> is mispredicted simply because I didn't provide a rule for 
<cite>oe</cite>-- it wasn't worth it, it occurred too rarely in the lexicon).

<li>37 examples of the <cite>o</cite> to <tt>&ucirc;</tt> change <a href="#almost">discussed
above</a>.

<li>26 indefensible vowel spellings (e.g. <cite>pretty, women,
resin, English, lose, swamp, water, bury, lawyer</cite>).

<li>17 consonant clusters not simplified enough (e.g. <cite>half, folks,
listen, mortgage, raspberry</cite>).  

<li>17 instances of an unexpected (or mispredicted) <tt>&ograve;</tt>; e.g. 
<cite>cloth, frost, chocolate</cite>.

<li>18 instances of final <cite>-y</cite> being <tt>&iuml;</tt> rather than <tt>&euml;</tt>.

<li>13 annoying cases where <cite>g </cite>before a front vowel is hard (e.g.
<cite>get, give</cite>); there are
also 4 cases where <cite>gg</cite> + front vowel was taken incorrectly as 
<tt>gj</tt>-- which it should be, dammit (<cite>suggest</cite>) but often isn't (<cite>stagger</cite>).

<li>8 instances of an unexpected <tt>&ugrave;</tt>; e.g.
<cite>put, wolf, woman. </cite>(These all begin with labials-- these may be related to rule 29a.)

<li>10 unexpected (af)frications (e.g. <cite>educate, ocean, righteous, sure</cite>);
there's also an instance of an unexpected lack of frication (<cite>absurd</cite>)

<li>8 more instances of <cite>er</cite> becoming <tt>&auml;r</tt> (besides those
noted in the rules-- e.g. <cite>era, there, herald, very</cite>)

<li>6 instances of vowels unexpectedly dropping (e.g. <cite>bachelor,
vegetable, Wednesday</cite>)
</ul>
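<p>The tallies in this section can be checked with a little arithmetic. The Python below uses only the figures quoted above; the variable names and category labels are mine.

```python
TOTAL_WORDS = 5180   # size of the sample vocabulary
WRONG_WORDS = 791    # words wrong once near-misses are set aside

# Errors blamed on the program rather than on the language.
PROGRAM_FAULTS = {
    "variations of other errors": 188,
    "foreign spelling conventions": 52,
    "final -ed": 18,
    "ch pronounced k": 45,
    "silent e hidden by compounding": 45,
    "over-enthusiastic vowel reductions": 20,
    "clusters across morpheme boundaries": 6,
}

program_faults = sum(PROGRAM_FAULTS.values())
remaining = WRONG_WORDS - program_faults
raw_rate = 100 * WRONG_WORDS / TOTAL_WORDS
remaining_rate = 100 * remaining / TOTAL_WORDS
```

<p>This gives 374 words attributable to the program and 417 remaining (the "about 420" above), a residual error rate of roughly 8%, against about 15% before the adjustment.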


<h2>Generating spellings from pronunciation</h2>

<p>Can you <b>reverse </b>these rules to get instructions on
how to spell a word given its pronunciation?  
Not really, since there are too many alternative spellings.
However, the following table can be taken as a first approximation. For each GA
phoneme, I list the spellings referred to in the rules above.  Caveats:

<ul>
 <li>Remember <a href="#25">the long/short vowel rules</a> (25,26). 
 <ul>
  <li>To ensure a short pronunciation, double the following consonant.
  <li>To ensure a long pronunciation:
  <ul>
   <li>at the end of a word, add a silent e 
   <li>elsewhere in the word, use a vowel digraph instead.
  </ul>
 </ul>
 <li>Remember the softening of velars; see <a href="#20">rules 20-23</a> for a discussion of how to spell 
     <tt>s/k/g/j</tt> before various vowels.
 <li>Parenthesized characters represent the environment where you can use a spelling.
     Examples:
 <ul>
  <li>under <tt>s</tt>, <cite>(V)ss(V)</cite> means that you can spell it 
      <cite>ss</cite> between two vowels
  <li>under <tt>&auml;</tt>, <cite>a(ng)</cite> means that you can spell it 
      <cite>a</cite> before <cite>ng</cite>.
 </ul>
 <li><cite># </cite>represents the end or beginning of a word:
 <ul>
  <li><cite>i#</cite> under <tt>&iuml;</tt> means that this spelling occurs word-finally.
 </ul>
 <li><tt>ks</tt> (or intervocalic <tt>gz</tt>) can be written <cite>x</cite>.
 <li>It's preferable to spell a word the same way across all morphological changes,
     even if it means slight violations of the rules (e.g. 'silent final e' in
     the middle of a word). 
 <li>Likewise: write reduced vowels with the full vowel in a morphologically related
     word. E.g. the second vowel in <cite>parent </cite>is <cite>e</cite>
     because we have a full <tt>&ecirc;</tt> in <cite>parental</cite>.
</ul><p>

<table border=1 cellspacing=0 cellpadding=0>
<tr style='background-color:#C0C0C0'>
  <td><b>Phoneme</b></td>
  <td><b>Spellings</b></td>
  <td><b>Phoneme</b></td>
  <td><b>Spellings</b></td>
</tr><tr>
  <td>&nbsp;<tt>&auml;</tt></td>
  <td>&nbsp;<cite>a, ay, ai, ei, e(r), a(ng)</cite></td>
  <td>&nbsp;<tt>p</tt></td>
  <td>&nbsp;<cite>p</cite></td>
</tr><tr>
  <td>&nbsp;<tt>&acirc;</tt></td>
  <td>&nbsp;<cite>a</cite></td>
  <td>&nbsp;<tt>b</tt></td>
  <td>&nbsp;<cite>b</cite></td>
</tr><tr>
  <td>&nbsp;<tt>&euml;</tt></td>
  <td>&nbsp;<cite>e, ee, ea, ey, (c)ei, e(V), i#, y#</cite></td>
  <td>&nbsp;<tt>t</tt></td>
  <td>&nbsp;<cite>t</cite></td>
</tr><tr>
  <td>&nbsp;<tt>&ecirc;</tt></td>
  <td>&nbsp;<cite>e, ea</cite></td>
  <td>&nbsp;<tt>d</tt></td>
  <td>&nbsp;<cite>d</cite></td>
</tr><tr>
  <td>&nbsp;<tt>&iuml;</tt></td>
  <td>&nbsp;<cite>i, y, ie, igh, ig(n), i(V)</cite></td>
  <td>&nbsp;<tt>g</tt></td>
  <td>&nbsp;<cite>g, gh(i/e/y)</cite></td>
</tr><tr>
  <td>&nbsp;<tt>&icirc;</tt></td>
  <td>&nbsp;<cite>i, y</cite></td>
  <td>&nbsp;<tt>k</tt></td>
  <td>&nbsp;<cite>k, c(a/o/u), q(u), ck#</cite></td>
</tr><tr>
  <td>&nbsp;<tt>&ouml;</tt></td>
  <td>&nbsp;<cite>o, oa, oe, ough, o#, ow#, eau</cite></td>
  <td>&nbsp;<tt>m</tt></td>
  <td>&nbsp;<cite>m</cite></td>
</tr><tr>
  <td>&nbsp;<tt>&ocirc;</tt></td>
  <td>&nbsp;<cite>o, (w)a(n/s/t/d), a(r)</cite></td>
  <td>&nbsp;<tt>n</tt></td>
  <td>&nbsp;<cite>n</cite></td>
</tr><tr>
  <td>&nbsp;<tt>&uuml;</tt></td>
  <td>&nbsp;<cite>u, eu, ew</cite></td>
  <td>&nbsp;<tt>&ntilde;</tt></td>
  <td>&nbsp;<cite>ng, n(k,g)</cite></td>
</tr><tr>
  <td>&nbsp;<tt>&ucirc;</tt></td>
  <td>&nbsp;<cite>u</cite></td>
  <td>&nbsp;<tt>f</tt></td>
  <td>&nbsp;<cite>f, ph</cite></td>
</tr><tr>
  <td colspan=2>&nbsp;
  <td>&nbsp;<tt>v</tt></td>
  <td>&nbsp;<cite>v</cite></td>
</tr><tr>
  <td>&nbsp;<tt>u</tt></td>
  <td>&nbsp;<cite>oo, ue, ui, u#</cite></td>
  <td>&nbsp;<tt>+</tt></td>
  <td>&nbsp;<cite>th</cite></td>
</tr><tr>
  <td>&nbsp;<tt>&ograve;</tt></td>
  <td>&nbsp;<cite>au, aw, augh(t), a(l), (w)a(sh,ch), o(ss#, g#, fC, ng)</cite></td>
  <td>&nbsp;<tt>+</tt></td>
  <td>&nbsp;<cite>th</cite></td>
</tr><tr>
  <td>&nbsp;<tt>&ugrave;</tt></td>
  <td>&nbsp;<cite>oo, u</cite></td>
  <td>&nbsp;<tt>s</tt></td>
  <td>&nbsp;<cite>s, (V)ss(V), c(i/e/y), ce(a/o/u)</cite></td>
</tr><tr>
  <td>&nbsp;<tt>@</tt></td>
  <td>&nbsp;<cite>V, a#</cite></td>
  <td>&nbsp;<tt>z</tt></td>
  <td>&nbsp;<cite>z, (V)s(V)</cite></td>
</tr><tr>
  <td colspan=2>&nbsp;
  <td>&nbsp;<tt>$</tt></td>
  <td>&nbsp;<cite>sh, ci(V), ti(V); <a href="#16">rule 16</a> situations: s, ss</cite></td>
</tr><tr>
  <td>&nbsp;<tt>&ocirc;w</tt></td>
  <td>&nbsp;<cite>ou, ow</cite></td>
  <td>&nbsp;<tt>$</tt></td>
  <td>&nbsp;<cite>s, zh</cite></td>
</tr><tr>
  <td>&nbsp;<tt>&ouml;y</tt></td>
  <td>&nbsp;<cite>oy, oi</cite></td>
  <td>&nbsp;<tt>&ccedil;</tt></td>
  <td>&nbsp;<cite>ch, (doubled) tch, t(u)</cite></td>
</tr><tr>
  <td>&nbsp;<tt></tt></td>
  <td>&nbsp;<cite></cite></td>
  <td>&nbsp;<tt>j</tt></td>
  <td>&nbsp;<cite>j, (doubled) dg, g(i/e/y), ge(a/o/u)</cite></td>
</tr><tr>
  <td>&nbsp;<tt>y</tt></td>
  <td>&nbsp;<cite>y;</cite> <tt>yu</tt> <cite>can be u</cite></td>
  <td>&nbsp;<tt>r</tt></td>
  <td>&nbsp;<cite>r, #wr, rh</cite></td>
</tr><tr>
  <td>&nbsp;<tt>w</tt></td>
  <td>&nbsp;<cite>w, #wh, u(V)</cite></td>
  <td>&nbsp;<tt>l</tt></td>
  <td>&nbsp;<cite>l</cite></td>
</tr><tr>
  <td colspan=2>&nbsp;
  <td>&nbsp;<tt>h</tt></td>
  <td>&nbsp;<cite>h</cite></td>
</tr><tr>
  <td>&nbsp;<tt>@r</tt></td>
  <td>&nbsp;<cite>Vr, re#</cite></td>
  <td colspan=2 rowspan=3>&nbsp;</td>
</tr><tr>
  <td>&nbsp;<tt>@n</tt></td>
  <td>&nbsp;<cite>Vn</cite></td>
</tr><tr>
  <td>&nbsp;<tt>@l</tt></td>
  <td>&nbsp;<cite>Vl, le#</cite></td>
</tr>
</table>
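<p>To see why the rules can't simply be run backwards, here is a small Python sketch that enumerates candidate spellings from a fragment of the table. It is an illustration only: <tt>SPELLINGS</tt> holds just three rows, context-dependent entries such as <cite>(c)ei</cite> are left out, and the function name is made up.

```python
from itertools import product

# A fragment of the table: phoneme -> context-free spellings.
SPELLINGS = {
    "f": ["f", "ph"],
    "ë": ["e", "ee", "ea", "ey"],
    "t": ["t"],
}

def candidate_spellings(phonemes):
    """List every spelling the table fragment allows for a phoneme string."""
    options = [SPELLINGS[p] for p in phonemes]
    return ["".join(choice) for choice in product(*options)]

candidates = candidate_spellings("fët")
```

<p>Even this three-phoneme word yields eight candidates, including both <cite>feet</cite> and <cite>feat</cite>; nothing in the pronunciation says which one is wanted.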



<h2>Spelling reform by regularization</h2>

<p>You could use the above table as the basis for a really
useful and minimal spelling reform. 

<p>For instance, here's Percy Bysshe Shelley's <i>Ozymandias </i>in
regularized spelling. To minimize the
barbarity, I exempt one- and two-letter words from reform.

<blockquote>
<cite>I met a traveller from an <u>anteke</u> land 
<u>hu sed</u>: <u>Tue</u> vast and trunkless legs of stone 
stand in the desert. Near them, on the sand, 
<u>haff </u>sunk, a shattered visage lies, 
<u>huse</u></cite> <cite>frown, and wrinkled lip, and sneer of cold <u>cummand 
</u>tell that its sculptor well those passions read, 
which yet remain, stamped on these lifeless things-- 
the hand that mocked them, and the <u>hart </u>that fed.
And on the <u>peddestal</u> these words are carved: 
'My name is <u>Ozzymandias</u>, king of kings!
Look on my works, ye mighty, and despair!' 
<u>Nuthing </u>beside remains. Round the decay 
of that colossal wreck, boundless and bare, 
the lone and <u>levvel</u> sands stretch far away.</cite>
</blockquote>

Or of course we could just hang it up and use 
<a href="yingzi/yingzi.htm">Chinese-style syllabograms</a> instead.



<h2>So how horrible is English spelling really?</h2>

<p>I doubt that this page will convince anyone that
English spelling is a <i>good </i>system.  There are too many oddities.


<ul>

<li>Vowel combinations are a mess-- often the best you can do
is give the two most likely sounds (<cite>realm, reap</cite>),
and even those will be overruled in the fairly frequent cases where two vowels
really adjoin (<cite>reality</cite>).

<li>There are too many quirky rules that derive from odd
sound changes. We may not be able to get away from the Romance <cite>c/g</cite> softening
or the Great Vowel Shift, but does our spelling need to preserve old forms of 
<cite>feign</cite> or <cite>walk</cite>?

<li>There was a period when busybodies did their best to
make English look like Latin. This was
bad enough when we distorted perfectly good French loans like 
<cite>dette</cite> into <cite>debt</cite>,
but we're also stuck with false etymologies like <cite>island</cite>
(in place of the older, and regular, <cite>iland</cite>).

<li>And the modern custom of borrowing instead of adapting
spellings, though nice for etymology, plays havoc with the orthography,
especially as we start to borrow from more exotic languages and forget where
they're from. I've heard well-meaning
idiots pronouncing a Russian <cite>z </cite>as <tt>ts</tt>, as if it were German; and
people like to pronounce words like <cite>Sarajevo </cite>as if they were Spanish. 
And why spell <cite>gyros</cite> as if it were classical instead of modern
Greek (inviting the pronunciation <tt>j&iuml;r&ouml;z</tt> in place of <tt>y&euml;r&ouml;s</tt>)?

<li>While we're at it, could we please fix the word 
<cite>ginkgo</cite>, which is not only difficult and
irregular, but doesn't reflect <b>any</b> proper Japanese word?
The Japanese characters (<font size=+1>&#x9280;&#x674f;</font>) can be read two ways:
as <i>icho:</i>, they refer to the tree; as <i>ginnan</i>, to the fruit.  
The second character can be read <i>kyo:</i> in other words, so someone misread the
combination as <i>ginkyo:</i>, and someone else mangled this into <i>ginkgo</i>.
</ul>

<p>What I hope to have shown, however, is that beneath all
the pitfalls, there's a rather clever and fairly regular mechanism at work, and
one which still gets the vast majority of words pretty much correct.
It's not to modern tastes, but by no means
as broken as people think.

<hr>

<center><a href="default.html"><img src="home.gif" alt="[ Home ]"></a></center>

</body>

</html>
