<HTML> 
<HEAD>
<TITLE>gen - language text generator - help</TITLE>

<style>
	h2
		{color:#A60000;}
	h3
		{color:#C08700;}
	h4
		{color:#C08700;}
	h5
		{color:#C08700;}
	h6
		{color:#C08700;}
	tt
		{color:#A60000;
		font-weight:bold;
		font-family:"Gentium";}
	
</style>

</HEAD> 

<BODY BGCOLOR="#FFFFFF" TEXT="#000000">

<table width="100%">
<tr><td bgcolor="#EEC25A">
<h2><br>&nbsp;&nbsp;<a href="kit.html"><img src="kit-gears.gif" border=0 align="absmiddle" height="53" width="60"></a>&nbsp;Gen Help</h2></td></tr>
</table>

<br>I&#x2019;ve long advocated either hand-crafting every word, or using the <a href="sounds.htm">Sound Change Applier</a> to derive families.  But inspiration does flag, and sometimes you want to use a vocabulary generator.

<p>The usual problem with these is that they make all the possibilities equiprobable, which is highly unnaturalistic.  So I&#x2019;ve created a generator called <a href="gen.html">gen</a>, which applies a cheap power law, so the first choice is chosen most often, and so on down, smoothly, to the last choice, which gets chosen the least. 

<h3>Example</h3>

<a href="gen.html">Go try it!</a>  With the default settings, you&#x2019;ll get a pseudo-text, like this:

<blockquote>
A gatri tu te ee kope. Eudrotri pli ki itupe ki ii. Obudrotia peke pi tea pi pi? Atoi pi ka iekribe eupi ape? Kle dru iplo ki gipotu i. Pi ke brikaibe ble do brou. Ta glikipro e teakatre piu u. Be kipe pa pa pi tepipliita. Tikiii topu epatripu i o po? Pe uuta dru opi gii ki. Ti pepate bi bi a e? I gia kitidu eproplu ple kitle. Kii pitre ko e iipoga a. E o popate ku kritra pi. Tu pe titepee dro kee ekiplu. Ti ti ki te gra a. Ia tle biitapo oi pri epoi? Ti opikli be po betle e. Igribliia tipi tloka ple ko plubla. Ge pita tidleki to pri ti. Ategoki e a plu topipi kiipe. Priklu kro ai tepeplea pu e. A tapa kite pubo ti du. Bipro begitebi kaaete gi tipo ko. E kipretopua pika glotro di bu. Pepe tebo iikepoplo i tru gi. Gike da e ipia tripi ia. Bi bikli pate dlite e dligu? I pididi kra pabaka e o. Ipoidipi a ti i ba geka! 
</blockquote>

Run it again for an entirely different text.  This output format is designed to simulate what your language might look like.


<h3>The controls</h3>

Try it with different settings.  Here&#x2019;s what they do.

<p><b>Output type</b> tells whether you want pseudo-text, or a table of a hundred words.  Pseudo-text is better for seeing what your language looks like, given the phonology and syllable types you&#x2019;ve defined.  Once you&#x2019;re happy with the look and feel of the language, the word list is better for actually generating vocabulary.

<p>The format <b>All possible syllables</b> will output a list of, well, all possible syllables. Note that this option ignores the Dropoff and Monosyllables controls: it is not random at all, and it shows only single syllables.

<p><b>Show syllables</b> will display a dot between syllables in the output. To <b>gen</b>, a syllable is whatever you put in "Syllable types"!

<p><b>Dropoff</b> determines how fast the power law declines.  If you have <tt>C=ptkbdg</tt>, then when outputting a C, normally <b>p</b> will come up the most, <b>t</b> a little less often, and so on, with <b>g</b> the least frequent.  If you select fast dropoff, the probabilities will stack even more in favor of <b>p</b> (i.e. the first choice).  If you select slow, the probabilities will distribute more evenly.  

<p>To turn off the power law entirely select <b>Equiprobable</b>; then <b>gen</b> will select the choices with equal frequency.  (Again, this is a bad choice for a naturalistic language.  But maybe you&#x2019;re doing an auxlang or something.)

<p>The Dropoff control doesn't affect the selection of syllable types.  However, you can choose a more even distribution by checking <b>Slow syllable dropoff</b>.

<p><b>Monosyllables</b> tells <b>gen</b> how much of the output should be monosyllabic.  You could set this to <b>Always</b> for an isolating language, for instance.  (Even isolating languages have compounds, so if you want to generate words or text, use <b>Mostly</b>.)

<p><b>Generate</b> generates a new text.

<p><b>Clear</b> erases the output.  (This isn&#x2019;t necessary but it&#x2019;s provided for neatness&#x2019; sake.)

<p><b>Help me!</b> brings up this help file.

<p><b>IPA</b> gives you a display of IPA symbols which you can cut and paste into any field.

<p><b>Defaults</b> cycles through some default parameters to help you get started or inspired.

<h3>The categories</h3>

These are your phonological classes, defined by enumeration.  The format is exactly the same as used by the <a href="sounds.htm">SCA</a>.  

<p>For instance, I might define my fricatives like this:

<blockquote>
<tt>F=fvsz&#x0161;&#x017e;</tt>
</blockquote>

That means that any time <b>gen</b> wants to output an F from the syllables list, it will randomly pick one of <b>f, v, s, z, &#x0161;, &#x017e;.</b>  

<p>As you can see, you can use <b>Unicode</b>!  The phonemes in a category have to be single characters, but we&#x2019;ll see how to output digraphs below.

<p>The key thing to grasp is that <b>the order determines the probability</b>.  The program runs through the phonemes in a category, with a 30% chance of stopping at each one.*  So the <tt>F</tt> definition above says that we want <b>f</b> to occur a lot and <b>&#x017e;</b> not that much.  

<p>* <font size="-1">That is, 30% for the recommended <b>Medium</b> dropoff.  It&#x2019;s 45% for <b>Fast</b> and 15% for <b>Slow</b>.  Also, for computation speed, if it gets to the end of the choices it starts over. </font> 
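<p>The selection just described can be sketched in a few lines of JavaScript. This is a minimal illustration, not <b>gen</b>&#x2019;s actual source: the function name <tt>pickWithDropoff</tt> and its parameters are made up for the example, and only the stop chances (30%, 45%, 15%) and the wrap-around come from the text above.

```javascript
// Hypothetical sketch of the order-based selection described above.
// stopChance: 0.30 for Medium, 0.45 for Fast, 0.15 for Slow (figures from the text).
// random is injectable so the behaviour can be checked with fixed numbers.
function pickWithDropoff(choices, stopChance, random = Math.random) {
  if (choices.length === 0) throw new Error("no choices to pick from");
  if (!(stopChance > 0 && stopChance <= 1)) {
    throw new Error("stopChance must be in (0, 1]");
  }
  let i = 0;
  // Walk down the list; at each item, stop with probability stopChance.
  while (random() >= stopChance) {
    i = (i + 1) % choices.length; // past the last choice, start over
  }
  return choices[i];
}

// Tally 10,000 picks from C=ptkbdg at Medium dropoff to see the skew.
const counts = {};
for (let n = 0; n < 10000; n++) {
  const c = pickWithDropoff([..."ptkbdg"], 0.30);
  counts[c] = (counts[c] || 0) + 1;
}
console.log(counts);
```

<p>Under this model, with six choices and a 30% stop chance, <b>p</b> should turn up in roughly a third of the picks and <b>g</b> in roughly one pick in seventeen. The exact counts vary from run to run.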

<p>The main corollary: <b>Put the sounds you like first!</b>  Don&#x2019;t list them in place of articulation order unless you really like labials.  Try varying the order and hitting <b>Generate</b> to see how changing the order changes the output.

<p>Don&#x2019;t overdo the classes&#8212; <b>gen</b> doesn&#x2019;t know any phonology, and will be perfectly happy with a single class C for all consonants.  You define a class for two reasons:

<ul><li>To control probabilities.  E.g. we usually want stops to occur more than fricatives.

<li>To enforce phonotactics.  E.g. if the only initial clusters you allow are stop + liquid, then you need classes for stops and liquids.
</ul>

<h3>The syllable types</h3>

The <b>Syllable types</b> field defines your phonotactics... your allowed syllable types.  E.g. the sample above is defined with these syllables:

<blockquote>
<tt>CV
<br>V
<br>CRV</tt>
</blockquote>


The syllable types also follow a power law, so <b>put the ones you like first</b>.  Or to be precise, if you want a particular type to be more common, move it up in the list.  

<p>Put just one syllable type per line.  (Otherwise <b>gen</b> will just treat whatever you put on one line as a single syllable type.) 

<p>In general, more complex types should occur further on.  However, I find that pure vowel syllables (like V in the example) should be less frequent than ones that begin with a consonant.

<p>The process <b>does not handle parentheses.</b>  So if you have a syllable type like <tt>(C(R))V(V)(N)</tt>, you must list the possibilities&#8212; in this case, <tt>V, VV, VN, VVN, CV, CVV, CVN, CVVN, CRV, CRVV, CRVN, CRVVN</tt>.  <i>This is a good thing!</i> ...because it allows you to set the relative probabilities of each syllable type.  (How do you decide on the order?  Trial and error works fine.  Change the order and hit Generate again.  Repeat till it looks good.)
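<p>If listing the possibilities by hand gets tedious, a small helper can do the expansion for you. The sketch below is not part of <b>gen</b>; the function name <tt>expandOptional</tt> is hypothetical, and it assumes one character per class symbol with parentheses marking optional parts.

```javascript
// Hypothetical helper: expand a pattern such as "(C(R))V(V)(N)" into every
// plain syllable type it stands for. Parentheses mark optional parts.
function expandOptional(pattern) {
  let i = 0; // current position in the pattern

  // Parse a run of symbols and groups, up to a ")" or the end of the pattern.
  function parseSequence() {
    let results = [""];
    while (i < pattern.length && pattern[i] !== ")") {
      let options;
      if (pattern[i] === "(") {
        i++; // skip "("
        const inner = parseSequence();
        if (pattern[i] !== ")") throw new Error("unbalanced parentheses");
        i++; // skip ")"
        options = ["", ...inner]; // the group is either absent or present
      } else {
        options = [pattern[i]];
        i++;
      }
      // Combine every result so far with every option for this position.
      const next = [];
      for (const r of results) {
        for (const o of options) next.push(r + o);
      }
      results = next;
    }
    return results;
  }

  const all = parseSequence();
  if (i !== pattern.length) throw new Error("unbalanced parentheses");
  return [...new Set(all)]; // drop duplicates, keep first-seen order
}

console.log(expandOptional("(C(R))V(V)(N)").join("\n"));
```

<p>The order it prints is only a starting point. You still need to rearrange the lines so that the types you want most often come first.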

<p>The symbols you use here (in the last example <tt>C V R N</tt>) should be <b>defined in the categories box</b>&#8212; they are your phonological classes.  

<p>So when <b>gen</b> needs to generate a syllable, it selects randomly from the syllable types.  Let&#x2019;s say it picks <tt>CRV</tt>.  Now it looks up <tt>C</tt> in the Categories box.  Suppose it finds the definition <tt>C=ptkbdg</tt>.  It randomly picks one of those choices.  Then it moves on to <tt>R</tt>, then <tt>V</tt>.  And so on. 

<p>If there are any <b>undefined symbols</b>, they will be passed through to the output.  E.g. you could add a syllable <tt>khV</tt> and <b>gen</b> will cheerfully generate <tt>khe, khi</tt>, etc. 
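<p>Here is a rough JavaScript sketch of that lookup process, including the pass-through of undefined symbols. It is an illustration only, not <b>gen</b>&#x2019;s code: the names <tt>parseCategories</tt>, <tt>fillSyllable</tt> and <tt>firstChoice</tt> are invented, and the <tt>R</tt> and <tt>V</tt> definitions are assumed for the example. The picking function is passed in; the real tool picks with its power law, while the demo always takes the first choice so the result is predictable.

```javascript
// Hypothetical sketch: turn category lines like "C=ptkbdg" into a lookup table.
function parseCategories(text) {
  const categories = new Map();
  for (const line of text.split("\n")) {
    const eq = line.indexOf("=");
    if (eq < 1) continue; // skip blank or malformed lines
    const name = line.slice(0, eq).trim();
    // Array.from splits by code point, so single Unicode phonemes stay whole.
    categories.set(name, Array.from(line.slice(eq + 1).trim()));
  }
  return categories;
}

// Replace each class symbol in a syllable type with a picked phoneme.
// Symbols with no category definition are copied to the output unchanged.
function fillSyllable(type, categories, pick) {
  let out = "";
  for (const symbol of type) {
    out += categories.has(symbol) ? pick(categories.get(symbol)) : symbol;
  }
  return out;
}

// Deterministic stand-in for the random pick, used only for this demo.
const firstChoice = (choices) => choices[0];

// R and V below are assumed definitions, chosen for illustration.
const categories = parseCategories("C=ptkbdg\nR=rl\nV=aeiou");
console.log(fillSyllable("CRV", categories, firstChoice)); // "pra"
console.log(fillSyllable("khV", categories, firstChoice)); // "kha"
```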


<h3>Rewrite rules</h3>

These allow you to apply global substitutions to the output.

<p>The simplest form is to replace a single character:

<blockquote>
<tt>&#x03b8;|th</tt>
</blockquote>

That tells <b>gen</b> to replace every occurrence of <tt>&#x03b8;</tt> in the output with <tt>th</tt>.

<p>Or you can handle combinations.  E.g. maybe <tt>ti</tt> always changes to <tt>&#x010d;i</tt>.  You'd write that as <tt>ti|&#x010d;i</tt>.

<p>The facility is actually even more powerful than that, because the left-hand side is a <b>regular expression</b>.  So for instance you could change both <tt>br</tt> and <tt>bl</tt> to <tt>bj</tt> with the formula <tt>b[rl]|bj</tt>.

<p>Rules are applied <b>in order</b>.  Make sure they don&#x2019;t feed into each other when they shouldn&#x2019;t!  (See the <a href="#japan">Japanese example</a> for more on this.)

<p>For fancier changes (such as those that are sensitive to the following phonemes), use the <a href="sounds.htm">SCA</a>. 

<p>You don&#x2019;t have to have any rewrite rules at all, of course.  (The other inputs have to have something in them.)
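<p>To make the rule format concrete, here is a small JavaScript sketch of how such rules could be read and applied. It is not <b>gen</b>&#x2019;s actual code: the function names are hypothetical, and it assumes each rule is split at its first <tt>|</tt>, which means the pattern itself cannot contain a <tt>|</tt> in this sketch.

```javascript
// Hypothetical sketch: read "pattern|replacement" lines into rule objects.
// Assumption: the line is split at the FIRST "|".
function parseRules(text) {
  const rules = [];
  for (const line of text.split("\n")) {
    const bar = line.indexOf("|");
    if (bar < 1) continue; // skip blank lines and lines without a pattern
    rules.push({
      pattern: new RegExp(line.slice(0, bar), "g"), // "g" = replace every match
      replacement: line.slice(bar + 1),             // may be empty (deletion)
    });
  }
  return rules;
}

// Apply the rules one after another, in the order they were written.
function applyRules(text, rules) {
  let out = text;
  for (const { pattern, replacement } of rules) {
    // A function replacement inserts the text literally, so a "$" in the
    // replacement is not treated as a special sequence.
    out = out.replace(pattern, () => replacement);
  }
  return out;
}

const rules = parseRules("\u03b8|th\nb[rl]|bj");
console.log(applyRules("\u03b8abla bri", rules)); // "thabja bji"
```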


<h3>Saving your work</h3>

I&#x2019;ve implemented <b>gen</b> in Javascript to make it immediately available to anyone with a browser.   If I used C, as with the SCA, it&#x2019;d have to be provided separately for Windows and Mac and wouldn&#x2019;t work on mobile devices anyway.  Plus, it turns out that non-programmers don&#x2019;t know how to use the command line window!  

<p>Unfortunately I can&#x2019;t directly read and write files, because Javascript is restricted from doing so.   (For very good reasons!  If web pages could write files, they could mess up your computer.)

<p><b>But you can!</b>  Just keep your categories and syllable types in a text file and paste them into <b>gen</b>.   And you can easily cut the output and put it wherever you want.  


<h3>Don&#x2019;t be cheap!</h3>

To avoid the pitfalls of cheap vocabulary generation:

<ul>
<li>Follow the usual rule of recording new words in the lexicon, so you don&#x2019;t re-use words.<p>

<li>Don&#x2019;t just copy the output and use every word in your lexicon.  Pick the words you like; you can hit <b>Generate</b> to get a new set.<p>

<li>Multisyllabic words are output mostly to simulate what  text would look like.  Avoid very long words as roots.<p>

<li>Always use derivational morphology or compounding when you can, rather than just grabbing words from <b>gen</b>!  E.g. for <i>religion, divinity, theology, sacrilege, priesthood,</i> don&#x2019;t just create each of these as roots; create etymologies.  <p>

<li>If you&#x2019;re getting ugly words&#8212; well, you probably have ugly phonotactics!  Move the sounds you like up within your classes, and put simpler CV syllable types earlier in the file.<p>

</ul>

<h3>Sample: Pseudo-Japanese</h3>

Want some pseudo-Japanese?  Sure you do!  Paste these inputs into the three input boxes:

<blockquote>
<table>
<tr><td>
<tt>C=tknsmrh
<br>V=aioeu
<br>U=auo&#x0101;&#x0113;&#x016b;
<br>L=&#x0101;&#x012b;&#x014d;&#x0113;&#x016b;</tt>
</td>
<td>&nbsp;&nbsp;</td>
<td><tt>hu|fu
<br>h&#x016b;|f&#x016b;
<br>si|shi
<br>s&#x012b;|sh&#x012b;
<br>sy|sh
<br>ti|chi
<br>t&#x012b;|ch&#x012b;
<br>ty|ch
<br>tu|tsu
<br>t&#x016b;|ts&#x016b;
<br>qk|kk
<br>qp|pp
<br>qt|tt
<br>q[^ptk]|
</tt></td>
<td>&nbsp;&nbsp;</td>
<td><tt>CV
<br>CVn
<br>CL
<br>CLn
<br>CyU
<br>CyUn
<br>Vn
<br>Ln
<br>CVq
<br>CLq
<br>yU
<br>yUn
<br>wa
<br>L
<br>V</tt></td>
</tr>
</table>
</blockquote>

As you can see, the rewrite rules were essential in simulating the allophonic rules of Japanese.  Some complications there:

<ul>
<li>I was getting weird output like <tt>cfu</tt> till I realized that the rule <tt>ty|ch</tt> was feeding into <tt>hu|fu</tt>.  This was solved by moving the <tt>hu</tt> and <tt>h&#x016b;</tt> rules up so they are applied first.<p>

<li>The <tt>q</tt> phoneme is a slightly kludgy way of getting the long consonants, as in <i>futte</i>.  Note that the rewrite rules set the correct long consonants for <tt>p t k</tt>; then the rule <tt>q[^ptk]|</tt> simply removes any other <tt>q</tt>&#x2019;s.
The <tt>^</tt> means &#x201c;match anything <i>except</i> these letters&#x201d;, and the absence of anything after the <tt>|</tt> means that anything matching the regular expression will be deleted.
<p>

<li>I didn&#x2019;t include the voiced consonants... maybe you can try adding them!
</ul>
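<p>The feeding problem is easy to reproduce outside <b>gen</b>. The JavaScript sketch below uses a hypothetical <tt>rewrite</tt> function, with rules given as pattern and replacement pairs, to show why the order matters and how the <tt>q</tt> rules behave with plain JavaScript regular expressions.

```javascript
// Hypothetical sketch: apply [pattern, replacement] pairs in order.
function rewrite(text, rules) {
  let out = text;
  for (const [pattern, replacement] of rules) {
    // Each pattern is a regular expression applied to the whole text.
    out = out.replace(new RegExp(pattern, "g"), () => replacement);
  }
  return out;
}

// ty|ch runs first and creates a new "hu", which hu|fu then rewrites.
const feeding = [["ty", "ch"], ["hu", "fu"]];
// With hu|fu first, the "hu" produced later by ty|ch is left alone.
const reordered = [["hu", "fu"], ["ty", "ch"]];

console.log(rewrite("tyu", feeding));   // "cfu"
console.log(rewrite("tyu", reordered)); // "chu"
console.log(rewrite("hu", reordered));  // "fu"

// The q rules: double a following p, t or k, then remove any leftover q.
const longConsonants = [["qk", "kk"], ["qp", "pp"], ["qt", "tt"], ["q[^ptk]", ""]];
console.log(rewrite("kaqte", longConsonants)); // "katte"
console.log(rewrite("kaqna", longConsonants)); // "kaa"
```

<p>Note the last line: in this sketch the pattern <tt>q[^ptk]</tt> matches the <tt>q</tt> together with the character after it, so both are deleted. If you only want to drop the <tt>q</tt>, a final rule of just <tt>q|</tt> after the three doubling rules would do that here.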

<hr>

<center><A HREF="default.html"><img src="homeg.gif" border=0 alt="Home"></A></center>


</body>
</html>
