<HTML>
<HEAD>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<TITLE>ggg - Generative Grammar Gadget Help</TITLE>
<style>
h2
{color:#A06800;}
h3
{color:#C08700;}
h4
{color:#C08700;}
h5
{color:#C08700;}
h6
{color:#C08700;}
tt
{color:#A60000;
font-weight:bold;
font-family:"Gentium";}
</style>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<table width="100%">
<tr><td bgcolor="#EEC25A">
<h2><br> <a href="kit.html"><img src="kit-gears.gif" border=0 align="absmiddle" height="53" width="60"></a> ggg help</h2></td></tr>
</table>
<b>ggg</b> is a JavaScript program that lets you construct a context-sensitive grammar.
<p>For a full explanation of what that means, see my upcoming <i>Syntax Construction Kit</i>, or your favorite book on transformational grammar or computer languages.
<p>I've included some sample rules to get you started. Try them out!</p>
<p>(SS among the samples stands for <i>Syntactic Structures</i>; these are the rules from Chomsky’s 1957 book.)
<p>A general <tt>warning</tt>: Writing rules is a form of programming, and it's very very easy to write rules that don't work as you expect them to. The debugging settings can be useful in finding out why that is. Read this document carefully to see some of the pitfalls.
<h2>Overall operation</h2>
The program always starts with the string S. It looks through the production rules to see what S can be expanded to. It keeps applying rules until no more rules can be applied.
<p>Basic rules look like this:
<blockquote><tt>S=A B</tt></blockquote>
S is called a <b>nonterminal</b> symbol, because it appears on the left hand side of a rule. This one means that S can be replaced with A B.
<blockquote><tt>A=A a|a</tt></blockquote>
Adding this rule means that A can be replaced by <tt>a</tt>, or by <tt>A a</tt>.
<p>It's a convention to use capitals for nonterminals and lowercase for <b>terminals</b>: symbols that can't be replaced. However, <b>ggg</b> doesn't enforce this. If you are following the convention, <tt>a</tt> is a terminal and that part of the sentence is done. But the <tt>A</tt> will be replaced by another application of the same rule. (So, rules can be <b>recursive</b>.)
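<p>The loop described above can be sketched in a few lines of JavaScript. This is only an illustration, not <b>ggg</b>'s actual code: the names <tt>rules</tt>, <tt>derive</tt> and <tt>randomPick</tt> are made up here, symbols are assumed to be space-separated, and the sketch always expands the leftmost symbol that has a rule, which may not be how <b>ggg</b> itself chooses.

```javascript
// Hypothetical rule table for S=A B and A=A a|a (not ggg's internal format).
const rules = new Map([
  ["S", [["A", "B"]]],
  ["A", [["A", "a"], ["a"]]],
]);

// Expand the leftmost symbol that has a rule until nothing can be expanded.
// `pick(n)` returns an index below n; `maxSteps` guards against runaway recursion.
function derive(start, pick, maxSteps = 100) {
  let symbols = [start];
  const trace = [symbols.join(" ")];
  for (let n = 0; n < maxSteps; n++) {
    const i = symbols.findIndex((s) => rules.has(s));
    if (i === -1) break; // only symbols without rules are left
    const options = rules.get(symbols[i]);
    const chosen = options[pick(options.length)];
    symbols = [...symbols.slice(0, i), ...chosen, ...symbols.slice(i + 1)];
    trace.push(symbols.join(" "));
  }
  return trace;
}

const randomPick = (n) => Math.floor(Math.random() * n);
console.log(derive("S", randomPick).join(" -> "));
```

<p>Since no rule rewrites <tt>B</tt> in this tiny grammar, it simply stays in the output, like a terminal.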
<h3>Production rules</h3>
Let's look at the syntax again:
<blockquote><tt>A=A a|a</tt></blockquote>
The <tt>=</tt> separates the category, on the left, from its replacement, on the right. You can use <tt>→</tt> instead.
The <tt>|</tt> separates multiple possibilities— <b>ggg</b> will choose randomly from these. So this is short for saying
<blockquote><tt>A=A a<br/>A=a</tt></blockquote>
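<p>As a rough sketch of that expansion, here is a small JavaScript helper that splits one rule line into one rule per alternative. The function name <tt>splitAlternatives</tt> is invented for this example, and the sketch ignores environments and every other feature described later.

```javascript
// Hypothetical helper: turn "A=A a|a" into ["A=A a", "A=a"].
// Accepts either = or → between the two sides of the rule.
function splitAlternatives(line) {
  const m = line.match(/^(.*?)(?:=|→)(.*)$/);
  if (!m) throw new Error("not a rule: " + line);
  const lhs = m[1].trim();
  return m[2].split("|").map((rhs) => `${lhs}=${rhs.trim()}`);
}

console.log(splitAlternatives("A=A a|a"));
```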
<h3>Space separated or not</h3>
When making toy grammars with letters, it’s most useful to write your rules without spaces. But once you're dealing with words, you want to separate the parts of a rule with spaces.
<p>Select <b>Single letters</b> for the first case, <b>Words</b> for the second.
<h3>Optional elements</h3>
If you check <b>Allow optional symbols with ()</b>, you can put optional elements in parentheses.
<p>E.g.
<blockquote><tt>A=(A)a(c)</tt></blockquote>
is short for
<blockquote><tt>A=a<br/>A=Aa<br/>A=ac<br/>A=Aac<br/></tt></blockquote>
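<p>The expansion can be pictured as a small recursive function. This is a sketch under assumptions: <tt>expandOptionals</tt> is a made-up name, parentheses are not nested, and the rule is written in <b>Single letters</b> style (with spaces, dropping a group would leave a stray gap to clean up).

```javascript
// Hypothetical helper: list every variant of a right-hand side
// in which each (optional) group is either dropped or kept.
function expandOptionals(rhs) {
  const m = rhs.match(/\(([^()]*)\)/); // first parenthesised group, no nesting
  if (!m) return [rhs];
  const before = rhs.slice(0, m.index);
  const after = rhs.slice(m.index + m[0].length);
  return [
    ...expandOptionals(before + after),        // group left out
    ...expandOptionals(before + m[1] + after), // group kept
  ];
}

console.log(expandOptionals("(A)a(c)"));
```

<p>The four variants come out in a different order from the list above, but they are the same four right-hand sides.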
<h3>Null symbol</h3>
It can be useful to have a null symbol which isn’t actually output. Use <tt>ø</tt> for this. E.g. in the SS rules I have
<blockquote><tt>Tense=past <br/>
Tense=VPL/NPS _ <br/>
Tense=ø/NPP _</tt></blockquote>
This means that <tt>Tense</tt> can be replaced with <tt>past | VPL | ø</tt>. (The latter two options are conditional; see below.)
<p>This is one way you <b>delete</b> things. The <tt>ø</tt> won’t be output to the final string.
<p>But till it’s deleted, it can also be an input for other rules, which can be surprisingly useful.
<h3>Debugging help</h3>
Check <b>Show parsed rules</b> to have the program show what it thinks the rules are. As it expands <tt>|</tt> and transformations (see below) into fuller representations, this can help you understand why a rule is working or not.
<p>Check <b>Show debugging output</b> to show a complete <b>derivation</b>. The program will indicate at each step what rules it thinks it can apply— the one actually selected will be boldfaced. Then it will show the output at that step. And so on till it can't apply any more.
<h2>Environments</h2>
So far the rules have been <b>context-free</b>. You can create <b>context-sensitive</b> rules by using <b>environments</b>. A simple example:
<blockquote><tt>B=p/A _</tt></blockquote>
The syntax may be familiar to you from the <a href="sca2.html">SCA²</a>. The meaning is similar: “B can turn into p if it occurs just after A.” The <tt>_</tt> in the environment is required and represents the replaced element (here, B). So this rule could apply to <tt>A B</tt> or <tt>a B</tt> or <tt>A A B C</tt> but not <tt>B A</tt>.
<h3><a name="Exist">Existence check</a></h3>
Sometimes it’s useful to check only if a particular element exists. You can do this with an environment with no <tt>_</tt>. You can only check one element, which can be a terminal or nonterminal. For instance:
<blockquote><tt>B=p/A</tt></blockquote>
This says to replace <tt>B</tt> with <tt>p</tt> only if the element <tt>A</tt> occurs anywhere in the string. So this rule might be applied to <tt>A B</tt> but not to <tt>B B</tt>.
<p>Note, this is an alternative, often simpler way to handle things like agreement.
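<p>To make the difference from a positional environment concrete, here is a minimal JavaScript sketch of an existence check. The function <tt>applyExistenceRule</tt> is hypothetical, assumes space-separated symbols, and rewrites only the first matching symbol.

```javascript
// Hypothetical helper for rules like "B=p/A" (environment without _):
// rewrite the first B only if A occurs somewhere in the string.
function applyExistenceRule(rule, input) {
  const [body, required] = rule.split("/").map((s) => s.trim());
  const [from, to] = body.split("=").map((s) => s.trim());
  const symbols = input.trim().split(/\s+/);
  const i = symbols.indexOf(from);
  if (i === -1 || !symbols.includes(required)) return input;
  symbols[i] = to;
  return symbols.join(" ");
}

console.log(applyExistenceRule("B=p/A", "A B")); // A p
console.log(applyExistenceRule("B=p/A", "B B")); // B B (no A anywhere)
```

<p>Unlike <tt>B=p/A _</tt>, this version also rewrites <tt>B A</tt>, since the position of <tt>A</tt> doesn't matter.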
<h2>Transformations</h2>
The above rule could also be stated
<blockquote><tt>A B=A p</tt></blockquote>
This does what it looks like: replace <tt>A B</tt> with <tt>A p</tt>. Transformations allow powerful rearrangements of the string and are generally required for handling natural languages.
Internally, <b>ggg</b> rewrites rules with environments as rules with transformations.
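<p>That rewriting step might look something like the following. This is a guess at the idea, not <b>ggg</b>'s real code: <tt>envToTransformation</tt> is an invented name, and only environments containing <tt>_</tt> are handled.

```javascript
// Hypothetical helper: restate "X=Y/ENV" as a transformation by
// substituting X and then Y for the _ in the environment.
function envToTransformation(rule) {
  const [body, env] = rule.split("/");
  if (env === undefined || !env.includes("_")) {
    throw new Error("expected an environment containing _");
  }
  const [lhs, rhs] = body.split("=").map((s) => s.trim());
  // A function replacer avoids special handling of $ in the inserted text.
  const fill = (x) => env.replace("_", () => x).trim().split(/\s+/).join(" ");
  return `${fill(lhs)}=${fill(rhs)}`;
}

console.log(envToTransformation("B=p/A _"));         // A B=A p
console.log(envToTransformation("Tense=VPL/NPS _")); // NPS Tense=NPS VPL
```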
<h3>Wildcards</h3>
A transformation can include wildcards, indicated with <tt>*</tt>:
<blockquote><tt>a * a=q</tt></blockquote>
This means “replace a sequence <tt>a...a</tt>, where ... is anything at all, with <tt>q</tt>.”
You can also put <tt>*</tt> in the replacement string. In this case, whatever was found in the <tt>*</tt> location will be copied to the output string in the appropriate place.
<blockquote><tt>* VP=* VP *</tt></blockquote>
The above curious rule is used in the French sample rules. As you can see, the element before the verb is copied after it: e.g. <tt>3p VP</tt> becomes <tt>3p VP 3p</tt>. This is a cheap way of handling verb agreement: the first copy will eventually apply to the preceding NP; the second one will apply to the verb.
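<p>Here is a simplified JavaScript sketch of wildcard copying. Note the assumptions: <tt>applyWildcardRule</tt> is a made-up name, symbols are space-separated, and each <tt>*</tt> stands for exactly one symbol, which is narrower than the "anything at all" that <b>ggg</b> allows. If the replacement has more <tt>*</tt> than the pattern, the last captured symbol is reused.

```javascript
// Hypothetical helper: apply a transformation with * wildcards once,
// at the leftmost position where the pattern matches.
function applyWildcardRule(rule, input) {
  const [lhs, rhs] = rule.split("=").map((s) => s.trim().split(/\s+/));
  const symbols = input.trim().split(/\s+/);
  for (let start = 0; start + lhs.length <= symbols.length; start++) {
    const captured = [];
    const matches = lhs.every((p, k) => {
      if (p === "*") {
        captured.push(symbols[start + k]); // simplification: one symbol per *
        return true;
      }
      return p === symbols[start + k];
    });
    if (!matches) continue;
    let next = 0;
    const out = rhs.map((r) =>
      r === "*" && captured.length > 0
        ? captured[Math.min(next++, captured.length - 1)]
        : r
    );
    return [
      ...symbols.slice(0, start),
      ...out,
      ...symbols.slice(start + lhs.length),
    ].join(" ");
  }
  return input; // pattern not found
}

console.log(applyWildcardRule("* VP=* VP *", "NP 3p VP")); // NP 3p VP 3p
console.log(applyWildcardRule("a * a=q", "a b a"));        // q
```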
<h2>Rule order</h2>
For more complicated grammars, rule ordering becomes important.
<p>The basic rule is: separate groups of rules with <tt>+</tt>. <b>ggg</b> will keep applying rules in one group until it can't any more, then move on to the next group, and so on.
<p>For an example, see the <b>SS Verb Complex</b> sample rules.
<p>More complicated rulesets are defined with the following symbols, placed before the ruleset:
<blockquote><table>
<tr><td><tt>?</tt></td> <td>All rules in this set are optional </td></tr>
<tr><td><tt>1</tt></td> <td>Apply one of the rules in this set </td></tr>
<tr><td><tt>1?</tt></td> <td>This set is optional, and just one can apply </td></tr>
</table></blockquote>
Note that rules will keep applying if they can, which can make the program do things you don't expect. E.g., from the French rules:
<blockquote><tt>Fin VP=Fin VP Fin</tt></blockquote>
This must be marked <tt>1</tt>. Otherwise a sentence will produce <tt>NP Fin VP Fin</tt>, then <tt>NP Fin VP Fin Fin</tt>, then <tt>NP Fin VP Fin Fin Fin</tt>, and so on, because the rule can apply to its own output.
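<p>The runaway effect is easy to reproduce in a toy model. The sketch below is not <b>ggg</b>'s implementation: <tt>rewriteOnce</tt> and <tt>runRule</tt> are hypothetical names, matching is a literal left-to-right search over space-separated symbols, and a small cap stands in for "forever" so that the example terminates.

```javascript
// Rewrite the leftmost occurrence of `from` inside `symbols`, or return null.
function rewriteOnce(symbols, from, to) {
  for (let i = 0; i + from.length <= symbols.length; i++) {
    if (from.every((s, k) => s === symbols[i + k])) {
      return [...symbols.slice(0, i), ...to, ...symbols.slice(i + from.length)];
    }
  }
  return null;
}

// mode "1": apply the rule at most once.
// any other mode: keep applying until it no longer matches (or the cap is hit).
function runRule(input, rule, mode, cap = 4) {
  const [from, to] = rule.split("=").map((s) => s.trim().split(/\s+/));
  let symbols = input.trim().split(/\s+/);
  const limit = mode === "1" ? 1 : cap;
  for (let n = 0; n < limit; n++) {
    const next = rewriteOnce(symbols, from, to);
    if (next === null) break;
    symbols = next;
  }
  return symbols.join(" ");
}

console.log(runRule("NP Fin VP", "Fin VP=Fin VP Fin", "1")); // NP Fin VP Fin
console.log(runRule("NP Fin VP", "Fin VP=Fin VP Fin", ""));  // NP Fin VP Fin Fin Fin Fin
```

<p>The rule's left side is still present in its own output, so without the <tt>1</tt> marker only the cap stops it.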
<h2>Morphology</h2>
The basic idea from <i>Syntactic Structures</i> is to use transformations for morphology. E.g.
<blockquote><tt>ing read=reading</tt></blockquote>
This gets tedious if you have multiple forms to generate. So there are special rules marked with <tt>µ</tt> (for <i>‘morphology’</i>; just copy-and-paste the letter). E.g. in the SS rules we have, in part:
<blockquote><tt>
µ µ VPL past en ing ø <br/>
µ read reads read read reading <br/>
µ eat eats ate eaten eating
</tt></blockquote>
The first line is the key to the rest of the entries, and defines what affixes are supported. So the above lines tell <b>ggg</b> that the <tt>ing</tt> form of <tt>read</tt> is <tt>reading</tt>, the <tt>past</tt> form of <tt>eat</tt> is <tt>ate</tt>, and so on.
<p>You must write your rules such that the affix comes first.
<p>If a rule ends early, the word will be unchanged. This can be used for defective paradigms (e.g. <tt>can</tt> simply has no participles). I also use it to add an additional form for English <tt>be</tt>, to store the non-3s form of the present tense, which for every other verb defaults to the verb root.
<p>There's no way to have default forms, I’m afraid, nor multiple categories (e.g. verbs vs nouns).
<p>Note that morphological rules apply <b>at the end</b>, and therefore can't be corrected by further rules.
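<p>One way to picture the <tt>µ</tt> table is as a lookup keyed by root and affix column. The JavaScript below is a sketch under assumptions, not the program's real code: <tt>parseMorphology</tt> and <tt>applyMorphology</tt> are invented names, the table is given one row per line, and an affix followed by a known root is replaced by the matching cell, or by the bare root when the row ends early.

```javascript
const table = `
µ µ VPL past en ing ø
µ read reads read read reading
µ eat eats ate eaten eating
`;

// Hypothetical parser: the first row names the affix columns,
// every later row is a root followed by its forms.
function parseMorphology(text) {
  const rows = text.trim().split("\n")
    .map((line) => line.trim().split(/\s+/).slice(1)); // drop the leading µ marker
  const [key, ...entries] = rows;
  const affixes = key.slice(1); // skip the placeholder cell above the roots
  const forms = new Map();
  for (const [root, ...cells] of entries) forms.set(root, cells);
  return { affixes, forms };
}

// Replace each "affix root" pair with the inflected form.
function applyMorphology(input, { affixes, forms }) {
  const symbols = input.trim().split(/\s+/);
  const out = [];
  for (let i = 0; i < symbols.length; i++) {
    const col = affixes.indexOf(symbols[i]);
    const root = symbols[i + 1];
    if (col !== -1 && forms.has(root)) {
      out.push(forms.get(root)[col] ?? root); // short row: word stays unchanged
      i++; // the root has been consumed too
    } else {
      out.push(symbols[i]);
    }
  }
  return out.join(" ");
}

const morphology = parseMorphology(table);
console.log(applyMorphology("ing read", morphology)); // reading
console.log(applyMorphology("past eat", morphology)); // ate
```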
<h2>Some advice</h2>
Like the <a href="sca2.html">SCA²</a>, <b>ggg</b> is a simple but powerful tool, which can do much more than you might think. But you do have to think a little like a syntactician and a little like a programmer. Read the book once it comes out; in the meantime, study the examples closely to see some of the possible tricks.
<p>Start with <tt>S</tt> and give the most general rules first— e.g. those that define the basic shape of a sentence.
<p><b>ggg</b> operates only on strings, not trees. That is, it does not really know the derivation so far. Let's say you have a SOV language and want to have SVO as a transformation. And say you turn noun phrases into N + Det. You write:
<blockquote><tt>
S=NP NP V <br/>
NP=N Det <br/>
+ <br/>
NP V=V NP
</tt></blockquote>
This will <i>only</i> generate <tt>N Det N Det V</tt>. That’s because the last rule never finds any NPs to apply to— they were eliminated by the second rule.
Put general transformations early, while the things they apply to are still in the derivation. This will work:
<blockquote><tt>
S=NP NP V <br/>
NP V=V NP/NP _ <br/>
NP=N Det
</tt></blockquote>
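<p>The two rule sets can be compared in a toy simulation. Treat this as a sketch only: <tt>rewriteOnce</tt> and <tt>run</tt> are hypothetical names, the environment rule is written out as the equivalent transformation <tt>NP NP V=NP V NP</tt>, and within a group the sketch always takes the first listed rule that matches, which is an assumption about ordering rather than a description of how <b>ggg</b> picks rules.

```javascript
// Rewrite the leftmost occurrence of `from` inside `symbols`, or return null.
function rewriteOnce(symbols, from, to) {
  for (let i = 0; i + from.length <= symbols.length; i++) {
    if (from.every((s, k) => s === symbols[i + k])) {
      return [...symbols.slice(0, i), ...to, ...symbols.slice(i + from.length)];
    }
  }
  return null;
}

// Run each group to exhaustion before moving on to the next one.
// Assumption: inside a group, the first listed rule that matches wins.
function run(groups, start = "S", cap = 50) {
  let symbols = [start];
  for (const group of groups) {
    const rules = group.map((r) =>
      r.split("=").map((side) => side.trim().split(/\s+/))
    );
    for (let n = 0; n < cap; n++) {
      let next = null;
      for (const [from, to] of rules) {
        next = rewriteOnce(symbols, from, to);
        if (next !== null) break;
      }
      if (next === null) break; // no rule in this group applies any more
      symbols = next;
    }
  }
  return symbols.join(" ");
}

// First version: the NPs are gone before the transformation gets a turn.
const tooLate = [["S=NP NP V", "NP=N Det"], ["NP V=V NP"]];
// Second version: the transformation runs while the NPs still exist.
const early = [["S=NP NP V", "NP NP V=NP V NP", "NP=N Det"]];

console.log(run(tooLate)); // N Det N Det V
console.log(run(early));   // N Det V N Det
```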
<b>ggg</b> is intended to model syntax, not morphology. So you wouldn’t want to model Sanskrit with it. But its morphology operations should be sufficient for languages like English or French. Put the morphology bits near the end. If you use <tt>µ</tt> rules, make sure you generate affixes <i>before</i> the word, and list all the possible affixes in the first <tt>µ</tt> rule.
<p>If you need agreement, use some of the tricks above: copy an affix, or use the <a href="#Exist">existence check</a>.
<p>Don’t overdo the number of words or tenses you handle; that won’t improve your understanding of the syntax, it just causes you extra work.
<p>Finally... you may just want a tool that knows more about language. But, hey, I’ve got some
tools that fit the bill! Try out the <a href="gtg.html">Generative Tree Gadget</a>
or the <a href="mg.html">Minimalism gadget</a>.
<hr>
<center><A HREF="default.html"><img src="homeg.gif" border=0 alt="Home"></A></center>
</body>
</html>