<HTML>
<HEAD>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<TITLE>gtg - Generative Tree Gadget Help</TITLE>
<style>
h2
{color:#A06800;}
h3
{color:#C08700;}
h4
{color:#C08700;}
h5
{color:#C08700;}
h6
{color:#C08700;}
tt
{color:#A60000;
font-weight:bold;
font-family:"Gentium";}
</style>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<table width="100%">
<tr><td bgcolor="#EEC25A">
<h2><br> <a href="kit.html"><img src="liltree.gif" border=0 align="absmiddle" height="53" width="60"></a> gtg help</h2></td></tr>
</table>
<p><b>gtg</b> is a JavaScript program that lets you construct a context-sensitive grammar which knows about tree structure. Compare <a href="ggg.html"><b>ggg</b></a>, which only knows about strings.
<blockquote><span style="color:red;"><b>Warning:</b> This program is <b>in progress</b> and apt to change at any time.
</span></blockquote>
<p>For a full explanation of what that means, see my upcoming <i>Syntax Construction Kit</i>, or your favorite book on transformational grammar or computer languages.
<p>Test it out! Press <b>Generate</b> to generate a sentence. It will be shown with its deep structure and surface structure, plus some cryptic notes on what transformations applied.
<p>To see how to modify or extend the rules, or create them for a language of your own, read on.
<p><i>—Mark Rosenfelder, August 2018</i>
<h2>Overview</h2>
The <b>lexicon</b> is defined in the Lexicon text area. An entry looks like this:
<blockquote><tt>
woman:N:f:p women
</tt></blockquote>
This means that <tt>woman</tt> is a Noun (N), requiring feminine (f) pronoun agreement, and it has a plural (p) <tt>women</tt>.
<p>Next there are <b>Phrase structure</b> rules. These look like:
<blockquote><tt>
S=Comp TP <br/>
TP=NP T VP
</tt></blockquote>
We always start with <tt>S</tt> (for Sentence). The first rule means that <tt>S</tt> expands to <tt>Comp TP</tt>. But <b>gtg</b> remembers the tree structure; it represents this step as
<blockquote><tt>
<span style="color:blue;">[S Comp TP ]</span>
</tt></blockquote>
The initial bracket is labeled with the constituent type. The second rule expands <tt>TP</tt> to <tt>NP T VP</tt>, which produces
<blockquote><tt>
<span style="color:blue;">[S Comp <span style="color:green;">[TP NP T VP ]</span> ]</span>
</tt></blockquote>
Yes, these get hard to read, so I've added color-coding. I invite you to think of these as tree structures.
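<p>Here is a minimal JavaScript sketch of that bookkeeping. It is not gtg's actual code: the flat token array and the helper names <tt>expandFirst</tt> and <tt>show</tt> are assumptions made for illustration. A token <tt>[X</tt> opens a constituent labeled X, and <tt>]</tt> closes it.</p>

```javascript
// Hypothetical sketch (not gtg's code): a derivation is a flat array of
// tokens, where "[X" opens a constituent labeled X and "]" closes it.
function expandFirst(tokens, lhs, rhs) {
  // Find the first unexpanded node of category lhs. This is an exact match,
  // so "TP" does not match the already-expanded "[TP".
  const i = tokens.indexOf(lhs);
  if (i === -1) return tokens; // nothing to expand
  // Replace the node with a labeled bracket around the right-hand side.
  return [...tokens.slice(0, i), "[" + lhs, ...rhs, "]", ...tokens.slice(i + 1)];
}

function show(tokens) {
  return tokens.join(" ");
}

let tree = ["S"];
tree = expandFirst(tree, "S", ["Comp", "TP"]);
console.log(show(tree)); // [S Comp TP ]
tree = expandFirst(tree, "TP", ["NP", "T", "VP"]);
console.log(show(tree)); // [S Comp [TP NP T VP ] ]
```

<p>Keeping the brackets as ordinary tokens means later steps can search for a whole constituent simply by looking for its opening token, such as <tt>[NP</tt>.</p>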
<p>The phrase structure rules define a <b>deep structure</b>, which should not be too complex. The aim is to define rules that end up in <b>nonterminal</b> nodes like <tt>N V Pron A</tt>— nodes that aren't expanded further.
<p>Instead, these nodes are <b>lexicalized</b>. E.g. an <tt>N</tt> will be randomly replaced by one of the nouns from the Lexicon, such as <tt>woman</tt>.
<p>The sentence can then be modified by the <b>Transformations</b>, which are rules which modify part of the tree structure. These handle both traditional transformations and morphological rules.
<p>These are written in a very simple programming language. For instance:
<blockquote><tt>
* passive inflection <br/>
<b>if</b> Pass <b>find next</b> V <b>inflect</b> en
</tt></blockquote>
I've highlighted the commands. So this means:
<ol>
<li>Look for a category <tt>Pass</tt>. If it's not found, this rule doesn't apply.
<li>Find the next <tt>V</tt> node after the <tt>Pass</tt> node.
<li>Inflect that <tt>V</tt> with the affix <tt>en</tt>.
</ol>
<h2>More about the Lexicon </h2>
The general format is
<blockquote><tt>
word:category:features:inflection
</tt></blockquote>
<h3>Word</h3>
The <b>word</b> is the basic form of the word; this may be modified when the word is inflected.
<h3>Category</h3>
<p>Here are the <b>categories</b> I've used:
<blockquote><table>
<tr><td><tt>Comp</tt></td> <td>complementizer</td></tr>
<tr><td><tt>D</tt></td> <td>determiner</td></tr>
<tr><td><tt>N</tt></td> <td>noun</td></tr>
<tr><td><tt>T</tt></td> <td>tense</td></tr>
<tr><td><tt>V</tt></td> <td>verb</td></tr>
<tr><td><tt>P</tt></td> <td>preposition, postposition</td></tr>
<tr><td><tt>A</tt></td> <td>adjective</td></tr>
<tr><td><tt>Pron</tt></td> <td>pronoun</td></tr>
</table></blockquote>
<p>You can define your own categories, so long as you consistently use them in the phrase structure rules and the transformations. To handle the English verbal system, I also defined <tt>Modal Perf Prog Pass</tt>.
<p>You should stick with <tt>S N V Pron NP</tt> because the program knows about these.
<h3>Feature</h3>
The <b>feature</b> slot is used in two different ways.
For <tt>V</tt> <b>(verbs)</b>, it defines the <b>argument structure</b> or <b>valence</b>: what can follow the verb in the <tt>VP</tt>. Examples:
<ul>
<li>(nothing) — intransitive verbs like <i>care</i>
<li>NP — transitive verbs like <i>love</i>
<li>NP NP — ditransitive verbs like <i>give</i>
<li>NP PP — transitive verbs that require a goal, like <i>put</i>
<li>PP — intransitive verbs that require a goal, like <i>sit</i>
</ul>
These features are straight out of the generative grammar tradition (Chomsky defined features like [+NP NP]), but they are also checked directly by the lexicalization process: a verb is only chosen if its argument structure matches the arguments in the generated sentence.
<p>You can't define an argument as optional, but you can include multiple entries for the same verb if you like. E.g. <i>break</i> could have entries for transitive and intransitive.
<p>For <b>nouns, pronouns, and determiners</b>, the lexicon gives the word’s gender, e.g. <tt>m</tt> or <tt>f</tt>. This is used to force agreement.
<p>Also, when lexicalizing nouns and pronouns, we randomly choose a number. I leave singular unmarked, and add <tt>p</tt> for plural. This actually modifies the lexicon entry within the sentence. E.g., if we lexicalize a noun as
<blockquote><tt>
woman:N:f:p women
</tt></blockquote>
and make it plural, the word will go into the sentence as
<blockquote><tt>
women:N:f p:p women
</tt></blockquote>
That is, the features are really a space-separated list, and can be added to in the course of a derivation.
<p>(Note that the ‘word’ portion of the entry was also modified.)
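<p>The rewrite just described can be sketched in JavaScript as follows. This is an illustration, not gtg's code: the function name <tt>makePlural</tt> is hypothetical, and keeping the bare word when no <tt>p</tt> form is listed is an assumption.</p>

```javascript
// Hypothetical sketch (not gtg's code): choosing plural number rewrites a
// lexicon entry word:category:features:inflection as it enters the sentence.
function makePlural(entry) {
  const [word, category, features = "", inflection = ""] = entry.split(":");
  // The inflection field is a list of affix/form pairs.
  const parts = inflection.split(" ");
  const forms = {};
  for (let i = 0; parts.length > i + 1; i += 2) forms[parts[i]] = parts[i + 1];
  // Use the "p" form as the new word (assumption: keep the word if none is listed).
  const newWord = forms.p ?? word;
  // Features are space-separated, so number is simply added to the list.
  const newFeatures = features === "" ? "p" : features + " p";
  return [newWord, category, newFeatures, inflection].join(":");
}

console.log(makePlural("woman:N:f:p women")); // women:N:f p:p women
```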
<h3>Inflection</h3>
The <b>inflection</b> field is conceptually a list of pairs: an affix, then the form it produces. Let's look at one of the <b>verbs</b>:
<blockquote><tt>
give:V:NP NP:pres gives past gave pastp gave ing giving en given
</tt></blockquote>
This is just a collapsed form of
<blockquote><table>
<tr><td><tt>pres</tt></td> <td>gives</td></tr>
<tr><td><tt>past</tt></td> <td>gave</td></tr>
<tr><td><tt>pastp</tt></td> <td>gave</td></tr>
<tr><td><tt>ing</tt></td> <td>giving</td></tr>
<tr><td><tt>en</tt></td> <td>given</td></tr>
</table></blockquote>
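<p>As an illustration (not gtg's actual code), here is a JavaScript sketch that splits an entry into its four fields and collapses the inflection field into an affix-to-form table like the one above. The name <tt>parseEntry</tt> and the shape of the returned object are assumptions.</p>

```javascript
// Hypothetical sketch (not gtg's code): parse word:category:features:inflection.
function parseEntry(line) {
  const [word, category, features = "", inflection = ""] = line.split(":");
  // Features are a space-separated list, possibly empty.
  const feats = features.split(" ").filter(f => f !== "");
  // The inflection field alternates affix, form, affix, form, ...
  const parts = inflection.split(" ").filter(p => p !== "");
  const forms = {};
  for (let i = 0; parts.length > i + 1; i += 2) {
    forms[parts[i]] = parts[i + 1];
  }
  return { word, category, features: feats, forms };
}

const give = parseEntry("give:V:NP NP:pres gives past gave pastp gave ing giving en given");
console.log(give.category, give.features.join(" ")); // V NP NP
console.log(give.forms.ing, give.forms.en);          // giving given
```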
This is obviously designed for not-very-inflected languages like English. It will become somewhat tedious for (say) French, and probably impossible for Sanskrit. Hey, it's a syntax app, not a morphology app.
<p>The <b>Inflect</b> command is given a word and an affix. It’s very simple-minded: it looks in this table, finds the affix, and replaces the word with the form found.
<p>If it doesn’t find the affix, it does nothing. You can use this to avoid listing a form when a particular affix should just give the base form. E.g. the affix <tt>presp</tt> (present plural) isn't in the table, so the inflected form is just <tt>give</tt>. (Note that <tt>be</tt> does have a defined <tt>presp</tt> form.)
<p>In my lexicon, <tt>pres</tt> and <tt>past</tt> are themselves lexicon entries, assigned to category <tt>T</tt> ‘tense’. But when actually inflecting verbs, we <b>concatenate the tense and number</b>. The number comes from the subject of the sentence, and as noted, it's blank or <tt>p</tt>. So to inflect the verb, we use tense + number as the affix, forming one of <tt>pres past presp pastp</tt>.
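<p>Both steps can be sketched in a few lines of JavaScript. This assumes entries are kept as raw lexicon strings, which may not match gtg's internal representation; <tt>inflect</tt> and <tt>inflectVerb</tt> are hypothetical names.</p>

```javascript
// Hypothetical sketch (not gtg's code). Entries are raw lexicon strings,
// word:category:features:inflection.

// inflect: find the affix among the affix/form pairs and swap in that form;
// if the affix is not listed, leave the entry unchanged.
function inflect(entry, affix) {
  const fields = entry.split(":");
  const parts = (fields[3] ?? "").split(" ");
  for (let i = 0; parts.length > i + 1; i += 2) {
    if (parts[i] === affix) {
      fields[0] = parts[i + 1];
      return fields.join(":");
    }
  }
  return entry;
}

// For verbs, the affix is tense + subject number ("" or "p").
function inflectVerb(entry, tense, number) {
  return inflect(entry, tense + number);
}

const give = "give:V:NP NP:pres gives past gave pastp gave ing giving en given";
const wordOf = entry => entry.split(":")[0];
console.log(wordOf(inflectVerb(give, "pres", "")));  // gives
console.log(wordOf(inflectVerb(give, "pres", "p"))); // give
console.log(wordOf(inflectVerb(give, "past", "p"))); // gave
```

<p>Since <tt>presp</tt> is not listed for <i>give</i>, the plural present falls back to the base form, which is the "does nothing" behavior described above.</p>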
<p>After all that, <b>nouns</b>, pronouns, and determiners will be relatively simple: they just give the plural form.
<p>Well, <b>pronouns</b> have to give case forms as well. E.g.:
<blockquote><tt>
she:Pron:f:acc her p they pacc them
</tt></blockquote>
Again, the affixes are actually <b>number + case</b>, with nominative being blank. So this lexicon entry
allows the pronoun <tt>she</tt> to be properly modified for both number and case.
<h2>Phrase structure rules</h2>
These are the simplest part of the grammar! I've already explained the basics, but there are also some nice features to simplify your rules.
<blockquote><tt>
VP=Modal VP=?
</tt></blockquote>
The <tt>=?</tt> means that the whole <b>rule is optional</b>. It will be chosen randomly about 1/3 of the time.
<blockquote><tt>
NP=D ?A N
</tt></blockquote>
The <tt>?</tt> before a category means that the <b>word is optional</b>. About 1/3 of the time we'll include it, otherwise not.
<blockquote><tt>
NP=D ?A N|Pron
</tt></blockquote>
The <tt>|</tt> gives alternative expansions. This rule means that <tt>NP</tt> can be expanded either to <tt>D ?A N</tt> or to <tt>Pron</tt>. The first alternative is favored, so put your commonest expansion there.
<blockquote><tt>
VP=V (NP)
</tt></blockquote>
An element in parentheses <tt>( )</tt> will be included zero to two times. That is, this rule is short for <tt>VP=V|V NP|V NP NP</tt>.
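<p>The notation above can be sketched as a small JavaScript expander. This is not gtg's code, and two details are assumptions: the page says only that the first alternative is favored, so the sketch gives it half the probability; and it treats zero, one, or two copies of a parenthesized element as equally likely. The 1/3 figure for <tt>?</tt> comes from the text. The caller would decide whether an optional (<tt>=?</tt>) rule runs at all.</p>

```javascript
// Hypothetical sketch of the phrase structure notation (not gtg's code).
// rand() returns a number in [0, 1); pass a fixed one for repeatable runs.

// "VP=Modal VP=?" -> { lhs: "VP", rhs: "Modal VP", optional: true }
function parseRule(line) {
  const optional = line.endsWith("=?");
  const body = optional ? line.slice(0, -2) : line;
  const eq = body.indexOf("=");
  return { lhs: body.slice(0, eq), rhs: body.slice(eq + 1), optional };
}

// Assumed weighting: the first alternative gets half the probability,
// the others share the remaining half equally.
function pickAlternative(alts, rand) {
  if (alts.length === 1) return alts[0];
  const r = rand();
  if (r < 0.5) return alts[0];
  const rest = alts.slice(1);
  return rest[Math.floor(((r - 0.5) / 0.5) * rest.length)];
}

function expandRhs(rhs, rand = Math.random) {
  const out = [];
  for (const item of pickAlternative(rhs.split("|"), rand).trim().split(/\s+/)) {
    if (item.startsWith("?")) {
      // ?X: include X about 1/3 of the time
      if (rand() < 1 / 3) out.push(item.slice(1));
    } else if (item.startsWith("(") && item.endsWith(")")) {
      // (X): include X zero to two times (assumed equally likely)
      const n = Math.floor(rand() * 3);
      for (let k = 0; k < n; k++) out.push(item.slice(1, -1));
    } else {
      out.push(item);
    }
  }
  return out;
}

const vpRule = parseRule("VP=Modal VP=?");
console.log(vpRule.lhs, vpRule.rhs, vpRule.optional);     // VP Modal VP true
console.log(expandRhs("D ?A N|Pron", () => 0).join(" "));   // D A N
console.log(expandRhs("D ?A N|Pron", () => 0.9).join(" ")); // Pron
console.log(expandRhs("V (NP)", () => 0.99).join(" "));     // V NP NP
```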
<p>Rules are <b>executed in order</b>, without going back. So you should not define rules that create things you've already handled. E.g. this won’t work:
<blockquote><tt>
VP=V NP PP <br/>
PP=P NP <br/>
NP=D N|NP PP
</tt></blockquote>
That’s because if we generate <tt>NP PP</tt>, the final <tt>PP</tt> won’t ever be expanded.
<p>(I may revisit this later, but for now it keeps things simple.)
<p>Important exception: phrase structure rules can <b>embed</b> a subclause. E.g. if you have the rule
<blockquote><tt>
VP=V S
</tt></blockquote>
then <tt>S</tt> will be replaced by an entire sentence. Transformations can test whether they are in a main or an embedded clause using the commands <tt>main</tt> and <tt>sub</tt>.
<p>Something of a kludge: use <tt>Sinf</tt> to embed a non-finite subclause, e.g. <i>(I want) him to go</i>.
<h2>Transformations</h2>
This is the area which departs most sharply from both my other syntax toys and the generative grammar tradition.
<p>I didn't intend to! At first I was defining rules like this:
<blockquote><tt>
T:+:Neg≠Aux:^do
</tt></blockquote>
But I soon realized that the rules were a) unreadable, b) extremely ad hoc, and c) not generalizable to other languages.
<p>I also found that it's <i>really</i> hard to keep grammatical knowledge out of the app itself. E.g. the above rule was designed to add <tt>do</tt> to the <tt>T</tt> node if there was a negative in the sentence <i>or</i> no auxiliary. But this amounted to writing the English verbal rule in the code rather than writing a rule for it. So I kept breaking down the tasks into pieces that were actually generalizable.
<p>So I ended up with a <b>miniature programming language</b>. Here are the pieces.
<p><b>Comments</b> begin with <tt>*</tt>. Note that these are written to the output, so you can see what happened.
<p>Most rules refer to either the <b>clipboard</b>, or the <b>last item found</b>. You can think of the rules as mini-programs that have two variables or accumulators they can work with.
The commands are the following:
<h3>maybe</h3>
Marks an optional rule. 2/3 of the time, the rule will be skipped.
<p>Why a fixed probability? Because you want to actually see your optional rules apply, so you want something predictable. You do not actually need rules that apply, say, 23.56% of the time. (However, you can get 1/9 by including <tt>maybe</tt> twice!)
<p>If you are testing rules, leave out <tt>maybe</tt> at first so you can make sure the rule works.
<h3>checkif</h3>
Check for a sequence of categories, like <tt>D N</tt>. You can search for entire subtrees by looking for e.g. <tt>[NP</tt>.
<p>This is the only multi-line rule. The idea is to use <tt>yes</tt> on the rule(s) for where the structure is present, <tt>no</tt> for the rule(s) where it is not present.
<h3>yes</h3>
If the previous <tt>checkif</tt> command found its target, execute the rest of the rule; otherwise skip it.
<h3>no</h3>
If the previous <tt>checkif</tt> command couldn’t find its target, execute the rest of the rule; otherwise skip it.
<h3>start</h3>
This can be used for a loop <b>within a rule</b>. It marks the beginning of the loop.
<h3>loop</h3>
Return to the <tt>start</tt> location. To prevent infinite loops, the loop can only execute 10 times.
<p>These are intended to work with <tt>find next</tt> so you can do something to each element found.
<h3>if</h3>
See if an element exists. If it doesn't, skip the rest of the rule. E.g. <tt>if Perf</tt> tests if there is an element of category <tt>Perf</tt>.
<p>To search for a word, use <tt>if word</tt>. E.g. <tt>if word who</tt> tests if the word <tt>who</tt> exists in the sentence.
<p>The command finds only the <b>first</b> such item in the sentence. It will set the last item found indicator.
<p>You can also write e.g. <tt>if no Perf</tt> or <tt>if no word who</tt>. These will execute the rest of the rule only if the sought item <i>doesn't</i> exist.
<p>You can concatenate multiple possibilities with <tt>|</tt>. E.g. <tt>if Perf|Prog|V</tt> will find any of the categories <tt>Perf Prog V</tt>.
<h3>find</h3>
Search for a particular element, and make it the last item found. You can use <tt>find next</tt> to find the next element from wherever we are, or <tt>find previous</tt> to go backwards.
<p>Again, you can concatenate multiple possibilities with <tt>|</tt>.
<p>If no such item is found, we skip the rest of the rule.
<p>A typical example of using both of these commands:
<blockquote><tt>
if Pass find next V inflect en
</tt></blockquote>
Though the two commands do similar things, the expected semantics is to use <tt>find</tt> where you actually do expect to find something. The above rule can be read “If there’s a <tt>Pass</tt> element, inflect the next verb with <tt>en</tt>.”
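<p>To show how a rule like this runs step by step, here is a hypothetical mini-interpreter in JavaScript. It is not gtg's implementation and covers only <tt>if</tt>, <tt>find</tt>, <tt>find next</tt>, <tt>find previous</tt>, and <tt>inflect</tt>; the node shape <tt>{ cat, word, forms }</tt> is an assumption. As in the text, a failed search skips the rest of the rule.</p>

```javascript
// Hypothetical mini-interpreter for a few commands (not gtg's code):
//   if X, find X, find next X, find previous X, inflect AFFIX
// X may list alternatives with "|". A sentence is an array of
// { cat, word, forms } nodes; `last` is the last item found.
function runRule(rule, sentence) {
  const toks = rule.trim().split(/\s+/);
  const matches = (i, spec) => spec.split("|").includes(sentence[i].cat);
  let last = -1;
  let t = 0;
  while (t < toks.length) {
    const cmd = toks[t++];
    if (cmd === "if" || cmd === "find") {
      // In this sketch "if" and "find" search the same way.
      let step = 1;
      let i = 0;
      if (toks[t] === "next") { t++; i = last + 1; }
      else if (toks[t] === "previous") { t++; i = last - 1; step = -1; }
      const spec = toks[t++];
      while (i >= 0 && i < sentence.length && !matches(i, spec)) i += step;
      // Not found: skip the rest of the rule.
      if (i < 0 || i >= sentence.length) return false;
      last = i;
    } else if (cmd === "inflect") {
      const affix = toks[t++];
      if (last === -1) return false; // nothing has been found yet
      const node = sentence[last];
      // Unlisted affix: leave the word as it is.
      node.word = node.forms?.[affix] ?? node.word;
    } else {
      throw new Error("command not covered by this sketch: " + cmd);
    }
  }
  return true;
}

const sentence = [
  { cat: "D", word: "the" },
  { cat: "N", word: "book" },
  { cat: "T", word: "past" },
  { cat: "Pass", word: "be" },
  { cat: "V", word: "give", forms: { en: "given", ing: "giving" } },
];
console.log(runRule("if Pass find next V inflect en", sentence)); // true
console.log(sentence.map(n => n.word).join(" ")); // the book past be given
```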
<h3>object</h3>
Find the verb’s object, if any. It becomes the last item found.
<p>(Presently this is implemented as finding the last <tt>[NP ]</tt> that follows a <tt>VP</tt>.)
<h3>subject</h3>
Find the verb’s subject, if any. It becomes the last item found.
<p>(Presently this is implemented as finding the first <tt>[NP ]</tt>.)
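<p>A JavaScript sketch of these two heuristics, applied to a single clause written as a flat token array. It is an illustration only: the token format and the names <tt>findSubject</tt> and <tt>findObject</tt> are assumptions, and how gtg handles embedded clauses is not covered here.</p>

```javascript
// Hypothetical sketch (not gtg's code) of the subject/object heuristics,
// on a flat token array where "[X" opens a constituent and "]" closes it.
function subtreeEnd(tokens, start) {
  let depth = 0;
  for (let i = start; i < tokens.length; i++) {
    if (tokens[i].startsWith("[")) depth++;
    else if (tokens[i] === "]") depth--;
    if (depth === 0) return i;
  }
  return -1; // unbalanced brackets
}

// subject: the first [NP ] in the clause
function findSubject(tokens) {
  const i = tokens.indexOf("[NP");
  return i === -1 ? null : tokens.slice(i, subtreeEnd(tokens, i) + 1);
}

// object: the last [NP ] that follows the start of the VP
function findObject(tokens) {
  const vp = tokens.indexOf("[VP");
  const i = tokens.lastIndexOf("[NP");
  if (vp === -1 || i < vp) return null;
  return tokens.slice(i, subtreeEnd(tokens, i) + 1);
}

const clause = "[S Comp [TP [NP the woman ] past [VP love [NP the dog ] ] ] ]".split(" ");
console.log(findSubject(clause).join(" ")); // [NP the woman ]
console.log(findObject(clause).join(" "));  // [NP the dog ]
```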
<h3>ifword</h3>
Check if the word associated with the last item found is the given string. E.g. <tt>ifword who</tt> tests if it's <tt>who</tt>.
<h3>inflect</h3>
Replace the last item found with an inflected form, using the given affix. E.g. <tt>inflect ing</tt> selects the <tt>ing</tt> form.
<p>You can write <tt>inflect clip</tt> to use the contents of the clipboard as the affix. This works well with <tt>cuttense numclip</tt>.
<h3>transform</h3>
The full sequence is <tt>transform X into Y end</tt>. You can leave out <tt>end</tt> if it ends the rule. Both X and Y can be sequences of categories, including brackets.
<p>An example:
<blockquote><tt>
transform [NP [NP into 1 [PP »to 0 ]
</tt></blockquote>
This rule looks for two NPs. (Nothing can come between them.) You must include the brackets, because that's what’s in the data structure.
<p>This is part of a dative movement rule; e.g. it is used to transform <i>The man gave the woman the book</i> into <i>The man gave the book to the woman</i>.
<p>Note that we can introduce a whole subtree, but we need to use the same format used by the program. The <tt>»</tt> is (sorry!) a special character which tells us to grab the item from the lexicon, in this case the preposition <tt>to</tt>. You can also write (say) <tt>»to:Y</tt> if you want the category for the inserted word to be <tt>Y</tt> instead of whatever’s in the lexicon.
<p>The numbers represent the items found, in order. In the sample sentence, <tt>0</tt> is <i>the woman</i>, <tt>1</tt> is <i>the book</i>. This is actually how transformations were written in the 1960s; but I use this notation because it allows multiple items of the same category in a straightforward way.
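<p>The matching and renumbering can be sketched in JavaScript as below. This is only an illustration under stated assumptions, not gtg's implementation: leaves are written <tt>word:Cat</tt>, the tiny <tt>LEXICON</tt> object stands in for the real lexicon, and the leftmost match wins. A pattern item like <tt>[NP</tt> captures a whole subtree, a bare category captures one leaf, and the matched items must be adjacent.</p>

```javascript
// Hypothetical sketch (not gtg's code) of "transform X into Y".
// Leaves are "word:Cat" strings; "[X" opens a constituent and "]" closes it.
const LEXICON = { to: "to:P", by: "by:P" }; // stand-in for the real lexicon

const catOf = tok => tok.split(":")[1];

// Index of the "]" that closes the constituent opened at start.
function subtreeEnd(tokens, start) {
  let depth = 0;
  for (let i = start; i < tokens.length; i++) {
    if (tokens[i].startsWith("[")) depth++;
    else if (tokens[i] === "]") depth--;
    if (depth === 0) return i;
  }
  return -1;
}

// Match the pattern items one after another, with nothing in between.
function matchAt(tokens, p, pattern) {
  const captures = [];
  for (const item of pattern) {
    if (p >= tokens.length) return null;
    if (item.startsWith("[")) {
      // "[NP" matches a whole NP subtree
      if (tokens[p] !== item) return null;
      const end = subtreeEnd(tokens, p);
      captures.push(tokens.slice(p, end + 1));
      p = end + 1;
    } else {
      // a bare category matches a single leaf
      if (catOf(tokens[p]) !== item) return null;
      captures.push([tokens[p]]);
      p += 1;
    }
  }
  return { captures, end: p };
}

function transform(tokens, from, into) {
  const pattern = from.split(" ");
  for (let start = 0; start < tokens.length; start++) {
    const m = matchAt(tokens, start, pattern);
    if (!m) continue;
    const out = [];
    for (const t of into.split(" ")) {
      if (/^\d+$/.test(t)) {
        out.push(...m.captures[Number(t)]); // 0, 1, ... = items found, in order
      } else if (t.startsWith("»")) {
        // »word grabs the word from the lexicon; »word:Y overrides its category
        const [name, cat] = t.slice(1).split(":");
        out.push(cat ? name + ":" + cat : (LEXICON[name] ?? name));
      } else {
        out.push(t); // literal token such as "[PP" or "]"
      }
    }
    return [...tokens.slice(0, start), ...out, ...tokens.slice(m.end)];
  }
  return tokens; // no match: the rule does not apply
}

const vp = "[VP gave:V [NP the:D woman:N ] [NP the:D book:N ] ]".split(" ");
console.log(transform(vp, "[NP [NP", "1 [PP »to 0 ]").join(" "));
// [VP gave:V [NP the:D book:N ] [PP to:P [NP the:D woman:N ] ] ]
```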
<h3>copy</h3>
Take the last item found, and copy the word to the clipboard.
<p>You can write <tt>copy num</tt> to copy just its number (<tt>p</tt> or blank).
<h3>cut</h3>
Same as <tt>copy</tt>, but zero out the last item found— replace it with <tt>ø</tt>, which will remain in the structure but not be “pronounced”– it will not appear in the final string.
<h3>paste</h3>
Replace the last item found with the word from the clipboard.
<p>If the existing item was not blank, its bare word is appended to the lexicon entry. This is useful for the Tense node: e.g. <tt>past</tt> becomes <tt>past+have</tt>.
<p>If the clipboard contains just the number <tt>p</tt>, we inflect the word for plural instead.
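<p>Here is a small JavaScript sketch of <tt>copy</tt>, <tt>cut</tt>, and <tt>paste</tt>, modeling the clipboard and the last item found as two variables. It is an illustration, not gtg's code: the state object and node shape are assumptions, and pasting a bare number <tt>p</tt> is not modeled.</p>

```javascript
// Hypothetical sketch of copy, cut, and paste (not gtg's code).
// Nodes are { cat, word }; the state tracks the last item found and the clipboard.
function makeState(nodes) {
  return { nodes, last: -1, clip: "" };
}

function copy(s) {
  s.clip = s.nodes[s.last].word;
}

function cut(s) {
  copy(s);
  s.nodes[s.last].word = "ø"; // stays in the tree, but is not pronounced
}

function paste(s) {
  const node = s.nodes[s.last];
  // A blank item is simply replaced; otherwise the clipboard word is appended
  // with "+", as in past -> past+have. (Pasting a bare "p" is not modeled.)
  node.word = node.word === "" ? s.clip : node.word + "+" + s.clip;
}

// Roughly "find Aux cut find T paste" from question formation:
const st = makeState([
  { cat: "T", word: "past" },
  { cat: "Perf", word: "have" },
  { cat: "V", word: "go" },
]);
st.last = 1; cut(st);   // find Aux cut
st.last = 0; paste(st); // find T paste
console.log(st.nodes.map(n => n.word).join(" ")); // past+have ø go
```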
<h3>lex</h3>
Look up a word in the lexicon and add it to the clipboard. E.g. <tt>lex do</tt>. You'll want to use one of <tt>paste insert append prepend</tt> to put it into the sentence.
<h3>cuttense</h3>
Copy the tense from the last item found to the clipboard.
<p>If that item contains <tt>+</tt>, the tense is the first part of that concatenation— e.g. for <tt>past+have</tt> it is <tt>past</tt>.
<p>If there’s no <tt>+</tt>, we assume the tense is just the word itself.
<h3>numclip</h3>
Kludgy command that adds the number of the subject to the current contents of the clipboard.
<h3>insert</h3>
Insert the clipboard as a new word after the last item found.
<h3>append</h3>
Insert the clipboard as a new word at the end of the sentence.
<h3>prepend</h3>
Insert the clipboard as a new word at the beginning of the sentence.
<p>(Really, just after the initial <tt>[S</tt>.)
<h3>flip</h3>
Swap the contents of the last item found and the clipboard.
<p>(That is, swap it with the item the clipboard contents came from.)
<h3>main</h3>
Skip the rest of the rule if we’re not doing a main clause.
<h3>sub</h3>
Skip the rest of the rule if we’re not doing an embedded clause.
<h3>define</h3>
This defines a special word that expands to several words in all remaining rules. E.g.
<blockquote><tt>
define Aux=Modal|Perf|Prog|Pass
</tt></blockquote>
This allows commands like <tt>find Aux</tt> which will find any of the categories listed.
<p>You can define other abbreviations (e.g. you could define one for <tt>if word inf</tt>), but I can’t say I’ve tested that.
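<p>One way to picture <tt>define</tt> is as a token substitution applied to every later rule. The JavaScript below is a sketch under that assumption, not gtg's code; <tt>applyDefines</tt> is a hypothetical name. It also expands a defined word inside an alternative list such as <tt>V|Aux</tt>.</p>

```javascript
// Hypothetical sketch (not gtg's code): "define Name=expansion" makes Name
// expand to its alternatives in every later rule, treated here as a plain
// token substitution done before the rules run (an assumption).
function applyDefines(lines) {
  const defs = new Map();
  const out = [];
  for (const line of lines) {
    const m = line.match(/^define\s+(\S+?)=(\S+)$/);
    if (m) {
      defs.set(m[1], m[2]); // affects only the rules after this line
      continue;
    }
    // Expand whole tokens, including each part of an a|b alternative list.
    const expanded = line
      .split(" ")
      .map(tok => tok.split("|").map(part => defs.get(part) ?? part).join("|"))
      .join(" ");
    out.push(expanded);
  }
  return out;
}

const rules = applyDefines([
  "define Aux=Modal|Perf|Prog|Pass",
  "maybe if Aux lex not insert Neg",
]);
console.log(rules[0]); // maybe if Modal|Perf|Prog|Pass lex not insert Neg
```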
<h2>Annotated rules</h2>
In this section I explain the rules I used for English. This should give an idea how the transformations work.
<p>I'll warn you ahead of time: you have to think like a programmer to get these right. That means:
<ul>
<li>Break things into tiny little pieces. Programs don’t understand abstract commands like “make the verb agree with the subject.” You have to find the subject, find what controls the agreement, find the right verb, and inflect it.
<li>Handle all the exceptions. This makes a program, which this list of transformations really is, hard to read, because fiddly little rules about embedding are just as prominent as things that apply all the time, like verbal inflection.
<li>Test and fix. Almost none of these rules are as I originally wrote them. You have to test, diagnose the problems, then fix them.
</ul>
The advantage of writing programs like these, though, is that they keep you honest. It’s easy to write rules that look good but don’t work. Running <b>gtg</b> multiple times will generate a wide range of test cases and help find problems.
<blockquote><tt>
define Aux=Modal|Perf|Prog|Pass</tt></blockquote>
Define <tt>Aux</tt> as a shorthand for “any auxiliary”.
<blockquote><tt>
* suppress questions in subclauses <br/>
sub lex ø if word Q paste
</tt></blockquote>
“In subclauses, change a Q to ø.”
<p>This suppresses sentences like “Did she want do the boys love her?”
<blockquote><tt>
* plural form of determiners <br/>
start find next N copy num find previous D paste find next N loop
</tt></blockquote>
“Get each N, copy its number. Go back to its D if any and make it agree.”
<p>This uses a loop since it applies to each N in the sentence. The effect is to make sure we have “this dog, these dogs”.
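<p>Spelled out in JavaScript, the loop might look like the sketch below. It is a hypothetical rendering, not gtg's code: nodes carry assumed <tt>num</tt> and <tt>plural</tt> fields, a failed <tt>find</tt> ends the rule, and the loop is capped at 10 passes as described under <tt>loop</tt>.</p>

```javascript
// Hypothetical sketch (not gtg's code) of the rule
//   start find next N copy num find previous D paste find next N loop
// Nodes are { cat, word, num, plural }; num is "" (singular) or "p".
const MAX_LOOPS = 10; // the loop cap described for start/loop

function agreeDeterminers(nodes) {
  let last = -1;
  for (let pass = 0; pass < MAX_LOOPS; pass++) {
    // find next N: if there is none, the rest of the rule is skipped
    let n = last + 1;
    while (n < nodes.length && nodes[n].cat !== "N") n++;
    if (n === nodes.length) return;
    const clip = nodes[n].num; // copy num
    // find previous D: again, give up if there is none
    let d = n - 1;
    while (d >= 0 && nodes[d].cat !== "D") d--;
    if (d === -1) return;
    // paste: a clipboard holding just "p" makes the D plural
    if (clip === "p") nodes[d].word = nodes[d].plural ?? nodes[d].word;
    // find next N lands on the same noun again, then we loop
    last = n;
  }
}

const nodes = [
  { cat: "D", word: "this", plural: "these" },
  { cat: "N", word: "dogs", num: "p" },
  { cat: "V", word: "chase" },
  { cat: "D", word: "this", plural: "these" },
  { cat: "N", word: "cat", num: "" },
];
agreeDeterminers(nodes);
console.log(nodes.map(x => x.word).join(" ")); // these dogs chase this cat
```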
<blockquote><tt>
* negative <br/>
maybe if Aux lex not insert Neg <br/>
maybe if no Aux find T lex not insert Neg
</tt></blockquote>
These are optional, thus the <tt>maybe</tt>.
<p>Insert <i>not</i> after the auxiliary if any, otherwise after <tt>T</tt>.
<blockquote><tt>
checkif V [NP [NP <br/>
maybe yes transform V [NP [NP into »be:Pass [VP 0 1 [PP »by 2 ] ] end subject copy object flip <br/>
maybe no transform V [NP into »be:Pass [VP 0 [PP »by 1 ] ] end subject copy object flip
</tt></blockquote>
These are optional too. In addition, we have to handle ditransitives differently.
<p>We insert an entire VP, just as <tt>Perf</tt> and <tt>Prog</tt> are handled.
<p>The last NP is transformed into a PP.
<p>Then we flip subject and object.
<blockquote><tt>
* wh-question <br/>
main object ifword who cut prepend transform Comp into »Q <br/>
main subject ifword who transform Comp into »ø
</tt></blockquote>
These rules operate only in the main clause, and only if there’s an argument <tt>who</tt>.
<p>A questioned object must be preposed; then we force the complementizer to be <tt>Q</tt> so question inversion is triggered later on.
<p>A questioned subject is left where it is, but we suppress any <tt>Q</tt> that may have been generated.
<blockquote><tt>
* dative movement <br/>
maybe transform [NP [NP into 1 [PP »to 0 ]
</tt></blockquote>
This is just a rearrangement: sometimes make the indirect object a PP.
<blockquote><tt>
* question formation <br/>
if word Q transform [NP T into 1 0 <br/>
if word Q find Aux cut find T paste
</tt></blockquote>
<tt>Q</tt> was generated to mark questions. That triggers these rules. We invert the subject and the T node; in addition, if there’s an auxiliary we copy it to the T node.
<blockquote><tt>
* object forms of pronouns <br/>
find V start find next Pron inflect acc loop <br/>
if word inf find V find last Pron inflect acc
</tt></blockquote>
Another loop: change every Pron after the first V into its accusative form. Fortunately English has no dative forms to worry about.
<p>In an infinitive clause we do this to the subject too. That is, we want <i>I want him to go</i>, not <i>I want he to go</i>.
<blockquote><tt>
* passive inflection <br/>
if Pass find next V inflect en <br/>
* progressive <br/>
if Prog find next V|Pass inflect ing <br/>
* perfect <br/>
if Perf find next V|Pass|Prog inflect en
</tt></blockquote>
The auxiliaries all work the same way: if a particular auxiliary exists, find the next auxiliary or verb and inflect it in a particular way.
<blockquote><tt>
* Infinitive S: suppress modal, add to <br/>
if word inf find Modal cut <br/>
if word inf find T lex to insert
</tt></blockquote>
<tt>inf</tt> is placed in the <tt>Comp</tt> node to mark infinitival subclauses.
<p>Infinitival clauses can’t have modals: <i>*I want him to must go</i>. We do this, cheaply, by just blanking out the modal.
<p>Then, we insert <tt>to</tt> after the <tt>T</tt> node.
<blockquote><tt>
* Do support <br/>
if no Aux lex do if Neg find T paste <br/>
if no Aux lex do if no Neg if word Q find T paste
</tt></blockquote>
These rules add <tt>do</tt> when it’s needed. This is done by overwriting the <tt>T</tt> node. (Or rather adding to it: <tt>pres</tt> becomes <tt>pres+do</tt>. This will be corrected below.)
<p>We need Do-support when there’s no auxiliary, and when we have either a negative or a yes-no question.
<blockquote><tt>
* verbal inflection <br/>
if no word inf if T cuttense numclip find V|Aux inflect clip
</tt></blockquote>
This sets both tense and number on the verb. The tense always lives in the <tt>T</tt> node. At this point something like <tt>pres+do</tt> will be changed back to <tt>do</tt>, because we’re cutting the tense not copying it.
<p><tt>numclip</tt> is frankly kludgy— it gets the number from the subject. But it’s less kludgy than my first attempt, which was to do it without an explicit rule.
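<p>The data flow of this last rule can be traced with a JavaScript sketch. It is not gtg's code: the helper names mirror the commands but are hypothetical, and <tt>doForms</tt> is an assumed lexicon entry for <i>do</i>. What gtg leaves in a <tt>T</tt> node with no <tt>+</tt> is not described above, so the sketch simply leaves it unchanged.</p>

```javascript
// Hypothetical sketch of "cuttense numclip find V|Aux inflect clip"
// (not gtg's code; the helper names are illustrative).

// cuttense: the tense is the part before "+", if any. Cutting it
// leaves the rest of the T node, so "pres+do" goes back to "do".
function cuttense(tWord) {
  const plus = tWord.indexOf("+");
  // No "+": the tense is the word itself (T left unchanged in this sketch).
  if (plus === -1) return { clip: tWord, left: tWord };
  return { clip: tWord.slice(0, plus), left: tWord.slice(plus + 1) };
}

// numclip: append the subject's number ("" or "p") to the clipboard.
function numclip(clip, subjectNumber) {
  return clip + subjectNumber;
}

// inflect clip: look the affix up, keeping the base form if it is missing.
function inflectClip(forms, base, clip) {
  return forms[clip] ?? base;
}

const doForms = { pres: "does", past: "did", pastp: "did" }; // assumed entry for "do"
const cutT = cuttense("pres+do");
console.log(cutT.left, cutT.clip); // do pres
console.log(inflectClip(doForms, "do", numclip(cutT.clip, "")));  // does
console.log(inflectClip(doForms, "do", numclip(cutT.clip, "p"))); // do
```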
<hr>
<center><A HREF="default.html"><img src="homeg.gif" border=0 alt="Home"></A></center>
</body>
</html>