Using FSA |
On bulba type: % fsa -tkA GUI comes up. There are two lines you can type into, labeled "Regex" and "String" "Regex" accepts a regular expression that lets you define an FSA or FST by hand. "String" accepts a string to be submitted to the current FSA by clicking on "Submit". |
||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FSA Manual |
|||||||||||||||||||||||||||||||||||||
Example Regexp |
On the Regexp line type: {[a,b,c]}Your first FSA! Notice the color-coding on the states. What does it mean? Note that you MUST use the square brackets for concatentation (a Prolog "list"). Note that each element of the list is assumed to be an element of your "alphabet": {[aa,ab,cc]}Notice the machine has 'aa','ab' and 'cc' labeling transitions.. Probably not what you intended. No normal string will be recognized. Try 'aaabcc'. Now try this as a short cut for spelling our a long sequence of transitions: word(aabbcc)Equivalent to: [a,a,b,b,c,c] |
Saving an FSA |
Click on File>Save as Postscript. Save the file in your home directory, choosing a name with ".ps" extension (like ("abc.ps"). Your home printer will print a postscript file if you know how to send it to the printer directly from the printer manager. NOTE: Before saving an fsa you can drag STATES to better looking locations with your mouse. Pulling states further apart often makes a diagram more readable. NOTE: Read the manual for OTHER formats you can save fsa's in. |
||||||||||||||||||||||||||||||||||
Viewing Postscript Files |
Bring up an xterm window. Connect to your home directory if necessary. Type gv abc.ps &"gv" is a program that views postscript files. "abc.ps" is the assumed name of the FSA you saved in the previous step. "&" backgrounds the job so that while viewing the file you still have the smae xterm window to talk to. |
||||||||||||||||||||||||||||||||||||
More Regexps |
Back in the FSA window, type: {[a,b,c],[c,d,e]} Now try: {[a,b,c]*,[c,d,e]}What does the color coding on State 0 mean? Try: {[a,b,c]+,[c,d,e]} |
||||||||||||||||||||||||||||||||||||
Some languages to try: |
Try inputting thse (expressed in the textbook's Reg exp notation) into FSA:
|
||||||||||||||||||||||||||||||||||||
Submitting strings |
Start out with the FSA for reg exp (4) from the preceding step. In the string window type: bbbbab Now check out the xterm Window you started "fsa" from. You're looking at this: ?-"?-" is a Prolog prompt. "yes" is the Prolog representation of success. Test. Back in the fsa window, type: bba |
||||||||||||||||||||||||||||||||||||
Inputting FSTs |
Try: [(a:b)*,(b:a)*]Now type the following in the string window: aaa (check output) Try this short cut for making transducers: [b,a,a,!]xx[1,2,2,!] Now try the following very SIMPLE FST (in the REegexp window): '4':fourThis transduces the digit '4' into the word 'four'. Notice the digit has to be in single quotes.. Now try to write a compact regexp that does the following transductions:
|
||||||||||||||||||||||||||||||||||||
Karttunen Example |
Lauri Karttunen is a Finnish linguist who has done much work on morphology and FSTs. There is an example involving assimilation in Karttunen 91. See especially p. 4 where the discussion of kaNpan -> kampan -> kamman begins. Note that several rules are discussed:
(b) N -> n (  kaNtan -> kanpan ) (c) p -> m \ m _ (  kampan -> kamman ) Examples:
Now consider another order:
(a) N -> m \ _ p (  kaNpan -> kampan ) (b) N -> n (  kaNtan -> kanpan ) Here's what happens
The change
Note, for some background on the analysis of Canadian raising and flapping, see Canadian Raising, Opacity, and Rephonemicization, by William J. Idsardi, which reviews the Chomsky, Halle, Joos controversy surrounding these facts, and presents some alternative interpretations. |
||||||||||||||||||||||||||||||||||||
Malouf's Twolevel Rule implementation |
Here is Rob's version of rule (a). for the fsa toolkit. Note: macro(nm, comp(pairs, 'N':m, [], p: ? ))gives the rule the name "nm", says the feasible pairs of the machine are defined by the regex "pairs", and say 'N' is realized as 'm' when the left context is anything, and the right context on the UPPER (underlying) tape is 'p'. The regex pairs is a set of pairs from the upper and lower language, intituitvely, all possible pairs. It is defined and named in the following line: macro(pairs,{a..z,'N':m,'N':n,p:m}).Note: a..zis short for a:a,b:b,c:c,.... z:zThat is among the feasible pairs of our language: any lower case letter can be realized as itself. Furthermore, 'N' must be realize either as 'n' or 'm', and 'p' may be realized as 'm'. 'N':m 'N':n p:mThe single quotes will be required for any CAPITAL letters; this is an artifact of Prolog syntax. A key point to watch is 'p'. Notice that our language is defined so as to allow a 'p' to surface either as a 'p' or an 'm'. The set of feasible pairs must be defined to include all possible "rewrites". It is the job of the rules to make sure the reqrites happen only in the appropriate contexts. Here is Rob's version of both rules. How to load a file with both rules into the toolkit:
|
||||||||||||||||||||||||||||||||||||
Assignment: Problem 2 |
Answer the questions in boldface beflow. Turn in your answers. Also turn in a revised version of the file jurafsky.pl as described below. You can bring up a two level transducer version of Jurafsky's e-insertion rule einsertion transducer in fsa by doing the following:
|
||||||||||||||||||||||||||||||||||||
Problem 3 Flap Rule |
Flap Problem | ||||||||||||||||||||||||||||||||||||
What to hand in |
|