Rule Ordering

    Using FSA

    On bulba type:

    % fsa -tk
    
    A GUI comes up.

    There are two lines you can type into, labeled "Regex" and "String"

    "Regex" accepts a regular expression that lets you define an FSA or FST by hand.

    "String" accepts a string to be submitted to the current FSA by clicking on "Submit".

    FSA
    Manual

    Manual

    Example
    Regexp

    On the Regexp line type:

    {[a,b,c]}
    
    Your first FSA!

    Notice the color-coding on the states. What does it mean?

    Note that you MUST use the square brackets for concatentation (a Prolog "list"). Note that each element of the list is assumed to be an element of your "alphabet":

    {[aa,ab,cc]}
    
    Notice the machine has 'aa','ab' and 'cc' labeling transitions.. Probably not what you intended. No normal string will be recognized. Try 'aaabcc'.

    Now try this as a short cut for spelling our a long sequence of transitions:

    word(aabbcc)
    
    Equivalent to:
    [a,a,b,b,c,c]
    
    Saving an
    FSA

    Click on File>Save as Postscript.

    Save the file in your home directory, choosing a name with ".ps" extension (like ("abc.ps"). Your home printer will print a postscript file if you know how to send it to the printer directly from the printer manager.

    NOTE: Before saving an fsa you can drag STATES to better looking locations with your mouse. Pulling states further apart often makes a diagram more readable.

    NOTE: Read the manual for OTHER formats you can save fsa's in.

    Viewing
    Postscript
    Files

    Bring up an xterm window. Connect to your home directory if necessary.

    Type

    gv abc.ps &
    
    "gv" is a program that views postscript files. "abc.ps" is the assumed name of the FSA you saved in the previous step. "&" backgrounds the job so that while viewing the file you still have the smae xterm window to talk to.
    More
    Regexps

    Back in the FSA window, type:

    {[a,b,c],[c,d,e]}
    

    Now try:

    {[a,b,c]*,[c,d,e]}
    
    What does the color coding on State 0 mean?

    Try:

    {[a,b,c]+,[c,d,e]}
    
    Some languages
    to try:

    Try inputting thse (expressed in the textbook's Reg exp notation) into FSA:

    1. ab*c
    2. (ab*c)+
    3. a | (an) | (the)
    4. the set of all strings from the alphabet a,b such that each a is immediately preceded and followed by a b.
    Submitting
    strings

    Start out with the FSA for reg exp (4) from the preceding step. In the string window type:

    bbbbab
    

    Now check out the xterm Window you started "fsa" from.

    You're looking at this:

    ?-
    yes
    "?-" is a Prolog prompt. "yes" is the Prolog representation of success. Test. Back in the fsa window, type:
    bba
    
    Inputting
    FSTs

    Try:

    [(a:b)*,(b:a)*]
    
    Now type the following in the string window:
    aaa (check output)
    bbbb (check output)
    aaaanbbbbb
    bbbbaaaa

    Try this short cut for making transducers:

    [b,a,a,!]xx[1,2,2,!]
    

    Now try the following very SIMPLE FST (in the REegexp window):

    '4':four
    
    This transduces the digit '4' into the word 'four'. Notice the digit has to be in single quotes.. Now try to write a compact regexp that does the following transductions:
    1. 4: four
    2. 41: forty one
    3. 42: forty two
    4. 43: forty three
    5. 44: forty four
    It should recognize 41: forty one by the sequence:
      4:forty 1:one
    Now add:
      40: forty
    to the same language. You will need epsilon transitions. Here's an example of how to insert a "0":
      []:0
    A possible answer!
    Karttunen
    Example

    Lauri Karttunen is a Finnish linguist who has done much work on morphology and FSTs.

    There is an example involving assimilation in Karttunen 91. See especially p. 4 where the discussion of kaNpan -> kampan -> kamman begins.

    Note that several rules are discussed:

      (a) N -> m \ _ p (  kaNpan -> kampan )
      (b) N -> n (  kaNtan -> kanpan )
      (c) p -> m \ m _ (  kampan -> kamman )
    This issue is that they are ordered as shown and we want to model that ordering in our transducer.

    Examples:

      Upper Lower Rules used
      Kampan kamman {(c) }
      KaNpan kamman {(a), (c) }
      KaNtan kantan {(b)}
      Kanpan kanpan

    Now consider another order:

      (c) p -> m \ m _ (  kampan -> kamman )
      (a) N -> m \ _ p (  kaNpan -> kampan )
      (b) N -> n (  kaNtan -> kanpan )

    Here's what happens

      Upper Lower Rules used
      kampan kamman {(c) }
      kaNpan kampan {(a)}
      kantan kantan {(b)}
      kanpan kanpan

    The change
    (c) ordered last kaNpan -> kaNpan
    (c) ordered first kaNpan -> kamman {(a), (c) }

    Note, for some background on the analysis of Canadian raising and flapping, see Canadian Raising, Opacity, and Rephonemicization, by William J. Idsardi, which reviews the Chomsky, Halle, Joos controversy surrounding these facts, and presents some alternative interpretations.

    Malouf's
    Twolevel
    Rule implementation

    Here is Rob's version of rule (a). for the fsa toolkit.

    Note:

    macro(nm, comp(pairs, 'N':m, [],   p: ?  ))
    
    gives the rule the name "nm", says the feasible pairs of the machine are defined by the regex "pairs", and say 'N' is realized as 'm' when the left context is anything, and the right context on the UPPER (underlying) tape is 'p'.

    The regex pairs is a set of pairs from the upper and lower language, intituitvely, all possible pairs. It is defined and named in the following line:

    macro(pairs,{a..z,'N':m,'N':n,p:m}).
    
    Note:
    a..z
    
    is short for
    a:a,b:b,c:c,.... z:z
    
    That is among the feasible pairs of our language: any lower case letter can be realized as itself.

    Furthermore, 'N' must be realize either as 'n' or 'm', and 'p' may be realized as 'm'.

    'N':m
    'N':n
     p:m
    
    The single quotes will be required for any CAPITAL letters; this is an artifact of Prolog syntax.

    A key point to watch is 'p'. Notice that our language is defined so as to allow a 'p' to surface either as a 'p' or an 'm'. The set of feasible pairs must be defined to include all possible "rewrites". It is the job of the rules to make sure the reqrites happen only in the appropriate contexts.

    Here is Rob's version of both rules.

    How to load a file with both rules into the toolkit:

    1. File > Load Aux [select /opt/fsa6/Examples/twolevel/kamman.pl]
    2. Regex window: nm (the nm machine now gets drawn). This ONLY implements rules (a)and (b).
    3. Type "kaNpan"
    4. Check your output window and think about what happened. Is this correct? [The output window is the window you used to originally start 'fsa', the one you typed 'fsa -tk' into.]
    5. Now type "kamman" in the regex window. This loads a machine that implements ALL the rules (a)-(c). Now type "kaNpan" in the STRING window and check the output window.
    Assignment:
    Problem 2 e-insertion

    Answer the questions in boldface beflow. Turn in your answers. Also turn in a revised version of the file jurafsky.pl as described below.

    e-insertion transducer

    You can bring up a two level transducer version of Jurafsky's e-insertion rule einsertion transducer in fsa by doing the following:

    1. Click on File>Loadaux. Load in the file:
        /opt/fsa6/Examples/twolevel/jurafsky.pl.
      There is another copy of the file here.
    2. Type "einsertion" in the regex window. An fsa should be displayed.
    3. In this implementation, to simplify things, I've used "+" for both morpheme and word boundaries.
    4. To test the e insertion rule, type
        fox+s+
      into the string window. Check the output in the host xterm window. Q1: What is the output?
    5. Try:
        fop+s+
      Q2: What is the output?
    6. Revise the rule to handle the case of words ending in "ch" and "sh". Here are some hints.
      1. The left and right contexts of rules are regular expressions (as notated in the fsa tool). You may want to go back and look at how regular expressions in the fsa work as we we practice them in our lab exercise
      2. Obviously the left context of the the e-insertion rule needs to be generalized. Currently the left context is:
          [{x,s,z}, (+):0]
        This means an "x","s" or "z" concatenated with a "(+):0". "(+):0" is how the rule notation represents a morpheme boundary realized as the empty string, using a "0" to represent the empty string. Note that curly braces {   } represent "or" and square brackets [   ] represent concatenation.
    Problem 3
    Flap Rule
    Flap Problem
    What to
    hand
    in

    1. kamman.pl

      1 page (or at most 2) answering the following questions about kamman.pl

      • What happened when you had the "nm" machine loaded and you typed "kaNpan". Was this the correct behavior for a machine that only implemented rules (a) and (b)? Why?
      • What happened when you had the "kamman" machine loaded and you typed "kaNpan". Was this the correct behavior for a machine that implemented rules (a), (b) and (c)?
      • If you look at the rule
          macro(nm, comp(pairs, 'N':m, [], p: ? ))
        all it says is that 'N" gets realized as 'm' if and only if the following environment is 'p' realized as anything (and the preceding environment is anything). Now on the face of it this sounds like it just implements rule (a). Is it correct to say that "nm" as defined here implements both rules (a) and (b)? How can you test to see if rule (b) is implemented? If rule (b) IS implemented, how has this happened?
      • What would happen if we combined 'nm' with the following rule:
          macro(nn, comp(pairs, 'N':m, [], n: ? ))?
        In particular, what would we get if we loaded a machine that combined the 'nn' rule with the 'nm' rule and then typed 'kaNpan' to the String window? What about 'kaNnan'? What about 'kaNtan'? What about 'N'? You can do this two ways. You can engage if the pure pleasure of a priori reasoning, or you can make a copy of 'kamman.pl' in your home directory, edit it to make those changes, and then 'Load Aux' it into fsa to see what happens. For max educational benefit, do it both ways (a priori reasoning first; i.e., guess!). However you do it, report, by filling out the following table:

          Results with 'nn'+ 'nm' machine
          Upper Lower
          KaNpan  
          KaNnan  
          KaNtan  
          N  

      Explain what happened.

    2. Jurafsky.pl. Follow the directions for Problem 2 above, the extended version of the e-insertion rule, and hand in a version of the solution.
    3. flap_problem.pl. Follow the directions for Problem 3 above, the flap rule, and hand in a solution.