Rule Ordering

    Using FSA

    On bulba type:

    % fsa -tk
    
    A GUI comes up.

    There are two lines you can type into, labeled "Regex" and "String"

    "Regex" accepts a regular expression that lets you define an FSA or FST by hand.

    "String" accepts a strong to be submitted to the current FSA by clicking on "Submit".

    Example
    Regexp

    On the Regexp line type:

    {[a,b,c]}
    
    Your first FSA!

    Notice the color-coding on the states. What does it mean?

    Note that you MUST use the square brackets for concatentation (a Prolog "list"). Note that each element of the list is assumed to be an element of your "alphabet":

    {[aa,ab,cc]}
    
    Notice the machine has 'aa','ab' and 'cc' labeling transitions.. Probably not what you intended. No normal string will be recognized. Try 'aaabcc'.

    Now try this as a short cut for spelling our a long sequence of transitions:

    word(aabbcc)
    
    Equivalent to:
    [a,a,b,b,c,c]
    
    Saving an
    FSA

    Click on File>Save as Postscript.

    Save the file in your home directory, choosing a name with ".ps" extension (like ("abc.ps"). Your home printer will print a postscript file if you know how to send it to the printer directly from the printer manager.

    NOTE: Before saving an fsa you can drag STATES to better looking locations with your mouse. Pulling states further apart often makes a diagram more readable.

    NOTE: Read the manual for OTHER formats you can save fsa's in.

    Viewing
    Postscript
    Files

    Bring up an xterm window. Connect to your home directory if necessary.

    Type

    gv abc.ps &
    
    "gv" is a program that views postscript files. "abc.ps" is the assumed name of the FSA you saved in the previous step. "&" backgrounds the job so that while viewing the file you still have the smae xterm window to talk to.
    More
    Regexps

    Back in the FSA window, type:

    {[a,b,c],[c,d,e]}
    

    Now try:

    {[a,b,c]*,[c,d,e]}
    
    What does the color coding on State 0 mean?

    Try:

    {[a,b,c]+,[c,d,e]}
    
    Some languages
    to try:

    Try inputting thse (expressed in the textbook's Reg exp notation) into FSA:

    1. ab*c
    2. (ab*c)+
    3. a | (an) | (the)
    4. the set of all strings from the alphabet a,b such that each a is immediately preceded and followed by a b.
    Submitting
    strings

    Start out with the FSA for reg exp (4) from the preceding step. In the string window type:

    bbbbab
    

    Now check out the xterm Window you started "fsa" from.

    You're looking at this:

    ?-
    yes
    "?-" is a Prolog prompt. "yes" is the Prolog representation of success. Test. Back in the fsa window, type:
    bba
    
    Inputting
    FSTs

    Try:

    [(a:b)*,(b:a)*]
    
    Now type the following in the string window:
    aaa (check output)
    bbbb (check output)
    aaaanbbbbb
    bbbbaaaa

    Try this short cut for making transducers:

    [b,a,a,!]xx[1,2,2,!]
    

    Now try the following very SIMPLE FST (in the REegexp window):

    '4':four
    
    This transduces the digit '4' into the word 'four'. Notice the digit has to be in single quotes.. Now try to write a compact regexp that does the following transductions:
    1. 4: four
    2. 41: forty one
    3. 42: forty two
    4. 43: forty three
    5. 44: forty four
    It should recognize 41: forty one by the sequence:
      4:forty 1:one
    Now add:
      40: forty
    to the same language. You will need epsilon transitions. Here's an example of how to insert a "0":
      []:0
    A possible answer!
    Karttunen
    Example

    Lauri Karttunen is a Finnish linguist who has done much work on morphology and FSTs.

    There is an example involving assimilation in Karttunen 91. See especially p. 4 where the discussion of kaNpan -> kampan -> kamman begins.

    Note that several rules are discussed:

      (a) N -> m \ _ p (  kaNpan -> kampan )
      (b) N -> n (  kaNtan -> kanpan )
      (c) p -> m \ m _ (  kampan -> kamman )
    This issue is that they are ordered as shown and we want to model that ordering in our transducer.

    Examples:

      Upper Lower Rules used
      Kampan kamman {(c) }
      KaNpan kamman {(a), (c) }
      KaNtan kantan {(b)}
      Kanpan kanpan

    Now consider another order:

      (c) p -> m \ m _ (  kampan -> kamman )
      (a) N -> m \ _ p (  kaNpan -> kampan )
      (b) N -> n (  kaNtan -> kanpan )

    Here's what happens

      Upper Lower Rules used
      kampan kamman {(c) }
      kanpan kaNpan
      kantan kantan {(b)}
      kanpan kanpan

    The change
    (c) ordered last kaNpan -> kaNpan
    (c) ordered first kaNpan -> kamman {(a), (c) }

    Note, for some background on the analysis of Canadian raising and flapping, see Canadian Raising, Opacity, and Rephonemicization, by William J. Idsardi, which reviews the Chomsky, Halle, Joos controversy surrounding these facts, and presents some alternative interpretations.

    Malouf's
    Twolevel
    Rule implementation

    Here is Rob's version of rule (a). for the fsa toolkit.

    Note:

    macro(nm, comp(pairs, 'N':m, [],   p: ?  ))
    
    gives the rule the name "nm", says the feasible pairs of the machine are defined by the regex "pairs", and say 'N' is realized as 'm' when the left context is anything, and the right context on the UPPER (underlying) tape is 'p'.

    The regex pairs is a set of pairs from the upper and lower language, intituitvely, all possible pairs. It is defined and named in the following line:

    macro(pairs,{a..z,'N':m,'N':n,p:m}).
    
    Note:
    a..z
    
    is short for
    a:a,b:b,c:c,.... z:z
    
    That is among the feasible pairs of our language: any lower case letter can be realized as itself.

    Furthermore, 'N' must be realize either as 'n' or 'm', and 'p' may be realized as 'm'.

    'N':m
    'N':n
     p:m
    
    The single quotes will be required for any CAPITAL letters; this is an artifact of Prolog syntax.

    A key point to watch is 'p'. Notice that our language is defined so as to allow a 'p' to surface either as a 'p' or an 'm'. The set of feasible pairs must be defined to include all possible "rewrites". It is the job of the rules to make sure the reqrites happen only in the appropriate contexts.

    Here is Rob's version of both rules.

    How to load a file with both rules into the toolkit:

    1. File > Load Aux [select /usr/local/fsa6/Examples/twolevel/kamman.pl]
    2. Regex window: nm (the nm machine now gets drawn). This ONLY implements rules (a)and (b).
    3. Type "kaNpan"
    4. Check your output window and think about what happened. Is this correct?
    5. Now type "kamman" in the regex window. This loads a machine that implements ALL the rules (a)-(c). Now type "kaNpan" in the STRING window and check the output window.