FSA Exercise

Rule Ordering

Using FSA

On bulba type:

% fsa -tk

A GUI comes up.

There are two lines you can type into, labeled "Regex" and "String"

"Regex" accepts a regular expression that lets you define an FSA or FST by hand.

"String" accepts a strong to be submitted to the current FSA by clicking on "Submit".

Example
Regexp

On the Regexp line type:

{[a,b,c]}

Your first FSA!

Notice the color-coding on the states. What does it mean?

Note that you MUST use the square brackets for concatentation (a Prolog "list"). Note that each element of the list is assumed to be an element of your "alphabet":

{[aa,ab,cc]}

Notice the machine has 'aa','ab' and 'cc' labeling transitions.. Probably not what you intended. No normal string will be recognized. Try 'aaabcc'.

Now try this as a short cut for spelling our a long sequence of transitions:

word(aabbcc)

Equivalent to:

[a,a,b,b,c,c]

Saving an
FSA

Click on File>Save as Postscript.

Save the file in your home directory, choosing a name with ".ps" extension (like ("abc.ps"). Your home printer will print a postscript file if you know how to send it to the printer directly from the printer manager.

NOTE: Before saving an fsa you can drag STATES to better looking locations with your mouse. Pulling states further apart often makes a diagram more readable.

NOTE: Read the manual for OTHER formats you can save fsa's in.

Viewing
Postscript
Files

Bring up an xterm window. Connect to your home directory if necessary.

Type

gv abc.ps &

"gv" is a program that views postscript files. "abc.ps" is the assumed name of the FSA you saved in the previous step. "&" backgrounds the job so that while viewing the file you still have the smae xterm window to talk to.

More
Regexps

Back in the FSA window, type:

{[a,b,c],[c,d,e]}

Now try:

{[a,b,c]*,[c,d,e]}

What does the color coding on State 0 mean?

Try:

{[a,b,c]+,[c,d,e]}

Some languages
to try:

Try inputting thse (expressed in the textbook's Reg exp notation) into FSA:

ab*c
(ab*c)+
a | (an) | (the)
the set of all strings from the alphabet a,b such that each a is immediately preceded and followed by a b.

Submitting
strings

Start out with the FSA for reg exp (4) from the preceding step. In the string window type:

bbbbab

Now check out the xterm Window you started "fsa" from.

You're looking at this:

?-

yes

"?-" is a Prolog prompt. "yes" is the Prolog representation of success. Test. Back in the fsa window, type:

bba

Inputting
FSTs

Try:

[(a:b)*,(b:a)*]

Now type the following in the string window:

aaa (check output)
 
bbbb (check output)
 
aaaanbbbbb 

bbbbaaaa

Try this short cut for making transducers:

[b,a,a,!]xx[1,2,2,!]

Now try the following very SIMPLE FST (in the REegexp window):

'4':four

This transduces the digit '4' into the word 'four'. Notice the digit has to be in single quotes.. Now try to write a compact regexp that does the following transductions:

4: four
41: forty one
42: forty two
43: forty three
44: forty four

It should recognize 41: forty one by the sequence:

4:forty 1:one Now add:

40: forty to the same language. You will need epsilon transitions. Here's an example of how to insert a "0":

[]:0 A possible answer!

Karttunen
Example

Lauri Karttunen is a Finnish linguist who has done much work on morphology and FSTs.

There is an example involving assimilation in Karttunen 91. See especially p. 4 where the discussion of kaNpan -> kampan -> kamman begins.

Note that several rules are discussed:

This issue is that they are ordered as shown and we want to model that ordering in our transducer.

Examples:

Upper	Lower	Rules used
Kampan	kamman	{(c) }
KaNpan	kamman	{(a), (c) }
KaNtan	kantan	{(b)}
Kanpan	kanpan	∅

Now consider another order:

Here's what happens

Upper	Lower	Rules used
kampan	kamman	{(c) }
kanpan	kaNpan	∅
kantan	kantan	{(b)}
kanpan	kanpan	∅

The change

(c) ordered last kaNpan -> kaNpan ∅

(c) ordered first kaNpan -> kamman {(a), (c) }

Note, for some background on the analysis of Canadian raising and flapping, see Canadian Raising, Opacity, and Rephonemicization, by William J. Idsardi, which reviews the Chomsky, Halle, Joos controversy surrounding these facts, and presents some alternative interpretations.

Malouf's
Twolevel
Rule implementation

Here is Rob's version of rule (a). for the fsa toolkit.

Note:

macro(nm, comp(pairs, 'N':m, [],   p: ?  ))

gives the rule the name "nm", says the feasible pairs of the machine are defined by the regex "pairs", and say 'N' is realized as 'm' when the left context is anything, and the right context on the UPPER (underlying) tape is 'p'.

The regex pairs is a set of pairs from the upper and lower language, intituitvely, all possible pairs. It is defined and named in the following line:

macro(pairs,{a..z,'N':m,'N':n,p:m}).

Note:

a..z

is short for

a:a,b:b,c:c,.... z:z

That is among the feasible pairs of our language: any lower case letter can be realized as itself.

Furthermore, 'N' must be realize either as 'n' or 'm', and 'p' may be realized as 'm'.

'N':m
'N':n
 p:m

The single quotes will be required for any CAPITAL letters; this is an artifact of Prolog syntax.

A key point to watch is 'p'. Notice that our language is defined so as to allow a 'p' to surface either as a 'p' or an 'm'. The set of feasible pairs must be defined to include all possible "rewrites". It is the job of the rules to make sure the reqrites happen only in the appropriate contexts.

Here is Rob's version of both rules.

How to load a file with both rules into the toolkit:

File > Load Aux [select /usr/local/fsa6/Examples/twolevel/kamman.pl]
Regex window: nm (the nm machine now gets drawn). This ONLY implements rules (a)and (b).
Type "kaNpan"
Check your output window and think about what happened. Is this correct?
Now type "kamman" in the regex window. This loads a machine that implements ALL the rules (a)-(c). Now type "kaNpan" in the STRING window and check the output window.