Module nondet_fsa_rec
source code
This module defines a non-deterministic finite-state recognizer.
As with deterministic FSAs, a non-deterministic FSA is a pair of a
TransitionTable and a tuple of final states. The difference is that a
state-transition function always returns a tuple of states, although that
tuple may be a 1-tuple.
>>> Table = [{'a':(0,),'b':(1,), 'c':(2,)},{'a':(1,),'b':(2,), 'c':(2,)},
{'a':(2,),'b':(1,2), 'c':(0,)}]
>>> Finals = (0,)
>>> Machine = (Table, Finals)
FSA TransitionTable usage:
>>> Table[2]['b']
(1,2)
Explanation: When in state 2, transition to either state 1 or 2 when
seeing a 'b'.
For epsilon machines (machines allowing transitions on empty strings),
we add 'ε' to the alphabet and change the recognizer not to
advance in the string when making an ε transition.
Machine: A pair of an FSA transition table and a tuple of final
states.
TransitionTable: A list of states
state: A dictionary whose keys are all the members of of the input
alphabet (plus 'epilson') and whose values are tuples of states
transitioned to.
undefined: the tuple ('und',), used as the value in a state
dictionary for an input for which the state has no defined
transition.
Attention:
a machine's state dictionaries must all be defined for the same
alphabet. Machines will raise KeyEror exceptions for input not in
alphabet.
Author:
Jean Mark Gawron
Contact:
gawron@mail.sdsu.edu
Boolean.
|
simple_nondet_recognize(machine,
string)
Recognize string string with nondeterministic FSA
machine , returning True if string belongs
to the laguage defined by machine , else False. |
source code
|
|
Boolean.
|
nondet_recognize(machine,
string)
Recognize string string with nondeterministic FSA
machine , returning True if string belongs
to the language defined by machine , else False. |
source code
|
|
Boolean
|
accept_state(search_state,
last_index,
finals)
Return True iff string position of @search_state is at
the end of the string and the machine state of
@search_state is a final state. |
source code
|
|
a list of search states
|
|
a list of search states
|
|
|
|
TransitionTable
|
|
|
simple_det_recognize_with_sink_state(machine,
string) |
source code
|
|
TransitionTable
|
nd_table = [ { ' ! ' : ( ' und ' ) , ' a ' : ( ' und ' ) , ' b ' : ( 1) , ' epsilon ' : ...
The TransitionTable for M3 [no epsilons].
|
TransitionTable
|
nd_table_eps = [ { ' ! ' : ( ' und ' ) , ' a ' : ( ' und ' ) , ' b ' : ( 1) , ' epsilo ...
The TransitionTable for M4 [has epsilon in alphabet].
|
tuple
|
nd_finals = ( 4)
The tuple of final states for M3 and M4 .
|
Machine
|
M3 = ( [ { ' ! ' : ( ' und ' ) , ' a ' : ( ' und ' ) , ' b ' : ( 1) , ' epsilon ' : ( ' und ...
The nondet sheep language machine (Jurafsky and Martin, Speech and Natural Language
Processing, Fig 2.18)
|
Machine
|
M4 = ( [ { ' ! ' : ( ' und ' ) , ' a ' : ( ' und ' ) , ' b ' : ( 1) , ' epsilon ' : ( ' und ...
The nondet sheep language machine with ε transitions (Jurafsky and Martin, Speech and Natural Language
Processing, Fig 2.19)
|
Recognize string string with nondeterministic FSA
machine , returning True if string belongs to
the laguage defined by machine , else False.
Create agenda to keep track of unexplored states;
agenda is a list of search states. An search state is a pair of a
machine state and a string position. Initialize agenda with
a search state consisting of the machine 's start state (0)
and string index 0.
On each iteration of the while-loop, check to see if search state
current-state is an accept state. If so return.
If the current state 's string position is not past the
end of string , extend the agenda , applying
current_state 's machine state to that string position. Then,
if the agenda is non-empty, pop the agenda and
loop. Otherwise fail. See (Jurafsky and Martin, Speech and Natural Language
Processing, Fig 2.21)
- Parameters:
string - The string being recognized.
machine - The machine defining the language.
- Returns: Boolean.
Attention:
Quietly fails for input containing symbols not in input alphabet.
|
Recognize string string with nondeterministic FSA
machine , returning True if string belongs to
the language defined by machine , else False.
Differs from simple_nondet_recognize in calling generate_new_states, instead of simple_generate_new_states,which reports backtracking
and symbols not in the input alphabet. But accepts and rejects the same
classes of strings.
>>> nondet_recognize(M3, 'baa!')
Trying (0, 0)
...
Failing search state (2, 3): Backtracking
....
True
>>> nondet_recognize(M3, 'baa')
...
False
>>> nondet_recognize(M3, 'Baa!')
Trying (0, 0)
B not in input alphabet
Failing search state (0, 0): Backtracking
False
>>> nondet_recognize(M3, 'baah!')
Trying (0, 0)
...
Failing search state (2, 3): Backtracking
...
False
- Parameters:
string - The string being recognized.
machine - The machine defining the language.
- Returns: Boolean.
|
accept_state(search_state,
last_index,
finals)
| source code
|
Return True iff string position of @search_state is at
the end of the string and the machine state of @search_state
is a final state.
- Parameters:
search_state (SearchState) - a pair of a machine state and a tape position
last_index (int) - length of the current string
finals (list) - list of final states for the current machine
- Returns: Boolean
|
simple_generate_new_states(search_state,
table,
string)
| source code
|
Return the list of states derived by applying
search_state to table .
search_state is a pair of a string position in
string and a machine_state
current_machine_state . New states are those generated by
the transitions for current_machine_state in
table for either the symbol at strng position or
epilson.
We assume every input alphabet includes epsilon and therefore that
every transition table is defined for epsilon.
- Parameters:
search , state - the current search state just popped off agenda .
table - the TransitionTable for the current machine.
string - the string being recognized.
- Returns: a list of search states
|
generate_new_states(search_state,
table,
string)
| source code
|
Returns new states compatible with search_state ,
table , and string (see simple_generate_new_states), but in addition reports
backtracking and symbols not in the input alphabet.
- Parameters:
search , state - the currenmt search state just popped off agenda .
table - the TransitionTable for the current machine.
string - the string being recognized.
- Returns: a list of search states
|
Return a copy of table , new_table , such that
new_table adds a sink state, k. For each sym in
alphabet, new_table[k][sym]=k. For each sym in alphabet and each
state i, if table[i][sym]='und', new_table[i][sym]=k.
- Parameters:
table - a list of k dictionaries, all defined for the same alphabet, all
with numerical values < k, representing an FSA transition
table, each k corresponding to a state.
- Returns: TransitionTable
|
nd_table
The TransitionTable for M3 [no epsilons].
- Type:
- TransitionTable
- Value:
[ { ' ! ' : ( ' und ' ) , ' a ' : ( ' und ' ) , ' b ' : ( 1) , ' epsilon ' : ( ' und ' ) } ,
{ ' ! ' : ( ' und ' ) , ' a ' : ( 2) , ' b ' : ( ' und ' ) , ' epsilon ' : ( ' und ' ) } ,
{ ' ! ' : ( ' und ' ) , ' a ' : ( 3, 2) , ' b ' : ( ' und ' ) , ' epsilon ' : ( ' und ' ) } ,
{ ' ! ' : ( 4) , ' a ' : ( ' und ' ) , ' b ' : ( ' und ' ) , ' epsilon ' : ( ' und ' ) } ,
{ ' ! ' : ( ' und ' ) , ' a ' : ( ' und ' ) , ' b ' : ( ' und ' ) , ' epsilon ' : ( ' und ' ) } ]
|
|
nd_table_eps
The TransitionTable for M4 [has epsilon in alphabet].
- Type:
- TransitionTable
- Value:
[ { ' ! ' : ( ' und ' ) , ' a ' : ( ' und ' ) , ' b ' : ( 1) , ' epsilon ' : ( ' und ' ) } ,
{ ' ! ' : ( ' und ' ) , ' a ' : ( 2) , ' b ' : ( ' und ' ) , ' epsilon ' : ( ' und ' ) } ,
{ ' ! ' : ( ' und ' ) , ' a ' : ( 3) , ' b ' : ( ' und ' ) , ' epsilon ' : ( ' und ' ) } ,
{ ' ! ' : ( 4) , ' a ' : ( ' und ' ) , ' b ' : ( ' und ' ) , ' epsilon ' : ( 2) } ,
{ ' ! ' : ( ' und ' ) , ' a ' : ( ' und ' ) , ' b ' : ( ' und ' ) , ' epsilon ' : ( ' und ' ) } ]
|
|
M3
The nondet sheep language machine (Jurafsky and Martin, Speech and Natural Language
Processing, Fig 2.18)
- Type:
- Machine
- Value:
( [ { ' ! ' : ( ' und ' ) , ' a ' : ( ' und ' ) , ' b ' : ( 1) , ' epsilon ' : ( ' und ' ) } ,
{ ' ! ' : ( ' und ' ) , ' a ' : ( 2) , ' b ' : ( ' und ' ) , ' epsilon ' : ( ' und ' ) } ,
{ ' ! ' : ( ' und ' ) , ' a ' : ( 3, 2) , ' b ' : ( ' und ' ) , ' epsilon ' : ( ' und ' ) } ,
{ ' ! ' : ( 4) , ' a ' : ( ' und ' ) , ' b ' : ( ' und ' ) , ' epsilon ' : ( ' und ' ) } ,
{ ' ! ' : ( ' und ' ) , ' a ' : ( ' und ' ) , ' b ' : ( ' und ' ) , ' epsilon ' : ( ' und ' ) } ] ,
( 4) )
|
|
M4
The nondet sheep language machine with ε transitions (Jurafsky and Martin, Speech and Natural Language
Processing, Fig 2.19)
- Type:
- Machine
- Value:
( [ { ' ! ' : ( ' und ' ) , ' a ' : ( ' und ' ) , ' b ' : ( 1) , ' epsilon ' : ( ' und ' ) } ,
{ ' ! ' : ( ' und ' ) , ' a ' : ( 2) , ' b ' : ( ' und ' ) , ' epsilon ' : ( ' und ' ) } ,
{ ' ! ' : ( ' und ' ) , ' a ' : ( 3) , ' b ' : ( ' und ' ) , ' epsilon ' : ( ' und ' ) } ,
{ ' ! ' : ( 4) , ' a ' : ( ' und ' ) , ' b ' : ( ' und ' ) , ' epsilon ' : ( 2) } ,
{ ' ! ' : ( ' und ' ) , ' a ' : ( ' und ' ) , ' b ' : ( ' und ' ) , ' epsilon ' : ( ' und ' ) } ] ,
( 4) )
|
|