Package fsa_recognizer :: Module nondet_fsa_rec
[hide private]
[frames] | no frames]

Module nondet_fsa_rec

source code

This module defines a non-deterministic finite-state recognizer.

As with deterministic FSAs, a non-deterministic FSA is a pair of a TransitionTable and a tuple of final states. The difference is that a state-transition function always returns a tuple of states, although that tuple may be a 1-tuple.

>>>  Table = [{'a':(0,),'b':(1,), 'c':(2,)},{'a':(1,),'b':(2,), 'c':(2,)},
              {'a':(2,),'b':(1,2), 'c':(0,)}]
>>>  Finals = (0,)
>>>  Machine = (Table, Finals)

FSA TransitionTable usage:

>>> Table[2]['b']
(1,2)

Explanation: When in state 2, transition to either state 1 or 2 when seeing a 'b'.

For epsilon machines (machines allowing transitions on empty strings), we add 'ε' to the alphabet and change the recognizer not to advance in the string when making an ε transition.

Machine: A pair of an FSA transition table and a tuple of final states.

TransitionTable: A list of states

state: A dictionary whose keys are all the members of of the input alphabet (plus 'epilson') and whose values are tuples of states transitioned to.

undefined: the tuple ('und',), used as the value in a state dictionary for an input for which the state has no defined transition.


Attention: a machine's state dictionaries must all be defined for the same alphabet. Machines will raise KeyEror exceptions for input not in alphabet.

Author: Jean Mark Gawron

Contact: gawron@mail.sdsu.edu

Functions [hide private]
Boolean.
simple_nondet_recognize(machine, string)
Recognize string string with nondeterministic FSA machine, returning True if string belongs to the laguage defined by machine, else False.
source code
Boolean.
nondet_recognize(machine, string)
Recognize string string with nondeterministic FSA machine, returning True if string belongs to the language defined by machine, else False.
source code
Boolean
accept_state(search_state, last_index, finals)
Return True iff string position of @search_state is at the end of the string and the machine state of @search_state is a final state.
source code
a list of search states
simple_generate_new_states(search_state, table, string)
Return the list of states derived by applying search_state to table.
source code
a list of search states
generate_new_states(search_state, table, string)
Returns new states compatible with search_state, table, and string (see simple_generate_new_states), but in addition reports backtracking and symbols not in the input alphabet.
source code
 
pop_agenda(agenda) source code
TransitionTable
add_sink_state(table)
Return a copy of table, new_table, such that new_table adds a sink state, k.
source code
 
simple_det_recognize_with_sink_state(machine, string) source code
Variables [hide private]
TransitionTable nd_table = [{'!': ('und'), 'a': ('und'), 'b': (1), 'epsilon': ...
The TransitionTable for M3 [no epsilons].
TransitionTable nd_table_eps = [{'!': ('und'), 'a': ('und'), 'b': (1), 'epsilo...
The TransitionTable for M4 [has epsilon in alphabet].
tuple nd_finals = (4)
The tuple of final states for M3 and M4.
Machine M3 = ([{'!': ('und'), 'a': ('und'), 'b': (1), 'epsilon': ('und...
The nondet sheep language machine (Jurafsky and Martin, Speech and Natural Language Processing, Fig 2.18)
Machine M4 = ([{'!': ('und'), 'a': ('und'), 'b': (1), 'epsilon': ('und...
The nondet sheep language machine with ε transitions (Jurafsky and Martin, Speech and Natural Language Processing, Fig 2.19)
Function Details [hide private]

simple_nondet_recognize(machine, string)

source code 

Recognize string string with nondeterministic FSA machine, returning True if string belongs to the laguage defined by machine, else False.

Create agenda to keep track of unexplored states; agenda is a list of search states. An search state is a pair of a machine state and a string position. Initialize agenda with a search state consisting of the machine's start state (0) and string index 0.

On each iteration of the while-loop, check to see if search state current-state is an accept state. If so return.

If the current state's string position is not past the end of string, extend the agenda, applying current_state's machine state to that string position. Then, if the agenda is non-empty, pop the agenda and loop. Otherwise fail. See (Jurafsky and Martin, Speech and Natural Language Processing, Fig 2.21)

Parameters:
  • string - The string being recognized.
  • machine - The machine defining the language.
Returns: Boolean.

Attention: Quietly fails for input containing symbols not in input alphabet.

nondet_recognize(machine, string)

source code 

Recognize string string with nondeterministic FSA machine, returning True if string belongs to the language defined by machine, else False.

Differs from simple_nondet_recognize in calling generate_new_states, instead of simple_generate_new_states,which reports backtracking and symbols not in the input alphabet. But accepts and rejects the same classes of strings.

>>> nondet_recognize(M3, 'baa!')
Trying (0, 0)
...
Failing search state (2, 3): Backtracking
....
True
>>> nondet_recognize(M3, 'baa')
...
False
>>> nondet_recognize(M3, 'Baa!')
Trying (0, 0)
B not in input alphabet
Failing search state (0, 0): Backtracking
False
>>> nondet_recognize(M3, 'baah!')
Trying (0, 0)
...
Failing search state (2, 3): Backtracking
...
False
Parameters:
  • string - The string being recognized.
  • machine - The machine defining the language.
Returns: Boolean.

accept_state(search_state, last_index, finals)

source code 

Return True iff string position of @search_state is at the end of the string and the machine state of @search_state is a final state.

Parameters:
  • search_state (SearchState) - a pair of a machine state and a tape position
  • last_index (int) - length of the current string
  • finals (list) - list of final states for the current machine
Returns: Boolean

simple_generate_new_states(search_state, table, string)

source code 

Return the list of states derived by applying search_state to table.

search_state is a pair of a string position in string and a machine_state current_machine_state. New states are those generated by the transitions for current_machine_state in table for either the symbol at strng position or epilson.

We assume every input alphabet includes epsilon and therefore that every transition table is defined for epsilon.

Parameters:
  • search, state - the current search state just popped off agenda.
  • table - the TransitionTable for the current machine.
  • string - the string being recognized.
Returns: a list of search states

generate_new_states(search_state, table, string)

source code 

Returns new states compatible with search_state, table, and string (see simple_generate_new_states), but in addition reports backtracking and symbols not in the input alphabet.

Parameters:
  • search, state - the currenmt search state just popped off agenda.
  • table - the TransitionTable for the current machine.
  • string - the string being recognized.
Returns: a list of search states

add_sink_state(table)

source code 

Return a copy of table, new_table, such that new_table adds a sink state, k. For each sym in alphabet, new_table[k][sym]=k. For each sym in alphabet and each state i, if table[i][sym]='und', new_table[i][sym]=k.

Parameters:
  • table - a list of k dictionaries, all defined for the same alphabet, all with numerical values < k, representing an FSA transition table, each k corresponding to a state.
Returns: TransitionTable

Variables Details [hide private]

nd_table

The TransitionTable for M3 [no epsilons].
Type:
TransitionTable
Value:
[{'!': ('und'), 'a': ('und'), 'b': (1), 'epsilon': ('und')},
 {'!': ('und'), 'a': (2), 'b': ('und'), 'epsilon': ('und')},
 {'!': ('und'), 'a': (3, 2), 'b': ('und'), 'epsilon': ('und')},
 {'!': (4), 'a': ('und'), 'b': ('und'), 'epsilon': ('und')},
 {'!': ('und'), 'a': ('und'), 'b': ('und'), 'epsilon': ('und')}]

nd_table_eps

The TransitionTable for M4 [has epsilon in alphabet].
Type:
TransitionTable
Value:
[{'!': ('und'), 'a': ('und'), 'b': (1), 'epsilon': ('und')},
 {'!': ('und'), 'a': (2), 'b': ('und'), 'epsilon': ('und')},
 {'!': ('und'), 'a': (3), 'b': ('und'), 'epsilon': ('und')},
 {'!': (4), 'a': ('und'), 'b': ('und'), 'epsilon': (2)},
 {'!': ('und'), 'a': ('und'), 'b': ('und'), 'epsilon': ('und')}]

M3

The nondet sheep language machine (Jurafsky and Martin, Speech and Natural Language Processing, Fig 2.18)
Type:
Machine
Value:
([{'!': ('und'), 'a': ('und'), 'b': (1), 'epsilon': ('und')},
  {'!': ('und'), 'a': (2), 'b': ('und'), 'epsilon': ('und')},
  {'!': ('und'), 'a': (3, 2), 'b': ('und'), 'epsilon': ('und')},
  {'!': (4), 'a': ('und'), 'b': ('und'), 'epsilon': ('und')},
  {'!': ('und'), 'a': ('und'), 'b': ('und'), 'epsilon': ('und')}],
 (4))

M4

The nondet sheep language machine with ε transitions (Jurafsky and Martin, Speech and Natural Language Processing, Fig 2.19)
Type:
Machine
Value:
([{'!': ('und'), 'a': ('und'), 'b': (1), 'epsilon': ('und')},
  {'!': ('und'), 'a': (2), 'b': ('und'), 'epsilon': ('und')},
  {'!': ('und'), 'a': (3), 'b': ('und'), 'epsilon': ('und')},
  {'!': (4), 'a': ('und'), 'b': ('und'), 'epsilon': (2)},
  {'!': ('und'), 'a': ('und'), 'b': ('und'), 'epsilon': ('und')}],
 (4))