Package fsa_recognizer :: Module det_fsa_rec
[hide private]
[frames] | no frames]

Source Code for Module fsa_recognizer.det_fsa_rec

  1  """ 
  2  This module defines a deterministic finite-state recognizer.  No S{epsilon}-transitions are allowed. 
  3   
  4  An FSA (here simply called a C{Machine}) is a triple of an alphabet, a TransitionTable and a tuple of final states: 
  5   
  6      - C{alphabet}:           A sequence, the elements define the set of symbols in the alphabet of C{Machine} 
  7      - C{TransitionTable}:    A list of StateTransitionFunctions 
  8      - C{StateTransitionFunction}:    A dictionary whose keys are the members of the input alphabet and whose values are the states transitioned to. 
  9   
 10  Keys and values in transition functions come from the following sets: 
 11   
 12      - C{alphabet}: Must be the same for all states in a machine 
 13      - C{states}: the set of states of the machine, the first n-1 integers (including 0) for a machine of n states 
 14      - C{undefined}:  the string 'und', used as the value in a StateTransitionFunction for an alphabet symbol for which the state has no defined transition. 
 15   
 16  Example machine: 
 17   
 18    >>>  Table = [{'b':1},{'a':1,'b':2,'c':0},{'a':2,'b':2}] 
 19    >>>  Finals = (0,) 
 20    >>>  Machine = ('abc', Table, Finals) 
 21   
 22  In this machine, the alphabet is M{{a,b,c}}, and there are 3 states, 0, 
 23  1, and 2, defined by 3 StateTransitionFunctions (U{diagram<http://www-rohan.sdsu.edu/~gawron/compling/chap2/silly_graph.pdf>}).  Lists are indexed from 
 24  0, so the state-transition function for state 2 is the 3rd in C{Table}. 
 25   
 26    >>> get_transition_value(Table[2],'b') 
 27    2 
 28   
 29  When in state 2, transition back to state 2 upon seeing a 'b'. 
 30   
 31    >>> get_transition_value(Table[2],'c') 
 32    'und' 
 33   
 34  State 2 is undefined for the input 'c'.  The FSA fails when seeing a 
 35  'c' in state 2. 
 36   
 37  @attention: a machine's state dictionaries must all be defined for the same alphabet. 
 38  @var finals1: The tuple of final states for C{M1} 
 39  @type finals1: tuple 
 40  @var table1: The TransitionTable for C{M1}. 
 41  @type table1: TransitionTable 
 42  @var M1: The deterministic sheep language machine (U{Jurafsky and Martin, Speech and Natural Language Processing, Fig 2.12<http://www-rohan.sdsu.edu/~gawron/compling/chap2/fig02.12.pdf>}) 
 43  @type M1: Machine 
 44  @sort: simple_det_recognize get_transition_value set_transition_value det_recognize add_sink_state 
 45  @author:  Jean Mark Gawron 
 46  @contact: gawron@mail.sdsu.edu 
 47  """ 
 48   
 49  ## The sheep language machine! 
 50  table1 = [ {'b':1},     # state 0  b => 1 
 51             {'a': 2},    # state 1  a => 2 
 52             {'a': 3},    # state 2  a => 3 
 53             {'a':3,'!':4},     # state 3  a => 3,  ! => 4 
 54             { }          # state 4, final state, undef for everything. 
 55                  ] 
 56  ## (4,): a singleton tuple containing 4. Attn: (4) Not a tuple! (4) == 4 
 57  finals1 = (4,) 
 58   
 59  M1 = ('ab!',table1,finals1) 
 60   
 61   
 62  # Example of list accessing 
 63  # L = ['a','b','c'] 
 64  # L[2] 
 65  # Example of string accessing 
 66  # S = 'abcd' 
 67  # S[3] 
 68  # Example of dictionary accessing 
 69  # D = {'a':0,'b':1, 'c':2} 
 70  # D['b'] 
 71   
72 -def simple_det_recognize(machine,string):
73 """ 74 See U{Jurafsky and Martin, Speech and Natural Language Processing, Fig 2.13<http://www-rohan.sdsu.edu/~gawron/compling/chap2/fig02.13.pdf>} 75 76 >>> simple_det_recognize(M1,'Baaa!') 77 False 78 >>> simple_det_recognize(M1,'baaa!') 79 True 80 >>> simple_det_recognize(M1,'baaa') 81 False 82 83 @param machine: a triple of an alphabet, a transition table and a list of final states 84 @type machine: tuple 85 @param string: the string to be accepted or rejected 86 @rtype: boolean 87 @return: True if the string is accepted by C{machine} 88 """ 89 (alphabet,table,finals) = machine 90 index = 0 91 state = 0 92 for index in range(len(string)): 93 state = get_transition_value(table[state],string[index]) 94 if state == 'und': 95 return False 96 if state in finals: 97 return True 98 else: 99 return False
100 101
102 -def get_transition_value (transitions, symbol):
103 """ 104 Having this function wrapper here allows 105 some abstraction in how transition functions are 106 implemented. 107 108 @param transitions: transition map from chars to next state for this state 109 @type transitions: dictionary (Java HashMap) 110 @rtype: C{int} or C{'und'} 111 """ 112 if transitions.has_key(symbol): 113 return transitions[symbol] 114 else: 115 return 'und'
116
117 -def set_transition_value (transitions, sym, value):
118 """ 119 Having this function wrapper here allows 120 some abstraction in how transition functions are 121 implemented. 122 123 @param transitions: transition map from chars to next state for this state 124 @type transitions: dictionary (Java HashMap) 125 @param sym: the symbol for which the transition mapping is being defined. 126 @rtype: C{int} or C{'und'} 127 """ 128 transitions[sym] = value
129
130 -def det_recognize (machine,string):
131 """ 132 Differs from L{simple_det_recognize} in printing 133 explicit error diagnostics for failure. For example, for the case where symbols 134 not in the alphabet occur in B{string}. Instead of raising 135 a KeyError, X{det_recognize} printa an explicit error message 136 and fails. 137 138 Also: reports when a state is undefined for input symbol 139 in alphabet before failing, And reports being in a nonfinal 140 state at ethe termination of input before failing. 141 142 @param machine: a pair of a transition table and a list of final 143 @type machine: tuple 144 @param string: the string to be accepted or rejected 145 @rtype: boolean 146 @return: True if the string is accepted by B{machine} 147 """ 148 (alphabet,table,finals) = machine 149 index = 0 150 state = 0 151 for index in range(len(string)): 152 sym = string[index] 153 transitions = table[state] 154 if sym in alphabet: 155 old_state = state 156 state = get_transition_value(transitions,sym) 157 else: 158 print '%s not in input alphabet' % sym 159 return False 160 if state == 'und': 161 print 'State %s not defined for %s' % (old_state, sym) 162 return False 163 if state in finals: 164 return True 165 else: 166 print 'Ending in non-final state %s' % state 167 return False
168 169
170 -def add_sink_state (machine):
171 """ 172 Return a new machine just like C{machine} but with an added sink state, k. 173 If C{new_table} is the new machine's TransitionTable, and C{table} is 174 C{machine}'s TransitionTable, then for each I{sym} in alphabet, 175 get_transition_value(k, I{sym}) = k. For each I{sym} in alphabet and each 176 state i, if get_transition_value(i, I{sym}) = 'und', get_transition_value(i, I{sym}) = k. 177 178 >>> M2=add_sink_state(M1) 179 >>> det_recognize(M1,'baa!') 180 True 181 >>> det_recognize(M2,'baa!') 182 True 183 >>> det_recognize(M1,'baab') 184 State 3 not defined for b 185 False 186 >>> det_recognize(M2,'baab') 187 Ending in non-final state 5 188 False 189 >>> det_recognize(M1,'Baa!') 190 B not in input alphabet 191 False 192 >>> det_recognize(M2,'Baa!') 193 B not in input alphabet 194 False 195 196 @param machine: the machine to be transformed 197 @rtype: Machine 198 """ 199 (alphabet, table, finals) = machine 200 num_states = len(table) 201 sink_transitions = { } 202 for sym in alphabet: 203 ## num_states is sink state index 204 ## transition back to sink state on every symbol 205 set_transition_value(sink_transitions,sym,num_states) 206 new_table = [ ] # Copy so as to leave old machine unchanged. 207 for i in range(num_states): 208 new_transitions = { } 209 new_transitions.update(table[i]) # new transitions now a copy 210 new_table.append(new_transitions) 211 for sym in alphabet: 212 tran_state = get_transition_value(new_transitions,sym) 213 if tran_state == 'und': 214 # transition to sink state 215 set_transition_value(new_transitions,sym,num_states) 216 # Add sink_transitions as last state 217 new_table.append(sink_transitions) 218 return (alphabet,new_table,finals)
219