Package fsa_recognizer :: Module nondet_fsa_rec
[hide private]
[frames] | no frames]

Source Code for Module fsa_recognizer.nondet_fsa_rec

  1  #################################################################################################### 
  2  #####                N o n   D e t e r m i n i s t i c     M a c h i n e s 
  3  #################################################################################################### 
  4   
  5  # Fig 2.21 pseudocode J&M 
  6  # Example of list popping and pushing: Stack protocol 
  7  # L = [1,2,3] 
  8  # M = L.pop() 
  9  # L.append(3) 
 10  # Note the following is probably NOT what is wanted, since 
 11  # append is destructive and returns None 
 12  # >>> N = L.append(3) 
 13  # >>> N == None 
 14  # True 
 15  ##  extend is like append but adds on a list. 
 16  ##  This it is the destructive analogue of list concatentaion with "=". 
 17  ##  Examples of list extension: 
 18  ##  >>> L = [1,2,3] 
 19  ##  >>> L.extend([4,5,6]) 
 20  ### >>> L 
 21  ### [1, 2, 3, 4, 5, 6] 
 22   
 23  """ 
 24  This module defines a non-deterministic finite-state recognizer. 
 25   
 26  As with deterministic FSAs, a non-deterministic FSA is a pair of a TransitionTable and a tuple of final states. 
 27  The difference is that a state-transition function always returns a tuple of states, although 
 28  that tuple may be a 1-tuple. 
 29   
 30    >>>  Table = [{'a':(0,),'b':(1,), 'c':(2,)},{'a':(1,),'b':(2,), 'c':(2,)}, 
 31                  {'a':(2,),'b':(1,2), 'c':(0,)}] 
 32    >>>  Finals = (0,) 
 33    >>>  Machine = (Table, Finals) 
 34   
 35  FSA TransitionTable usage: 
 36   
 37    >>> Table[2]['b'] 
 38    (1,2) 
 39   
 40  Explanation: When in state 2, transition to either state 1 or 2 
 41  when seeing a 'b'. 
 42   
 43  For epsilon machines (machines allowing transitions on empty strings), 
 44  we add 'S{epsilon}' to the alphabet and change the recognizer not to 
 45  advance in the string when making an S{epsilon} transition. 
 46   
 47  Machine:    A pair of an FSA transition table and a tuple of final states. 
 48   
 49  TransitionTable:    A list of states 
 50       
 51  state:    A dictionary whose keys are all the members of 
 52  of the input alphabet (plus 'epilson') and whose values are tuples of states transitioned to. 
 53       
 54  undefined:     the tuple ('und',), used as the value in a state dictionary for 
 55  an input for which the state has no defined transition. 
 56   
 57  @attention: a machine's state dictionaries must all be defined for the same alphabet. 
 58              Machines will raise KeyEror exceptions for input not in alphabet. 
 59  @var nd_finals: The tuple of final states for C{M3} and C{M4}. 
 60  @type nd_finals: tuple 
 61  @var nd_table: The TransitionTable for C{M3} [no epsilons]. 
 62  @var nd_table_eps: The TransitionTable for C{M4} [has epsilon in alphabet]. 
 63  @type nd_table: TransitionTable 
 64  @type nd_table_eps: TransitionTable 
 65  @var M3: The nondet sheep language machine (U{Jurafsky and Martin, Speech and Natural Language Processing, Fig 2.18<http://www-rohan.sdsu.edu/~gawron/compling/chap2/fig02.18.pdf>}) 
 66  @type M3: Machine 
 67  @var M4: The nondet sheep language machine with S{epsilon} transitions (U{Jurafsky and Martin, Speech and Natural Language Processing, Fig 2.19<http://www-rohan.sdsu.edu/~gawron/compling/chap2/fig02.18.pdf>}) 
 68  @type M4: Machine 
 69  @sort: simple_nondet_recognize nondet_recognize  
 70  @author:  Jean Mark Gawron 
 71  @contact: gawron@mail.sdsu.edu 
 72  """ 
 73   
74 -def simple_nondet_recognize(machine,string):
75 """ 76 Recognize string C{string} with nondeterministic FSA C{machine}, returning 77 True if C{string} belongs to the laguage defined by C{machine}, else False. 78 79 Create C{agenda} to keep track of unexplored states; C{agenda} is a 80 list of X{search states}. An search state is a pair 81 of a machine state and a string position. Initialize C{agenda} 82 with a search state consisting of the C{machine}'s start state (0) and 83 string index 0. 84 85 On each iteration of the while-loop, check to see if search state 86 C{current-state} is an accept state. If so return. 87 88 If the C{current state}'s string position is not past the end of C{string}, 89 extend the C{agenda}, applying C{current_state}'s machine state 90 to that string position. Then, if the C{agenda} is 91 non-empty, pop the C{agenda} and loop. Otherwise fail. 92 See (U{Jurafsky and Martin, Speech and Natural Language Processing, Fig 2.21<http://www-rohan.sdsu.edu/~gawron/compling/chap2/fig02.21.pdf>}) 93 94 @attention: Quietly fails for input containing symbols not in input alphabet. 95 96 @param string: The string being recognized. 97 @param machine: The machine defining the language. 98 @rtype: Boolean. 99 """ 100 (table,finals) = machine 101 agenda = [] # Always start with empty agenda 102 agenda.extend([(0,0)]) # Initial search state = (starting_machine_state, beginning of tape) 103 last_index = len(string) 104 current_search_state = agenda.pop() 105 while True: 106 if accept_state(current_search_state,last_index,finals): 107 return True 108 elif current_search_state[1] < last_index: 109 agenda.extend(simple_generate_new_states(current_search_state,table,string)) 110 if agenda: 111 current_search_state = agenda.pop() 112 else: 113 return False
114
115 -def accept_state(search_state,last_index,finals):
116 """ 117 Return True iff string position of @C{search_state} is at the end of the string 118 and the machine state of @C{search_state} is a final state. 119 120 @param search_state: a pair of a machine state and a tape position 121 @type search_state: SearchState 122 @param last_index: length of the current string 123 @type last_index: int 124 @param finals: list of final states for the current machine 125 @type finals: list 126 @rtype: Boolean 127 """ 128 (machine_state,string_position) = search_state 129 if string_position == last_index and machine_state in finals: 130 return True 131 else: 132 return False
133 134 ## Examples of "list comprehension" in Python: 135 ## [x+1 for x in [1,2,3]] 136 ## [(x,1) for x in [1,2,3]] 137 ## [(x,1) for x in [1,2,3] if not x == 2]
138 -def simple_generate_new_states(search_state,table,string):
139 """ 140 Return the list of states derived by applying C{search_state} 141 to C{table}. 142 143 C{search_state} is a pair of a string position in C{string} 144 and a machine_state C{current_machine_state}. New states are 145 those generated by the transitions for C{current_machine_state} 146 in C{table} for either the symbol at strng position or epilson. 147 148 We assume every input alphabet includes epsilon 149 and therefore that every transition table is defined 150 for epsilon. 151 152 @param search state: the current search state just popped off C{agenda}. 153 @param table: the TransitionTable for the current machine. 154 @param string: the string being recognized. 155 @rtype: a list of search states 156 """ 157 (current_machine_state,string_position) = search_state 158 current_input = string[string_position] 159 ## First grab any epsilon transitions 160 states = [(machine_state,string_position) for machine_state in \ 161 table[current_machine_state]['epsilon'] if not machine_state == 'und'] 162 next_tape_index = string_position + 1 163 ## Now grab any transitions defined for current_input 164 try: 165 other_states = [(machine_state,next_tape_index) for machine_state in \ 166 table[current_machine_state][current_input] if not machine_state == 'und'] 167 except: 168 # current_input not in alphabet 169 other_states = [] 170 states.extend(other_states) # append the other_states onto epsilon_state 171 return states
172
173 -def nondet_recognize(machine, string):
174 """ 175 Recognize string C{string} with nondeterministic FSA C{machine}, returning 176 True if C{string} belongs to the language defined by C{machine}, else False. 177 178 Differs from L{simple_nondet_recognize} in calling L{generate_new_states}, 179 instead of L{simple_generate_new_states},which reports backtracking and symbols 180 not in the input alphabet. But accepts and rejects the same classes of strings. 181 182 >>> nondet_recognize(M3, 'baa!') 183 Trying (0, 0) 184 ... 185 Failing search state (2, 3): Backtracking 186 .... 187 True 188 >>> nondet_recognize(M3, 'baa') 189 ... 190 False 191 >>> nondet_recognize(M3, 'Baa!') 192 Trying (0, 0) 193 B not in input alphabet 194 Failing search state (0, 0): Backtracking 195 False 196 >>> nondet_recognize(M3, 'baah!') 197 Trying (0, 0) 198 ... 199 Failing search state (2, 3): Backtracking 200 ... 201 False 202 203 @param string: The string being recognized. 204 @param machine: The machine defining the language. 205 @rtype: Boolean. 206 """ 207 (table,finals) = machine 208 agenda = [] 209 last_index = len(string) 210 agenda.extend([(0,0)]) # Initial search state = (starting_machine_state, beginning of tape) 211 current_search_state = pop_agenda(agenda) 212 while True: 213 if accept_state(current_search_state,last_index,finals): 214 return True 215 elif current_search_state[1] < last_index: 216 agenda.extend(generate_new_states(current_search_state,table,string)) 217 if agenda: 218 current_search_state = pop_agenda(agenda) 219 else: 220 return False
221
222 -def generate_new_states(search_state,table,string):
223 """ 224 Returns new states compatible with C{search_state}, C{table}, and 225 C{string} (see L{simple_generate_new_states}), but in addition 226 reports backtracking and symbols not in the input alphabet. 227 228 @param search state: the currenmt search state just popped off C{agenda}. 229 @param table: the TransitionTable for the current machine. 230 @param string: the string being recognized. 231 @rtype: a list of search states 232 """ 233 (current_machine_state,string_position) = search_state 234 current_input = string[string_position] 235 ## First grab any epsilon table 236 states = [(machine_state,string_position) for machine_state in \ 237 table[current_machine_state]['epsilon'] if not machine_state == 'und'] 238 next_tape_index = string_position + 1 239 if states: 240 print ' %s[eps] ==> %s' % (search_state,states) 241 ## Now grab any transitions defined for current_input 242 try: 243 other_states = [(machine_state,next_tape_index) for machine_state in \ 244 table[current_machine_state][current_input] if not machine_state == 'und'] 245 except: 246 # current_input not in alphabet 247 print ' %s not in input alphabet' % current_input 248 other_states = [] 249 states.extend(other_states) # append the other_states onto epsilon_state 250 if other_states: 251 print ' %s[%s] ==> %s' % (search_state,current_input,other_states) 252 if not states: 253 print ' Failing search state %s: Backtracking' % (search_state,) 254 return states
255 256 257
258 -def pop_agenda (agenda):
259 next_search_state = agenda.pop() 260 print 'Trying %s' % (next_search_state,) 261 return next_search_state
262 263 # Non deterministic sheep language machine 264 # Fig 2.18 J & M 265 nd_table = [ {'a':( 'und',), 'b':( 1,), '!':('und',),'epsilon': ('und',)}, # state 0 b => 1 266 {'a':( 2,), 'b':('und',), '!':('und',),'epsilon': ('und',)}, # state 1 a => 2 267 {'a':( 3,2), 'b':( 'und',), '!':('und',),'epsilon': ('und',)}, # state 2 a => 3,2 268 {'a':( 'und',), 'b':( 'und',), '!':(4,),'epsilon': ('und',)}, # state 3 ! => 4 269 {'a':( 'und',), 'b':( 'und',), '!':('und',),'epsilon': ('und',)} # state 4 undef for everything. 270 ] 271 272 # Fig 2.19 J & M 273 nd_table_eps = [ {'a':( 'und',), 'b':( 1,), '!':('und',),'epsilon': ('und',)}, # state 0 b => 1 274 {'a':( 2,), 'b':('und',), '!':('und',),'epsilon': ('und',)}, # state 1 a => 2 275 {'a':( 3,), 'b':( 'und',), '!':('und',),'epsilon': ('und',)}, # state 2 a => 3 276 {'a':( 'und',), 'b':( 'und',), '!':(4,),'epsilon': (2,)}, # state 3 a => 3, ! => 4 277 {'a':( 'und',), 'b':( 'und',), '!':('und',),'epsilon': ('und',)} # state 4 undef for everything. 278 ] 279 280 ## Finals always a tuple whose members are all final states. 281 nd_finals = (4,) 282 283 M3 = (nd_table,nd_finals) 284 M4 = (nd_table_eps,nd_finals) 285 286 ####################################################### 287 ### Utils 288 ####################################################### 289
290 -def add_sink_state(table):
291 """ 292 Return a copy of C{table}, C{new_table}, such that 293 C{new_table} adds a sink state, k. For each I{sym} in alphabet, 294 new_table[k][sym]=k. For each I{sym} in alphabet and each 295 state i, if table[i][I{sym}]='und', new_table[i][I{sym}]=k. 296 297 @param table: a list of k dictionaries, all defined for the same alphabet, all with 298 numerical values < k, representing an FSA transition table, each k 299 corresponding to a state. 300 @rtype: TransitionTable 301 """ 302 new_state = len(table) 303 alphabet = table[0].keys() 304 table_copy = [D.copy() for D in table] 305 for D in table_copy: 306 for sym in alphabet: 307 if D[sym] == 'und': 308 D[sym] = new_state 309 new_state_table = {} 310 for sym in alphabet: 311 new_state_table[sym] = new_state 312 return table_copy+[new_state_table]
313
314 -def simple_det_recognize_with_sink_state(machine,string):
315 (table, finals) = machine 316 new_table = add_sink_state(table) 317 index = 0 318 state = 0 319 last_index = len(string) 320 while index < last_index: 321 new_state = new_table[state][string[index]] 322 state = new_state 323 index = index +1 324 if state in finals: 325 return True 326 else: 327 return False
328