1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23 """
24 This module defines a non-deterministic finite-state recognizer.
25
26 As with deterministic FSAs, a non-deterministic FSA is a pair of a TransitionTable and a tuple of final states.
27 The difference is that a state-transition function always returns a tuple of states, although
28 that tuple may be a 1-tuple.
29
30 >>> Table = [{'a':(0,),'b':(1,), 'c':(2,)},{'a':(1,),'b':(2,), 'c':(2,)},
31 {'a':(2,),'b':(1,2), 'c':(0,)}]
32 >>> Finals = (0,)
33 >>> Machine = (Table, Finals)
34
35 FSA TransitionTable usage:
36
37 >>> Table[2]['b']
38 (1,2)
39
40 Explanation: When in state 2, transition to either state 1 or 2
41 when seeing a 'b'.
42
43 For epsilon machines (machines allowing transitions on empty strings),
44 we add 'S{epsilon}' to the alphabet and change the recognizer not to
45 advance in the string when making an S{epsilon} transition.
46
47 Machine: A pair of an FSA transition table and a tuple of final states.
48
49 TransitionTable: A list of states
50
51 state: A dictionary whose keys are all the members of
52 of the input alphabet (plus 'epilson') and whose values are tuples of states transitioned to.
53
54 undefined: the tuple ('und',), used as the value in a state dictionary for
55 an input for which the state has no defined transition.
56
57 @attention: a machine's state dictionaries must all be defined for the same alphabet.
58 Machines will raise KeyEror exceptions for input not in alphabet.
59 @var nd_finals: The tuple of final states for C{M3} and C{M4}.
60 @type nd_finals: tuple
61 @var nd_table: The TransitionTable for C{M3} [no epsilons].
62 @var nd_table_eps: The TransitionTable for C{M4} [has epsilon in alphabet].
63 @type nd_table: TransitionTable
64 @type nd_table_eps: TransitionTable
65 @var M3: The nondet sheep language machine (U{Jurafsky and Martin, Speech and Natural Language Processing, Fig 2.18<http://www-rohan.sdsu.edu/~gawron/compling/chap2/fig02.18.pdf>})
66 @type M3: Machine
67 @var M4: The nondet sheep language machine with S{epsilon} transitions (U{Jurafsky and Martin, Speech and Natural Language Processing, Fig 2.19<http://www-rohan.sdsu.edu/~gawron/compling/chap2/fig02.18.pdf>})
68 @type M4: Machine
69 @sort: simple_nondet_recognize nondet_recognize
70 @author: Jean Mark Gawron
71 @contact: gawron@mail.sdsu.edu
72 """
73
75 """
76 Recognize string C{string} with nondeterministic FSA C{machine}, returning
77 True if C{string} belongs to the laguage defined by C{machine}, else False.
78
79 Create C{agenda} to keep track of unexplored states; C{agenda} is a
80 list of X{search states}. An search state is a pair
81 of a machine state and a string position. Initialize C{agenda}
82 with a search state consisting of the C{machine}'s start state (0) and
83 string index 0.
84
85 On each iteration of the while-loop, check to see if search state
86 C{current-state} is an accept state. If so return.
87
88 If the C{current state}'s string position is not past the end of C{string},
89 extend the C{agenda}, applying C{current_state}'s machine state
90 to that string position. Then, if the C{agenda} is
91 non-empty, pop the C{agenda} and loop. Otherwise fail.
92 See (U{Jurafsky and Martin, Speech and Natural Language Processing, Fig 2.21<http://www-rohan.sdsu.edu/~gawron/compling/chap2/fig02.21.pdf>})
93
94 @attention: Quietly fails for input containing symbols not in input alphabet.
95
96 @param string: The string being recognized.
97 @param machine: The machine defining the language.
98 @rtype: Boolean.
99 """
100 (table,finals) = machine
101 agenda = []
102 agenda.extend([(0,0)])
103 last_index = len(string)
104 current_search_state = agenda.pop()
105 while True:
106 if accept_state(current_search_state,last_index,finals):
107 return True
108 elif current_search_state[1] < last_index:
109 agenda.extend(simple_generate_new_states(current_search_state,table,string))
110 if agenda:
111 current_search_state = agenda.pop()
112 else:
113 return False
114
116 """
117 Return True iff string position of @C{search_state} is at the end of the string
118 and the machine state of @C{search_state} is a final state.
119
120 @param search_state: a pair of a machine state and a tape position
121 @type search_state: SearchState
122 @param last_index: length of the current string
123 @type last_index: int
124 @param finals: list of final states for the current machine
125 @type finals: list
126 @rtype: Boolean
127 """
128 (machine_state,string_position) = search_state
129 if string_position == last_index and machine_state in finals:
130 return True
131 else:
132 return False
133
134
135
136
137
139 """
140 Return the list of states derived by applying C{search_state}
141 to C{table}.
142
143 C{search_state} is a pair of a string position in C{string}
144 and a machine_state C{current_machine_state}. New states are
145 those generated by the transitions for C{current_machine_state}
146 in C{table} for either the symbol at strng position or epilson.
147
148 We assume every input alphabet includes epsilon
149 and therefore that every transition table is defined
150 for epsilon.
151
152 @param search state: the current search state just popped off C{agenda}.
153 @param table: the TransitionTable for the current machine.
154 @param string: the string being recognized.
155 @rtype: a list of search states
156 """
157 (current_machine_state,string_position) = search_state
158 current_input = string[string_position]
159
160 states = [(machine_state,string_position) for machine_state in \
161 table[current_machine_state]['epsilon'] if not machine_state == 'und']
162 next_tape_index = string_position + 1
163
164 try:
165 other_states = [(machine_state,next_tape_index) for machine_state in \
166 table[current_machine_state][current_input] if not machine_state == 'und']
167 except:
168
169 other_states = []
170 states.extend(other_states)
171 return states
172
174 """
175 Recognize string C{string} with nondeterministic FSA C{machine}, returning
176 True if C{string} belongs to the language defined by C{machine}, else False.
177
178 Differs from L{simple_nondet_recognize} in calling L{generate_new_states},
179 instead of L{simple_generate_new_states},which reports backtracking and symbols
180 not in the input alphabet. But accepts and rejects the same classes of strings.
181
182 >>> nondet_recognize(M3, 'baa!')
183 Trying (0, 0)
184 ...
185 Failing search state (2, 3): Backtracking
186 ....
187 True
188 >>> nondet_recognize(M3, 'baa')
189 ...
190 False
191 >>> nondet_recognize(M3, 'Baa!')
192 Trying (0, 0)
193 B not in input alphabet
194 Failing search state (0, 0): Backtracking
195 False
196 >>> nondet_recognize(M3, 'baah!')
197 Trying (0, 0)
198 ...
199 Failing search state (2, 3): Backtracking
200 ...
201 False
202
203 @param string: The string being recognized.
204 @param machine: The machine defining the language.
205 @rtype: Boolean.
206 """
207 (table,finals) = machine
208 agenda = []
209 last_index = len(string)
210 agenda.extend([(0,0)])
211 current_search_state = pop_agenda(agenda)
212 while True:
213 if accept_state(current_search_state,last_index,finals):
214 return True
215 elif current_search_state[1] < last_index:
216 agenda.extend(generate_new_states(current_search_state,table,string))
217 if agenda:
218 current_search_state = pop_agenda(agenda)
219 else:
220 return False
221
223 """
224 Returns new states compatible with C{search_state}, C{table}, and
225 C{string} (see L{simple_generate_new_states}), but in addition
226 reports backtracking and symbols not in the input alphabet.
227
228 @param search state: the currenmt search state just popped off C{agenda}.
229 @param table: the TransitionTable for the current machine.
230 @param string: the string being recognized.
231 @rtype: a list of search states
232 """
233 (current_machine_state,string_position) = search_state
234 current_input = string[string_position]
235
236 states = [(machine_state,string_position) for machine_state in \
237 table[current_machine_state]['epsilon'] if not machine_state == 'und']
238 next_tape_index = string_position + 1
239 if states:
240 print ' %s[eps] ==> %s' % (search_state,states)
241
242 try:
243 other_states = [(machine_state,next_tape_index) for machine_state in \
244 table[current_machine_state][current_input] if not machine_state == 'und']
245 except:
246
247 print ' %s not in input alphabet' % current_input
248 other_states = []
249 states.extend(other_states)
250 if other_states:
251 print ' %s[%s] ==> %s' % (search_state,current_input,other_states)
252 if not states:
253 print ' Failing search state %s: Backtracking' % (search_state,)
254 return states
255
256
257
259 next_search_state = agenda.pop()
260 print 'Trying %s' % (next_search_state,)
261 return next_search_state
262
263
264
265 nd_table = [ {'a':( 'und',), 'b':( 1,), '!':('und',),'epsilon': ('und',)},
266 {'a':( 2,), 'b':('und',), '!':('und',),'epsilon': ('und',)},
267 {'a':( 3,2), 'b':( 'und',), '!':('und',),'epsilon': ('und',)},
268 {'a':( 'und',), 'b':( 'und',), '!':(4,),'epsilon': ('und',)},
269 {'a':( 'und',), 'b':( 'und',), '!':('und',),'epsilon': ('und',)}
270 ]
271
272
273 nd_table_eps = [ {'a':( 'und',), 'b':( 1,), '!':('und',),'epsilon': ('und',)},
274 {'a':( 2,), 'b':('und',), '!':('und',),'epsilon': ('und',)},
275 {'a':( 3,), 'b':( 'und',), '!':('und',),'epsilon': ('und',)},
276 {'a':( 'und',), 'b':( 'und',), '!':(4,),'epsilon': (2,)},
277 {'a':( 'und',), 'b':( 'und',), '!':('und',),'epsilon': ('und',)}
278 ]
279
280
281 nd_finals = (4,)
282
283 M3 = (nd_table,nd_finals)
284 M4 = (nd_table_eps,nd_finals)
285
286
287
288
289
291 """
292 Return a copy of C{table}, C{new_table}, such that
293 C{new_table} adds a sink state, k. For each I{sym} in alphabet,
294 new_table[k][sym]=k. For each I{sym} in alphabet and each
295 state i, if table[i][I{sym}]='und', new_table[i][I{sym}]=k.
296
297 @param table: a list of k dictionaries, all defined for the same alphabet, all with
298 numerical values < k, representing an FSA transition table, each k
299 corresponding to a state.
300 @rtype: TransitionTable
301 """
302 new_state = len(table)
303 alphabet = table[0].keys()
304 table_copy = [D.copy() for D in table]
305 for D in table_copy:
306 for sym in alphabet:
307 if D[sym] == 'und':
308 D[sym] = new_state
309 new_state_table = {}
310 for sym in alphabet:
311 new_state_table[sym] = new_state
312 return table_copy+[new_state_table]
313
315 (table, finals) = machine
316 new_table = add_sink_state(table)
317 index = 0
318 state = 0
319 last_index = len(string)
320 while index < last_index:
321 new_state = new_table[state][string[index]]
322 state = new_state
323 index = index +1
324 if state in finals:
325 return True
326 else:
327 return False
328