1 """
2 This module defines a deterministic finite-state recognizer. No S{epsilon}-transitions are allowed.
3
4 An FSA (here simply called a C{Machine}) is a triple of an alphabet, a TransitionTable and a tuple of final states:
5
6 - C{alphabet}: A sequence, the elements define the set of symbols in the alphabet of C{Machine}
7 - C{TransitionTable}: A list of StateTransitionFunctions
8 - C{StateTransitionFunction}: A dictionary whose keys are the members of the input alphabet and whose values are the states transitioned to.
9
10 Keys and values in transition functions come from the following sets:
11
12 - C{alphabet}: Must be the same for all states in a machine
13 - C{states}: the set of states of the machine, the first n-1 integers (including 0) for a machine of n states
14 - C{undefined}: the string 'und', used as the value in a StateTransitionFunction for an alphabet symbol for which the state has no defined transition.
15
16 Example machine:
17
18 >>> Table = [{'b':1},{'a':1,'b':2,'c':0},{'a':2,'b':2}]
19 >>> Finals = (0,)
20 >>> Machine = ('abc', Table, Finals)
21
22 In this machine, the alphabet is M{{a,b,c}}, and there are 3 states, 0,
23 1, and 2, defined by 3 StateTransitionFunctions (U{diagram<http://www-rohan.sdsu.edu/~gawron/compling/chap2/silly_graph.pdf>}). Lists are indexed from
24 0, so the state-transition function for state 2 is the 3rd in C{Table}.
25
26 >>> get_transition_value(Table[2],'b')
27 2
28
29 When in state 2, transition back to state 2 upon seeing a 'b'.
30
31 >>> get_transition_value(Table[2],'c')
32 'und'
33
34 State 2 is undefined for the input 'c'. The FSA fails when seeing a
35 'c' in state 2.
36
37 @attention: a machine's state dictionaries must all be defined for the same alphabet.
38 @var finals1: The tuple of final states for C{M1}
39 @type finals1: tuple
40 @var table1: The TransitionTable for C{M1}.
41 @type table1: TransitionTable
42 @var M1: The deterministic sheep language machine (U{Jurafsky and Martin, Speech and Natural Language Processing, Fig 2.12<http://www-rohan.sdsu.edu/~gawron/compling/chap2/fig02.12.pdf>})
43 @type M1: Machine
44 @sort: simple_det_recognize get_transition_value set_transition_value det_recognize add_sink_state
45 @author: Jean Mark Gawron
46 @contact: gawron@mail.sdsu.edu
47 """
48
49
50 table1 = [ {'b':1},
51 {'a': 2},
52 {'a': 3},
53 {'a':3,'!':4},
54 { }
55 ]
56
57 finals1 = (4,)
58
59 M1 = ('ab!',table1,finals1)
60
61
62
63
64
65
66
67
68
69
70
71
73 """
74 See U{Jurafsky and Martin, Speech and Natural Language Processing, Fig 2.13<http://www-rohan.sdsu.edu/~gawron/compling/chap2/fig02.13.pdf>}
75
76 >>> simple_det_recognize(M1,'Baaa!')
77 False
78 >>> simple_det_recognize(M1,'baaa!')
79 True
80 >>> simple_det_recognize(M1,'baaa')
81 False
82
83 @param machine: a triple of an alphabet, a transition table and a list of final states
84 @type machine: tuple
85 @param string: the string to be accepted or rejected
86 @rtype: boolean
87 @return: True if the string is accepted by C{machine}
88 """
89 (alphabet,table,finals) = machine
90 index = 0
91 state = 0
92 for index in range(len(string)):
93 state = get_transition_value(table[state],string[index])
94 if state == 'und':
95 return False
96 if state in finals:
97 return True
98 else:
99 return False
100
101
103 """
104 Having this function wrapper here allows
105 some abstraction in how transition functions are
106 implemented.
107
108 @param transitions: transition map from chars to next state for this state
109 @type transitions: dictionary (Java HashMap)
110 @rtype: C{int} or C{'und'}
111 """
112 if transitions.has_key(symbol):
113 return transitions[symbol]
114 else:
115 return 'und'
116
118 """
119 Having this function wrapper here allows
120 some abstraction in how transition functions are
121 implemented.
122
123 @param transitions: transition map from chars to next state for this state
124 @type transitions: dictionary (Java HashMap)
125 @param sym: the symbol for which the transition mapping is being defined.
126 @rtype: C{int} or C{'und'}
127 """
128 transitions[sym] = value
129
131 """
132 Differs from L{simple_det_recognize} in printing
133 explicit error diagnostics for failure. For example, for the case where symbols
134 not in the alphabet occur in B{string}. Instead of raising
135 a KeyError, X{det_recognize} printa an explicit error message
136 and fails.
137
138 Also: reports when a state is undefined for input symbol
139 in alphabet before failing, And reports being in a nonfinal
140 state at ethe termination of input before failing.
141
142 @param machine: a pair of a transition table and a list of final
143 @type machine: tuple
144 @param string: the string to be accepted or rejected
145 @rtype: boolean
146 @return: True if the string is accepted by B{machine}
147 """
148 (alphabet,table,finals) = machine
149 index = 0
150 state = 0
151 for index in range(len(string)):
152 sym = string[index]
153 transitions = table[state]
154 if sym in alphabet:
155 old_state = state
156 state = get_transition_value(transitions,sym)
157 else:
158 print '%s not in input alphabet' % sym
159 return False
160 if state == 'und':
161 print 'State %s not defined for %s' % (old_state, sym)
162 return False
163 if state in finals:
164 return True
165 else:
166 print 'Ending in non-final state %s' % state
167 return False
168
169
171 """
172 Return a new machine just like C{machine} but with an added sink state, k.
173 If C{new_table} is the new machine's TransitionTable, and C{table} is
174 C{machine}'s TransitionTable, then for each I{sym} in alphabet,
175 get_transition_value(k, I{sym}) = k. For each I{sym} in alphabet and each
176 state i, if get_transition_value(i, I{sym}) = 'und', get_transition_value(i, I{sym}) = k.
177
178 >>> M2=add_sink_state(M1)
179 >>> det_recognize(M1,'baa!')
180 True
181 >>> det_recognize(M2,'baa!')
182 True
183 >>> det_recognize(M1,'baab')
184 State 3 not defined for b
185 False
186 >>> det_recognize(M2,'baab')
187 Ending in non-final state 5
188 False
189 >>> det_recognize(M1,'Baa!')
190 B not in input alphabet
191 False
192 >>> det_recognize(M2,'Baa!')
193 B not in input alphabet
194 False
195
196 @param machine: the machine to be transformed
197 @rtype: Machine
198 """
199 (alphabet, table, finals) = machine
200 num_states = len(table)
201 sink_transitions = { }
202 for sym in alphabet:
203
204
205 set_transition_value(sink_transitions,sym,num_states)
206 new_table = [ ]
207 for i in range(num_states):
208 new_transitions = { }
209 new_transitions.update(table[i])
210 new_table.append(new_transitions)
211 for sym in alphabet:
212 tran_state = get_transition_value(new_transitions,sym)
213 if tran_state == 'und':
214
215 set_transition_value(new_transitions,sym,num_states)
216
217 new_table.append(sink_transitions)
218 return (alphabet,new_table,finals)
219