Home | Trees | Indices | Help |
---|
|
A finite state transducer (FST) recognizer.
Our example language relation is transduced sheep language. The upper alphabet will be:
The upper language will be sheep language. The lower alphabet will be:
We will transduce all as into 1s, bs into 2s and 3s and pass all exclamation points through. So the feasible pairs will be:
We generalize a TransitionTable
to FSTs: an FST will be a
list of dictionaries of dictionaries.
For a state σ that includes the following transitions:
the corresponding state-transition function is:
{'a': {1: (4,5)} 'b': {2: (3, )}}
|
|||
Verbose |
|
|||
Boolean |
|
||
|
|||
|
|||
List of strings |
|
||
List |
|
||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|
|||
epsilon_print_name =
|
|||
alphabet =
|
|||
arity_dec_re = re.compile(r'Arity:\s
|
|||
att_val_re = re.compile(r'\s
|
|||
att_val_re2 = re.compile(r'\s
|
|||
blank_line_re = re.compile(r'\s
|
|||
char_interval_re = re.compile(r'
|
|||
einsertion_machine transducer for the einsertion rule given in (Jurafsky and Martin, Speech and Natural Language Processing, Fig 3.14). |
|||
flags_re = re.compile(r'Flags:
|
|||
graph_type_re = re.compile(r'digraph')
|
|||
label_re = re.compile(r'
|
|||
net_dec_re = re.compile(r'Net:')
|
|||
node_connection_re = re.compile(r'\s
|
|||
node_dec_re = re.compile(r'\s
|
|||
node_name_re = re.compile(r'\s
|
|||
sigma_dec_re = re.compile(r'Sigma:
|
|||
size_dec_re = re.compile(r'Size:\s
|
|||
state_desc_re = re.compile(r'
|
|||
transduced_sheep_language =
|
|||
transduced_sheep_language2 =
|
|||
transducer_finals =
|
|||
transductions =
|
|||
transductions2 =
|
|
Return True if Uses an agenda just as in the nondet recognition algorithm in fsa_recognizer.nondet_fsa_rec.nondet_recognize. However an agenda search state now consists of a triple, a machine state and two string positions, the position on the upper tape and the position on the lower tape. >>> nondet_transduce_recognize('baa!','211!', transduced_sheep_language) (0, 0, 0)[b:2] ==> [(1, 1, 1)] (1, 1, 1)[a:1] ==> [(2, 2, 2)] (2, 2, 2)[a:1] ==> [(3, 3, 3), (2, 3, 3)] Failing search state (2, 3, 3): Backtracking (3, 3, 3)[!:!] ==> [(4, 4, 4)] True >>> nondet_transduce_recognize('baaa!','3111!', transduced_sheep_language) (0, 0, 0)[b:3] ==> [(1, 1, 1)] (1, 1, 1)[a:1] ==> [(2, 2, 2)] (2, 2, 2)[a:1] ==> [(3, 3, 3), (2, 3, 3)] (2, 3, 3)[a:1] ==> [(3, 4, 4), (2, 4, 4)] Failing search state (2, 4, 4): Backtracking (3, 4, 4)[!:!] ==> [(4, 5, 5)] True More examples: >>> nondet_transduce_recognize('baa','211',transduced_sheep_language) ... False >>> nondet_transduce_recognize('baa!','211!',transduced_sheep_language) ... True >>> nondet_transduce_recognize('baaaa!','21111!',transduced_sheep_language) ... True >>> nondet_transduce_recognize('baaaa!','2111!',transduced_sheep_language) ... False The last example fails: Too many a's, not enough 1's. >>> nondet_transduce_recognize('baaa!','21111!',transduced_sheep_language) False Too many 1's, not enough a's. Recognition is initialized with a search consisting of the the starting machine state and the start of tape1 and tape2: Initial search state = (starting_machine_state, beginning of tape1, beginning of tape2) = (0,0,0)
|
We assume every input alphabet includes ε and therefore that every transition table is defined for ε. We deal with ε-transitions first, if any. Compare fsa_recognizer.nondet_fsa_rec.generate_new_states). |
Return the set of strings for lower langauge that are paired by
Uses an agenda just as in the nondet recognition algorithm in fsa_recognizer.nondet_fsa_rec.nondet_recognize. However an agenda search state now consists of a triple, a machine state and a surface string position, and the lexical string constructed thus far. >>> apply_dn('baa!', transduced_sheep_language) (0, 0, 0)[b:2] ==> [(1, 1, 1)] (1, 1, 1)[a:1] ==> [(2, 2, 2)] (2, 2, 2)[a:1] ==> [(3, 3, 3), (2, 3, 3)] Failing search state (2, 3, 3): Backtracking (3, 3, 3)[!:!] ==> [(4, 4, 4)] {211!} More examples: >>> apply_dn('baa',transduced_sheep_language) ... {} >>> apply_dn('baa!',transduced_sheep_language) ... {211!} >>> apply_dn('baaaa!',transduced_sheep_language) ... {21111!}
|
Return the set of strings for surface langauge that are paired by
Uses an agenda just as in the nondet recognition algorithm in fsa_recognizer.nondet_fsa_rec.nondet_recognize. However an agenda search state triple now consists of a machine state, a lex string position, and the surface string constructed thus far. >>> apply_up('311!',transduced_sheep_language) 0[3:b] ==> 1 1[1:a] ==> 2 b 2[1:a] ==> 3 ba 2[1:a] ==> 2 ba Failing search state SearchState(2, 3, 'baa'): Backtracking 3[!:!] ==> 4 baa ['baa!'] More examples: >>> apply_up('112',transduced_sheep_language) ... [] >>> apply_up('211!',transduced_sheep_language) ... [baa!] >>> apply_up('21111!',transduced_sheep_language) ... [baaaa!]
|
This code assumes transitions has been pre-compiled for either
analysis or generation. So Search is initialized with a search state consisting of the the starting machine state and the start of input tape and singleton set containing the empty string: Initial search state = (starting_machine_state, beginning of tape1, empty_string) = (0,0,'') |
add new search states to Also construct a list of items for printing debug info
|
Return a list of
(machine_state, surface_tape_position, related_string)
|
An FST is a list of states, each represented by its transducer transition fn (a dicts of dicts):
Each TTFi (Transducer Transition Function) is a function from lexical chars to an FSA Transition function (a function from chars to sets of states). Each embedded dictionary, call it Di, thus represents an FSA transition function. Transduce such a list of TTFs into a different list of dictionaries
Where each Li is just Di.items() [the list of key-value pairs] with the undefined transitions removed: Li = [pair for pair in Di.items() if state transition of pair is not undefined] |
An FST is a list of states, each represented by its transducer transition fn (a dicts of dicts):
Each TTFi (Transducer Transition Function) is a function from surface chars to an FSA Transition function (a function from chars to sets of states). Transduce Each TTF into a dictionary TF1';
Where each lex_chari is associated with the set of surface_chars that map to it in this state and each surface char, surf_charj is pairsed with the set of states that surf_charj:lex_chari map to in this state. |
Output epsilons are just represented as the empty string. Do this transduction here. |
|
einsertion_machinetransducer for the einsertion rule given in (Jurafsky and Martin, Speech and Natural Language Processing, Fig 3.14). Minor changes spelled out in the dot file ineinsertion.dot , from which this transducer is automatically
derived.
|
node_connection_re
|
state_desc_re
|
transduced_sheep_language
|
transduced_sheep_language2
|
transductions
|
transductions2
|
Home | Trees | Indices | Help |
---|
Generated by Epydoc 3.0.1 on Wed Feb 4 22:00:33 2009 | http://epydoc.sourceforge.net |