Package fsa_recognizer :: Module fta_rec

Module fta_rec


A finite state transducer (FST) recognizer.

Our example language relation is transduced sheep language. The upper alphabet will be:

   a, b, !

The upper language will be sheep language. The lower alphabet will be:

   1, 2, 3, !

We will transduce all a's into 1's, b's into 2's and 3's, and pass all exclamation points through. So the feasible pairs will be:

   a:1, b:2, b:3, !:!

We generalize a TransitionTable to FSTs: an FST will be a list of dictionaries of dictionaries.

For a state σ that includes the following transitions:

   a:1 → 4    a:1 → 5
   b:2 → 3

the corresponding state-transition function is:

 {'a': {1: (4, 5)},
  'b': {2: (3,)}}
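
For concreteness, here is a hedged sketch of the transduced-sheep-language machine in this representation, reconstructed from the doctests further down this page; the module's own tables (see transduced_sheep_language) additionally spell out every infeasible pair with the 'und' placeholder, which is omitted here.

   # Sketch only: states and feasible pairs read off the recognition traces below.
   sheep_transitions = [
       {'b': {'2': (1,), '3': (1,)}},   # state 0: b pairs with 2 or 3
       {'a': {'1': (2,)}},              # state 1
       {'a': {'1': (3, 2)}},            # state 2: nondeterministic on a:1
       {'!': {'!': (4,)}},              # state 3
       {},                              # state 4: final, no outgoing arcs
   ]
   sheep_finals = (4,)                  # state 4 is final, per transducer_finals
   sheep_machine = (sheep_transitions, sheep_finals)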
Classes

  Verbose

Functions

  nondet_transduce_recognize(string1, string2, machine) -> Boolean
      Return True if string1 is related to string2 by the regular relation defined in the FST given by transitions and finals.

  transduce_accept_state(search_state, last_index1, last_index2, finals)

  transduce_analyze_new_states(search_state, transitions, string1, string2)
      We assume every input alphabet includes ε and therefore that every transition table is defined for ε.

  apply_dn(surface_string, machine, debug=False) -> List of strings
      Return the set of strings for the lower language that are paired by machine with surface_string.

  apply_up(lex_string, machine, debug=False) -> List
      Return the set of strings for the surface language that are paired by machine with the lexical string lex_string.

  find_related_string(transitions, finals, input_string, debug)
      This code assumes transitions has been pre-compiled for either analysis or generation.

  transduce_relate_accept_state(search_state, last_index, finals)

  add_new_search_states(pairs_list, current_input, new_tape_position, search_states, state, r_str, print_items)
      Add new search states to search_states using the (new_char, new_states) pairs in pairs_list, the current input current_input, and the related string r_str built thus far.

  transduce_relate_new_states(search_state, transitions, string, debug)
      Return a list of new_search_states given the current_search_state, transitions from the current transducer, and a surface string.

  make_generation_transducer(transitions, debug=False)
      An FST is a list of states, each represented by its transducer transition function (a dict of dicts).

  make_analysis_transducer(transitions, debug=False)
      An FST is a list of states, each represented by its transducer transition function (a dict of dicts).

  handle_eps(pair)
      Output epsilons are just represented as the empty string.

  print_transitions_table(transitions, debug=True)
Variables
  epsilon_print_name = 'eps'
  alphabet = 'abcdefghijklmnopqrstuvwxyz'
  arity_dec_re = re.compile(r'Arity:\s*(\d+)')
  att_val_re = re.compile(r'\s*,?(\w+)\s*=\s*(\d+)')
  att_val_re2 = re.compile(r'\s*,?(\w+)\s*=\s*"(.+?)"')
  blank_line_re = re.compile(r'\s+$')
  char_interval_re = re.compile(r'(.)\.\.(.)')
  einsertion_machine
Transducer for the e-insertion rule given in Jurafsky and Martin, Speech and Language Processing, Fig. 3.14.
  flags_re = re.compile(r'Flags:(.*)$')
  graph_type_re = re.compile(r'digraph')
  label_re = re.compile(r'[ ,\{\}]')
  net_dec_re = re.compile(r'Net:')
  node_connection_re = re.compile(r'\s*(\w+)\s*->\s*(\w+)\s*\[(....
  node_dec_re = re.compile(r'\s*node\s*\[(.*)\]\s*;')
  node_name_re = re.compile(r'\s*(\w+)\s*\[(.*)\]\s*;')
  sigma_dec_re = re.compile(r'Sigma:(.*)$')
  size_dec_re = re.compile(r'Size:\s*(\d+),')
  state_desc_re = re.compile(r'(.?s\d+):(:?\s*(.+)\s*->\s*(.+)\....
  transduced_sheep_language = ([{'!': {'!': ('und'), '1': ('und'...
  transduced_sheep_language2 = ([{'!': {'!': ('und'), '1': ('und...
  transducer_finals = (4)
  transductions = [{'!': {'!': ('und'), '1': ('und'), '2': ('und...
  transductions2 = [{'!': {'!': ('und'), '1': ('und'), '2': ('un...
Function Details

nondet_transduce_recognize(string1, string2, machine)


Return True if string1 is related to string2 by the regular relation defined in the FST given by transitions and finals.

Uses an agenda just as in the nondet recognition algorithm in fsa_recognizer.nondet_fsa_rec.nondet_recognize. However, an agenda search state now consists of a triple: a machine state and two string positions, the position on the upper tape and the position on the lower tape.

>>> nondet_transduce_recognize('baa!','211!', transduced_sheep_language)
(0, 0, 0)[b:2] ==> [(1, 1, 1)]
(1, 1, 1)[a:1] ==> [(2, 2, 2)]
(2, 2, 2)[a:1] ==> [(3, 3, 3), (2, 3, 3)]
Failing search state (2, 3, 3): Backtracking
(3, 3, 3)[!:!] ==> [(4, 4, 4)]
True
>>> nondet_transduce_recognize('baaa!','3111!', transduced_sheep_language)
(0, 0, 0)[b:3] ==> [(1, 1, 1)]
(1, 1, 1)[a:1] ==> [(2, 2, 2)]
(2, 2, 2)[a:1] ==> [(3, 3, 3), (2, 3, 3)]
(2, 3, 3)[a:1] ==> [(3, 4, 4), (2, 4, 4)]
Failing search state (2, 4, 4): Backtracking
(3, 4, 4)[!:!] ==> [(4, 5, 5)]
True

More examples:

>>> nondet_transduce_recognize('baa','211',transduced_sheep_language)
...
False
>>> nondet_transduce_recognize('baa!','211!',transduced_sheep_language)
...
True
>>> nondet_transduce_recognize('baaaa!','21111!',transduced_sheep_language)
...
True
>>> nondet_transduce_recognize('baaaa!','2111!',transduced_sheep_language)
...
False

The last example fails: Too many a's, not enough 1's.

>>> nondet_transduce_recognize('baaa!','21111!',transduced_sheep_language)
False

Too many 1's, not enough a's.

Recognition is initialized with a search state consisting of the starting machine state and the start of tape1 and tape2:

   Initial search state = (starting_machine_state, beginning of tape1,
                           beginning of tape2)
                        = (0,0,0)
Parameters:
  • string1 - upper string, potentially a member of the upper language
  • string2 - lower string, potentially a member of the lower language
  • machine - a pair (transitions, finals): an FSTTransitionTable and a tuple of final states
Returns: Boolean
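
The following is a minimal sketch of the agenda search described above, not the module's actual code: it assumes machine is a (transitions, finals) pair with transitions[state][upper_char][lower_char] giving a tuple of next states or the 'und' placeholder, assumes finals is a tuple of final states, and ignores ε-transitions.

   def nondet_transduce_recognize_sketch(string1, string2, machine):
       transitions, finals = machine          # assumed (transitions, finals) pair
       agenda = [(0, 0, 0)]                   # (machine_state, pos on tape1, pos on tape2)
       while agenda:
           state, i, j = agenda.pop()
           if i == len(string1) and j == len(string2):
               if state in finals:            # both tapes consumed in a final state
                   return True
               continue
           if i == len(string1) or j == len(string2):
               continue                       # only one tape exhausted: backtrack
           targets = transitions[state].get(string1[i], {}).get(string2[j], 'und')
           if targets != 'und':               # feasible pair string1[i]:string2[j]
               agenda.extend((s, i + 1, j + 1) for s in targets)
       return False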

transduce_analyze_new_states(search_state, transitions, string1, string2)


We assume every input alphabet includes ε and therefore that every transition table is defined for ε.

We deal with ε-transitions first, if any. Compare fsa_recognizer.nondet_fsa_rec.generate_new_states.

apply_dn(surface_string, machine, debug=False)


Return the set of strings for the lower language that are paired by machine with surface_string. Return the empty set if there are none.

Uses an agenda just as in the nondet recognition algorithm in fsa_recognizer.nondet_fsa_rec.nondet_recognize. However, an agenda search state now consists of a triple: a machine state, a surface-string position, and the lexical string constructed thus far.

>>> apply_dn('baa!', transduced_sheep_language)
(0, 0, 0)[b:2] ==> [(1, 1, 1)]
(1, 1, 1)[a:1] ==> [(2, 2, 2)]
(2, 2, 2)[a:1] ==> [(3, 3, 3), (2, 3, 3)]
Failing search state (2, 3, 3): Backtracking
(3, 3, 3)[!:!] ==> [(4, 4, 4)]
{211!}

More examples:

>>> apply_dn('baa',transduced_sheep_language)
...
{}
>>> apply_dn('baa!',transduced_sheep_language)
...
{211!}
>>> apply_dn('baaaa!',transduced_sheep_language)
...
{21111!}
Parameters:
  • surface_string - string potentially a member of the surface language
  • machine - a pair (transitions, finals): an FSTTransitionTable and a tuple of final states
Returns: List of strings

apply_up(lex_string, machine, debug=False)


Return the set of strings for the surface language that are paired by machine with the lexical string lex_string. Return the empty list if there are none.

Uses an agenda just as in the nondet recognition algorithm in fsa_recognizer.nondet_fsa_rec.nondet_recognize. However an agenda search state triple now consists of a machine state, a lex string position, and the surface string constructed thus far.

>>> apply_up('311!',transduced_sheep_language)
0[3:b] ==> 1 
1[1:a] ==> 2 b
2[1:a] ==> 3 ba
2[1:a] ==> 2 ba
Failing search state SearchState(2, 3, 'baa'): Backtracking
3[!:!] ==> 4 baa
['baa!']

More examples:

>>> apply_up('112',transduced_sheep_language)
...
[]
>>> apply_up('211!',transduced_sheep_language)
...
[baa!]
>>> apply_up('21111!',transduced_sheep_language)
...
[baaaa!]
Parameters:
  • lex_string - lex string, potentially a member of the lexical language
  • machine - a pair (transitions, finals): an FSTTransitionTable and a tuple of final states
Returns: List

find_related_string(transitions, finals, input_string, debug)


This code assumes transitions has been pre-compiled for either analysis or generation. So input_string may be either a surface or lexical string, and transitions then represents the appropriate mapping to the set of related_strings to be found and returned.

Search is initialized with a search state consisting of the starting machine state, the start of the input tape, and the empty string:

   Initial search state = (starting_machine_state, beginning of tape1,
                           empty_string)
                        = (0,0,'')

add_new_search_states(pairs_list, current_input, new_tape_position, search_states, state, r_str, print_items)


Add new search states to search_states using the (new_char, new_states) pairs in pairs_list, the current input current_input, and the related string r_str built thus far.

Also construct a list of items, print_items, for printing debug info.

transduce_relate_new_states(search_state, transitions, string, debug)


Return a list of new_search_states given the current_search_state, transitions from the current transducer, and a surface string.

search_state is a triple of:

  (machine_state, surface_tape_position, related_string)

new_search_states will be a list of the same, computed (see the sketch below) by:

  • trying all permissible epsilon transitions for the machine state in search_state
  • trying all permissible transitions for the character at string[surface_tape_position]
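
A sketch of that computation under the same assumptions as above (the 'epsilon' key and the pair-list table shape are taken from the module's printed tables and from make_generation_transducer below; the real ε handling may differ in detail):

   def transduce_relate_new_states_sketch(search_state, transitions, string, debug=False):
       state, pos, r_str = search_state
       candidates = []
       # 1. ε-transitions: consume no input character, may still emit output
       for out_char, next_states in transitions[state].get('epsilon', []):
           candidates.append((out_char, next_states, pos))
       # 2. transitions on the current input character, if any input remains
       if pos < len(string):
           for out_char, next_states in transitions[state].get(string[pos], []):
               candidates.append((out_char, next_states, pos + 1))
       new_search_states = []
       for out_char, next_states, new_pos in candidates:
           for s in next_states:
               new_search_states.append((s, new_pos, r_str + out_char))
       return new_search_states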

make_generation_transducer(transitions, debug=False)


An FST is a list of states, each represented by its transducer transition function (a dict of dicts):

  • [TTF1, TTF2, ...]
  • TTF1 = {lex_char1: D1, lex_char2: D2, ...}

Each TTFi (Transducer Transition Function) is a function from lexical chars to an FSA Transition function (a function from chars to sets of states).

Each embedded dictionary, call it Di, thus represents an FSA transition function.

Transduce such a list of TTFs into a different list of dictionaries:

  • [TTF1', TTF2', ...]
  • TTF1' = {lex_char1: L1, lex_char2: L2, ...}

where each Li is just Di.items() [the list of key-value pairs] with the undefined transitions removed:

   Li = [pair for pair in Di.items() if the state transition of pair is not undefined]
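
A runnable sketch of this compilation step (the 'und' placeholder and 'epsilon' key names are taken from the module's printed tables; the real code's ε handling via handle_eps may differ in detail):

   def make_generation_transducer_sketch(transitions, debug=False):
       compiled = []
       for ttf in transitions:                        # one TTF per machine state
           compiled.append({
               in_char: [('' if out_char == 'epsilon' else out_char, targets)
                         for out_char, targets in d.items()
                         if targets != 'und']         # drop undefined transitions
               for in_char, d in ttf.items()
           })
       return compiled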

make_analysis_transducer(transitions, debug=False)


An FST is a list of states, each represented by its transducer transition function (a dict of dicts):

  • [TTF1, TTF2, ...]
  • TTF1 = {surf_char1: D1, surf_char2: D2, ...}

Each TTFi (Transducer Transition Function) is a function from surface chars to an FSA Transition function (a function from chars to sets of states).

Transduce each TTF into a dictionary TFi':

  • [TF1', TF2', ...]
  • TFi' = {lex_char1: L1, lex_char2: L2, ...}

where each lex_chari is associated with the set of surface chars that map to it in this state, and each surface char surf_charj is paired with the set of states that the pair surf_charj:lex_chari maps to in this state.
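
A hedged sketch of this inversion, keying each state's table by the inner (paired) character and dropping undefined entries; the ε-to-empty-string step done by handle_eps is omitted here:

   def make_analysis_transducer_sketch(transitions, debug=False):
       compiled = []
       for ttf in transitions:                        # one TTF per machine state
           inverted = {}
           for outer_char, d in ttf.items():
               for inner_char, targets in d.items():
                   if targets == 'und':               # skip undefined pairs
                       continue
                   inverted.setdefault(inner_char, []).append((outer_char, targets))
           compiled.append(inverted)
       return compiled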

handle_eps(pair)


Output epsilons are just represented as the empty string. Do this transduction here.
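
A minimal sketch of the idea, assuming (this is not confirmed by the source) that pair is a (char, next_states) tuple whose character may be the 'epsilon' symbol used in the module's tables:

   def handle_eps_sketch(pair):
       char, next_states = pair
       return ('' if char == 'epsilon' else char, next_states)   # output ε becomes ''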


Variables Details

einsertion_machine

Transducer for the e-insertion rule given in Jurafsky and Martin, Speech and Language Processing, Fig. 3.14. Minor changes are spelled out in the dot file einsertion.dot, from which this transducer is automatically derived.

node_connection_re

Value:
re.compile(r'\s*(\w+)\s*->\s*(\w+)\s*\[(.*)\]\s*;')

state_desc_re

Value:
re.compile(r'(.?s\d+):(:?\s*(.+)\s*->\s*(.+)\.|\s*\(no arcs\))')

transduced_sheep_language

Value:
([{'!': {'!': ('und'),
         '1': ('und'),
         '2': ('und'),
         '3': ('und'),
         'epsilon': ('und')},
   'a': {'!': ('und'),
         '1': ('und'),
         '2': ('und'),
...

transduced_sheep_language2

Value:
([{'!': {'!': ('und'),
         '1': ('und'),
         '2': ('und'),
         '3': ('und'),
         'epsilon': ('und')},
   'a': {'!': ('und'),
         '1': ('und'),
         '2': ('und'),
...

transductions

Value:
[{'!': {'!': ('und'),
        '1': ('und'),
        '2': ('und'),
        '3': ('und'),
        'epsilon': ('und')},
  'a': {'!': ('und'),
        '1': ('und'),
        '2': ('und'),
...

transductions2

Value:
[{'!': {'!': ('und'),
        '1': ('und'),
        '2': ('und'),
        '3': ('und'),
        'epsilon': ('und')},
  'a': {'!': ('und'),
        '1': ('und'),
        '2': ('und'),
...