.. _cky_implementation: .. highlight:: python :linenothreshold: 6 =================== A CKY Implementation =================== The code in *cky_recognizer.py* gives a simple implementation of a CKY recognizer which also accesses some small grammars suitable for debugging and demoing. The code requires several python modules to all be in the same directory to work, all downloadable from the links below. 1. :download:`cky_recognizer.py ` 2. :download:`supertuple.py ` 3. :download:`small_grammar.py ` 4. :download:`multivalued_dictionary.py ` Running the parser ------------------ The cky_recognizer module defines a class CKYParser. To create CKYParser instance *p* with grammar *g*:: p = CKYParser(g) The parse_input method of this class takes a string argument will parse the given string using the grammar given at initialization time. To parse string *i* with context-free-grammar *g*:: >>> i = "salespeople sold the dog biscuits" >>> g = not_so_toy_grammar() >>> cnf_g = cnf_grammar(g) >>> parser = CKYParser(cnf_g) >>> parser.parse_input(i,True) The *cnf_grammar* function being called in line 3 has a single argument which is an arbitrary context-free grammar. The call to *cnf_grammar* turns the grammar into a CNF grammar and stores it in a format sometimes called **bottum-up grammar** format. This format is suitable for efficient use with a CKY parser. Bottum-up format is very simple. A CNF rule with probability: * A -> B C [.8] is stored in a dictionary whose keys are first daughters and who values are pairs of mothers and second daughters. So in the simplest case, the above rule would be stored as:: >>> BU_Grammar[B] = (A,C, 0.8) Actually there may be more than one rule with the same first daughter, so we might have: * A -> B C [.8] * D -> B E [.6] So in general the bottum_grammar needs to associate a list of rules with any first daughter. For the above, we would have:: >>> BU_Grammar[B] = [(A,C,.8), (D,E,.6)] The implementation below actually stores each pair as as an object instance with attributes *next_dtr* and *mother*, so if we do:: >>> rules = BU_Grammar[B] And the rules[0] is the first rule above, then A and C can be accessed as follows:: >>> rules[0].next_dtr C >>> rules[0].mother_cat A For a grammar with probabilities, the probability can be accessed as:: >>> rules[0].prob 0.8 The CKY loop ------------ The heart of the parser is the CKY loop. This looks like this:: for j in range(1,len(self.chart)): for cat in self.grammar.lexicon[self.input[j-1]]: self.add_lexical_edge(cat,j) for i in range(j-2,-1,-1): for k in range(i+1, j): for cat in self.chart[k][i]: for rule in self.grammar.find_BURules(cat): cat2 = rule.next_dtr if cat2 in self.chart[j][k]: self.add_to_chart(j,i,rule.mother_cat)