.. _cky_implementation:


.. highlight:: python
   :linenothreshold: 6

===================
A CKY Implementation
===================


The code in *cky_recognizer.py* gives
a simple implementation of a CKY recognizer which also accesses some
small grammars suitable for debugging and demoing.  The code requires
several python modules to all be in the same directory to work, all
downloadable from the links below.

1. :download:`cky_recognizer.py <cky_recognizer.py>`
2. :download:`supertuple.py <supertuple.py>`
3. :download:`small_grammar.py <small_grammar.py>`
4. :download:`multivalued_dictionary.py <multivalued_dictionary.py>`

Running the parser
------------------

The cky_recognizer module defines a class CKYParser.

To create CKYParser instance *p* with grammar *g*::

        p = CKYParser(g)

The parse_input method of this class takes a string argument will
parse the given string using the grammar given at initialization time.
    
To parse string *i* with context-free-grammar *g*::

        >>> i = "salespeople sold the dog biscuits"
        >>> g = not_so_toy_grammar()
        >>> cnf_g = cnf_grammar(g)
        >>> parser = CKYParser(cnf_g)
        >>> parser.parse_input(i,True)


The *cnf_grammar* function being called in line 3 has a single argument
which is an arbitrary context-free grammar.  The call to
*cnf_grammar* turns the grammar into a CNF grammar and stores it in a
format sometimes called **bottum-up grammar** format.  This format is
suitable for efficient use with a CKY parser.

Bottum-up format is very simple.  A CNF rule with probability:

* A ->  B C  [.8]

is stored in a dictionary whose keys are first daughters
and who values are pairs of mothers and second daughters.
So in the simplest case, the above rule would be stored as::

  >>> BU_Grammar[B] = (A,C, 0.8)

Actually there may be more than one rule with the same first daughter,
so we might have:

* A ->  B C  [.8]
* D ->  B E  [.6]

So in general the bottum_grammar needs to associate
a list of rules with any first daughter.  For the above,
we would have::

  >>> BU_Grammar[B] = [(A,C,.8), (D,E,.6)]

The implementation below actually stores each pair as 
as an object instance with attributes *next_dtr* and
*mother*, so if we do::

  >>> rules = BU_Grammar[B]

And the rules[0] is the first rule above, then A and C can be accessed 
as follows::

  >>> rules[0].next_dtr
  C

  >>> rules[0].mother_cat
  A

For a grammar with probabilities, the probability can be accessed
as::

  >>> rules[0].prob
  0.8


The CKY loop
------------

The heart of the parser is the CKY loop. This looks 
like this::

        for j in range(1,len(self.chart)):
            for cat in self.grammar.lexicon[self.input[j-1]]:
                self.add_lexical_edge(cat,j)
            for i in range(j-2,-1,-1):
                    for k in range(i+1, j):
                        for cat in self.chart[k][i]:
                            for rule in self.grammar.find_BURules(cat):
                                cat2 = rule.next_dtr
                                if cat2 in self.chart[j][k]:
                                    self.add_to_chart(j,i,rule.mother_cat)