Package parser_course :: Package small_parsers :: Module small_parser :: Class SmallParser

Class SmallParser

Class providing general methods for a modest class of chart parsers. The idea is that a SmallParser instance p1 is responsible for constructing chart edges sufficient for building all parse trees for the current input, p1.input. This set of chart edges is called a parse forest because it is an efficient, densely packed representation of all the parses.

A parse forest should be stored in the parser's dtr_dict as a dictionary in the form:

  p1.dtr_dict[edge] =  List of DtrLists

where each DtrList is the list of dtr edges in one local subtree. For example, suppose 'see a man with a telescope' has two parses, each with a different local subtree at the top vp node:

          vp:e1           |        vp:e1
        /       \         |      /       \
     vp:e2       pp:e3    |    v:e4    np:e5
    /     \     /     \   |    /     /     \
   /       \   /       \  |   /     /       \
  see   a  man with a tel.| see   a man with a tel.

Parse nodes have been annotated with the names of the edges representing them. Then, for a parser p1 that has just parsed this ambiguous sentence, p1.dtr_dict[e1] is a list of 2 dtr lists:

       p1.dtr_dict[e1] = [[e2,e3],[e4,e5]]
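
A minimal sketch of how a packed forest of this shape can be unpacked into individual trees. It assumes edges are plain strings and that a hypothetical `label` dictionary maps each edge to its category; neither is confirmed by the source, where edges are parser-specific objects. Edges with no entry in the dictionary are treated as leaves here.

```python
from itertools import product

# Hypothetical stand-ins: string edge names and a category lookup.
label = {"e1": "vp", "e2": "vp", "e3": "pp", "e4": "v", "e5": "np"}

# Packed forest: each edge maps to a list of DtrLists (one per local subtree).
dtr_dict = {"e1": [["e2", "e3"], ["e4", "e5"]]}


def unpack(edge, dtr_dict, label):
    """Return every tree rooted at edge as a nested list [label, child, ...]."""
    dtr_lists = dtr_dict.get(edge)
    if not dtr_lists:
        # No recorded daughters: treat the edge as a leaf.
        return [[label[edge]]]
    trees = []
    for dtr_list in dtr_lists:
        # Collect all trees for each daughter, then take every combination.
        options = [unpack(d, dtr_dict, label) for d in dtr_list]
        for combo in product(*options):
            trees.append([label[edge]] + list(combo))
    return trees


trees = unpack("e1", dtr_dict, label)
```

Each DtrList contributes the cross product of its daughters' trees, which is why the dictionary can stay small even when the number of full parses is large.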

This code provides initialization structure and parse tree handling functionality for a parser meeting the above specs.

To parse string s: p1.parse_input(s). Returns a list of parse trees.

To parse string s with tracing on: p1.parse_input(s,True)

To inspect chart after parsing: p1.display_chart()

To inspect dtr dictionary after parsing: p1.display_dtr_dict()

To find the set of parse trees associated with an edge e1:

          self.get_nltk_parse_trees(e1)

To find s-edge spanning input: p1.find_spanning_edge()

To find tokenized data: self.input [a list]

To find lowercased data string: self.data [a string]

To see list of parse trees (after parsing): p1.parses

To see list of pretty-printed parse trees: self.print_nltk_parses(). This also returns the set of pretty-printed strings.

To draw parse trees in a succession of Tkinter canvas windows:

              self.draw_nltk_parses()

To see if a parse exists (after parsing):

         self.parse_exists()

This returns True or False.

To find a parse edge of a given description:

         p1.find_parse_edge(...)

Arguments vary with different parsers and different notions of a complete edge description.

To draw all parses for an edge of a given description (after parsing!):

       p1.draw_nltk_parse_trees_for_edge (...)

Instance Methods
 
__init__(self, grammar={}, data='', trace=False)

initialize_chart(self)

tokenize(self, datastring)
Pretty naive tokenizer.

reset_input(self, data)

parse_input(self, data='', trace=False)

process(self)
Abstract method.

find_spanning_edge(self)
Abstract method: should be defined to return a spanning edge from which dtr edges can be recursively retrieved from self.dtr_dict.

parse_exists(self)

find_nltk_parses(self)

get_nltk_parse_trees(self, edge)

find_list_parses(self)

get_list_parse_trees(self, edge)
Analogue of get_nltk_parse_trees that doesn't return nltk trees, just ordinary list representations of trees.

find_parses(self)

display_chart(self)
Abstract method: should be defined with parser-specific print methods to print each edge in the chart.

draw_nltk_parses(self)

print_nltk_parses(self)

draw_list_parses(self)
Analogue of draw_nltk_parses for list representations of trees.

tex_output_parses(self, filename, scale=1.0)
Writes a LaTeX file with tree-drawing instructions for all parses of the last sentence parsed.

Method Details

tokenize(self, datastring)

Pretty naive tokenizer.

Lower case everything. Split on spaces.
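
The behaviour described can be sketched as follows. This is a stand-alone function rather than the actual method, and splitting on any run of whitespace (rather than on single spaces only) is an assumption.

```python
def tokenize(datastring):
    """Lowercase the input and split it on whitespace."""
    return datastring.lower().split()


tokens = tokenize("See a man with a telescope")
```

Punctuation stays attached to its word under this scheme, which is one reason the tokenizer is described as naive.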

process(self)

Abstract method. Implements a particular chart parsing algorithm.
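
A sketch of the abstract-method pattern this implies. `ParserSketch` and `ToyParser` are hypothetical stand-ins, not the real classes, and raising NotImplementedError in the base class is an assumption about how the abstract method behaves.

```python
class ParserSketch:
    """Hypothetical stand-in for the base class; not the real SmallParser."""

    def __init__(self):
        self.dtr_dict = {}

    def process(self):
        # Assumed behaviour: the base class provides no algorithm of its own.
        raise NotImplementedError("subclasses supply the parsing algorithm")


class ToyParser(ParserSketch):
    """Hypothetical subclass that fills in process()."""

    def process(self):
        # A real subclass would build chart edges here and record, for each
        # edge, the daughter lists of its local subtrees.
        self.dtr_dict["e1"] = [["e2", "e3"], ["e4", "e5"]]


toy = ToyParser()
toy.process()
```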

draw_list_parses(self)

Analogue of draw_nltk_parses for list representations of trees. Note: nltk is still needed.

tex_output_parses(self, filename, scale=1.0)

This method is called as follows:

     >>> p1.tex_output_parses('parses.tex')

This creates a LaTeX file which contains LaTeX tree-drawing instructions for all parses of the last sentence parsed. The LaTeX qtree package is assumed to be installed. self.parses contains a list of nltk trees, and the nltk method pprint_latex_qtree is called on each of them.

If the parse trees do not fit on a page, use the optional second argument of the method, which will shrink the trees using the graphics package scalebox command:

     >>> p1.tex_output_parses('parses.tex', 0.6)

This prints trees shrunk to 0.6 of their original dimensions.
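
A rough sketch of the kind of output involved, assuming trees are given as nested lists of the form [label, child, ...] with strings as leaves. The helper names `to_qtree` and `tex_document` are hypothetical; the actual method relies on nltk's own qtree printer and writes to a file, whereas this sketch only builds the document text.

```python
def to_qtree(tree):
    """Render a nested-list tree [label, child, ...] in qtree bracket notation."""
    if isinstance(tree, str):
        return tree  # a leaf word
    label, children = tree[0], tree[1:]
    inner = " ".join(to_qtree(c) for c in children)
    # qtree expects a space before each closing bracket.
    return "[.%s %s ]" % (label, inner)


def tex_document(trees, scale=1.0):
    """Build a LaTeX document string with one scaled qtree per parse."""
    lines = [
        r"\documentclass{article}",
        r"\usepackage{qtree}",
        r"\usepackage{graphics}",  # provides \scalebox
        r"\begin{document}",
    ]
    for tree in trees:
        lines.append(r"\scalebox{%s}{\Tree %s}" % (scale, to_qtree(tree)))
        lines.append("")  # blank line: each tree starts a new paragraph
    lines.append(r"\end{document}")
    return "\n".join(lines)


doc = tex_document([["vp", ["v", "see"], ["np", "a", "man"]]], scale=0.6)
```

Wrapping each tree in its own scalebox keeps the scale factor independent of the tree's size, so the same value can be reused across all parses of a sentence.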