Class SmallParser

Class providing general methods for a modest class of chart parsers. The idea is that a SmallParser instance p1 is responsible for constructing chart edges sufficient for building all parse trees for the current input p1.input. This set of chart edges is called a parse forest because it is an efficient, densely packed representation of all the parses.

A parse forest should be stored in the parser's dtr_dict, a dictionary of the form:

  p1.dtr_dict[edge] =  List of DtrLists

where each DtrList is the list of dtr (daughter) edges in one local subtree. For example, suppose 'see a man with a telescope' has two parses whose top-level local subtrees differ:

          vp:e1             |          vp:e1
         /     \            |         /     \
     vp:e2     pp:e3        |      v:e4     np:e5
     /   \     /   \        |       |       /   \
   see a man  with a tel.   |      see   a man with a tel.

Parse nodes have been annotated with the names of the edges representing them. Then for a parser p1 that has just parsed this ambiguous sentence, p1.dtr_dict[e1] is a list of 2 dtr lists:

       p1.dtr_dict[e1] = [[e2,e3],[e4,e5]]
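
The way a packed forest unfolds into individual trees can be sketched as follows. This is only an illustration under stated assumptions, not the SmallParser source: edges are stood in for by plain strings, the labels table, the extra edge e6, and the function list_trees are all hypothetical, and edges without an entry in dtr_dict are treated as leaves.

```python
from itertools import product

# Hypothetical stand-ins: edge names are strings, labels maps edge -> category.
labels = {"e1": "vp", "e2": "vp", "e3": "pp", "e4": "v", "e5": "np", "e6": "np"}

# Parse forest: each edge maps to a list of dtr lists (one per local subtree).
# e6 is an assumed extra edge so that e4 is visibly shared by both parses.
dtr_dict = {
    "e1": [["e2", "e3"], ["e4", "e5"]],
    "e2": [["e4", "e6"]],
}


def list_trees(edge, dtr_dict, labels):
    """Return every tree rooted at edge as a nested list [label, child, ...]."""
    dtr_lists = dtr_dict.get(edge, [])
    if not dtr_lists:
        # No recorded daughters: treat the edge as a leaf.
        return [[labels[edge]]]
    trees = []
    for dtr_list in dtr_lists:
        # Each daughter may itself have several trees; combine every choice.
        alternatives = [list_trees(d, dtr_dict, labels) for d in dtr_list]
        for combo in product(*alternatives):
            trees.append([labels[edge], *combo])
    return trees


trees = list_trees("e1", dtr_dict, labels)
for tree in trees:
    print(tree)
```

Because e4 appears in the dtr lists of both e1 and e2, it is stored once but contributes to both trees, which is the sense in which the forest is packed.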

This code provides initialization structure and parse tree handling functionality for a parser meeting the above specs.

To parse string s:

          p1.parse_input(s)

This returns a list of parse trees.

To parse string s with tracing on:

          p1.parse_input(s,True)

To inspect the chart after parsing:

          p1.display_chart()

To inspect the dtr dictionary after parsing:

          p1.display_dtr_dict()

To find the set of parse trees associated with an edge e1:

          self.get_nltk_parse_trees(e1)

To find the s-edge spanning the input:

          p1.find_spanning_edge()

To find the tokenized data (a list):

          self.input

To find the lowercased data string (a string):

          self.data

To see the list of parse trees (after parsing):

          p1.parses

To see the list of pretty-printed parse trees:

          self.print_nltk_parses()

This also returns the set of pretty-printed strings.

To draw parse trees in a succession of Tkinter canvas windows:

              self.draw_nltk_parses()

To see if a parse exists (after parsing):

         self.parse_exists()

This returns True or False.

To find a parse edge of a given description:

         p1.find_parse_edge(...)

Arguments vary with different parsers and different notions of a complete edge description.

To draw all parses for an edge of a given description (after parsing!):

       p1.draw_nltk_parse_trees_for_edge(...)

Instance Methods

__init__(self, grammar={}, data='', trace=False)

initialize_chart(self)

tokenize(self, datastring)
Pretty naive tokenizer.

reset_input(self, data)

parse_input(self, data='', trace=False)

process(self)
Abstract method.

find_spanning_edge(self)
Abstract method: Should be defined to return a spanning edge from which dtr edges can be recursively retrieved from self.dtr_dict.

parse_exists(self)

find_nltk_parses(self)

get_nltk_parse_trees(self, edge)

find_list_parses(self)

get_list_parse_trees(self, edge)
Analogue of get_nltk_parse_trees that does not return nltk trees, just ordinary list representations of trees.

find_parses(self)

display_chart(self)
Abstract method: Should be defined with parser-specific print methods to print each edge in the chart.

draw_nltk_parses(self)

print_nltk_parses(self)

draw_list_parses(self)
Analogue of draw_nltk_parses for list representations of trees.

tex_output_parses(self, filename, scale=1.0)
Writes a LaTeX file with tree drawing instructions for all parses of the last sentence parsed.

Method Details

tokenize(self, datastring)

Pretty naive tokenizer.

Lower case everything. Split on spaces.
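
A tokenizer matching this description might look like the sketch below. It is written as a free function for illustration and is an assumption about the behaviour, not a copy of the method's source; in particular, whether the real method splits on a single space or on any whitespace is not stated.

```python
def tokenize(datastring):
    """Naive tokenizer: lowercase everything, then split on single spaces."""
    return datastring.lower().split(" ")


tokens = tokenize("See a man with a telescope")
print(tokens)
```

Note that splitting on a literal space leaves punctuation attached to words and produces empty strings for consecutive spaces, which is one reason such a tokenizer counts as naive.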

process(self)

Abstract method. Implements a particular chart parsing algorithm.
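
The division of labour between the base class and a concrete parser can be sketched as below. Everything here is hypothetical: SketchParser is a stand-in and not the SmallParser source, the run method and the tuple edge format (label, start, end) are assumptions made for illustration, and FlatParser is a deliberately trivial "algorithm" that builds one flat spanning edge.

```python
class SketchParser:
    """Stand-in base class: shared bookkeeping, abstract parsing steps."""

    def __init__(self):
        self.input = []
        self.dtr_dict = {}

    def process(self):
        # Subclasses supply the actual chart parsing algorithm.
        raise NotImplementedError

    def find_spanning_edge(self):
        # Subclasses decide what counts as an edge spanning the input.
        raise NotImplementedError

    def parse_exists(self):
        return self.find_spanning_edge() is not None

    def run(self, data):
        # Hypothetical driver: tokenize, reset the forest, then parse.
        self.input = data.lower().split(" ")
        self.dtr_dict = {}
        self.process()
        return self.parse_exists()


class FlatParser(SketchParser):
    """Toy subclass: one 's' edge whose daughters are the word edges."""

    def process(self):
        words = [("w", i, i + 1) for i in range(len(self.input))]
        self.dtr_dict[("s", 0, len(self.input))] = [words]

    def find_spanning_edge(self):
        edge = ("s", 0, len(self.input))
        return edge if edge in self.dtr_dict else None


p = FlatParser()
found = p.run("See a man")
print(found, p.find_spanning_edge())
```

The base class never needs to know how edges are built; it only relies on process filling dtr_dict and on find_spanning_edge returning a root from which daughters can be followed.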

draw_list_parses(self)

Analogue of draw_nltk_parses for list representations of trees. Note: nltk still needed.

tex_output_parses(self, filename, scale=1.0)

This method is called as follows:

     >>> p1.tex_output_parses('parses.tex')

This creates a LaTeX file containing LaTeX tree-drawing instructions for all parses of the last sentence parsed. The LaTeX qtree package is assumed to be installed. self.parses contains a list of nltk trees, for each of which the nltk method pprint_latex_qtree is called.

If the parse trees do not fit on a page, use the optional second argument of the method, which shrinks the trees using the graphics package's scalebox command:

     >>> p1.tex_output_parses('parses.tex', 0.6)

This prints the trees shrunk to 0.6 of their original dimensions.
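
The kind of file such a method produces can be sketched as follows. This is an assumption about the output shape, not the method's source: write_qtree_file is a hypothetical helper, and it takes ready-made qtree strings rather than nltk trees so that the example runs without nltk installed.

```python
import os
import tempfile


def write_qtree_file(filename, qtree_strings, scale=1.0):
    """Write a LaTeX document with one qtree drawing per input string."""
    lines = [
        r"\documentclass{article}",
        r"\usepackage{qtree}",
        r"\usepackage{graphics}",  # provides \scalebox
        r"\begin{document}",
    ]
    for tree in qtree_strings:
        if scale != 1.0:
            # Shrink (or enlarge) the whole tree by the given factor.
            tree = r"\scalebox{%s}{%s}" % (scale, tree)
        lines.append(tree)
        lines.append("")  # blank line separates consecutive trees
    lines.append(r"\end{document}")
    with open(filename, "w") as out:
        out.write("\n".join(lines) + "\n")


# Usage: write one tree, shrunk to 0.6 of its size, into a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "parses.tex")
write_qtree_file(out_path, [r"\Tree [.vp [.v see ] [.np a man ] ]"], scale=0.6)
with open(out_path) as handle:
    content = handle.read()
print(content)
```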