Class SmallParser
Class providing general methods for a modest class of chart parsers.
The idea is that a SmallParser
instance p1 is responsible
for constructing chart edges sufficient for building all parse trees for
the current input p1.input. This set of chart edges is called a parse
forest because it is an efficient, densely packed representation of all
the parses.
A parse forest should be stored in the parser's dtr_dict as a
dictionary in the form:
p1.dtr_dict[edge] = List of DtrLists
where each DtrList is the list of dtr (daughter) edges in one local
subtree. For example, suppose 'see a man with a telescope' has two
parses, giving two local subtrees for the top vp node:
          vp:e1              |        vp:e1
         /     \             |       /     \
     vp:e2     pp:e3         |    v:e4     np:e5
     /   \      /  \         |     /       /    \
    /     \    /    \        |    /       /      \
  see a man  with a tel.     |  see     a man with a tel.
Each parse node has been annotated with the name of the edge
representing it. Then, for a parser p1 that has just parsed this
ambiguous sentence, p1.dtr_dict[e1] is a list of 2 dtr lists:
p1.dtr_dict[e1] = [[e2,e3],[e4,e5]]
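The packed representation above can be unpacked recursively. The following is only a minimal sketch, not the class's own implementation: edges are modelled as plain strings, and the helper list_parse_trees and the label dictionary are hypothetical names introduced for illustration. Edges with no entry in the dictionary are treated as leaves.

```python
from itertools import product

def list_parse_trees(edge, dtr_dict, label):
    """Enumerate every tree packed under `edge` as nested lists."""
    dtr_lists = dtr_dict.get(edge)
    if not dtr_lists:
        # No recorded daughters: treat the edge as a leaf (assumption).
        return [label[edge]]
    trees = []
    for dtr_list in dtr_lists:
        # Each daughter may itself be ambiguous, so collect the
        # alternatives for every daughter in this local subtree ...
        alternatives = [list_parse_trees(d, dtr_dict, label) for d in dtr_list]
        # ... and build one tree per combination of alternatives.
        for combo in product(*alternatives):
            trees.append([label[edge]] + list(combo))
    return trees

# The forest from the example: e1 has two local subtrees.
dtr_dict = {"e1": [["e2", "e3"], ["e4", "e5"]]}
# Hypothetical mapping from edge names to category labels.
label = {"e1": "vp", "e2": "vp", "e3": "pp", "e4": "v", "e5": "np"}

forest = list_parse_trees("e1", dtr_dict, label)
print(len(forest))
```

Because alternatives are stored once per edge and only combined on demand, the dictionary stays small even when the number of complete parses grows quickly.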
This code provides initialization structure and parse tree handling
functionality for a parser meeting the above specs.
To parse string s:
p1.parse_input(s)
This returns a list of parse trees.
To parse string s with tracing on:
p1.parse_input(s,True)
To inspect the chart after parsing:
p1.display_chart()
To inspect the dtr dictionary after parsing:
p1.display_dtr_dict()
To find the set of parse trees associated with an edge e1:
self.get_nltk_parse_trees(e1)
To find the s-edge spanning the input:
p1.find_spanning_edge()
To find the tokenized data:
self.input [a list]
To find the lowercased data string:
self.data [a string]
To see the list of parse trees (after parsing):
p1.parses
To see the list of pretty-printed parse trees:
self.print_nltk_parses()
This also returns the set of pretty-printed strings.
To draw parse trees in a succession of Tkinter canvas windows:
self.draw_nltk_parses()
To see if a parse exists (after parsing):
self.parse_exists()
This returns True or False.
To find a parse edge of a given description:
p1.find_parse_edge(...)
Arguments vary with different parsers and different notions of a
complete edge description.
To draw all parses for an edge of a given description (after
parsing!):
p1.draw_nltk_parse_trees_for_edge(...)
__init__(self, grammar={}, data='', trace=False)
find_spanning_edge(self)
Abstract method: Should be defined to return a spanning edge from which
dtr edges can be recursively retrieved from self.dtr_dict.
get_list_parse_trees(self, edge)
Analogue of get_nltk_parse_trees that does not return nltk trees, just
ordinary list representations of trees.
display_chart(self)
Abstract method: Should be defined with parser-specific print methods to
print each edge in the chart.
Pretty naive tokenizer.
Lower case everything. Split on spaces.
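A tokenizer of this kind can be sketched in a couple of lines. This is an illustration only: the function name naive_tokenize is hypothetical, and it splits on any run of whitespace rather than on single spaces, which is an assumption about the intended behaviour.

```python
def naive_tokenize(data):
    """Lowercase the input string and split it into tokens."""
    # str.split() with no argument splits on runs of whitespace,
    # so doubled spaces do not produce empty tokens (assumption).
    return data.lower().split()

tokens = naive_tokenize("See a man with a telescope")
print(tokens)
```

Punctuation is not separated from words by this approach, so input such as 'telescope.' stays a single token.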
Abstract method. Implements a particular chart parsing
algorithm.
Analogue of draw_nltk_parses for list representations of trees. Note:
nltk is still needed.
tex_output_parses(self, filename, scale=1.0)
This method is called as follows:
>>> p1.tex_output_parses('parses.tex')
This creates a LaTeX file containing tree-drawing instructions for all
parses of the last sentence parsed. The LaTeX qtree package is assumed
to be installed. self.parses contains a list of nltk trees, and the
nltk method pprint_latex_qtree is called on each of them.
If the parse trees do not fit on a page, use the optional second
argument of the method, which shrinks the trees using the scalebox
command of the graphics package:
>>> p1.tex_output_parses('parses.tex', 0.6)
This prints trees shrunk to 0.6 of their original dimensions.
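The general shape of such an output routine might look like the sketch below. It is not the method's actual source: the function name write_qtree_file is hypothetical, the trees are assumed to arrive already rendered as qtree strings, and the preamble (article class, qtree, graphicx) is an assumption about what a minimal compilable file would need.

```python
def write_qtree_file(qtree_strings, filename, scale=1.0):
    """Write a LaTeX file containing one qtree drawing per parse."""
    lines = [
        r"\documentclass{article}",
        r"\usepackage{qtree}",
        r"\usepackage{graphicx}",  # provides \scalebox
        r"\begin{document}",
    ]
    for tree in qtree_strings:
        if scale != 1.0:
            # Shrink (or enlarge) the whole tree by the given factor.
            lines.append(r"\scalebox{%s}{%s}" % (scale, tree))
        else:
            lines.append(tree)
        lines.append("")  # blank line so each tree starts a new paragraph
    lines.append(r"\end{document}")
    with open(filename, "w") as out:
        out.write("\n".join(lines) + "\n")
```

A call such as write_qtree_file([r"\Tree [.vp [.v see ] [.np a man ] ]"], 'parses.tex', 0.6) would wrap the tree in \scalebox{0.6}{...}.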