The grammar and lexicon:
pp -> p np
vpg -> vbg np
X2 -> X1 pp
vp -> vbz np
vp -> X1 pp
vp -> X2 pp
ap -> rb a
s -> np vp
np -> dt nbar
np -> X3 np
np -> ap nbar
np -> nbar pp
np -> n n
np -> vbg np
X3 -> np cc
X1 -> vbz np
nbar -> ap nbar
nbar -> nbar pp
nbar -> n n
nbar -> vbg np
a: dt
and: cc
use: n np nbar
codes: n np nbar
sees: vbz
of: p
agency: n np nbar
rapidly: rb
handling: vbg
labor: n np nbar
volume: n np nbar
as: p
costs: n np nbar
controlling: vbg
way: n np nbar
growing: a ap
mail: n np nbar
the: dt
widespread: a ap
Your task was to explain the entries in the last column of the chart, which included all entries ending at index 10.
The modified chart below contains explanations for all the edges, lexical and non-lexical. The lexical edges are those with spans of length 1, so the only ones you were responsible for in the assignment were those in cell (9,10).
np -> dt nbar
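You can report your discovery of a rule edge in a simple format called a dtr (daughter) record. For an edge of category cat spanning (i,j), built by a rule cat -> cat1 cat2, give cat, the span, cat1 and cat2, and the value of k, the split point at which cat1 (spanning (i,k)) meets cat2 (spanning (k,j)). So for our example, this is:
np (8,10) (dt,9,nbar)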
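As a minimal sketch of the record format, the following Python represents a dtr record as a named tuple and checks it against a rule set. The names `DtrRecord`, `is_licensed`, and `RULES` are hypothetical, introduced here only for illustration, and the example uses the grammar's dt tag for the determiner.

```python
from typing import NamedTuple


class DtrRecord(NamedTuple):
    """One way of building an edge: cat (i,j) from cat1 (i,k) and cat2 (k,j)."""
    cat: str   # category of the edge being built
    i: int     # start of the span
    j: int     # end of the span
    cat1: str  # left daughter, spanning (i, k)
    k: int     # split point
    cat2: str  # right daughter, spanning (k, j)

    def __str__(self) -> str:
        return f"{self.cat} ({self.i},{self.j}) ({self.cat1},{self.k},{self.cat2})"


def is_licensed(rec: DtrRecord, rules: set) -> bool:
    """True if k lies strictly inside the span and cat -> cat1 cat2 is a rule."""
    return rec.i < rec.k < rec.j and (rec.cat, rec.cat1, rec.cat2) in rules


# Two rules taken from the grammar above, stored as (parent, left, right).
RULES = {("np", "dt", "nbar"), ("pp", "p", "np")}

rec = DtrRecord("np", 8, 10, "dt", 9, "nbar")
print(rec)                      # np (8,10) (dt,9,nbar)
print(is_licensed(rec, RULES))  # True
```

Storing k explicitly is what lets two records for the same edge differ only in their split point, which is exactly how ambiguous edges show up in the chart.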
Finally, this sentence has 5 parses in this grammar. This is because certain edges contributing to the s edge in the (0,10) cell can be built in more than one way, given the grammar. These are called ambiguous edges. Find these ambiguous edges. They may not be in the last column, but they are in the chart. Give the lexical records or daughter records for these edges. You do not have to find all the ambiguous edges, just the ones contributing to the final 5 parses. Nor do you have to find all the edges contributing to the final 5 parses, just the ambiguous ones.
The explanation for why the sentence has 5 parses is contained in the last column.
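Although s(0,10) can be built only one way, out of np(0,2) and vp(2,10), vp(2,10) can be built 4 ways:

vp(2,10)
---------------
1. (vbz,3,np)
2. (X1,4,pp)
3. (X2,7,pp)
4. (X1,7,pp)

The first of these records uses np(3,10), which can itself be built 2 ways:

np(3,10)
---------------
1. (nbar,4,pp)
2. (nbar,7,pp)

So the first vp record yields 2 parses and each of the other three yields 1, for a total of 2 + 1 + 1 + 1 = 5.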
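The records above can be reproduced with a short CKY sketch in Python. The grammar and lexicon are the ones given at the top. The sentence itself is not stated in this section, so `SENTENCE` below is an assumption: a 10-word string chosen because it is consistent with the edges mentioned here. The names `build_chart` and `count_parses` are likewise hypothetical.

```python
from collections import defaultdict

GRAMMAR = """
pp -> p np
vpg -> vbg np
X2 -> X1 pp
vp -> vbz np
vp -> X1 pp
vp -> X2 pp
ap -> rb a
s -> np vp
np -> dt nbar
np -> X3 np
np -> ap nbar
np -> nbar pp
np -> n n
np -> vbg np
X3 -> np cc
X1 -> vbz np
nbar -> ap nbar
nbar -> nbar pp
nbar -> n n
nbar -> vbg np
"""

NOUN = ["n", "np", "nbar"]
LEXICON = {
    "a": ["dt"], "the": ["dt"], "and": ["cc"], "of": ["p"], "as": ["p"],
    "sees": ["vbz"], "rapidly": ["rb"], "handling": ["vbg"], "controlling": ["vbg"],
    "growing": ["a", "ap"], "widespread": ["a", "ap"],
    **{w: NOUN for w in "use codes agency labor volume costs way mail".split()},
}

# ASSUMPTION: the sentence is not given in this section; this one fits the chart.
SENTENCE = "the agency sees use of the codes as a way".split()


def build_chart(words):
    """Return chart[(i, j)][cat] -> list of daughter records (cat1, k, cat2)."""
    binary = defaultdict(list)  # (cat1, cat2) -> parent categories
    for line in GRAMMAR.strip().splitlines():
        parent, rhs = line.split("->")
        cat1, cat2 = rhs.split()
        binary[(cat1, cat2)].append(parent.strip())
    n = len(words)
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for cat in LEXICON[word]:
            chart[(i, i + 1)][cat] = []  # lexical edge: no daughter records
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every split point inside (i, j)
                for cat1 in chart[(i, k)]:
                    for cat2 in chart[(k, j)]:
                        for parent in binary.get((cat1, cat2), []):
                            chart[(i, j)].setdefault(parent, []).append((cat1, k, cat2))
    return chart


def count_parses(chart, cat, i, j):
    """Number of distinct trees rooted at edge cat (i,j)."""
    records = chart[(i, j)][cat]
    if not records:  # lexical edge
        return 1
    return sum(count_parses(chart, c1, i, k) * count_parses(chart, c2, k, j)
               for c1, k, c2 in records)


chart = build_chart(SENTENCE)
print(chart[(2, 10)]["vp"])
print(chart[(3, 10)]["np"])
print(count_parses(chart, "s", 0, 10))
```

Each edge is stored once, however many records it collects, and the count for an edge is the sum over its records of the product of its daughters' counts. That is the same 2 + 1 + 1 + 1 reasoning, carried out recursively.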
The five parses are below. Notice the way edges are shared between the parses: np(0,2) and pp(7,10) occur in all 5 parses. X1(2,4) occurs in parses 3 and 5. np(5,10) and the pp containing it, pp(4,10), occur in parses 1 and 3. All are found only once by the CKY algorithm, but are reused in building the trees that contain them.
Because the grammar is in Chomsky Normal Form, each tree for the 10-word sentence has exactly the same number of nodes, 19 (2n - 1): 10 (n) non-branching lexical nodes and 9 (n - 1) branching nodes. That's a total of 45 branching nodes (5 * 9) for the 5 parses. Below are the 20 daughter records used in building the 5 parses above, excluding the lexical records, arranged under the 16 edges they build. Each of the 45 branching nodes in the parse trees is licensed by one of these 20 daughter records, so clearly one daughter record can license multiple tree nodes.
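The bookkeeping in this paragraph can be checked mechanically. The sketch below enumerates trees from a table of daughter records and tallies edges, records, and branching nodes. The `RECORDS` table is a reconstruction made for illustration, under the assumption that the sentence is "the agency sees use of the codes as a way"; it is not copied from the chart, and the function names are hypothetical. Any edge missing from the table is treated as lexical.

```python
# ASSUMPTION: records reconstructed for an assumed sentence, keyed by (cat, i, j).
RECORDS = {
    ("s", 0, 10): [("np", 2, "vp")],
    ("np", 0, 2): [("dt", 1, "nbar")],
    ("vp", 2, 10): [("vbz", 3, "np"), ("X1", 4, "pp"), ("X2", 7, "pp"), ("X1", 7, "pp")],
    ("np", 3, 10): [("nbar", 4, "pp"), ("nbar", 7, "pp")],
    ("X1", 2, 4): [("vbz", 3, "np")],
    ("X1", 2, 7): [("vbz", 3, "np")],
    ("X2", 2, 7): [("X1", 4, "pp")],
    ("np", 3, 7): [("nbar", 4, "pp")],
    ("nbar", 3, 7): [("nbar", 4, "pp")],
    ("pp", 4, 7): [("p", 5, "np")],
    ("pp", 4, 10): [("p", 5, "np")],
    ("np", 5, 7): [("dt", 6, "nbar")],
    ("np", 5, 10): [("dt", 6, "nbar")],
    ("nbar", 6, 10): [("nbar", 7, "pp")],
    ("pp", 7, 10): [("p", 8, "np")],
    ("np", 8, 10): [("dt", 9, "nbar")],
}


def trees(cat, i, j):
    """Yield every tree rooted at edge (cat, i, j) as nested tuples."""
    edge = (cat, i, j)
    if edge not in RECORDS:  # lexical edge: a non-branching leaf
        yield edge
        return
    for cat1, k, cat2 in RECORDS[edge]:
        for left in trees(cat1, i, k):
            for right in trees(cat2, k, j):
                yield (edge, left, right)


def is_branching(tree):
    """Internal nodes are (edge, left, right); leaves are (cat, i, j)."""
    return isinstance(tree[0], tuple)


def branching(tree):
    """Number of branching nodes in a tree."""
    if not is_branching(tree):
        return 0
    return 1 + branching(tree[1]) + branching(tree[2])


def leaves(tree):
    """Number of non-branching (lexical) nodes in a tree."""
    if not is_branching(tree):
        return 1
    return leaves(tree[1]) + leaves(tree[2])


all_trees = list(trees("s", 0, 10))
print("parses:", len(all_trees))
print("edges:", len(RECORDS))
print("daughter records:", sum(len(recs) for recs in RECORDS.values()))
print("branching nodes per tree:", [branching(t) for t in all_trees])
print("lexical nodes per tree:", [leaves(t) for t in all_trees])
print("branching nodes in total:", sum(branching(t) for t in all_trees))
```

Since every tree is binary over n lexical nodes, each one should report n - 1 branching nodes, and the total over all parses is far larger than the number of daughter records, which is the sharing the paragraph describes.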