CKY Assignment Solution

The agency sees use of the codes as a way.

The grammar and lexicon:

Grammar

pp -> p np
vpg -> vbg np
X2 -> X1 pp
vp -> vbz np
vp -> X1 pp
vp -> X2 pp
ap -> rb a
s -> np vp
np -> dt nbar
np -> X3 np
np -> ap nbar
np -> nbar pp
np -> n n
np -> vbg np
X3 -> np cc
X1 -> vbz np
nbar -> ap nbar
nbar -> nbar pp
nbar -> n n
nbar -> vbg np

Lexicon

a: dt
and: cc
use: n np nbar
codes: n np nbar
sees: vbz
of: p
agency: n np nbar
rapidly: rb
handling: vbg
labor: n np nbar
volume: n np nbar
as: p
costs: n np nbar
controlling: vbg
way: n np nbar
growing: a ap
mail: n np nbar
the: dt
widespread: a ap

Part A

Your task was to explain the entries in the last column of the chart, which included all entries ending at index 10.

The modified chart below contains explanations for all the edges, lexical and non lexical. The lexical edges are for all spans of length 1, so the only ones you were responsible for in the assignment were those in cell (9,10).

np -> dt nbar

nbar

You can report your discovery of a rule edge in a simple format called a dtr record. Just give cat, the span, cat1 and cat2, and the value of k. So for our example, this is:

np (8,10) (det,9,nbar)

Part B

Finally, this sentence has 5 parses in this grammar. This is because certain edges contributing to the s edge in the (0,10) cell can be built in more than one way, given the grammar. These are called ambiguous edges. Find these ambiguous edges. They may not be in the last column, but they are in the chart, Give the lexical records or daughter records for these edges. You do not have to find all the ambiguous edges, just the ones contributing to the final 5 parses. Finally you do not have to find all the edges contributing to the final 5 parses, just the ambiguous ones.

The number of parses for an edge E(i,j) is the sum of the number of parses of its dtr records. By definition a lexical edge has just one parse.
Let D = (d1, k, d2), where d1 and d2 are edges. Then

There are 27 edges in red below. These are the only edges to contributed to some parse of the example sentence.

1
the 2
agency 3
sees 4
use 5
of 6
the 7
codes 8
as 9
a 10
way

0 dt
Lexical
np
(dt 1 nbar)

s
(np 2 vp)

s
(np 2 vp)

s
(np 2 vp)

1 np
Lexical
nbar
Lexical
n
Lexical
s
(np 2 vp)

s
(np 2 vp)

s
(np 2 vp)

2 vbz
Lexical
vp
(vbz 3 np)

X1
(vbz 3 np)

vp
(vbz 3 np)
(X1 4 pp)

X2
(X1 4 pp)

X1
(vbz 3 np)

vp
(vbz 3 np)
(X1 4 pp)
(X2 7 pp)
(X1 7 pp)

X2
(X1 4 pp)
(X1 7 pp)

X1
(vbz 3 np)

3 np
Lexical
nbar
Lexical
n
Lexical
np
(nbar 4 pp)

nbar
(nbar 4 pp)

np
(nbar 4 pp)
(nbar 7 pp)

nbar
(nbar 4 pp)
(nbar 7 pp)

4 p
Lexical
pp
(p 5 np)

pp
(p 5 np)

5 dt
Lexical
np
(dt 6 nbar)

np
(dt 6 nbar)

6 np
Lexical
nbar
Lexical
n
Lexical
np
(nbar 7 pp)

nbar
(nbar 7 pp)

7 p
Lexical
pp
(p 8 np)

8 dt
Lexical
np
(dt 9 nbar)

9 np
Lexical
nbar
Lexical
n
Lexical

Grammar

pp -> p np vpg -> vbg np X2 -> X1 pp vp -> vbz np vp -> X1 pp vp -> X2 pp ap -> rb a s -> np vp np -> dt nbar np -> X3 np np -> ap nbar np -> nbar pp np -> n n np -> vbg np X3 -> np cc X1 -> vbz np nbar -> ap nbar nbar -> nbar pp nbar -> n n nbar -> vbg np

Lexicon

a: dt and: cc use: n np nbar codes: n np nbar sees: vbz of: p agency: n np nbar rapidly: rb handling: vbg labor: n np nbar volume: n np nbar as: p costs: n np nbar controlling: vbg way: n np nbar growing: a ap mail: n np nbar the: dt widespread: a ap

Ambiguity

The explanation for why the sentence has 5 parses is contained in the last column.

Although S(0,10) can only be built one way, it is built out of np(0,2) and vp(2,10), and vp(2,10) can be built 4 ways:

   VP(2, 10)
---------------
1. (vbz,3,np)
2. (X1, 4, pp)
3. (X2,7,pp)
4. (X1, 7,pp)

So that means there are at least 4 parses. The reason there are 5 parses is that option 1 --- and only option 1 --- uses the np edge np(3,10) and that edge is two ways ambiguous:

   np(3, 10)
---------------
1. (nbar, 4, pp) 
2. (nbar, 7, pp)

Here are the two parse trees using the ambiguous np from 3 to 10, each using a different way of building it. Check the answer key above to make sure where each of the nodes in the parse tree is coming from. Each node is labeled with its start and end indices, to help find it in the table above. You should also verify that each tree obeys the rules of the grammar. Notice how many edges are shared between the parse trees. They all use pp(7,10). Two of them use PP(4,10). Two of them use X1(2,4), and so on. This is the key to how an exponential number of parse trees can be used using only n² memory.

Each parse tree has 15 nodes. That's 5 * 15 = 75 nodes. That's 75 parse tree nodes built using 27 chart edges, so that's quite a lot of re-use.

Here are the remaining three parses:

Finally, you should try to determine which of the parse trees is the correct one. The question is made more difficult by the fact that the original grammar has been distorted by the Chomsky Normal Form transformation. Remember the category names beginning with X have been introduced by the transformation to split a rule that introduced more than two categories. So:

VP → VBZ NP PP

becomes

VP → X1 PP
X1 → VBZ NP