CKY Assignment Solution

      The agency sees use of the codes as a way.

    The grammar and lexicon:

    Grammar

    pp -> p np
    vpg -> vbg np
    X2 -> X1 pp
    vp -> vbz np
    vp -> X1 pp
    vp -> X2 pp
    ap -> rb a
    s -> np vp
    np -> dt nbar
    np -> X3 np
    np -> ap nbar
    np -> nbar pp
    np -> n n
    np -> vbg np
    X3 -> np cc
    X1 -> vbz np
    nbar -> ap nbar
    nbar -> nbar pp
    nbar -> n n
    nbar -> vbg np
    

    Lexicon

    a: dt
    and: cc
    use: n np nbar
    codes: n np nbar
    sees: vbz
    of: p
    agency: n np nbar
    rapidly: rb
    handling: vbg
    labor: n np nbar
    volume: n np nbar
    as: p
    costs: n np nbar
    controlling: vbg
    way: n np nbar
    growing: a ap
    mail: n np nbar
    the: dt
    widespread: a ap
    

    Part A

    Your task was to explain the entries in the last column of the chart, which included all entries ending at index 10.

    The modified chart below contains explanations for all the edges, lexical and non lexical. The lexical edges are for all spans of length 1, so the only ones you were responsible for in the assignment were those in cell (9,10).

      np -> dt nbar
      
    and there is a dt in the (8,9) cell of the chart and an nbar in the (9,10) cell.

    You can report your discovery of a rule edge in a simple format called a dtr record. Just give cat, the span, cat1 and cat2, and the value of k. So for our example, this is:

      np (8,10)
      (det,9,nbar)
      

    Part B

    Finally, this sentence has 5 parses in this grammar. This is because certain edges contributing to the s edge in the (0,10) cell can be built in more than one way, given the grammar. These are called ambiguous edges. Find these ambiguous edges. They may not be in the last column, but they are in the chart, Give the lexical records or daughter records for these edges. You do not have to find all the ambiguous edges, just the ones contributing to the final 5 parses. Finally you do not have to find all the edges contributing to the final 5 parses, just the ambiguous ones.

    1. The number of parses for an edge E(i,j) is the sum of the number of parses of its dtr records. By definition a lexical edge has just one parse.
        P(E) = sum(P(D) for D in Dtrs(E))
    2. Let D = (d1, k, d2), where d1 and d2 are edges. Then
        P(D) = P(d1) * P(d2)

    There are 27 edges in red below. These are the only edges to contributed to some parse of the example sentence.

      1
    the
    2
    agency
    3
    sees
    4
    use
    5
    of
    6
    the
    7
    codes
    8
    as
    9
    a
    10
    way
    0 dt
    Lexical
    np
    (dt 1 nbar)


    s
    (np 2 vp)


    s
    (np 2 vp)


    s
    (np 2 vp)


    1   np
    Lexical
    nbar
    Lexical
    n
    Lexical
    s
    (np 2 vp)


    s
    (np 2 vp)


    s
    (np 2 vp)


    2     vbz
    Lexical
    vp
    (vbz 3 np)


    X1
    (vbz 3 np)


    vp
    (vbz 3 np)
    (X1 4 pp)


    X2
    (X1 4 pp)


    X1
    (vbz 3 np)


    vp
    (vbz 3 np)
    (X1 4 pp)
    (X2 7 pp)
    (X1 7 pp)


    X2
    (X1 4 pp)
    (X1 7 pp)


    X1
    (vbz 3 np)


    3       np
    Lexical
    nbar
    Lexical
    n
    Lexical
    np
    (nbar 4 pp)


    nbar
    (nbar 4 pp)


    np
    (nbar 4 pp)
    (nbar 7 pp)


    nbar
    (nbar 4 pp)
    (nbar 7 pp)


    4         p
    Lexical
    pp
    (p 5 np)


    pp
    (p 5 np)


    5           dt
    Lexical
    np
    (dt 6 nbar)


    np
    (dt 6 nbar)


    6             np
    Lexical
    nbar
    Lexical
    n
    Lexical
    np
    (nbar 7 pp)


    nbar
    (nbar 7 pp)


    7               p
    Lexical
    pp
    (p 8 np)


    8                 dt
    Lexical
    np
    (dt 9 nbar)


    9                   np
    Lexical
    nbar
    Lexical
    n
    Lexical

    Grammar

      pp -> p np
      vpg -> vbg np
      X2 -> X1 pp
      vp -> vbz np
      vp -> X1 pp
      vp -> X2 pp
      ap -> rb a
      s -> np vp
      np -> dt nbar
      np -> X3 np
      np -> ap nbar
      np -> nbar pp
      np -> n n
      np -> vbg np
      X3 -> np cc
      X1 -> vbz np
      nbar -> ap nbar
      nbar -> nbar pp
      nbar -> n n
      nbar -> vbg np
      

    Lexicon

      a: dt
      and: cc
      use: n np nbar
      codes: n np nbar
      sees: vbz
      of: p
      agency: n np nbar
      rapidly: rb
      handling: vbg
      labor: n np nbar
      volume: n np nbar
      as: p
      costs: n np nbar
      controlling: vbg
      way: n np nbar
      growing: a ap
      mail: n np nbar
      the: dt
      widespread: a ap
      

Ambiguity

The explanation for why the sentence has 5 parses is contained in the last column.

Although S(0,10) can only be built one way, it is built out of np(0,2) and vp(2,10), and vp(2,10) can be built 4 ways:

       VP(2, 10)
    ---------------
    1. (vbz,3,np)
    2. (X1, 4, pp)
    3. (X2,7,pp)
    4. (X1, 7,pp)
    
So that means there are at least 4 parses. The reason there are 5 parses is that option 1 --- and only option 1 --- uses the np edge np(3,10) and that edge is two ways ambiguous:
       np(3, 10)
    ---------------
    1. (nbar, 4, pp) 
    2. (nbar, 7, pp)
    

Here are the two parse trees using the ambiguous np from 3 to 10, each using a different way of building it. Check the answer key above to make sure where each of the nodes in the parse tree is coming from. Each node is labeled with its start and end indices, to help find it in the table above. You should also verify that each tree obeys the rules of the grammar. Notice how many edges are shared between the parse trees. They all use pp(7,10). Two of them use PP(4,10). Two of them use X1(2,4), and so on. This is the key to how an exponential number of parse trees can be used using only n2 memory.

Each parse tree has 15 nodes. That's 5 * 15 = 75 nodes. That's 75 parse tree nodes built using 27 chart edges, so that's quite a lot of re-use.

Here are the remaining three parses:

     

Finally, you should try to determine which of the parse trees is the correct one. The question is made more difficult by the fact that the original grammar has been distorted by the Chomsky Normal Form transformation. Remember the category names beginning with X have been introduced by the transformation to split a rule that introduced more than two categories. So:

    VP → VBZ NP PP
    
becomes
    VP → X1 PP
    X1 → VBZ NP