Basic Dictionary Access
You need to store
various counts. For example, you need to store
the number of times each word co-occurs with
each tag in the training data.
There are various ways you could do this.
But the deciding factor is this: You are
basically building a V x T matrix, where
V is vocabulary size and T is number of tags.
The matrix will be sparse. Most words won't co-occur with most
tags in the training data.
The natural way to deal with such a sparse
matrix in Python is to use a dictionary.
Let's assume we store the data as follows. The keys for the
dictionary are words. The values are dictionaries.
To each word we associate a dictionary which gives for each tag
the count of the number of times the word co-occurs with that tag:
>>> wordtag['dance']
{'VBZ': 1, 'NN1': 1}
That is, the dictionary associated with 'dance'
tells us we have seen it once as a 'VBZ' (tensed verb)
and once as a 'NN1' (common noun).
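To make the sparsity point concrete, here is a small sketch that compares the number of cells in the full V x T matrix with the number of counts the nested dictionary actually stores. The words, tags and counts are made up for illustration, and V and T are taken to be just the words and tags that appear in this tiny table.

```python
# Hypothetical counts; the words and tags are made up for illustration.
wordtag = {
    'dance': {'VBZ': 1, 'NN1': 1},
    'the': {'AT0': 2},
}

# Every tag that occurs anywhere in the table.
tagset = {tag for tags in wordtag.values() for tag in tags}

# Cells a full V x T matrix would need.
dense_cells = len(wordtag) * len(tagset)

# Cells the nested dictionary actually stores.
stored_cells = sum(len(tags) for tags in wordtag.values())

print(dense_cells, stored_cells)  # 6 3
```

With a realistic vocabulary the gap between the two numbers is far larger, which is the reason for preferring the dictionary.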
Just to be clear in the following discussion,
let's empty the dictionary:
>>> wordtag = {}
This statement
binds the name wordtag to an empty dictionary (what other languages call a hash table or alist).
Now let's assign something to it:
>>> wordtag['dance']={'VBZ': 1, 'NN1': 1}
>>> wordtag
{'dance': {'VBZ': 1, 'NN1': 1}}
So now we have a dictionary of dictionaries,
defined for just the one word 'dance'.
Now we ask the dictionary about the count of some arbitrary word/tag pair:
>>> wordtag['walk']['VBZ']
Traceback (most recent call last):
File "<stdin>", line 1, in ?
KeyError: 'walk'
Of course we get an error. Python has very good error/exception
handling. This kind of error is important to know
about if you're using dictionaries. It's called a KeyError,
as the last line of the error message tells us. In Python,
errors too are objects and always belong
to classes. An error of the KeyError class
is raised whenever we access a dictionary asking for
information about some key the dictionary knows nothing about.
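Because a KeyError is an ordinary exception object, a program can catch it instead of stopping. The following is a minimal sketch, assuming we want to treat an unseen word/tag pair as a count of 0; the names count and missing_key are introduced here only for illustration.

```python
# Same one-word table as in the session above.
wordtag = {'dance': {'VBZ': 1, 'NN1': 1}}

try:
    count = wordtag['walk']['VBZ']
except KeyError as err:
    # err.args[0] holds the key that the dictionary did not know about.
    missing_key = err.args[0]
    # Assumption for this sketch: an unseen pair counts as 0.
    count = 0

print(missing_key)  # walk
print(count)        # 0

# KeyError is a class; it is a subclass of the built-in LookupError.
print(issubclass(KeyError, LookupError))  # True
```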
Suppose we now try to ASSIGN a value using the assignment
operator ('=').
>>> wordtag['walk']['VBZ']= 1
Traceback (most recent call last):
File "<stdin>", line 1, in ?
KeyError: 'walk'
Same error. Why? Because this statement says: (a) find the value
for key 'walk' in dictionary wordtag, which should be a dictionary;
(b) then assign to that dictionary the value 1 for the key 'VBZ'.
But we never get to step (b) because step (a) raises a
KeyError. So we have to do this:
>>> wordtag['walk']={}
This assigns an empty dictionary as the value of 'walk' in wordtag.
And now we can do the assignment without error:
>>> wordtag['walk']['VBZ']= 1
And now we can do other assignments to 'walk':
>>> wordtag['walk']['NN1']= 1
And now when we ask for the dictionary associated
with 'walk' we get what's expected.
>>> wordtag['walk']
{'VBZ': 1, 'NN1': 1}
And now we can fetch embedded values with a single query:
>>> wordtag['walk']['VBZ']
1
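The single query is really two lookups done one after the other, matching steps (a) and (b) described earlier. Here is a sketch that spells the two steps out; the variable name inner and the tag 'NN2' are introduced only for illustration.

```python
wordtag = {'walk': {'VBZ': 1, 'NN1': 1}}

# Step (a): the outer lookup returns the dictionary stored for 'walk'.
inner = wordtag['walk']

# Step (b): the inner lookup returns the count for the tag.
count = inner['VBZ']
print(count)  # 1

# The outer lookup hands back the stored dictionary itself, not a copy,
# so an assignment made through inner shows up in wordtag too.
inner['NN2'] = 1  # 'NN2' is a hypothetical tag used for illustration
print(wordtag['walk'])  # {'VBZ': 1, 'NN1': 1, 'NN2': 1}
```

This is why wordtag['walk']['VBZ'] = 1 works once wordtag['walk'] exists: step (a) succeeds and step (b) updates the dictionary it returns.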
So that's basically how things work. And of course
this gives us some problems to solve when we write
a program requiring random access and random updates
to a dictionary or dictionaries.
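One possible way to handle these problems is sketched below. It is not the only approach, and the helper names add_count and get_count are hypothetical, made up for this example. The update helper creates the inner dictionary the first time a word is seen, and the lookup helper assumes an unseen word/tag pair should count as 0 rather than raise a KeyError.

```python
def add_count(table, word, tag):
    """Add one to the count for (word, tag), creating entries as needed."""
    if word not in table:
        # First time we see this word: give it an empty tag dictionary.
        table[word] = {}
    tags = table[word]
    # dict.get returns 0 when the tag has not been seen with this word.
    tags[tag] = tags.get(tag, 0) + 1


def get_count(table, word, tag):
    """Return the count for (word, tag), or 0 if the pair was never seen."""
    return table.get(word, {}).get(tag, 0)


wordtag = {}
# Made-up training pairs for illustration.
pairs = [('dance', 'VBZ'), ('dance', 'NN1'), ('walk', 'VBZ'), ('walk', 'VBZ')]
for word, tag in pairs:
    add_count(wordtag, word, tag)

print(wordtag)
# {'dance': {'VBZ': 1, 'NN1': 1}, 'walk': {'VBZ': 2}}
print(get_count(wordtag, 'walk', 'VBZ'))  # 2
print(get_count(wordtag, 'walk', 'NN1'))  # 0
print(get_count(wordtag, 'run', 'VBZ'))   # 0
```

Note that get_count never adds anything to the table, so looking up unseen pairs does not make the dictionary less sparse.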