Car Prices |
Let's reconsider the joint distribution of COLOR and MAKE in which the two variables are independent.
Now let's consider prices:
|
[Table: a price for each of the 9 COLOR and MAKE combinations]
Expected Value |
Suppose we want to get the average price of a car. We don't just add the 9 prices together and divide by 9, because that ignores the relative frequencies of the prices. A green VW's price needs to be weighted more heavily in the final average because green VWs are more common. Weighting by relative frequency:
This turns out to be just another way of taking an AVERAGE.
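A minimal sketch of the weighted-average idea in Python, with made-up prices and relative frequencies (the original table is not reproduced here; the dictionaries below are illustrations only):

```python
# Hypothetical prices and relative frequencies for a few (color, make) combinations.
prices = {("green", "VW"): 9000, ("red", "VW"): 9500, ("blue", "Ford"): 12000}
probs  = {("green", "VW"): 0.5,  ("red", "VW"): 0.3,  ("blue", "Ford"): 0.2}

naive_average  = sum(prices.values()) / len(prices)          # ignores how common each car is
expected_price = sum(probs[c] * prices[c] for c in prices)   # weights each price by its probability

print(naive_average, expected_price)
```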
|
General Expected Value |
The general idea of expected value is that we have some function that assigns a number to every member of a sample space.
The expected value of the function is the sum of its values over the members of the sample space, each weighted by its probability: E[f(X)] = Σ_x p(x) * f(x).
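As a sketch, the definition translates directly into a short Python function (the `expected_value` name and the die example are illustrations, not from the original):

```python
def expected_value(pmf, f):
    """E[f(X)] = sum over x of p(x) * f(x)."""
    return sum(p * f(x) for x, p in pmf.items())

# Example: the expected value of a fair six-sided die roll.
die = {face: 1/6 for face in range(1, 7)}
print(expected_value(die, lambda x: x))  # 3.5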
|
We're interested in assigning a NUMBER to an event that characterizes the quantity of information it carries.
Next we'll be interested in the expected value of that number: the expected quantity of information per event. (Hint: that's the entropy.)
Two Probabilistic Criteria |
We want a measure I(x) of the information carried by an event x that satisfies two criteria:
1. Less probable events should carry more information: the information quantity should vary inversely with probability.
2. Additivity: the information carried by two independent events together should be the sum of the information each carries alone.
|
Inverse Probabilities (Attempt 1) |
Let's try: I(x) = 1/p(x)
Examples:
p(m) = 1/8 ==> I(m) = 8
Result: contradiction of criterion 2, additivity. Consider two independent events such that p(x1) = 1/4 and p(x2) = 1/8. Then:
I(x1) + I(x2) = 4 + 8 = 12, but p(x1 and x2) = 1/4 * 1/8 = 1/32, so I(x1 and x2) = 32, not 12. Additivity fails.
|
Log probability (Attempt 2) |
Observation: We know the probabilities of independent events multiply, but we want their combined information content to add:
I(p(x) * p(y)) = I(p(x)) + I(p(y))
One function that does something VERY like what we want is log: log(a * b) = log(a) + log(b).
Thus we have two ideas: take the inverse of the probability (so rarer events get bigger numbers) and take the log (so independent events add). Combining them: I(x) = log2 (1/p(x)).
Examples:
p(m) = 1/8 ==> I(m) = log2 8 = 3
p(m) = 1/4 ==> I(m) = log2 4 = 2
Now we satisfy additivity: for the independent events above, I(x1 and x2) = log2 32 = 5 = I(x1) + I(x2) = 2 + 3.
The unit is bits. Think of bits as counting binary choices. Taking probabilities out of it: picking out one of n equally likely alternatives takes log2 n binary choices.

Alternatives | Binary choices (bits)
---|---
2 | 1
4 | 2
8 | 3

|
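A small sketch contrasting the two attempts (the function and variable names are illustrative):

```python
from math import log2

def surprisal(p):
    """Attempt 2: information content in bits, I(x) = -log2 p(x)."""
    return -log2(p)

p_x1, p_x2 = 1/4, 1/8   # two independent events

# Attempt 1 (I = 1/p) is not additive:
print(1 / (p_x1 * p_x2), 1 / p_x1 + 1 / p_x2)    # 32.0 vs 12.0

# Attempt 2 (I = -log2 p) is additive:
print(surprisal(p_x1 * p_x2))                    # 5.0
print(surprisal(p_x1) + surprisal(p_x2))         # 2.0 + 3.0 = 5.0
```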
Information Quantity Defined |
Assume a random variable X with probability mass function (pmf) p. For each x in the range of X:
I(x) = log2 (1/p(x)) = - log2 p(x)
|
We next define entropy as the expected information quantity.
Expected Information Quantity |
I (Information quantity) is a function that returns a number for each member of a sample space of possible events. We can compute the expected value of I, which we call H: |
H(X) = E[I(X)] = Σ_x p(x) * I(x) = - Σ_x p(x) log2 p(x)
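A sketch of this definition as a Python function (the `entropy` name is ours; it assumes the pmf is given as a dict of probabilities):

```python
from math import log2

def entropy(pmf):
    """H(X) = sum_x p(x) * -log2 p(x); zero-probability outcomes contribute nothing."""
    return sum(-p * log2(p) for p in pmf.values() if p > 0)

# A fair coin carries one bit of uncertainty per toss.
print(entropy({"H": 0.5, "T": 0.5}))   # 1.0
```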
Coin tossing |
Suppose the probability of a head is 1/2. Then,
H(X) = [1/2 * 1] + [1/2 * 1] = 1
Suppose the probability of a head is 3/4. Then,
H(X) = [1/4 * 2] + [3/4 * .415] = .5 + .311 = .811
Suppose the probability of a head is 7/8. Then,
H(X) = [1/8 * 3] + [7/8 * .193] = .375 + .169 = .544
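These values can be double-checked with SciPy's entropy function, which computes the same sum (base=2 gives bits); a quick sketch:

```python
from scipy.stats import entropy

print(entropy([1/2, 1/2], base=2))   # 1.0
print(entropy([3/4, 1/4], base=2))   # ~0.811
print(entropy([7/8, 1/8], base=2))   # ~0.544
```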
Entropy as disorder |
Consider the graph of H(X) as a function of p(H), over all possible values of p(H).
General fact: entropy is highest when the distribution is uniform (for the coin, H(X) = 1 at p(H) = 1/2) and drops to 0 as the outcome becomes certain (p(H) = 0 or 1). The more evenly spread out (disordered) the distribution, the higher the entropy.
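A small sketch that traces the curve (the sample points chosen are arbitrary):

```python
from math import log2

def binary_entropy(p):
    """Entropy of a coin with P(head) = p."""
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

# H rises from 0 toward 1 bit as p(H) approaches 1/2, then falls back to 0.
for p in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"p(H) = {p:.2f}   H = {binary_entropy(p):.3f}")
```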
|
Entropy as Choice Number |
Consider the 8-sided die of the textbook. Suppose the probability of each face is 1/8. Then,
H(X) = 8 * [1/8 * 3] = 3
Suppose the probability of one face is 1/4, and the others are all 3/28. Then,
H(X) = [1/4 * 2] + [7 * 3/28 * 3.22] = .5 + 2.42 = 2.92
Suppose the probability of one face is 7/8, and the others are all 1/56. Then,
H(X) = [7/8 * .193] + [7 * 1/56 * 5.81] = .169 + .726 = .895
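One way to read "choice number": 2**H(X) behaves like the effective number of equally likely faces (this quantity is usually called perplexity). A sketch using the first two die distributions above:

```python
from math import log2

def entropy(probs):
    return sum(-p * log2(p) for p in probs if p > 0)

uniform_die = [1/8] * 8
skewed_die  = [1/4] + [3/28] * 7

for probs in (uniform_die, skewed_die):
    h = entropy(probs)
    print(f"H = {h:.2f} bits   2**H = {2 ** h:.2f} effective choices")
# Uniform die: 8 effective choices; skewed die: about 7.6.
```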
|
Optimal Encoding |
Consider a variant of 20 questions in which your payoff halves after each question. To maximize your average payoff you want a strategy that guarantees the least average number of questions. Consider the following horse race:
|
[Table: win probability for each horse in the race]
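A sketch with made-up win probabilities (the original race table is not reproduced here): with an optimal yes/no strategy you ask about the likeliest horses first, so identifying horse x costs about -log2 p(x) questions, and the average number of questions equals the entropy.

```python
from math import log2

# Hypothetical win probabilities for a four-horse race.
horses = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}

# Average cost of the optimal question strategy.
avg_questions = sum(p * -log2(p) for p in horses.values())
print(avg_questions)   # 1.75 questions on average = H(X)
```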
Entropy of Joint Distribution |
H(X,Y) = - Σ_x Σ_y p(x,y) log2 p(x,y)
|
Entropy of Conditional Distribution |
H(Y|X) = Σ_x p(x) H(Y | X = x) = - Σ_x Σ_y p(x,y) log2 p(y|x)
|
Mutual Information |
We write the Mutual Information of X and Y as I(X;Y). This is symmetric and defined as:
I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = Σ_x Σ_y p(x,y) log2 [ p(x,y) / (p(x) p(y)) ]
There are lots of different intuitions for this very interesting and widely applied concept. Here's one intuition: I(X;Y) is the amount by which knowing Y reduces our uncertainty about X (and, symmetrically, vice versa).
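A sketch computing H(X), H(Y), H(X,Y), H(Y|X), and I(X;Y) from a small made-up joint distribution (the table and variable names are illustrative only):

```python
from math import log2

# Hypothetical joint distribution p(x, y).
joint = {("a", "c"): 0.3, ("a", "d"): 0.2,
         ("b", "c"): 0.1, ("b", "d"): 0.4}

def H(dist):
    return sum(-p * log2(p) for p in dist.values() if p > 0)

# Marginal distributions p(x) and p(y).
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p
    py[y] = py.get(y, 0.0) + p

H_X, H_Y, H_XY = H(px), H(py), H(joint)
H_Y_given_X = H_XY - H_X            # chain rule: H(X,Y) = H(X) + H(Y|X)
I_XY = H_X + H_Y - H_XY             # mutual information

print(H_X, H_Y, H_XY, H_Y_given_X, I_XY)
```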
|
Interdisciplinary Nature of Information Theory |
Fields where the notion of entropy (or something like it) plays a role:
|
Cross entropy (Intuition) |
Measure the average amount of surprise: how surprised a model m is, on average, by data that is actually distributed according to p.
|
Cross Entropy Definition |
Cross-Entropy of a model m with respect to the true distribution p:
H(p, m) = - Σ_x p(x) log2 m(x)
(Per Word) Cross-Entropy of Model m for a corpus w_1 ... w_n of size n:
H = - (1/n) log2 m(w_1 ... w_n)
(Per Word) Cross-Entropy of Model m for the Language:
H(L, m) = lim (n → ∞) - (1/n) Σ p(w_1 ... w_n) log2 m(w_1 ... w_n)
By a wonderful magical theorem (the Shannon-McMillan-Breiman theorem, which holds for stationary ergodic processes), this =
lim (n → ∞) - (1/n) log2 m(w_1 ... w_n)
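A sketch of the per-word corpus formula with a made-up unigram model and a tiny corpus (the model m and the words are illustrative only):

```python
from math import log2

m = {"the": 0.4, "cat": 0.2, "sat": 0.2, "mat": 0.2}   # hypothetical unigram model
corpus = ["the", "cat", "sat"]                          # tiny test corpus

# Per-word cross-entropy: H = -(1/n) log2 m(w_1 ... w_n)
#                           = -(1/n) sum_i log2 m(w_i)  (under the unigram model)
n = len(corpus)
cross_entropy = -sum(log2(m[w]) for w in corpus) / n
print(cross_entropy)   # ~1.99 bits per word
```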
|
Cross-entropy Intuition |
|