Linguistics 596

Viterbi Algorithm


Decoding and Segmentation

Given a string of phones as input, find the most probable sequence of words it realizes.

Each phone gets one instant of time.

We use a language model to assign probabilities to sequences of phones. Paths through the model determine sequences of words. Our output is the most probable sequence of words.

We've simultaneously accomplished decoding and segmentation:
  1. Decoding: Finding which words are realized by a string of phones.
  2. Segmentation: Splitting up the sequence of phones into subsequences representing words (no white spaces between words!).

Language model

  1. Pronunciation network. The key ideas are:
    1. States in this diagram are labeled with phones. Being in a state labeled "dh" means that this state is only compatible with the input "dh".
    2. In formal HMM talk, this is captured by saying that the "observation likelihood" of phone "dh" (o(dh)) for that state is 1, and the observation likelihood for any other phone is 0.
  2. Bigram model
  3. The two models combined
We call the start state start. We name other states by the word subnetwork they belong to and the phone that labels them (e.g., Iaa is the aa state of the word I).
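
To make the combined model concrete, here is a minimal Python sketch of the tiny fragment used in the worked example below. The state names, phone labels, and probabilities are the ones from that example; the dictionary encoding (and the names phone_of and a) is just one convenient illustration, not part of the notes themselves.

  # States of the combined network, each labeled with the phone it accepts.
  # Names follow the word-plus-phone convention used in the example below.
  phone_of = {
      'onaa':   'aa',   # aa state of "on"
      'Iaa':    'aa',   # aa state of "I"
      'onaa n': 'n',    # n state of "on"
      'then':   'n',    # n state of "the"
      'needn':  'n',    # n state of "need"
  }

  # Combined transition probabilities a[s][s']; any pair not listed has probability 0.
  # A transition into a new word factors as pr(word-initial state | word) * pr(word | previous word).
  a = {
      'start': {'onaa': 1.0 * .00077,    # pr(onaa | on)    * pr(on | start)
                'Iaa':  .20 * .079},     # pr(Iaa | I)      * pr(I | start)
      'onaa':  {'onaa n': 1.0},          # word-internal transition within "on"
      'Iaa':   {'needn': 1.0 * .0016,    # pr(needn | need) * pr(need | I)
                'then':  .08 * .00018},  # pr(then | the)   * pr(the | I)
  }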

The core computation: Viterbi probability

In general a state s can be reached at time t via a number of paths. For each path reaching state s at time t, we compute a path probability. We call the best of these viterbi(s,t).

Basic idea: For a state s' compatible with the input phone at time t+1, compute viterbi(s',t+1), assuming we know viterbi(s,t) for all s.

  1. For each s, compute path-prob(s' | s, t), the path probability for reaching s' at time t+1 from s at time t: path-prob(s' | s, t) = viterbi(s,t) * a[s,s'].
  2. viterbi(s',t+1) = max over all s in STATES of path-prob(s' | s, t). (A code sketch of this step follows the list.)
  3. Assume viterbi(start,0) = 1.0. For all other states s except start, assume viterbi(s,0) = 0.
  4. If a state s is not compatible with the input phone at time t, assume viterbi(s,t) = 0.
  5. There are two ways a state can be a dead end.
    1. All its transitions to input-compatible states have probability 0.
    2. Its viterbi probability is 0.
    In either case, all paths continuing from the state will have path probability 0.
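
A minimal Python sketch of this recursion step, assuming the phone_of and a dictionaries sketched above (any state pair missing from a is treated as a zero-probability transition):

  def viterbi_step(viterbi_t, phone, states, a, phone_of):
      """Given viterbi(s, t) for all s, compute viterbi(s', t+1) for every state s'."""
      viterbi_next = {}
      for s_next in states:
          # A state not compatible with the input phone gets probability 0.
          if phone_of.get(s_next) != phone:
              viterbi_next[s_next] = 0.0
              continue
          # path-prob(s' | s, t) = viterbi(s, t) * a[s, s']; keep the best over all s.
          viterbi_next[s_next] = max(
              viterbi_t.get(s, 0.0) * a.get(s, {}).get(s_next, 0.0)
              for s in viterbi_t
          )
      return viterbi_next

Starting from viterbi(start,0) = 1.0, the call viterbi_step({'start': 1.0}, 'aa', list(phone_of), a, phone_of) yields the t = 1 values computed by hand below: .00077 for onaa, .0158 for Iaa, and 0 for the other states.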


The Viterbi algorithm

  1. Basic idea of the Viterbi algorithm: find the most likely path, given some input, through a probabilistic network. This path determines a sequence of words, thus accomplishing decoding and segmentation.
  2. So for an input of length n, we compute the Viterbi probability for each state s and each time 0 through n, keeping track of the path associated with each Viterbi score. The state with the best Viterbi score at time n is the state we want to end up in. What we want to return is the best-probability path to that state.
  3. The path probability matrix
  4. t = 0   Compatible state is start
    viterbi(start,0) = 1.0
    t = 1   o1 = aa; compatible states are onaa and Iaa

    viterbi(onaa,1)

    • path-prob(onaa | start, 0) = viterbi(start,0) * a[start,onaa]
    • a[start,onaa] = pr(onaa|on) * pr(on | start)
    • a[start,onaa] = 1.0 * .00077
    • path-prob(onaa | start, 0) = 1.0 * .00077 = .00077
    • viterbi(onaa,1) = .00077

    viterbi(Iaa,1)

    • path-prob(Iaa | start, 0) = viterbi(start,0) * a[start,Iaa]
    • a[start,Iaa] = pr(Iaa|I) * pr(I | start)
    • a[start,Iaa] = .20 * .079 = .0158
    • path-prob(Iaa | start, 0) = 1.0 * .0158 = .0158
    • viterbi(Iaa,1) = .0158
    t = 2   o2 = n; compatible states are onaa n, then, and needn

    viterbi(onaa n,2)

    • From onaa
      • path-prob(onaa n| onaa, 1) = viterbi(onaa,1) * a[onaa,onaa n]
      • path-prob(onaa n| onaa, 1) = .00077 * 1.0 = .00077
    • From Iaa
      • path-prob(onaa n| Iaa, 1) = viterbi(Iaa,1) * a[Iaa,onaa n]
      • path-prob(onaa n| Iaa, 1) = .0158 * 0 = 0
    • viterbi(onaa n,2) = Max({.00077, 0}) = .00077

    viterbi(then,2)

    • From onaa
      • path-prob(then| onaa, 1) = viterbi(onaa,1) * a[onaa,then]
      • path-prob(then| onaa, 1) = .00077 * 0 = 0
    • From Iaa
      • path-prob(then| Iaa, 1) = viterbi(Iaa,1) * a[Iaa,then]
      • a[Iaa,then] = pr(the|I) * Pr(then |the)
      • a[Iaa,then] = .00018 * .08
      • path-prob(then| Iaa, 1) = .0158 * .00018 * .08 = .00000023
    • viterbi(then,2) = Max({.00000023, 0}) = .00000023

    viterbi(needn,2)

    • From onaa
      • path-prob(needn| onaa, 1) = viterbi(onaa,1) * a[onaa,needn]
      • path-prob(needn| onaa, 1) = .00077 * 0 = 0
    • From Iaa
      • path-prob(needn| Iaa, 1) = viterbi(Iaa,1) * a[Iaa,needn]
      • a[Iaa,needn] = pr(need|I) * Pr(needn |need)
      • a[Iaa,needn] = .0016 * 1.0 = .0016
      • path-prob(needn| Iaa, 1) = .0158 * .0016 = .000025
    • viterbi(needn,2) = Max({.000025, 0}) = .000025
  5. The algorithm
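
    A minimal Python sketch of the whole algorithm, assuming the phone_of and a dictionaries sketched earlier. It records a backpointer alongside each Viterbi score so that the best path, and with it the word sequence and segmentation, can be read off at the end.

      def viterbi_decode(phones, states, a, phone_of):
          """Return the best path probability and best state sequence for the input phones."""
          # t = 0: only the start state has nonzero Viterbi probability.
          viterbi = [{'start': 1.0}]
          backpointer = [{}]
          for t, phone in enumerate(phones, start=1):
              viterbi.append({})
              backpointer.append({})
              for s_next in states:
                  if phone_of.get(s_next) != phone:
                      viterbi[t][s_next] = 0.0      # not compatible with the input phone
                      continue
                  # Best predecessor: path-prob(s' | s, t-1) = viterbi(s, t-1) * a[s, s']
                  best_s, best_p = None, 0.0
                  for s, v in viterbi[t - 1].items():
                      p = v * a.get(s, {}).get(s_next, 0.0)
                      if p > best_p:
                          best_s, best_p = s, p
                  viterbi[t][s_next] = best_p
                  backpointer[t][s_next] = best_s
          # Pick the best final state, then follow backpointers to recover the path.
          last = max(viterbi[-1], key=viterbi[-1].get)
          path = [last]
          for t in range(len(phones), 1, -1):
              path.append(backpointer[t][path[-1]])
          return viterbi[-1][last], list(reversed(path))

    For example, viterbi_decode(['aa', 'n'], list(phone_of), a, phone_of) returns .00077 with the path onaa, onaa n: after only the two phones worked through above, the path through on still has the best score, while the paths ending in needn and then carry the hand-computed values of about .000025 and .00000023.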