Linguistics 581
Introduction to Computational Linguistics
Prerequisites: Mathematical Linguistics or Math 245 (Discrete Mathematics), and Corpus Linguistics or CS 107 (Introduction to Computer Programming)
Required Texts
Jurafsky, Daniel, and James H. Martin. 2000. Speech and Language Processing. Prentice-Hall.
Charniak, Eugene. 1993. Statistical Language Learning. MIT Press (A Bradford Book).
Course Description
This is an introduction to computational linguistics covering such basic concepts as regular expressions, finite-state automata, finite-state transducers, weighted finite-state automata, and n-gram language models, along with applications to phonology, orthography, morphology, and syntax. Some emphasis is placed on introducing probabilistic models and on filling in enough background to understand the statistical techniques used in speech recognition.
Grading
Assignments (40%)
Midterm (20%)
Final (40%)
Week 1:
What is Computational Linguistics? History of Computational Linguistics.
Week 2:
Introduction to Finite-State Automata (FSAs) and regular expressions (REs).
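As a first taste of the formal tools, the sketch below pairs a regular expression for the "sheeptalk" language (baa!, baaa!, ...) with an equivalent hand-built finite-state recognizer. This is a minimal Python sketch; the state numbering and test strings are illustrative choices, not assigned exercises.

    import re

    # Regular expression for sheeptalk: b, then a, then one or more further a's, then !
    sheep_re = re.compile(r"^baa+!$")

    # Equivalent finite-state automaton, written as a transition table.
    # States: 0 (start), 1, 2, 3, 4 (accepting).
    TRANSITIONS = {
        (0, "b"): 1,
        (1, "a"): 2,
        (2, "a"): 3,
        (3, "a"): 3,   # loop: any number of additional a's
        (3, "!"): 4,
    }
    ACCEPTING = {4}

    def fsa_accepts(string):
        state = 0
        for symbol in string:
            state = TRANSITIONS.get((state, symbol))
            if state is None:
                return False
        return state in ACCEPTING

    for s in ["baa!", "baaaa!", "ba!", "baaa"]:
        print(s, bool(sheep_re.match(s)), fsa_accepts(s))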
Week 3:
Morphology. FSAs as morphological recognizers, Finite-State Transducers (FSTs) as morphological parsers. Spelling and allomorphy rules with FSTs.
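The sketch below approximates, with ordinary dictionary lookups rather than a compiled transducer, what a lexicon FST composed with an e-insertion spelling rule (fox + -s -> foxes) does for English plural nouns. The stem list and tag names are invented for illustration.

    # Toy morphological parser for English plural nouns.
    STEMS = {"cat", "fox", "dog", "bush"}
    SIBILANT_ENDINGS = ("x", "s", "z", "sh", "ch")   # stems that take -es

    def parse_plural(surface):
        """Map a surface form to a lexical analysis, e.g. 'foxes' -> 'fox +N +Pl'."""
        if surface.endswith("es") and surface[:-2] in STEMS:
            # e-insertion: stems ending in x, s, z, sh, ch take -es
            return surface[:-2] + " +N +Pl"
        if (surface.endswith("s") and surface[:-1] in STEMS
                and not surface[:-1].endswith(SIBILANT_ENDINGS)):
            return surface[:-1] + " +N +Pl"
        if surface in STEMS:
            return surface + " +N +Sg"
        return None   # ill-formed, e.g. 'foxs'

    for w in ["cats", "foxes", "fox", "foxs"]:
        print(w, "->", parse_plural(w))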
Week 4:
Partial morphological analysis of a language with richer morphology.
Week 5:
Computational phonology. Phonetic variation. Prosody. FST models of phonetic realization rules for text-to-speech applications.
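A single realization rule, written here as a regular-expression substitution over a space-separated phone string, gives the flavor of the rewrite rules covered this week: intervocalic /t/ is realized as a flap. The ARPAbet-like symbols are an assumption, and the rule deliberately ignores stress, so this is a simplified sketch rather than a usable text-to-speech rule.

    import re

    def flap(pron):
        """Flapping as a context-sensitive rewrite: 'b ah t er' -> 'b ah dx er'."""
        vowels = "aa|ae|ah|ao|eh|er|ih|iy|uw"
        return re.sub(r"(\b(?:%s)\b) t (\b(?:%s)\b)" % (vowels, vowels),
                      r"\1 dx \2", pron)

    print(flap("b ah t er"))   # 'butter' -> b ah dx er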
Week 6:
Noisy channel model. Bayesian approach to noise-filtering/error correction. Likelihood and prior probability. Weighted FSAs.
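The decision rule behind the noisy channel model is Bayesian: choose the source word w that maximizes P(observation | w) * P(w), the channel likelihood times the prior. The sketch below applies it to one misspelling; the candidate words and all probabilities are invented toy numbers, not estimates from any corpus.

    # Noisy channel as a Bayesian decision:
    # best w = argmax_w P(observation | w) * P(w)
    observation = "acress"

    prior = {"actress": 1e-5, "across": 3e-4, "acres": 3e-5}        # P(w), toy values
    likelihood = {"actress": 1e-4, "across": 1e-5, "acres": 3e-4}   # P("acress" | w), toy values

    best = max(prior, key=lambda w: likelihood[w] * prior[w])
    print(best, likelihood[best] * prior[best])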
Week 7:
Application of the noisy channel model to spelling correction and pronunciation modeling. Forward algorithm. Viterbi search.
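The forward algorithm sums over all paths to give the total probability of an observation sequence, in contrast to Viterbi search, which keeps only the best path (and returns in the speech recognition weeks). Below is a minimal implementation over an invented two-state model; the states, observations, and probabilities are toy values.

    states = ["Hot", "Cold"]
    start = {"Hot": 0.8, "Cold": 0.2}
    trans = {"Hot": {"Hot": 0.7, "Cold": 0.3}, "Cold": {"Hot": 0.4, "Cold": 0.6}}
    emit = {"Hot": {"1": 0.2, "2": 0.4, "3": 0.4}, "Cold": {"1": 0.5, "2": 0.4, "3": 0.1}}

    def forward(observations):
        """Total probability of the observation sequence, summed over state paths."""
        alpha = {s: start[s] * emit[s][observations[0]] for s in states}
        for o in observations[1:]:
            alpha = {s: sum(alpha[r] * trans[r][s] for r in states) * emit[s][o]
                     for s in states}
        return sum(alpha.values())

    print(forward(["3", "1", "3"]))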
Week 8:
Probabilistic word-in-context models. N-gram language models of word collocation.
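A maximum-likelihood bigram model is relative-frequency estimation: P(w2 | w1) = count(w1 w2) / count(w1). The sketch below estimates one from a three-sentence toy corpus; sentence boundaries are handled naively, which is fine for illustration.

    from collections import Counter

    corpus = ("<s> I am Sam </s> "
              "<s> Sam I am </s> "
              "<s> I do not like green eggs and ham </s>").split()

    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def p(w2, w1):
        """Maximum-likelihood bigram probability P(w2 | w1)."""
        return bigrams[(w1, w2)] / unigrams[w1]

    print(p("I", "<s>"))    # 2/3: two of the three sentences start with 'I'
    print(p("Sam", "am"))   # 1/2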
Week 9:
Evaluation of statistical language models. Training and test corpora. Entropy, perplexity, and cross-entropy.
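Perplexity is two raised to the per-word cross-entropy of the test set, equivalently the inverse probability of the test data normalized by its length. The snippet below computes both from a list of per-word probabilities, which are invented for illustration.

    import math

    # P(w_i | history) assigned by some model to each word of a test set (toy values)
    test_word_probs = [0.1, 0.25, 0.05, 0.2]

    cross_entropy = -sum(math.log2(p) for p in test_word_probs) / len(test_word_probs)
    perplexity = 2 ** cross_entropy
    print(cross_entropy, perplexity)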
Week 10:
Word-class and part-of-speech tagging. Stochastic and rule-based taggers.
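A most-frequent-tag baseline, the usual starting point for stochastic tagging, fits in a few lines: tag each word with the tag it received most often in training. The tiny tagged corpus and the NN fallback for unknown words below are illustrative choices, not a recommended design.

    from collections import Counter, defaultdict

    training = [
        [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
        [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
        [("dogs", "NNS"), ("bark", "VBP")],
    ]

    counts = defaultdict(Counter)
    for sentence in training:
        for word, tag in sentence:
            counts[word][tag] += 1

    def tag(words):
        """Most-frequent-tag baseline with an NN default for unknown words."""
        return [(w, counts[w].most_common(1)[0][0] if w in counts else "NN")
                for w in words]

    print(tag(["the", "dog", "sleeps"]))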
Week 11:
Parsing. Top-down and bottom-up parsing. Ambiguity. Chart parsing.
Week 12:
Earley and CKY algorithms.
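CKY recognition over a grammar in Chomsky normal form fills a chart of labeled spans bottom-up. The sketch below recognizes one sentence with a four-rule toy grammar; the grammar, lexicon, and sentence are invented.

    from collections import defaultdict

    lexical = {                       # A -> word
        "the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"},
    }
    binary = [                        # A -> B C
        ("S", "NP", "VP"),
        ("NP", "Det", "N"),
        ("VP", "V", "NP"),
    ]

    def cky_recognize(words):
        """Return True if the start symbol S spans the whole sentence."""
        n = len(words)
        chart = defaultdict(set)      # chart[i, j] = nonterminals spanning words[i:j]
        for i, w in enumerate(words):
            chart[i, i + 1] = set(lexical.get(w, ()))
        for span in range(2, n + 1):
            for i in range(0, n - span + 1):
                j = i + span
                for k in range(i + 1, j):          # split point
                    for (a, b, c) in binary:
                        if b in chart[i, k] and c in chart[k, j]:
                            chart[i, j].add(a)
        return "S" in chart[0, n]

    print(cky_recognize("the dog chased the cat".split()))   # True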
Weeks 13-15: Speech recognition
Noisy channel model and speech recognition. Introduction to Hidden Markov Models (HMMs). Viterbi and A* decoding. Feature extraction. Vector quantization. Training speech recognizers. Forced Viterbi alignment.
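Viterbi decoding finds the single most probable state sequence for an observation sequence under an HMM. The sketch below reuses the invented two-state model from the week 7 forward-algorithm sketch, replacing the sum over paths with a max and carrying the best path explicitly.

    states = ["Hot", "Cold"]
    start = {"Hot": 0.8, "Cold": 0.2}
    trans = {"Hot": {"Hot": 0.7, "Cold": 0.3}, "Cold": {"Hot": 0.4, "Cold": 0.6}}
    emit = {"Hot": {"1": 0.2, "2": 0.4, "3": 0.4}, "Cold": {"1": 0.5, "2": 0.4, "3": 0.1}}

    def viterbi(observations):
        """Return (probability, state sequence) of the best path for the observations."""
        v = {s: (start[s] * emit[s][observations[0]], [s]) for s in states}
        for o in observations[1:]:
            v = {s: max(((p * trans[r][s] * emit[s][o], path + [s])
                         for r, (p, path) in v.items()), key=lambda x: x[0])
                 for s in states}
        return max(v.values(), key=lambda x: x[0])

    prob, path = viterbi(["3", "1", "3"])
    print(path, prob)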