Linguistics 581
Introduction to Computational Linguistics
Course Description 
This course will serve as an introduction to the field of computational linguistics, which includes aspects of speech recognition, natural language processing, information retrieval, and information extraction.
The course begins with an introduction to finite-state automata and some basic natural language applications; this is extended to finite-state transducers with applications in morphology (word structure). Other topics covered: n-gram language models, classifiers (Naive Bayes and Logistic Regression), sentiment analysis, part-of-speech tagging, context-free grammars and context-free parsing (with statistical extensions), and distributional semantics.
Goals 
The primary goal of the course is to acquaint students with a basic set of computational techniques that have proved useful in a variety of natural language applications. The principles and mathematics behind these techniques often overlap with those used in other fields in which machine learning has been successfully applied, such as computer vision. However, the problems, in particular the relevant statistical properties, are quite different. Thus this class should provide a nice complement to other classes you may be taking that use machine learning. Students should acquire enough facility with the concepts and tools so that they can use them to construct well-specified solutions to simple computational linguistic problems. A well-specified solution is one that a programmer can use to write a program.

Practice 
The course will use the textbook:
There will be exercises for most of the chapters covered. 
Programming 
The programming required in this class is very light. We will gloss over many of the difficult details involved in implementing the ideas covered here. However, some use of computation is essential. The programming language used will be Python. Computational assignments will be guided, with data, tools, and partial solutions provided. The tools and partial solutions provided will all be in Python.

Prerequisites
At least two linguistics courses or at least two programming or CS courses. Students with no programming background will find this course challenging.
Grading 
Grading will be based on exercises and take-home midterms and finals.

Late Assignments 
The general structure of the course is not well-suited to late assignments. Assignment solutions will be discussed in detail on the day they are turned in, and thus students who turn assignments in late will be at an advantage. However, to allow for some flexibility, late assignments will receive partial credit. Here is the lateness policy:

Group Work 
Group work is encouraged on the assignments. The midterm and final should be completed without any help. To be clear, collaboration on either the midterm or final will be considered cheating. When turning in collaborative assignments, your collaborators should be identified on your paper. 
Attendance 
Attendance is not a formal part of your grade. However, be aware that hints on how to solve problems on the assignments, the midterms, and the final are handed out liberally in class. These hints will not be posted on the web page. 
Classroom Practice 
Assignments will generally be due on a Tuesday and will be discussed upon return. Model solutions will often be posted. Most of the readings are from the 2nd Edition of your textbook, but some are available ONLY in the 3rd Edition, which has not yet been published. 
Course outline 
here. 
Place and Time 
Tu Th 2:00-3:15, LSS 246
Contact 
Mailing address:
Department of Linguistics and Oriental Languages
San Diego State University
5500 Campanile Drive
San Diego, CA 92182-7727
Telephone: (619) 594-0252
Office location: SHW, room 238
Office hours: Tu 3:30-4:30, Th 9:30-10:45, TuTh 12:30-1:45