
Department of Linguistics and Oriental Languages



Linguistics 572

Python for Social Scientists




Social scientists increasingly face very large data sets, available on the internet and elsewhere, without suitable tools to deal with them. Many end up using spreadsheet programs for their data-processing tasks and spend hours clicking around or copying and pasting, and then repeat the process for other data files. Not only is this a waste of time, but it often leaves you unable to reproduce the steps that produced a particular result, which undermines the value of your work.

This course will show you how to use computing tools freely available via the scripting language Python to use your data more powerfully and effectively. The course touches on many topics of theoretical interest in the emerging field of data science, such as social networks and data visualization, but the focus is on manipulating data so that you can tailor it to the needs of your particular project. The course targets social science students and assumes no prior programming knowledge. Although many of the techniques are relevant to linguistics, economics, and geography, the course focuses on techniques that are applicable to a wide range of data sources, including images, social network data, web pages, and blogs.

Topics covered include:

  • Python Basics
  • Searching for patterns in text and web data (regular expressions)
  • Extracting information from big data sets (Government data)
  • Constructing social networks from data (visualizing social groups)
  • Connecting to your stat package (Python data frames)
  • Visualizing similarity relations
  • Visualizing quantitative relationships on maps
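
As a small taste of the second topic, here is a minimal sketch of searching text for patterns with regular expressions, using only Python's standard library. The sample text, the function names, and the two patterns are illustrative assumptions rather than course materials, and the patterns are deliberately simplified: they are not complete validators for email addresses or dates.

```python
import re
from collections import Counter

# Hypothetical sample text; in practice this might be read from a file or a web page.
SAMPLE = (
    "Contact alice@example.edu or bob@example.org by 2015-01-20. "
    "Late replies go to alice@example.edu before 2015-02-03."
)

# Simplified patterns (assumptions for illustration, not complete validators).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def find_emails(text):
    """Return every substring of text that looks like an email address."""
    return EMAIL.findall(text)


def find_dates(text):
    """Return every substring of text that looks like a YYYY-MM-DD date."""
    return DATE.findall(text)


# Count how often each address occurs, a typical first step before analysis.
email_counts = Counter(find_emails(SAMPLE))

if __name__ == "__main__":
    print(find_emails(SAMPLE))
    print(find_dates(SAMPLE))
    print(email_counts.most_common())
```

Because the steps live in a script rather than in a sequence of mouse clicks, the same search can be rerun unchanged on any other text file.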


The course will use one required text, Automate the Boring Stuff with Python (Al Sweigart), but will also make heavy use of online course notes and freely available Python software.


Full Course outline

Prerequisites and Grading

There are no course prerequisites beyond upper-division standing. No knowledge of programming will be assumed. You should bring some openness to acquiring computational skills and some knowledge of what counts as interesting data in your own social science.

Grading will be based on exercises, quizzes, a midterm, and a final project.

Place and Time

Tu Th 1100-1215 AL 104

W 1600-1840 AL 104

Contact Info

Email: gawron at mail dot sdsu dot edu
Mailing address:
Department of Linguistics and Oriental Languages
San Diego State University
5500 Campanile Drive
San Diego, CA 92182-7727
Telephone: (619) 594-0252
Office location: SHW, room 238
Office hours: TuTh 12:45-1:45, Wed by appointment (choose a time in the interval 1:30-2:30)
