Python for Social Science: Class projects
- There are two different projects related to online
census data in the census data notebook.
- You can extend your Facebook assignment by doing all of
its optional parts. (This is allowable even if you didn't
do the Facebook assignment as one of your assignments,
but in that case you should do the non-optional parts of
the assignment as well.) A variant of this is to go all the
way back to where you downloaded your Facebook network in the
first place, explore one of the other options besides the
personal ego network (such as the Like network), and do
the entire assignment, including optional parts, on that.
- You can do some degree of data analysis with
any social network you choose or can find, including one
derived from your own data, but you need to have a
clear definition of what a link is. Some available options:
- Twitter data
is one candidate here, using some defined set of messages
and one of the standard Twitter definitions of a link.
- You can also use one of the fictional
networks we've used (Homer, Les Miserables, Anna Karenina)
as a starting point for analysis.
You can also create your own network from any piece
of fiction you choose that you can get online access to.
One issue in doing that is the link definition.
If the text has clearly
marked scene boundaries, you can write a program that
creates a graph using co-occurrence in the same scene as
the definition of a link. In many cases, co-occurrence in
the same paragraph is somewhat easier to use.
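As a sketch of the paragraph co-occurrence idea (the paragraphs and character names below are invented for illustration; a real project would scan the actual novel's text):

```python
from itertools import combinations
from collections import Counter

def cooccurrence_edges(paragraphs, characters):
    """Count, for each pair of characters, how many paragraphs mention both.

    paragraphs: list of strings, one per paragraph
    characters: list of character names to look for
    Returns a Counter mapping sorted name pairs to co-occurrence counts.
    """
    edges = Counter()
    for para in paragraphs:
        present = sorted(name for name in characters if name in para)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges

# Toy stand-in for a novel's paragraphs:
paras = [
    "Alice argued with Bob while Carol watched.",
    "Bob left the room.",
    "Alice wrote to Carol.",
]
edges = cooccurrence_edges(paras, ["Alice", "Bob", "Carol"])
# edges[("Alice", "Carol")] == 2: they share two paragraphs
```

The weighted pairs can then go straight into networkx, e.g. `G.add_weighted_edges_from((a, b, w) for (a, b), w in edges.items())`, for the usual centrality and community analysis.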
- Any data available in Mark Newman's collection
of graphs, the
Stanford Large Network dataset collection, or Donald Knuth's Stanford GraphBase is a candidate.
- A graph in which the nodes are words is another option.
Doing graph analysis of WordNet is an option for those
with some knowledge of WordNet. There is an easy-to-use
interface to the WordNet graph available in NLTK, described
here. If you use WordNet, you should
have a look at this IPython notebook.
It provides code for turning WordNet graphs into networkx
graphs. Another possibility --- which leaves you a bit more on your own ---
is to use the graph of Roget's Thesaurus available
here, which comes from the Stanford GraphBase. This is not
directly readable by networkx as is, but I have some code
I can help you modify which will turn it into a GML file,
which is.
All of these options are by consent of instructor only.
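To give the flavor of the word-graph option without requiring the NLTK corpus download: in NLTK you would walk `synset.hypernyms()` from `nltk.corpus.wordnet`; the sketch below does the same kind of upward traversal on a small hand-written hypernym table standing in for WordNet.

```python
from collections import deque

# Toy stand-in for WordNet's hypernym relation; with NLTK you would
# build this by following synset.hypernyms() from each synset.
HYPERNYMS = {
    "dog": ["canine"],
    "cat": ["feline"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "carnivore": ["mammal"],
    "mammal": ["animal"],
}

def to_edges(hypernyms):
    """Flatten the relation into (word, hypernym) pairs.

    The result can be loaded into networkx, e.g. nx.DiGraph(to_edges(...)).
    """
    return [(w, h) for w, hs in hypernyms.items() for h in hs]

def depth(word, hypernyms, root="animal"):
    """Breadth-first search upward from word to root; returns the distance."""
    queue = deque([(word, 0)])
    seen = {word}
    while queue:
        node, d = queue.popleft()
        if node == root:
            return d
        for parent in hypernyms.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, d + 1))
    return None  # root not reachable from word
```

With the real WordNet graph, depths and shortest paths like these are the basis of the standard WordNet similarity measures.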
- You can customize your own web crawling project based
on things you learned in the web crawling
and web crawling assignment notebooks. This will have to
include some data extraction component, in which you select
data from some kind of structured page
and save it in some usable format. By consent of instructor
only.
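A minimal sketch of the extraction-and-save step, using only the standard library (the page markup and field values here are invented; a real project would fetch pages with urllib or requests and adapt the parser to the site's actual structure):

```python
import csv
import io
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

# Invented sample page standing in for a downloaded one:
page = ("<table><tr><td>Alaska</td><td>710231</td></tr>"
        "<tr><td>Wyoming</td><td>563626</td></tr></table>")
parser = TableExtractor()
parser.feed(page)

# Save in a usable format (CSV); a real project would write to a file.
out = io.StringIO()
csv.writer(out).writerows(parser.rows)
```

The same pattern applies to any structured page: isolate the repeating element, pull out the fields you care about, and write them somewhere a later analysis step can read them.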
- You can design your own project, based on your own data.
By consent of instructor only.