9.10. Python tools for visualization
9.10.1. Data for house visualization
Let’s talk data first.
The data is United States Congressional Voting Records 1984, taken from the UCI Machine Learning Repository. It is also available in R as data included with the mlbench package. In R, you would do:
> library(mlbench)
> data(HouseVotes84)
The HouseVotes84 data set includes votes for each of the U.S. House of Representatives Congressmen on the 16 key votes identified by the Congressional Quarterly Almanac (CQA).
The data set contains 16 vote variables, one per key vote. The CQA distinguishes nine different types of votes, which are grouped into three classes: yea (voted for, paired for, announced for), nay (voted against, paired against, announced against) and unknown (voted present, voted present to avoid conflict of interest, did not vote or otherwise make a position known).
The 16 bills voted on are:
9.10.2. Dimensionality reduction
For dimensionality reduction, we are going to use LSI (Latent Semantic Indexing), with house members as rows and votes as columns. So in place of a term/document matrix, we have a member/vote matrix.
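LSI is commonly computed with a truncated singular value decomposition (SVD). The sketch below assumes that approach and applies it to a tiny made-up member/vote matrix. The function name reduce_rows_to_k_dims and the toy data are hypothetical, and this is not necessarily how the script's own reduction function works. In particular, scaling the coordinates by the singular values is one common choice; some implementations use the unscaled left singular vectors instead.

```python
import numpy as np

def reduce_rows_to_k_dims(matrix, k):
    """Return a k-dimensional representation of each row via truncated SVD."""
    # Thin SVD: matrix = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k strongest directions and scale each by its singular value.
    return U[:, :k] * s[:k]

# Toy member/vote matrix (hypothetical data): 4 members, 3 votes,
# coded 1 = yea, -1 = nay, 0 = unknown.
votes = np.array([
    [ 1.0,  1.0, -1.0],
    [ 1.0,  1.0,  0.0],
    [-1.0, -1.0,  1.0],
    [-1.0,  0.0,  1.0],
])

reps = reduce_rows_to_k_dims(votes, 2)
print(reps.shape)  # (4, 2)
```

In this toy example the first two members vote alike and the last two vote the opposite way, so their first coordinates come out with opposite signs. The overall sign of each SVD direction is arbitrary, so which group lands on the left is not fixed.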
9.10.3. The Basic script outline
We do the following:
>>> (R_data, data_sums, row_labels, col_labels) = read_data.read_R_data_file(R_data_file, data_type=str)
>>> matrix = convert_house_data_to_ints(R_data)
First we read in the file. We will look at that code
more closely in a moment. For now, the big picture.
R_data contains the raw data matrix.
Each line looks like this:
1 republican n y n y y y n n n y NA y y y n y
We want to convert this to integers using the following convention: y becomes 1, n becomes -1, and NA becomes 0. The party affiliation is left out. We’ll use that information at the very end to help sort our points into two groups, one per party.
The first row of
matrix looks like this:
>>> matrix[0]
array([-1.,  1., -1.,  1.,  1.,  1., -1., -1., -1.,  1.,  0.,  1.,  1.,
        1., -1.,  1.])
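As a minimal sketch of the conversion step, the code below maps one raw line to numbers. The mapping (y to 1, n to -1, NA to 0) is inferred from the example line and its numeric row shown here. The names VOTE_CODES and votes_to_numbers are hypothetical and are not the script's own convert_house_data_to_ints.

```python
# Hypothetical mapping, inferred from the example row and its numeric form.
VOTE_CODES = {'y': 1.0, 'n': -1.0, 'NA': 0.0}

def votes_to_numbers(fields):
    """Convert one row such as ['republican', 'n', 'y', ...] to a list of numbers.

    The first field (party affiliation) is skipped.
    """
    return [VOTE_CODES[vote] for vote in fields[1:]]

line = "1 republican n y n y y y n n n y NA y y y n y"
# Drop the leading row label, keeping the party plus the 16 votes.
fields = line.split()[1:]
row = votes_to_numbers(fields)
print(row)
```

The printed list holds the same 16 values as the first row of matrix shown above.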
Each such row represents a single member of the house based on 16 votes, hence a 16-dimensional representation. We want to reduce this to two dimensions so that we can see it:
k = 2
member_reps = make_k_space_term_reps(matrix, k)
Of course all the magic is in the function
make_k_space_term_reps. Let’s pass over the magic for now
and focus on what it makes appear:
member_reps is a 435 x 2 matrix representing
each member of the house with two numbers. The first member looks like this:
>>> member_reps[0]
array([-0.06135958,  0.02517892])
So we have a point on the xy-plane. Next we write those points to two data files, one per party, so that they can be scattered over a two-dimensional plot:
 1 with open(os.path.join(data_dir, 'republicans.dat'), 'w') as repub_ofh:
 2     with open(os.path.join(data_dir, 'democrats.dat'), 'w') as demo_ofh:
 3         for r in range(len(member_reps)):
 4             party_affiliation = R_data[r][0]  # assumes party is the first field of each row
 5             print(party_affiliation, member_reps[r])
 6             if party_affiliation == 'republican':
 7                 ofh = repub_ofh
 8             else:
 9                 ofh = demo_ofh
10             ofh.write('%.5f %.5f\n' % (-member_reps[r][0], member_reps[r][1]))
In line 10, we insert a minus sign (“-”) before the x coordinate to flip the plot about the y axis. The effect is that Democrats end up on the left-hand side, and Republicans on the right.
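The file-writing step can also be tried in isolation. The sketch below is a self-contained variant run on toy input. The helper name write_party_files and the parties list are hypothetical, and every point except the first is made up for illustration. It writes into a temporary directory so that nothing in the working directory is overwritten.

```python
import os
import tempfile

def write_party_files(member_reps, parties, data_dir):
    """Write one 'x y' line per member into a per-party data file."""
    repub_path = os.path.join(data_dir, 'republicans.dat')
    demo_path = os.path.join(data_dir, 'democrats.dat')
    with open(repub_path, 'w') as repub_ofh, open(demo_path, 'w') as demo_ofh:
        for (x, y), party in zip(member_reps, parties):
            ofh = repub_ofh if party == 'republican' else demo_ofh
            # Negate x to mirror the plot about the y axis.
            ofh.write('%.5f %.5f\n' % (-x, y))
    return repub_path, demo_path

# Toy input: the first point is the one from the text, the rest are made up.
member_reps = [(-0.06135958, 0.02517892), (0.05, -0.01), (-0.04, 0.03)]
parties = ['republican', 'democrat', 'republican']

data_dir = tempfile.mkdtemp()
repub_path, demo_path = write_party_files(member_reps, parties, data_dir)
with open(repub_path) as fh:
    print(fh.read())
```

Each output file holds one point per line, with x and y separated by a space, which is a layout gnuplot can plot directly.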
9.10.4. The Basic script in total
set key left box
#set samples 50
set terminal postscript
set output 'house_data.ps'
plot 'republicans.dat' with points pointtype 5 pointsize 2 lc rgb "red", 'democrats.dat' with points \
     pointtype 5 pointsize 2 lc rgb "blue"
These are the basic steps in plotting the data.