9.10. Python tools for visualization

9.10.1. Data for house visualization

Let’s talk data first.

The data is the United States Congressional Voting Records 1984 data set, taken from the UCI Machine Learning Repository. It is also available in R as part of the mlbench package. In R, you would do:

> library(mlbench)
> data(HouseVotes84)

The HouseVotes84 data set includes votes for each of the U.S. House of Representatives Congressmen on the 16 key votes identified by the Congressional Quarterly Almanac (CQA).

The data set contains 16 vote variables, one per bill. The CQA distinguishes nine types of votes, which are collapsed here into three classes: yea (voted for, paired for, announced for), nay (voted against, paired against, announced against), and unknown (voted present, voted present to avoid conflict of interest, did not vote or otherwise make a position known).

The 16 bills voted on are:

handicapped-infants
water-project-cost-sharing
adoption-of-the-budget-resolution
physician-fee-freeze
el-salvador-aid
religious-groups-in-schools
anti-satellite-test-ban
aid-to-nicaraguan-contras
mx-missile
immigration
synfuels-corporation-cutback
education-spending
superfund-right-to-sue
crime
duty-free-exports
export-administration-act-south-africa

9.10.2. Dimensionality reduction

For dimensionality reduction, we are going to use LSI (Latent Semantic Indexing). House members are rows and votes are columns, so in place of a term/document matrix we have a member/vote matrix.
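LSI is built on a truncated singular value decomposition (SVD) of the matrix. The following is a minimal sketch of that idea on a toy member/vote matrix, assuming NumPy is available. The helper name reduce_rows_to_k_dims and the toy data are hypothetical, and scaling the coordinates by the singular values is only one common convention; it is not necessarily what the script discussed in these notes does.

```python
import numpy as np

def reduce_rows_to_k_dims(matrix, k):
    """Hypothetical helper: represent each row by k coordinates via truncated SVD."""
    # Thin SVD: matrix = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the first k columns of U, scaled by the matching singular values
    # (an assumed convention; some treatments use the unscaled columns of U).
    return U[:, :k] * s[:k]

# Toy member/vote matrix: 4 members (rows) by 3 votes (columns),
# with 1 = yea, -1 = nay, 0 = unknown. Made up for illustration.
toy = np.array([[ 1., -1.,  1.],
                [ 1., -1.,  0.],
                [-1.,  1., -1.],
                [-1.,  1.,  1.]])

reps = reduce_rows_to_k_dims(toy, 2)
print(reps.shape)  # one 2-D point per member
```

Members with opposite voting records (rows 0 and 2 of the toy matrix) land on opposite sides of the origin, which is the kind of separation the plot relies on.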

9.10.3. The Basic script outline

We do the following:

>>> (R_data, data_sums, row_labels, col_labels) = read_data.read_R_data_file(R_data_file, data_type=str)
>>> matrix = convert_house_data_to_ints(R_data)

First we read in the file. We will look at that code more closely in a moment. For now, the big picture. R_data contains the raw data matrix. Each line looks like this:

1   republican    n    y    n    y    y    y    n    n    n    y NA    y    y    y    n    y

We want to convert this to integers using the following idea:

$\begin{array}{lcr} \texttt{y} & \rightarrow & 1\\ \texttt{n} & \rightarrow & -1\\ \texttt{NA} & \rightarrow & 0 \end{array}$

The function convert_house_data_to_ints converts R_data to integers using this convention, leaving out the party affiliation. We’ll use that information at the very end to help sort our points into two different bins.
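The conversion can be sketched as follows. This is a hypothetical stand-in, not the actual convert_house_data_to_ints: the name votes_to_ints and the lookup table VOTE_CODES are made up for illustration. It assumes NumPy, and it assumes each row of R_data holds the party label first, followed by the 16 vote strings.

```python
import numpy as np

# Assumed encoding, following the convention above.
VOTE_CODES = {'y': 1, 'n': -1, 'NA': 0}

def votes_to_ints(rows):
    """Hypothetical stand-in for convert_house_data_to_ints.

    Each row is assumed to be [party, vote_1, ..., vote_16]; the party
    label in position 0 is skipped.
    """
    return np.array([[VOTE_CODES[v] for v in row[1:]] for row in rows],
                    dtype=float)

# The sample line shown earlier, without its row number.
sample = [['republican', 'n', 'y', 'n', 'y', 'y', 'y', 'n', 'n',
           'n', 'y', 'NA', 'y', 'y', 'y', 'n', 'y']]
converted = votes_to_ints(sample)
print(converted[0])
```

A dictionary lookup raises a KeyError on any unexpected vote string, which makes malformed input visible early rather than silently mapping it to 0.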

The first row of matrix looks like this:

>>> matrix[0]
array([-1.,  1., -1.,  1.,  1.,  1., -1., -1., -1.,  1.,  0.,  1.,  1.,
      1., -1.,  1.])

This represents a single member of the House based on 16 votes, hence a 16-dimensional representation. We want to reduce this to two dimensions so that we can see it:

k = 2
member_reps = make_k_space_term_reps (matrix, k)

Of course all the magic is in the function make_k_space_term_reps, which produces member_reps. Let’s pass over the magic for now and focus on what it makes appear: member_reps is a 435 x 2 matrix representing each member of the House with two numbers. The first member looks like this:

>>> member_reps[0]
array([-0.06135958,  0.02517892])

So we have a point on the xy-plane. Next we write those points to two data files, one per party, so that they can be scattered over a two-dimensional plot:

with open(os.path.join(data_dir, 'republicans.dat'), 'w') as repub_ofh:
    with open(os.path.join(data_dir, 'democrats.dat'), 'w') as demo_ofh:
        for r in range(len(member_reps)):
            party_affiliation = R_data[r][0]
            print(party_affiliation, member_reps[r])
            if party_affiliation == 'republican':
                ofh = repub_ofh
            else:
                ofh = demo_ofh
            print('%.5f  %.5f' % (-member_reps[r][0], member_reps[r][1]), file=ofh)

In line 10, we insert a minus sign (“-”) before the x coordinate, which reflects the plot across the y axis. The effect is that Democrats end up on the left-hand side and Republicans on the right.

9.10.4. The Basic script in total

set key left box
#set samples 50
set terminal postscript
set output 'house_data.ps'
plot 'republicans.dat' with points pointtype 5 pointsize 2 lc rgb "red", 'democrats.dat' with points \
        pointtype 5 pointsize 2 lc rgb "blue"

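These gnuplot commands plot the two data files written above, Republicans in red and Democrats in blue, and write the result to the PostScript file house_data.ps.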
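For readers who prefer to stay in Python, roughly the same scatter plot could be sketched with matplotlib instead of gnuplot. This is an alternative technique, not part of the original script. The function name plot_parties, the PNG output, and the sample points are assumptions for illustration; in practice the two arrays would be loaded from republicans.dat and democrats.dat, for example with numpy.loadtxt.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # render to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np

def plot_parties(repub_xy, demo_xy, out_path):
    """Scatter two groups of 2-D points in red and blue and save the figure."""
    fig, ax = plt.subplots()
    ax.scatter(repub_xy[:, 0], repub_xy[:, 1], marker='s', color='red',
               label='republicans')
    ax.scatter(demo_xy[:, 0], demo_xy[:, 1], marker='s', color='blue',
               label='democrats')
    ax.legend(loc='upper left')  # legend on the left, as in the gnuplot key
    fig.savefig(out_path)
    plt.close(fig)

# Made-up points for illustration only, not real vote data.
repub_xy = np.array([[0.06, 0.02], [0.05, -0.01]])
demo_xy = np.array([[-0.05, 0.03], [-0.04, -0.02]])

out_path = os.path.join(tempfile.mkdtemp(), 'house_data.png')
plot_parties(repub_xy, demo_xy, out_path)
```

Selecting the Agg backend keeps the sketch usable on machines without a display, at the cost of not opening an interactive window.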

9.10.5. The image

[Figure: House of Representatives voting patterns visualized using LSI. Caption: Multidimensional scaling of vote data from the House using LSI.]