# 9.2. Introduction to Networkx¶

Note

Most of the points covered here are covered in
an ipython notebook (`.ipynb`

).
Help with the homework assigned for this module is also available there.

In this section we present a very brief introduction
to `networkx`

, one of the more widely used Python
tools for network analysis.
For the code below to work you must have installed
packages named `networkx`

and
(for `graphviz`

) `pydot`

. These
are part of the standard Canopy distribution.

Some additional `networkx`

help:

- For a tutorial on using
`networkx`

, look here. Detailed documentation of`networkx`

and many resources and tutorials are available at the Networkx website. But use the Enthought package manager rather than the website for installation. - This Rice University Tutorial. is very stripped down and very well done.

## 9.2.1. Basics¶

We will import as follows:

```
>>> import networkx as nx
```

Networkx provides some graphs as examples. One of these happens to be Zachary’s karate club. That can be read in as follows:

```
>>> kn=nx.karate_club_graph()
```

In general, you will be reading in graphs that you construct yourself
and save in a file in some standard format or that are available as
course data. An alternative way of constructing Zachary’s graph is to download `Zachary's karate club network`

and save this in the directory you start Python in. GML format is a standard
graph format `networkx`

knows about, so that file can be read into Python
as follows:

```
>>> kn = nx.read_gml('karate.gml')
<networkx.classes.graph.Graph object at 0x1014b7f90>
```

You will often find it useful to do:

```
>>> filename = 'yourfile.gml'
>>> G = nx.read_gml(filename,relabel=True)
```

The `relabel=True`

flag builds a graph that
uses names (actually, whatever occurs as value of the `label`

attribute) as node identifiers.
For example, when reading in the *Les Miserables* graph
or the *Anna Karenina* graph, this will give you characters
identified by their two character IDs.

`networkx`

also provides a number of methods
that compute statistics of your graph, many of which
we will discuss below. To get the number of nodes in your graph,
for example, do:

```
>>> num_nodes = G.number_of_nodes()
>>> print 'number of nodes: ' + str(num_nodes)
```

## 9.2.2. Graph generators in `networkx1`

¶

`Networkx`

provides a number of functions that generate famous
graphs. We have already seen one example. Zachary’s
karate club graph can be generated with a simple
function call:

```
>>> kn=nx.karate_club_graph()
```

To create a Watts-Strogatz (WS) small world graph, do:

```
>>> G = nx.watts_strogatz_graph(24,4,.3)
```

G is a WS-graph with 24 nodes, each of which is linked to its four nearest neighbors in the original circular layout. The probability of rewiring is set to .3. Now draw the result.

```
>>> nx.draw_circular(G)
>>> plt.show()
```

To create an **erdos-renyi** graph with *80* nodes, and edge
connection probability *.1*, do:

```
>>> nx.gnp_random_graph(80,.1)
```

For more examples of graphs that can be generated, see the Networkx docs.

## 9.2.3. A simple example¶

As our first example, we will reproduce the result discussed in the visualization section, in which we visualize Zachary’s karate club in a way which that shows the factionalized nature of the social relationships.

First, download `Zachary's karate club network`

and save it in your Python directory, if you haven’t already. Then
read it into Python:

```
>>> kn = nx.read_gml('karate.gml')
```

You can now draw the graph:

```
>>> import matplotlib.pyplot as plt
>>> nx.draw_circular(kn)
```

If you are not working in Ipython with pylab active, you will
need to execute the `show`

command to actually display the graph:

```
>>> plt.show()
```

You may want to make some small adjustments to the display in the display window. You can then save the figure in an image file in various formats. Here we choose “png”, a standard webbrowser friendly format:

```
>>> plt.savefig('networkx_circular_karate.png')
```

## 9.2.4. Graph analysis¶

`networkx`

has a standard dictionary-based
format for representing graph analysis computations
that are based on properties of nodes.

We will illustrate this with the example of
**betweenness_centrality**.
The problem of centrality and the various ways
of defining it was discussed in Section Social Networks.
As noted there, key facts about the karate graph can be revealed with
a measure of graph centrality called betweenness centrality.
`networkx`

offers an implementation, along with a host of
other graph analysis tools:

```
>>> betweenness = nx.betweenness_centrality(kn)
```

Like the other centrality
measures discussed below,
`betweenness_centrality`

returns
a dictionary associating each node in the graph with
a betweenness score. Higher is better. That is, higher means
that node is evaluated as occupying a more central position in
the graph:

1 2 3 4 5 6 | ```
>>> betweenness
{1: 0.43763528138528146, 2: 0.053936688311688304, 3: 0.14365680615680618,
4: 0.011909271284271283, 5: 0.0006313131313131313, 6: 0.02998737373737374,
7: 0.029987373737373736, 8: 0.0, 9: 0.05592682780182781, 10: 0.0008477633477633478,
...
}
``` |

Computing centrality measures is often followed by sorting the results to find the nodes with highest centrality. Here’s an example of how to do that. Dictionaries have no concept of order, so to sort their contents you must first turn them into lists, and then sort the resulting list:

1 2 3 4 5 6 7 8 | ```
>>> b_il = betweenness.items()
>>> b_il.sort(key=lambda x:x[1],reverse=True)
>>> b_il
[(1, 0.43763528138528146), (34, 0.30407497594997596), (33, 0.14524711399711399),
(3, 0.14365680615680618), (32, 0.13827561327561325), (9, 0.05592682780182781),
(2, 0.053936688311688304), (14, 0.04586339586339586), (20, 0.03247504810004811),
....
]
``` |

So we’ve discovered that node 1 and node 34 have the highest betweenness scores, and might therefore be thought to be the most central people in this social organization, in accord with what is suggested by the display produced by the Fruchterman-Reingold algorithm.

We can compare the results obtained from `betweenness_centrality`

with
those from `degree_centrality`

:

```
>>> deg = nx.degree_centrality(kn)
```

The results in this case aren’t very different from those of betweenness centrality, so we omit them here.

Other analysis tools implemented in `networkx`

.

`edge_betweenness(G):`

Illustrated below in the the Girvan-Newman example.`eigenvector_centrality(G)`

: (also eigenvector_centrality_numpy). Explaining this concept of centrality is beyond the scope of this course. It requires computing the eigenvectors of the adjacency matrix of the graph, and is closely related to pagerank score used by Google to rank the centrality of websites on the Internet.`nx.closeness_centrality(G)`

. Closeness centrality of a node**u**is the reciprocal of the sum of the shortest path distances from**u**to all**n-1**other nodes. Since the sum of distances depends on the number of nodes in the graph, closeness is normalized by the sum of minimum possible distances n-1. So if node*n*is a neighbor of all*n-1*other nodes in the graph, closeness_centrality is 1.`current_flow_betweenness_centrality`

. This kind of centrality was first discussed in [NEWMAN2003] . Newman calls it “random walk centrality”, also known as “current flow centrality”. The current flow centrality of a node*n*can be defined as the probability of passing through*n*on a random walk starting at some node*s*and ending at some other node*t*(neither equal to*n*), Newman argues that betweenness for some social networks should not be computed just as a function of shortest paths, but of all paths, assigning the shortest the greatest weight. This is because messages in in such networks may get to where they’re going not by the shortest path, but by a random path, such as that of sexual disease through a sexual contact network.`nx.clustering coefficient(G)`

: clustering(G, nbunch=None, with_labels=False, weights=False) Clustering coefficient for each node in nbunch.`nx.average_clustering(G)`

: Average clustering coefficient for a graph. Note: this is a space saving function ; It might be faster to use clustering to get a list and then take average.`nx.triangles(G, nbunch=None, with_labels=False)`

: Returns number of triangles for nbunch of nodes as a dictionary. If nbunch is None, then return triangles for every node. If with_labels is True, return a dict keyed by node. Note: Each triangle is counted three times: once at each vertex. Note if G is the graph, and t is thenumber triangles of a node n, and d is its degree, then :math: nx.clustering_coefficient(G,n) = frac{t}{d(d-1)}`nx.connected_component_subgraphs(G)`

: Used in the Girvan-Newman implementation below. This returns a list of the fully connected components of graph G. If the graph is fully connected it returns a list containing the graph`nx.number_connected_components(G)`

: This just returns the length of the list returned by`nx.connected_component_subgraphs(G)`

.

## 9.2.5. Implementing Girvan-Newman¶

Here is a simple python implementation of the Girvan-Newman graph-partition
method using `networkx`

. The source code is
`here.`

The Girvan-Newman algorithm is divisive. It tries to find an optimal
way of cutting the graph into two pieces, and then it does the same on
the pieces. Hence the `girvan_newman`

function is recursive.
Each time it finds a way to split a graph into two optimal
pieces, it calls itself on the pieces. It terminates when it is given
a graph with only one node. It returns a list of all the components
it has found:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | ```
import networkx as nx
def girvan_newman (G):
if len(G.nodes()) == 1:
return [G.nodes()]
def find_best_edge(G0):
"""
Networkx implementation of edge_betweenness
returns a dictionary. Make this into a list,
sort it and return the edge with highest betweenness.
"""
eb = nx.edge_betweenness_centrality(G0)
eb_il = eb.items()
eb_il.sort(key=lambda x: x[1], reverse=True)
return eb_il[0][0]
components = nx.connected_component_subgraphs(G)
while len(components) == 1:
G.remove_edge(*find_best_edge(G))
components = nx.connected_component_subgraphs(G)
result = [c.nodes() for c in components]
for c in components:
result.extend(girvan_newman(c))
return result
``` |

Note the use of a
`while`

loop in line 21. This keeps removing
the remaining edge with highest betweenness. Each time an edge is removed
we recompute the edge with highest betweenness as well as the
components in the graph. When we get two components,
the condition for re-entering the `while`

loop fails,
and we exit the loop, having
succeeded in splitting the graph. For densely connected
graphs, many edges may have to be removed to split
the graph.

## 9.2.6. Using weighted links¶

We read in `a weighted version of the karate graph`

, which has Zachary’s original
weights for the intensity of the relationship (this was measured by
how many extra-club activities the two club members shared):

```
import networkx as nx
kn = nx.read_gml('weighted_karate.gml')
```

Nex we sort the edge list by weight

```
all_edges = sorted(kn.edges(), key= lambda x: kn.get_edge_data(*x)[0]['value'], reverse=True)
```

Now for each edge we print the edge and its weight:

```
from collections import Counter
all_node_ctr = Counter()
for (u,v) in all_edges:
for x in (u,v):
all_node_ctr[x] += 1
print '%-4s %-4s %.3f' % (u,v,weight)
```

1 2 3 4 5 6 7 | ```
26 32 7.000
26 32 7.000
2 3 6.000
2 3 6.000
1 3 5.000
1 3 5.000
.....
``` |

Next we sort by degree:

```
all_nodes_il = sorted(all_node_ctr.items(), key = lambda x:x[1], reverse=True)
for (char, ct) in all_nodes_il:
print '%4s %.3f' % (char, ct)
top_chars_degree_centrality = all_nodes_il[:12]
print
for (ch, ct) in top_chars_degree_centrality:
print '%4s %.3f' % (ch, ct, )
```

Now we draw. We’ll use a force-spring layout, and we’ll try increasing spring tension for high weighted edges

```
new_labels_kn = dict([(char,char) for (char,score) in top_chars_degree_centrality])
pos = nx.spring_layout(kn,scale=1.0,weight='value') #
nx.draw_networkx(kn,pos, with_labels=False,node_size=200,node_color="lightgray") #Default node size=300
nx.draw_networkx_labels(kn,pos,new_labels_kn,font_size=14,font_color='blue')
nx.draw_networkx_nodes(kn,pos,new_labels_kn,node_color='red',node_size=400,label_pos=1.0,node_shape='D',font_size=16)
```

## 9.2.7. Coloring factions¶

The `networkx`

version of the karate graph has
nodes annotated with faction membership information.
Let’s use that to color our factions.

The following are the steps taken to produce the colored FG layout figure in the lecture notes. Some of the commented out lines show experiments you could try.

Start with faction1_color as the default node color We make a list of L tokens of the faction1_color string, where L is the length of kn.nodes()

```
#nx.draw_spring(kn)
#plt.savefig('networkx_spring_karate.png')
import math
#(faction1_col, faction2_col) = ('lightgray','salmon')
(faction1_col, faction2_col) = ('yellow','lightgreen')
node_color = [faction1_col] * len(kn.nodes())
# Now using the faction membership abnnotations from Zachary's data, change the known faction2 nodes to faction2_color.
node_dict = dict(kn.nodes(data=True))
for n in kn.nodes():
if node_dict[n]['club'] == 'Officer':
node_color[n] = faction2_col
print node_dict
print node_color
```

```
{0: {'club': 'Mr. Hi'}, 1: {'club': 'Mr. Hi'}, 2: {'club': 'Mr. Hi'}, 3: {'club': 'Mr. Hi'}, 4: {'club': 'Mr. Hi'}, 5: {'club': 'Mr. Hi'}, 6: {'club': 'Mr. Hi'}, 7: {'club': 'Mr. Hi'}, 8: {'club': 'Mr. Hi'}, 9: {'club': 'Officer'}, 10: {'club': 'Mr. Hi'}, 11: {'club': 'Mr. Hi'}, 12: {'club': 'Mr. Hi'}, 13: {'club': 'Mr. Hi'}, 14: {'club': 'Officer'}, 15: {'club': 'Officer'}, 16: {'club': 'Mr. Hi'}, 17: {'club': 'Mr. Hi'}, 18: {'club': 'Officer'}, 19: {'club': 'Mr. Hi'}, 20: {'club': 'Officer'}, 21: {'club': 'Mr. Hi'}, 22: {'club': 'Officer'}, 23: {'club': 'Officer'}, 24: {'club': 'Officer'}, 25: {'club': 'Officer'}, 26: {'club': 'Officer'}, 27: {'club': 'Officer'}, 28: {'club': 'Officer'}, 29: {'club': 'Officer'}, 30: {'club': 'Officer'}, 31: {'club': 'Officer'}, 32: {'club': 'Officer'}, 33: {'club': 'Officer'}}
['yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'yellow', 'lightgreen', 'yellow', 'yellow', 'yellow', 'yellow', 'lightgreen', 'lightgreen', 'yellow', 'yellow', 'lightgreen', 'yellow', 'lightgreen', 'yellow', 'lightgreen', 'lightgreen', 'lightgreen', 'lightgreen', 'lightgreen', 'lightgreen', 'lightgreen', 'lightgreen', 'lightgreen', 'lightgreen', 'lightgreen', 'lightgreen']
```

Next lay out the network:

```
pos = nx.spring_layout(kn,scale=1.0)
```

Now put in the labels, using a label->new_label mapping. We’ll change the labels to agree with Zachary’s original indexing, so .

```
new_labels = dict((x,x + 1) for x in kn.nodes())
nx.draw_networkx_labels(kn,pos,new_labels,font_size=14,font_color='black')
```

We’ll highlight the two leader nodes, by drawing slightly larger black circles round them. Notice how the edges and nodes have to be drawn separately now.

```
nx.draw_networkx_nodes(kn,pos,{0:0,33:33},node_color=['black','black'],node_size=800)
# Now draw all the nodes, including leaders, using faction color scheme.
nx.draw_networkx_nodes(kn,pos,new_labels,node_color=node_color,node_size=500)
nx.draw_networkx_edges(kn,pos)
```

## 9.2.8. Notes on color¶

For your choices with matplotlib colors, look here in the matplotlib docs. Or you can use direct html color specs replacing line ?? above with something like this:

```
>>> (faction1_col, faction2_col) = ('#00CCFF','#FF00FF')
```

This following table shows the HTML color scheme:

## 9.2.9. Using the graphviz interface¶

Here is another way to access tools outside of
Python for graph drawing, in particular
a popular graph drawing program named `graphviz`

.

Via the module `pydot`

`networkx`

provides an interface
to `graphviz`

. This
is loaded automatically whenever `graphviz`

-relevant functions
are called. For example:

```
>>> nx.write_dot(kn,'networkx_circular_karate.dot')
```

This writes a “.dot” file. This is a layout neutral representation
of the graph. Now graphviz can be called with various layout
functions to create an image file from the the graph
representation in `networkx_circular_karate.dot`

.

The most interesting result for the karate graph was via the graphviz *circo*
function:

```
circo -Teps -o networkx_circular_karate_via_dot.eps networkx_circular_karate.dot
```

The result looks like this.

`networkx`

also offers the option of calling
`graphviz`

layout programs directly and plotting them using
matplotlib. For example, `graphviz`

‘s version of
a Fruchterman-Reingold layout can be computed
as follows:

```
>>> from networkx import graphviz_layout
>>> layout=nx.graphviz_layout
>>> pos = layout(pb,prog='sfdp')
```

Then `networkx`

‘s interface to plotting facilities can be called.