9.1.1. Arrays for plotting



This section contains a very brief introduction to arrays, just enough to get you started on plotting.

As we noted in Section arrays, arrays are columns of numbers. Like lists, they contain items in sequence. For example:

>>> import numpy as np
>>> x = np.array([1.0,2.,3.1])

Like lists you can access them by index:

>>> x[2]
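Indexing and slicing behave much as they do for lists. Here is a minimal sketch. The exact printed form of a single element can vary between numpy versions.

```python
import numpy as np

x = np.array([1.0, 2., 3.1])

print(x[0])    # first element (indexing starts at 0)
print(x[2])    # third element
print(x[-1])   # negative indices count from the end, as with lists
print(x[0:2])  # a slice is itself an array: [1. 2.]
print(len(x))  # 3
```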

As we noted when we first encountered arrays, arrays differ from lists in one fundamental respect: all the items in an array must be of the same type. One reason for this restriction is space. We can save a great deal of space when storing sequences if we know that all the items in the sequence are of the same data type. Another reason is time. Mathematical operations can be made much more efficient if they are performed on sequences of uniform type. So the one-type restriction on arrays is quite helpful, given that some people do massive amounts of number crunching on very large arrays.
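One way to see the one-type restriction is through an array's `dtype` attribute, which records the single type shared by all its items. The sketch below assumes the default 64-bit float type, which is what numpy normally uses for Python floats:

```python
import numpy as np

# Every item in x shares one dtype: 64-bit floats
x = np.array([1.0, 2., 3.1])
print(x.dtype)     # float64
print(x.itemsize)  # bytes per item: 8
print(x.nbytes)    # bytes for all the data: 3 items * 8 bytes = 24

# A list mixing ints and floats is converted to one common type
z = np.array([1, 2.5, 3])
print(z.dtype)     # float64
```

Note that numpy did not reject the mixed list. It converted the integers to floats so that every item has the same type.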

Much of the efficiency gain that arrays provide comes from vectorized operations. To review, the mathematical term for a column of numbers is a vector. To vectorize an operation means to generalize it from an operation on numbers to an operation on vectors. When you import numpy, vectorized versions of all the basic arithmetic operations become available. For example, consider addition:

>>> x = np.array([1.0,2.,3.1])
>>> y = np.array([-1.0,-2.,2.9])
>>> x + y
array([ 0.,  0.,  6.])

The result of adding array x and array y is a new array whose $i$th element is the sum of $x[i]$ and $y[i]$.
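As a sketch of what the vectorized `+` computes, here is the same sum written as an explicit loop over positions. It is followed by the very different meaning `+` has for ordinary lists:

```python
import numpy as np

x = np.array([1.0, 2., 3.1])
y = np.array([-1.0, -2., 2.9])

# x + y spelled out position by position
looped = np.array([x[i] + y[i] for i in range(len(x))])
print(looped)  # same values as x + y

# With plain lists, + concatenates instead of adding
print([1.0, 2., 3.1] + [-1.0, -2., 2.9])
```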

Similar generalizations apply to all the 2-place arithmetic operations. So why should ordinary working data scientists care about arrays? One answer of course is that efficiency usually ends up mattering, even when you think it won’t. But there is a simpler answer that has immediate consequences even for beginners. Vectorization provides us with a lot of programming conveniences that make for clearer, more concise code. These benefits can be very nicely illustrated with plotting examples.
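The other 2-place operations work the same way, position by position. A short sketch using the same x and y:

```python
import numpy as np

x = np.array([1.0, 2., 3.1])
y = np.array([-1.0, -2., 2.9])

diff = x - y    # elementwise subtraction
prod = x * y    # elementwise multiplication
quot = x / y    # elementwise division
power = x ** 2  # elementwise exponentiation

print(diff)
print(prod)
print(quot)
print(power)
```

Each line replaces a loop that would otherwise have to be written out by hand.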

Vectorization also carries over to user-defined arithmetic functions. For example:

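def func(x):
    # Built only from -, *, and +, so it works on numbers and arrays alike
    return (x-3)*(x-5)*(x-7)+85

# 1000 evenly spaced points: 0, 0.01, 0.02, ..., 9.99 (10 itself is excluded)
x = np.arange(0, 10, 0.01)
y = func(x)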

Now y is an array containing the elementwise result of applying func to each element of x.
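To check the elementwise claim on values that are easy to verify by hand, here is a sketch applying func to a small array of points:

```python
import numpy as np

def func(x):
    return (x - 3) * (x - 5) * (x - 7) + 85

pts = np.array([0., 3., 5., 7., 10.])
vals = func(pts)
print(vals)  # [-20.  85.  85.  85. 190.]

# Element i of func(pts) is func applied to element i of pts
for p, v in zip(pts, vals):
    print(p, v, func(p))
```

For instance, at 0 the function gives (-3)(-5)(-7) + 85 = -105 + 85 = -20.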

Here is what all this has to do with plotting. The simplest way to use pyplot is to give it two sequences of numbers, one for the x-coordinates and one for the y-coordinates:

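import matplotlib.pyplot as plt
plt.plot([1,2,3,4], [1,4,9,16], 'ro')  # 'ro': red circle markers, no connecting line
plt.axis([0, 6, 0, 20])                # [xmin, xmax, ymin, ymax]
plt.show()                             # display the figure (needed outside a notebook)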

This plots the points (1,1), (2,4), (3,9), and (4,16) as red circles. In general, if the first argument is x and the second is y, pyplot plots the point (x[i], y[i]) for each position i. This might seem a little awkward at first, but the advantage becomes clear when we use the array x defined above:
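To see the pairing directly, note that `plt.plot` returns the line objects it drew, and a matplotlib `Line2D` object can report its points as (x, y) rows. The sketch assumes the non-interactive Agg backend so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt

# plot returns a list of Line2D objects; unpack the single line
line, = plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'ro')

# Each row is one plotted point (x[i], y[i])
print(line.get_xydata())
plt.close()
```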

plt.plot(x, func(x), 'ro')
plt.axis([0, 10, -25, 200])  # func(x) runs from -20 at x=0 up to about 190 near x=10
plt.show()

Remember that func(x) is an array containing the elementwise result of applying func to each element of x. So for each position i in arrays x and func(x), we plot the point (x[i], func(x[i])), which is exactly what it means to plot a function.
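Here is a self-contained version of the function plot, written as a sketch. It assumes a script run without a display, so it uses the Agg backend and saves to a file (the filename `func_plot.png` is hypothetical) instead of calling `plt.show()`. It also draws a line (`'r-'`) rather than 1000 separate red dots:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering; use plt.show() interactively
import matplotlib.pyplot as plt

def func(x):
    return (x - 3) * (x - 5) * (x - 7) + 85

x = np.arange(0, 10, 0.01)

line, = plt.plot(x, func(x), 'r-')  # 'r-': a solid red line
plt.axis([0, 10, -25, 200])         # leave room for the full range, -20 to about 190
plt.xlabel("x")
plt.ylabel("func(x)")
plt.savefig("func_plot.png")        # hypothetical output filename
plt.close()
```

The plotted data is exactly the pair of arrays we passed in: x and func(x).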

The key point about arrays for now is this:

For many functions, applying a function $f$ to an array x returns an array containing the results of applying $f$ elementwise to x, so plotting $f$ over the interval given by x is just a matter of giving plot the arguments x and f(x).
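Which functions qualify? Roughly, those built from numpy's vectorized operations. As an illustration, numpy's `np.sin` works on a whole array, while the standard library's `math.sin` accepts only a single number. `np.vectorize` can wrap such a scalar function, though numpy documents it as a convenience that is essentially a loop, not a speedup:

```python
import math
import numpy as np

x = np.arange(0, 10, 0.01)

# np.sin is vectorized: one call returns an array of the same shape
y = np.sin(x)
print(y.shape == x.shape)  # True

# math.sin expects one number, so a multi-element array raises TypeError
try:
    math.sin(x)
    math_sin_accepts_arrays = True
except TypeError as err:
    math_sin_accepts_arrays = False
    print("math.sin failed:", err)

# np.vectorize applies a scalar function elementwise (a loop, for convenience)
y2 = np.vectorize(math.sin)(x)
print(np.allclose(y, y2))  # True
```

So plt.plot(x, np.sin(x)) works directly, while plt.plot(x, math.sin(x)) does not.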

Strictly speaking, it is not quite true that arrays can hold only items of one type: a single cell may also contain a structured tuple called a record. See the numpy documentation on structured arrays.
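Here is a minimal sketch of such a structured array. The field names and data are made up for illustration. Each cell holds a record whose fields can have different types, here a string and an integer:

```python
import numpy as np

# Each cell is a record with two named fields of different types
people = np.array([('Alice', 25), ('Bob', 31)],
                  dtype=[('name', 'U10'), ('age', 'i4')])

print(people[0])      # one record: ('Alice', 25)
print(people['age'])  # one field across all records: [25 31]
print(people.dtype)   # the single (structured) type shared by every cell
```

Note that the array as a whole still has one dtype. It is the structured dtype that allows each record to mix field types.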