The Introduction to Python section introduced basic Python data
types such as strings, lists, tuples, and dictionaries.
In this section we introduce some more advanced data types
available in special Python packages called numpy and pandas.
We will begin by learning how to create, access, and update the basic
numpy data structure, the n-dimensional array, as well as how to add, subtract, and multiply with arrays using vectorized arithmetic operations, that is, operations that apply elementwise to all the elements of an array. What we learn about operations on numbers will carry over to Boolean conditions, conditions that are True or False of the individual elements in an array. Applying a Boolean condition is also a vectorized operation, so applying a Boolean condition to an array yields a Boolean array. We will learn to use such Boolean arrays to extract the portions of an array that satisfy a condition, allowing for high-level queries and manipulations of the data.
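As a preview, here is a minimal sketch of vectorized arithmetic and Boolean indexing. The array names and values are made up for illustration; the later sections cover each step in detail.

```python
import numpy as np

# Two small 1D arrays with made-up values (illustrative only).
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Vectorized arithmetic: each operation applies elementwise.
total = a + b      # [11, 22, 33, 44]
diff = b - a       # [9, 18, 27, 36]
product = a * b    # [10, 40, 90, 160]

# A Boolean condition is also vectorized: the result is a Boolean array.
mask = a > 2       # [False, False, True, True]

# A Boolean array can be used as an index to extract the matching elements.
selected = b[mask]  # [30, 40]

# The same kind of mask can be used to update part of an array.
c = a.copy()
c[c > 2] = 0       # [1, 2, 0, 0]
```

Note that `mask` has the same shape as `a`, which is what lets it pick out elements of `b` position by position.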
An immediate payoff from our brief survey of numpy is that all the principles for computing with numpy arrays will carry over with minor modifications to computing with pandas.
The pandas module is Python’s most popular toolset for manipulating data in tabular form (Excel sheets, data tables). The two main pandas data types are DataFrame and Series.
A DataFrame is a table of data. Datasets at all levels of analysis can be represented as DataFrames.
You can think of a DataFrame as being organized in rows and columns, like a numpy 2D array, but differing from it in one important respect: A DataFrame uses keyword indexing instead of positional indexing.
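The contrast can be sketched as follows. The row labels, column labels, and values here are hypothetical, chosen only to show the two indexing styles side by side.

```python
import numpy as np
import pandas as pd

# A 2D numpy array with made-up values: elements are found by position.
values = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])
by_position = values[1, 0]   # row 1, column 0 -> 3

# The same data as a DataFrame, with hypothetical row and column labels.
df = pd.DataFrame(values, index=["a", "b", "c"], columns=["x", "y"])

# Keyword indexing: elements are found by row label and column label.
by_label = df.loc["b", "x"]  # row "b", column "x" -> 3

# Selecting a whole column by its name.
y_column = df["y"]
```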
Despite this change in how indexing works, all the principles that apply to computing with numpy arrays will carry over with minor modifications to computing with pandas DataFrames. This is especially true of Boolean indexing, which will be your fundamental tool for selecting and reshaping data in pandas. Where a DataFrame is like a 2D array, a Series is like a 1D array; both the rows and the columns of pandas DataFrames are Series objects.
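A minimal sketch of these two points, using a small made-up table (the column names, row labels, and values are assumptions for illustration only):

```python
import pandas as pd

# A small made-up table with hypothetical row labels.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 40]},
    index=["r1", "r2", "r3"],
)

# Both a column and a row of a DataFrame are Series objects.
ages = df["age"]     # a column, as a Series
row = df.loc["r2"]   # a row, as a Series

# A Boolean condition on a column gives a Boolean Series...
is_older = df["age"] > 30

# ...which can be used to select the rows where the condition is True.
older = df[is_older]  # the rows labeled "r1" and "r3"
```

This mirrors the numpy pattern of building a Boolean array and then indexing with it; the difference is that the result keeps its row and column labels.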
We conclude our brief tour of
pandas with a look at some of
its aggregation tools, including cross-tabulation, grouping, and
pivot tables, as well as some tools for merging data.
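To give a sense of what these tools do, here is a rough sketch on a tiny made-up dataset. The table names, column names, and values are all hypothetical; the sections below work through realistic data.

```python
import pandas as pd

# A tiny made-up table of orders (illustrative values only).
orders = pd.DataFrame({
    "city": ["Austin", "Austin", "Boston", "Boston", "Boston"],
    "product": ["tea", "coffee", "tea", "tea", "coffee"],
    "amount": [3, 5, 2, 4, 6],
})

# Cross-tabulation: count how often each city/product pair occurs.
counts = pd.crosstab(orders["city"], orders["product"])

# Grouping: total amount for each city.
totals = orders.groupby("city")["amount"].sum()

# Pivot table: mean amount for each city/product pair.
pivot = orders.pivot_table(
    index="city", columns="product", values="amount", aggfunc="mean"
)

# Merging: attach a region to each order by matching on the "city" column.
regions = pd.DataFrame({
    "city": ["Austin", "Boston"],
    "region": ["South", "Northeast"],
})
merged = orders.merge(regions, on="city")
```

Cross-tabulation counts rows, while the pivot table here summarizes a numeric column; both arrange their results with one variable along the rows and another along the columns.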
- 6.1. Numpy and Arrays
- 6.2. More on two-dimensional arrays
- 6.3. Elementwise arithmetic operations
- 6.4. More on arrays
- 6.5. Pandas Introduction
  - 6.5.1. Create Data
  - 6.5.2. Selecting Columns
  - 6.5.3. Selecting rows
  - 6.5.4. Boolean conditions
  - 6.5.5. Selecting Rows with Boolean Conditions
  - 6.5.6. Combining Conditions with Boolean operators
  - 6.5.7. Keyword indexing and Alignment
  - 6.5.8. Sorting and positional indexing
  - 6.5.9. Loading Data: A more realistic example
  - 6.5.10. Selection: Selecting parts of Pandas data frames
  - 6.5.11. Summary/Review: Selection & Indexing
  - 6.5.12. The .value_counts() method
  - 6.5.13. Cross-tabulation
  - 6.5.14. Solution 1
  - 6.5.15. Solution 2
  - 6.5.16. Complaints: a new dataset
  - 6.5.17. Using groupby
  - 6.5.18. Understanding cross-tabulation
  - 6.5.19. Exercises combining everything we’ve learned in Part One.
- 6.6. Pivot Tables
- 6.7. Merging