This is a set of Exercises for Python for Social Science, based on the Enthought training materials exercises. The material covered in the exercises is standard Python introductory material, but it is particularly compatible with both the online Python for Social Science Textbook and the Enthought Training videos.

To hand in an exercise you should read it through carefully, then click on the download link to the associated Python file. You will edit the Python file, test it, and then submit the edited Python file online following the directions given in class.

1. Python Essentials¶

This exercise module covers the first part of the course, the introduction to Python, otherwise known as Python essentials: types, functions, loops.

The “Name String Exercise” (first exercise in the “Strings” section) introduces some key conventions used throughout the exercises and should be done first even if the programming concepts are familiar.

1.1. Strings¶

This group of exercises explores the use of strings, one of the key data types of this course, since we will spend a lot of time looking at information in text, and texts will generally be represented as Python strings. Some of these exercises put an emphasis on looking at strings as containers, which is what they are in Python.

1.1.1. Name String Exercise¶

In this exercise, we will do some simple things with name strings. [Download]

When the download file is a Python source file (file extension .py), you can edit the file in an appropriate editor or IDE (such as the Canopy editor). It is usually more comfortable to work in an editor that allows you to run the contents of the file (all IDEs allow this). Remember to save the file when you feel the exercise has been completed.

Finally, when the download file is a Python source file (file extension .py), the convention in these exercises is that it has been designed to work as a Python script. That is, it can be run from a command line, independently of any editing program or notebook.

Here is how to do that. First connect to the directory in which the source file has been saved; in this case, we will assume the file name_string.py has been saved in the folder/directory Exercises:

gawron$ cd Exercises
gawron$ dir                       [or ls in Linux/Mac]
...
name_string.py
...

The dir or ls command lists the contents of the current directory or folder, and will confirm that the Python source file is where you think it is.

Next execute the file using the Python interpreter as follows:

gawron$ python name_string.py
[the output from the program will appear]

If you wish to explore the program state a bit (for example, when debugging), it is usually a good idea to run Python in interactive mode, so that after executing the program you can check the values of variables and other relevant parts of the program state. This is done by adding the -i option:

gawron$ python -i name_string.py
[the output from the program will appear]
>>>

After the program is run, the Python interpreter prompt appears and you can type Python commands, including commands that depend on your code having been run.

Now let’s turn to the name_string exercise proper.

We would like to create a custom welcome message for a future application that greets the user by name.

1. Create a string with your first name and assign it to a variable fst_name.

2. Create another one with your last name and assign it to a variable last_name.

3. Print a third string made of the concatenation of “Hello “, and the two variables above.

4. Now print a string full of the character “=” to underline the previous output. For example, if your name is Eric Jones, your welcome message is:

Hello Eric Jones

You should then print 16 occurrences of “=”.

Note

Your code should work with any name (in other words, you can’t assume that you can count the length of the name).

Hint

The number of characters needed is the length of “Hello “, the length of the first name, the length of the last name, plus 1 more for the space in between.
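The hint relies on two string tools: len() gives the number of characters in a string, and multiplying a string by an integer repeats it. The following is a minimal sketch on an unrelated title string; the names title and underline are illustrative and are not taken from the exercise file.

```python
# Illustrative only: underline an arbitrary title, not the exercise's greeting.
title = "Chapter" + " " + "One"      # concatenation with +
underline = "-" * len(title)         # repeat "-" once per character in title
print(title)
print(underline)
```

Because the underline is computed from len(title), it stays the right length whatever the title is.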

1.1.2. Poem String Exercise¶

In this exercise, we will revise a poem so that it scans correctly. [Download]

Despite the number of snakes that are used on books, websites and package names, the Python language was named after the British comedy group Monty Python. Part of their Big Red Book was a poem called The Haggis poem that we will “analyze”.

Copy the text below (only the first half of the poem) and use it to create a string containing the following poem in a variable called poem:

Jack

Much to his Mum and Dad's dismay
Jack ate himself one day.
He didn't stop to say his grace,
He just sat down and ate his face.
"We can't have this," his Dad declared,
"If that lad's ate, he should be shared."
But even as he spoke they saw
Jack eating more and more:
First his legs and then his thighs,
His arms, his nose, his hair, his eyes...
"Stop him, someone!" Mother cried.
"Those eyeballs would be better fried!"

The name in this version of the poem is wrong and all occurrences of “Jack” should be replaced by “Horace”.

1. Count how many occurrences of “Jack” there are and find where (at what character) the first occurrence happens.

2. Modify the variable poem by replacing them all at once by “Horace”. Print the result.

Hint

Hmm. String methods might help.
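As a rough illustration of the kind of methods the hint points at, here is a sketch that counts, locates and replaces a word in a made-up sentence. The sentence and variable names are assumptions chosen for the example, not part of the exercise.

```python
# Illustrative only: a made-up sentence, not the poem from the exercise.
line = "the cat sat; the cat slept"
n_cats = line.count("cat")           # number of non-overlapping occurrences
first_cat = line.find("cat")         # index of the first occurrence (-1 if absent)
line = line.replace("cat", "dog")    # returns a new string; strings are immutable
print(n_cats, first_cat)
print(line)
```

Note that replace() does not change the string in place, so the result has to be assigned back to the variable.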

1.1.3. Vote String Exercise¶

In this exercise, we will give democracy a boost by counting up the votes in a small town referendum. [Download]

Let’s assume that you are responsible for analyzing the outcome of a referendum organized to decide whether or not Texas should secede from the rest of the USA in your (very small) town. You are being given the data in the form of the string below, which is a set of “yes/no” votes, where “y” and “Y” both mean “yes” and “n” and “N” both mean “no”.

votes = "y y n N Y Y n n N y Y n Y"

Determine the percentages of “yes” and “no” votes in this small dataset.

Note: There is no need for a “for” loop: by simply exploring the methods available on any string, you will find enough tools to do this.

1.1.4. Caesar’s Cipher Exercise¶

In this exercise, we use operations on strings to implement one of the most famous codes in the history of codes, Caesar’s cipher. [Download]

Whenever there’s a hurricane spinning out in the Atlantic (or Pacific), the US National Oceanic and Atmospheric Administration (NOAA) issues advisories about the storm’s strength. In this example, we will look at one such advisory for the infamous Hurricane Katrina, which did so much damage to New Orleans in 2005.

Imagine you would like to build an application that “reads” storm advisories and assigns a danger level without human interaction. There are a lot of tools that could help with this (like NLTK) with fancy-dancy algorithms, but we’re going to take a very simple approach of scanning the document for menacing words.

Unlike a few of the other string exercises, the text for this example is located in a file on disk. We haven’t gotten to reading and writing files yet, but that is ok. The following snippet of code opens the file “katrina_advisory.txt” (which is located in the same directory as this exercise), and dumps its contents into a string called text. From here on out, you can work with text just as if you created the string yourself:

f = open("katrina_advisory.txt")
text = f.read()
f.close()
print("-" * 51)
print("")
print(text)

1. Text and data processing always starts with some cleanup. Format the text by converting it to lower case and removing whitespace before and after the content.

2. Ok. Now for our own fancy-dancy algorithm. Let’s count the number of alarming terms in total in the processed text. For our purposes, we’ll consider the following terms as alarming: “killed”, “destroyed”, “death”, “devastating”. (They all seem fairly alarming to me…)

3. Let’s also track how urgent NOAA thought the message was. For this, we’ll see if they started the message with the word “URGENT” (or “urgent”). Make a variable called is_urgent that is True if “urgent” is the first word and False otherwise.

Hint

Look at the methods available on strings. At least one of them will be stunningly useful for our purposes…

4. Now, let’s define the “danger level” as the number of occurrences of alarming terms computed above. But let’s further say that if the message started with “urgent”, then we will increase the danger level by an additional 3. This is completely arbitrary, but you get the idea.

So, now it’s up to you to compute the danger level. Try to think of a way to do this without using an “if” statement (since we haven’t talked about it yet). If you get stuck, look at the hint below.

Hint

Since is_urgent is a boolean value, it can be True or False. In Python, True is also viewed as 1 for mathematical calculations and False is viewed as 0. Try printing out True * 10 and False * 10 at the command line and see what happens. Perhaps this gives you an idea of how you can use the is_urgent variable you calculated above as part of the danger level calculation.

The danger level you calculate should be 9.
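The hint above about booleans behaving like 1 and 0 can be sketched on unrelated data. The message, the threshold of 10 and the bonus of 5 below are all made up for illustration.

```python
# Illustrative only: the message, threshold and bonus are invented.
message = "warning: low battery"
is_long = len(message) > 10          # a comparison produces a bool
print(True * 10, False * 10)         # try it and see what the bools turn into
score = 2 + is_long * 5              # the bool acts as 1 or 0, so no "if" is needed
print(score)
```

The same pattern works for any True/False value: multiplying it by a number contributes that number when the condition holds and nothing otherwise.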

1.1.6. Star Data Exercise¶

In this exercise, we will learn about the stars, specifically how to extract and print out information from a file containing astronomical data in formatted strings. [Download]

The data file for the “Third Catalogue of Nearby Stars” contains information about nearby stars in lines which look like the following:

Proxima Centauri  M5  e      11.05 15.49 771.8
Alp1Cen           G2 V        0.01  4.38 749.0
Alp2Cen           K0 V        1.34  5.71 749.0
52Tau Cet         G8 Vp       3.49  5.77 286.0

The data is provided in fixed-width fields, as follows:

0:17   Star name
18:28   Spectral class
29:34   Apparent magnitude
35:40   Absolute magnitude
41:46   Parallax in thousandths of an arc second

Both the lower limit and the upper limit are inclusive here.
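Since Python slices exclude their stop index, an inclusive column range a:b corresponds to the slice [a:b+1]. Here is a minimal sketch on a made-up record (not the star data), assuming a name in columns 0:9 and a price in columns 10:15, both inclusive.

```python
# Illustrative only: a made-up fixed-width record, not a line from the catalogue.
record = "{:<10}{:>6}".format("widget", "3.50")
name = record[0:10].strip()      # inclusive columns 0..9  -> slice stop is 9 + 1
price = float(record[10:16])     # inclusive columns 10..15 -> slice stop is 15 + 1
print(name, price)
```

float() tolerates leading and trailing whitespace, so the numeric field does not need to be stripped first.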

1. Given the following string, containing one line from the file, extract each of the data items from the string. You should strip extraneous whitespace and convert strings containing floating point numbers to Python floats:

star_string = "Proxima Centauri  M5  e      11.05 15.49 771.8"

1.1.6.1. References¶

Preliminary Version of the Third Catalogue of Nearby Stars. 1991. GLIESE W., JAHREISS H. Astron. Rechen-Institut, Heidelberg.

1.1.7. Web Color Exercise¶

The markup languages used to display web pages, HTML and CSS, usually express colors as hexadecimal strings of the form

#rrggbb

where the first two hexadecimal digits are the amount of red, the second two the amount of green, and the last two the amount of blue in the color to be displayed. So, for example, blue corresponds to no red, no green and max value in blue and is therefore #0000ff. On the other hand, yellow is a mixture of red and green and can be represented by:

#dddd00

that is, with the red and green values at 221 and the blue value at 0, when converted to decimal.

1. The color “indigo” usually has red, green and blue values respectively of 75, 0, and 130.

Create a hexadecimal string of the format above that represents this color using Python’s string formatting methods:

red = 75
green = 0
blue = 130

Hint

Remember that you need to ensure two digits for each color and that there is a formatting code for generating hexadecimal string representations of integers directly, so there is no need to convert a number to hexadecimal beforehand.

2. The document you download for this assignment is an IPython notebook, which contains (Markdown) cells that understand HTML codes. If you double-click on this current cell or the one containing the general introduction above, you will see that we used codes similar to the ones you just generated to color the word <span style="color:#ff0000;">red</span>.

Using formatting and the output of your previous question, generate a new string that inserts the indigo color inside a string of the form: <span style="color:#??????;">indigo</span> where #?????? is replaced by the hexadecimal code you found in question 1.

To test your result, copy the output of your solution, double-click on this cell, paste it [here] and click inside another cell to make it render.
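The formatting code mentioned in the hint to question 1 can be tried on a different color. This sketch uses teal (red 0, green 128, blue 128), which is not part of the exercise; the variable names are illustrative.

```python
# Illustrative only: teal, not the indigo asked for in the exercise.
print("{:x}".format(10))       # hexadecimal digits, no padding
print("{:02x}".format(10))     # padded with a leading zero to two digits
r, g, b = 0, 128, 128
teal = "#{:02x}{:02x}{:02x}".format(r, g, b)
print(teal)
```

The 02 in the format code is what guarantees two digits per channel, so small values such as 0 or 10 do not shorten the string.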

1.1.8. Star Format Exercise¶

In this exercise, we will learn about formatting star data. This exercise goes in the opposite direction from the previous exercise. We go from numerical data to formatted strings that would be printed out to a file. [Download]

The data file for the “Third Catalogue of Nearby Stars” contains information about nearby stars in lines which look like the following:

Proxima Centauri  M5  e      11.05 15.49 771.8
Alp1Cen           G2 V        0.01  4.38 749.0
Alp2Cen           K0 V        1.34  5.71 749.0
52Tau Cet         G8 Vp       3.49  5.77 286.0
Eps Ret           K2 IV       4.44  3.57 067.0

The data is provided in fixed-width fields, as follows:

0:17   Star name
18:28   Spectral class
29:34   Apparent magnitude
35:40   Absolute magnitude
41:46   Parallax in thousandths of an arc second

These boundaries are both inclusive but include a space to the right of the value to separate it from the next. The apparent and absolute magnitude quantities have 2 decimal places of precision, while the parallax has 1. The parallax values should be padded with leading zeroes if the value is less than 100.
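Fixed widths, a fixed number of decimal places and leading zeroes all map onto format specifications of the format string method. The following sketch shows each one separately on made-up values (not the star data).

```python
# Illustrative only: made-up values, not the star data.
label = "{:<8}".format("ab")          # left-justify in a field 8 characters wide
two_places = "{:>6.2f}".format(1.5)   # right-justify, 2 decimal places, width 6
padded = "{:05.1f}".format(8.5)       # zero-pad to width 5 with 1 decimal place
print(repr(label), repr(two_places), repr(padded))
```

Printing with repr() makes the padding spaces visible, which helps when comparing against a fixed-width target line.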

1. Given the following values for a star:

star_name = "Eps Ret"
spectral_class = "K2 IV"
apparent_magnitude = 4.44
absolute_magnitude = 3.57
parallax = 67.0

produce a properly formatted string using the format string method. The string you produce should correspond to the last line of the file above.

You can store your solution in a variable named eps_ret and test it with the following code:

success = eps_ret == "Eps Ret           K2 IV       4.44  3.57 067.0"

1.1.8.1. References¶

Preliminary Version of the Third Catalogue of Nearby Stars. 1991. GLIESE W., JAHREISS H. Astron. Rechen-Institut, Heidelberg.

1.1.9. DNA String Exercise¶

In this exercise, we use operations on strings to implement one of the most important codes of all, the DNA code. [Download]

1.2. Lists¶

This exercise group explores programming with lists.

This is a series of exercises designed to improve your knowledge of Python lists. Keep in mind that many of the functions and methods that work with lists work with other Python sequences.

1.2.1. List Operations Exercise¶

This exercise provides the opportunity to experiment some more with list creation and the use of their methods. [Download]

1. Create a list named ‘a’ with the elements 10, 21, 23, 11 and 24 in this order.

2. Modify the first element and the last element to be 0.

3. Add the element 11 at the end of the list a.

Hint

To explore the available methods for lists, type a.<TAB> in IPython. Keep this in mind when you answer the questions below. They can all be answered using the appropriate methods attached to the list a.

4. How many occurrences of 11 are there in a?

5. Extend the list a with another list ["foo", 4].

6. What is the location (or index) of the first occurrence of 11?

7. Insert the value 100 as the third element.

Hint

All Python sequences start at index 0.

8. Remove the fourth element.

Hint

All Python sequences start at index 0.

9. Remove the first occurrence of 11

10. Sort the list

11. Reverse the list

12. Compute the length of the resulting list.

13. Test if 11 is in the list anymore and if 99 is not in the list
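For orientation, here is a sketch of several common list methods applied to a different list than the one in the exercise. The list nums and its values are made up; the comments show the list after each step.

```python
# Illustrative only: a different list from the one in the exercise.
nums = [3, 1, 3, 7]
nums.append(5)            # [3, 1, 3, 7, 5]
nums.extend(["x", 9])     # [3, 1, 3, 7, 5, "x", 9]
threes = nums.count(3)    # how many times 3 appears
where = nums.index(7)     # position of the first 7
nums.insert(1, 100)       # [3, 100, 1, 3, 7, 5, "x", 9]
nums.pop(0)               # remove by position -> [100, 1, 3, 7, 5, "x", 9]
nums.remove("x")          # remove by value    -> [100, 1, 3, 7, 5, 9]
nums.sort()               # [1, 3, 5, 7, 9, 100]
nums.reverse()            # [100, 9, 7, 5, 3, 1]
print(nums, len(nums), 7 in nums, 42 not in nums)
```

Most of these methods modify the list in place and return None, so their result should not be assigned back to the variable.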

1.2.2. Managing ACME Exercise¶

We are now (badly) managing the employees of a new startup called ACME Corp., which has locations in Taos, Phoenix, Santa Fe, and Flagstaff. This exercise is about managing information about the company in lists. [Download]

1. The employees of this company have the following email addresses (by order of arrival date in the company):

Wile.E.Coyote@acme.com
Looney.Tunes@acme.com
Chuck.Jones@acme.com
Michael.Maltese@acme.com
Speedy.Gonzales@acme.com
Calamity.Coyote@acme.com
Bugs.Bunny@texavery.com

Copy these 8 emails into a list called employee_emails. Also create a list of employee IDs from 0 to 7 without writing each ID manually (let’s assume that we will reuse your code once ACME’s products finally start to work and sell and the company becomes huge).

2. A new employee, number 8, is joining the company: “Acceleratti incredibilis”. Add his email address to the list. Update the employee_ids list.

3. Surprisingly, one of ACME’s products (the “Earthquake Pills”) works remarkably well and was developed surprisingly fast. Pull up the emails for the team responsible for them, that is, employees with IDs 2, 3, 4 and 5. This can be done using list slicing.

4. Despite the Earthquake Pills, this year, the poor financial results of the company only allow the company to shell out bonuses to every other employee (starting with employee 0). Using slicing, pull up their email addresses to announce the good news to them.

5. In fact, the following year, the company is doing even worse. Mad not to have had a bonus the year before, Looney Tunes decides to spin off half the company to create a new one with the employees with odd IDs, except that Bugs Bunny guy (employee 7), because he doesn’t really belong here… Pull up their emails to send them a secret message.

Hint

Again slicing could help here since we can extract every other element with it. Could we change the start point to grab the other set of every other employee?

6. His communication was intercepted: Looney Tunes is fired. Remove him from the list of employees. Remove his employee ID as well.

7. Capture the list of locations of the company in a list (ordered by importance): “Taos”, “Phoenix”, “Santa Fe”, and “Flagstaff”. Considering the management issues in ACME, it is decided to reverse the order of these locations, and move the headquarters to Flagstaff. Update the list of locations.

8. The Boss ends up missing the nice skiing in Taos, and decides to reverse the location order again. The challenge here is to reverse the order without using the reverse method.

Hint

Slicing could help…
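The slicing hints in this exercise all use the general form sequence[start:stop:step]. A small sketch on a made-up list of letters (not the employee data) shows the variants.

```python
# Illustrative only: a made-up list, not the employee emails.
letters = ["a", "b", "c", "d", "e", "f", "g"]
middle = letters[2:5]        # indices 2, 3, 4 (the stop index is excluded)
evens = letters[::2]         # every other element, starting at index 0
odds = letters[1::2]         # every other element, starting at index 1
backwards = letters[::-1]    # a reversed copy; letters itself is unchanged
print(middle, evens, odds, backwards)
```

Slicing always builds a new list, so the original stays intact unless the result is assigned back to it.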

Exercises.essentials.lists.managing_acme.managing_acme.question_one()[source]

Return the email list shown above and an employee ID list.

Exercises.essentials.lists.managing_acme.managing_acme.question_two(employee_name)[source]

Add employee_name (a string) and the appropriate id number to the existing employee_list and id_list.

Exercises.essentials.lists.managing_acme.managing_acme.question_three(employee_name)[source]

Get the email addresses for employees 2,3,4, and 5

Exercises.essentials.lists.managing_acme.managing_acme.question_four()[source]

Use slicing to get the email addresses of every other employee

Exercises.essentials.lists.managing_acme.managing_acme.question_five()[source]

Use slicing to get the email addresses of employees with odd id numbers.

Exercises.essentials.lists.managing_acme.managing_acme.question_six()[source]

Remove “Looney Tunes” from the list of employees along with his ID number.

Exercises.essentials.lists.managing_acme.managing_acme.question_seven()[source]

Build a list containing “Taos”, “Phoenix”, “Santa Fe”, and “Flagstaff”, in that order. Return the reverse of that list. Hint: Check list methods

Exercises.essentials.lists.managing_acme.managing_acme.question_eight()[source]

Reverse the location list without using the reverse method.

1.2.3. Sort Words Exercise¶

Given a (partial) sentence from a speech, print out a list of the words in the sentence in alphabetical order. Also print out just the first two words and the last two words in the sorted list:

speech = '''Four score and seven years ago our fathers brought forth
on this continent a new nation, conceived in Liberty, and
dedicated to the proposition that all men are created equal.
'''

Ignore case and punctuation.
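One possible way to ignore case and punctuation before sorting is to lowercase the text and delete the punctuation marks, then split on whitespace. The sketch below does this for a made-up phrase whose only punctuation marks are a comma, a semicolon and a period; other texts would need other marks removed.

```python
# Illustrative only: a made-up phrase, not the speech from the exercise.
phrase = "The quick, brown fox; the lazy dog."
cleaned = phrase.lower().replace(",", "").replace(";", "").replace(".", "")
words = sorted(cleaned.split())      # split on whitespace, then sort a new list
print(words)
print(words[:2], words[-2:])         # first two and last two words
```

sorted() returns a new list, while the list method sort() would sort in place; either fits here.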

Let’s analyze the Katrina advisory further by computing the number of words and paragraphs, and extracting its metadata. Let’s assume that this is useful to know if our application will be able to post it on Twitter or send it by text messages or if other means of communication are needed.

Again, we will load the content of the advisory for you since we haven’t seen how to read files yet.

1.2.4.1. Question 1¶

Count the number of paragraphs in the text (2 paragraphs are delimited by a blank line). Print the result (the correct number of paragraphs is 12).

Hint

Paragraphs are delimited by the string “\n\n”
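To see how the delimiter in the hint behaves, here is a sketch on a tiny made-up text (not the advisory) that splits it into paragraphs and into lines. The sample string and names are assumptions for the example.

```python
# Illustrative only: a tiny made-up text, not the advisory.
sample = "first line\nsecond line\n\nnew paragraph\n\nlast one"
paragraphs = sample.split("\n\n")    # pieces separated by a blank line
lines = sample.splitlines()          # one entry per line, blank lines included
blank = lines.count("")              # blank lines show up as empty strings
print(len(paragraphs), len(lines), blank)
```

Both methods return lists, so len() and the list methods apply to the result directly.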

1.2.4.2. Question 2¶

Count the number of lines of text. This can be done without the need for a for loop, though a loop is an acceptable solution if you know how to implement it.

Hint: How can we get a list of lines from the content of the file? Count the number of lines total. Count the number of empty lines. The result is 34.

1.2.4.3. Question 3¶

We will define the first metadata for the alert message as a preview of the content. It will be made with the first 4 and the last 4 words. Combine this information into a string type variable preview similar to ‘The first four words … the last four words’.

1.2.4.4. Question 4¶

Let’s analyze the first paragraph and normalize its content:

URGENT -- WEATHER MESSAGE
NATIONAL WEATHER SERVICE NEW ORLEANS LA
1011 AM CDT SUN AUG 28, 2005

Parse it to extract its priority flag made of the first word of the paragraph, the location it originates from (city, state), the time and the date and store that into 4 distinct variables. It is safe to assume that the location will always follow “National Weather Service” on the second line and that the time will always be the first 3 entries on the third line.

Store the rest of the message into a “content” variable.

These date, location and flag metadata could be used to add this information automatically on a map, in a calendar, with appropriate flagging, though this is beyond the scope of this exercise.

Count the paragraphs.

Count the lines.

Get metadata, a string (first 4 words, last 4 words of text)

Extract specific info

1.3. Dictionaries¶

This group of exercises explores the use of dictionaries, containers that associate one kind of data with another.

1.3.1. Roman Dictionary Exercise¶

In this exercise, we use operations on dictionaries to keep track of Roman social connections. [Download]

Mark Antony keeps a list of the people he knows in several dictionaries based on their relationship to him:

friends = {'julius': '100 via apian', 'cleopatra': '000 pyramid parkway'}
romans = dict(brutus='234 via tratorium', cassius='111 aqueduct lane')
countrymen = dict([('plebius', '786 via bunius'),
                   ('plebia', '786 via bunius')])

1. Print out the names for all of Antony’s friends.
2. Now all of their addresses.
3. Now print them as “pairs”.
4. Hmmm. Something unfortunate befell Julius. Remove him from the friends list.
5. Antony needs to mail everyone for his second-triumvirate party. Make a single dictionary containing everyone.
6. Antony’s stopping over in Egypt and wants to swing by Cleopatra’s place while he is there. Get her address.
7. The barbarian hordes have invaded and destroyed all of Rome. Clear out everyone from the dictionary.
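The dictionary operations this exercise draws on can be sketched with unrelated data. The dictionaries capitals, more and everything below are made up for illustration.

```python
# Illustrative only: made-up dictionaries, not Antony's address books.
capitals = {"france": "paris", "japan": "tokyo"}
more = dict(peru="lima")
print(list(capitals.keys()))      # the keys
print(list(capitals.values()))    # the values
print(list(capitals.items()))     # (key, value) pairs
del capitals["japan"]             # remove one entry by key
everything = {}
everything.update(capitals)       # merge other dictionaries into this one
everything.update(more)
lookup = everything["peru"]       # look up a value by key
everything.clear()                # empty the dictionary
print(lookup, everything)
```

update() copies entries into the target dictionary, so clearing everything afterwards leaves capitals and more untouched.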

1.3.2. Potato Market Dictionary Exercise¶

This exercise uses dictionaries to track buy and sell orders in a small market. [Download (a Jupyter notebook)]

In this example, we’re going to use a couple of dictionaries to track the buy/sell orders in a (very) simple financial market – well actually, a potato market.

In any kind of market, folks meet together to buy and sell “stuff.” At a flea market that stuff is fossils, broken electronics, Aunt Nelly’s coffee table, etc. In the financial market the stuff is stocks (IBM, GOOG, ATT), commodities (oil, potatoes, pork bellies), currency (pounds, yen, euros), and many others.

If you’re familiar with how a market works, you can probably skip to the end of this section. If not, then let’s imagine you have a potato market that works in the following way. Every morning, potato farmers (the sellers) show up with a bag of potatoes to sell, and chefs (the buyers) show up to buy potatoes for their restaurants. After everyone has their coffee, discusses the weather and the new fashion in overalls, the farmers sit down together on a bench. Each farmer holds up a price that they are willing to sell their bag of potatoes for. Now, one of them may have a fishing trip planned in the afternoon and he’ll price his potatoes cheaply so they will sell fast. Another might have his eye on a new fishing pole he wants, and would like to sell his bag of potatoes for as much as possible. (Our farmers like to fish…) As you might guess, the farmer selling at a lower (minimum) price will sell his bag first.

Meanwhile, the chefs will gather around the bench and hold up the price they are willing to pay for a bag of potatoes. The ones who need to get back and prepare a big dinner might be willing to pay a high price for potatoes while the ones not sure they even want potatoes on the menu tonight might only be willing to buy if they get a good deal, so they hold up a low price. For the buyers, the person willing to pay the highest (maximum) price will be the first to head back to their restaurant with a bag of potatoes.

At any time while the market is open, a farmer or chef can change their price. Also, new farmers/chefs may join the market, and others may leave the market.

As you can see, this really results in two different prices for potatoes. There is an “offer” or “ask” price which is what the farmers are willing to sell potatoes for, and there is a “bid” price which is what chefs are willing to pay. Only when these two match do potatoes and cash change hands.

So, during all of this commotion, there is a potato market manager in the corner. He has two jobs. First, he watches both buyer and seller prices to see if any of them “match.” When that happens, he pairs the farmer and the chef together so they can exchange their potatoes and money. His second job? Make sure the transistor radio crackling from the window sill stays tuned to the local AM radio station, KAND.

Our potato market is pretty similar to a financial market that uses a [limit order book](http://www.nasdaq.com/investing/glossary/l/limit-order-book): this is a collection of buy (chef) and sell (farmer) orders from various traders where the trader is willing to wait to get the price he wants rather than trading immediately.

To conclude, the offer price is the lowest of the sell orders in the limit order book, while the bid price is the highest of the buy orders in the limit order book.
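As a rough sketch of that definition, suppose orders are stored in dictionaries mapping a trader's name to a price. The names asks and bids and all the values below are invented for the example.

```python
# Illustrative only: a made-up market with invented traders and prices.
asks = {"Ann": 4.00, "Raj": 3.50}    # sell orders: name -> asking price
bids = {"Lee": 3.00, "Kim": 3.25}    # buy orders: name -> bid price
offer = min(asks.values())           # lowest price any seller will accept
bid = max(bids.values())             # highest price any buyer will pay
print(offer, bid, offer == bid)      # a trade happens only when these match
```

min() and max() look only at the prices here because values() leaves the names out.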

Ok, so now that we have a problem defined, let’s imagine that we use one dictionary to track the farmers’ sell orders and one to watch the chefs’ buy orders. From here, we’ll figure out the existing offer/ask prices, folks entering the market, leaving the market, an order match, and… dum-dum-dum–dum, a [black swan event](http://en.wikipedia.org/wiki/Black_swan_theory).

Note: In a real financial market, each order will also have the number of shares to buy or sell associated with an order (I’ll sell 1000 shares at a price of 20.50). In our potato market, we’ll ignore that detail. Every order will be for a single bag of potatoes. Also, instead of using a person’s name for an order, typically something like an (integer) order id would be used. But, for our case, it is more fun to use names…

Farmers selling potatoes and the price they are willing to sell at:

sell_orders = {
"Joe": 10.50,
"Jane": 10.25,
"Bob": 10.75,
"Melvin": 11.00,
}

Chefs ready to buy potatoes and the price they are willing to pay:

buy_orders = {
"Pierre": 9.50,
"Joel": 9.25,
"Geno": 9.75,
"Ellen": 9.50
}

1. Compute the offer price and the bid price for potatoes.

Hint

There are a couple of functions, min() and max(), that are “built-in” to Python. They should be useful on this one. At the IPython prompt, type min? and max? to see how they work. You’ll probably want to grab the values from each of the dictionaries and then use the appropriate one of these functions on them.

2. A new farmer, Arnold, shows up ready to sell his bag of potatoes for $10.00. Update the sell orders to reflect this and print it out.

3. Geno’s wife calls and says to come home quick, the dog got tangled in an extension cord. Geno rolls his eyes and reluctantly heads home, rueing the day he got that stinking dog. Remove Geno from the market. What is the new bid price?

4. Chef Juan comes running up after a bus bound for the Vegetarians Who Love Eggs Conference stopped at his taco stand and cleared him out of potato tacos. He needs more potatoes for the late morning run that always happens. Chef Juan bids $10.00 for a bag of potatoes. Add him to the buy orders. Again, check the bid and offer prices. Hmm. They should be the same now.

5. At this point, the market manager notices that Juan is bidding the same price that Arnold is offering. He teams these two up and they shake hands. Juan pays, gets his potatoes and dashes back to make more tacos. Arnold heads back to tune the carburetor on his tractor – it konked out on the back 40 last week, and he had to walk all the way home. Since they are now gone, remove both of these guys from the market. As you do this, compare their bid and offer prices just to make sure they match.

6. And then it happens… The devastating event that turns the potato market upside down. From the window sill, crackling over the air waves comes the announcement. Frank, owner of “Be the Fish: Lure Store” announces his retirement sale. All of his famous handmade Super-Z fishing jigs are half off while supplies last. As soon as the farmers hear this, they eye each other suspiciously and simultaneously make a mad scramble for the door. The potato seller bench clears in 3 seconds flat, leaving all the chefs in a panic. Where will they get their potatoes?

As they walk back to their restaurants, they notice that Juan, having already heard the news from the pack of farmers running by, potatoes in hand, was busy changing his sign from “potato tacos $1.00” to “potato tacos $1.50.”

Clear out the sell_orders dictionary to show there are no longer any sellers. Print out the names of the sad chefs who are left without any potatoes.

1.3.3. DNA Dictionary Exercise¶

This exercise returns to the DNA data we used in the DNA strings exercise. We show how dictionaries play a natural role storing the decoding relations for DNA. [Download (a Jupyter notebook)]

If you haven’t done the “DNA String” exercise in the lecture called “Introduction to Strings” then you probably should do that exercise before attempting this one.

1.3.3.1. Background¶

Sequences of DNA are frequently represented by strings of letters, each corresponding to a base:

"A" is adenine
"C" is cytosine
"G" is guanine
"T" is thymine

A gene encodes a protein by specifying the amino acids that compose it via groups of 3 bases (called “codons”). Each codon corresponds to an amino acid or a special “start” or “stop” sequence.

In the usual genetic code the sequence “ATG” indicates the start of the encoding of the protein (and also encodes the amino acid methionine). The three codons “TAA”, “TAG” and “TGA” are stop codons and indicate that the protein is finished.

1. Below is a dictionary called codon_table that maps codons to their corresponding amino acid abbreviations (the stop codons are usually abbreviated by a *). Extract the abbreviation associated with the codon “AAG”:

codon_table = {
'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L',
'TCT': 'S', 'TCC': 'S', 'TCA': 'S', 'TCG': 'S',
'TAT': 'Y', 'TAC': 'Y', 'TAA': '*', 'TAG': '*',
'TGT': 'C', 'TGC': 'C', 'TGA': '*', 'TGG': 'W',

'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L',
'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',

'ATT': 'I', 'ATC': 'I', 'ATA': 'I', 'ATG': 'M',
'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T',
'AAT': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K',
'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',

'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V',
'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G',
}

codon = "AAG"

2. We have created another dictionary amino_acid_table that maps the amino acid abbreviations to their full names. Extract the full name of the amino acid associated with the codon “CAA”:

amino_acid_table = {
'A': "alanine",
'C': "cysteine",
'D': "aspartic acid",
'E': "glutamic acid",
'F': "phenylalanine",
'G': "glycine",
'H': "histidine",
'I': "isoleucine",
'K': "lysine",
'L': "leucine",
'M': "methionine/start",
'N': "asparagine",
'P': "proline",
'Q': "glutamine",
'R': "arginine",
'S': "serine",
'T': "threonine",
'V': "valine",
'W': "tryptophan",
'Y': "tyrosine",
'*': "stop",
}

codon = "CAA"

3. Human mitochondrial DNA has a slightly different genetic code where “AGA” and “AGG” are additional “stop” codons, “TGA” is not a “stop” codon but instead codes for tryptophan, and “ATA” codes for methionine instead of isoleucine.

Copy the codon table into a new mitochondrial_table and modify the dictionary so that it corresponds to the mitochondrial DNA genetic code.

Note

You don’t need to manually copy the whole table in the editor. Instead see if you can find dictionary methods to copy the codon_table. We will review all dictionary methods at the next lecture, so this is a preview.

Hint

Remember that to find methods (tools) attached to an object, create a notebook cell, create any dictionary inside and then type dot and then the TAB key. For example:

 {}.[HIT THE TAB KEY HERE]

4. (Bonus). If you already know about writing loops in Python, programmatically build a list of all of the codons that can produce serine.
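As a rough illustration of the dictionary operations this exercise relies on, the sketch below chains two lookups and then copies a dictionary before changing it. The tiny `codon_excerpt` and `name_excerpt` tables and the `variant` name are hypothetical stand-ins for the full tables above, and the change made to `variant` is arbitrary (it is not the mitochondrial code).

```python
# Hypothetical excerpts of the tables above, for illustration only.
codon_excerpt = {'GGT': 'G', 'TGG': 'W', 'TAA': '*'}
name_excerpt = {'G': "glycine", 'W': "tryptophan", '*': "stop"}

# Two chained lookups: codon -> abbreviation -> full name.
abbreviation = codon_excerpt['TGG']
full_name = name_excerpt[abbreviation]
print(full_name)

# Plain assignment only creates a second name for the same dictionary,
# while the copy() method builds a new, independent dictionary.
alias = codon_excerpt
variant = codon_excerpt.copy()
variant['GGT'] = '?'   # arbitrary change, made only to show independence

print(codon_excerpt['GGT'])   # the original entry is unchanged
```

Because `variant` is a separate object, edits to it leave the original table intact, which is the property the mitochondrial question depends on.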

1.4. Sets¶

This group of exercises explores the use of sets, containers that have no duplicates. Sets come in two flavors, mutable and immutable. The immutable flavor is called a frozenset.

1.4.1. Flight Distances Exercise¶

This exercise asks you to compute distances between cities and to represent the distances in a dictionary whose keys are frozensets. [Download]

Flying Circus Airlines flies between the following cities (with distances):

Atlanta-Chicago:                    590.0
Atlanta-Dallas:                     720.0
Atlanta-Houston:                    700.0
Atlanta-New York:                   750.0
Austin-Dallas:                      180.0
Austin-Houston:                     150.0
Boston-Chicago:                     850.0
Boston-Miami:                      1260.0
Boston-New York:                    190.0
Chicago-Denver:                     920.0
Chicago-Houston:                    940.0
Chicago-Los Angeles:               1740.0
Chicago-New York:                   710.0
Chicago-Seattle:                   1730.0
Dallas-Denver:                      660.0
Dallas-Los Angeles:                1240.0
Dallas-New York:                   1370.0
Denver-Los Angeles:                 830.0
Denver-New York:                   1630.0
Denver-Seattle:                    1020.0
Houston-Los Angeles:               1370.0
Houston-Miami:                      970.0
Houston-San Francisco:             1640.0
Los Angeles-New York:              2450.0
Los Angeles-San Francisco:          350.0
Los Angeles-Seattle:                960.0
Miami-New York:                    1090.0
New York-San Francisco:            2570.0
San Francisco-Seattle:              680.0

We can represent this data in a dictionary mapping the pair of cities to the distance between them. Because the distance between cities isn’t directional information, a set of the cities (without any order) seems like the right way to store the keys (pairs of cities).
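A minimal sketch of why the key type matters, using two made-up towns rather than the airline data: a regular `set` is mutable and therefore unhashable, whereas a `frozenset` can serve as a dictionary key and ignores the order of its elements. The names `distances` and `set_accepted` are illustrative assumptions.

```python
# Made-up towns and distance, for illustration only.
distances = {}

try:
    distances[{"Springfield", "Shelbyville"}] = 20.0
    set_accepted = True
except TypeError:
    # Regular sets are mutable and therefore unhashable.
    set_accepted = False

distances[frozenset(["Springfield", "Shelbyville"])] = 20.0

# The order of the elements does not matter when looking the pair up.
print(distances[frozenset(["Shelbyville", "Springfield"])])
```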

1. Do you remember why a regular set cannot be a key in a dictionary? We will therefore use frozen sets instead. Build a frozen set with Atlanta and Chicago and another one with Atlanta and Dallas. Make a dictionary called flight_distances mapping these sets to their distances (590 and 720 respectively).

2. We have built for you the full dictionary flight_distances. Use it to print the distance from Seattle to Chicago.

3. Compute the total distance flying from Austin to Houston to San Francisco and compare it to the distance if you fly Austin to Dallas to Los Angeles to San Francisco.

4. Flying Circus Airlines adds a direct flight between Austin and San Francisco, which is 1500 miles. Update the flight distances data structure to reflect this.

5. Flying Circus Airlines cancels the service from Boston to Miami. Remove it from the flight distances.

6. Below, we have built for you the list of cities that Flying Circus Airlines reaches. (The question Bonus 1 after this one will get you to build it yourself if you know about looping.) Another company, SouthBy Airlines, flies to the second list of cities given below. The CEO of SouthBy is trying to evaluate if it makes sense to buy Flying Circus Airlines.

1. How many cities would be covered if both companies decided to merge?

2. Is Flying Circus Airlines bringing any new cities to the table?

3. To see if Flying Circus shareholders will approve the merger, list the cities that are currently not covered by them and that would be covered by the merged company.

4. To evaluate the efficiency of the future merger, compute how many cities are reached by only one of the 2 companies currently.

Hint

Remember this exercise is about sets. Turn the relevant data into sets and use set methods.

7. You will need to be familiar with loops and functions to do the remaining bonus questions.

1. Programmatically build the set of cities that Flying Circus Airlines flies to. You should have a set containing exactly the cities listed in flying_circus_cities above.

Hint

Use a for loop over the keys of the dictionary above and the union operation on sets to build that set.

2. Write a function that takes a list of cities that are directly connected, and computes the total distance to fly between those cities.
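The set operations and the route calculation used in questions 6 and 7 might look roughly like the sketch below. The two small networks, the `legs` dictionary, and the `route_distance` helper are hypothetical and use made-up names and numbers, not the airline data above.

```python
# Made-up networks, for illustration only.
airline_a = {"Ayton", "Beeville", "Seaford"}
airline_b = {"Beeville", "Deeside"}

merged = airline_a | airline_b        # union: cities served by either
only_a = airline_a - airline_b        # difference: cities only A serves
exactly_one = airline_a ^ airline_b   # symmetric difference

# Made-up leg distances keyed by unordered city pairs.
legs = {
    frozenset(["Ayton", "Beeville"]): 100.0,
    frozenset(["Beeville", "Seaford"]): 250.0,
}

# Build the set of all cities appearing in the keys with repeated unions.
served = set()
for pair in legs:
    served = served.union(pair)


def route_distance(route, legs):
    """Sum the leg distances along a list of directly connected cities."""
    total = 0.0
    for start, end in zip(route[:-1], route[1:]):
        total += legs[frozenset([start, end])]
    return total


print(route_distance(["Seaford", "Beeville", "Ayton"], legs))
```

Pairing `route[:-1]` with `route[1:]` yields each consecutive leg once, and the frozenset lookup works regardless of travel direction.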

2. Other Essentials (functions, loops, IO)¶

2.1. Basic Loops¶

This exercise group explores using basic loops.

2.1.1. Filter Words Exercise¶

We provide you with the following beginning of a famous children’s song. Print out only words that start with “o”, ignoring case:

My Bonnie lies over the ocean.
My Bonnie lies over the sea.
My Bonnie lies over the ocean.
Oh bring back my Bonnie to me.

Bonus: Print out words only once.

2.1.2. Inventory Exercise¶

Use loops to calculate and report the current inventory in a warehouse. [Download]

Assume the warehouse is initially empty.

The string warehouse_log is a stream of deliveries to and shipments from a warehouse. Each line represents a single transaction for a part with the number of parts delivered or shipped. It has the form:

part_id count

If “count” is positive, then it is a delivery to the warehouse. If it is negative, it is a shipment from the warehouse.

Hint

You should write a loop that updates a dictionary whose keys are part_ids and whose values are the current number of items in the warehouse with that part_id.

Exercises.essentials.basic_loops.inventory.inventory.update_inventory(warehouse_log)[source]

Return a dict containing the current inventory size for each part_id based on string warehouse_log.
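The loop described in the hint could be sketched as follows. The `sample_log` string is a hypothetical log in the “part_id count” format, not the data from the exercise file.

```python
# Hypothetical log in the "part_id count" format described above.
sample_log = """widget 10
gear 4
widget -3
gear 6"""

inventory = {}
for line in sample_log.splitlines():
    part_id, count = line.split()
    # get() supplies 0 the first time a part_id is seen.
    inventory[part_id] = inventory.get(part_id, 0) + int(count)

print(inventory)
```

Using `get` with a default avoids a separate check for whether the part has been seen before.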

2.1.3. DNA Translation¶

This is a more ambitious exercise showing how basic loops play a role in computing something meaningful, DNA sequence functions. That is, we are going to use loops to decode a DNA sequence and discover what protein it builds. [Download]

If you haven’t done the “DNA String” or the “DNA Dictionary” exercises then you probably should do those exercises before attempting this one.

Sequences of DNA are frequently represented by strings of letters corresponding to the bases:

“A” is adenine
“C” is cytosine
“G” is guanine
“T” is thymine

A gene encodes a protein by specifying the amino acids that compose it via groups of 3 bases (called “codons”). Each codon corresponds to an amino acid or a special “start” or “stop” sequence.

In the usual genetic code the sequence “ATG” indicates the start of the encoding of the protein (and also encodes the amino acid methionine). The three codons “TAA”, “TAG” and “TGA” are stop codons and indicate that the protein is finished.

In the assignment code, there is a dictionary codon_table that maps codons to their corresponding amino acid abbreviations.

In this example, we will look at a genetic sequence from the human genome which encodes the histone cluster 1, H1b.

A ACC TGC TCT TTA GAT TTC GAG CTT ATT CTC TTC TAG CAG TTT CTT GCC
ACC ATG TCG GAA ACC GCT CCT GCC GAG ACA GCC ACC CCA GCG CCG GTG GAG
AAA TCC CCG GCT AAG AAG AAG GCA ACT AAG AAG GCT GCC GGC GCC GGC GCT
GCT AAG CGC AAA GCG ACG GGG CCC CCA GTC TCA GAG CTG ATC ACC AAG GCT
GTG GCT GCT TCT AAG GAG CGC AAT GGC CTT TCT TTG GCA GCC CTT AAG AAG
GCC TTA GCG GCC GGT GGC TAC GAC GTG GAG AAG AAT AAC AGC CGC ATT AAG
CTG GGC CTC AAG AGC TTG GTG AGC AAG GGC ACC CTG GTG CAG ACC AAG GGC
ACT GGT GCT TCT GGC TCC TTT AAA CTC AAC AAG AAG GCG GCC TCC GGG GAA
GCC AAG CCC AAA GCC AAG AAG GCA GGC GCC GCT AAA GCT AAG AAG CCC GCG
GGG GCC ACG CCT AAG AAG GCC AAG AAG GCT GCA GGG GCG AAA AAG GCA GTG
AAG AAG ACT CCG AAG AAG GCG AAG AAG CCC GCG GCG GCT GGC GTC AAA AAG
GTG GCG AAG AGC CCT AAG AAG GCC AAG GCC GCT GCC AAA CCG AAA AAG GCA
ACC AAG AGT CCT GCC AAG CCC AAG GCA GTT AAG CCG AAG GCG GCA AAG CCC
AAA GCC GCT AAG CCC AAA GCA GCA AAA CCT AAA GCT GCA AAG GCC AAG AAG
GCG GCT GCC AAA AAG AAG TAG GAA GCT GGC GTG TGA AAA CCG CAA CAA AGC
CCC AAA GGC TCT TTT CAG AGC CAC CCA
1. Write Python code that:
1. Finds the first start codon in the sequence (Hint: remember what you did in the “DNA String” exercise).
2. Loops over the codons, building a string of the abbreviations of the protein’s amino acids (e.g. the protein should start with “MSETAPA…”)
3. Stops when it reaches a stop codon.
4. Prints out the amino acid string.
2. Print the number of amino acids in the protein.
3. There is another dictionary amino_acid_table that maps the abbreviations to their full names. Take the string of the abbreviations of the amino acids and print out for each amino acid its full name and whether or not it is used by the protein.
4. Bonus: Because most amino acids have multiple codons which can produce them, there are many different sequences that will potentially produce this protein. Compute how many there are.

If you need to do this sort of bioinformatics manipulation, the “Biopython” library does all of these sorts of things and more.

Here are some steps your code should take:

1. Find the location of the first start codon
2. Loop, building a string of abbreviations for amino acids
3. Print out the string
4. Print the number of amino acids in the protein.
5. For each amino acid, print its name and whether or not it is in the protein.
6. As a bonus, how many different sequences could produce this protein?
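The first three steps can be sketched on a toy example. The short `sequence` and the four-entry `excerpt` table are hypothetical; the sketch also assumes that any spaces and line breaks have already been removed from the sequence.

```python
# Hypothetical short sequence and a small excerpt of codon_table.
excerpt = {'ATG': 'M', 'TCG': 'S', 'GAA': 'E', 'TAA': '*'}
sequence = "CCATGTCGGAATAAGG"

start = sequence.find("ATG")      # index of the first start codon
protein = ""
for i in range(start, len(sequence) - 2, 3):
    amino_acid = excerpt[sequence[i:i + 3]]
    if amino_acid == '*':         # stop codon: the protein is finished
        break
    protein += amino_acid

print(protein)
```

Stepping through the string three characters at a time from the start codon keeps the reading frame aligned with the codons.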

2.1.4. Prime Numbers Exercise¶

This uses basic loops to test for a simple mathematical property, whether a number is prime. [Download (Jupyter notebook)]

2.1.4.1. Background¶

A prime number is a whole number greater than 1 that has no positive divisors other than 1 and itself. For example, 4 is not a prime number because it can be evenly divided by 2 (4 divided by 2 is an integer, 2), while 5 is a prime number because it cannot be evenly divided by 2, 3, or 4.

A classic algorithm to find primes is the Sieve of Eratosthenes. Prime numbers have various interesting applications, including cryptography, for example in the widely used RSA encryption algorithm.

The goal of this short exercise is to get toward a reasonably efficient way to collect all prime numbers that are below one million. The exercise will get you to experiment with nested for loops, both breaking out of them, as well as the for-else construct, a pattern that is a little more advanced than the basic for loop but that is very convenient in certain situations.

To efficiently check very many numbers for being prime, we will want to be a little clever in how many tests we do. For example, when we are testing if 5 is prime, we can actually skip testing if it is divisible by 4. Since 4 is not a prime number, as it equals 2 times 2, if 4 is a divisor then 2 must be as well. So if we have already checked for divisibility by 2, then we can skip checking for divisibility by 4. In fact, we really only need to test for division by prime numbers. More optimizations than this are possible as well.
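The for-else construct mentioned above can be sketched on an arbitrary number. The `else` branch of a `for` loop runs only when the loop ends without executing `break`; the names `candidate` and `message` are illustrative.

```python
candidate = 91   # arbitrary number to test

for divisor in range(2, candidate):
    if candidate % divisor == 0:
        message = "{} is divisible by {}".format(candidate, divisor)
        break
else:
    # Runs only when the loop finished without hitting break.
    message = "{} is prime".format(candidate)

print(message)
```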

2.1.4.2. Exercises¶

1. For the first step, let’s write some code to test if a single number, say 79, is a prime number.

Hint

If you don’t know where to start, try to encode the reasoning we followed above for testing if 5 was a prime number. Also remember that x % y equals 0 if y evenly divides x. Finally, the for-else pattern can be useful here to detect that we have tried unsuccessfully to divide 79 by all possible divisor candidates.

2. Reuse the code above to create a list containing all prime numbers less than max_n=100. Print the length of the list.

3. (Bonus) In Python, for loops are slower than in C or Fortran, especially for very large numbers of items. For most common operations, where the number of items in the loop is modest, Python is fast enough. But if you deal with a very long list of objects, and especially if you use nested loops, like in our algorithm above, the looping can become quite slow.

To observe this, run the code above with increasing max_n, up to 1000, 10000, or even higher if you are patient, and notice how the execution time goes from instantaneous to many seconds.

One of the main reasons that loops can be slow is that Python can have arbitrary types of objects in a list, which makes some optimizations difficult to apply. This is a major motivation for the [NumPy](http://www.numpy.org/) module and for tools like [Cython](http://www.cython.org/), which we will cover later in other courses.

If we want to find all the primes less than 1,000,000, we will need to be careful about our algorithm, and avoid too much looping. Think about some ways to skip some of the values in our nested loops above. For help, have a look at http://en.wikipedia.org/wiki/Prime_number or open the hint below.

Hint

There are a couple of simple optimizations that we can use.

First, since all prime numbers except 2 are odd, we can start by taking 2 as prime and then only checking odd numbers starting with 3.

Second, we only actually need to check for prime factors up to, and including, the square root of the number we are testing. If the test number is divisible by something larger than the square root, then the resulting quotient must be less than the square root, and we would find that as a divisor first. This is an especially important optimization for large numbers.
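One possible way to combine the two optimizations is sketched below, assuming `max_n` is greater than 2. It tests only odd candidates and divides only by previously found primes whose square does not exceed the candidate; this is one of several reasonable arrangements, not the only one.

```python
max_n = 100

primes = [2]
for n in range(3, max_n, 2):      # odd candidates only
    is_prime = True
    for p in primes:
        if p * p > n:             # past the square root: stop testing
            break
        if n % p == 0:            # found a prime divisor
            is_prime = False
            break
    if is_prime:
        primes.append(n)

print(len(primes))
```

Comparing `p * p` with `n` avoids computing a floating-point square root while still including the square root itself as a possible divisor.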

2.2. List Comprehensions¶

2.2.1. Climate data analysis (part I) Exercise¶

We want to analyze some worldwide climate data from the National Climatic Data Center, which maintains the world’s largest archive of climate data, with historical records dating back many centuries. To evaluate whether their datasets are relevant for our analysis, we can download their list of countries. The file has been downloaded for you and is available as part of this exercise (it is called NCDC_country_list.txt); each line contains the name of a country for which data can be downloaded. We would like to analyze it using list, set, and dictionary comprehensions. In a subsequent exercise, we will use the original complete data file, which provides not only each country’s name but also its code, so that the corresponding data can be collected and analyzed.

In load_normalize_data, we’ll load the data for you into a large string containing all the countries.

2.2.1.1. Question 1¶

We would like to list all the countries in this list that start with the letter “b” because we are interested in datasets for Brazil. This can be done with a for loop as follows

>>> country_list = countries.split("\n")
>>> b_countries = []
>>> for country in country_list:
...     if country[0] == "b":
...         b_countries.append(country)

Rewrite this to use a list comprehension instead. Use the partial definition of the question_one function in the exercise file as a guide. Your function should take the string countries as an argument, turn it into a list, and use a list comprehension to filter out all the countries except those that begin with “b”.
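The general shape of a filtering list comprehension can be sketched on unrelated toy data. The `fruits` string is hypothetical and only mimics the one-name-per-line layout.

```python
# Hypothetical data in the same one-name-per-line layout.
fruits = "banana\napple\nblueberry\ncherry"

b_fruits = [fruit for fruit in fruits.split("\n") if fruit.startswith("b")]
print(b_fruits)
```

Using `startswith` rather than indexing with `[0]` also copes with empty lines, which would otherwise raise an `IndexError`.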

2.2.1.2. Question 2¶

Several countries are repeated in the result generated by the list comprehension. This is because there are multiple codes used by NCDC for a given country when it is particularly large. Convert your list to another standard Python data structure that will enforce uniqueness.

2.2.1.3. Question 3¶

If we are always going to collect all the country names and then remove duplicates, we could build a set directly rather than going through a list. Use a set comprehension (or a generator expression if you are using an older version of Python) to produce the set of names that start with “b”.
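For comparison, here is a small sketch of both spellings on a hypothetical list of names. The set comprehension uses braces, while the older style passes a generator expression to `set()`.

```python
# Hypothetical list with repeated entries.
names = ["bolivia", "brazil", "brazil", "chile", "brazil"]

unique_b = {name for name in names if name.startswith("b")}
unique_b_older = set(name for name in names if name.startswith("b"))

print(sorted(unique_b))
```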

2.2.1.4. Question 4¶

Use a dictionary comprehension (or generator expression) to produce a dictionary whose keys are all the countries and whose values are the number of times they appear in the data file because they have been sub-divided. Print the content of the dictionary in a nice way, one country per line.

Exercises.essentials.list_comprehensions.climate_data_collection.climate_data_collection.question_four(countries)[source]

Start with countries string, make it a list, produce a dictionary whose keys are countries and whose values are the number of times a country has been sub-divided. Hint: You may find the count method on lists useful.
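A dictionary comprehension built around the list `count` method might look like this on toy data. The `letters` list is hypothetical; iterating over `set(letters)` is a choice made here so that each key is counted only once.

```python
# Hypothetical list with repeated items.
letters = ["a", "b", "a", "c", "a", "b"]

counts = {letter: letters.count(letter) for letter in set(letters)}

# One entry per line, in alphabetical order.
for letter in sorted(counts):
    print("{}: {}".format(letter, counts[letter]))
```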

2.2.2. Filter Words Exercise¶

This is the same exercise as Exercises.essentials.basic_loops.filter_words with some wrinkles.

We provided you with the following beginning of a famous children’s song. You were supposed to print out only words that start with “o”, ignoring case:

My Bonnie lies over the ocean.
My Bonnie lies over the sea.
My Bonnie lies over the ocean.
Oh bring back my Bonnie to me.

2.2.2.1. Question 1¶

What you did in the previous exercise was something like this:

>>> lyrics = lyrics.replace('.',"")
>>> lyrics = lyrics.replace(',',"")

>>> words = lyrics.split()
>>> o_words = []
>>> for word in words:
...     if word[0].lower() == "o":
...         o_words.append(word)

The last step used a basic for-loop. Your task in this exercise is to do the same things but replace the last step with a list comprehension.

2.2.2.2. Question 2¶

Bonus: Print out words only once.

Hint

Convert your list to another standard Python data structure that will enforce uniqueness.

2.2.2.3. Question 3¶

If we are always going to collect all the o-words and then remove duplicates, we could build a set directly rather than going through a list. Use a set comprehension (or a generator expression if you are using an older version of Python) to produce the set of words that start with “o”.

Exercises.essentials.list_comprehensions.filter_words_list_comp.filter_words.question_three(lyrics)[source]

Start with lyrics string, make it a list, narrow down to words starting with “o”, enforce uniqueness, all in one step.

2.3. Functions¶

This exercise group explores the programming idea of a function.

2.3.1. Column Cipher Exercise¶

This exercise asks you to write a function to implement a column cipher. It will make use of slicing. [Download]

A column cipher works by writing the message in rows of a fixed length, and then extracting the columns and concatenating them. So the message “THISISACOLUMNCYPHER” with rows of length 5, would be written:

THISI
SACOL
UMNCY
PHERX

and then be sent as “TSUPHAMHICNESOCRILYX”. Note that the message has been padded with extra characters (in this case “X”) to make its length a multiple of the number of columns.

In this exercise you will encode the message “This message is very secret” with a column cipher with rows of length 3.

1. It is common in these examples to remove spaces and use all upper-case letters. Convert the message to this format with Python code.

2. Add a number of “X” characters to the end of the message so that its length is a multiple of 3.

Bonus: Try to do this so that if you change the message your code will still work.

Hint: the modulo operator % gives the remainder when dividing by an integer. What does % do to negative integers in Python?

3. The first column contains the characters at index 0, 3, 6, etc. Use slices to extract this column.

4. Extract the second and third columns using slicing and produce the encoded message.

To decode the encoded message, you would repeat this process, but with 3 rows of length 8 instead of 8 rows of length 3.
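The padding and slicing steps can be checked against the worked example from the introduction (“THISISACOLUMNCYPHER” with rows of length 5). This is a sketch under the assumption that “X” is the padding character; the variable names are illustrative.

```python
message = "THISISACOLUMNCYPHER"
row_length = 5

# -len(message) % row_length is the number of characters needed to reach
# the next multiple of row_length (0 if the length is already a multiple).
padding = -len(message) % row_length
padded = message + "X" * padding

# Column k holds the characters at indices k, k + row_length, ...
columns = [padded[k::row_length] for k in range(row_length)]
encoded = "".join(columns)
print(encoded)

# Decoding repeats the process with the roles of rows and columns swapped.
n_rows = len(encoded) // row_length
decoded = "".join(encoded[k::n_rows] for k in range(n_rows))
print(decoded)
```

In Python the result of `%` takes the sign of the divisor, so negating the length gives the padding count directly without a special case.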

2.3.1.1. Bonus¶

If you know about functions and loops, you can attempt this bonus question.

1. Write a function called encode_message which takes two arguments, the message to encode (which is called the plaintext) and the number of rows. It should return the encoded message.
2. Write a function which takes two arguments, the encoded message and the number of rows. It should return the decoded message, although that message may contain some padding characters. Thus, the decoded message may not be exactly the same as the original plaintext.

2.3.2. Great Circle Exercise¶

In this exercise you compute the distance between cities using Lat/Long coordinates. [Download]

The shortest distance between two points on the globe, assuming it is perfectly spherical, is the length of the great circle path. If you are given two locations in latitude and longitude, then the Haversine Formula gives this distance in a numerically stable way. Here are the steps to computing $$d$$, the distance:

\begin{align}\begin{aligned}a &= \sin^{2}\frac{(\phi_{1} - \phi_{2})}{2} + \cos(\phi_{1})\cos(\phi_{2})\sin^{2}\frac{(\lambda_{1} - \lambda_{2})}{2}\\c &= 2\; \arcsin(\sqrt{a})\\d & = rc\end{aligned}\end{align}

Where $$\phi_{1},\;\lambda_{1}$$ is the latitude, longitude of point 1 and $$\phi_{2},\;\lambda_{2}$$ is the latitude,longitude of point 2, and $$r$$ is the radius of the globe.

2.3.2.1. Question 1¶

Write a function called haversine_formula that takes as inputs a radius and two points specified by tuples of (latitude, longitude) and returns the distance between the points along a great circle. These are the math functions needed to implement the above formula:

from math import sin, cos, asin, radians, sqrt

The function asin is $$\arcsin$$ and the function radians is used to convert degrees to radians. You will need this since lat/longs are degrees, and the math functions $$\sin$$ and $$\cos$$ will only give the expected answers if their arguments are in radians.

Test your function using these values:

>>> r_earth = 6371.0 # km
>>> austin = (30.2500, -97.7500)
>>> cambridge = (52.2050, 0.1190)

>>> print haversine_formula(r_earth, austin, cambridge)
7894.56962773
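Besides the test above, a simple geometric check is useful: on a unit sphere, two points on the equator 90 degrees apart should be a quarter circle, or $$\pi/2$$, apart. The sketch below transcribes the three formula steps directly; the function name `great_circle_distance` is a hypothetical stand-in for your own haversine_formula.

```python
from math import asin, cos, pi, radians, sin, sqrt


def great_circle_distance(radius, point_1, point_2):
    """Haversine distance between two (latitude, longitude) points in degrees."""
    phi_1, lambda_1 = (radians(value) for value in point_1)
    phi_2, lambda_2 = (radians(value) for value in point_2)
    a = (sin((phi_1 - phi_2) / 2) ** 2
         + cos(phi_1) * cos(phi_2) * sin((lambda_1 - lambda_2) / 2) ** 2)
    c = 2 * asin(sqrt(a))
    return radius * c


# A quarter of the equator on a unit sphere should be pi / 2.
quarter = great_circle_distance(1.0, (0.0, 0.0), (0.0, 90.0))
print(abs(quarter - pi / 2) < 1e-12)
```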

2.3.2.2. Question 2¶

Here is the list of cities from the “Flight Distances” exercise, along with their latitudes and longitudes:

cities = {
'Atlanta': (33.7569444444, -84.3902777778),
'Austin': (30.3, -97.7333333333),
'Boston': (42.3577777778, -71.0616666667),
'Chicago': (41.9, -87.65),
'Dallas': (32.7825, -96.7975),
'Denver': (39.7391666667, -104.984722222),
'Houston': (29.7627777778, -95.3830555556),
'Los Angeles': (34.05, -118.25),
'Miami': (25.7833333333, -80.2166666667),
'New York': (40.67, -73.94),
'San Francisco': (37.7666666667, -122.433333333),
'Seattle': (47.6, -122.316666667),
}

Write a function named city_distance that, given a dictionary of city names and locations, plus the names of two cities, returns the great circle distance between the two cities. The result should be rounded to the nearest 10 km.

Hint

The built-in round() function takes an optional second argument for the number of digits of precision. This argument can be negative.
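For instance, with an arbitrary value chosen here for illustration, a negative second argument rounds to the left of the decimal point:

```python
distance = 2417.3                   # arbitrary example value

nearest_10 = round(distance, -1)    # one digit left of the decimal point
nearest_100 = round(distance, -2)   # two digits left of the decimal point
print(nearest_10)
```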

Your function should work like this:

>>> dist = city_distance(cities, "Austin", "San Francisco")
>>> print dist

2.3.2.3. Question 3¶

Write a function that, given a set of cities, returns a dictionary whose keys are pairs of cities and whose values are the distances between them. You should use an appropriate data structure for the keys, and round distances to the nearest 10 km.

def compute_distances(cities):

Bonus points if you compute the distance for a given pair of cities only once.

Hint

Have a look at the flight_distances exercise in the Frozenset lecture for a possible data structure.

You will need to convert your latlongs to radians to use the haversine formula. Call the deg2radian function defined below to do that.
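One way to visit each pair only once is `itertools.combinations`, sketched here with made-up one-dimensional positions. The `positions` dictionary and the use of `abs` as a distance are stand-ins for real coordinates and the haversine formula.

```python
from itertools import combinations

# Hypothetical one-dimensional "positions", standing in for coordinates.
positions = {"Ayton": 0.0, "Beeville": 30.0, "Seaford": 100.0}

pair_distances = {}
for city_1, city_2 in combinations(sorted(positions), 2):
    # abs() is a stand-in for a real distance function.
    pair_distances[frozenset([city_1, city_2])] = abs(
        positions[city_1] - positions[city_2])

print(len(pair_distances))
```

Three cities give three unordered pairs, and each frozenset key can be looked up with the cities in either order.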

2.4. If Statements¶

Some if-statement exercises.

2.4.1. Fizz Buzz Exercise¶

In this exercise you use the power of if-statements to write a program that teaches children about division. The game is called fizz buzz. [Download]

In this short exercise, we will program the core of a little game designed to teach children about the concept of divisibility. The idea is, for any given number, to print a special message if it is a multiple of 3 and/or 5, or else just print the number itself. It will teach you to use if statements to analyze that number as well as the modulo operator %. In the following lectures, we will learn about looping, which will be needed to draw and analyze more than 1 number at a time. For now, we will draw a number randomly and print the message for it.

The following code generates a random number from 1 to 100:

from random import randint
n = randint(1, 100)
print n

2.4.1.1. Question 1¶

For the first level of our game, write a test that prints “Fizz” if the number drawn is a multiple of 3, or just prints the number itself if it is not.

Test your code with many different values of n to make sure it works correctly. For that, run the code above again multiple times and you will see n taking multiple values. You can run your test after each generation of n. (That should make you look forward to learning about for loops :).)

Hint

You will need an if-statement.

You can use the modulo operator to test for divisibility: n is divisible by 3 if n % 3 equals 0.

2.4.1.2. Question 2¶

The second level of our game will be a little more complex. Now write a test that prints “Fizz” if the number is a multiple of 3, or prints “Buzz” if the number is a multiple of 5. If it is a multiple of both 3 and 5, it should print “FizzBuzz”. Finally, if it is neither a multiple of 3 nor 5, it should just print the number.

Test your code with many different values of n to make sure it works correctly.

2.4.1.3. Question 3¶

We will be analyzing a lot of numbers. In real-life problems, the analyzing may be much more time consuming than just the modulo operation, so it would be useful to build a cache of the output text. What data structure would conveniently store a number and its corresponding message? Build an empty one called cached_analysis. In a separate cell, for a given number n, check if a message has already been stored. Print it if it has. Otherwise, use the same test as before to build the message, and store it in cached_analysis in addition to printing it.

Hint

Since we want to map a number to a message, so that we can look up the number and print the corresponding message, the data-structure we need is a dictionary.
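The check-then-store pattern can be sketched with a deliberately different analysis, so that the Fizz Buzz test itself is left to you. The `describe` and `analyze` helpers are hypothetical names, and the even/odd message is only a stand-in for an expensive computation.

```python
cached_analysis = {}


def describe(n):
    """Stand-in for an expensive analysis (hypothetical helper)."""
    return "even" if n % 2 == 0 else "odd"


def analyze(n):
    """Return the message for n, computing it only on the first request."""
    if n not in cached_analysis:
        cached_analysis[n] = describe(n)
    return cached_analysis[n]


print(analyze(4))
print(analyze(4))   # second call is served from the cache
```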

2.5. File IO¶

Some file IO exercises.

2.5.1. ASCII Log File Exercise¶

Read in a set of logs from an ASCII file.

Read in the logs found in the file short_logs.crv. The logs are arranged as follows:

DEPTH    S-SONIC    P-SONIC ...
8922.0   171.7472   86.5657
8922.5   171.7398   86.5638
8923.0   171.7325   86.5619
8923.5   171.7287   86.5600
...

So the first line is a list of log names for each column of numbers. The columns are the log values for the given log. Despite the forbidding sounding extension (“.crv”), short_logs.crv is just an ASCII text file containing information organized as a table, in which the values are all separated by spaces.

Make a dictionary with keys as the log names and values as the log data:

>>> logs['DEPTH']
[8922.0, 8922.5, 8923.0, ...]
>>> logs['S-SONIC']
[171.7472, 171.7398, 171.7325, ...]
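A line-by-line approach might look like the sketch below, written for Python 3. It uses `io.StringIO` with two of the sample rows shown above as a stand-in for the real file, and assumes only three columns; with the actual file you would iterate over the object returned by `open` instead.

```python
import io

# Stand-in for the opened short_logs.crv, using sample rows shown above.
log_file = io.StringIO(
    "DEPTH    S-SONIC    P-SONIC\n"
    "8922.0   171.7472   86.5657\n"
    "8922.5   171.7398   86.5638\n"
)

names = log_file.readline().split()     # first line holds the log names
logs = {name: [] for name in names}
for line in log_file:                   # remaining lines hold the values
    for name, value in zip(names, line.split()):
        logs[name].append(float(value))

print(logs['DEPTH'])
```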

2.5.1.1. Bonus¶

Time your solution from IPython using:

run -t 'ascii_log_file.py'

And see if you can come up with a faster solution. You may want to try the long_logs.crv file in this same directory for timing, as it is much larger than the short_logs.crv file. As a hint, reading the data portion of the array in at one time combined with strided slicing of lists is useful here.
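The strided-slicing idea from the hint can be sketched as follows, on a small hypothetical `text` string rather than the real file. All numbers are read in one pass, and column i is then every value at offset i with a stride equal to the number of columns.

```python
# Hypothetical file contents with three columns.
text = ("DEPTH S-SONIC P-SONIC\n"
        "8922.0 171.7472 86.5657\n"
        "8922.5 171.7398 86.5638\n")

header, _, body = text.partition("\n")
names = header.split()
values = [float(token) for token in body.split()]   # all numbers, row by row

# Column i is every len(names)-th value, starting at offset i.
logs = {name: values[i::len(names)] for i, name in enumerate(names)}
print(logs['S-SONIC'])
```

Whether this is actually faster on long_logs.crv is something to measure rather than assume.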

Write a for loop that loops through the log file and fills the dictionary as described above.

2.5.2. Star Catalog Exercise¶

The file stars.dat contains data about some of the nearest stars. Data is arranged in the file in fixed-width fields:

0:17    Star name
18:28   Spectral class
29:34   Apparent magnitude
35:40   Absolute magnitude
41:46   Parallax in thousandths of an arc second

A typical line looks like:

Proxima Centauri  M5  e      11.05 15.49 771.8

In addition, some lines may start with the ‘#’ character, indicating that the line is a comment.

In this exercise you will write two functions: one that reads in data from files of this format, and one which writes data out to files of this format.

The data read in from the file should be returned as a list of dictionaries, one for each star, with keys: “name”, “spectral_class”, “apparent_magnitude”, “absolute_magnitude”, and “parallax”.

Similarly, the function that writes the data to a file should expect a list of dictionaries of this form.

The read function should ignore comment lines, while the write function should accept an optional argument containing a multiline comment, which should be written at the start of the file.
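Reading and writing one fixed-width record could be sketched like this, assuming the field ranges listed above can be used directly as Python slices. The sample line is rebuilt with explicit padding so that the column positions are exact, and the format string used for writing (including the number of decimals) is an assumption rather than a documented specification of stars.dat.

```python
# Sample line rebuilt with explicit padding so the column positions are exact.
line = "Proxima Centauri".ljust(18) + "M5  e".ljust(11) + "11.05 15.49 771.8"

star = {
    "name": line[0:17].strip(),
    "spectral_class": line[18:28].strip(),
    "apparent_magnitude": float(line[29:34]),
    "absolute_magnitude": float(line[35:40]),
    "parallax": float(line[41:46]),
}

# Writing goes the other way: pad each value to its field width.
rebuilt = "{:<17} {:<10} {:>5.2f} {:>5.2f} {:>5.1f}".format(
    star["name"], star["spectral_class"], star["apparent_magnitude"],
    star["absolute_magnitude"], star["parallax"])

print(rebuilt == line)
```

A full reader would also skip any line that starts with ‘#’ before slicing it.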

Hint

You may want to review the lecture on string formatting and the star_format exercise.

2.5.2.1. Bonus¶

Gracefully handle errors such as invalid file names, badly formatted data, and data which doesn’t match the expected structure (such as missing keys or values of the wrong type).

2.5.2.2. References¶

Preliminary Version of the Third Catalogue of Nearby Stars. GLIESE W., JAHREISS H. 1991. Astron. Rechen-Institut, Heidelberg.

Read stellar information from a file.

This function opens the specified file and reads the data, returning a list of dictionaries.

Parameters: filename – The name of the file to read.
Returns: data – A list of dictionaries with keys “name”, “spectral_class”, “apparent_magnitude”, “absolute_magnitude”, and “parallax” containing the data from the file.

Exercises.essentials.file_io.star_catalog.star_catalog.write_stars(filename, data, comment=None)[source]

Write stellar information to a file.

This function opens the specified file and writes the data.

Parameters: filename – The name of the file to write.
data – A list of dictionaries with keys “name”, “spectral_class”, “apparent_magnitude”, “absolute_magnitude”, and “parallax” containing the data to be written to the file.
comment (str) – An optional comment to be written at the start of the file.