Source code for Exercises.essentials.list_comprehensions.climate_data_collection.climate_data_collection

"""
We want to analyze some worldwide climate data from the `National
Climatic Data Center <http://www.ncdc.noaa.gov/data-access>`_, since
it hosts the world's largest archive of climate data, with historical
records dating back many centuries. To evaluate whether their datasets
are relevant for our analysis, we can download their list of
countries. The file has been downloaded for you and is available as
part of this exercise (it is called `NCDC_country_list.txt`); each
line contains the name of a country for which data can be downloaded.
We would like to analyze it using list, set, and dictionary
comprehensions. In a subsequent exercise, we will use the original
complete data file, which provides not only each country's name but
also its code, allowing us to collect and analyze the corresponding
data.

In `load_normalize_data`, we'll load the data for you into a large string
containing all the countries.

Question 1
----------

We would like to list all the countries in this list that start with the letter "b" because we are interested in datasets for Brazil. This can be done with a `for` loop as follows ::

    >>> country_list = countries.split("\\n")
    >>> b_countries = []
    >>> for country in country_list:
    ...     if country.startswith("b"):
    ...         b_countries.append(country)

Rewrite this to use a list comprehension instead. Use the
partial definition of the `question_one` function below as a guide.
Your function should take the string `countries` as an argument,
turn it into a list, and use a list comprehension to
filter out all the countries except those that begin with `b`.
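
As a rough illustration of the loop-to-comprehension pattern, here is a
sketch on a made-up list of fruit names rather than the NCDC data, so the
exercise itself is left to you. The names `fruits`, `a_fruits_loop`, and
`a_fruits` are hypothetical.

```python
# Hypothetical sample data, not taken from NCDC_country_list.txt.
fruits = ["apple", "avocado", "banana", "apricot", "cherry"]

# Loop version: collect the items that start with "a".
a_fruits_loop = []
for fruit in fruits:
    if fruit.startswith("a"):
        a_fruits_loop.append(fruit)

# Equivalent list comprehension:
# [expression for item in iterable if condition]
a_fruits = [fruit for fruit in fruits if fruit.startswith("a")]

print(a_fruits)  # ['apple', 'avocado', 'apricot']
```

`str.startswith` is used here instead of indexing with `[0]` because it
returns `False` for an empty string instead of raising `IndexError`; empty
strings can appear when text ending in a newline is split on newlines.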

Question 2
----------

Several countries are repeated in the result generated by the list
comprehension. This is because there are multiple codes used by NCDC
for a given country when it is particularly large. Cast your list to
another standard Python data structure that will enforce uniqueness.

Question 3
----------

If we are always going to collect all the country names and then
remove duplicates, we could build a set directly rather than going
through a list. Use a set comprehension (or a generator expression if
you are using an older version of Python) to produce the set of names
that start with "b".
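
The two spellings mentioned above can be sketched as follows, again on
made-up fruit names rather than the country data. The variable names are
hypothetical; set comprehension syntax is available in Python 2.7 and 3.x.

```python
# Hypothetical sample data with repeated entries.
fruits = ["apple", "avocado", "apple", "banana", "avocado"]

# Set comprehension: duplicates collapse automatically.
unique_a = {fruit for fruit in fruits if fruit.startswith("a")}

# Older equivalent: pass a generator expression to set().
unique_a_old = set(fruit for fruit in fruits if fruit.startswith("a"))

# Sets are unordered, so sort before printing for a stable display.
print(sorted(unique_a))  # ['apple', 'avocado']
```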

Question 4
----------

Use a dictionary comprehension (or generator expression) to produce a
dictionary whose keys are *all* the countries and whose values are the
number of times they appear in the data file because they have been
sub-divided. Print the content of the dictionary in a nice way, one
country per line.
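
A minimal sketch of counting with a dictionary comprehension, once more
on made-up fruit names (all variable names are hypothetical). It iterates
over `set(fruits)` so that each key is counted only once.

```python
# Hypothetical sample data with repeated entries.
fruits = ["apple", "avocado", "apple", "banana", "apple"]

# Dictionary comprehension: {key: value for item in iterable}.
counts = {fruit: fruits.count(fruit) for fruit in set(fruits)}

# Older equivalent: dict() over a generator of (key, value) pairs.
counts_old = dict((fruit, fruits.count(fruit)) for fruit in set(fruits))

# Print one entry per line, in alphabetical order.
for fruit in sorted(counts):
    print("{0} : {1}".format(fruit, counts[fruit]))
```

Calling `list.count` once per key rescans the whole list each time, which
is fine for a short file; for larger inputs `collections.Counter` from the
standard library does the same tally in a single pass.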

"""

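def load_normalize_data ():
    # Read the whole file into one string; the context manager
    # closes the file handle once reading is done.
    with open("NCDC_country_list.txt", "r") as country_file:
        countries = country_file.read()
    # Let's normalize the content
    countries = countries.lower()
    return countries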


def question_one (countries):
    # your code goes here
    b_countries = []  # placeholder so the script runs; replace this line
    print("Countries that start with 'b':")
    print(b_countries)
    return b_countries



def question_two (countries):
    # your code goes here
    pass


def question_three (countries):
    """
    Start with countries string, make it a list,
    narrow down to countries starting with "b",
    enforce uniqueness, all in one step.
    """
    # your code goes here
    unique_b_countries = set()  # placeholder so the script runs; replace this line
    return unique_b_countries


def question_four (countries):
    """
    Start with countries string, make it a list,
    produce a dictionary whose keys are countries
    and whose values are the number of times a country has been
    sub-divided.  Hint:  You may find the count method on lists
    useful::

      >>> list('abracadabra').count('a')
      4
    """
    # Your code here
    country_frequencies = {}  # placeholder so the script runs; replace this line
    return country_frequencies

# Copyright 2008-2016, Enthought, Inc.  
# Use only permitted under license.  Copying, sharing, redistributing or other unauthorized use strictly prohibited.  
# http://www.enthought.com

if __name__ == '__main__':
    countries = load_normalize_data()
    b_countries = question_one(countries)
    unique_b_countries = question_two(b_countries)
    unique_b_countries = question_three(countries)
    country_frequencies = question_four(countries)
    print("Number of times countries have been sub-divided:")
    for key, value in country_frequencies.items():
        print("{key} : {value}".format(key=key, value=value))