The primary purpose of this assignment is to work with invariants and asserts. Secondary purposes include continuing to work with Python data structures and working on good programming style.

Background

The United States has many national parks that contain a wide diversity of plant and animal life; this assignment aims to quantify this biodiversity. Since a large park is more likely to span different kinds of terrain and contain a wider variety of plants and animals, in this assignment we will consider the density of different kinds of plant and/or animal life for each park, i.e., the diversity divided by the area of the park. To simplify the computation, we will group the various kinds of plants and animals into two broad categories: flora (roughly: plants and things like plants) and fauna (roughly: animals and things like animals).

Expected Behavior

Write a program, in a file biodiversity.py, that behaves as follows.
  1. Use input() (without any arguments) to read in the name of a file pinfo ("park information"). Read the file pinfo and organize the data into a dictionary that maps each national park name to a tuple that contains, as one element, its area (see Data Structures, below, for the requirements for the data structures your program should use).
  2. Use input() (without any arguments) to read in the name of a file sinfo ("species information"). Read the file sinfo and for each line in this file, use the Category field (see Input below) to update the flora/fauna counts for the park referred to in the dictionary from the previous step.
  3. Print out, for each park named in the file pinfo, the number of different kinds of flora and fauna per acre (see Output below).

The mapping from the Category field of the input data to the flora/fauna categories in your program is as follows:

Category in input data     Flora/fauna category
Algae                      Flora
Fungi                      Flora
Nonvascular Plant          Flora
Vascular Plant             Flora
Amphibian                  Fauna
Bird                       Fauna
Crab/Lobster/Shrimp        Fauna
Fish                       Fauna
Insect                     Fauna
Invertebrate               Fauna
Mammal                     Fauna
Reptile                    Fauna
Slug/Snail                 Fauna
Spider/Scorpion            Fauna
anything else              (ignore it)
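
One possible way to encode this mapping is with two sets and a small helper. This is only a sketch: the names FLORA_CATEGORIES, FAUNA_CATEGORIES, and classify are hypothetical and are not required by the assignment.

```python
# Hypothetical names; the assignment does not prescribe them.
FLORA_CATEGORIES = {"Algae", "Fungi", "Nonvascular Plant", "Vascular Plant"}
FAUNA_CATEGORIES = {
    "Amphibian", "Bird", "Crab/Lobster/Shrimp", "Fish", "Insect",
    "Invertebrate", "Mammal", "Reptile", "Slug/Snail", "Spider/Scorpion",
}

def classify(category):
    """Return "flora", "fauna", or None for a Category field value."""
    # Pre-condition: the Category field is a string.
    assert isinstance(category, str)
    if category in FLORA_CATEGORIES:
        return "flora"
    if category in FAUNA_CATEGORIES:
        return "fauna"
    return None  # anything else is ignored
```

Returning None for unrecognized categories lets the caller skip the line without treating it as an error.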

It is often the case that real-world data are imperfect and contain errors and omissions. Applications that process such data have to program defensively to work around such flaws; your program should do the same. In particular:

  • a park that is listed in the park information file may or may not have anything listed for it in the species information file; and
  • a park that is listed in the species information file may or may not be listed in the park information file.

If information about a park is missing in either input file, your program should ignore that park and continue to process the data for the remaining parks.
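
A minimal sketch of this defensive check is shown here. It assumes a dictionary layout of park name -> (area, [flora_count, fauna_count]); that layout and the name record_species are illustrative assumptions, not requirements.

```python
def record_species(parks, park_name, kind):
    """Count one more species of the given kind for park_name, if known.

    Assumed layout (an illustration only):
        parks[park_name] == (area, [flora_count, fauna_count])
    Returns True if a count was updated, False if the park was skipped.
    """
    assert isinstance(parks, dict)
    assert kind in ("flora", "fauna")
    if park_name not in parks:
        # The park appears in the species file but not in the park file:
        # ignore it and let the caller carry on with the next line.
        return False
    counts = parks[park_name][1]
    index = 0 if kind == "flora" else 1
    counts[index] += 1
    return True

# Made-up data for illustration.
parks = {"Example Park": (1000.0, [0, 0])}
record_species(parks, "Example Park", "fauna")
record_species(parks, "Unlisted Park", "flora")   # skipped, no error
```

After these two calls, only the entry for "Example Park" has changed.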

Data Structures

Your program should keep track of the following information about each national park:
  1. its area;
  2. the number of different species of flora (algae, fungi, plants, etc.); and
  3. the number of different species of fauna (animals, birds, bugs, etc.).

This information should be maintained as a tuple, ( ... ). Since you will be updating the counts for the flora and fauna as you read the species data, you should have some sort of mutable data structure within the tuple.
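
As an illustration, the tuple could hold the area together with a two-element list of counts. The exact layout is up to you; the layout and variable names below are assumptions made for this sketch.

```python
# Assumed layout: (area_in_acres, [flora_count, fauna_count])
park_info = {}
park_info["Example Park"] = (1000.0, [0, 0])   # made-up park and area

area, counts = park_info["Example Park"]
counts[0] += 1   # one more kind of flora
counts[1] += 2   # two more kinds of fauna

# The tuple itself was never reassigned; it still refers to the same
# list object, which now holds the updated counts.
assert park_info["Example Park"] == (1000.0, [1, 2])
```

The tuple cannot be modified in place, but the list inside it can, which is why the counts go in a mutable structure.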

Input

Both the park information and species information files are CSV files. A line that begins with the character '#' is a comment and should be discarded. The first line of each file is a comment line that gives information about the columns in that file.
  • The park information file has the following format, with each line containing information about a park:
    Park Name, State, Acres, Latitude, Longitude

    Of this information, we will use only the Park Name and Acres fields in this assignment. An example of a park information file is given here.

  • The species information file has the following format, with each line containing information about one species at one park:
    Park Name, Category, Scientific name, Common names, Occurrence, Nativeness, Abundance, Seasonality, Conservation status, (empty)

    Of this information, we will only use the Park Name and Category fields in this assignment. An example of a species information file is given here.
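
The sketch below shows one way to handle a single line of the park information file without the csv library. It assumes that the Park Name, State, and Acres fields contain no embedded commas, so a plain split(",") is enough; the function name parse_park_line and the sample lines are made up for illustration.

```python
def parse_park_line(line):
    """Return (park_name, acres) for a data line, or None for a comment
    or blank line.

    Assumption: none of the first three fields contains a comma.
    """
    # Pre-condition: we are given one line of text.
    assert isinstance(line, str)
    line = line.strip()
    if line == "" or line.startswith("#"):
        return None   # comment or blank line: discard
    fields = line.split(",")
    # Column order from the file format: Park Name, State, Acres, ...
    return fields[0], float(fields[2])

# Made-up sample lines.
header = "# Park Name,State,Acres,Latitude,Longitude"
data = "Example Park,ZZ,1000,44.35,-68.21\n"
```

Here parse_park_line(header) gives None and parse_park_line(data) gives ("Example Park", 1000.0). A species line can be handled the same way, taking fields[0] and fields[1].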

Output

Print out information for all of the parks listed in the input park information file, one park per output line, as follows:
  • If the species information file contains data about species for the park, print this out in the following format:
    print("{} -- flora: {:f} per acre; fauna: {:f} per acre".format(park_name, flora_per_acre, fauna_per_acre))
  • If the species information file does not contain any data about the park, print this out in the following format:
    print("{} -- no data available".format(park_name))

The parks may be printed out in any order. However, for each park, keep the name the same as what was read in from the park information file, and use the print format shown above (it is simplest to copy-paste it and then edit the names of the variables to those in your code).
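
To make the arithmetic concrete, here is a small sketch that builds the output line for one park. It assumes that a park with zero flora and zero fauna counts had no data in the species file; that interpretation, the function name format_park, and the sample numbers are assumptions made for this example.

```python
def format_park(park_name, area, flora_count, fauna_count):
    """Build the output line for one park."""
    # Pre-conditions: a positive area and non-negative counts.
    assert area > 0
    assert flora_count >= 0 and fauna_count >= 0
    if flora_count == 0 and fauna_count == 0:
        # Assumption: zero counts means no species data for this park.
        return "{} -- no data available".format(park_name)
    flora_per_acre = flora_count / area
    fauna_per_acre = fauna_count / area
    return "{} -- flora: {:f} per acre; fauna: {:f} per acre".format(
        park_name, flora_per_acre, fauna_per_acre)

print(format_park("Example Park", 1000.0, 25, 50))
# prints: Example Park -- flora: 0.025000 per acre; fauna: 0.050000 per acre
print(format_park("Empty Park", 500.0, 0, 0))
# prints: Empty Park -- no data available
```

Note that {:f} always prints six digits after the decimal point.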

Assertions

Your program should use assert statements to check the following (but see Replacements for asserts, below, for situations where an assert is difficult to state).
  • For each function, any pre-conditions on the arguments of that function. The assert should be placed at or very near the beginning of the function.
  • For each loop that either (i) computes a value; or (ii) transforms some data, at least one assert that holds in the body of the loop. You can choose what the invariant is, but it must be something that:
    • reflects the computation of the loop; and
    • is not simply a statement of the iteration condition (or some expression whose value follows directly from the iteration condition).

Asserts are not necessary for loops that neither compute values nor transform data, e.g., loops that simply read in data or print out results.

This level of assertion-checking is almost certainly overkill, but we'll do this for a little while in order to get more experience with pre-conditions and loop invariants and to practise working with assert statements.

Try to make your asserts as precise and specific as you can. This document shows a simple way to check types of Python variables and values.
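
The following sketch shows what a pre-condition and a loop invariant might look like. The function count_kinds is hypothetical, and its category sets are deliberately abbreviated to keep the example short; your own functions and invariants will differ.

```python
def count_kinds(categories):
    """Return [flora_count, fauna_count] for a list of Category strings."""
    # Pre-conditions: a list whose elements are all strings.
    assert isinstance(categories, list)
    assert all(isinstance(c, str) for c in categories)

    flora = {"Algae", "Fungi"}    # abbreviated for this sketch
    fauna = {"Bird", "Mammal"}    # abbreviated for this sketch
    counts = [0, 0]
    ignored = 0
    for i in range(len(categories)):
        if categories[i] in flora:
            counts[0] += 1
        elif categories[i] in fauna:
            counts[1] += 1
        else:
            ignored += 1
        # Invariant: each of the i + 1 entries examined so far has been
        # counted exactly once, as flora, fauna, or ignored.
        assert counts[0] + counts[1] + ignored == i + 1
    return counts
```

The invariant says something about the values being computed, not just about the loop index, which is what the requirement above asks for.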

Replacements for asserts

In some situations, it may be difficult or impossible to write an assert that captures what you want to capture. In such situations, in place of an assert you can write a comment giving the invariant or assumption you want to state. Such a comment should be written as follows:
### INVARIANT: ...your invariant in English and/or Python
or
### ASSUMPTION: ...your assumption in English and/or Python
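
For instance, a property such as "this number is measured in acres" cannot be checked by an assert, so it could be recorded as a comment. The function total_area and the dictionary layout below are hypothetical and serve only to show where such comments might go.

```python
def total_area(parks):
    """Return the sum of the areas of all parks in the dictionary.

    Assumed layout: parks[park_name] == (area, [flora_count, fauna_count])
    """
    assert isinstance(parks, dict)
    ### ASSUMPTION: every area came from the Acres field, so it is in acres
    total = 0.0
    for name in parks:
        total += parks[name][0]
        ### INVARIANT: total is the sum of the areas of the parks visited so far
    return total
```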

Programming Requirements

  1. Input files:
    • Read the files yourself. Don't import Python's csv library.
    • Make sure you close the file when you have finished reading it.
    • Each file should not be read more than once.
  2. Make sure that the data structures in your code satisfy the requirements for the assignment (see Data Structures above). In particular:
    • you should use a dictionary that maps each park name to a tuple containing information for that park; and
    • flora and fauna information in the tuple should be maintained in a mutable data structure that is updated appropriately as species information is read and processed.
  3. Your program should be able to handle missing data about parks in either the park information file or the species information file. In other words, it is possible to have a park listed in either of these files but not in the other. If this happens, your program should continue processing normally.
  4. Make sure you use asserts to check function preconditions and at least one invariant within each loop (see Assertions above).
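
As a sketch of the file-handling requirements, the function below reads a file exactly once and closes it when done. The name read_data_lines is hypothetical; in your program the file name would come from input().

```python
def read_data_lines(filename):
    """Return the non-comment, non-blank lines of a file, stripped of
    surrounding whitespace. The file is read once and then closed."""
    # Pre-condition: a non-empty file name.
    assert isinstance(filename, str) and filename != ""
    data_lines = []
    # The with statement closes the file when the block ends,
    # even if an error occurs while reading.
    with open(filename) as infile:
        # This loop only reads in data, so it needs no invariant assert.
        for line in infile:
            line = line.strip()
            if line != "" and not line.startswith("#"):
                data_lines.append(line)
    return data_lines
```

Calling infile.close() explicitly after the loop, instead of using with, would also satisfy the requirement.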

Examples

Several examples of this data analysis are given here.