Lab Week 7 - Designing a Moving Average
Learning Objectives:
1. Develop algorithms involving loops.
2. Implement a data processing algorithm to address a problem that comes up commonly in the Earth sciences.
3. Work with time-series data.
Background
The prep work and reading to be completed before the lab are specified in the wk07_lab_reading.ipynb file. Do NOT attempt the lab until you have done the pre-reading.
You will use Equations 3 and 4 in the reading inside a for loop to design a running mean.
The dataset for this lab is the time series of carbon dioxide ($CO_2$) measurements taken daily on Mauna Loa, Hawaii, since 1958. This record has established clear evidence that (1) atmospheric $CO_2$ has increased over the last century, and (2) the rate of increase of $CO_2$ is itself increasing, i.e. the slope of the $CO_2$ versus time curve is not constant but increasing. We got these data from the Scripps CO2 Program. In the lab you will develop an algorithm that smooths the $CO_2$ concentration data with a moving average (running mean).
The file daily_in_situ_co2_mlo.csv has 34 header lines, and contains 6 columns with the data organized as follows (you can ignore the last two columns in this lab):
The units for the CO$_2$ measurement are ppm (parts-per-million).
Variables Checklist
Make sure that you define the following variables in your code. Incorrectly named or misspelled variables will receive zero marks from the autograder.
● years , 1D array, dtype=float
● months , 1D array, dtype=float
● days , 1D array, dtype=float
● dates , 1D array, dtype=object (datetime.datetime)
● co2 , 1D array, dtype=float
● co2_1994 , 1D array, dtype=float
● dates_1994 , 1D array, dtype=object (datetime.datetime)
● window_length , int
● x , 1D array, dtype=float
● z , 1D array, dtype=float
Part 1: Parsing Mauna Loa CO2 Daily Data
● In your first code cell, load the dataset from daily_in_situ_co2_mlo.csv using np.genfromtxt with the correct values for the delimiter and skip_header keyword arguments.
● Parse the CO2 data into separate arrays for each column:
years , 1D array, dtype=float
months , 1D array, dtype=float
days , 1D array, dtype=float
co2 , 1D array, dtype=float
● Convert each year-month-day triplet into datetime.datetime objects as shown in the lab reading, and save the converted dates into a new array called dates .
● Find all observations that occurred in 1994, and extract the CO2 data into a new array co2_1994 . Also extract the corresponding datetimes for 1994 into a new array dates_1994 .
● Make a new figure with 2 subplots (2 rows, 1 column). Include the appropriate titles and labels.
● Subplot 1: Plot co2 versus dates , and overlay co2_1994 and dates_1994 over top in a different colour. Remember that usually a line plot should have the independent variable on the x-axis and the dependent variable on the y-axis. CO$_2$ depends on time not the other way around so CO$_2$ goes on the y-axis.
● Subplot 2: Plot co2_1994 versus dates_1994
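The parsing steps in Part 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: it reads a tiny made-up CSV held in memory (2 header lines rather than 34), and it assumes the first four columns are year, month, day and CO2, which you should check against the real file. The name in_1994 is a hypothetical helper, not one of the required variables.

```python
import io
from datetime import datetime

import numpy as np

# Made-up stand-in for the real file: 2 header lines instead of 34, and an
# ASSUMED column order of year, month, day, CO2, then two ignored columns.
fake_csv = io.StringIO(
    "header line 1\n"
    "header line 2\n"
    "1993,12,31,357.0,1,0\n"
    "1994,1,1,358.0,1,0\n"
    "1994,1,2,358.5,1,0\n"
    "1995,1,1,360.0,1,0\n"
)

# skip_header must match the number of header lines in the file being read.
data = np.genfromtxt(fake_csv, delimiter=",", skip_header=2)

# Slice one column at a time; each result is a 1D float array.
years = data[:, 0]
months = data[:, 1]
days = data[:, 2]
co2 = data[:, 3]

# datetime.datetime needs integers, so cast each float before converting.
dates = np.array(
    [datetime(int(y), int(m), int(d)) for y, m, d in zip(years, months, days)]
)

# A boolean mask selects every row whose year matches (hypothetical name).
in_1994 = years == 1994
co2_1994 = co2[in_1994]
dates_1994 = dates[in_1994]

print(co2_1994)
print(dates_1994)
```

Applying the same mask to both co2 and dates keeps the two 1994 arrays aligned element by element, which matters when one is plotted against the other.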
Part 2: Basic Algorithm
To develop the running mean code, we will break the problem down into a series of steps and first use simple data for the variable x in the pre-reading. In a new code cell:
● Define x as in Part C of the pre-reading.
x = np.array([1, 5, 3, 7, 9, 4, 6, 9, 7, 4, 10, 6, 2])
● Calculate the length of x and save the integer value in a new variable named n .
● Define a variable named window_length , setting window_length = 3 .
● Add code directly below the definition of window_length to test that the window length is odd-valued and, if it is not, increment the window length by one. Add a print statement that alerts the user to the change in window length. After you've tested your code, set window_length = 3 again for the rest of Part 2.
● Create an array named z with shape (n, ) containing all np.nan values. Later, you will replace the elements in z with values from the running mean calculation.
● Write a single for loop to do the following, looping n times:
● Use variable ii for the loop iteration counter (index).
● For each iteration of the loop, print the loop index, ii and the value of x at index ii . Let x_i = x[ii] .
● Your code output should produce the following:
ii = 0, x_i = 1
ii = 1, x_i = 5
ii = 2, x_i = 3
...
● Add an if block containing a continue statement to skip the current loop iteration and continue to the next if ii == 0 or ii == n - 1 .
The first and last few points in the array need special consideration, which is why we initially set this loop to work for $z_1$ to $z_{n-2}$ only. Eventually the loop will need to compute a value for $z_i$ for all elements in $x$, i.e. for ii = 0, 1, ..., len(x)-1 .
● Inside the for loop, make a new variable named window as follows:
● window should be centered on the point $x_i$, and contain the points to the left and right of x_i such that window has length window_length (i.e. implement Equation 3 from the pre-reading in code).
● For each loop iteration, print the windowed data to the screen and confirm your result is correct. e.g:
ii = 1, window = [1,5,3]
ii = 2, window = [5,3,7]
ii = 3, window = [3,7,9]
...
● Add a line of code to calculate the mean of the values in window for each iteration.
■ Save the mean of window in a new variable named window_mean .
■ Test your code by printing the value of window_mean to the screen, and check that the values of window_mean at ii = 2 and ii = 11 match what you calculated for $z_2$ and $z_{11}$ in the lab pre-reading.
ii = 1, window = [1,5,3], window mean = 3
ii = 2, window = [5,3,7], window mean = 5
...
● Finally, add a line of code to save the value of window_mean for the ii 'th loop iteration into the ii -th element of array z .
● Double-check that the array z is filled (no longer np.nan values, except at the ends where the continue statement happened), and that the final values in z match those in your print statements.
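Put together, the Part 2 steps might look something like the sketch below. It assumes Equation 3 of the pre-reading is the plain centered mean of window_length points, and it introduces a hypothetical helper variable half (the number of points on each side of x_i) that is not in the variables checklist. The print format is illustrative and may not match the expected output character for character.

```python
import numpy as np

x = np.array([1, 5, 3, 7, 9, 4, 6, 9, 7, 4, 10, 6, 2])
n = len(x)

window_length = 3
# An even window has no single centre point, so bump it to the next odd value.
if window_length % 2 == 0:
    window_length += 1
    print(f"window_length was even; increased to {window_length}")

# Hypothetical helper: how many points sit on each side of x_i.
half = window_length // 2

# Start with all NaN so untouched elements are easy to spot later.
z = np.full(n, np.nan)

for ii in range(n):
    if ii == 0 or ii == n - 1:
        continue  # end points are skipped for now and stay NaN

    # Slice stop is exclusive, hence the +1 to include the right neighbour.
    window = x[ii - half : ii + half + 1]
    window_mean = np.mean(window)
    z[ii] = window_mean
    print(f"ii = {ii}, window = {window}, window mean = {window_mean}")

print(z)
```

Note that this only works at every interior point because window_length = 3 needs exactly one neighbour on each side; a longer window reaches past the ends at more indices than the two that are skipped.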
Part 3: Edge Cases
Now make changes to the same code cell from Part 2 to handle the edge-cases at the beginning and end of the time series.
● Comment out the if block that checks if ii == 0 or ii == n - 1 .
● Try setting window_length = 5 and running your code.
Your code should break because the ends of the array need special treatment. The number of points near the end that are affected depends on window_length . Look at the figure and the equations in the pre-reading to understand why. When you have a window_length of 5 and you try to compute the running mean at $z_1$ and $z_{11}$ what points is the running mean equation trying to find? Do they all actually exist?
● Add some if statements to handle these awkward cases at the start and end of the array.
There are multiple approaches: you can compute the running mean using fewer points for example, or use other choices... there is no "correct" answer, they all have different strengths and weaknesses.
● Try window lengths of 3 and 5 again to make sure your code is working for any window_length at the ends of the array and in the middle section.
When you are satisfied that your running mean code is working properly, comment out any print statements inside your for loop so that your code does not produce an unnecessarily long output and clog up the console.
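As one illustration of the "fewer points" option, the sketch below clips the window to the valid index range so that points near the ends are averaged over whatever neighbours exist. It is only one of several reasonable choices, not the required answer; the function name running_mean_truncated and the variables half, start and stop are hypothetical. The clipping matters because a negative start index in a NumPy slice counts from the end of the array, so an unclipped window can silently come back empty or wrong rather than raising an error.

```python
import numpy as np


def running_mean_truncated(x, window_length):
    """Centered running mean that uses fewer points near the array ends."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    # Force an odd window so it can be centred on a single point.
    if window_length % 2 == 0:
        window_length += 1
    half = window_length // 2

    z = np.full(n, np.nan)
    for ii in range(n):
        start = ii - half
        stop = ii + half + 1  # slice stop is exclusive

        # Clip the window wherever it would run off either end of the array.
        if start < 0:
            start = 0
        if stop > n:
            stop = n

        z[ii] = np.mean(x[start:stop])
    return z


x = np.array([1, 5, 3, 7, 9, 4, 6, 9, 7, 4, 10, 6, 2])
z3 = running_mean_truncated(x, 3)
z5 = running_mean_truncated(x, 5)
print(z3)
print(z5)
```

A trade-off of this approach is that the end values are averaged over fewer points and are no longer centred, so they are noisier and slightly biased compared with the interior values.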
Part 4: Running Average of CO2 Data
● In your code cell for Part 2 and Part 3 combined, set the variable x to be equal to the co2_1994 variable containing $CO_2$ measurements for 1994.
● Calculate the 31-day running mean for the co2_1994 variable.
● Make a new figure that includes the original data, co2_1994 , and your running mean, z , as a function of time, dates_1994 , appropriately labeled.
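Before plotting, it can be reassuring to check the loop-based 31-day mean against an independent calculation. The sketch below does this with np.convolve, which is a different technique from the for loop the lab asks for and is used here only as a cross-check. The data are a made-up stand-in for co2_1994 (a slow trend plus a wiggle), and only the interior points are compared, since the two methods treat the ends differently.

```python
import numpy as np

# Made-up stand-in for co2_1994: a slow trend plus a wiggle, 365 points.
t = np.arange(365)
x = 358.0 + 0.005 * t + 0.5 * np.sin(t)

window_length = 31
half = window_length // 2
n = len(x)

# Loop-based running mean, interior points only (ends left as NaN here).
z = np.full(n, np.nan)
for ii in range(half, n - half):
    z[ii] = np.mean(x[ii - half : ii + half + 1])

# Independent check: convolving with equal weights gives the same averages.
# mode="valid" returns only the n - window_length + 1 fully covered points.
kernel = np.ones(window_length) / window_length
z_check = np.convolve(x, kernel, mode="valid")

# True when the loop and the convolution agree to floating-point tolerance.
print(np.allclose(z[half : n - half], z_check))
```

If the two results disagree in your own code, an off-by-one error in the slice bounds of the window is a likely place to look first.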
Submission Instructions
You should submit:
● A single Jupyter notebook ( .ipynb ) with:
● A Markdown cell with a descriptive title header (e.g. EOSC 211 Lab Week 7), and containing your name, student number, and a list of collaborators (if applicable).
● Code and Markdown cells as needed to answer the assignment questions.
● A .pdf file of your Jupyter Notebook which includes any output requested (figures, print statements, etc.), but no other output (i.e. no debugging checks etc).
2025-10-20