ST 114: Project 1 2022
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
ST 114: Project 1
2022
Instructions:
In this project, you will write functions for smoothing time series data. A Python file smoothers.py has been provided for you to complete. Do not change the docstring! You will need to edit the file and submit it to gradescope.
● You may work together, but you must list all collaborators in your smoothers.py file.
● Again, do not change the docstring!
● DO NOT COPY AND PASTE EACH OTHER’S CODE! You’ll learn better if you write your code on your own.
● If you use any resources (online or other), be sure to list them in your smoothers.py file.
Assignment:
1. Complete the function sort data that takes in a tuple data whose first element is a list of x coordinates and whose second element is a list of y coordinates. The function should return a copy of the input tuple sorted in non-decreasing order with respect to the x coordinates of the input tuple.
Below are some example evaluations of the function.
>>> sort data(([5, 1, 7], [1, 2, 3]))
([1, 5, 7], [2, 1, 3])
>>> sort data(([2, 4, 8], [1, 2, 3]))
([2, 4, 8], [1, 2, 3])
>>> sort data(([11, 4, -5], [1, 2, 3]))
([-5, 4, 11], [3, 2, 1])
Hint: Use the numpy function argsort to get the indices that would sort the x coordinates, e.g., the indices that would sort x = [5, 1, 7] in ascending order would be [1, 0, 2] since x[1]
< x[0] < x[2].
2. Complete the function find index that takes in a list x and a number x new and returns the small- est index i such that x[i] < x new < x[i+1]. Assume that i) there are no duplicated values in x. ii) x is sorted in ascending order, and iii) x new is a number between min(x) and max(x), inclusive. Below are some example evaluations of the function when x = [1, 5, 7, 9].
>>> find_index(x, 1)
0
>>> find_index(x, 2)
0
>>> find_index(x, 6)
1
>>> find_index(x, 7)
1
>>> find_index(x, 8)
2
>>> find_index(x, 9)
2
3. Complete the function linear predict that takes in two parameters data and x new.
● data is a tuple of two items. The first item is a list of x coordinates which are numbers, and the second item is a list of y coordinates which are also numbers. Assume that data[0] is sorted in ascending order with no duplicated values.
● x new is a number between min(data[0]) and max(data[0]), inclusive.
● The function returns the linearly interpolated value y new at the point x new:
yi+1 x yi
xi+1 x xi
where xi = data[0][i], yi = data[1][i] and i is the smallest index such that xi < x new < xi+1. For example, suppose data = ([0, 5, 10],[1, 7, -5]) and x new = 2. Then linear predict(data, x new) produces the value y new = 3.4. In the plot below, the blue solid circles are the points (data[0][i], data[1][i]) for i = 0, 1, 2. The red solid circle is the point (x new, y new) = (2, 3.4).
As another example, suppose data = ([0, 5, 10], [1, 7, -5]) and x new = 9. Then linear predict(data, x new) produces the value y new = -2.6. In the corresponding plot, the blue solid circles are the points (data[0][i], data[1][i]) for i = 0, 1, 2. The red solid circle is the point (x new, y new) = (9, -2.6).
4. Complete the function linear interpolate that takes in two parameters data and x new list.
● data is a tuple of two items. The first item is a list of x coordinates which are numbers, and the second is a list of y coordinates which are numbers. Assume that data[0] is sorted in ascending order.
● x new list is a list whose values are numbers betweenmin(data[0]) and max(data[0]) inclusive.
● The function returns a list whose items are the linearly interpolated values at the points in x new list.
For example, suppose data = ([0, 5, 10], [1, 7, -5])and x new list = [2, 9]. Then linear interpolate(data, x new list) should return the list [3.4, -2.6]. The red solid circles are the interpolated data points (2, 3.4) and (9, -2.6).
5. Complete the function kernel predict that takes in three parameters data, x new, and h.
● data is a tuple of two items. The first item is a list of x coordinates which are numbers, and the second is a list of y coordinates which are numbers. Assume that data[0] is sorted in ascending order.
● x new is a number between min(data[0]) and max(data[0]) inclusive.
● h is a positive number.
● The function returns the following predicted value y new at the point x new:
n_1
ynew = 匕 wiyi
i=0
where n = len(data[0]), xi = data[0][i], yi = data[1][i]. The numbers wi are nonnegative weights that add up to 1. Thus, y new is a weighted average of the y i. To get the weights, you will use a Gaussian kernel function Kh(u, v) to compute weights where:
Kh(u, v) = exp ╱ x 、
Here, the weight wi is given by
Kh(xnew, xi)
j=0 Kh(xnew, xj)
For example, suppose data = ([0, 5, 10], [1, 7, -5]), x new = 2, and h = 6. Then kernel predict(data, x new, h) requires computing
K6(2, 0) = exp ╱ x 、 = 0.7165313
K6(2, 5) = exp ╱ x 、 = 0.4723666
K6(2, 10) = exp ╱ x 、 = 0.0048279
to compute the weights
w0 = w1 =
w2 =
K6(2, 0) |
K6(2, 0) + K6(2, 5) + K6(2, 10) K6(2, 5) |
K6(2, 0) + K6(2, 5) + K6(2, 10) K6(2, 10) |
K6(2, 0) + K6(2, 5) + K6(2, 10) |
s 0.6002478
s 0.3957077
s 0.0040444
which are then used to compute a weighted average
ynew = w0 · 1 + w1 · 7 + w2 · x5 s 3.3499799.
Note: This weighted average is called the Nadaraya-Watson estimator.
6. Complete the function kernel smooth that takes in three parameters: data, x new list, and h.
● data is a tuple of two items. The first item is a list of x coordinates which are numbers, and the second is a list of y coordinates which are numbers. Assume that data[0] is sorted in ascending order.
● x new list is a list whose values are numbers betweenmin(data[0]) and max(data[0]) inclusive
● h is a positive number.
● The function returns a list whose items are the kernel predicted values at the points in x new list
For example, suppose data = ([0, 5, 10], [1, 7, -5]), x new list is a list of 30 equally spaced points between 0 and 10, and h = 6. In the plot below, the red solid circles are the kernel predicted values at those 30 equally spaced points using bandwidth parameter h = 6.
7. Complete the function knn predict that takes in three parameters: data, x new, and k.
● data is a tuple of two items. The first item is a list of x coordinates which are numbers, and the second item is a list of y coordinates, which are also numbers. Assume that data[0] is sorted in ascending order with no duplicated values.
● x new is a number between min(data[0]) and max(data[0]) inclusive.
● k is a positive integer.
● The function returns the following predicted value y new at the point x new:
ynew = 匕 yi ,
ieN怎(xnew)
whereyi = data[1][i]. The set Nk(xnew) is the set of indices of the k items in the x coordi- nates data[0] that are closest to x new. Thus, y new is the average of the y values of the k nearest neighbors to x new. The acronym knn stands for k-nearest neighbors.
For example, suppose data = ([0, 5, 10, 15], [1, 7, -5, 11]). Let’s see what Nk(xnew) and ynew are for a few different choices for x new and k.
● If x new = 2 and k = 2 then
N2(2) = [0, 1] and ynew = = 4.
● If x new = 2 and k = 3 then
N3(2) = [0, 1, 2] and ynew = = 1.
● If x new = 8 and k = 2 then
N2(8) = [1, 2] and ynew = = 1.
● If x new = 8 and k = 3 then
N3(8) = [1, 2, 3] and ynew = s 4.33333.
8. Complete the function knn smooth that takes in three parameters: data, x new list, and k.
● data is a tuple of two items. The first item is a list of x coordinates which are numbers, and the second item is a list of y coordinates which are also numbers. Assume that data[0] is sorted in ascending order with no duplicated values.
● x new list is a list whose values are numbers betweenmin(data[0]) and max(data[0]) inclusive.
● k is a positive integer.
● The function returns a list whose items are the knn predicted values at the points in x new list using number of neighbors parameter k.
For example, suppose data = ([0, 5, 10, 15], [1, 7, -5, 11]). In the plot below, the blue solid circles are the points (data[0], data[1]). The red solid circles are the knn predicted values at 30 equally spaced points between 0 and 10 using a neighbor parameter of k = 2.
9. Let x1 , . . . , xn be n numbers stored in the list x. The median of x, denoted median(x), is the “middle value” of the numbers x in the sense that half of the numbers in x are less than the median and the other half of the numbers in x are greater than the median. Let y denote a copy of x sorted in
ascending order. Then
median(x) = ,
y │ x 1┐ y [ 垃(ì) _1]+y [ 垃(ì)]
2
if n is odd.
if n is even.
Complete the function my median with the following specifications:
● It takes in one parameter, x, which is a list of numbers.
● It returns median(x) as a float.
● Do not alter x. Evaluating your function my median(x) should not change the value of the list x.
● Do not use Python’s built-in median function. You will get zero credit on this problem if you do.
10. Complete the function median filter that takes in two parameters, y and W.
● y is a list of y coordinates which are numbers.
● W is a positive integer.
● The function returns a list where the ith element in y is amapped to a median value, as a float, over a sliding window containing y[i] specifically.
,.median(y[i-W], . . . , y[i-1], y[i], y[i+1], . . . , y[i+W]) y[i] 1- .median(y[0],y[1], . . . , y[i-1], y[i], y[i+1], . . . , y[i+W])
..median(y[i-W], . . . , y[i-1], y[i], y[i+1], . . . , y[n-2], y[n-1])
if i - W > 0and i + W < len(y) if i - W < 0and i + W < len(y) if i - W > 0and i + W > len(y)
The basic idea of the median filter is that it replaces the ith point with the median of the points in a window of size 2 ·W around the ith point. (Note: The window has 2 ·W items in the first case above: the window is smaller for the second and third cases.) An example function call on an input list of 7 items is given below.
>>> y = [-1, 6, 7, -2, 0, 8, 13]
>>> z = median_filter(y, 1)
>>> z
[2.5, 6.0, 6.0, 0.0, 0.0, 8.0, 10.5]
Below is the assignments that median filter(y, 1) makes to each slot in z.
z[0] = median([-1, 6])
z[1] = median([-1, 6, 7])
z[2] = median([6, 7, -2])
z[3] = median([7, -2, 0])
z[4] = median([-2, 0, 8])
z[5] = median([0, 8, 13])
z[6] = median([8, 13])
The plot below shows a longer list y of 50 elements (daily Air Quality Index measurements) plotted against index values 0, 1, 2, . . . , 49 as blue solid circles. The plot also shows the items in the list median filter(y, 1) plotted against the index values 0, 1, 2, . . . , 49 as semitransparent brown solid circles. We can see that the median filter “removes” unusual or outlying observations.
The plot below shows the same list y of 50 elements (daily Air Quality Index measurements) plotted against index values 0, 1, 2, . . . , 49 as blue solid circles. The plot also shows the items in the list median filter(y, 6) plotted against the index values 0, 1, 2, . . . , 49 as semitransparent brown solid circles. We can see that the broader median filter smooths the data to a much greater extent.
Note: Air Quality Index data from https://archive.ics.uci.edu/ml/datasets/Air+ Quality
11. In this problem, you’ll put everything together! Complete the function smoother that takes in two parameters data and x new list that the user must specify and five default parameters: method, filter flag, W, h, and k.
● data is a tuple of two items. The first item is a list of x coordinates which are numbers, and the second item is a list of y coordinates which are also numbers. Assume that data[0] has no duplicated values. Do not assume that the tuple data is sorted in non-decreasing order with respect to the x coordinates of the input tuple data[0].
● x new list is a list whose values are numbers betweenmin(data[0]) and max(data[0]) inclusive.
● method is a string that is ’interpolate’ , ’kernel’ , or ’knn’ . The default value of method is ’interpolate’ .
● filter flag is a Boolean with a default value of False.
● W is a positive integer with a default value of 5.
● h is a positive number with a default value of 1.
● k is a positive integer with a default value of 1.
● If method has a value other than ’interpolate’ , ’kernel’ , or ’knn’ , the function prints the string ’Unsupported method.’ and returns None
● If method is ’interpolate’ and filter flag is False, the function returns the list of linearly interpolated values over the points x new list.
● If method is ’interpolate’ and filter flag is True, the function returns the list of linearly interpolated values over the points x new list after applying median filter on the y values in data with window parameter W.
● If method is ’kernel’ and filter flag is False, the function returns the list of kernel smoothed values over the points x new list with bandwidth parameter h.
● If method is ’kernel’ and filter flag is True, the function returns the list of ker- nel smoothed values over the points x new list with bandwidth parameter h after applying median filter on the y values in data with window parameter W.
● If method is ’knn’ and filter flag is False, the function returns the list of knn smoothed values over the points x new list with nearest neighbor parameter k.
● If method is ’knn’ and filter flag is True, the function returns the list of knn smoothed
values over the points x new list with nearest neighbor parameter k after applying median filter on the y values in data with window parameter W.
12. The variable AQI data is a tuple whose first element is a list of x coordinates and whose second el- ement is a list of y coordinates. The y coordinates are the Air Quality Index values (CO concentration per hour) per day in an Italian city. The x coordinates are days, and, in order, they take the values 1-50. Create plots under the specified scenarios below using an x new list of 100 equally spaced points between 1 and 50. For all scenarios, plot the original data points in AQI data as black squares with markers of size 5.
Scenarios:
(a) Without median filtering.
● Linear interpolation (green circles with dashed line)
● A kernel smoother with h = 0.1, 1, and 10. Overlay all three kernel smoothers on the same plot. Use a solid blue line for h = 0.1, solid green line for h = 1, and solid magenta line for h = 10.
● A knn smoother with k = 1, 5, and 15. Overlay all three knn smoothers on the same plot.
Use a solid blue line for k = 1, solid green line for k = 5, and solid magenta line for k = 15. (b) With median filtering (W = 5).
● Linear interpolation (green circles with dashed line)
● A kernel smoother with h = 0.1, 1, and 10. Overlay all three kernel smoothers on the same plot. Use a solid blue line for h = 0.1, solid green line for h = 1, and solid magenta line for h = 10.
● A knn smoother with k = 1, 5, and 15. Overlay all three knn smoothers on the same plot. Use a solid blue line for k = 1, solid green line for k = 5, and solid magenta line for k = 15.
13. Write 5-8 sentences in a block comment at the bottom of your code that reflect on the project you have completed. Here are some questions to consider in your answer:
● What did you learn that was particularly interesting?
● What result did you get that was surprising to you?
● Which of the smoothers do you like the most? the least?
● What are you most proud of yourself for doing in this project?
2022-03-30