Homework 1 - Estimating Means from Samples

IRE2004

Figure 1: The Billy Bishop Toronto City Airport (YTZ)

Introduction

This assignment applies basic inferential statistics in R. You learned these statistical techniques in IRE1002 or a similar course: computing means and variances of a sample, calculating the confidence interval around a mean estimate, and analyzing the relationship between sample size and confidence intervals. It may be helpful to refer to your notes or the textbook from that course.

You will submit your work as an R script (a file ending in .R) on Quercus. A template script file is provided on Quercus: [studentno]_[lastname]_[firstinitial]_hw1.R. Open this file in RStudio; it will appear in your ‘Source’ pane. Fill in your answers to the exercises in the assigned sections. Then save the file, using your own student number and name, and submit that file as your completed homework.

1 Computing the mean and variance of a sample

The following seven temperature readings were taken from the Billy Bishop Toronto City Airport weather station at 1 P.M. for the first seven days of September 2018: 20, 22, 23, 23, 24, 22, and 20 degrees C.

For the following exercises, you may use only the following operators and functions in R:

● the assignment operator to save things to memory: = or <-

● math operators like +, -, *, /, and ^

● these functions: c() and sum()

1.1 Calculate the mean by hand.

Use = (or <-) and c() to save the temperature readings in a vector named temps. Then compute the sample mean by hand. (Don’t use R’s built-in mean() function.) The sample mean (\( \bar{x} \)) is the sum of all observations divided by the count of observations:

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
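As a small illustration of the mean calculation, here is a sketch on a made-up vector x (hypothetical values, not the assignment data). It uses only the allowed operators and functions, and the count of observations is typed in by hand.

```r
# Hypothetical readings, not the assignment data
x = c(2, 4, 6, 8)

# Sum of all observations divided by the count of observations
# (there are 4 values, counted by hand)
x_mean = sum(x) / 4
x_mean
```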

1.2 Calculate the sample variance of temps by hand.

The sample variance is the sum of squared differences between each observation and the mean (calculated above), divided by the total number of observations minus 1:

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]

Step 1. Store the sample mean in a new variable temps_mean

Step 2. Calculate a vector of differences (\( x_i - \bar{x} \)) and store in temps_diffs

Hint: if you tell R to subtract a number from a vector, it will subtract that number from each element of the vector.

Step 3. Calculate the vector of squared differences, store in temps_diffs_squared

Hint: if you tell R to square a vector, it will square each element of that vector.

Step 4. Calculate the sum of squared differences, store in temps_ssd

Step 5. Divide the sum of squared differences by (\( n - 1 \)) to obtain the sample variance estimate. Store in temps_var_byhand

What is the sample variance of the week of temperature observations?
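The two hints in the steps above describe R's element-wise arithmetic. The sketch below shows that behaviour on a made-up vector x with an arbitrary number (2) subtracted from it; these values are hypothetical and are not the assignment data.

```r
x = c(1, 2, 3, 4)          # hypothetical values

x_shifted = x - 2          # subtracts 2 from every element: -1 0 1 2
x_squared = x_shifted^2    # squares every element: 1 0 1 4
x_total = sum(x_squared)   # adds the elements into one number: 6
```

Each line stores its result in a new variable, so you can print any intermediate vector in the console to see what a step did.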

1.3 Calculate sample variance by hand in one line of code.

You can do all five steps above in just one line of R code. Write one line of R code that calculates the sample variance using only temps, sum(), and the operators listed above. (Hint: use parentheses to control the order of operations.)

Thankfully, we don’t always have to compute these sample statistics by hand. R has built-in functions mean() and var(). You can check your work above using R’s built-in functions.
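One way to do that check is sketched below on a made-up vector x (hypothetical values), with the by-hand results typed in as if they had been worked out on paper. Using all.equal() rather than == is a choice made here, not a requirement of the assignment: it tolerates the tiny rounding differences that can appear in decimal arithmetic.

```r
x = c(2, 4, 6, 8)          # hypothetical sample

x_mean_byhand = 5          # worked out on paper: 20 / 4
x_var_byhand = 20 / 3      # worked out on paper: (9 + 1 + 1 + 9) / 3

# Compare the by-hand values with R's built-in functions
mean_matches = isTRUE(all.equal(x_mean_byhand, mean(x)))
var_matches = isTRUE(all.equal(x_var_byhand, var(x)))
mean_matches
var_matches
```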

2 The precision of mean estimates: confidence intervals

How warm was it on average at Billy Bishop Airport in September 2018? One way to answer this question is to draw a random sample of temperature readings from that month and calculate the mean. However, the precision of that estimate depends on the size of your sample: the bigger your sample, the more precise your mean estimate will be.

Here we work with a random sample of temperature readings from that month and estimate both the mean temperature and the uncertainty of our estimate. One way to describe that uncertainty is with a confidence interval.

2.1 Read September temperatures sample into R

The file sample_temps_sep.csv on Quercus contains a random sample of all hourly temperature readings from the month of September 2018. You can read this data into R by placing the file in your working directory and using the read_csv() function, which is part of the package tidyverse. The Tidyverse supplies many of the tools we will use in this course. You should install this package on your system. You only need to install it once. In the future, you can load it into memory using library().

install.packages("tidyverse") # You only need to do this once, ever

library(tidyverse) # Do this every new R session

After executing the above, use read_csv() to read the CSV into a variable named sep. Note the underscore (not period) in read_csv().

2.2 Summarize the temperature data: sample size, mean, and variance

Type sep into the console to see a summary of the data. There are two columns in this dataset:

● sep$datetime - the date and time of the temperature reading

● sep$temp - the temperature reading in Celsius

You can access columns in a dataset using the $ operator, as shown above. Store the temperature variable from the September sample in a new vector: temps_sep. Then compute three things:

● Sample size: How many observations are in the data?

● Sample mean

● Sample variance
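To illustrate the $ operator, the sketch below builds a tiny made-up dataset named toy with the same two column names as described above (the values are hypothetical). It uses a base R data frame so that it runs without any packages; the assumption is that a dataset read with read_csv() responds to $ in the same way.

```r
# Hypothetical toy dataset; not the contents of the course CSV file
toy = data.frame(
  datetime = as.POSIXct(
    c("2018-09-01 13:00", "2018-09-02 13:00", "2018-09-03 13:00"),
    tz = "UTC"
  ),
  temp = c(20, 22, 23)
)

toy_temps = toy$temp   # pull one column out as a plain numeric vector

length(toy_temps)      # sample size: number of observations
mean(toy_temps)        # sample mean
var(toy_temps)         # sample variance
```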

2.3 Estimating uncertainty: the confidence interval around your mean estimate

In the previous step, you estimated the mean temperature in September 2018, based on this sample. How precise is that estimate? This problem walks you through a confidence interval calculation using the t distribution.

Step 1. Compute the standard error of the mean estimate.

Above we calculated both the mean (\( \bar{x} \)) and variance (\( s^2 \)) of a sample. The standard error of a mean estimate is given by the following formula:

\[ s_{\bar{x}} = \sqrt{\frac{s^2}{n}} \]

where \( s^2 \) is the sample variance and \( n \) is the number of observations in the sample.

Compute the standard error of the mean estimate of sample temperatures in September, store in se_sep. (You can compute square roots in R using the function sqrt().)
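As a rough sketch of the same calculation on a made-up vector x (hypothetical values, not the September sample), the standard error is the square root of the sample variance divided by the sample size:

```r
x = c(2, 4, 6, 8)                  # hypothetical sample

# Square root of (sample variance / number of observations)
x_se = sqrt(var(x) / length(x))
x_se
```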

Step 2. Combine the standard error and the t distribution to compute the 95% confidence interval.

We will generally use the Student t distribution to put confidence intervals around mean estimates. You can look up the upper and lower bounds of the t distribution using the qt() function.

qt(0.975, df = length(temps_sep) - 1)

## [1] 2.009575

This shows us how far we need to move to the right of the center of the t distribution, measured in standard errors, until only 2.5% of the probability density remains to the right. If we set this as the bound in both directions, there is exactly 2.5% on the left + 2.5% on the right = 5% of probability outside our range. The remaining probability between these bounds will be 95%. These bounds define our 95% confidence interval.
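You can confirm this reasoning numerically with pt(), which returns the probability to the left of a given value of the t distribution. The sketch below uses an arbitrary 10 degrees of freedom (a hypothetical choice, not the value for the September sample).

```r
dof = 10                         # hypothetical degrees of freedom

upper = qt(0.975, df = dof)      # 2.5% of probability lies to the right of this value
lower = qt(0.025, df = dof)      # 2.5% of probability lies to the left of this value

# The t distribution is symmetric around zero, so lower is the negative of upper
is_symmetric = isTRUE(all.equal(lower, -upper))

# Probability between the two bounds
prob_between = pt(upper, df = dof) - pt(lower, df = dof)
prob_between
```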

We multiply the standard error of the mean computed in Step 1 (\( s_{\bar{x}} \)) by 2.0095752 to obtain the distance from the sample mean to the upper and lower bounds of the 95% confidence interval. Calculate this distance and store it in error_sep.

The upper bound of the 95% confidence interval is the sample mean plus the distance calculated above. The lower bound is the sample mean minus this. What are the bounds of the 95% confidence interval of our estimate of the mean temperature in September 2018?

Again, you can check your work by comparing the confidence interval computed above to the one reported by one of R’s built-in functions: t.test(temps_sep)
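The comparison can also be done in code. The sketch below works on a made-up vector x (hypothetical values, not the September sample) and pulls the interval out of the t.test() result through its conf.int element. The variable names are illustrative only.

```r
x = c(2, 4, 6, 8)                               # hypothetical sample

x_mean = mean(x)
x_se = sqrt(var(x) / length(x))                 # standard error of the mean
x_error = qt(0.975, df = length(x) - 1) * x_se  # distance from mean to each bound

ci_byhand = c(x_mean - x_error, x_mean + x_error)

# t.test() stores its confidence interval in the conf.int element;
# as.numeric() drops the attached confidence-level attribute
ci_builtin = as.numeric(t.test(x)$conf.int)

ci_matches = isTRUE(all.equal(ci_byhand, ci_builtin))
ci_matches
```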

3 Increasing sample size to increase precision of mean estimates

One way to generate a more precise estimate of the mean temperature in September 2018 would be to increase the size of our random sample. In this problem, we set a precision goal and draw a new sample that will generate a mean estimate of the desired precision.

3.1 Calculate the needed sample size

What sample size would we need to reduce the confidence interval to ±0.5 degrees Celsius? Using the equation defining the confidence interval above, work backwards to calculate the sample size n. Assume that:

● As you increase sample size, the sample variance (\( s^2 \)) remains the same as you calculated in the previous problem.

● Use the same approximation of the 95% bounds of the t distribution as calculated above using qt().

You will need to do some algebra to rearrange the confidence interval equation to solve first for a target standard error \( s_{\bar{x}} \), and then use this to solve for a target sample size (\( n \)).
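One possible route through that algebra is sketched here. The symbols \( E \) (the target distance from the mean to each bound) and \( t^{*} \) (the value returned by qt()) are notation introduced only for this sketch, and \( s_{\bar{x}} \) denotes the standard error of the mean.

\[
\begin{aligned}
E &= t^{*} \, s_{\bar{x}} \quad\Longrightarrow\quad s_{\bar{x}} = \frac{E}{t^{*}} \\
s_{\bar{x}} &= \sqrt{\frac{s^2}{n}} \quad\Longrightarrow\quad n = \frac{s^2}{s_{\bar{x}}^{2}}
\end{aligned}
\]

Because a sample size must be a whole number, round the resulting \( n \) up rather than down so that the target is still met.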

3.2 Draw and analyze a new random sample

Draw a sample of that size from the population of September temperature measurements. The file all_temps_sep.csv on Quercus contains all the temperature measurements from the Billy Bishop monitoring station from Sep 2018. Read it using read_csv() as shown below. Then use sample() to draw a random sample of those measurements, replacing ??? below with your target sample size for obtaining the new confidence interval, calculated in the previous step.

all_temps_sep = read_csv("all_temps_sep.csv")

temps_sep_big = sample(all_temps_sep$temp, size = ???)

After drawing your random sample, calculate the sample mean, standard error of the mean, and confidence interval by hand. You can follow similar steps to those in the previous problem. Compare your confidence interval to that generated by t.test() to confirm you did it correctly. (Note that because the code above draws a random sample, your results will vary each time you run your code. Each classmate will also get slightly different results.)

3.3 Did you hit your confidence interval target?

The goal was to generate a mean estimate that had a confidence interval of ±0.50 degrees Celsius. Did you achieve this level of precision? If not, please explain the possible reasons here. (There are good reasons you might not hit your goal precision.)

Notes

The notes section contains additional information about the problem set.

Controlling random samples

Computers generally draw samples that are only pseudo-random. They use a “seed” number that varies over time (e.g. the date-time) and then feed that seed to an algorithm that produces an as-good-as-random result.

This allows us to control a random sample to produce the same result every time, which is useful if you don’t want your sample to change every time you run your code (for example, while working through Problem 3). To ensure that your call to sample() produces the same sample every time, use set.seed():

set.seed([any number here])

myvec = sample(...)
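A minimal sketch of this behaviour follows. The population vector, the seed value 2018, and the sample size of 5 are all arbitrary choices made for illustration.

```r
population = 1:100            # hypothetical population to sample from

set.seed(2018)                # any fixed number works; 2018 is arbitrary
draw_1 = sample(population, size = 5)

set.seed(2018)                # resetting the same seed...
draw_2 = sample(population, size = 5)

identical(draw_1, draw_2)     # ...reproduces exactly the same sample: TRUE
```

Note that the seed has to be set again immediately before each call to sample() that you want to repeat; calling sample() twice after a single set.seed() will generally give two different draws.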

Weather data

The weather data in this problem set was obtained from the Billy Bishop weather station using the R package riem: https://ropensci.github.io/riem/articles/riem_package.html. The following code shows how the data for this problem set was obtained, cleaned, and sampled.

install.packages("riem")

library(riem)

library(tidyverse)

weathernet = riem_networks()

View(weathernet) # look for weather networks in Canada

on.stations = riem_stations("CA_ON_ASOS") # Ontario network

View(on.stations) # look for Toronto weather stations

# get Billy Bishop Airport (YTZ) 2018 weather data (obtained 12/22/2018)