STATS 769

TERM TEST - SEMESTER 2, 2019

STATISTICS

Data Science Practice

1. [10 marks]

This question makes use of the R data frame trips, which is shown below.

> trips
      type duration distance hour day month year
1  scooter      187      308   18   0     7 2018
2  scooter      822     1828   20   4     7 2018
3  scooter      221      646   23   0     7 2018
4  scooter      299      626   20   0     7 2018
5  scooter      636     2612   11   0     7 2018
6  scooter      283      278   13   1     7 2018
7  scooter     2213     3351   16   0     7 2018
8  bicycle     2276     5601   15   6     7 2018
9  scooter      349      565   19   1     7 2018
10 scooter      758     1373   16   6     7 2018

(a) Write an R function, testError(), to perform the following steps:

(i) Randomly select one row of the data frame trips to act as a test set. The remainder of the data frame (nine rows) will act as a training set.

(ii) Fit a simple linear regression to predict duration from distance using the training set.

(iii) Use the fitted model to predict duration for the test set.

(iv) Calculate (and return) the squared difference between the prediction and the real duration in the test set.

Your function would be used like this:

> testError()

[1] 22399.27

[7 marks]

(b) Explain what the following R code is doing.

> sqrt(mean(sapply(1:100, function(i) testError())))

[3 marks]

2. [10 marks]

(a) Explain what the following shell code is doing and write down the result of running the code.

head -1 trips.csv > subset.csv
grep scooter trips.csv >> subset.csv
wc -l subset.csv

The contents of the CSV file "trips.csv" are shown below.

"type","duration","distance","hour","day","month","year"
"scooter",187,308,18,0,7,2018
"scooter",822,1828,20,4,7,2018
"scooter",221,646,23,0,7,2018
"scooter",299,626,20,0,7,2018
"scooter",636,2612,11,0,7,2018
"scooter",283,278,13,1,7,2018
"scooter",2213,3351,16,0,7,2018
"bicycle",2276,5601,15,6,7,2018
"scooter",349,565,19,1,7,2018
"scooter",758,1373,16,6,7,2018

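The shell commands in part (a) can be tried out with a sketch like the one below. It assumes that trips.csv holds one record per line (a header line followed by ten data lines). The scratch directory and the variable name line_count are illustrative additions, not part of the question.

```shell
# Work in a scratch directory so that no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate trips.csv, assuming one record per line.
cat > trips.csv <<'EOF'
"type","duration","distance","hour","day","month","year"
"scooter",187,308,18,0,7,2018
"scooter",822,1828,20,4,7,2018
"scooter",221,646,23,0,7,2018
"scooter",299,626,20,0,7,2018
"scooter",636,2612,11,0,7,2018
"scooter",283,278,13,1,7,2018
"scooter",2213,3351,16,0,7,2018
"bicycle",2276,5601,15,6,7,2018
"scooter",349,565,19,1,7,2018
"scooter",758,1373,16,6,7,2018
EOF

# '>' creates (or truncates) subset.csv and writes the first line of trips.csv.
head -1 trips.csv > subset.csv

# '>>' appends every line of trips.csv that contains the text "scooter".
grep scooter trips.csv >> subset.csv

# Count the lines; reading from stdin keeps the file name out of the output,
# and tr removes the padding that some wc implementations add.
line_count=$(wc -l < subset.csv | tr -d ' ')
echo "$line_count"
```

Reading the file through standard input is only a convenience for capturing the count in a variable; the command in the question, wc -l subset.csv, prints the same count followed by the file name.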
[5 marks]

(b) Explain the meaning of the following Makefile. What is the purpose of each line of code?

report.html: report.Rmd
	Rscript -e "rmarkdown::render(\"report.Rmd\")"

Describe the result of running the following shell code (assuming that the Makefile shown above is in the current directory and there is also a file report.Rmd in the current directory).

touch report.Rmd
make

The content of the file report.Rmd is shown below.

# A report

```{r}

mean(read.csv("trips.csv")$distance)

```

[5 marks]
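The timestamp comparison that make applies to a rule such as the one in the Makefile question can be sketched in plain shell, without running make or R. This is only an illustration under stated assumptions: the function needs_rebuild is hypothetical, the files are empty stand-ins created in a scratch directory, and the -nt file test is a widely supported extension (bash, dash, ksh) rather than part of POSIX.

```shell
# Work in a scratch directory with empty stand-in files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Give the prerequisite an old timestamp and the target a later one,
# so the target starts out up to date.
touch -t 201901010000 report.Rmd
touch -t 201901020000 report.html

# Hypothetical helper: succeeds when the target ($1) is missing
# or its prerequisite ($2) has been modified more recently.
needs_rebuild() {
    [ ! -e "$1" ] || [ "$2" -nt "$1" ]
}

if needs_rebuild report.html report.Rmd; then
    before="rebuild"
else
    before="up-to-date"
fi

# touch sets the modification time of report.Rmd to the current time,
# which makes the prerequisite newer than the target.
touch report.Rmd

if needs_rebuild report.html report.Rmd; then
    after="rebuild"
else
    after="up-to-date"
fi

echo "before touch: $before"
echo "after touch:  $after"
```

In a real run, make performs this check itself and, when the target is out of date, executes the recipe line of the rule.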

3. [10 marks]

(a) Explain the meaning of the following XQuery expression. What is the purpose of each line of code?

{