STATS 769: Advanced Data Science Practice

TERM TEST - SEMESTER 2, 2021

INSTRUCTIONS

This assessment is open book; you are permitted to access your course manuals and other written material, including online resources.

Calculators are permitted.

Submit your answers on Canvas, ideally as a plain text file or an R Markdown document (but a Word document or a PDF file will also be accepted).

It is your responsibility to ensure your assessment is successfully submitted on time.

We STRONGLY recommend you download your submitted document from Canvas, after submitting it, to verify you have uploaded the correct document.

Attempt ALL questions.

Total marks are 40.

Support

If you have any concerns regarding your assessment, please call the Contact Centre for advice, rather than your instructors.

The Contact Centre can be reached on these numbers:

Auckland: 09 373 7513

Outside Auckland: 0800 61 62 63

International: +64 9 373 7513

For any Canvas issues, please use 24/7 help on Canvas by chat or phone.

If any corrections are announced during the assessment, you will be notified by a Canvas Announcement. Please ensure your notifications are turned on.

Question Interpretation:

Please note that during the assessment period you cannot contact your instructors for clarification on how to interpret the wording of any specific questions or to verify that your answer is correct.

Interpreting wording and making appropriate assumptions is part of what is being assessed. You will need to interpret the question yourself and check your own answers.

If you believe there is a typo, first re-read the question to check you have not misunderstood the question, as it is very common for students to misread questions. If you still believe there is a typo, please phone the Contact Centre.

Academic Honesty Declaration:

By completing this assessment, I agree to the following declaration:

I understand the University expects all students to complete coursework with integrity and honesty. I promise to complete all online assessment with the same academic integrity standards and values. Any identified form of poor academic practice or academic misconduct will be followed up and may result in disciplinary action.

As a member of the University's student body, I will complete this assessment in a fair, honest, responsible and trustworthy manner. This means that:

I declare that this assessment is my own work.

I will not seek out any unauthorised help in completing this assessment.

I am aware the University of Auckland may use plagiarism detection tools to check my content.

I will not discuss the content of the assessment with anyone else in any form, including Canvas, Piazza, Facebook, Twitter or any other social media or online platform within the assessment period.

I will not reproduce the content of this assessment anywhere in any form at any time.

I declare that I generated the calculations and data in this assessment independently, using only the tools and resources defined for use in this assessment.

I will not share or distribute any tools or resources I developed for completing this assessment.

NOTES:

Important information within the questions below (such as the number of marks for each question and the specific tasks that you must perform) is formatted like this (in bold, with a light grey background).

You have 1 hour and 15 minutes to complete the test; there is an additional 15 minutes to cater for the online delivery mode.

You are NOT expected to run any code for this test - you do not have all of the data for any questions and you will not be able to cut-and-paste code or data from this page.

Questions

1. 10 marks

This question relates to a CSV file called test-data.csv that contains counts of traffic at 15-minute intervals at different locations around New Zealand. The first few lines of the file are shown below. The file contains thousands of lines like these.

03-SEP-2016,02010015,L,1,904
30-AUG-2016,01N00988,L,1,143
04-JUL-2016,01N00190,H,1,0
11-AUG-2016,00600122,L,1,82
09-JUL-2016,08500161,H,1,0.5
08-SEP-2016,09400240,H,1,0

This file contains similar information to the files that we have been using in labs for this course, but it is NOT exactly the same as any of the files that we have used so far.

This question also relates to the following shell command:

awk -F, -e '$3 == "H" { print $1","$2","$5 }' test-data.csv

Explain what the -F, part of the shell command is for.

Explain what the $1","$2","$5 part of the shell command is for.

Write down the first few lines of output from this code.

Write a single shell command (based on this shell command) that would output the number of lines that are produced by this shell command.

The output of your code would be a single number.

Write a single shell command (based on this shell command) that would take the output of this shell command and sort the lines in descending order by the last field (where a comma is used as the field separator) AND display just the first six lines of that result.
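The commands in this question can be tried on the six sample lines shown earlier. The sketch below is one possible approach, not a model answer. It assumes the sample is saved as test-data.csv in a temporary directory, it drops the gawk-specific -e flag so that other awk implementations accept the command, and it assumes the last field should be compared numerically. The variable names (tmp, filtered, count, top) are illustrative only.

```shell
# Recreate the six sample lines in a scratch directory (location is an assumption).
tmp=$(mktemp -d)
cat > "$tmp/test-data.csv" <<'EOF'
03-SEP-2016,02010015,L,1,904
30-AUG-2016,01N00988,L,1,143
04-JUL-2016,01N00190,H,1,0
11-AUG-2016,00600122,L,1,82
09-JUL-2016,08500161,H,1,0.5
08-SEP-2016,09400240,H,1,0
EOF

# The command from the question (without -e): split on commas, keep rows whose
# third field is "H", and print fields 1, 2 and 5 joined by commas.
filtered=$(awk -F, '$3 == "H" { print $1","$2","$5 }' "$tmp/test-data.csv")

# Count the output lines by piping the same command into wc -l.
count=$(awk -F, '$3 == "H" { print $1","$2","$5 }' "$tmp/test-data.csv" | wc -l)

# Sort on the third (last) output field, numerically and in reverse,
# then keep at most the first six lines.
top=$(awk -F, '$3 == "H" { print $1","$2","$5 }' "$tmp/test-data.csv" |
      sort -t, -k3,3nr | head -n 6)

printf '%s\n' "$filtered"
printf '%s\n' "$count"
printf '%s\n' "$top"
```

With only six sample lines, head -n 6 does not truncate anything; on the full file it would limit the result to six lines.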

2. 10 marks

This question relates to a file called test-data.json that contains the same data as test-data.csv, but in a JSON format. The first few lines of the file are shown below (there are thousands of lines in the complete file).

{
    "source": [
        "Waka Kotahi"
    ],
    "days": [
        {
            "Date": "03-SEP-2016",
            "Site": "02010015",
            "Class": "L",
            "Direction": 1,
            "Count": 904
        },
        {
            "Date": "30-AUG-2016",
            "Site": "01N00988",
            "Class": "L",
            "Direction": 1,
            "Count": 143
        },
        {
            "Date": "04-JUL-2016",
            "Site": "01N00190",
            "Class": "H",
            "Direction": 1,
            "Count": 0
        },

Write R code to read the file test-data.json into R and create a data frame called testJson with five columns.

Your data frame would look like the output below.

head(testJson)

         Date     Site Class Direction Count
1 03-SEP-2016 02010015     L         1 904.0
2 30-AUG-2016 01N00988     L         1 143.0
3 04-JUL-2016 01N00190     H         1   0.0
4 11-AUG-2016 00600122     L         1  82.0
5 09-JUL-2016 08500161     H         1   0.5
6 08-SEP-2016 09400240     H         1   0.0

This question also relates to a file called test-data.xml that contains the same data as test-data.csv, but in an XML format. The first few lines of the file are shown below (there are thousands of lines in the complete file).

<?xml version="1.0" encoding="UTF-8"?>
<traffic>
    <site id="00200954">
        <class>H</class>
        <direction>2</direction>
        <count>5</count>
    </site>
    <site id="08800004">
        <class>H</class>
        <direction>2</direction>
        <count>0</count>
    </site>
    <site id="00200951">
        <class>L</class>
        <direction>1</direction>
        <count>122</count>
    </site>
    <site id="01S20508">
        <class>L</class>
        <direction>2</direction>
        <count>120</count>
    </site>

The following code reads the file test-data.xml into R and creates an R object.

library(xml2)

xml <- read_xml("test-data.xml")

days <- xml_find_all(xml, "//day")

testxml <- lapply(days,
                  function(x)
                  {
                      date <- xml_find_all(x, "@date")
                      site <- xml_find_all(x, "site/@id")
                      class <- xml_find_all(x, "site/class")
                      direction <- xml_find_all(x, "site/direction")
                      count <- xml_find_all(x, "site/count