STATS 769: Advanced Data Science Practice
TERM TEST - SEMESTER 2, 2021
INSTRUCTIONS
This assessment is open book: you are permitted to access your course manuals and other written material, including online resources.
Calculators are permitted.
Submit your answers on Canvas, ideally as a plain text file or an R Markdown document (but a Word document or a PDF file will also be accepted).
It is your responsibility to ensure your assessment is successfully submitted on time.
We STRONGLY recommend you download your submitted document from Canvas, after submitting it, to verify you have uploaded the correct document.
Attempt ALL questions.
Total marks are 40.
Support
If you have any concerns regarding your assessment, please call the Contact Centre for advice, rather than your instructors.
The Contact Centre can be reached on these numbers:
Auckland: 09 373 7513
Outside Auckland: 0800 61 62 63
International: +64 9 373 7513
For any Canvas issues, please use 24/7 help on Canvas by chat or phone.
If any corrections are announced during the assessment, you will be notified by a Canvas Announcement. Please ensure your notifications are turned on.
Question Interpretation:
Please note that during the assessment period you cannot contact your instructors for clarification on how to interpret the wording of any specific questions or to verify that your answer is correct.
Interpreting wording and making appropriate assumptions is part of what is being assessed. You will need to interpret the question yourself and check your own answers.
If you believe there is a typo, first re-read the question to check you have not misunderstood the question, as it is very common for students to misread questions. If you still believe there is a typo, please phone the Contact Centre.
Academic Honesty Declaration:
By completing this assessment, I agree to the following declaration:
I understand the University expects all students to complete coursework with integrity and honesty. I promise to complete all online assessment with the same academic integrity standards and values. Any identified form of poor academic practice or academic misconduct will be followed up and may result in disciplinary action.
As a member of the University's student body, I will complete this assessment in a fair, honest, responsible and trustworthy manner. This means that:
I declare that this assessment is my own work.
I will not seek out any unauthorised help in completing this assessment.
I am aware the University of Auckland may use plagiarism detection tools to check my content.
I will not discuss the content of the assessment with anyone else in any form, including Canvas, Piazza, Facebook, Twitter or any other social media or online platform within the assessment period.
I will not reproduce the content of this assessment anywhere in any form at any time.
I declare that I generated the calculations and data in this assessment independently, using only the tools and resources defined for use in this assessment.
I will not share or distribute any tools or resources I developed for completing this assessment.
NOTES:
Important information within the questions below (such as the number of marks for each question and the specific tasks that you must perform) is formatted like this (in bold, with a light grey background).
You have 1 hour and 15 minutes to complete the test; there is an additional 15 minutes to cater for the online delivery mode.
You are NOT expected to run any code for this test - you do not have all of the data for any questions and you will not be able to cut-and-paste code or data from this page.
Questions
1. 10 marks
This question relates to a CSV file called test-data.csv that contains counts of traffic at 15-minute intervals at different locations around New Zealand. The first few lines of the file are shown below. The file contains thousands of lines like these.
03-SEP-2016,02010015,L,1,904
30-AUG-2016,01N00988,L,1,143
04-JUL-2016,01N00190,H,1,0
11-AUG-2016,00600122,L,1,82
09-JUL-2016,08500161,H,1,0.5
08-SEP-2016,09400240,H,1,0
This file contains similar information to the files that we have been using in labs for this course, but it is NOT exactly the same as any of the files that we have used so far.
This question also relates to the following shell command:
awk -F, -e '$3 == "H" { print $1","$2","$5 }' test-data.csv
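As a rough illustration of how the pieces of this command fit together, the sketch below runs it on a small stand-in file holding only the six sample rows shown above (with the garbled month read as JUL). The temporary file, the `sample` variable and the `class_h_rows` function name are assumptions made for this illustration and are not part of the test. The `-e` flag is left out here because it is not among the options POSIX specifies for awk; the program text is passed directly as the first non-option argument instead.

```shell
# Stand-in input: only the six sample rows from the question
# (the real file has thousands of lines).
sample=$(mktemp)
cat > "$sample" <<'EOF'
03-SEP-2016,02010015,L,1,904
30-AUG-2016,01N00988,L,1,143
04-JUL-2016,01N00190,H,1,0
11-AUG-2016,00600122,L,1,82
09-JUL-2016,08500161,H,1,0.5
08-SEP-2016,09400240,H,1,0
EOF

# class_h_rows is a hypothetical wrapper name used only in this sketch.
#   -F,                 split each input line into fields on commas
#   $3 == "H"           select lines whose third field is exactly H
#   print $1","$2","$5  join fields 1, 2 and 5 with literal commas
class_h_rows() {
    awk -F, '$3 == "H" { print $1","$2","$5 }' "$1"
}

class_h_rows "$sample"
```

Only the lines whose third field is `H` are printed, each reduced to its first, second and fifth fields.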
Explain what the -F, part of the shell command is for.
Explain what the $1","$2","$5 part of the shell command is for.
Write down the first few lines of output from this code.
Write a single shell command (based on this shell command) that would output the number of lines that are produced by this shell command. The output of your code would be a single number.
Write a single shell command (based on this shell command) that would take the output of this shell command and sort the lines in descending order by the last field (where a comma is used as the field separator) AND display just the first six lines of that result.
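One possible way to approach the last two tasks is sketched below, on a stand-in file with just the six sample rows from the question (with the garbled month read as JUL). It relies on `wc -l` counting the lines on its standard input and on `sort -t, -k3,3nr` sorting numerically, in reverse, on the third comma-separated field, which is the last field of the filtered output. The temporary file and the function names `filter_h`, `count_h` and `top_six_h` are hypothetical helpers for this sketch, and `-e` is omitted because POSIX does not specify it for awk.

```shell
# Stand-in input: only the six sample rows from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
03-SEP-2016,02010015,L,1,904
30-AUG-2016,01N00988,L,1,143
04-JUL-2016,01N00190,H,1,0
11-AUG-2016,00600122,L,1,82
09-JUL-2016,08500161,H,1,0.5
08-SEP-2016,09400240,H,1,0
EOF

# The filter from the question, wrapped in a hypothetical helper
# so that both pipelines below can reuse it.
filter_h() {
    awk -F, '$3 == "H" { print $1","$2","$5 }' "$1"
}

# Number of output lines: wc -l prints a single number.
count_h() {
    filter_h "$1" | wc -l
}

# Sort on the last (third) field, numerically and in descending order,
# then keep only the first six lines.
top_six_h() {
    filter_h "$1" | sort -t, -k3,3nr | head -n 6
}

count_h "$sample"
top_six_h "$sample"
```

Written out as a single command, each pipeline is the awk command from the question followed by `| wc -l`, or by `| sort -t, -k3,3nr | head -n 6`. On the six sample rows the count is 3, so the second pipeline prints fewer than six lines here.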
2. 10 marks
This question relates to a file called test-data.json that contains the same data as test-data.csv, but in a JSON format. The first few lines of the file are shown below (there are thousands of lines in the complete file).
{
  "source": [ "Waka Kotahi" ],
  "days": [
    {
      "Date": "03-SEP-2016",
      "Site": "02010015",
      "Class": "L",
      "Direction": 1,
      "Count": 904
    },
    {
      "Date": "30-AUG-2016",
      "Site": "01N00988",
      "Class": "L",
      "Direction": 1,
      "Count": 143
    },
    {
      "Date": "04-JUL-2016",
      "Site": "01N00190",
      "Class": "H",
      "Direction": 1,
      "Count": 0
    },
Write R code to read the file test-data.json into R and create a data frame called testjson with five columns.
Your data frame would look like the output below.
head(testjson)
This question also relates to a file called test-data.xml that contains the same data as test-data.csv, but in an XML format. The first few lines of the file are shown below (there are thousands of lines in the complete file).
<?xml version="1.0" encoding="UTF-8"?>
<traffic>
  <site id="00200954">
    <class>H</class>
    <direction>2</direction>
  </site>
  <site id="08800004">
    <class>H</class>
    <direction>2</direction>
  </site>
  <site id="00200951">
    <direction>1</direction>
  </site>
  <site id="01S20508">
    <direction>2</direction>
  </site>
The following code reads the file test-data.xml into R and creates an R object.
library(xml2)
xml <- read_xml("test-data.xml")
days <- xml_find_all(xml, "//day")
testxml <- lapply(days,
                  function(x) {
                      class <- xml_find_all(x, "site/class")
                      direction <- xml_find_all(x, "site/direction")
                      count <- xml_find_all(x, "site/count