DATA2x02 - Assignment

Assignment

1 Overview

The first assignment is a report based on the survey data we collected in class. The raw data can be downloaded here and the original format of the survey can be found here.

You should write your findings in a report style, as if you were analysing a data set for a client. The client is not a statistician, but they are interested in the details of your work.

You will submit an HTML document compiled using Quarto or R Markdown with code folding [1] enabled so that your client can see ALL the code you used. However, your report should not rely on the client understanding your code - the text should communicate everything that the client needs to know.

To help you think about this - one of your clients might be an analyst who understands the R code and might want to check the details, while the other is a manager who doesn’t have R skillz but still wants to have a good understanding of the data processing choices you’ve made and what you’ve done.

All of the standard writing expectations apply for statistical reports. For help with improving your writing see the Study Skills - Writing page.

You can work on this in your computer lab in consultation with others but you will need to submit your own report. Your tutor can also provide feedback on your approach.

If you only submit a .qmd or a .rmd file you will get a maximum of 2 out of 10. Your submission to Canvas must be a .html file.


As this is a short release assignment, no simple extensions will apply.

The questions outlined in Section 2.1 and Section 2.2 need to be addressed in the report.



You should address these questions in the introduction section of your report.



1. Is this a random sample of DATA2X02 students?

2. What are the potential biases? Which variables are most likely to be subjected to this bias?

3. Which questions needed improvement to generate useful data (e.g. in terms of the way the question was phrased or response validation)?

FYI there are 675 students in DATA2002 and 84 students in DATA2902.
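One way to use these enrolment figures is to compare the unit split among respondents with the split among enrolled students. The sketch below assumes a chi-squared goodness-of-fit test is a suitable tool for this; the respondent counts are hypothetical placeholders and must be replaced with counts from the real survey data.

```r
# Enrolment figures stated in the assignment
enrolled <- c(DATA2002 = 675, DATA2902 = 84)
pop_props <- enrolled / sum(enrolled)

# HYPOTHETICAL respondent counts (made up for illustration only)
responded <- c(DATA2002 = 250, DATA2902 = 50)

# Goodness-of-fit test: does the sample split match the enrolment split?
gof <- chisq.test(x = responded, p = pop_props)

gof$expected  # expected counts under the enrolment proportions
gof$p.value   # p-value of the test
```

A sample split that matches enrolment would not show that the sample is random; it only checks one way in which the respondents might differ from the cohort.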




You should address these questions in the results section of your report. The report will be more compelling if you can articulate a connection between the questions you select so that the report feels like a coherent body of work (rather than three unrelated tests).

Identify questions you can answer from the data and perform a hypothesis test for each question. The hypotheses should be of the same form as what we have covered in lectures. Give a motivation for why you selected these questions. Be sure to report the hypothesis testing workflow, interpret the results and mention any limitations in the data that may impact your findings. You may have mentioned this in general terms in the introduction, but be specific in the results section.

There needs to be some variety in the types of tests you implement:

at least one test from module 1

at least one test from module 2

at least one test needs to be based on a resampling method (e.g. Monte Carlo or permutation test).
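As an illustration of the resampling requirement, here is a minimal sketch of a two-sided permutation test for a difference in means, written in base R. The data, group labels and the helper `mean_diff()` are all hypothetical; the test you run should follow the form covered in lectures and use the cleaned survey data.

```r
set.seed(2002)  # make the resampling reproducible

# HYPOTHETICAL data: made-up values for two groups, for illustration only
group <- rep(c("A", "B"), each = 8)
value <- c(4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.4,
           5.9, 6.4, 5.2, 7.1, 6.6, 5.8, 6.9, 6.1)

# Test statistic: difference in group means (B minus A)
mean_diff <- function(v, g) mean(v[g == "B"]) - mean(v[g == "A"])

# Observed value of the test statistic
obs_diff <- mean_diff(value, group)

# Permutation distribution: shuffle the group labels and recompute the statistic
n_perm <- 5000
perm_diffs <- replicate(n_perm, mean_diff(value, sample(group)))

# Two-sided p-value: share of permuted statistics at least as extreme as observed
p_value <- mean(abs(perm_diffs) >= abs(obs_diff))
p_value
```

Shuffling the labels mimics the null hypothesis that group membership is unrelated to the measured value, so the p-value does not depend on a normality assumption.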

Additional requirements for DATA2902 students have been posted to the DATA2902 resources page.





Guide on importing and cleaning the data

Report writing guide



The two guides above provide essential information on how to succeed in this assessment task. You should read them carefully as you go about writing up your report. You can use and adapt the code from the guide on importing and cleaning the data in your report; just make sure you reference it appropriately (see the bottom of the guide for an example of how to cite it).

See also the DATA2x02 policy on AI use.



The following YAML code can be used to make sure you meet the minimum criteria. The self contained and code folding options are particularly important.


If your file ends in .qmd you can adapt this:

---
title: "Your title here"
date: "`r Sys.Date()`"
author: "Your SID (don't put your name, so that we can respect the anonymous marking policy)"
format:
  html:
    embed-resources: true # Creates a single HTML file as output
    code-fold: true # Code folding; allows you to show/hide code chunks
    code-tools: true # Includes a menu to download the code file
    # code-tools are particularly important if you use inline R to
    # improve the reproducibility of your report
    table-of-contents: true # (Optional) Creates a table of contents
    number-sections: true # (Optional) Puts numbers next to heading/subheadings
---

If your file ends in .rmd you can adapt this:

---
title: "Your title here"
date: "`r Sys.Date()`"
author: "Your SID (don't put your name, so that we can respect the anonymous marking policy)"
output:
  html_document:
    self_contained: true # Creates a single HTML file as output
    code_folding: hide # Code folding; allows you to show/hide code chunks
    code_download: true # Includes a menu to download the code file
    toc: true # (Optional) Creates a table of contents
    toc_float: true # table of contents at the side
    number_sections: true # (Optional) Puts numbers next to heading/subheadings
---





You should review and follow the advice in the guides above, but at a minimum, before submitting your report, check the following points:

Your assignment submission needs to be an HTML file that you have compiled using R Markdown or Quarto. Do not upload the rmd or qmd file (i.e. the code file).

You use code folding (so we can see your code) and specify that the HTML file is self-contained (otherwise all the formatting and images won’t be sent to Canvas). Code folding is also a handy check of whether your report is well written - your report should make sense and provide all the relevant information in the text when the code is hidden. Don’t rely on the reader to understand your code. It is also a good idea to enable code-tools (Quarto) or code_download (R Markdown), particularly if you use inline R chunks to enhance reproducibility.

Think about how your report is structured (e.g. introduction, results, conclusion).

Is there sufficient text explaining what is being presented or are you relying on the reader being able to understand and interpret the code and R output? Any output that you include needs to be explained in the text of the document.


Is it well presented (e.g. no unnecessary warnings or messages showing up)? If your code chunk generates unnecessary output, you need to suppress it using the chunk options. [2]

Are your figures and tables discussed in the text? A graph doesn’t speak for itself. See details on how to cross-reference figures and tables in Quarto here.

Have you included sufficient and appropriate references? This includes software, data, and other reference material. Examples of how to cite with Quarto here. [3]
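The checklist points about suppressing unnecessary output and cross-referencing figures can be sketched in a single Quarto code chunk. The block below shows the contents of an R chunk in a .qmd file; the label `fig-example-hist`, the caption and the data are hypothetical. Quarto reads the `#|` lines as chunk options, while R treats them as comments.

```r
#| label: fig-example-hist
#| fig-cap: "Histogram of made-up values (illustration only)."
#| warning: false
#| message: false

# HYPOTHETICAL values standing in for a cleaned survey variable
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

# Draw the histogram; the caption comes from the fig-cap option above
h <- hist(x, main = NULL, xlab = "Hypothetical value")
```

With a label that starts with `fig-`, the figure can be referred to in the report text as `@fig-example-hist`. In R Markdown the equivalent suppression options go in the chunk header, for example `{r, warning=FALSE, message=FALSE}`.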

1. Code folding in Quarto and R Markdown.

2. For details on how you can fine-tune the settings in your R Markdown document see the R Markdown Cookbook. This includes formatting, tables, captions and chunk options. Much of this is directly transferable to Quarto documents too.

3. We are not prescriptive about the citation style that you use, but you should be consistent in whatever style you choose. See the Library website for more details about citations.
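For software references, base R can produce citation details directly. The sketch below is one possible starting point, not a required workflow; the package name "stats" is only a stand-in for whichever packages your report uses.

```r
# Citation details for R itself
r_cite <- citation()
print(r_cite, style = "text")

# Citation for a package; "stats" ships with R and is used here as a stand-in
pkg_cite <- citation("stats")

# BibTeX entry that can be pasted into a .bib file
r_bibtex <- toBibtex(r_cite)
r_bibtex
```

In Quarto, a .bib file is typically linked through the `bibliography` field in the YAML header and entries are then cited in the text with `@key`.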