
Methods of Data Analysis I

Preamble

Overview

“Methods”, “Data”, “Analysis”—we consider such loaded words in this course! What is data? What does it mean to do analysis? And what methods? The very core of statistical sciences!

This course develops in students an appreciation for how our world becomes data, what to do in the face of overwhelming options for the analysis of that data, and how to do all this in a way that provides value to others.

It is concerned with statistical modelling, but also everything that comes before and after modelling, and in doing so ensures modelling and analysis are placed on a firmer foundation. In assessment, students will conduct end-to-end data science projects using real-world data, enabling them to fully understand potential pitfalls, and build a portfolio.

The focus of the learning will be on:

1.   actively reading and considering relevant literature;

2.   actively using the statistical programming language R in real-world conditions;

3.   gathering, cleaning, and preparing datasets; and

4.   choosing and implementing statistical models and evaluating their estimates.

Essentially this course provides students with everything that they need to know to be able to do the most exciting thing in the world: use data to tell convincing stories.

FAQ

.     Can I audit this course? Sure, but it is pointless, because the only way to learn this stuff is to do the work.

.     Why is there so much assessment? The only way to learn this stuff is to actually do the work, and students only do the work when they are assessed. It is unfortunate, but there is no way around it.

.     How difficult is the course? Of the students who enrol, the median student drops the course, but the modal overall grade at the end of the course is an A+. The course is not difficult, but the hands-on projects mean it is a great opportunity for you to do a lot of work. Past students have said that it set them up for success in grad school applications and job interviews.

.     What is the format of the class? There are rarely lectures because those are not effective. You should read the relevant chapter and attempt the quiz before class. During class we will focus on examples, activities and discussion. We will also have industry guests discuss their experience.

.     You are asking about X, but you didn’t teach that. What’s up with that? A key skill is being able to teach yourself what you need. In general, I will probably have directed you to the materials that you should go over, but you’re welcome to ask for more pointers if I’ve not been clear enough.

Learning objectives

The purpose of the course is to develop the core skills to do with methods of data analysis that are applicable across academia and industry. By the end of the course, you should be able to:

1.   Engage critically with ideas and readings in data analysis (demonstrated in all papers but also tutorials and quizzes).

2.   Conduct data analysis research in a reproducible and ethical way (demonstrated in all papers).

3.   Clearly communicate what was done, what was found, and why in writing (demonstrated in all papers).

4.   Understand what constitutes ethical high-quality data analysis practice, especially reproducibility and respect for those that underpin our data (demonstrated in all papers and selected quizzes).

5.   Respectfully identify strengths and weaknesses in the data analysis research conducted by others (demonstrated in quizzes, and the peer review).

6.   Develop the ability to appropriately choose and apply statistical models to real-world situations (demonstrated in the final paper).

7.   Conduct all aspects of the typical data analysis workflow (demonstrated in all papers).

8.   Reflect effectively on your own learning and professional development (demonstrated in some tutorials and quizzes).

Textbook

Telling Stories with Data

Languages

In this course you will use R, Python, Git and GitHub, and a little bit of SQL.

Content

Before class starts you should go through Chapter 1 “Telling stories with data” and Appendix A “R essentials”.

Week 1

. Drinking from a fire hose

.     Guest: TBD

Week 2

. Reproducible workflows

. Writing research

.     Guest: TBD

Week 3

. Static communication

.     Guest: TBD

Week 4

. Farm data

. Gather data

.     Guest: Steven Coyne - “Who Owns This? The Ethics of Copyright”

Week 5

. Hunt data

.     Guest: TBD

Week 6

. Clean and prepare

. Store and share

.     Guest: TBD

Week 7

. Missing data

. Linear models

.     Guest: TBD

Week 8

. Linear models

.     Guest: TBD

Week 9

. Directed Acyclic Graphs

. Generalized linear models

.     Guest: TBD

Week 10

. Generalized linear models

.     Guest: TBD

Week 11

. MRP and Prediction

.     Guest: TBD

.     We will focus on trying to predict the upcoming US presidential election, with a view to students being able to write a final paper that could be submitted to the PS: Political Science & Politics special issue.

Week 12

. Production

.     Guest: TBD

Assessment

Summary

Item | Weight (%) | Due date

Quiz | 7 | Tuesdays, noon, Weeks 1-12
    Only best seven out of t

SQL quiz | 1 | Tuesday, noon, Week 6
    You cannot pass the course if you do not get at least 70 per cent

Personal website | 1 | Tuesday, noon, Week 9
    You cannot pass the course if you do not get at least 70 per
    Create a personal website using Quarto and make it live via GitH a minimum, it must include a bio and a CV i

Tutorials | 6 | Tuesdays, noon, Weeks 1-12
    Only best three out of t

Term papers | 48 | Tuesdays, noon, Weeks 3, 6, 9
    Term Paper I: 23 January 2024
    Term Paper II: 13 February 2024
    Term Paper III: 13 March 2024
    You must submit Term Paper I in order to pas
    Only best two of three term p
    Marking starts, noon, on the Thursday after submission, and yo until then i.e. submissions made by noon, Tuesday, Week 3 ca until noon, Thursday, Week 3 (this is to allow you to incorporate comments). Please do not make any changes after ma
    Term Paper I: Dona
    Term Paper II: Ma
    Term Paper III: Pick one of Murrumbidgee Paper, Sp or Spo

Conduct peer review of Term/Final papers | 3 | Wednesdays, noon, Weeks 3, 6, 9, 12
    Conduct peer review for six other term/final papers, by creati Issue or Pull Request. Papers will be distributed by a spreadsheet to the Issue/PR to a term paper that does not have four other will only have 24 hou

Final paper | 34 | Tuesday, noon, Week 12 (3 April 2024)
    You must subm
    Marking starts, noon, Friday 19 April and you can update upda
    i.e. submissions made by noon, Tuesday, Week 12 can be update Friday, 19 April (this is to allow you to incorporate peer review Please do not make any changes after ma

You must submit Term Paper I. You must submit the Final Paper. You must submit and get at least 70 per cent on both the SQL quiz and the Personal website.

Beyond that, you have scope to pick an assessment schedule that works for you. I will take your best three of the twelve tutorials for that six per cent, and your best seven of the twelve quizzes for that seven per cent. I take your two best papers from the three term papers for that 48 per cent (24 per cent for each). You get up to three percentage points for conducting peer review of other students' papers (half a percentage point per review, for up to six reviews). There is 34 per cent allocated for the Final Paper.
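The weighting described above can be sketched in Python. This is only an illustration, not an official grade calculator: the function names `best_mean` and `overall_grade`, the convention that every mark is a fraction between 0 and 1, and the even split of the quiz and tutorial weight across the counted items are all assumptions made for the example.

```python
def best_mean(marks, n):
    """Mean of the n highest marks; anything not submitted counts as zero."""
    top = sorted(marks, reverse=True)[:n]
    return sum(top) / n


def overall_grade(quizzes, tutorials, term_papers, sql_quiz, website,
                  peer_reviews_done, final_paper,
                  submitted_term_paper_1=True, submitted_final_paper=True):
    """Return (percentage, meets_requirements) under the weights listed above.

    All marks are assumed to be fractions in [0, 1] (a hypothetical convention).
    """
    total = 0.0
    total += 7 * best_mean(quizzes, 7)        # best seven of twelve quizzes
    total += 6 * best_mean(tutorials, 3)      # best three of twelve tutorials
    total += 48 * best_mean(term_papers, 2)   # best two of three papers, 24 each
    total += 1 * sql_quiz                     # SQL quiz
    total += 1 * website                      # personal website
    total += 0.5 * min(peer_reviews_done, 6)  # half a point per review, at most three points
    total += 34 * final_paper                 # final paper

    # Requirements: Term Paper I and the Final Paper must be submitted, and
    # both the SQL quiz and the website need at least 70 per cent.
    meets_requirements = (
        sql_quiz >= 0.7
        and website >= 0.7
        and submitted_term_paper_1
        and submitted_final_paper
    )
    return total, meets_requirements


# Example: a student who skips Term Paper III and half of the quizzes.
grade, eligible = overall_grade(
    quizzes=[0.9] * 7,
    tutorials=[0.8, 0.8, 1.0],
    term_papers=[0.7, 0.85],
    sql_quiz=0.8,
    website=1.0,
    peer_reviews_done=4,
    final_paper=0.8,
)
print(round(grade, 2), eligible)
```

Because only the best marks count, skipping a tutorial, a quiz, or one of Term Papers II and III does not lower the total so long as enough other items are submitted.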

Additional details:

.     Quiz questions are drawn from those in the Quiz section that follows each chapter of Telling Stories with Data. Some of them are multiple choice, and you should expect to know the mark within a few days of submission. Please do them before coming to class.

.     Tutorial questions are drawn from those in the Tutorial section that follows each chapter of Telling Stories with Data. The general expectation (although this differs from week to week) is about two pages of written content. You should expect to know the mark within a few days of the tutorial.

.     In general term papers require a considerable amount of work, and are due after the material has been covered in quizzes and tutorials (i.e. you would draw on knowledge tested in the quizzes, and potentially material could be re-used from the tutorial material). In general, they require original work to some extent. Papers are taken from the Papers appendix of Telling Stories with Data and students have access to the grading rubrics before submission.

.     If you already have a website, please communicate with me about this early in the term so that I can let you know whether it can be used for the purposes of this submission.

.     The rubric for tutorials is:

o  0 - Any typos, major grammatical errors, other table stakes issues for this level. Too short.

o  0.25 - Grammatical errors, if relevant: tables/graphs not properly labeled, no references, other aspects that affect credibility.

o  0.6 - Makes some interesting and relevant points, related to course material (including required materials), but lacking in terms of structure and story/argument.

o  0.80 - Interesting paper that is well-structured, coherent, and credible.

o  1 - As with 0.80, but exceptional in some way.

.     Only the best two of three term papers count. This means each is worth 24 per cent.

.     Peer review will occur for Term Paper I, but it is just an optional ‘practice’: students are typically not yet familiar enough with the expectations of the course to be able to provide valuable comments (other than noticing whether R has been cited!).