
STA302H1F/1001H1F Methods of Data Analysis 1

Published: 2021-09-16



Methods of Data Analysis 1

Department of Statistical Sciences

STA302H1F/1001H1F Fall 2021


COURSE OVERVIEW

How will this course operate? This course will be offered entirely online, with a combination of synchronous lectures and asynchronous video lectures. Each week, you should view the asynchronous course materials (posted in a Quercus module) and complete the knowledge check quiz. Then come to the synchronous classes, which will occur through MS Teams. These sessions will focus on demonstrations of concepts and applications using statistical software, with opportunities for hands-on practice. It is your responsibility to make sure you are available during scheduled lecture times and stay on top of the course material and all relevant deadlines. Note that while the synchronous lecture time is a 2-hour block, we will not usually meet for the full duration; the remaining time is available for additional question periods as needed.

Course Description: The course provides a solid introduction to data analysis with a focus on the theory and application of linear regression. Topics to be covered include: initial examination of data, correlation, simple and multiple regression models using least squares, inference for regression parameters for normally distributed errors, confidence and prediction intervals, model diagnostics and remedial measures when the model assumptions are violated, interactions and dummy variables, ANOVA, and model selection and validation. Statistical software will be used throughout and will be required for the completion of various assessments during the term.

Learning Outcomes: By the end of this course, all students should have a solid understanding of both the mathematical theory of linear regression analysis and its application in the form of a data analysis. Students should be prepared to show their understanding of the above through

● application of methods through problem-solving questions;

● description and explanation of concepts relating to the mathematical theory;

● derivation and proof of topics based on linear regression concepts and theory;

● practical application of methods on real data using statistical software, with appropriate justification of use of these methods;

● interpretation of data analysis results in clear and non-technical language.

Pre-requisites: Pre-requisites are strictly enforced by the department, not the instructor. If you do not have the equivalent pre-requisites, you will be un-enrolled from the course. Students should have a second-year statistics course such as {STA238, STA248, STA255, or STA261}, a computer science course such as {CSC108, CSC120, CSC121, or CSC148}, and a mathematics course such as {MAT221(70%), MAT223, or MAT240}, or equivalent preparation as determined by the department.


COURSE MATERIALS

Course Content: We have a common Quercus course page for all sections of this course. All lecture slides, recordings and materials will be posted on this Quercus course page. Further, any important announcements will also be posted in Quercus. Please make sure to check it regularly.

Textbook: This course does not strictly follow any particular textbook, but rather merges material from a number of sources. All of the below recommended textbooks are freely available as an electronic copy through the University of Toronto Library. Our primary reference text will be Linear Models in Statistics, 2nd edition by Alvin C. Rencher and G. Bruce Schaalje (Wiley). Other helpful references from which practice problems may be assigned are:

● Applied Regression Modeling, 2nd edition, by Iain Pardoe (Wiley).

● Methods and Applications of Linear Models, 2nd edition, by Ronald R. Hocking (Wiley).

● A Modern Approach to Regression with R, by Simon J. Sheather (Springer).

● Applied Linear Regression, 3rd edition, by Sanford Weisberg (Wiley).

These are all useful books, but may present the material in a different order or in a different way. They are still good for additional explanation and practice problems.

Statistical Software: We will be using RStudio for performing statistical analyses. R is free software that can either be downloaded onto your personal computer or used in the cloud. If you choose to work with R on your personal computer, then installation will be a two-step process:

1. The base R framework is available for download at http://cran.r-project.org/ for Windows, Mac and Linux operating systems.

2. Next, RStudio is a good integrated development environment for R (it makes it simpler to work in R) and can also be downloaded for free at https://www.rstudio.com/products/rstudio/download/.

If you don’t want to download the program or run into problems with installation, you may want to consider using RStudio through the JupyterHub for University of Toronto. This will allow you to log in with your official UofT credentials and use RStudio without the need for a local installation. More information about using RStudio in JupyterHub will be provided in the first class. R code shown in class will be available on the course page and, along with any additional resources, should be sufficient to complete any assessment involving data analysis.


COURSE COMPONENTS

Lectures: The majority of the core content for this course will be delivered via pre-recorded video lectures which will be posted to Quercus for each week’s module (ideally by Friday night at the latest). It is the student’s responsibility to watch these videos in a timely fashion. Synchronous classes will occur through Microsoft Teams and will supplement these videos with additional demonstrations and data analyses. Synchronous lectures will also be recorded and posted a few days after the class.

Office Hours: Instructor and TAs will hold office hours online through Microsoft Teams. The office hour schedule will be posted on Quercus once finalized. It is recommended that you visit office hours whenever you have a question about the material. It is more important than ever in an online class to have material clarified as quickly as possible. Don’t wait until the last minute to ask your questions!

ED Discussion Board: We will be using the ED-STEM Discussion Board as an online discussion forum, which can be accessed through the Quercus course page. All questions about course material should be posted here or asked during TA/instructor office hours. The instructor and TAs will monitor the board and will help answer questions, but students are encouraged to answer posts and help their fellow classmates.


COMMUNICATION

How your instructor will communicate with you: All communication will be made through Quercus announcements or during lectures. Please ensure that you check Quercus regularly so you don’t miss anything important.

Where to send content questions: We will be using the ED Discussion board to collect student questions regarding course content, assignments, etc. All questions should be posted here.

When to email the instructor: The instructor will only respond to emails of a private or sensitive nature. If you email the instructor with content-related questions, you will be asked to repost your question on the content board so the answer may benefit all students. Should you need to email the instructor about a matter of a sensitive or personal nature, please use your official mail.utoronto.ca email and include your full name and student number in the text. Send all course-related emails to [email protected]. Please allow up to 48 hours for a reply. Emails may not be monitored on weekends.

A note on email and discussion board etiquette: Please make sure that you communicate politely and respectfully with all members of the teaching team and your fellow classmates. Written communications can sometimes take a tone other than what was intended (e.g. can come off as dismissive, rude or insulting), so make sure you re-read or read out loud your email/post before sending it to make sure it has the tone you intended. For more tips on respectful communication, see professional communication tips. The ED discussion board is a teaching and learning tool and therefore should only be used as such. Any posts that detract from the learning goal of the board will be removed to keep the board a safe space.


GRADING SCHEME

Both undergraduate and graduate students will be offered two grading schemes that will be used to calculate your final grade. Your final grade for the course will automatically be determined by the higher of the two grading schemes. Undergraduate students will have the grading scheme as outlined below. Graduate students will use the same grading schemes, with the exception that the Term Test will be worth 15% while the Final Written Report (Part 3) will be worth 25%.



MINIMUM PASSING REQUIREMENT

In order for the instructor to be able to reasonably assess the ability of each student with the course material, a minimum amount of work must be submitted to provide enough evidence of proficiency. To this end, students must submit the following assessments in order to be considered for a passing grade in the course: the video project, the term test, and the final project (all parts). As these are summative assessments, if a student fails to submit one or more of these assessments (even if all other assessments have been completed), it will not be possible to gauge the student’s proficiency with the material, and the student will therefore not be able to pass the course.


EVALUATION BREAKDOWN

Quercus Discussion Participation: Bi-weekly group discussions will be conducted through the use of the Quercus discussion board. Every two weeks, a discussion topic will be posted based on content presented in the last few weeks of lectures (see schedule at end of document for exact deadlines). These topics will require students to discuss various applications of the course material and to think about how and why certain methods may be appropriate or not. Understanding the limitations of the statistical tools you use is what differentiates a good statistician from a great one! All students are encouraged to participate in these discussions for their participation grade.

● Topics will be open-ended (there is no one right answer - just join and engage with the discussion) and TAs and instructors will also be involved in these discussions.

● Each discussion will remain open for contribution for two weeks so it’s best not to wait until the last minute to contribute.

● A rubric will be posted explaining how this will be graded.

● The first discussion will open September 17 and will be due on October 1 (see schedule for remaining deadlines).

In-class Group Labs: There will be 5 synchronous class periods during which a small group activity will take place. The activities will focus on getting hands-on practice applying the methods using R and writing up results. You and your group members will work together to perform a small data analysis to answer a question.

● Each lab will need to be turned in on Crowdmark by Thursday at 11:59PM EST the week of the lab (see schedule for exact dates) to receive completion credit. The additional time is in case of tight class schedules or if groups wish to work a little more on the lab.

● Students will need to ensure that the names of all group members who were present during the class are listed on the lab - these will help us ensure everyone gets credit for the work.

● Only 3 out of 5 labs need to be submitted to receive the full 5% in the final grade calculation, although you are encouraged to attempt all labs. The best of both grading schemes will still be used for grade calculations, so there is no penalty for not attending lecture or not submitting the labs.
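As a rough illustration of the completion-credit rule for labs, the arithmetic could be sketched as below. The function name `lab_credit` is hypothetical, and the proportional credit for fewer than 3 submitted labs is an assumption; the syllabus only states that 3 submitted labs earn the full 5%.

```python
def lab_credit(labs_submitted: int, required: int = 3, weight: float = 5.0) -> float:
    """Percentage points earned toward the final grade from lab completion.

    Assumption: credit scales linearly up to `required` submissions.
    The syllabus only confirms that 3 of 5 labs earn the full 5%.
    """
    counted = min(max(labs_submitted, 0), required)
    return weight * counted / required


# Submitting 3, 4, or 5 labs all earn the full 5 points.
print(lab_credit(3), lab_credit(5))
```

Under Scheme 2 the labs can carry a weight of 0%, so this component only matters when the scheme that includes labs gives the higher final grade.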

Weekly Quizzes: These quizzes will be available once each weekly module opens, and students can take the quiz at any time up until the Monday at 6PM EST deadline. The quizzes will have a 1-hour time limit, although they should not take this long to complete. They will be multiple choice in nature and focus on the material covered in that week’s module. Therefore you should watch the asynchronous module materials prior to completing the quiz. Only the best 8 out of 10 quiz marks will be used to calculate a student’s final quiz grade. As such, no accommodations will be made for a missed quiz.
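The best-8-of-10 rule can be sketched as follows. This is a minimal illustration, assuming each quiz is marked as a percentage and a missed quiz counts as 0; the function name `quiz_component` is hypothetical.

```python
def quiz_component(quiz_marks, keep=8, total=10):
    """Average of the best `keep` quiz marks (each a percentage).

    Assumption: quizzes that were not written count as 0.
    """
    # Pad with zeros for any quizzes that were missed.
    padded = list(quiz_marks) + [0.0] * (total - len(quiz_marks))
    best = sorted(padded, reverse=True)[:keep]
    return sum(best) / keep


# Two missed quizzes are simply dropped; a third missed quiz counts as 0.
print(quiz_component([80] * 8))  # only 8 written
print(quiz_component([80] * 7))  # only 7 written
```

With eight quizzes at 80% the component stays at 80%, while with only seven written it falls to 70%, because one zero is kept among the best eight.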

Reproducible Writing Exercise: This exercise is to highlight the importance of writing in science, specifically in a way that another independent researcher could reproduce what you have done based solely on a summary of your process. It also provides an opportunity for students to experience the scientific review and editing process. It will take place in three parts:

● Part 1 - Draft: Students will submit a draft summary of a data analysis process that they applied to a dataset, for completion points.

● Part 2 - Peer Feedback: Students will have their draft reviewed by another student (peer) who will attempt to replicate their analysis. The reviewer student will provide comments on what is good and what could be improved with the draft.

● Part 3 - Final Draft: Students will revise their original draft, taking into account the feedback provided by their peer reviewer and submit their final product for grades. They will also rate the feedback provided to them by their reviewer based on helpfulness.

Term Test: The term test will be conducted online during the scheduled synchronous class time (see top of page 1). The test will be 1 hour and 30 minutes long, with an additional 20 minutes available to upload your solutions to Crowdmark. A link with the test questions will be emailed to students at the start time of the test, and all submissions must be received before the deadline to be graded. More details on submission will be communicated closer to the test date. The term test will take place during the scheduled synchronous lecture times on

● Tuesday October 26 from 10:10AM-12PM EST for students enrolled in LEC9901/2999,

● Wednesday October 27 from 6:10-8PM EST for students enrolled in LEC9902 and

● Wednesday October 27 from 2:10-4PM EST for students enrolled in LEC9903.

As per the timetable delivery instructions, students must be available during this time. You will be required to write the term test in the section in which you are enrolled. The test will cover material from Modules 0-5.

Video Project: The purpose of the video project is to develop your data analysis skills which will be useful for the final project and future courses, in addition to your communication and statistical presentation skills. The video project will have a heavy focus on the use of statistical software (R specifically), and will involve applying the methods learned during lecture to a dataset. The format of the project will be:

● In groups of up to 2 students, use the methods taught in lecture to perform a small data analysis and then present your results to a general audience.

● To submit your results, you will be required to prepare a 5 minute presentation that you will need to record (using your computer, phone, etc.).

● You will need to display the results of your project in a logical way using slides or some other visual aid (e.g. PowerPoint, or other) and record yourself and your partner discussing these results, with a focus on why you chose to do certain things and interpretation of your results for a general (non-statistical) audience.

● Presentations should be submitted on time (i.e. by the deadline). Late submissions will receive a 10% penalty for each day that the project is late, up to 72 hours at which point the project will not be accepted.

● In general, extensions will not be given unless you are experiencing a serious medical or personal emergency. If this is the case, you will be asked to complete a form available on Quercus to be submitted as early as possible (ideally before the deadline, but no later than 3 days after the deadline) to request an extension.

● There is no make-up video project. A missed video project will be given a grade of 0.
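The late-submission rule for the video project can be sketched as below. This is only an illustration under stated assumptions: each started 24-hour period counts as one day late, the penalty is 10 percentage points per day off a mark out of 100, and a submission more than 72 hours late is treated as not accepted (mark of 0). The function name `late_adjusted_mark` is hypothetical.

```python
import math


def late_adjusted_mark(raw_mark: float, hours_late: float) -> float:
    """Mark out of 100 after applying the late penalty.

    Assumptions (not confirmed by the syllabus): days late are rounded up,
    the penalty is 10 percentage points per day, and submissions more than
    72 hours late are not accepted.
    """
    if hours_late <= 0:
        return raw_mark  # submitted on time
    if hours_late > 72:
        return 0.0  # not accepted
    days_late = math.ceil(hours_late / 24)
    return max(0.0, raw_mark - 10.0 * days_late)


print(late_adjusted_mark(80, 30))  # two days late under these assumptions
```

If the penalty is instead meant as 10% of the earned mark per day, the subtraction would be replaced by a multiplicative factor; the instructor’s rubric would settle which reading applies.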

Final Project: The final project will be due during the final assessment period (date to be confirmed as soon as possible) and will consist of a data analysis on a novel dataset of your choice. Students will be required to demonstrate their understanding of the methods taught in lecture by developing a reasonable regression model that addresses a valid research question using the techniques taught in class. The students will be responsible for choosing the correct methods to apply and providing appropriate justifications defending their choices. The final project is a scaffolded assessment involving 3 parts:

● Part 1 - Research question and dataset selection: Students must find a dataset available online and define a research question that can be answered with this dataset using linear regression. Students will need to explain why their research question is important and how linear regression may be used to answer it. A short exploratory data analysis of the chosen dataset will also be required.

● Part 2 - Analysis Plan Flowchart: Students will be asked to put together a flowchart outlining the steps that they plan to take in their data analysis for the final project on their chosen dataset. This will help in developing a consistent analysis flow and make writing the final report easier.

● Part 3 - Final Project Report: Students will put together a scientific report that outlines the relevance of their proposed research question, the process of their analysis, the results of the performed data analysis, and a discussion of the meaning of the results as well as limitations of the analysis with respect to the statistical tools used/decisions made or the data used.

The final project will be done individually, and must be typed and submitted by the deadline. More detailed instructions will be provided at a later date. In order to pass the course, you must submit all three parts of the final project.


MISSED ASSESSMENT POLICY

If you experience a prolonged absence due to illness or emergency that prevents you from completing a number of assessments, please contact your registrar as soon as possible.

Missed Discussion Board Participation: Participation is open for two weeks at a time and does not require a large time commitment to receive full marks. Therefore, no accommodations will be made for missed participation marks.

Missed In-class Labs: Since only the best 3 out of 5 labs count towards your lab grade and the labs can have a weight of 0% of the final grade under Scheme 2, there will be no accommodations for missed labs.

Missed Weekly Quiz: Students may miss up to 2 weekly quizzes in the term. These will be accommodated by having only 8 out of the 10 quizzes count towards the Quiz component of the final grade. No accommodations will be provided for any additional missed quizzes.

Missed Writing Exercise: Due to the scaffolded nature of this exercise, there will be no extensions on Parts 1 or 2 of this exercise. However, if a student is experiencing serious illness or personal emergency, an extension may be granted for Part 3. To request an extension, submit the form available on Quercus as early as possible (ideally before the deadline, but no later than 3 days after the deadline).

Missed Video Project: There are no accommodations for a missed video project. However, extensions may be granted at the discretion of the instructor to students experiencing serious personal illness or emergency. In this case, please submit the form available on Quercus as early as possible (ideally before the deadline, but no later than 3 days after the deadline) to request an extension. The video project must be submitted as part of the minimum work requirement.

Missed Term Test: If a student is experiencing a serious personal illness or emergency on the date of the test, the student must notify the teaching team using a form available on Quercus no later than one week after the date of the test. A make-up test may then be scheduled at a date and time determined by the instructor. The format of the make-up is at the discretion of the instructor and could be multiple choice or an oral exam. A few notes on missed term tests:

● To meet the minimum work requirement for this course, you must write the make-up if you missed the term test for a valid and documented medical reason; otherwise, a meaningful grade cannot be calculated for you to pass the course.

● Since the term test is online and all students receive the test link, if you write the test or look at the test questions, you forfeit your eligibility to write the make-up test. Therefore you must make your decision about whether you are well enough to write the test before the test has begun.

Missed Final Project: The final project (all 3 parts) must be completed in order to meet the minimum work requirement to pass the course, so no accommodations will be made for missing the final assessment. Students will be given ample time to complete the assessment and extensions in general will not be granted.


REGRADE REQUESTS

Regrade requests will be accepted for an assessment worth 5% or higher (i.e. not the participation or weekly quizzes). Regrade requests must provide a justification for where there exists a grading error and/or how the work meets the grading rubric. These justifications must further be backed up with concrete references to the course material. All regrade requests will be accepted through a form available on the Quercus course page and will be accepted no later than one week after the grade for that assessment is released. No regrade requests will be accepted by email. The instructor further reserves the right to re-evaluate the assessment in its entirety (i.e. grades can go up, down, or remain unchanged).