PLS 202: Introduction to Data Analytics

Summer 2021


Course Description

Data analytics is a growing field that draws on many disciplines, including Statistics, Computer Science, and Graphic Design. This course will provide an introduction to modern data analytics, with a focus on practical skills for social science research. The main goal of this course is to introduce R, a programming language designed specifically for statistics and data analysis. The course will cover basic programming concepts such as functions and data structures, as well as tools for data analysis and visualization. We will also cover methods for uncovering relationships in data, starting with simple cross-tabulations and correlations before moving on to more advanced topics, like regression and linear models, that appear in much of published scientific research.

This course is designed to be fun (at least to people like me!) and interactive. You will often run into problems and find yourself googling errors and searching Stack Overflow and other forums for ways to do things. This is good! This is how even experts spend much of their time. I always joke that writing code is 80% googling (and really, it’s not a joke).

Inclusion and diversity are core values of Michigan State University and my classroom. As Spartans, we are dedicated to respecting people of all backgrounds, beliefs, and identity status. I am committed to creating a safe, supportive, and welcoming environment where all students can pursue academic and personal success. We all deserve each other’s respect, support, recognition, and protection.

Maintaining a respectful and inclusive community requires vigilance. We must, therefore, all stand up against derogatory and discriminatory language or actions whenever we see them. It is essential that we all work together to foster an inclusive community where Spartans of all backgrounds can study, work, and thrive.


Required Materials

In order to complete this course, you will need a computer, internet access, and access to D2L. We will be using Desire2Learn (D2L) as our online course management software, which is found at: https://d2l.msu.edu. You will be able to access course lectures, readings, quizzes, assignments, and your grades on D2L.

Any problems you encounter with D2L should be reported to MSU’s Distance Learning Services, which is available 24 hours a day, 7 days a week. Their local number is 517-355-2345. Their toll-free number is 1-800-500-1554.

In addition to D2L, you need to install R (the language) and RStudio (a program for writing, testing, and publishing R code). To download R, go to https://cran.r-project.org/ and click the link for your operating system under the section titled “Download and Install R”. Then follow the instructions to install the version appropriate for your specific system version. To download RStudio, go to https://www.rstudio.com/products/rstudio/download/#download and click the link for your operating system. This will download an installer file. Once the installer is downloaded, double click it to run, and then follow the instructions to complete the installation. This will be covered in the first video lecture, as well.

Last, we’ll be using Slack so that you guys have a faster, more real-time way to interact with me and the TA. To join, click the following link: https://join.slack.com/t/pls202summ21sess1/shared_invite/zt-qgeszp7n-ks4cjGL8VTVCIyUL3bfVcA.

There are no required textbooks for this course, but there will generally be 1-2 assigned readings for each module. These are listed when relevant in the syllabus. If additional readings are posted on D2L, I will let you know by email.


Course Set Up

This course is broken down into five modules, each lasting a week. Each module typically includes 1-2 readings and 2 video lectures, and ends with a quiz on D2L and an assignment. For this summer session, all lectures for a particular module will be posted on Monday by 11:59pm EDT, and the corresponding module assignment and quiz must be submitted to D2L by 11:59pm EDT on the following Monday.

There will also be a final project due the last week of class. See the Semester Schedule at the end of the syllabus for the topics, readings, assignments, and due dates for each module.


Lectures

Each class will involve me giving a lecture or walkthrough of how to handle some tasks in R, but the lectures are also designed to be interactive. While this is an online course (thanks, COVID), lectures are set up so that you can follow along. They begin with a slideshow to introduce topics but quickly move to a screen recording of me writing code in RStudio. I expect you not simply to watch, but to work in RStudio alongside me, so that you have your own copy of the script by the time the lecture is over. At the end of each module, I will ask you to apply the tools and techniques that I have introduced. Learning to work in a statistical computing language like R involves doing more than listening.


Grading

Your final grade will be based on the following:

• Module Quizzes: 20%

• Assignments: 50%

• Final Project: 30%

The grading scale for the course is as follows:

• 4.0 (92%-100%)

• 3.5 (86%-91.9%)

• 3.0 (80%-85.9%)

• 2.5 (75%-79.9%)

• 2.0 (70%-74.9%)

• 1.5 (65%-69.9%)

• 1.0 (60%-64.9%)

• 0.0 (<60%)
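To make the arithmetic concrete, here is a rough sketch of how the weights and scale above combine into a final grade. It is written in Python purely for illustration (the course itself uses R). The function names and example scores are hypothetical, each component is assumed to already be a percentage from 0 to 100, and the code assumes each grade band starts at its listed minimum, which the scale does not spell out for scores such as 91.95%.

```python
# Weights from the syllabus: quizzes 20%, assignments 50%, final project 30%.
WEIGHTS = {"quizzes": 0.20, "assignments": 0.50, "final_project": 0.30}

# Minimum percentage for each grade on the 4.0 scale, highest band first.
# Assumption: a band runs from its listed minimum up to the next band's
# minimum, so 91.95% still counts as a 3.5.
SCALE = [
    (92.0, 4.0),
    (86.0, 3.5),
    (80.0, 3.0),
    (75.0, 2.5),
    (70.0, 2.0),
    (65.0, 1.5),
    (60.0, 1.0),
]


def weighted_percentage(scores: dict[str, float]) -> float:
    """Combine component percentages (0-100) using the course weights."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def grade_point(percentage: float) -> float:
    """Return the 4.0-scale grade for an overall course percentage."""
    for minimum, grade in SCALE:
        if percentage >= minimum:
            return grade
    return 0.0  # below 60%


# Hypothetical component scores, for illustration only.
example = {"quizzes": 90.0, "assignments": 88.0, "final_project": 84.0}
pct = weighted_percentage(example)
print(round(pct, 1), grade_point(pct))  # prints 87.2 3.5
```

Because assignments carry half of the weight, a few points on assignments move the final percentage more than the same points on quizzes.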


Assignments

At the end of each module, a homework assignment will be due. Assignments must be turned in on D2L as a PDF source document (I show you how to do this in the second lecture of Module 1). Your homework will be graded on the following criteria: Does the script run without errors? Does the program answer the question(s) given in the assignment? Does the program make correct use of the skills covered in the relevant course material?


Quizzes

Each module has a corresponding D2L quiz. The quiz asks questions based on the lectures from the module and any assigned reading from that same module. Each quiz contains around 10 questions. You are allowed two attempts to take each quiz, and your highest score will be saved.


Final Project

Rather than a final exam, the last week of class will be devoted to working on a final project. This project will ask you to apply what you have learned in class to a real-world application of data analysis. It will look a lot like an extended module assignment (think 1.5 to 2 assignments in one).


Course Policies

Course Communication

Course announcements will often be sent out via the email list provided by the registrar. It is your responsibility to make sure you can readily access any emails that are sent to your MSU email address. Announcements will also be posted in D2L when appropriate.


Instructor Availability

Questions may be sent to me or the teaching assistant (TA) by email. My email address and the TA’s are listed at the top of this syllabus. All emails will receive a response within 24 hours (typically sooner). Please include “PLS 202” in the subject line.


Late Work

Work that is not handed in by 11:59 PM EDT on the due date is late. Late assignments will receive a 10% penalty if submitted less than an hour late, a 20% penalty if 1-4 hours late, a 30% penalty if 4-12 hours late, and a 50% penalty if 12-24 hours late; you will receive no credit for anything submitted more than 24 hours past the due date. Given the pace of a summer session, no exceptions will be made for late assignments.
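The penalty tiers can be sketched as a small lookup function. This is an illustrative Python sketch with a hypothetical function name, not an official calculator. It returns the penalty as a fraction, and because the policy does not say which tier applies at exactly 1, 4, or 12 hours late, it assumes a boundary value falls into the later tier.

```python
def late_penalty(hours_late: float) -> float:
    """Return the penalty (as a fraction) for work submitted
    `hours_late` hours after the 11:59 PM EDT deadline.

    Assumption: a submission exactly on a tier boundary (1, 4, or 12
    hours) falls into the later tier; the syllabus does not specify this.
    """
    if hours_late <= 0:
        return 0.0   # on time
    if hours_late < 1:
        return 0.10  # less than an hour late
    if hours_late < 4:
        return 0.20  # 1-4 hours late
    if hours_late < 12:
        return 0.30  # 4-12 hours late
    if hours_late <= 24:
        return 0.50  # 12-24 hours late
    return 1.0       # more than 24 hours late: no credit


# A hypothetical submission 3 hours after the deadline
print(late_penalty(3))  # prints 0.2
```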


Grade Appeals

People make mistakes. If you think there is an error in your recorded grade, then you should email me as soon as possible.


Learning Needs

Any student who may need an accommodation because of a disability should contact MSU’s Resource Center for Persons with Disabilities (http://rcpd.msu.edu/) at 130 Bessey Hall within the first two weeks of class so that I receive the information needed for proper accommodation. If you have any questions, please feel free to contact them or ask me; all information and documentation will be kept confidential.


Final Caveat

This course hopefully will not deviate from what is written above, but the instructor reserves the right to modify anything within the syllabus as necessary to improve your learning experience. Any changes to the syllabus will be communicated via email and posted in D2L.


Semester Schedule

Find below a detailed overview of the semester, including the topics of lecture, assigned readings, and due dates for each module.


Introduction: May 17-21

Learning Goals

• Understand the setup and expectations of the class

• Download and install the R programming language

• Download and install the application RStudio

Readings

• “We are All Social Scientists Now”

• “History and Overview of R” (Chp 2 of Roger Peng’s R Programming for Data Science).

Note: Intro


Module 1: May 24-28

Learning Goals

• Learn the basics of R

• Create objects and vectors

• Create very simple plots

Readings

• Starting out in R (Chp 1 of Introduction to R by Paul Harrison)

• Wickham Style Guide

Quiz and Assignment Due May 31 by 11:59pm.


Module 2: May 31-June 4

Learning Goals

• Import data and summarize variables

• Subset variables and datasets

• Create new variables

• Deal with missing values

• Plot variables

Readings

• Data Types (Chp 2.4 of Introduction to Data Science by Rafael A. Irizarry)

Quiz and Assignment Due June 7 by 11:59pm.


Module 3: June 7-11

Learning Goals

• Understand measures of central tendency and dispersion

• Difference in means tests and correlations

• Tests for statistical significance

Readings

• Statistical Significance (Introduction to Psychology)

Quiz and Assignment Due June 14 by 11:59pm.


Module 4: June 14-18

Learning Goals

• Rules for effective data visualization

• Using the ggplot2 package to make publication-quality graphics

Readings

• A Brief Guide to Designing Effective Figures for the Scientific Paper

• Do’s and Don’ts for Effective Graphs

Quiz and Assignment Due June 21 by 11:59pm.


Module 5: June 21-25

Learning Goals

• Quickly make multiple calculations using loops

• Create new variables with ifelse statements

• Use the apply function

• Introduction to the tidyverse

Readings

• Intro to dplyr (Chp 3 of A Gradual Introduction to the Tidyverse by Ismay & Laderas)

Quiz and Assignment Due June 28 by 11:59pm.