MS5318: Predictive Analytics with Excel & R

Semester B, 2022-2023

Objectives: This course emphasizes the practical use of methodologies and tools commonly used to make predictions from data. It begins with fundamental methods of statistical analysis (e.g. inference, simple regression), then adds both breadth (e.g. logistic regression) and depth (e.g. model selection) to the use of regression to find optimal predictions for business forecasting. You will learn how to build predictive models with data sets in various structures (e.g. quantitative or categorical responses and predictors). You will understand the trade-off between over-predicting and under-predicting, and the tuning schemes used to find more predictive models. You will practice applying the methods to data-based business decision problems (e.g. optimal pricing, fraud detection) through examples and case studies. You will also pick up the skills of using Excel and R to manipulate and visualize data and to generate predictions. The methods and applications introduced in this course are part of the “tool kit” expected of professionals in business analytics.

Note: This course does not dwell on details of computation or theoretical derivation. Its main focus is on applying predictive methods to analyze data and on interpreting the data and the results of the analysis. No prior statistical knowledge is required, and you do not need to know Excel or R from the outset.

Main References: There is no textbook that exactly matches the content of this course and the style in which it will be taught. The following are recommended textbooks.

1. Statistics for Business:  Decision Making and Analysis, Second Edition, by Robert Stine and Dean Foster.

2. Business Statistics for Competitive Advantage with Excel 2013, Second Edition, by Cynthia Fraser.

3. Practical Regression and Anova using R (online book), by Julian J. Faraway.

4. Data Mining and Business Analytics with R, by Johannes Ledolter.

It is strongly recommended that you utilize lectures and class notes.  Students are expected to take notes during lecture or to get notes from another student in the course.

Software: Excel and R (Required)

1. We will be using Microsoft Excel with the Analysis ToolPak add-in during the first half of the course. (PC - Excel 2013 or above; Mac - Excel 2016 or above.)

2. During the second half of the course, we will be using R for predictive modeling.

Prerequisites: This course assumes no prior knowledge of statistics. That said, we advise that you have taken an introductory-level course in statistics, probability, or data analysis beforehand (e.g. you know what variance is).

Grading:  Homework (25%), In-class Test (20%), Final (30%), Group Project (15%), Classroom Norms (10%)

Homework:

• There will be about 4 short assignments. You may work in groups. Groups can be different for each assignment.

• Each assignment will be handed out in class and/or posted on Canvas and will usually be due two weeks later.

• Late homework will not be accepted. If you know that your assignment will be late due to very special circumstances, please contact me in advance; extensions may be granted for valid reasons.

• Assignments allow you to keep current with the course and practice the methods you have learned, and they should help prepare you for exam problems. You are encouraged to seek help from the instructor if you have questions. You may also work together and help each other.

• Scores for assignments are finalized one week after the graded copies are returned. Thereafter there will be no changes and no re-grading. Do not delay checking your graded homework until the end of the semester.

• A missing assignment will receive a score of zero.

Exams:

• Both the in-class test and the final exam are computer-based.

• Exams will include both qualitative and quantitative questions.

Project:

• There will be one required group project.

• The project will be a data analysis. Your group is responsible for choosing the topic of the project, collecting relevant data, and selecting analysis methods.

• Each group will give a presentation at the end of the semester.

Classroom Norms:

• Attendance is necessary and expected although it is not a formal component of your grade.  This is a fast-paced course. Once you miss a class and fall behind, it is not easy to catch up. For your own benefit, I urge you to attend every class. If you miss a meeting, it is your responsibility to obtain notes from a fellow student. Office hours are not meant for individual tutoring for classes missed.

• Be on time. Late entry or reentry is a disruption to your classmates and the instructor.

• No private conversations in class. Non-class use of laptops, phones and tablets is strictly prohibited.

The general principles of conduct in this class are: Fairness, Respect and Consideration.

Emergencies and Difficulties: Please make an effort to communicate with me if you find yourself in serious difficulty, such as family emergencies, health problems, or demands from jobs you are holding to finance your studies. It is possible to obtain extensions of deadlines, but your issues should rise to the level of serious difficulty, not just inconvenience. The final decision will be made based on the instructor’s judgment.

Tentative Course Outline:

Lecture   Date     Topic

Part 1 - Predictive Data Analytics with Excel

1         Jan-09   Introduction; Looking at data distributions

2         Jan-16   Looking at data distributions with Excel; Sampling design and sampling distributions

-         Jan-23   Lunar New Year Break (no classes)

3         Jan-30   Hypothesis testing

4         Feb-06   Analysis of variance; Test of independence

5         Feb-13   Test of goodness of fit; Simple linear regression; Midterm Review

6         Feb-20   Midterm Exam; Project description

Part 2 - Predictive Data Analytics with R

7         Feb-27   R basics; Data manipulation with R

8         Mar-06   Multiple Regression; Multiple Regression Diagnostics

9         Mar-13   Fitting Curves; Categorical predictors

10        Mar-20   Categorical predictors

11        Mar-27   Two way ANOVA; Variable selection

12        Apr-03   Logistic regression

13        Apr-10   Group Presentation (holiday; not enough time this year)