MET AD 699
Data Mining for Business Analytics

Course Description

Enterprises, organizations, and individuals create, collect, and use massive amounts of structured and unstructured data to convert information into knowledge, to improve the quality and efficiency of their decision-making, and to better position themselves in a highly competitive marketplace. Data mining is the process of finding, extracting, visualizing, and reporting useful information and insights from both small and large datasets with the help of sophisticated data analysis methods. Students in this course will study the fundamental principles and techniques of data mining. They will learn how to apply advanced models and software applications for data mining.

Prerequisites: AD571, edX-based pre-analytics laboratory ADR100 (in some cases, AD699 may be taken concurrently with AD571)

Course Content, Learning Objectives and Outcomes

This course enables students to develop experience in the following areas:

1. Theoretical and practical understanding of core data mining concepts, techniques, and business applications

2. Systematic approach to framing and solving business analytics problems with the help of data mining methods and techniques

3. Ability to identify the right data mining tools and techniques for various business analytics problems

4. Hands-on experience in using the most popular business analytics and data mining tools and preparation for applying for job positions where familiarity with those tools is required

Course Norms

Throughout our journey together in AD699, I will ask you to adhere to the following set of course norms:

Assume the best: Some things in this course will be simple; others will be difficult.  Rest assured that nothing is intentionally designed to trick you.  I’m not perfect and I can make mistakes in haste.  If I post a file that says, “Assignment #2 prompt” and it’s actually my grocery shopping list, then please assume that I have made a careless error, and not that it’s part of some elaborate scheme to confuse students.

Monday Blackboard Announcements: Every Monday, I’ll make an announcement in Blackboard that will include a few bullet points that mention the topics that we’ll explore that week, along with upcoming due dates.

We’ll Always Have a Break in the Middle: Our class period is long.  It will never consist solely of me lecturing from the front of the room.   Also, we will always take a break of approximately 10 minutes, typically about an hour after we start.

We’ll Always Start on Time.  Slides will always be posted prior to start time: Class will always begin at exactly the official start time.  Slides for each class will be posted prior to the start, so that you can follow along on your laptop if you wish to.

< 24 hours turnaround on e-mails, homework scores in-between due dates: I will never go more than 24 hours without checking my BU e-mail.  During the semester, I will respond to all e-mails from current students in less than 24 hours.  I will sometimes respond much faster than this, but please allow up to 24 hours.  All submitted homework will be graded with comments prior to the next assignment’s submission deadline.  Homework might be graded in batches of 2 or 3, so if your friend’s homework was graded yesterday but yours still isn’t, there is no need to panic.

Effort matters: I realize that for many students, AD699 requires some steps outside of the “comfort zone.”  Students who maintain a positive attitude and who put forth a strong effort tend to do very well in this course, regardless of what their knowledge level was on Day 1.

Two things that might not be your fault but shouldn’t be showstoppers -- attendance & the book:  Life happens.  Between job interviews, illnesses, family events, etc. you might miss a class.  If you do, you can review the material by checking the slides on Blackboard.  You can seek me out for extra help with assignments.  “I missed that class” is not a valid excuse to simply not complete an assignment.  The same applies to the book.  Owning a copy of the textbook is a course requirement. How you fulfill that requirement is up to you.  However, “I don’t have the book” is a weak excuse for not being able to complete an assignment, especially because the homework assignments are not directly based on the book material.

No AD699 solution will rely on domain expertise:  We will use many different types of datasets in AD699, including material related to sports, finance, entertainment, and other topics.  The datasets are used to illustrate important concepts.  You will not need to possess arcane knowledge about sports statistics, finance, real estate, or any other topic to complete an assignment in this course.

If you see a msitake? Let me know.  Remember rule #1.  Every iteration of AD699 includes material that has never been included previously.

Murphy’s Law: Things can always go wrong, and sometimes when you least expect them to.  My computer could decide on a forced Windows update 2 minutes prior to class starting.  The projector bulb in the room could burn out just before class.  Some of the power outlets in the room might not work.  If/when any of these things occur?  We won’t miss a beat -- we can always adjust, adapt, and overcome.

Course Materials

REQUIRED TEXT

Galit Shmueli et al.: Data Mining for Business Analytics: Concepts, Techniques, and Applications in R, Wiley 2018, Hardback ISBN: 978-1-118-87936-8; e-Book ISBN: 978-1-118-87933-7.

Wickham, Hadley and Garrett Grolemund: R for Data Science, O’Reilly January 2017.  Free at http://r4ds.had.co.nz/

We will use the R for Data Science text for some in-class coding exercises.  If you prefer to learn from a physical book, I recommend that you buy a copy.  However, the free online version is identical to the paperback version.

SOFTWARE

R, version 4.3.1 (or any other version), RStudio

VIRTUAL LABORATORIES

For directions to get free remote access to our BU MET Virtual Labs, please visit http://metvlab.cloud.com

Grading Structure

Your performance in the course will be graded in the following areas:

Attendance, Participation, and Professionalism: 10%
Quizzes (3 total): 50%
Individual Assignments (5 total): 20%
Group Project (Written Submission): 15%
Final Presentation: 5%

Additional details for each grading component are provided below:

Assignments: Assignments will be graded based on a combination of accuracy of the analysis and quality of the report, with most of the weight being placed on the student’s ability to properly interpret the results.  More specific information for the format and the contents of the assignments is available on the course Blackboard page.

Quizzes: Each of the three quizzes will consist of 15 questions.  The quizzes will be completed in class, during a 60-minute block of time.  Quizzes will be open-note and open book.

Attendance, Participation, and Professionalism:  This is a 10-point grading component.  More will be said about this in class.  There is very little dispersion in this category.  No student will receive less than an 8 in this category without first being notified by the instructor.

Team Project: The team project will enable students to apply many of the data science tools and techniques covered in the course.  Students will work in teams on this project, which will involve a real-world dataset. More information about this project will be made available on Blackboard.

Final Presentation: Each team will deliver a 15-minute presentation during our last class session.  More specific information for the format and the content related to this presentation can be found in the Project folder on the course Blackboard site.  This folder will be made available several weeks after the start of the semester.

The overall grading distribution for the course will lead to a class average of approximately 3.4.  No student who regularly attends class, completes all assignments, takes all quizzes, and participates in the group project will earn any grade that could jeopardize his/her standing at BU MET.   More will be said about this grading policy during class.

Submission Format

Assignments may be submitted in any format that clearly displays the process that the student used, the answers found, and the interpretation statements for the questions that ask for explanation.  Students may submit assignments using R Markdown, but this is not a requirement.  More will be said about assignment format during class.

Timely Presentation of Materials Due

All work requests from the instructor (quizzes, assignments, contributions in the teamwork, etc.) have due dates. A due date is the last date on which the stated material may be turned in, not the target date for turning it in, so it is a good idea to set an earlier personal completion date to avoid difficulties.  With this caution, please note that we are not inclined to accept late work; if late work is accepted, it will be only after considerable weighing of the rationale, and with a penalty.

Academic Integrity

Students are expected to adhere to the highest standards of honesty and integrity for this course.  University policy on academic integrity will be followed to the fullest. Students are encouraged to review the university policy on academic integrity including a detailed listing of activities warranting sanction. Anyone who fails to adhere to these requirements and/or otherwise engages in unethical behavior (including cheating on exams, false representation of self or one’s work efforts, use of unauthorized aids, etc.) will be referred to university administration for further action.  In particular, the university's policy and consequences regarding plagiarism are clearly described in the official Boston University documents and will be enforced without any compromises.

Request for Accommodations

If you have a disability and will be requesting accommodations for this course, please inform the instructor early in the semester.  Advance notice and appropriate documentation are required for accommodations.

Satisfaction of Department-Wide Goals

# | Goals | Category | Compliance
1 | Critical and innovative thinking | Substantial | With the help of the assignments and individual exercises, students are expected to learn and choose the appropriate data mining model for problem solving and decision making.
2 | International perspective | Some | The examples discussed in some data mining approaches and modules are applicable to both national and international organizations.
3 | Communication skills | Substantial | Students are expected to participate in weekly group discussions, which support the development of communication skills.
4 | Decision making | Substantial | Quantitative decision making is emphasized throughout the course.
5 | Technical tools & techniques | Substantial | The course introduces a variety of tools and techniques including MS Excel-based Frontline Analytic Solver Platform and R One.
6 | Research skills & scholarship | Substantial | The course asks students to complete several assignments.  In each assignment, students are asked to construct data mining models and apply decision support tools.
7 | Professional ethics & standards | Substantial | The importance of professional ethics and standards is emphasized throughout the weekly discussions.
8 | Creative & effective leaders | Substantial | Understanding data mining and other business analytics models and using them for decision-making is critical for becoming creative and effective leaders.


Course Outline

Class Date | Lectures & Topics | Readings (from Shmueli text)
23JAN | Topic 1: Course Intro; Identify Opportunities & Collect Data; Data Exploration in R | Ch. 1, 2
30JAN | Topic 2: Data Exploration & Analysis; Data Visualization, Part I | Ch. 2, 3
06FEB | Topic 3: Data Visualization, Part II; Simple Linear Regression | Ch. 3, 6
13FEB | Topic 4: Multiple Linear Regression; Model Evaluation | Ch. 5, 6
20FEB | Topic 5: k-Nearest Neighbors; Measuring Distance Between Records | Ch. 7
27FEB | Topic 6: Naive Bayes | Ch. 8
05MAR | Topic 7: Classification Trees | Ch. 9
12MAR | No Class – Spring Break |
19MAR | Topic 8: Association Rules | Ch. 14
26MAR | Topic 9: Clustering | Ch. 15
02APR | Topic 10: Text Mining | Ch. 20
09APR | Topic 11: Deep Learning in R | Ch. 11
16APR | Topic 12: Social Network Analytics | Ch. 19
23APR | Topic 13: Summary, Lessons Learned, Next Steps |
30APR | Topic 14: Semester Presentations & Lessons Learned |

Team Project Write-Ups Submitted by 11:59 p.m. on 29APR



Individual Assignment Due Dates:

Assignment #1:  Due by 11:59 p.m. Friday, 23FEB
Assignment #2:  Due by 11:59 p.m. Friday, 01MAR
Assignment #3:  Due by 11:59 p.m. Friday, 22MAR
Assignment #4:  Due by 11:59 p.m. Friday, 05APR
Assignment #5:  Due by 11:59 p.m. Friday, 19APR

Quiz Dates:

Quiz #1:  Tuesday, 20FEB
Quiz #2:  Tuesday, 26MAR
Quiz #3:  Tuesday, 23APR

All quizzes are open-note/open-book.