Data Mining and Data Warehousing

MASY1-GC 3510-200

Fall 2022

Description

In an increasingly competitive information age, data mining and data warehousing are essential in business decision-making. This course teaches students concepts, methods and skills for working with data warehouses and mining data from these warehouses to optimize competitive business strategy. In this course, students develop the analytical thinking skills required to identify effective data warehousing strategies, such as when to outsource or in-source data services. Students also learn to Extract, Transform and Load data into data warehouses (the ETL process) and use the CRISP approach to data mining to extract vital information from data warehouses. The course also teaches students how to secure data and covers the ethical issues associated with the uses of data and data models for business decisions.

Prerequisites

1210 - Quantitative Models for Decision Makers

Learning Outcomes

At the conclusion of this course, students will be able to:

1. Translate business requirements into well-constructed, normalized conceptual and logical data models

2. Apply logical database design and the relational model

3. Apply the CRISP model to conduct successful data mining

4. Establish a successful ETL process to load a data warehouse

5. Write basic SQL statements including some advanced SQL features

6. Employ appropriate data governance principles to assure data quality and security

Communication Methods

Be sure to turn on your NYU Brightspace notifications and frequently check the “Announcements” section of the course site. This will be the primary method I use to communicate information critical to your success in the course. To contact me, send me an email. I will respond within 24 hours.

Credit students must use their NYU email to communicate. Non-degree students do not have NYU email addresses. Brightspace course mail supports student privacy and FERPA guidelines. The instructor will use the NYU email address to communicate with students. All email inquiries will be answered within 24 hours.

Students have the opportunity to add their pronouns, as well as the pronunciation of their names, into Albert. Students can have this information displayed to faculty in Albert, Brightspace, and other NYU systems. Students can also opt out of having their pronouns viewed by their instructors.

https://www.nyu.edu/students/student-information-and-resources/registration-records-and-graduation/forms-policies-procedures/change-of-student-information/pronouns-and-name-pronunciation.html

Structure | Method | Modality

There are 14 session topics in this course.

Active learning experiences and small group projects are key components of the course. Assignments, papers, and exams will be based on course materials (e.g., readings, videos), lectures, and class discussions. Course sessions will be conducted synchronously on NYU Zoom, which you can access from the course site in NYU Brightspace.

This course is Online (Sy) and will meet once a week on Monday, with assignments, announcements and emails being sent through Brightspace. Zoom is the remote instruction platform used at NYU. Students are expected to check email and/or Brightspace at least twice a week for announcements concerning assignments, class changes or cancellations, and other important information. The course will involve lecture/discussions/forum discussions as well as case studies. Two major papers/projects are required that will both be done on an individual basis.

Expectations

Learning Environment

You play an important role in creating and sustaining an intellectually rigorous and inclusive classroom culture. Respectful engagement, diverse thinking, and our lived experiences are central to this course and enrich our learning community.

Participation

You are integral to the learning experience in this class. Be prepared to actively contribute to class activities, group discussions, and work outside of class.

Assignments and Deadlines

Homework:

Homework assignments must be submitted within one week of the date assigned (unless otherwise instructed). Late submissions may be rejected altogether at the instructor’s discretion. All homework must be submitted to the appropriate assignment folder online.

Group/Team Project:

There will be a group/team class project. The project will draw on written, visual, and presentation skills and will be the culmination of the topics, concepts and competencies learned in this class. The group project grade will be based on:

Student level of participation in the team project.

Students will be assessed both as individuals and as part of the overall team.

Individual contribution will be assessed by identifying the components of the project that the student worked on and contributed to the overall project (for example, database creation, data preparation and load).

Group contribution will be assessed on overall project depth of content, write-up, and delivery. For the group assessment portion, all individuals within the group will receive the same grade. Fulfilment of all requirements stated for the project defined under “final project” on the course web site.

All groups have the same group assignment

All requirements for the group project are defined on the course web site.

Midterm Exam:

There will be a midterm exam. The exam will be an open book, open notes/internet style exam. The exam will test the student's acquisition of topics, concepts and competencies learned in this class up to mid-term.

Final Exam:

There will be a final exam. The exam will be an open book, open notes/internet style exam. The exam will test the student's acquisition of topics, concepts and competencies learned in this class. The final exam will only cover material covered in the second half of the term.

Course Technology Use

We will utilize multiple technologies to achieve the course goals. I expect you to use technology in ways that enhance the learning environment for all students. All class sessions require use of Zoom. All class sessions require use of technology (e.g., laptop, computer lab) for learning purposes.

IT Service Desk

(212)-998-3333

24 hours a day, 7 days a week

Email: AskIT@nyu.edu

Zoom Support

•    NYU Zoom Guide for Students

•    Make sure you are using NYU Zoom to log in for class

•    Check the NYU Zoom site often for updates. (To update Zoom, you can also open it from your desktop, click the menu, then “Check for Updates.”)

Brightspace Support

•    Log in to the Brightspace platform or visit the Student Training website.

•    Video on how to Navigate the Brightspace Learning Environment

Feedback and Viewing Grades

I will provide timely, meaningful feedback on all your work via our course site in NYU Brightspace. You can access your grades in the course site Gradebook.

Attendance

I expect you to attend all class sessions. Attendance will be taken into consideration when determining your final grade. Refer to the SPS Policies and Procedures page for additional information about attendance.

Excused absences are granted in cases of documented serious illness, family emergency, religious observance, or civic obligation. In the case of religious observance or civic obligation, this should be reported in advance. Unexcused absences from sessions may have a negative impact on a student’s final grade. Students are responsible for assignments given during any absence.

Each unexcused absence or late arrival may result in a student’s grade being lowered by a fraction of a grade. A student who has three unexcused absences may earn a Fail grade.

University Calendar Policy on Religious Holidays:

https://www.nyu.edu/about/policies-guidelines-compliance/policies-and-guidelines/university-calendar-policy-on-religious-holidays.html

Students who join the course during add/drop are responsible for ensuring that they identify what assignments and preparatory work they have missed and complete and submit those per the syllabus.

Textbooks and Course Materials

Required:

The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence Remastered Collection

Authors - Ralph Kimball, Margy Ross

Publisher - Wiley; 2nd edition (February 1, 2016)

ISBN - 978-1-119-21659-9 or ASIN - B01BEUOY4C

Students can purchase these items through the NYU Bookstore.

We will be using Oracle Data Modeler, MySQL Community Server database, and MySQL Workbench client for assignments and labs in this course. The software downloads below are free for educational use.

MySQL Community Server (Database): https://dev.mysql.com/downloads/mysql/

MySQL Workbench (Database client): https://dev.mysql.com/downloads/workbench/

Oracle SQL Developer Data Modeler: http://www.oracle.com/technetwork/developer-tools/datamodeler/overview/index.html

Recommended:

Data Mining: Concepts, Models, Methods, and Algorithms, 3rd Edition

Author - Mehmed Kantardzic

Publisher - Wiley-IEEE Press, 2019

ISBN - 978-1-119-51607-1

Grading | Assessment

Your grade in this course is based on your performance on multiple activities and assignments. Since all graded assignments are related directly to course objectives and learning outcomes, failure to complete any assignment will result in an unsatisfactory course grade. Please read all assignments carefully, follow instructions thoroughly, and proofread your written assignments before submitting them for a grade. Students may submit an assignment multiple times. The latest submission will be the one graded.

DESCRIPTION                     PERCENTAGE
Class Participation             14% (Attendance is a prerequisite to participation)
Homework                        16%
Team Project                    30% (15% for individual contribution, 15% for group contribution)
Midterm Exam                    20%
Final Exam                      20%
=====================
Total                           100%

See the Grades section of Academic Policies for the complete grading policy, including the letter grade conversion, and the criteria for a grade of incomplete, taking a course on a pass/fail basis, and withdrawing from a course.

Course Outline

Start/End Dates: 9/12/2022 - 12/12/2022 | Mondays

Time: 8:00 am - 10:35 am ET

No Class Date(s): Labor Day - Monday, September 5, 2022 & Monday, 10/10/2022 - Legislative Day

Special Notes: Legislative Monday: Classes will meet according to a Monday schedule on Tuesday, October 11, 2022

Session 1 - 09/12/22

Topic Description: Introduction to Data Warehousing

Introduction to Data Warehousing

Relationship of Data Mining and Data Warehousing

What is a Data Warehouse?

Data Warehousing ROI

DSS - Decision Support Systems

Operational vs. Analytical Systems

Evolution of DSS and Data Warehousing

OLTP - Online Transaction Processing

Characteristics of a Data Warehouse

What is a Data Mart? Creating a Data Mart

Data Comparison Chart

OLAP - Online Analytical Processing

Assignments: (due one week from today)

Reading: Chapter 1 & 2 (The Kimball Group Reader)

HW1: Individual Group Project Proposal

Session 2 - 09/19/22

Topic description: Planning and Building the Data Warehouse

Planning & Building the Data Warehouse

Sponsorship and Cost Justification

Project Prerequisites

Barriers, Challenges and Risks

Preparing for Implementation

Developing the Data Warehouse

SDLC Methodologies - Waterfall vs. RUP Approach

Planning & Project Management

Analysis

Implementation and Deployment

Operations

Assignments: (due one week from today)

Reading: Chapter 3 & 4 (The Kimball Group Reader)

HW2: Logical Data Model

Group Project: Week 3 – Project Proposal (2%)

Session 3 - 09/26/22

Topic description: Data Warehouse Design

Data Warehouse Design

Drivers for Multi-Dimensional Analysis

Limitations of Relational Models

The Data Cube

What is dimensional modeling?

Advantages of Dimensional Models

Logical and Physical Design

Data Normalization

Benefits and Drawbacks of Data Normalization

De-Normalizing of Data

Characteristics of a Data Warehouse

Assignments: (due one week from today)

Reading: Chapter 5 (The Kimball Group Reader)

HW3: Basic SQL

Session 4 - 10/03/22

Topic description: Data Warehouse Schemas

Data Warehouse Schemas

Dimensions and Dimension Tables

Facts and Fact Tables

The Star Schema

The Snowflake Schema

Degenerate and Junk Dimensions

The Data Warehouse Bus Architecture

Conformed Dimensions and Standard Facts

Data Granularity

Changing Dimensions

Assignments: (due one week from today)

Reading: Chapter 6 & 7 (The Kimball Group Reader)

HW4: Enhanced SQL

Group Project: Week 5 – Transactional Database (3%)

Session 5 - 10/11/22

Topic description: Components of a Data Warehouse

Components of a Data Warehouse

Source Systems, Staging Area, Presentation, Access Tools

Building the Data Matrix

The Four-Step Process

Multiple Fact Tables in a single Data Mart

Chain, Heterogeneous, Transaction/Snapshot & Aggregate Facts

Fact and Dimension Table Detail

Identifying Source for each Fact & Dimension

Mapping from Source to Target

Assignments: (due one week from today)

Reading: Chapter 8 & 9 (The Kimball Group Reader)

HW5: Physical Data Model

Session 6 - 10/17/22

Topic description: The ETL Process

The ETL Process

Extracting the Data into the Staging Area

The Challenge of Extracting from Disparate Platforms

Full vs. Incremental Extracts

Detecting Changes to Data

Transforming the Data

Complexity of Data Integration

Dealing with Missing & Dirty Data

Data Transformation Tasks

Loading the Data

Timing and Job Control of Data Loads

Assignments: (due one week from today)

Reading: Chapter 11 (The Kimball Group Reader)

Session 7 - 10/24/22

Topic description: Midterm Exam

Assignments: (due one week from today)

Group Project: Week 8 – Data Warehouse & ETL Process (5%)

Session 8 - 10/31/22

Topic description: Introduction to Data Visualization

Introduction to Data Visualization

Tableau Environment

Tableau connection to Data Warehouse

Assignments: (due one week from today)

Reading: Online web research and reading

HW6: Tableau Data Visualization

Session 9 - 11/07/22

Topic description: Introduction to Data Mining

Why Data Mining?

What Is Data Mining?

A Multi-Dimensional View of Data Mining

What Kind of Data Can Be Mined?

What Kinds of Patterns Can Be Mined?

What Technologies Are Used?

What Kinds of Applications Are Targeted?

Major Issues in Data Mining

Assignments: (due one week from today)

Reading: Chapter 1 & 2 (Data Mining: Concepts, Models, Methods, and Algorithms)

HW7: Tableau Lobbying

Session 10 - 11/14/22

Topic description: Getting to Know Your Data

Data Objects and Attribute Types

Basic Statistical Descriptions of Data

Data Visualization

Measuring Data Similarity and Dissimilarity

Assignments: (due one week from today)

Reading: Chapter 3 & 4 (Data Mining: Concepts, Models, Methods, and Algorithms)

HW8: Tableau Data Mining

Group Project: Week 11 – Report and Visualization (5%)

Session 11 - 11/21/22

Topic description: Data Preprocessing

Data Preprocessing: An Overview

Data Quality

Major Tasks in Data Preprocessing

Data Cleaning

Data Integration

Data Reduction

Data Transformation and Data Discretization

Assignments: (due one week from today)

Reading: Chapter 5 (Data Mining: Concepts, Models, Methods, and Algorithms)

HW9: Tableau Data Mining

Session 12 - 11/28/22

Topic description: Data Mining Techniques

Data Mining Techniques

Predictive Modeling

Classification, Regression, Similarity Matching, Co-occurrence Grouping

Clustering/Segmentation

Data Mining and Statistics Terminologies

Supervised vs. Unsupervised

Data Mining Statistical Techniques

Clustering, Segmentation and Nearest Neighbor Techniques

Keys to commercial success of Data Mining

Assignments: (due one week from today)

Reading: Chapter 6, 9 (Data Mining: Concepts, Models, Methods, and Algorithms)

HW10: Tableau Data Mining

Group Project: Week 13 – Final Presentation (15%)

Session 13 - 12/05/22

Topic description: Group Presentations

Group Presentations

Group Data Warehouse Project Due

Session 14 - 12/12/22

Topic description: Final Day

Final Exam

NOTES:

The syllabus may be modified to better meet the needs of students and to achieve the learning outcomes.

The School of Professional Studies (SPS) and its faculty celebrate and are committed to inclusion, diversity, belonging, equity, and accessibility (IDBEA), and seek to embody the IDBEA values. The School of Professional Studies (SPS), its faculty, staff, and students are committed to creating a mutually respectful and safe environment (from the SPS IDBEA Committee).

New York University School of Professional Studies Policies

1. Policies - You are responsible for reading, understanding, and complying with University Policies and Guidelines, NYU SPS Policies and Procedures, and Student Affairs and Reporting.

2. Learning/Academic Accommodations - New York University is committed to providing equal educational opportunity and participation for students who disclose their dis/ability to the Moses Center for Student Accessibility. If you are interested in applying for academic accommodations, contact the Moses Center as early as possible in the semester. If you already receive accommodations through the Moses Center, request your accommodation letters through the Moses Center Portal as soon as possible ([email protected] | 212-998-4980).

3. Health and Wellness - To access the University's extensive health and mental health resources, contact the NYU Wellness Exchange. You can call its private hotline (212-443-9999), available 24 hours a day, seven days a week, to reach out to a professional who can help to address day-to-day challenges as well as other health-related concerns.

4. Student Support Resources - There are a range of resources at SPS and NYU to support your learning and professional growth. For a complete list of resources and services available to SPS students, visit the NYU SPS Office of Student Affairs site.

5. Religious Observance - As a nonsectarian, inclusive institution, NYU policy permits members of any religious group to absent themselves from classes without penalty when required for compliance with their religious obligations. Refer to the University Calendar Policy on Religious Holidays for the complete policy.

6. Academic Integrity and Plagiarism - You are expected to be honest and ethical in all academic work. Moreover, you are expected to demonstrate how what you have learned incorporates an understanding of the research and expertise of scholars and other appropriate experts; and thus recognizing others' published work or teachings, whether that of authors, lecturers, or one's peers, is a required practice in all academic projects.

Plagiarism involves borrowing or using information from other sources without proper and full credit. You are subject to disciplinary action for offenses that include, but are not limited to, cheating, plagiarism, forgery or unauthorized use of documents, and false forms of identification.

Turnitin, an originality detection service in NYU Brightspace, may be used in this course to check your work for plagiarism.

Read more about academic integrity policies at the NYU School of Professional Studies on the Academic Policies for NYU SPS Students page.

7. Use of Third-Party Tools - During this class, you may be required to use non-NYU apps/platforms/software as a part of course studies, and thus, will be required to agree to the “Terms of Use” (TOU) associated with such apps/platforms/software.

These services may require you to create an account but you can use a pseudonym (which may not identify you to the public community, but which may still identify you by IP address to the company and companies with whom it shares data).

You should carefully read those terms of use regarding the impact on your privacy rights and intellectual property rights. If you have any questions regarding those terms of use or the impact on the class, you are encouraged to ask the instructor prior to the add/drop deadline.