Data Mining and Data Warehousing

MASY1-GC 3510-100 | Summer 2023 | 7/10/23 – 8/16/23 | 3 Credits

Modality: In-Person

Course Site URL: https://brightspace.nyu.edu

General Course Information

Name/Title: Amit Patel, Adjunct Instructor, He/Him/His

NYU Email: [email protected]

Class Meeting Schedule: 7/10/23 – 8/16/23 | Mondays & Wednesdays / 6:20pm - 9:20pm

Class Location: TBD

Office Hours: Tuesday 7:30 PM via Zoom. Please email me at least a day in advance to schedule the Zoom meeting.

Description

In an increasingly competitive information age, data mining and data warehousing are essential in business decision-making. This course teaches students concepts, methods, and skills for working with data warehouses and mining data from these warehouses to optimize competitive business strategy. In this course, students develop the analytical thinking skills required to identify effective data warehousing strategies, such as when to outsource or in-source data services. Students also learn to Extract, Transform and Load data into data warehouses (the ETL process) and use the CRISP approach to data mining to extract vital information from data warehouses. The course also teaches students how to secure data and covers the ethical issues associated with the uses of data and data models for business decisions.

Prerequisites

1210 - Quantitative Models for Decision Makers

Learning Outcomes

At the conclusion of this course, students will be able to:

•    Translate business requirements into well-constructed, normalized conceptual and logical data models

•    Apply logical database design and the relational model

•    Apply the CRISP model to conduct successful data mining

•    Establish a successful ETL process to load a data warehouse

•    Write basic SQL statements including some advanced SQL features

•   Employ appropriate data governance principles to assure data quality and security

Communication Methods

Be sure to turn on your NYU Brightspace notifications and frequently check the “Announcements” section of the course site. This will be the primary method I use to communicate information critical to your success in the course. To contact me, send me an email. I will respond within 24 hours.

Credit students must use their NYU email to communicate. Non-degree students do not have NYU email addresses. Brightspace course mail supports student privacy and FERPA guidelines. The instructor will use the NYU email address to communicate with students. All email inquiries will be answered within 24 hours.

Structure | Method | Modality

There are 12 session topics in this course.

Active learning experiences and small group projects are key components of the course. Assignments, papers, and exams will be based on course materials (e.g., readings, videos), lectures, and class discussions. Course sessions will be conducted synchronously on NYU Zoom, which you can access from the course site in NYU Brightspace.

This course is in-person and will meet twice a week on Monday and Wednesday, with assignments, announcements and emails being sent through Brightspace. Students are expected to check email and/or Brightspace at least twice a week for announcements concerning assignments, class changes or cancellations, and other important information. The course will involve lecture/discussions/forum discussions as well as case studies. Two major papers/projects are required that will both be done on an individual basis.

Expectations

Learning Environment

You play an important role in creating and sustaining an intellectually rigorous and inclusive classroom culture. Respectful engagement, diverse thinking, and our lived experiences are central to this course and enrich our learning community.

Participation

You are integral to the learning experience in this class. Be prepared to actively contribute to class activities, group discussions, and work outside of class.

Assignments and Deadlines

Homework:

Homework assignments must be submitted on time, within 1 week of the date assigned (unless otherwise instructed). Late submissions may be rejected altogether at the instructor’s discretion. All homework must be submitted to the appropriate assignment folder online.

Group/Team Project:

There will be a group/team class project. The project will be a culmination of written, visual, and proper presentation skills, and of the topics, concepts, and competencies learned in this class. The group project grade will be based on:

•    Student level of participation in the team project. Students will be assessed both as individuals and as part of the overall team.

•    Individual contribution, assessed by identifying the components of the project that the student worked on and contributed to the overall project (for example, database creation, data preparation and load, etc.).

•    Group contribution, assessed on overall project depth of content, write-up, and delivery. For the group assessment portion, all individuals within the group will receive the same grade.

•    Fulfilment of all requirements stated for the project, defined under “final project” on the course web site.

All groups have the same group assignment. All requirements for the group project are defined on the course web site.

Midterm Exam:

There will be a midterm exam. The exam will be an open book, open notes/internet style exam. The exam will test the student's acquisition of topics, concepts and competencies learned in this class up to mid-term.

Final Exam:

There will be a final exam. The exam will be an open book, open notes/internet style exam. The exam will test the student's acquisition of topics, concepts and competencies learned in this class. The final exam will only cover material covered in the second half of the term.

Course Technology Use

We will utilize multiple technologies to achieve the course goals. I expect you to use technology in ways that enhance the learning environment for all students. All class sessions require use of Zoom. All class sessions require use of technology (e.g., laptop, computer lab) for learning purposes.

Feedback and Viewing Grades

I will provide timely meaningful feedback on all your work via our course site in NYU Brightspace. You can access your grades on the course site Gradebook.

Attendance

I expect you to attend all class sessions. Attendance will be taken into consideration when determining your final grade. Refer to the SPS Policies and Procedures page for additional information about attendance.

Excused absences are granted in cases of documented serious illness, family emergency, religious observance, or civic obligation. In the case of religious observance or civic obligation, this should be reported in advance. Unexcused absences from sessions may have a negative impact on a student’s final grade. Students are responsible for assignments given during any absence.

Each unexcused absence or being late may result in a student’s grade being lowered by a fraction of a grade. A student who has three unexcused absences may earn a Fail grade.

University Calendar Policy on Religious Holidays:

https://www.nyu.edu/about/policies-guidelines-compliance/policies-and-guidelines/university-calendar-policy-on-religious-holidays.html

Students who join the course during add/drop are responsible for ensuring that they identify what assignments and preparatory work they have missed and complete and submit those per the syllabus.

Textbooks and Course Materials

Required:

The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence Remastered Collection

Authors - Ralph Kimball, Margy Ross

Publisher - Wiley; 2nd edition (February 1, 2016)

ISBN - 978-1-119-21659-9 or ASIN: B01BEUOY4C

Students can purchase these items through the NYU Bookstore.

We will be using Oracle Data Modeler, MySQL Community Server database, and MySQL Workbench client for assignments and labs in this course. The software downloads below are free for educational use.

MySQL Community Server (Database): https://dev.mysql.com/downloads/mysql/

MySQL Workbench (Database client): https://dev.mysql.com/downloads/workbench/

Oracle SQL Developer Data Modeler: http://www.oracle.com/technetwork/developer-tools/datamodeler/overview/index.html

Recommended:

Data Mining: Concepts, Models, Methods, and Algorithms, 3rd Edition

Authors - Mehmed Kantardzic

Publisher - Wiley-IEEE Press, 2019

ISBN – 978-1-119-51607-1

Grading | Assessment

Your grade in this course is based on your performance on multiple activities and assignments. Since all graded assignments are related directly to course objectives and learning outcomes, failure to complete any assignment will result in an unsatisfactory course grade. Please carefully read all assignments, follow instructions thoroughly, and proofread your written assignments before submitting them for a grade. Students may make multiple submissions for the same assignment; the latest submission will be the one graded.

DESCRIPTION                PERCENTAGE

Class Participation -      14% (Attendance is a prerequisite to participation)

Homework -                 16%

Team Project -

Midterm Exam -

Final Exam -

=====================

Total                      100%

See the Grades section of Academic Policies for the complete grading policy, including the letter grade conversion, and the criteria for a grade of incomplete, taking a course on a pass/fail basis, and withdrawing from a course.

Course Outline

Start/End Dates: 7/10/23 – 8/16/23 | Mondays & Wednesdays

Time: 6:20 pm - 9:20 pm

Summer Session Two: 6W2

No Class Date(s): N/A

Special Notes: N/A

Number of Sessions: 12

Session 1 - 07/10/23

Topic Description: Introduction to Data Warehousing

Introduction to Data Warehousing

Relationship of Data Mining and Data Warehousing

What is a Data Warehouse?

Data Warehousing ROI

DSS - Decision Support Systems

Operational vs. Analytical Systems

Evolution of DSS and Data Warehousing

OLTP - Online Transaction Processing

Characteristics of a Data Warehouse

What is Data Mart? Creating a Data Mart

Data Comparison Chart

OLAP - Online Analytical Processing

Assignments: (due next Wednesday)

Reading: Chapter 1 & 2 (The Kimball Group Reader)

HW1: Individual Group Project Proposal

Session 2 - 07/12/23

Topic Description: Planning and Building the Data Warehouse

Planning & Building the Data Warehouse

Sponsorship and Cost Justification

Project Prerequisites

Barriers, Challenges and Risks

Preparing for Implementation

Developing the Data Warehouse

SDLC Methodologies - Waterfall vs. RUP Approach

Planning & Project Management

Analysis

Implementation and Deployment

Operations

Assignments: (due next Monday)

Reading: Chapter 3 & 4 (The Kimball Group Reader)

HW2: Logical Data Model

Group Project: Week 3 – Project Proposal (2%)

Session 3 - 07/17/23

Topic Description: Data Warehouse Design

Data Warehouse Design

Drivers for Multi-Dimensional Analysis

Limitations of Relational Models

The Data Cube

What is dimensional modeling?

Advantages of Dimensional Models

Logical and Physical Design

Data Normalization

Benefits and Drawbacks of Data Normalization

De-Normalizing of Data

Characteristics of a Data Warehouse

Assignments: (due next Wednesday)

Reading: Chapter 5 (The Kimball Group Reader)

HW3: Basic SQL

Session 4 - 07/19/23

Topic Description: Data Warehouse Schemas

Data Warehouse Schemas

Dimensions and Dimension Tables

Facts and Fact Tables

The Star Schema

The Snowflake Schema

Degenerate and Junk Dimensions

The Data Warehouse Bus Architecture

Conformed Dimensions and Standard Facts

Data Granularity

Changing Dimensions

Assignments: (due next Monday)

Reading: Chapter 6 & 7 (The Kimball Group Reader)

HW4: Enhanced SQL

Group Project: Week 5 – Transactional Database (3%)

Session 5 - 07/24/23

Topic Description: Components of a Data Warehouse

Components of a Data Warehouse

Source Systems, Staging Area, Presentation, Access Tools

Building the Data Matrix

The Four-Step Process

Multiple Fact Tables in a single Data Mart

Chain, Heterogeneous, Transaction/Snapshot & Aggregate Facts

Fact and Dimension Table Detail

Identifying Source for each Fact & Dimension

Mapping from Source to Target

Assignments: (due next Wednesday)

Reading: Chapter 8 & 9 (The Kimball Group Reader)

HW5: Physical Data Model

Session 6 - 07/26/23

Topic Description: The ETL Process

The ETL Process

Extracting the Data into the Staging Area

The Challenge of Extracting from Disparate Platforms

Full vs. Incremental Extracts

Detecting Changes to Data

Transforming the Data

Complexity of Data Integration

Dealing with Missing & Dirty Data

Data Transformation Tasks

Loading the Data

Timing and Job Control of Data Loads

Assignments: (due next Monday)

Reading: Chapter 11 (The Kimball Group Reader)

Session 7 - 07/31/23

Topic Description: Midterm Exam

Assignments: (due next Wednesday)

Group Project: Week 8 – Data Warehouse & ETL Process (5%)

Session 8 - 08/02/23

Topic Description: Introduction to Data Visualization

Introduction to Data Visualization

Tableau Environment

Tableau connection to Data Warehouse

Assignments: (due next Monday)

Reading: Online web research and reading

HW6: Tableau Data Visualization

Session 9 - 08/07/23

Topic Description: Introduction to Data Mining

Why Data Mining?

What Is Data Mining?

A Multi-Dimensional View of Data Mining

What Kind of Data Can Be Mined?

What Kinds of Patterns Can Be Mined?

What Technologies Are Used?

What Kinds of Applications Are Targeted?

Major Issues in Data Mining

Assignments: (due next Wednesday)

Reading: Chapter 1 & 2 (Data Mining: Concepts, Models, Methods, and Algorithms)

HW7: Tableau Lobbying

Session 10 - 08/09/23

Topic Description: Getting to Know Your Data

Data Objects and Attribute Types

Basic Statistical Descriptions of Data

Data Visualization

Measuring Data Similarity and Dissimilarity

Topic Description: Data Preprocessing

Data Preprocessing: An Overview

Data Quality

Major Tasks in Data Preprocessing

Data Cleaning

Data Integration

Data Reduction

Data Transformation and Data Discretization

Assignments: (due next Monday)

Reading: Chapter 3, 4, 5 (Data Mining: Concepts, Models, Methods, and Algorithms)

HW8: Tableau Data Mining

HW9: Tableau Data Mining

Group Project: Week 11 – Report and Visualization (5%)

Session 11 - 08/14/23

Topic Description: Data Mining Techniques

Data Mining Techniques

Predictive Modeling

Classification, Regression, Similarity Matching, Co-occurrence Grouping

Clustering/Segmentation

Data Mining and Statistics Terminologies

Supervised vs. Unsupervised

Data Mining Statistical Techniques

Clustering, Segmentation and Nearest Neighbor Techniques

Keys to commercial success of Data Mining

Assignments: (due next Wednesday)

Reading: Chapter 6, 9 (Data Mining: Concepts, Models, Methods, and Algorithms)

HW10: Tableau Data Mining

Group Project: Week 13 – Final Presentation (15%)

Session 12 - 08/16/23

Topic Description: Final Day

Final Exam 

NOTES:

The syllabus may be modified to better meet the needs of students and to achieve the learning outcomes.

The School of Professional Studies (SPS) and its faculty celebrate and are committed to inclusion, diversity, belonging, equity, and accessibility (IDBEA), and seek to embody the IDBEA values. The School of Professional Studies (SPS), its faculty, staff, and students are committed to creating a mutually respectful and safe environment (from the SPS IDBEA Committee).

New York University School of Professional Studies Policies

1. Policies - You are responsible for reading, understanding, and complying with University Policies and Guidelines, NYU SPS Policies and Procedures, and Student Affairs and Reporting.

2. Learning/Academic Accommodations - New York University is committed to providing equal educational opportunity and participation for students who disclose their dis/ability to the Moses Center for Student Accessibility. If you are interested in applying for academic accommodations, contact the Moses Center as early as possible in the semester. If you already receive accommodations through the Moses Center, request your accommodation letters through the Moses Center Portal as soon as possible ([email protected] | 212-998-4980).

3. Health and Wellness - To access the University's extensive health and mental health resources, contact the NYU Wellness Exchange. You can call its private hotline (212-443-9999), available 24 hours a day, seven days a week, to reach out to a professional who can help to address day-to-day challenges as well as other health-related concerns.

4. Student Support Resources - There are a range of resources at SPS and NYU to support your learning and professional growth. For a complete list of resources and services available to SPS students, visit the NYU SPS Office of Student Affairs site.

5. Religious Observance - As a nonsectarian, inclusive institution, NYU policy permits members of any religious group to absent themselves from classes without penalty when required for compliance with their religious obligations. Refer to the University Calendar Policy on Religious Holidays for the complete policy.

6. Academic Integrity and Plagiarism - You are expected to be honest and ethical in all academic work. Moreover, you are expected to demonstrate how what you have learned incorporates an understanding of the research and expertise of scholars and other appropriate experts; thus, recognizing others' published work or teachings, whether that of authors, lecturers, or one's peers, is a required practice in all academic projects.

Plagiarism involves borrowing or using information from other sources without proper and full credit. You are subject to disciplinary actions for the following offenses, which include but are not limited to cheating, plagiarism, forgery or unauthorized use of documents, and false forms of identification.

Turnitin, an originality detection service in NYU Brightspace, may be used in this course to check your work for plagiarism.

Read more about academic integrity policies at the NYU School of Professional Studies on the Academic Policies for NYU SPS Students page.

7. Use of Third-Party Tools - During this class, you may be required to use non-NYU apps/platforms/software as a part of course studies, and thus, will be required to agree to the “Terms of Use” (TOU) associated with such apps/platforms/software.

These services may require you to create an account but you can use a pseudonym (which may not identify you to the public community, but which may still identify you by IP address to the company and companies with whom it shares data).

You should carefully read those terms of use regarding the impact on your privacy rights and intellectual property rights. If you have any questions regarding those terms of use or the impact on the class, you are encouraged to ask the instructor prior to the add/drop deadline.