
Data Mining and Data Warehousing

MASY1-GC 3510 | 200 | Fall 2023 | 9/6/2023 - 12/13/2023 | 3 Credits

Modality: Online (Sy)

Course Site URL: https://brightspace.nyu.edu/

General Course Information

Name/Title: Farid Razzak

NYU Email: farid.razzak@nyu.edu

Class Meeting Schedule: 9/6/2023 - 12/13/2023 | Wednesday | 07:00pm -- 09:35pm

Class Location: Online (Sy)

Office Hours: By appointment via NYU Zoom. Please email farid.razzak@nyu.edu at least 8 hours before the upcoming class to arrange potential office hours. Office hours are subject to availability.

Description

In an increasingly competitive information age, data mining and data warehousing are essential in business decision-making. This course teaches students concepts, methods, and skills for working with data warehouses and mining data from these warehouses to optimize competitive business strategy. In this course, students develop the analytical thinking skills required to identify effective data warehousing strategies, such as when to use outsourced or in-sourced data services. Students also learn to Extract, Transform, and Load data into data warehouses (the ETL process) and use the CRISP approach to data mining to extract vital information for data warehouses. The course also teaches students how to secure data and covers the ethical issues associated with the uses of data and data models for business decisions.

Prerequisites

1210 – Quantitative Models

Learning Outcomes

At the conclusion of this course, students will be able to:

•   Translate business requirements into well-constructed, normalized conceptual and logical data models

•   Apply logical database design and the relational model

•   Apply the CRISP model to conduct successful data mining

•    Establish a successful ETL process to load a data warehouse

•   Write basic SQL statements, including some advanced SQL features

•   Employ appropriate data governance principles to assure data quality and security

Communication Methods

Be sure to turn on your NYU Brightspace notifications and frequently check the “Announcements” section of the course site. This will be the primary method I use to communicate information critical to your success in the course. To contact me, send me an email. I will respond within 24 hours.

Credit students must use their NYU email to communicate. Non-degree students do not have NYU email addresses. NYU Classes course-mail supports student privacy and FERPA guidelines. All email inquiries will be responded to within 24 hours. I will respond to you using NYU email.  

Structure | Method | Modality

There are 14 session topics in this course. The session topics are organized into three (3) areas of study: 1) History, Abstraction and Theory, 2) Learning Principles, and 3) Instructional Design in Practice.

Active learning experiences and small group projects are key components of the course. Assignments, papers, and exams will be based on course materials (e.g., readings, videos), lectures, and class discussions. Course sessions will be conducted synchronously on NYU Zoom, which you can access from the course site in NYU Brightspace.

Expectations

Learning Environment

You play an important role in creating and sustaining an intellectually rigorous and inclusive classroom culture. Respectful engagement, diverse thinking, and our lived experiences are central to this course, and enrich our learning community.

Participation

You are integral to the learning experience in this class. Be prepared to actively contribute to class activities, group discussions, and work outside of class.

Assignments and Deadlines

Please submit all assignments to the appropriate section of the course site in NYU Brightspace. If you require assistance, please contact me BEFORE the due date.

Course Technology Use

We will utilize multiple technologies to achieve the course goals. I expect you to use technology in ways that enhance the learning environment for all students.   

Feedback and Viewing Grades

I will provide timely, meaningful feedback on all your work via our course site in NYU Brightspace. You can access your grades in the course site Gradebook.

Attendance

I expect you to attend all class sessions. Attendance will be taken into consideration when determining your final grade. Attendance will be taken randomly and counted towards your grade. Late arrival, webcam issues/absence, or being in a non-learning environment may constitute an absence. Refer to the SPS Policies and Procedures page for additional information about attendance.

Textbooks And Course Materials

Required

•     Data Warehouse Essentials (4th edition)

o  Editor – Julio Bolton

o  Publisher – Larsen and Keller Education (June 28, 2019)

o  ISBN-10 : 1641720735

o  ISBN-13 : 978-1641720731

•    PC/Mac Laptop (With Appropriate Administrative Privileges & Technological Capabilities)

•    Instructor may also provide session by session content, which will be posted online.

•   Assigned Harvard Business Review Case Studies (Course packet may be created containing the necessary case studies).

•   Oracle SQL Developer

o  http://www.oracle.com/technetwork/developer-tools/sql-developer/overview/index.html

•   Oracle SQL Developer Data Modeler

o  http://www.oracle.com/technetwork/developer-tools/datamodeler/overview/index.html

•   Google Colab Python Notebooks

o  colab.research.google.com

•   PyCharm for Education

o  https://www.jetbrains.com/pycharm-edu/

•   PuTTY (for Windows users)

o  https://www.putty.org/

Recommended (Complementary Resources)

•     The Data Warehouse Lifecycle Toolkit (3rd Edition) – Available through Amazon

o  Authors – Kimball, Ross, Thornthwaite, Mundy & Becker

o  Publisher – Wiley, 2013

o  ISBN-13 : 978-1118530801

o  ISBN-10 : 1118530802

•     Introduction to Data Mining (2nd Edition)

o  Authors – Pang-Ning Tan, Michael Steinbach, Vipin Kumar

o  Publisher – TBS, 2016

o  ISBN-10 : 0273769227

o  ISBN-13 : 978-0273769224

Grading | Assessment

Your grade in this course is based on your performance on multiple activities and assignments. Since all graded assignments are related directly to course objectives and learning outcomes, failure to complete any assignment will result in an unsatisfactory course grade. All written assignments are to be completed using APA format and must be typed and double-spaced. Grammar, punctuation, and spelling will be considered in grading. Please carefully proofread your written assignments before submitting them for a grade. I will update the grades on the course site each time a grading session has been completed, typically three (3) days following the completion of an activity.

DESCRIPTION | PERCENTAGE

Assigned Activities (total of 4 Lab and 1 Group) | 20%

Quizzes & Case Studies (total of 3, Exam Reviews and/or Case Studies) | 20%

Participation (Attendance is prerequisite to participation) | 10%

Midterm Exam | 20%

Final Exam | 20%

Final Group Project | 10%

TOTAL POSSIBLE | 100%

See the Grades section of Academic Policies for the complete grading policy, including the letter grade conversion, and the criteria for a grade of incomplete, taking a course on a pass/fail basis, and withdrawing from a course.

Course Outline

Start/End Dates: 9/6/2023 - 12/13/2023 / Wednesdays

Time: 07:00pm -- 09:35pm

No Class Date(s): Wednesday, 11/22/23, Fall Break

Special Notes: N/A

Session 1, 09/06/23

Introduction to Data Warehousing

•    Relationship of Data Mining and Data Warehousing

•   What is a Data Warehouse?

•    Data Warehousing ROI

•    DSS - Decision Support Systems

•   Operational vs. Analytical Systems

•    Evolution of DSS and Data Warehousing

•   OLTP - Online Transaction Processing

•   Characteristics of a Data Warehouse

•   What is a Data Mart? Creating a Data Mart

•    Data Comparison Chart

•   OLAP - Online Analytical Processing

Assignments (due one week from today)

•   Suggested Reading:

•   Chapter 1 (from both Data Warehouse Lifecycle Toolkit, and Building the Data Warehouse).

•   Skim through the glossary (of Data Warehouse Lifecycle Toolkit)

•   Complete Lab Assignment

Session 2, 09/13/23

Data Warehouse Design

•    Data Warehouse Design

•    Drivers for Multi-Dimensional Analysis

•    Limitations of Relational Models

•   The Data Cube

•   What is dimensional modeling?

•   Advantages of Dimensional Models

•    Logical and Physical Design

•    Data Normalization

•    Benefits and Drawbacks of Data Normalization

•    De-Normalizing of Data

•   Characteristics of a Data Warehouse

•   Subject Oriented, Integrated, Time Variant, Non-Volatile

•   The Star Schema

Assignments (due one week from today):

•   Suggested Reading: Data Warehouse Essentials (Preface, Introduction chapters)

•   Complete Lab Assignment

Session 3, 09/20/23

Data Warehouse Schemas

•    Data Warehouse Schemas

•    Dimensions and Dimension Tables

•    Facts and Fact Tables

•   The Star Schema

•   The Snowflake Schema

•    Degenerate and Junk Dimensions

•   The Data Warehouse Bus Architecture

•   Conformed Dimensions and Standard Facts

•    Data Granularity

•   Changing Dimensions

Assignments (due one week from today):

•   Suggested Reading: Data Warehouse Essentials (Dimensional Modeling, STAR Schema Chapters)

•   Complete Lab Assignment

•   Group Membership for Course Project Due

Session 4, 09/27/23

Components of a Data Warehouse

•   Components of a Data Warehouse

•   Source Systems, Staging Area, Presentation, Access Tools

•    Building the Data Matrix

•   The Four Steps Process

•    Multiple Fact Tables in a single Data Mart

•   Chain, Heterogeneous, Transaction/Snapshot & Aggregate Facts

•    Fact and Dimension Table Detail

•    Identifying Source for each Fact & Dimension

•    Mapping from Source to Target

Assignments (due one week from today):

•   Suggested Reading: Data Warehouse Essentials (OLAP cubes, MOLAP, ROLAP Chapters)

•   Complete Lab Assignment:

•   Case Study Review Assignment:

•    Data Warehousing and Multi-dimensional Data Modeling

•   https://store.hbr.org/product/data-warehousing-and-multi-dimensional-data-modelling/A00181

Session 5, 10/04/23

The ETL Process

•   The ETL Process

•    Extracting the Data into the Staging Area

•   The Challenge of Extracting from Disparate Platforms

•   Transforming the Data

•   Complexity of Data Integration

•    Data Transformation Tasks

•    Loading the Data

•   Aggregating Data

•   Goals and Risks of Data Aggregation

•    Deciding What to Aggregate

•    Design Requirement for Aggregates

Assignments (due one week from today):

•   Complete Lab Assignment

•   Suggested Reading: Data Warehouse Essentials (Data Aggregation Chapter)

•   Online Midterm Exam**

Session 6, 10/11/23

Introduction to Data Mining & Techniques

•    Data Mining

•   What is Data Mining Good For?

•   Statistics, Artificial Intelligence & Machine Learning

•    Data Mining Examples and Tools

•   Connection between Data Mining and Data Warehousing

•    Retrospective Reporting vs. Predictive

•    Data Mining Applications

•    Data Mining vs. Statistics vs. OLAP

•    Data Mining Statistical Techniques (Sampling, Regression & Decision Trees)

•   Clustering, Segmentation and Nearest Neighbor Techniques

•    Keys to commercial success of Data Mining

•    Predictive Modeling

•   Clustering/Segmentation

•    Data Mining and Statistics Terminologies

•   Supervised vs. Unsupervised

•   Tree Induction

•    Large Language Models

Assignments (due one week from today):

•   Suggested Reading: Chapters 1, 2, 3 (Introduction to Data Mining)

•   Complete Lab Assignment

•   Case Study Review Assignment:

•    Data Science and the Art of Persuasion

•   https://store.hbr.org/product/data-science-and-the-art-of-persuasion/R1901K

Session 7, 10/18/23

Data Mining: Regression

•   Simple Linear Regression

•   Assumptions

•    Multivariate Regression

•    Logistic Regression

•    Impact on Data Analytics and Data Warehouses today

Assignments (due one week from today):

•   Suggested Reading: Appendix D (Introduction to Data Mining)

•   Complete Lab Assignment

•   Group Assignment 1 Due

Session 8, 10/25/23

Data Mining: Classification

•         Naïve Bayes Classifier

•         Hunt's Algorithm

•         Decision Trees: Tree Induction

•         How it relates to Data Warehouses and Data Analytics

Assignments (due one week from today):

•         Suggested Reading: Chapters 4, 5 (Introduction to Data Mining)

•         Complete Lab Assignment

•         Case Study Review Assignment:

•         Miroglio Fashion (A)

•        https://store.hbr.org/product/miroglio-fashion-a/519053

Session 9, 10/30/23

Data Mining: Clustering

•    Distance Matrix, Proximity Matrix

•    K-Means, K-Medoids

•    DBSCAN

•    How it relates to Data Warehouses and Data Analytics

Assignments (due one week from today):

•   Suggested Reading: Chapter 8 (Introduction to Data Mining)

•   Complete Lab Assignment

Session 10, 11/01/23

Data Mining: Text Mining

•   Term Frequency Matrix

•    Natural Language Processing

•   Stopwords

•   Stemming

•    Document Similarities

•   Topic Modeling

•   LDA

•   Text Classification

•   Text Clustering

Assignments:

•   Complete Lab Assignment

Session 11, 11/08/23

Data Mining: Anomaly Detection

•    Model-based Approaches: Supervised and Unsupervised

•   Graphical Approaches: Boxplots, Convex Hull Method

•   Statistical Approaches:

•    Distance-based Approaches: K-NN

•    Density-based Approaches: LOF Approach

•   Clustering Approaches

•    How it relates to Data Warehouses and Data Analytics

Assignments:

•   Suggested Reading: Chapter 10 (Introduction to Data Mining)

•   Complete Lab Assignment

•   Case Study Review Assignment:

•    Data Science at Target

•   https://store.hbr.org/product/data-science-at-target/118016

Session 12, 11/15/23

Data Warehousing and Data Mining: Current Trends and Special Topics:

•   SaaS

•   ChatGPT (LLMs)

•   Snowflake (Cloud-based Data Warehouses)

Session 13, 11/28/23

Special Topic or Guest Speaker.

Session 14, 12/06/23

•   Online Final Exam

•   Group Course Project Due and Presented

 

NOTES:

The syllabus may be modified to better meet the needs of students and to achieve the learning outcomes.

The School of Professional Studies (SPS) and its faculty celebrate and are committed to inclusion, diversity, belonging, equity, and accessibility (IDBEA), and seek to embody the IDBEA values. The School of Professional Studies (SPS), its faculty, staff, and students are committed to creating a mutually respectful and safe environment (from the SPS IDBEA Committee).

New York University School of Professional Studies Policies

1. Policies - You are responsible for reading, understanding, and complying with University Policies and Guidelines, NYU SPS Policies and Procedures, and Student Affairs and Reporting.

2. Learning/Academic Accommodations - New York University is committed to providing equal educational opportunity and participation for students who disclose their dis/ability to the Moses Center for Student Accessibility. If you are interested in applying for academic accommodations, contact the Moses Center as early as possible in the semester. If you already receive accommodations through the Moses Center, request your accommodation letters through the Moses Center Portal as soon as possible ([email protected] | 212-998-4980).

3. Health and Wellness - To access the University's extensive health and mental health resources, contact the NYU Wellness Exchange. You can call its private hotline (212-443-9999), available 24 hours a day, seven days a week, to reach out to a professional who can help to address day-to-day challenges as well as other health-related concerns.

4. Student Support Resources - There are a range of resources at SPS and NYU to support your learning and professional growth. For a complete list of resources and services available to SPS students, visit the NYU SPS Office of Student Affairs site.

5. Religious Observance - As a nonsectarian, inclusive institution, NYU policy permits members of any religious group to absent themselves from classes without penalty when required for compliance with their religious obligations. Refer to the University Calendar Policy on Religious Holidays for the complete policy.

6. Academic Integrity and Plagiarism - You are expected to be honest and ethical in all academic work. Moreover, you are expected to demonstrate how what you have learned incorporates an understanding of the research and expertise of scholars and other appropriate experts; thus, recognizing others' published work or teachings (whether that of authors, lecturers, or one's peers) is a required practice in all academic projects.

Plagiarism involves borrowing or using information from other sources without proper and full credit. You are subject to disciplinary actions for the following offenses, which include but are not limited to cheating, plagiarism, forgery or unauthorized use of documents, and false forms of identification.

Turnitin, an originality detection service in NYU Brightspace, may be used in this course to check your work for plagiarism.

Read more about academic integrity policies at the NYU School of Professional Studies on the Academic Policies for NYU SPS Students page.

7. Use of Third-Party Tools - During this class, you may be required to use non-NYU apps/platforms/software as a part of course studies, and thus, will be required to agree to the “Terms of Use” (TOU) associated with such apps/platforms/software.

These services may require you to create an account, but you can use a pseudonym (which may not identify you to the public community, but which may still identify you by IP address to the company and companies with whom it shares data).

You should carefully read those terms of use regarding the impact on your privacy rights and intellectual property rights. If you have any questions regarding those terms of use or the impact on the class, you are encouraged to ask the instructor prior to the add/drop deadline.