
Module code and Title: IFB214TC Data Mining Applications

School Title: School of Intelligent Finance and Business

Assignment Title: CW2

Submission Deadline: 9th January 2024

Final Word Count:

If you agree to let the university use your work anonymously for teaching and learning purposes, please type “yes” here.

General Information

1. Submission Deadline:

-     The assignment must be submitted by 5:00 PM on Tuesday, 9th January 2024.

2. Weight:

-     This assignment counts for 50% of your final grade.

3. Assessment Type:

-     Individual

4. Format Requirements:

-     Your assignment should include a cover page with your student ID.

-     The report should comprise an introduction, a response to each task, a conclusion, and references. Appendix A provides a comprehensive report structure.

-     Ensure that all assignments are typed, proofread, and professional in appearance.

-     Use 'Times New Roman', 'Arial', or 'Calibri' font, size 12, with 1.5 line spacing.

-     Set the text alignment to 'Justified' so that the left and right margins are even.

-     Use Harvard referencing for in-text citations and the reference list.

5. Submission Requirements:

-     All students are required to submit one document: a PDF file of the individual report.

-     Submit your file through Learning Mall Online in the designated drop box. Only electronic submissions are accepted.

-     Name the file IFB214TC-CW2-Your Student ID.

-     After submission, download your file and confirm that it can be viewed. Please note that document corruption during uploading (e.g., due to slow internet connections) is the student's responsibility. Submitted files should be functional and correct for assessment purposes.

6. Word Limit:

-     Keep your report within ±10% of the 1,500-word limit (1,350 to 1,650 words). Use tables and charts to keep the report concise.

Assessment Tasks

In the era of online streaming platforms, movie recommendation systems play a pivotal role in enhancing user experience. These systems rely on data mining techniques to analyze customer preferences and provide tailored movie suggestions. As a data analyst, you are tasked with exploring a movie recommendation dataset that includes customer IDs, movie IDs, and movie ratings. By employing Python and data mining techniques, you can uncover valuable insights that contribute to better movie recommendations and user satisfaction.

Your mission is to thoroughly analyze the provided movie recommendation dataset using Python and various data mining techniques. The dataset consists of customer IDs, movie IDs, and movie ratings, representing the interactions between customers and movies. For each task, you are required to provide clear explanations, Python code implementations, and relevant visualizations. Include screenshots of your Python code in the report for reference.

Task 1: Customer Preferences and Ratings (10 Marks)

1. Identify and list movies that have the highest average ratings across all customers.

2. Detect customers who consistently rate movies positively or negatively.
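As an illustration only, one possible way to approach the two parts of Task 1 with pandas is sketched below. The toy data, the column names (taken from Table 1, with Product_ID assumed to be the movie ID), and the thresholds for "positive" (every rating at least 4) and "negative" (every rating at most 2) are all assumptions, not requirements of the brief.

```python
import pandas as pd

# Hypothetical toy data using the column names shown in Table 1.
ratings = pd.DataFrame({
    "User_ID":    [1, 1, 2, 2, 3, 3, 3],
    "Product_ID": [1, 2, 1, 3, 1, 2, 3],
    "Rating":     [5, 4, 4, 5, 1, 2, 1],
})

# Part 1: average rating and number of ratings per movie, highest average first.
movie_stats = (
    ratings.groupby("Product_ID")["Rating"]
    .agg(mean_rating="mean", n_ratings="count")
    .sort_values(["mean_rating", "n_ratings"], ascending=False)
)

# Part 2: per-customer summary of rating behaviour.
user_stats = ratings.groupby("User_ID")["Rating"].agg(["mean", "min", "max", "count"])

# Assumed thresholds: "consistently positive" means every rating >= 4,
# "consistently negative" means every rating <= 2.
positive_users = user_stats.index[user_stats["min"] >= 4].tolist()
negative_users = user_stats.index[user_stats["max"] <= 2].tolist()

print(movie_stats)
print("Positive raters:", positive_users)
print("Negative raters:", negative_users)
```

Keeping n_ratings next to mean_rating makes it easy to see when a high average rests on only a handful of ratings.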

Task 2: Movie Popularity and Ratings (15 Marks)

1. Determine the top 10 most popular movies based on the number of ratings they received.

2. Investigate whether there is a correlation between a movie's popularity (number of ratings) and its average rating.
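A minimal sketch of Task 2 follows, again on hypothetical toy data with the Table 1 column names. It reports both Pearson's correlation and a rank-based (Spearman) correlation, computed here as Pearson's correlation on ranks; which measure is more appropriate for the allocated dataset is left to the analysis.

```python
import pandas as pd

# Hypothetical toy data using the column names shown in Table 1.
ratings = pd.DataFrame({
    "User_ID":    [1, 1, 1, 2, 2, 3, 3, 4, 4, 5],
    "Product_ID": [1, 2, 3, 1, 2, 1, 3, 1, 2, 1],
    "Rating":     [5, 3, 2, 4, 3, 5, 1, 4, 2, 5],
})

# Popularity (number of ratings) and average rating per movie.
movie_stats = ratings.groupby("Product_ID")["Rating"].agg(
    n_ratings="count", mean_rating="mean"
)

# Part 1: the (up to) ten movies with the most ratings.
top10 = movie_stats.nlargest(10, "n_ratings")

# Part 2: linear (Pearson) correlation between popularity and average rating.
pearson = movie_stats["n_ratings"].corr(movie_stats["mean_rating"])

# Spearman correlation, computed as Pearson correlation of the ranks.
spearman = movie_stats["n_ratings"].rank().corr(movie_stats["mean_rating"].rank())

print(top10)
print(f"Pearson r = {pearson:.3f}, Spearman rho = {spearman:.3f}")
```

A scatter plot of n_ratings against mean_rating would be a natural visualization to accompany these two numbers.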

Task 3: Outlier Detection and Anomalies (10 Marks)

Identify customers who consistently provide extreme ratings (e.g., always giving the lowest or highest ratings). Explore the potential impact of these outliers on the recommendation system.
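One way to make "consistently extreme" concrete is sketched below. It assumes a 1 to 5 rating scale and an arbitrary minimum of three ratings per customer, and it flags customers whose ratings are all at either end of the scale; these are illustrative choices rather than part of the brief. The last step gives a rough view of impact by comparing each movie's average rating with and without the flagged customers.

```python
import pandas as pd

# Hypothetical toy data using the column names shown in Table 1.
ratings = pd.DataFrame({
    "User_ID":    [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
    "Product_ID": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
    "Rating":     [5, 5, 5, 1, 1, 1, 4, 3, 5, 5],
})

LOW, HIGH = 1, 5    # assumed rating scale
MIN_RATINGS = 3     # assumed minimum number of ratings before judging a customer

# Share of each customer's ratings that sit at either end of the scale.
user_stats = (
    ratings.assign(is_extreme=ratings["Rating"].isin([LOW, HIGH]))
    .groupby("User_ID")
    .agg(n_ratings=("Rating", "count"), extreme_share=("is_extreme", "mean"))
)

# Flag customers with enough ratings, all of which are extreme.
extreme_users = user_stats.index[
    (user_stats["n_ratings"] >= MIN_RATINGS) & (user_stats["extreme_share"] == 1.0)
].tolist()

# Impact check: how much each movie's mean rating shifts when they are excluded.
mean_all = ratings.groupby("Product_ID")["Rating"].mean()
mean_filtered = (
    ratings[~ratings["User_ID"].isin(extreme_users)]
    .groupby("Product_ID")["Rating"]
    .mean()
)
shift = (mean_all - mean_filtered).rename("mean_shift")

print("Extreme raters:", extreme_users)
print(shift)
```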

Task 4: Clustering Analysis (20 Marks)

Apply the K-Means clustering algorithm to group customers based on their movie ratings. Use techniques like Principal Component Analysis (PCA) for dimensionality reduction and visualize the resulting clusters using scatter plots or other appropriate visualization methods.
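The pipeline described above could be sketched with scikit-learn as follows. The synthetic data (two groups of customers with opposite tastes), the choice of two principal components and two clusters, and the decision to fill missing ratings with each customer's own mean are all assumptions made for illustration. With real data the number of clusters would need to be justified, for example with an elbow plot or silhouette scores.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic, hypothetical data: customers 1-20 like movies 1-3, customers 21-40 like movies 4-6.
rng = np.random.default_rng(0)
rows = []
for user in range(1, 41):
    likes_first_half = user <= 20
    for movie in range(1, 7):
        liked = (movie <= 3) == likes_first_half
        rating = rng.integers(4, 6) if liked else rng.integers(1, 3)
        rows.append((user, movie, int(rating)))
ratings = pd.DataFrame(rows, columns=["User_ID", "Product_ID", "Rating"])

# Customer x movie matrix; unrated movies become NaN.
user_item = ratings.pivot_table(index="User_ID", columns="Product_ID", values="Rating")

# Assumed imputation: replace missing ratings with the customer's own mean rating.
user_item = user_item.apply(lambda row: row.fillna(row.mean()), axis=1)

# Reduce to two dimensions, then cluster the reduced coordinates.
pca = PCA(n_components=2, random_state=0)
coords = pca.fit_transform(user_item.to_numpy())

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(coords)

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Cluster sizes:", np.bincount(labels))
```

A scatter plot of coords[:, 0] against coords[:, 1], coloured by labels (for example with matplotlib's plt.scatter), would give the cluster visualization the task asks for.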

Task 5: Apriori Algorithm for Association Rules (25 Marks)

1. Preprocess the dataset to prepare it for the Apriori algorithm.

2. Apply the Apriori algorithm to discover frequent itemsets, representing movies that are frequently rated together.

3. Provide a sample of the discovered frequent itemsets and associated association rules.

4. Discuss the insights gained from the results and propose how these insights could be utilized to enhance movie recommendations.
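To show the mechanics behind steps 1 to 3, here is a small pure-Python sketch of Apriori. It treats each customer's set of rated movies as one transaction. The toy data, the function names apriori and derive_rules, and the support and confidence thresholds are hypothetical. A library implementation could be used instead; this version is only meant to make the join, prune, and rule-generation steps visible.

```python
from itertools import combinations

# Hypothetical (User_ID, Product_ID) pairs; movie IDs are shown as letters for readability.
rated = [(1, "A"), (1, "B"), (1, "C"),
         (2, "A"), (2, "B"),
         (3, "A"), (3, "C"),
         (4, "B"), (4, "C"),
         (5, "A"), (5, "B"), (5, "C")]

# Preprocessing: one transaction (set of rated movies) per customer.
baskets = {}
for user, movie in rated:
    baskets.setdefault(user, set()).add(movie)
transactions = list(baskets.values())


def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def derive_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = sup / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if confidence >= min_confidence:
                    rules.append((set(antecedent), set(consequent), sup, confidence, lift))
    return rules


frequent = apriori(transactions, min_support=0.5)   # assumed threshold
rules = derive_rules(frequent, min_confidence=0.7)  # assumed threshold

for antecedent, consequent, sup, conf, lift in rules:
    print(f"{antecedent} -> {consequent}: support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

A rule such as {A} -> {B} with high confidence and a lift above 1 would suggest recommending movie B to customers who have rated movie A. A lift below 1, as in this toy data, indicates that the two movies co-occur less often than independence would predict.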

Overall Presentation and Code Quality (20 Marks)

Marks are awarded for the clarity and coherence of your analysis and for well-documented, well-organized code. Pay attention to code readability and efficiency.

Note: Each student will be allocated a unique .xls file, which will be uploaded to Learning Mall Online (LMO) in due course. The sample data structure is shown in Table 1.

Table 1

User_ID    Product_ID    Rating    Purchase_Date
1          1             5         11/06/2022
1          2             4         12/06/2022
2          1             4         15/06/2022
……         ……            ……        ……
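As a small illustration of the structure in Table 1, the sketch below rebuilds the three sample rows as a pandas DataFrame and parses Purchase_Date as day/month/year, which the value 15/06/2022 implies. The file name in the commented line is hypothetical, and reading an .xls file with pandas typically requires an additional Excel reader package to be installed.

```python
import pandas as pd

# The three sample rows printed in Table 1.
sample = pd.DataFrame({
    "User_ID":       [1, 1, 2],
    "Product_ID":    [1, 2, 1],
    "Rating":        [5, 4, 4],
    "Purchase_Date": ["11/06/2022", "12/06/2022", "15/06/2022"],
})

# Dates are day-first (15/06/2022 cannot be month 15), so state the format explicitly.
sample["Purchase_Date"] = pd.to_datetime(sample["Purchase_Date"], format="%d/%m/%Y")

# With the allocated file (hypothetical file name), loading might look like:
# ratings = pd.read_excel("allocated_file.xls")

# Basic sanity checks before any analysis.
print(sample.dtypes)
print("Missing values per column:")
print(sample.isna().sum())
print("Duplicate customer/movie pairs:", sample.duplicated(["User_ID", "Product_ID"]).sum())
```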