
MATH 456 — Mathematical Modeling

Project #2: Movie Recommendation System

Goal. The goal of the project is to use past movie ratings to predict how users will rate movies they haven’t watched yet. This type of prediction algorithm underpins recommendation engines such as the one used by Netflix. We’re giving you a large set of ratings from real movie-rating data but holding back 200 ratings for you to predict. (In machine-learning parlance, the data we provide is “labeled” training data; you will use it to come up with rating predictions for the 200 “unlabeled” data items.)
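Because the 200 hidden ratings cannot be scored directly, one common way to compare candidate models is to mimic the setup yourself: hold out 200 of the known ratings, fit on the rest, and measure the error on the held-out pairs. The sketch below assumes ratings are plain (userID, movieID, rating) tuples and uses root-mean-square error (RMSE) as the metric. The project does not specify an evaluation metric, so RMSE is an assumption, and the helper names holdout_split and rmse are hypothetical.

```python
import math
import random


def holdout_split(ratings, n_holdout=200, seed=0):
    """Randomly move n_holdout (userID, movieID, rating) triples into a validation set."""
    rng = random.Random(seed)
    held = set(rng.sample(range(len(ratings)), n_holdout))
    train = [r for i, r in enumerate(ratings) if i not in held]
    valid = [r for i, r in enumerate(ratings) if i in held]
    return train, valid


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


if __name__ == "__main__":
    # Toy triples standing in for ratings.csv (30 users x 20 movies = 600 ratings).
    toy = [(u, m, (u + m) % 5 + 1) for u in range(1, 31) for m in range(1, 21)]
    train, valid = holdout_split(toy, n_holdout=200)
    print(len(train), len(valid))  # 400 200
    # Trivial model: predict the global training mean for every held-out pair.
    mu = sum(r for _, _, r in train) / len(train)
    print(rmse([mu] * len(valid), [r for _, _, r in valid]))
```

Fixing the seed makes the split reproducible, so different models are compared on the same 200 held-out pairs.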

Data. The following five data files are available for download on Moodle. The files movies.tsv and allData.tsv are in TSV format (tab-separated values) because movie names may contain commas. You can load them with pandas’ read_csv, passing sep='\t' for the TSV files.

• users.csv - Information about 2353 movie watchers. Each line has three fields: userID, age (see notes below), gender (“F” or “M”).

• movies.tsv - Information about 1465 movies. Each line has six fields: movieID, name, year, genre1, genre2, genre3. If a movie has fewer than three genres, the extra fields are blank.

• ratings.csv - 31,620 movie ratings. Each line has three fields: userID, movieID, rating. The userID and movieID correspond to those in the users.csv and movies.tsv files, respectively. Ratings are integers in the range 1 to 5 (from worst to best).

• allData.tsv - For those who prefer having everything in one place, this file contains the combined information from the previous three files. Each line has 10 fields: userID, age, gender, movieID, name, year, genre1, genre2, genre3, rating.

• predict.csv - Ratings for you to predict. Each line has three fields: userID, movieID, rating, with all ratings initially set to 0. There are no ratings for these userID-movieID pairs in ratings.csv.
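A minimal loading sketch with pandas, following the read_csv advice in the Data paragraph. It assumes the files have no header row and uses the field names listed above as column names; if the files do start with a header line, pass header=0 instead so that line is replaced by these names. The helper load_table and the *_COLS constants are hypothetical names.

```python
import io

import pandas as pd

# Field names as listed in the project description (assumed: no header row in the files).
USER_COLS = ["userID", "age", "gender"]
MOVIE_COLS = ["movieID", "name", "year", "genre1", "genre2", "genre3"]
RATING_COLS = ["userID", "movieID", "rating"]


def load_table(path_or_buffer, columns, tsv=False):
    """Read a CSV file (or a TSV file when tsv=True) into a DataFrame with the given columns."""
    sep = "\t" if tsv else ","
    return pd.read_csv(path_or_buffer, sep=sep, header=None, names=columns)


if __name__ == "__main__":
    # Two made-up rows standing in for movies.tsv: a comma inside a title,
    # and blank trailing genre fields (read by pandas as NaN).
    sample = "1\tFoo, Bar\t1995\tComedy\t\t\n2\tBaz\t1999\tDrama\tRomance\t\n"
    movies = load_table(io.StringIO(sample), MOVIE_COLS, tsv=True)
    print(movies[["movieID", "name", "genre1", "genre2"]])
```

On the real files this would be called as load_table("movies.tsv", MOVIE_COLS, tsv=True), and likewise load_table("ratings.csv", RATING_COLS) for the comma-separated files.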
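Since a movie in movies.tsv has one to three genres with blank trailing fields, one convenient representation for content-based or hybrid models is a genre set per movie, or a 0/1 (multi-hot) vector over all genres. A plain-Python sketch, assuming each movie is a (movieID, name, year, genre1, genre2, genre3) tuple; blank strings and pandas NaN placeholders are skipped. The function names are hypothetical.

```python
def genre_sets(movie_rows):
    """Map movieID -> set of non-blank genres.

    movie_rows: iterable of (movieID, name, year, genre1, genre2, genre3) tuples.
    Empty strings and non-string placeholders (e.g. NaN from pandas) are skipped.
    """
    return {
        row[0]: {g.strip() for g in row[3:6] if isinstance(g, str) and g.strip()}
        for row in movie_rows
    }


def genre_vector(genres, vocabulary):
    """Return a 0/1 indicator list of a movie's genres over an ordered genre vocabulary."""
    return [1 if g in genres else 0 for g in vocabulary]


if __name__ == "__main__":
    rows = [(1, "Foo, Bar", 1995, "Comedy", "", ""),
            (2, "Baz", 1999, "Drama", "Romance", "")]
    sets = genre_sets(rows)
    vocab = sorted(set().union(*sets.values()))  # ['Comedy', 'Drama', 'Romance']
    print(genre_vector(sets[2], vocab))  # [0, 1, 1]
```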
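If you load the three separate files rather than allData.tsv, the combined table can be rebuilt with two joins on the shared keys userID and movieID; this is also a quick way to check that the two sources agree. A pandas sketch, assuming DataFrames with the column names listed above; combine and ALL_COLS are hypothetical names.

```python
import pandas as pd

# Column order of allData.tsv as listed in the project description.
ALL_COLS = ["userID", "age", "gender", "movieID", "name", "year",
            "genre1", "genre2", "genre3", "rating"]


def combine(users, movies, ratings):
    """Attach user and movie attributes to every rating (left joins keep every rating row)."""
    merged = (ratings.merge(users, on="userID", how="left")
                     .merge(movies, on="movieID", how="left"))
    return merged[ALL_COLS]


if __name__ == "__main__":
    users = pd.DataFrame({"userID": [1], "age": [25], "gender": ["F"]})
    movies = pd.DataFrame({"movieID": [10], "name": ["Baz"], "year": [1999],
                           "genre1": ["Drama"], "genre2": [None], "genre3": [None]})
    ratings = pd.DataFrame({"userID": [1], "movieID": [10], "rating": [4]})
    print(combine(users, movies, ratings))
```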
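A simple reference point before trying more elaborate models is a baseline predictor: the global mean rating plus a per-user and a per-movie bias, each shrunk toward zero when it is estimated from only a few ratings. This is one standard baseline from the literature, not a method prescribed by the project. The damping constants lam_item and lam_user are hypothetical values you would tune (for example on a held-out split), and fit_baseline and predict are hypothetical names. Unseen users or movies fall back to the global mean, and predictions are clipped to the 1 to 5 rating range.

```python
from collections import defaultdict


def fit_baseline(train, lam_item=10.0, lam_user=10.0):
    """Fit r_hat(u, i) = mu + b_u + b_i with damped (shrunken) mean deviations.

    train: iterable of (userID, movieID, rating) triples.
    Movie biases are estimated first, then user biases on the remaining residuals.
    """
    train = list(train)
    mu = sum(r for _, _, r in train) / len(train)

    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for _, i, r in train:
        item_sum[i] += r - mu
        item_cnt[i] += 1
    b_item = {i: item_sum[i] / (lam_item + item_cnt[i]) for i in item_sum}

    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for u, i, r in train:
        user_sum[u] += r - mu - b_item[i]
        user_cnt[u] += 1
    b_user = {u: user_sum[u] / (lam_user + user_cnt[u]) for u in user_sum}
    return mu, b_user, b_item


def predict(model, user, movie, lo=1.0, hi=5.0):
    """Predict a rating; unseen users/movies get zero bias, result clipped to [lo, hi]."""
    mu, b_user, b_item = model
    est = mu + b_user.get(user, 0.0) + b_item.get(movie, 0.0)
    return min(hi, max(lo, est))


if __name__ == "__main__":
    # Toy (userID, movieID, rating) triples standing in for ratings.csv.
    train = [(1, 10, 5), (1, 20, 4), (2, 10, 4), (2, 20, 1), (3, 20, 2)]
    model = fit_baseline(train)
    # Rows shaped like predict.csv: the rating column starts at 0 and is filled in.
    to_predict = [(3, 10, 0), (4, 30, 0)]
    filled = [(u, m, round(predict(model, u, m), 2)) for u, m, _ in to_predict]
    print(filled)
```

Writing the filled rows back out (for example with csv.writer or DataFrame.to_csv) gives a file in the same three-column shape as predict.csv.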

This data is real: it’s a subset of the movie ratings data from MovieLens, collected by GroupLens Research, which Stanford CS102 anonymized. Start by inspecting the first few rows of each file (e.g., with head) to see what the various fields contain. Note that the age field in users.csv doesn’t contain exact ages but one of seven bucketized values: 1 (age is under 18), 18 (age is 18-24), 25 (age is 25-34), 35 (age is 35-44), 45 (age is 45-49), 50 (age is 50-55), 56 (age is 56 or older).
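Because these codes label ranges rather than exact ages, it is usually safer to treat age as a categorical feature (for example one-hot encoded) than as a number whose differences are meaningful. A sketch of the lookup and the encoding; AGE_BUCKETS, age_label, and age_one_hot are hypothetical names.

```python
# Bucket codes from users.csv and the age ranges they stand for.
AGE_BUCKETS = {
    1: "under 18",
    18: "18-24",
    25: "25-34",
    35: "35-44",
    45: "45-49",
    50: "50-55",
    56: "56 or older",
}


def age_label(code):
    """Translate a bucketized age code (int or numeric string) into its age range."""
    try:
        return AGE_BUCKETS[int(code)]
    except (KeyError, ValueError, TypeError) as err:
        raise ValueError(f"unexpected age code: {code!r}") from err


def age_one_hot(code):
    """0/1 indicator over the seven age buckets, in increasing code order."""
    age_label(code)  # validates the code, raising ValueError if it is not a bucket
    code = int(code)
    return [1 if code == c else 0 for c in sorted(AGE_BUCKETS)]


if __name__ == "__main__":
    print(age_label(25))      # 25-34
    print(age_one_hot("56"))  # [0, 0, 0, 0, 0, 0, 1]
```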

Numerous algorithms for recommender systems are available in the literature. You may use and adapt any model from the literature. However, you are responsible for the originality of your code and for the following:

• formulating the model(s) in a precise mathematical language;

• developing and implementing algorithms for computing your model(s);

• documenting and discussing your results in a short report (less than 10 pages), and presenting the results as a group.
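As an illustration of the precision the first item asks for, here is one well-known model from the literature, a biased matrix-factorization (latent-factor) model of the kind popularized during the Netflix Prize. It is an example only; the notation is introduced here and is not prescribed by the project.

```latex
\hat{r}_{ui} = \mu + b_u + b_i + p_u^{\top} q_i, \qquad p_u,\, q_i \in \mathbb{R}^{k},

\min_{b,\, p,\, q} \;
\sum_{(u,i) \in \mathcal{K}} \Big( r_{ui} - \mu - b_u - b_i - p_u^{\top} q_i \Big)^2
+ \lambda \Big( b_u^2 + b_i^2 + \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \Big)
```

Here \mathcal{K} is the set of (userID, movieID) pairs with a rating in ratings.csv, r_{ui} is the observed rating, \mu is the global mean rating, b_u and b_i are user and movie biases, p_u and q_i are k-dimensional latent factor vectors, and \lambda > 0 is a regularization weight. Predictions for predict.csv evaluate \hat{r}_{ui} at the requested pairs, optionally clipped to the interval [1, 5].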