Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

COMP5338: Advanced Data Models

Sem.  2/2025

MongoDB Project (Individual, 20%)

Introduction

The purpose of this assignment is to test your practical understanding of MongoDB query formulation, indexing, and schema design. You will work with a real dataset and produce executable . js scripts that can be run directly in mongosh.

The assignment is worth 20 points in total and consists of three tasks:

Task 1: Implement five query workloads (7.5 points).

Task 2: Create and evaluate indexes (4 points).

Task 3: Evaluate and redesign the schema (3.5 points).

Task 4: Implement the five query workloads using the new schema ( 5 points)

Dataset

You will use the sample_airbnb dataset provided by MongoDB. The dataset can be down- loaded from Canvas.   It  contains a single JSON file listingsAndReviews. json.   The MongoDB documentation (linked above) contains description of the data and a sample document showing all fields.

Task 1: Query Workloads (7.5 points)

You must import the sample data into a single collection using the following command:

mongoimport  - -db  airbnb  - -collection  listings  --file  listingsAndReviews. json

Implement the following five workloads against the listings collection. Each work- load should be implemented as a single MongoDB find or aggregate command and writ- ten in its own script, some workload need to be parameterised using variables at the top of the file (see Deliverables for naming convention and template). f

1.1 Most Recent Reviews by a Reviewer

Aim: Retrieve the most recent 10 reviews written by a given reviewer.

Parameters:

REVIEWER_ID (string)

Requirements:

• Find reviews with the given reviewer_id.

Sort by review date indescending order.

• Return the most recent 10 reviews, including listing_id, listing_name, country, market, and review_date.

1.2 Countries with Highest Average Listing Price

Aim: Compare countries by their average listing prices.

Parameters:

ACCOMMODATES_MIN (integer)

Requirements:

Consider only properties that accommodate at least the given number of guests.

Group by country, compute average price and total listings.

• Return the top 10 countries by average price, with fields country, avgPrice, and totalListings.

1.3 Reviewers Active in Multiple Countries

Aim: Find reviewers who have written reviews in multiple countries.

Requirements:

Identify all reviewers with reviews in at least 2 distinct countries.

• For each reviewer, collect a set of distintct countries they have reviewed, compute the total number of reviews and the number of distinct countries.

• Return only the top 10 reviewers by number of countries, the total reviews and the reviewer’s ID.

Fields returned: reviewer_id, countryCount, countryList,totalReviews.

1.4 Listings with Specific Attributes and Recent Reviews

Aim: Retrieve listings that match multiple conditions and return only recent reviews.

Parameters:

COUNTRY (string)

MARKET (string)

PROPERTY_TYPE (string)

ACCOMMODATES_MIN (integer)

AMENITY (string, single amenity required)

Requirements:

Listing must belong to the specified country and market.

Property type must match and accommodate at least the given number of guests.

• The amenities array must contain the specified amenity (case-insensitive) as a whole or part of the amenity description.

Only include listings with at least 5 reviews.

• For each matching listing, return _id, name, listing_url, and the most recent 5 reviews (reviewer id, name, date).

Sort the final set of listings by _id ascendingly

Limit the output to 5 listings after sorting.

1.5 Most Popular Listing in a Market with Market Average Price

Aim: For a given country and market, identify the most popular listing by the number of reviews of that property, and contextualise it with market-wide statistics.

Parameters: COUNTRY (string), MARKET (string).

Requirements:

•  Restrict the search to listings in the specified COUNTRY and MARKET.

• Return the single most popular listing  (highest  reviewerCount);  in case of ties, choose the one with the smallest _id.

• The output of the chosen listing must include: _id, name, listing_url, and reviewerCount.

• Additionally, compute the average price of listings in the same market and the total number of listings in this market.

Final output should therefore include two parts:

Details of the most popular listing, which include its _id, listing_url, name, price and reviewerCount

Market statistics: avgPrice and marketListings.

Task 2: Indexing and Performance (4 points)

Create a set of indexes that could improve performance for the queries in Task 1.

For each query, describe the index(es) that it could utilize

• Use the explain("executionStats") method to show evidence that the index has been used by each query.

Task 3: Schema Evaluation and Redesign (3.5 points)

Schema Evaluation (2 points)

The current schema follows certain schema design patterns, but also leaves room for im- provement.  Identify at least one pattern that it follows and explain how it follows the pattern.  Identify also one aspect where a schema design pattern could be adopted and explain your justification.

Schema Redesign (1.5 points)

Propose an alternative schema design that uses at least two collections to represent all of the information currently stored in the Airbnb dataset.  In your redesign, you should identify and address at least one limitation of the current schema, and provide a clear justification of how the new design better supports particular data features and frequent query patterns. The data feature or query pattern you choose for justification does not need to come directly from the Task 1 workloads; you may introduce any reasonable example. Your discussion should also consider the trade-offs of the proposed schema, including the potential impact on write operations, query performance, and data consistency.

Task 4: Query Workloads with New Schema (5 points)

Provide queries that migrate data from the old schema to the new one. Re-implement the five query workloads in task 1 using the new schema.

Deliverables

Your submission must consist of exactly two files uploaded to Canvas:

1. Source code ( .zip)

A single zip file containing all required JavaScript files organised into two folders:

task1/ Five scripts implementing the Task 1 query workloads:

task1/q1_ 1. js

task1/q1_2. js

task1/q1_3. js

task1/q1_4. js

task1/q1_5. js

Parameter files may also be included if required (e.g., q1_ 1. params. js).  All scripts must load header. js and print results using printjson.  Markers will execute queries as:

mongosh  q1_ 1. params. js  q1_ 1. js

task4/ Scripts related to the redesigned schema:

task3/q4_build. js

task3/q4_ 1. js

task3/q4_2. js

task3/q4_3. js

task3/q4_4. js

task3/q4_5. js

The file q4_build. js contains queries to perform migration from the original schema. Make sure you DO NOT override the original listing collection. Each of q4_ 1. js to q4_5. js re-implements the Task 1 workloads against the new schema. You may hard code parameter values in this version.

2. Written report (.pdf)

A single PDF document combining Task 2 and Task 3:

Task 2 (Indexing and Performance): This section should cover required con- tent as described for task 2.

Task 3 (Schema Evaluation and Redesign): The section should cover required content as described for task 3.

Example and Template Minimal example files demo_params_amenity. js and

demo_params_amenity. params. js are provided to demonstrate how to declare variables, perform a case-insensitive amenity search, and print query results using printjson.

A shared utility file header. js is also provided.  This file defines a small helper func- tion param(name,  fallback) that retrieves parameter values from the corresponding pa- rameter files.  It also initialises a connection to the airbnb database and the listings collection, so that every query script can access the collection through the variable coll. All workload scripts must load this file at the top using load("header. js");.

In addition, we provide template files and their corresponding parameter files (except for Workload 3, which does not require parameters) for each Task 1 workload.  You are encouraged to use these templates to submit your implementations of task 1.  The same templates may also be used for Task 4, but you must update the collection names to match your redesigned schema.