DSME6751 Database and Big Data Management
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
DSME6751 Database and Big Data Management
Individual Final Requirements
(40 points in total)
Submission due by 11:59 pm on March 5 (Sun)
Note
Please prepare your answers in Word document or PDF format and submit to the “Individual Final Submission” under “Course Content” at CUHK’s Blackboard website.
Part 1: Big Data Concepts (2 pts)
Please answer the following questions by providing appropriate explanation in a few short sentences clearly.
A. NoSQL (e.g., MongoDB) has a number of favorable features that make it more suitable for managing big data. However, Relational Database (e.g., MySQL) is still widely used in the banking industry. Why is that?
B. You are hired as a project manager of a startup company providing a messaging service (e.g., WeChat Messages) and payment service between users (e.g., WeChat Pay). You have to decide the database solutions to support these two applications. What types of database solutions will you propose? And , why?
Requirements:
For each question above, please prepare your answers in Word/PDF document.
Part 2: MongoDB Data Modeling – Instagram (13 pts)
In the Midterm Test, you designed and proposed the relational database for the Instagram application. Now, Instagram wants to replace their relational database with NoSQL database. Let’s assume that you are working for Instagram and the company wants to utilize MongoDB to efficiently manage data about users (followers &
followings), posts (photo & description), comments, and likes.
Your task is to propose collections and the best document structures.
The following eight application behaviors should be considered when you propose the document structures:
• User Account – When a user account is accessed, we want to display the user information and the user’s first set of recent posts instantly. When scrolled down, another set of more posts should be displayed as fast as possible. (See screenshot #1)
• Followers – When someone clicks “Followers” on a user account, we want to display the first set of user’s recent followers instantly. When scrolled down, another set of more followers should be displayed as fast as possible. (See screenshot #2)
• Followings – When someone clicks “Following” on a user account, we want to display the first set of other users whom the user recently started to follow instantly. When scrolled down, another set of more other users being followed should be displayed as fast as possible. (See screenshot #3)
• Post – When someone clicks a post on a user account, we want to display the post information, all the photos, all the hashtags, and the first set of recent comments instantly. Up to 5 photos can be included in each post. Up to 10 hashtags can be included in each post. (See screenshot #4)
• More Comments – When “view all comments” option is clicked on a post, we want to display the list of recent comments as fast as possible. When scrolled down, another set of more comments should be displayed as fast as possible. We also want to keep each comment’s hashtag information. Up to 10 hashtags can be included in each comment. (See screenshot #5)
• Likes – When “likes” option is clicked on a post, we want to display the list of recent users who liked the post as fast as possible. When scrolled down, another set of more users who liked the post should be displayed. (See screenshot #6)
• User-tagging in a Photo – When “user-tags” option is clicked on a photo in a post, we want to display all the users tagged in the photo instantly. Up to 5 users can be tagged in each photo. (See screenshot #7)
• Hashtag – When a hashtag is clicked in a post, we want to display the list of recent posts that contain the hashtag as fast as possible. When scrolled down, another set of more posts that contain the hashtag should be displayed as fast as possible. (See screenshot #8)
Please check the following screenshots for more details of information displayed in the Instagram application.
You can also access the Instagram at:https://www.instagram.com
Your document structures should include all the information shown on the above screenshots such as:
• Information shown on the screenshot #1: Username Bio (textual description) Number ofposts Number offollowers Number offollowings • Information shown on the screenshot #4: Photo(s) Post description Number of likes Number of comments Posting date • Information shown on the screenshot #5: Comment text Comment date Number of likesfor each comment • Information shown on the screenshot #8: Number of posts containing the hashtag |
Note. You don’t need to model “reply” features in comments.
Requirements:
Your task is to propose collections and the best document structures and
Note that, when we want to display information instantly, we want to make sure that necessary information to display is readily available. When we want to display information as fast as possible, we want to make sure that document structures allow MongoDB to search necessary information as quickly as possible. All in all, we want
to design the document structures which are the most efficient in terms of showing necessary information to users. Please prepare your answers in Word/PDF document.
Part 3: MongoDB Query & Aggregation – sakila (9 pts)
We will use primer-dataset.json file (restaurants collection) which we used in the class and for Group Assignment 4. If you already have this collection in MongoDB, you do not need to import it again. Even though you already executed an update query for Group Assignment 4, it will not affect solving questions presented here.
If you do not have restuarants collection ready, please follow the steps below.
To prepare the database, download primer-dataset.json file from the Blackboard course page and put it into C:\ folder. Next, open the command prompt execute the script by typing the following (please make it sure that you have the database named test in MongoDB; also make it sure that your MongoDB is active): "C:\<specify-MongoTools-location>\mongoimport.exe" -d test -c restaurants C:\primer-dataset.json For example, if you installed your MongoTools in a default folder, then you can type: "C:\Program Files\MongoDB\Tools\100\bin\mongoimport.exe" -d test -c restaurants C:\primer-dataset.json
You will see the above message “25359 document(s) imported successfully” . Then, you should be able to see restaurants collection in the test database when you connect to your MongoDB with Robo 3T (please make it sure that you refresh test database). In the Mac, similarly you can type the following in the shell prompt: mongoimport -d test -c restaurants <specify-download-location>/primer-dataset.json For example, if you downloaded primer-dataset.json file into ~/ folder (i.e., user folder), then you can type: mongoimport -d test -c restaurants ~/primer-dataset.json |
The below screenshot shows a sample document structure of each restaurant document inside restaurant collection. The important fields that you might be using are marked in red box.
Now, use Studio 3T to solve the following questions. Use restaurants collection.
A. For restaurants on street “Prince Street” in borough “Manhattan”, find the number of restaurants in each cuisine category. For example, 4 restaurants in cuisine “Italian”, 3 in “French”, and so on. (There are 11 different cuisines in total in this area.; Use aggregate function)
B. For restaurants serving cuisine “American” in borough “Manhattan”, find a zipcode which has the highest number of restaurants. For example, there are 125 restaurants in one zipcode area . (Use aggregate function)
C. For restaurants serving cuisine “Chinese”, calculate the average of all the scores given across all the restaurants in each borough category. Then, display only those borough categories with an average score higher than 10. (Use aggregate function)
Note that, given that two restaurants in one borough, if one restaurant received two scores (e.g., 8 and 10) and the other restaurant received one score (e.g., 6), then the average is 8 (= (6+8+10)/3). For example, Bronx’s average is around 11.
D. Across all the restaurants whose name is “Wendy’S”, find the total number of grade “A” given in each year. For example, across all the “Wendy’S”, 3 “A” grades were given in 2015, 53 “A” given in 2014, and so on. (Use aggregate function)
Requirements:
For each query, turn in both the MongoDB query (the code which you made) and the answer to question. For the MongoDB queries, simply copy and paste them into the Word/PDF document. For the answer of the question, please type (or screenshot) them and put them (right after/below each query) into the Word/PDF document.
Part 4: MongoDB Query Performance (6 pts)
Please answer the following questions by providing appropriate explanation in a few sentences clearly.
(1) When designing an aggregation query involving multiple operators in MongoDB (using aggregate function), some argue that $matching operator should be used first (or as early as possible) for better performance (i.e., faster speed). Is this argument true or not? Explain why.
(2) When designing an aggregation query involving multiple operators in MongoDB (using aggregate function), some argue that $project operator should be used first (or as early as possible) for better performance (i.e., faster speed). Is this argument true or not? Explain why.
(3) When designing an aggregation query involving multiple operators in MongoDB (using aggregate function), some argue that $unwind operator should be used first (or as early as possible) for better performance (i.e., faster speed). Is this argument true or not? Explain why.
Requirements:
For each question above, please prepare your answers in Word/PDF document.
Part 5: Indexing (3 pts)
Let’s refer to the “Group Assignment 3 Solution” on Blackboard. As shown below, for Q1 of Assignment 3, the sample solution proposed document structures for “user” and “page” collections, which guarantees the best performance (i.e., fast “read” operations).
2023-03-06