Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

DSME6751 Database and Big Data Management

Individual Final Requirements

(40 points in total)

Submission due by 11:59 pm on March 5 (Sun)

Note

Please prepare your answers in Word document or PDF format and submit to the Individual Final Submission” under “Course Content” at CUHK’s Blackboard website.

Part 1: Big Data Concepts (2 pts)

Please answer the following questions by providing appropriate explanation in a few short sentences clearly.

A.   NoSQL (e.g., MongoDB) has a number of favorable features that make it more suitable for managing big data. However, Relational Database (e.g., MySQL) is still widely used in the banking industry. Why is that?

B.   You are hired as a project manager of a startup company providing a messaging service (e.g., WeChat Messages) and payment service between users (e.g., WeChat Pay). You have to decide the database solutions to support these two applications. What types of database solutions will you propose? And , why?

Requirements:

For each question above, please prepare your answers in Word/PDF document.

Part 2: MongoDB Data Modeling Instagram (13 pts)

In the Midterm Test, you designed and proposed the relational database for the Instagram application. Now, Instagram wants to replace their relational database with NoSQL database. Let’s assume that you are working for Instagram and the company wants to utilize MongoDB to efficiently manage data about users (followers &

followings), posts (photo & description), comments, and likes.

Your task is to propose collections and the best document structures.

The following eight application behaviors should be considered when you propose the document structures:

User Account When a user account is accessed, we want to display the user information and the user’s first set of recent posts instantly. When scrolled down, another set of more posts should be displayed as fast as possible. (See screenshot #1)

Followers When someone clicks Followers” on a user account, we want to display the first set of user’s recent followers instantly. When scrolled down, another set of more followers should be displayed as fast as possible. (See screenshot #2)

Followings When someone clicks Following” on a user account, we want to display the first set of other users whom the user recently started to follow instantly. When scrolled down, another set of more other users being followed should be displayed as fast as possible. (See screenshot #3)

Post When someone clicks a post on a user account, we want to display the post information, all the photos, all the hashtags, and the first set of recent comments instantly. Up to 5 photos can be included in each post. Up to 10 hashtags can be included in each post. (See screenshot #4)

More Comments When view all comments” option is clicked on a post, we want to display the list of recent comments as fast as possible. When scrolled down, another set of more comments should be displayed as fast as possible. We also want to keep each comment’s hashtag information. Up to  10 hashtags can be included in each comment. (See screenshot #5)

Likes When likes” option is clicked on a post, we want to display the list of recent users who liked the post as fast as possible. When scrolled down, another set of more users who liked the post should be displayed. (See screenshot #6)

User-tagging in a Photo When user-tags” option is clicked on a photo in a post, we want to display all the users tagged in the photo instantly. Up to 5 users can be tagged in each photo. (See screenshot #7)

Hashtag When a hashtag is clicked in a post, we want to display the list of recent posts that contain the hashtag as fast as possible. When scrolled down, another set of more posts that contain the hashtag should be displayed as fast as possible. (See screenshot #8)

Please check the following screenshots for more details of information displayed in the Instagram application.

You can also access the Instagram at:https://www.instagram.com

Your document structures should include all the information shown on the above screenshots such as:

Information shown on the screenshot #1:

Username

Bio (textual description)

Number ofposts

Number offollowers

Number offollowings

Information shown on the screenshot #4:

Photo(s)

Post description

Number of likes

Number of comments

Posting date

Information shown on the screenshot #5:

Comment text

Comment date

Number of likesfor each comment

Information shown on the screenshot #8:

Number of posts containing the hashtag

Note. You dont need to model reply” features in comments.

Requirements:

Your task is to propose collections and the best document structures and

Note that, when we want to display information instantly, we want to make sure that necessary information to display is readily available. When we want to display information as fast as possible, we want to make sure that document structures allow MongoDB to search necessary information as quickly as possible. All in all, we want

to design the document structures which are the most efficient in terms of showing necessary information to users. Please prepare your answers in Word/PDF document.

Part 3: MongoDB Query & Aggregation sakila (9 pts)

We will use primer-dataset.json file (restaurants collection) which we used in the class and for Group Assignment 4. If you already have this collection in MongoDB, you do not need to import it again. Even though you already executed an update query for Group Assignment 4, it will not affect solving questions presented here.

If you do not have restuarants collection ready, please follow the steps below.

To prepare the database, download primer-dataset.json file from the Blackboard course page and put it into C:\ folder. Next, open the command prompt execute the script by typing the following (please make it sure that you have the database named test in MongoDB; also make it sure that your MongoDB is active):

"C:\<specify-MongoTools-location>\mongoimport.exe" -d test -c restaurants C:\primer-dataset.json

For example, if you installed your MongoTools in a default folder, then you can type:

"C:\Program Files\MongoDB\Tools\100\bin\mongoimport.exe" -d test -c restaurants C:\primer-dataset.json

You will see the above message “25359 document(s) imported successfully” . Then, you should be able to see restaurants collection in the test database when you connect to your MongoDB with Robo 3T (please make it sure that you refresh test database).

In the Mac, similarly you can type the following in the shell prompt:

mongoimport -d test -c restaurants <specify-download-location>/primer-dataset.json

For example, if you downloaded primer-dataset.json file into ~/ folder (i.e., user folder), then you can type:

mongoimport -d test -c restaurants ~/primer-dataset.json

The below screenshot shows a sample document structure of each restaurant document inside restaurant collection. The important fields that you might be using are marked in red box.

Now, use Studio 3T to solve the following questions. Use restaurants collection.

A.   For restaurants on street Prince Street” in borough Manhattan”, find the number of restaurants in each cuisine category. For example, 4 restaurants in cuisine Italian”, 3 in French”, and so on. (There are 11 different cuisines in total in this area.; Use aggregate function)

B.   For restaurants serving cuisine American” in borough Manhattan”, find a zipcode which has the highest number of restaurants. For example, there are  125 restaurants in one zipcode area . (Use aggregate function)

C.   For restaurants serving cuisine Chinese”, calculate the average of all the scores given across all the restaurants in each borough category. Then, display only those borough categories with an average score higher than 10. (Use aggregate function)

Note that, given that two restaurants in one borough, if one restaurant received two scores (e.g., 8 and 10) and the other restaurant received one score (e.g., 6), then the average is 8 (= (6+8+10)/3). For example, Bronx’s average is around 11.

D.   Across all the restaurants whose name is Wendy’S”, find the total number of grade A” given in each year. For example, across all the Wendy’S”, 3 “A” grades were given in 2015, 53 “A” given in 2014, and so on. (Use aggregate function)

Requirements:

For each query, turn in both the MongoDB query (the code which you made) and the answer to question. For the MongoDB queries, simply copy and paste them into the Word/PDF document. For the answer of the question, please type (or screenshot) them and put them (right after/below each query) into the Word/PDF document.

Part 4: MongoDB Query Performance (6 pts)

Please answer the following questions by providing appropriate explanation in a few sentences clearly.

(1)  When  designing an  aggregation query involving multiple operators in MongoDB  (using aggregate function), some argue that $matching operator should be used first (or as early as possible) for better performance (i.e., faster speed). Is this argument true or not? Explain why.

(2)  When  designing an  aggregation query involving multiple operators in MongoDB  (using aggregate function), some argue that $project operator should be used first (or as early as possible) for better performance (i.e., faster speed). Is this argument true or not? Explain why.

(3)  When  designing an  aggregation query involving multiple operators in MongoDB  (using aggregate function), some argue that $unwind operator should be used first (or as early as possible) for better performance (i.e., faster speed). Is this argument true or not? Explain why.

Requirements:

For each question above, please prepare your answers in Word/PDF document.

Part 5: Indexing (3 pts)

Let’s refer to the Group Assignment 3 Solution” on Blackboard. As shown below, for Q1 of Assignment 3, the sample solution proposed document structures for user” and page” collections, which guarantees the best performance (i.e., fast read” operations).