关键词 > Database

Database HW1

发布时间:2025-09-23

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Database HW1

In this homework, you will be asked to write SQL queries to answer a series of questions about a data set. You find this data set at chocolate_bars.csv.

It contains ca. 2,500 reviews about chocolate bars as well as associated information on the chocolate bars.

Each row in the data describes one review. You can load the data into an SQL table, created by the following DDL statement:

create table if not exists bars(id int, manufacturer char varying,

company_location char varying, year_reviewed int,

bean_origin char varying, bar_name char varying,

cocoa_percent numeric, num_ingredients numeric,

ingredients char varying,

review char varying, rating numeric);

If using Postgres, assuming the file chocolate_bars is located in the current directory, you can now load the data using the command

\copy bars from 'chocolate_bars.csv' with csv header;

If you do not have Postgres installed, you can copy the CoLab notebook link. Run the upper cell to create a database with SQLITE3 (an alternative to Postgres) and to load the data. Then, replace the SQL query in the second cell by whatever query you are interested in and execute the cell to display the query result.

The following questions are meant to be answered by writing SQL queries. The final question (no points) asks you to provide the SQL queries you used to answer each of the questions. We may subtract points retroactively if your SQL queries do not match the results you submit for the other questions.

Q1. The column "rating" contains a rating for a bar, on a scale from 1 to 4. What is the average rating (averaging over all reviews)?

Q1 Answer (Numeric)

Q2. The column "manufacturer" stores the manufacturer of the bar to which the review refers. Which manufacturer received most reviews? (just write the name of the manufacturer, use no spaces or quotes)?

Q2 Answer (Fill in the Blank)

Q3. Count the number of reviews that contain the word "earthy" (all lowercase) in the "review" column!

Q3 Answer (Numeric)

Q4. Only one manufacturer has more than 25 reviews for bars with a cocoa percentage of more than 71. Which one is it? Answer (just provide the name of the manufacturer, no quotes or spaces):

Q4 Answer (Fill in the Blank)

Q5. How many bars (i.e., number of distinct bar names) exist in the database that did not receive any review in 2018?

Q5 Answer (Numeric)

Q6. How many pairs of reviews can we find such that both refer to the same bar (column "bar_name"), the second review was written after the first review (i.e., "year_reviewed" is higher for the second review), and the rating assigned by the second review (column "rating") is higher?

Q6 Answer (Numeric)

Q7. How many manufacturers produce at least one bar where the bean origin (column "bean_origin") is the same as the company location (column "company_location")?

Q7 Answer (Numeric)

Q8. How many pairs of reviews exist such that both have exactly the same review text but refer to different chocolate bars? Hint: make sure to count each pair of reviews only once (i.e., do not double-count reviews A,B as A,B and B,A).

Q8 Answer (Numeric)

Q9. Is the average rating higher for chocolate bars with a cocoa percentage above average (compared to bars with a percentage below average)?

Q9 Answer :  True/False

Q10. Count the number of bars (number of distinct bar names) that received a review in every year (column "year_reviewed") present in the data set.

Q10 Answer (Numeric)

For each of the previous question, provide the SQL query (or queries) you used to solve the exercise. Make sure to clarify which query belongs to which question.

Q1

Q2

Q3

Q4

Q5

Q6

Q7

Q8

Q9

Q10