Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

EECS 484 Homework #5

(70 points)

Due: Tuesday, December 5, 2023 at 11:55 pm

(EST)

Please read the following instructions before starting the homework:

This homework must be completed individually and should be submitted on Gradescope.

No late days for homework! If you miss the due date, you get 0 points. If your PDF gets modified after the due date, you get 0 points. No exceptions on this.

Honor Code:

By submitting this homework, you are agreeing to abide by the Honor Code:

I have neither given nor received unauthorized aid on this assignment, nor have I concealed any violations of the Honor Code.

Question 1 Transactions (34 points)

Check what kinds of conflicts exist in the following schedules. Determine if the following schedules are: serializable, and/or conflict serializable.

1.1 Schedule A [8 points]

T1

T2

T3

 

W(A)

 

R(A)

 

 

W(A)

 

 

COMMIT

 

 

 

 

R(A)

 

 

COMMIT

 

R(A)

 

 

COMMIT

 

1.  [2 points] What conflicts exist between T1 and T3?

2.  [3 points] Is the schedule conflict serializable?

3.  [3 points] Is the schedule serializable?

1.2 Schedule B [8 points]

T1

T2

T3

W(B)

 

 

R(A)

 

 

 

W(B)

 

 

 

R(A)

 

 

W(A)

 

R(A)

 

 

COMMIT

 

 

 

COMMIT

W(A)

 

 

COMMIT

 

 

1.  [2 points] What conflicts exist between T1 and T2?

2.  [3 points] Is the schedule conflict serializable?

3.  [3 points] Is the schedule serializable?

1.3 Schedule C [8 points]

T1

T2

T3

 

R(A)

 

 

 

R(B)

R(B)

 

 

W(B)

 

 

 

 

W(A)

COMMIT

 

 

 

W(B)

 

 

W(A)

 

 

COMMIT

 

 

 

ABORT

1.  [2 points] What conflicts exist between T1 and T2?

2.  [3 points] Is the schedule conflict serializable?

3.  [3 points] Is the schedule serializable?

1.4 Schedule D [10 points]

T1

T2

T3

T4

R(B)

 

 

 

 

 

 

R(C)

 

R(A)

 

 

 

 

 

W(B)

 

 

R(B)

 

 

 

W(C)

 

R(C)

 

 

 

W(C)

 

 

 

COMMIT

 

 

 

 

W(A)

 

 

 

W(B)

 

 

 

COMMIT

 

 

 

 

 

R(C)

 

 

 

W(C)

 

 

 

W(A)

 

 

 

COMMIT

 

 

COMMIT

 

1.  [2 points] What conflicts exist between T1 and T4?

2.  [2 points] What conflicts exist between T3 and T4?

3.  [3 points] Is the schedule conflict serializable?

4.  [3 points] Is the schedule serializable?

Question 2. Left-Deep Plan (36 points)

Consider the following schemas:

Team(tid, tname, sport,

ranking) Game(tid, vid, date,

result) Venue(vid, vname)

Consider the following query for question 1:

SELECT T.tname

FROM Team T, Game G, Venue V

WHERE T.tid = G.tid AND

G.vid = V.vid AND

T.sport = ‘football’

AND

V.vname = ‘Michigan Stadium’;

Q2.1 [3 points] Given the above schema and no information on data sizes, select the plans that are valid left-deep plans for the query (avoid plans that cause a join to degenerate into a cross-product) from the plans given below.

Q2.2 [33 points] Consider the following plan for the questions that follow later in this problem. This plan only shows the join operators. Assume that selections and projections are applied on the edges as appropriate so that they are moved inside to reduce table sizes as much as possible prior to any joins.

For example, between relation T and the join operator, you would want to do a selection on T.sport = 'football'and also a projection to remove columns that are not going to be used. You can assume that projections and selections can be done together as tuplesof T are being read into the buffer. Similarly, you will want to insertselections and projection operators along other edges as needed.

Consider the following information.

Team Table: 600 data pages

● There are 5 types of sports, one of which is football. All teams play a sport and each sport has the same number of teams that play it.

● Index on Team Table: Clustered B+ tree index on attribute sport. The entire tree, including the leaves, lives in memory.

Venue Table: 300 data pages

● There are 4 uniquely named venues, the best of which is “Michigan Stadium”. Every venue has the same number of records. (Note that each uniquely named venue can have multiple vids)

● No indexes are available on Venue.

Game Table: 1200 data pages

● Every team has the same number of venues where it participated in a game as any other team.

● Every venue has the same number of teams that played in it as any other venue. ● No indexes are available on Game.

System Info:

● Page size = 512 bytes

● Each attribute is 64 bytes

Additional assumptions:

1. Ifthere are selections and/or projections after a join operation, they will be pipelined   (done in memory as the final result is being produced). However, you should assume that results from a previous join (with the pipelined selection and

projection having been applied) are materialized to disk for any subsequent join operation.

2. Similarly, if a selection and/or projection is done on input tables (e.g., T) prior to the first join with that table, assume that you would materialize the result after the selection and  projection.

3. Use an index for selection whenever there is an index match on selection predicates (not join predicates).

4. There is sufficient memory so that the read cost to join any two relations A and B can be assumed to be |A| + |B| (Think about why this is a reasonable estimate). 5. Ignore the cost of writing the final result (but not intermediate results).

Answer the following questions regarding the query plan:

1. [3 points] After applying a selection predicate on the edge from Team to the Join, what is the number of qualified tuples? If no selection predicate is needed, assume that the   selection predicate is true.

2. [3 points] After applying a projection operation (if any) on the edge from the Team to the Join, what is the size of each tuple in the resulting intermediate relation? If no projection is needed, then give the size of the entire tuple as your answer.

3. [6 points] What is the total cost (read and write) of applying both selection and projection to the Team relation before the first join that involves it? Note that the intermediate table after selection and projection is required to be materialized prior to the join (see

additional assumption #2), so consider the cost of materialization in your answer. (Hint: See additional assumption #3)

4. [6 points] What is the total cost (read and write) of applying selection and projection (if     any) to the Venue relation before any join, including the cost of materializing the result if any operation was applied?

5. [3 points] If we join Team and Game together based on tid after selections and projections on both tables (if any), how many estimated matching tuples are there in the join result?

6. [6 points] What is the additional cost (read and write) of joining Team and Game as the  first step of the plan, assuming that all the required intermediate inputs to the join are    already on disk? Include the cost of materializing the intermediate result, which will be used in the second join.

7. [6 points] What is the total estimated cost (without the final write) of the plan ((Team ⨝ Game) ⨝ Venue)?