Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Problem Set 3: Query Evaluation and Optimization
COMP SCI 339
Due: February 21, 2022
Name:
1 Buffer Manager and Simple Query Execution
1. 
(3 points)
Consider a buffer pool with a LRU page replacement policy. It has 10 pages. The buffer
pool is initially empty.
Recall the relation S – from Problem Set 2 – with the following schema:
S(a int, b char(10), c char(20))
and the following SQL query:
SELECT S.a
FROM S
A simple physical query plan for this query is the following:
π a
Sequential Scan
Query Processor
Storage Manager
Physical Query
Plan
Heap File Access Method
Buffer Pool
Manager
S   Disk
We show the query plan in the context of the relevant DBMS architecture components.
To execute the query, the system will call open() and then next() on the project
operator. We ignore hasNext() in this exercise.
Consider that relation S is stored in a heap file on disk.
(a) (1 point) Explain how the execution of this query will proceed as the system calls
open() and then next() on the topmost, project operator. You only need to
describe what happens on the call to open() and then on the first call to next().
You do not need to describe subsequent calls to next().
Your explanation should describe the control flow (who calls whom and when)
between (1) the project operator, (2) the sequential scan operator, (3) the heap file
access method, and (4) the buffer pool manager. Keep in mind that the buffer pool
manager will call the heap file to actually read a page from disk.
(b) (0.5 point) What will be the content of the buffer pool after the first next() call
on the project operator returns a tuple.
(c) (0.5 point) What will be the content of the buffer pool after the second next() call
on the project operator returns a tuple.
(d) (0.5 point) Assuming S contains 10,000 tuples, what will be the content of the
buffer pool after 1000 next() calls on the project operator.
(e) (0.5 point) Assuming S contains 10,000 tuples, what will be the content of the
buffer pool after the query completes?
2 Query Plan Cost Estimation
2. (4 pts) Consider the following relations and physical query plan:
R(a,b,c) S(d,e,f,g) T(h,i,j)
TCARD(R) = 1000 TCARD(S) = 1000 TCARD(T) = 1000
NCARD(R) = 10,000 NCARD(S) = 10,000 NCARD(T) = 10,000
Min(R,a) = 0 ICARD(S,e) = 10 ICARD(T,h) = 100
Max(R,a) = 9000 ICARD(S,f) = 100
ICARD(R,b) = 100 ICARD(S,d) = 50
ICARD(R,c) = 20 ICARD(S,g) = 40
R S
c = d
(Clustered index on a) (Unclustered index on (e,f) )
(Hash join)
(Index Nested Loop)
(d) σ b=100 ∨ b=101
(c)
(e)
g=h
T (Clustered index on h )
(a) σ a > 100 ∧ a <= 1000
(Use B+ tree index )
(b) σ e = 10 ∧ f = 1000
(Use B+ tree index )
π a,b,c,g,I,j
R1
R2
R3
R4
R5
(f)
(a) (1 point) Compute the selectivity of the following predicates:
1. a > 100 ∧ a ≤ 1000
2. e = 10 ∧ f = 1000
3. c = d
4. b = 100 ∨ b = 101
5. g = h
(b) (1.5 points) Compute the cardinality of all intermediate relations labeled R1 through
R5 and the final result, call it R6.
(c) (1.5 points) Compute the cost of this query plan in terms of number of pages read
from disk or written to disk. Assume that all of the operators after the hash join
are pipelined - i.e., you do not need to store intermediate results on disk. Only
count data pages, not index pages, in your lookups.
3 Query Optimization
3. (4 points)
Consider the following three relations:
R(a,b,c) S(d,e) W(f,g,h)
TCARD(R) = 100 TCARD(S) = 1,000 TCARD(W) = 10
NCARD(R) = 1,000 NCARD(S) = 10,000 NCARD(W) = 100
Consider the following SQL Query:
SELECT *
FROM R, S, W
WHERE R.a = S.d
AND R.c = W.h
(a) (2 points) Assume that all relations are stored in heap files, there are no indexes,
only page-at-a-time nested-loop joins can be used, and the selectivity of each join
predicate is 0.1%.
Show the query plan selected by a Selinger-style, bottom-up, dynamic programming
optimizer. Use the number of disk I/O operations as the cost function.
Hints:
• Remember that a Selinger-style optimizer will try to avoid Cartesian products
in the plan. So do NOT consider such plans.
• The Selinger optimizer will only consider left-deep plans.
Draw the selected plan and show how it is derived. You can use the following
table to help you but do NOT worry about computing the exact cost and size values
if you don’t need exact values to prune plans. In the table, P/K indicates the choice
to either prune or keep the subplan. Hint: When joining tuples, keep in mind that
the tuples get bigger.
Subquery Cost Size of output Plan P/K
R 100 page I/Os 1K records on 100 pages Sequential scan of R K
S 1K page I/Os 10K records on 1K pages Sequential scan of S K
W 10 page I/Os 100 records on 10 pages Sequential scan of W K
RS . . . 10K records on 2K pages . . .
SR . . . . . .
. . .
Space for your answer:
(b) (1 point) Consider the following small modification to the above query and consider
that we add an unclustered B+ tree index on S.d as well as a clustered B+ tree
index on R.b. Without re-computing the new best plan, explain how these changes
affect the optimization process for this query.
SELECT *
FROM R, S, W
WHERE R.a = S.d
AND R.c = W.h
AND R.b > 100
(c) (1 point) What is an interesting order? Give one or two examples and explain how
they can be useful in getting a better physical query plan.