Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit


COMP9120 Relational Database Systems

Tutorial Week 11 Solution: Query Processing


Introduction

Suppose we have a schema Rel1(A, B, C) and Rel2(C, D). Each A field occupies 4 bytes, each B field occupies 12 bytes, each C field occupies 8 bytes, each D field occupies 8 bytes. Rel1 contains 100,000 records, and Rel2 contains 50,000 records. There are 100 different values for A represented in the database, 1000 different values for B, 50,000 different values for C, and 10,000 different values for D. Rel1 is stored with a primary (sparse, clustered) B+ tree index on the pair of attributes (A,B); assume this index has 2 levels, including the root page and excluding the leaf (record) pages. Rel2 is stored with a primary (sparse, clustered) B+ tree index on C; assume this index has 2 levels, including the root page and excluding the leaf (record) pages.


General features: assume that each page is 4K bytes, of which 250 bytes are taken for header and array of record pointers. Assume that no record is ever split across several pages. Assume that index entries in any index use the format of (search key, rowid), where rowid uses 4 bytes.


Exercise 1. Block-Nested Loops Join

Consider the following query:

SELECT Rel1.A, Rel1.B, Rel1.C

FROM Rel1, Rel2

WHERE Rel1.C = Rel2.C AND Rel2.D = 16;

and consider the query plan which calculates this as follows:

Form the equi-join of Rel1 and Rel2 by using a block-nested loops join with Rel1 as the outer relation, and then filter each tuple of the join to see if the value of D is 16; if so, output the values of A, B and C from that tuple of the join.

How many page I/Os are needed to compute this plan (assume that we have only the minimal space, say 2 pages worth, for buffering in memory)?


First, we compute the storage structure information:

A page has 4096 bytes, of which 3846 are available to store data records. Since each data record of Rel1 takes 24 bytes, we can fit 160 records in a page; the remaining 6 bytes in each page are wasted. If we fill each page as much as possible, the 100,000 records thus take 100000/160 = 625 pages. If, as is a reasonable assumption in a B+ tree, we have on average 75% of the records possible, that would be 120 records per page, and so 100,000/120=833 pages would be used to store the data records at the leaf level of the B+ tree index.

For Rel2, each data record takes 16 bytes (8 for C and 8 for D), so we can fit 240 data records in a page. If each page is as full as possible, there would be 50000/240=209 pages (rounding up); if we assume each leaf page is 75% full, we would have 180 data records in each page and 50000/180=278 leaf (record) pages.

Now we can think about the costs of the query processing: to do the block-nested loops join, we read each data page of the outer relation once, and we read each data page of the inner relation as many times as there are pages in the outer relation (because we scan the inner relation for every page of the outer one). Note that this processing does not use or examine the higher levels of the index; only the leaves which contain the data records are scanned. Thus, the block-nested loops join takes 833+833*278=232407 I/Os. The filtering and output then take place in memory, incurring no extra I/O cost.


The following shows how the B+ tree indexes are constructed, and can be skipped.

Each entry in the primary index of Rel1 takes 20 bytes (16 for the search key (A,B) and 4 for a rowid). Each page of the index has 4096-250 = 3846 bytes available, which can be used for 3846/20=192 entries (recall that we must round the fraction down, as we can’t put part of an entry into a page, and leave part for another page). If we assume that on average, pages are 75% full, then each page in the index will point to 144 pages on the next level down. Thus, the level above the leaves has 833/144, that is about 6 pages; then there is a root page which is one level above this.

Each entry in the primary index of Rel2 takes 12 bytes (8 for the search key C and 4 for a rowid), so each page of index has room for 3846/12=320 entries; if each index page is 75% full, then each page in the index will point to 240 pages on the next level down. Thus, we can expect 2 pages in the level above the leaves (to point to the 278 leaf pages), and a root page above that.


Exercise 2. Index-Nested Loops Join

Reconsider the query in the above Exercise. But, now consider a different query plan, where the join is processed as an index-nested loops join (using the primary index on Rel2.C), and then each tuple of the join is filtered to check the value of D. How does this affect the cost?


In this plan, we scan Rel1, reading each of its record pages once. For each record in Rel1, we do a lookup on Rel2 for a given value of C. Each lookup costs 3 I/Os. Thus, the cost of the join is 833+100000*3=300833 I/Os. Again, filtering and output are done in memory without further I/O cost.