CS250 Homework 2 (100 pts total)
Handed out: Feb. 7, 2020
Due: Feb. 14, 2020 midnight

Reminder: All written homework assignment must be submitted on GradeScope, following the instructions given in Lab 0.

For this homework, you need to consult the lecture slides and Chapter 12 of the textbook. If the descriptions of the same term differ between the slides and the textbook, please follow the slides.

1. (20 pts) Suppose the memory word address is 30-bit long and the cache size is 4K words for data. (We don’t count the tag portion of the cache when the cache size is presented.)

(1.1) If the cache is direct-mapped and has the block size is one word. How many bits should the block index have? How many bits should the word offset have? How many bits should the tag have? What is the total number of tag bits for this cache? If we define the ratio of total number of tag bits over total number of data bits as the tag-storage overhead, assuming each data word of size 64 bits, what is the tag-storage overhead?

(1.2) Repeat (1.1) under the assumption that the cache is 4-way set-associative, and the block size is still one word.

(1.3) Repeat (1.1) under the assumption that the cache is direct-mapped, but the block size is 4 words.

(1.4) Repeat (1.1) under the assumption that the cache is 4-way set-associative and the block size is 4 words.

2. (15 pts) Assume we have a 4-way set-associative cache with the block size of one word. The following diagram shows all its current tags, with the italicized (red) tag in each row being the newest entry for the corresponding set. We assume the FIFO replacement policy, with the newest entry starts from the leftmost position and rotates to the right, as illustrated in the lecture. tag1 data tag2 data tag3 data tag4 data

Please mark on each of the memory addresses in the following memory reference sequence whether it is a hit (marked H) or a miss (marked M):

Please re-draw the table to show all the tags in the cache immediately after the last memory reference in the sequence given above is completed.

3. (15 pts) Under the same assumptions in Problem 2, suppose the program issues two iterations of the given memory sequence and no other memory references. What is the hit ratio (i.e. the hit rate) of the program at this cache?

4. (15 pts) We re-do a problem from HW1 under a different assumption about the cache design. In the following doubly-nested loop, suppose none of the array elements are allocated to registers, but all scalar variables are in registers.

Suppose the cache is 4-way set-associative, with the FIFO replacement policy and the block size of one word. The total cache size (for data) to is 8K words. What is the highest possible cache miss ratio when the processor executes the following doubly-nested loop? What is the highest possible cache hit ratio when the processor executes the following loop? You must clearly explain how you derive your answer, taking into account of possibilities of data layout in practice. (We assume that initially the cache is cold.)

5. (5 pts) Suppose a 64-bit wide memory bus has a latency of 100 CPU cycles before transferring the data. Suppose the memory bus has a bandwidth of 2 gigabytes per second. Assuming the CPU clock frequency to be 2 GHz, how much total time (in the number of nanoseconds) does it take to fill a cache block of 8 words in the event of a cache miss? (We assume a single-level cache.) How many CPU cycles is this total time equal to? (It is worth noting that the time you derived will be the worst-
case cache miss cost, Cm.)
for (i=1; i for (t=0; t<256; t++) {
D[i] = (A[i]+B[i])/C[i];
A[i]= (E[i]-D[i])/3.0;

}

6. (15 pts) We now assume a 4-way set-associative cache that has the block size of 8 words and the FIFO replacement policy. The total cache size (for data) has 8K words. In the loop in Problem 4, suppose none of the array elements are allocated to registers, but all scalar variables are in registers. We assume the cache is initially cold. For simplicity, we assume that at each cache miss, the requested word is counted as a cache miss, but until the cache block containing that word is replaced, subsequent references to any word in that block is counted as a hit. What is the highest possible cache miss ratio when the processor executes the loop? What is the highest possible cache hit ratio when the processor executes the loop? You must   clearly explain how you derive your answer, taking into account of possibilities of data layout in practice.

7. (15 pts) We assume the same cache in Problem 6. In the following loop, suppose none of the struct fields are allocated to registers, but all scalar variables are in registers. What is the highest possible cache miss ratio when the processor executes the following loop? What is the highest possible cache hit ratio when the processor executes the following loop? You must clearly explain how you derive your answer, taking into account of possibilities of data layout in practice.

We assume that each field in the struct declared below has the size of one word.
struct record {
long f1, f2, f3, f4, f5, f6, f7, f8;
}
struct record my_record[100], your_record[100];
for (i=0; i<100; i++) {
my_record[i].f1 = your_record[i].f1;
}