闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Suppose there are 3 cache designs

Design 1:

Direct-mapped cache.

Each cache block is 1 byte.

Cache has 10-bits for index and 6-bits for tag field.

Design 2:

2-way set associative cache.

Each cache block is 1 word (4 bytes).

Cache has 7-bits for index and 7-bits for tag field.

Design 3:

fully associative cache with 256 cache blocks.

Each cache block is 1 word.

Cache has a 14-bit tag.

a. What is the size of each cache? The size here comprises only the cached data (excluding tag bits or valid/dirty bits)

b. How many total number of bits (or Kibits) does each cache need to store tags?

c. Which cache design has the maximum number of conflict misses? Which has the least? Make sure to upload readable image of your work!

Design1 :

Index=10bits. Therefore, no of blocks=2^10=1Ki. Cache size= 1Ki * 1 B = 1KiB.

Design2:

Index =7bits. No. of blocks= 2^7 * 2 = 256 (*2 because of the 2 way SA cache). Cache size = 256 * 4B = 1KiB Design3:

No. of blocks=256. Cache size =256 * 4 B = 1KiB

Design 1 :

6-bits for tag field. Total tag space= 6 * no. of blocks= 6* 1Ki= 6Kibits

Design 2:

7-bits for tag field. Total tag space= 7 * no. of blocks = 7 * 256= 1792 bits = (1782/1024) Kibits=1.75 Kibits

Design 3:

Cache has a 14-bit tag. Total tag space = 14 * no. of blocks=14* 256= 3.5 Kibits

A direct mapped cache will have the maximum number of conflict misses, therefore it will be Design 1. Design 3 (Fully associative) will have 0 conflict misses This question(c) won’t feature in F2022 fnal exam

This question examines the accuracy of 1-bit dynamic branch predictors for the following repeating patterns (e.g., in a loop) of branch outcomes. Answer each question (1-4) for each sequence (a-b):

(a) T, T, T, T, NT

(b) T, NT, NT, NT, NT

1. What is the accuracy (“score card” in lec 15-16) of a 1-bit dynamic predictor for these sequences of branch outcomes (a and b) provided we encounter this branch outcome sequence only once? Assume this predictor starts in the "Predict not taken" state for both a and b.

Make sure to upload readable image of your work! You will need to show the prediction behavior using the following table format:

Branch Prediction	Actual Branch behavior	Prediction Accuracy

(a)

Branch Prediction	Actual Branch behavior	Prediction Accuracy
NT	T	❌

T	T	✔
T	T	✔
T	T	✔
T	NT	❌

Score card= 3/5

(b)

Branch Prediction	Actual Branch behavior	Prediction Accuracy
NT	T	❌
T	NT	❌
NT	NT	✔
NT	NT	✔
NT	NT	✔

Score card=3/5

This question examines the accuracy of 2-bit dynamic branch predictors for the following repeating patterns (e.g., in a loop) of branch outcomes. Answer each question (1-4) for the following sequence

T, T, T, T, NT

1. What is the accuracy (“score card” in lec 15-16) of the 2-bit predictor for this sequence of branch outcomes provided we encounter this branch outcome sequence only once, assuming that the predictor starts off in the “Strongly Predict Not Taken” state (SNT)?

2. What is the accuracy (“score card” in lec 15-16) of a 2-bit predictor if the pattern is repeated forever? Assume this predictor starts in the “Strongly Predict Not Taken” state (SNT).

Branch Prediction	Actual Branch behavior	Prediction Accuracy

Branch Prediction	Actual Branch behavior	Prediction Accuracy
SNT	T	❌
WNT	T	❌
WT	T	✔
ST	T	✔
ST	NT	❌

Score card= 2/5

1st pass already covered.

2nd pass:

Branch Prediction	Actual Branch behavior	Prediction Accuracy
WNT	T	❌
WT	T	✔
ST	T	✔
ST	T	✔
ST	NT	❌

Score card= 3/5

3rd pass:

Branch Prediction	Actual Branch behavior	Prediction Accuracy
WT	T	✔
ST	T	✔
ST	T	✔
ST	T	✔

❌

Score card= 4/5

Henceforth the pattern repeats. Therefore, for n passes, the score card is:

(2 +3+ (n-2)*4)/ (5*n) .In infinity limit, this becomes 4/5

Given the following processor configuration of P1.

Assume base CPI=1 without memory stalls; assume L1 hit occurs within base CPI. L1 access time=1 cycle (accounted for within base CPI), L1 miss rate=10%. L2 access time=10 cycles, L2 miss rate=40%. L3 access time=50 cycles, L3 miss rate=80%, Main memory access time = 100 cycles.

80% of instructions access L1 data cache.

1. Find the AMAT of P1

2. Find the overall CPI of P1

Make sure to upload readable image of your work!

1. AMAT = 1 + 0.1*10 + 0.1*0.4*50 + 0.1*0.4*0.8*100 = 7.2 cycles

2. Data cache access cycles d= 1 + 0.8*(0.1*10 + 0.1*0.4*50 + 0.1*0.4*0.8*100 )= 5.96 cycles CPI = Additional Cycles spent in accessing instruction + d

= 0.1*10 + 0.1*0.4*50 + 0.1*0.4*0.8*100 +d

= 12.16 cycles

A 4-processor systems implements cache coherence with a snoopy MSI protocol. For each access in the sequence below (to the same BLOCK address), list the coherence states (M/S/I) for each processor’s cache after the access. Assume all the cache states begin in the Invalid (I) state.

Event	P1 state	P2 state	P3 state	P4 state
	I	I	I	I
P1 reads
P1 writes
P2 reads
P4 reads

You may only upload an image of the completed table shown above. You do not need to show any other accompanying work.