闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

CSE 560 – Practice Problem Set 3 Solution

Three of these problems come from Hennesy & Patterson’s Computer Architecture: A Quantitative Approach, 3rd edition.

1. The decode pipeline stage is a poorly chosen name. Considering that it does more than decoding, what else does it do? What would you propose as an alternative name?

Yes, the decode stage is when we are decoding the meaning of the instruction, but it seems that the more important thing (i.e., more computationally difficult thing) is reading the register file in that stage.

To be sure, deeper pipelines regularly separate instruction decode from register file read, making those two things two different pipeline stages.

2. This exercise asks how well hardware can find and exploit instruction level parallelism (i.e., pipelining). Consider the following four RISC machine code fragments each containing two instructions:

i. addi r1

load r2, 7(r1)

ii. add r3

store r2, 7(r1)

iii. breq r1, place

store r1, 7(r1)

iv. store r3, 17(r10)

load r2, 12(r8)

(a) For each code fragment (i) to (iv) identify each dependence that exists or that may exist (a

fragment may have no dependencies).

(b) For each code fragment, indicate whether data forwarding is sufficient to resolve the

dependence or if stall cycles are required. Indicate the number of stall cycles.

Code Fragment	Dependence (a)	Resolution (b)
i	True dependence on r1	Data forwarding sufficient
ii	No dependence	None required
iii	No data dependence (but control dependence)	Stall cycles to resolve branch depends upon microarchitecture
iv	Potential dependence on memory	Will execute in order in M stage

3. Consider the following RISC assembly code.

(1) load r1,45(r2)

(2) add r7

(3) sub r8

(4) or r9

(5) brneq r7, target

(6) add r10

(7) xor r2

(a) Identify each dependence (both data and control); list the two instructions involved; identify

which instruction is dependent; and, if there is one, name the storage location involved (register or memory) (e.g. register r1 or memory address 4(r1)).

• Inst (2) is dependent on inst (1) through r1

• Inst (3) is dependent on inst (1) through r1

• Inst (4) is dependent on inst (1) through r1

• Inst (5) is dependent on inst (2) through r7

• Inst (6) is dependent on inst (3) through r8

• Inst (6) is dependent on inst (5) through control

• Inst (7) is dependent on inst (5) through control

(b) Using the 5-stage pipeline from class, which of the dependences that you found in part (a)

become hazards and which do not.

The first dependency, (2) dependent on (1), is a load-use dependency that will necessitate a one clock cycle delay so that the results of the load (available after stage M) are available to the execution of the add (in stage X). None of the rest are hazards.

	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16
(1)	F	D	X	M	W
(2)		F	D	d*	X	M	W
(3)			F	p*	D	X	M	W
(4)					F	D	X	M	W
(5)						F	D	X	M	W
(6)							F	D	s*
(7)								F	s*

In the above diagram, the branch is assumed to be predicted NT and was taken. This shows an example with the insertion of stall cycles.

4. Increasing the size of a branch prediction buffer means that it is less likely that two branches in a program will share the same predictor. A single predictor predicting a single branch instruction is generally more accurate than is that same predictor serving more than one branch instruction. (a) List a sequence of branch taken and not taken actions to show a simple example of 1-bit

predictor sharing that reduces misprediction rate.

Consider two branches, B1 and B2, that are executed alternately. During the execution of the program, B1 and B2 each alternate taken/not taken. If they each had a 1-bit predictor, each would always be mispredicted.

In the table below, columns labeled P show the value of a 1-bit predictor shared by B1 and B2. Columns labeled B1 and B2 show the actions of the branches (each alternating taken/not taken). Time increases to the right. T stands for taken, NT for not taken. The predictor is initialized to NT.

2022-10-17

Java

物理(Physical)

LINUX

C++

Python

Processing

sas

ios

maths

maple

C语言