Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

CST308 COMPUTER ARCHITECTURE

Problem Solving Exercise 1

1Describe the key concepts of von Neumann architecture.

The key characteristics of a von Neumann architecture are:

•    It has four major subsystems. These are: Control Unit, Arithmetic Logic Unit (ALU), Memory, Input/Output.

•    Stored Program Concept: Data and instructions are stored in a single readwrite memory

•    Sequential instruction processing: Execution occurs in a sequential fashion from one instruction to another.

2          What is a stored program computer?

In a stored program computer, programs are represented in a form suitable for storing in memory alongside the data.

The computer gets its instructions by reading them from memory, and a program can be set or altered by setting the values of a portion of memory.


3          What are the four main components of any general-purpose computer? See Answer of question 1!


4 Availability is the most important consideration for designing servers, followed closely by scalability and throughput.


a) We have a single processor with a failure in time (FIT) of 100. What is the mean time to failure (MTTF) for this system?

Failures in time (FIT) is traditionally reported as failure per billion (1 × 109) hours of operation.

1 Billion = 109

Failure rate (in hours) = 100 / (109)

MTTF = 1/ (100/109) = 109/100 = 107 hours


b)  If it takes 1 day to get the system running again, what is the availability of the system? 1 day = 24 hours

Availability ofthe system = MTTF/ (MTTF + MTTR) = 107/ (107+ 24) = 0.999 ~ 1 (MTTR - Mean Time to Repair)

c)   Imagine that the government, to cut costs, is going to build a supercomputer out of inexpensive computers rather than expensive, reliable computers.

What is the MTTF for a system with 1000 processors (of the same processor stated in part (a))?

FIT1000 processors  = # of processors × FIT per processor =  1000  ×  100 =  105 MTTF1000 processors = 1/(105/109) = 109/105 = 104 hours

5          Assume a disk subsystem with the following components and MTTF:


•    10 disks, each rated at 1,000,000-hour MTTF

•    1 ATA controller, 500,000-hour MTTF

•    1 power supply, 200,000-hour MTTF

•    1 fan, 200,000-hour MTTF

•    1 ATA cable, 1,000,000-hour MTTF


Using the simplifying assumptions that the lifetimes are exponentially distributed and that failures are independent, compute the failure rate and MTTF of the system as a whole.

The sum of the failure rates is:


FIT

or 23,000 failure in 1 billion hours or 23,000 FIT

The MTTF for the system is just the inverse of the failure rate:

or just under 5 years.


6          Let’s redo the reliability example from the question 5 above after improving the reliability   of   the   power   supply   via   redundancy   from   200,000-hour   to 830,000,000hour MTTF, or 4150 times better. What is the reliability improvement?

There are two ways to do this.

First way: The calculation of the failure rates of the new subsystem is:

Failure ratenew system :


FITnew system = 18,001.2 FIT

MTTFnew system

The reliability improvement would be:


Despite an impressive 4150 times improvement in reliability of one module, from the system’s perspective, the change has a measurable but small benefit of about 28%.


Second way: Using Amdahl’s law (see Q17).


Server farms such as Google and Yahoo! provide enough compute capacity for the highest request rate of the day. Imagine that most of the time these servers operate at only 60% capacity. Assume further that the power does not scale linearly with the load; that is, when the servers are operating at 60% capacity, they consume 90% of maximum power. Also, assume that the servers are operating at 100% capacity,

consume 100% ofmaximum power. The servers could be turned off, but they would take too long to restart in response to more load.

a)  How much power savings would be achieved by turning off 40% ofthe servers? Assume the rest 60% have to operate at 100% capacity to process the requests.

Assume the maximum power consumption is P, and the amount of servers be N.

In the old  system (without turning off servers), the power consumption is 0.9NP.

In the new system, the power consumption is 0.6N1P = 0.6NP Power saving = (0.9NP – 0.6NP) / 0.9NP = 1/3 ≈ 33.3%

b)  A new system has been proposed that allows for a quick restart but requires 20% of the maximum power while in this barely alive” state. How much power savings would be achieved by placing 40% of the servers in the barely alive” state (still consume 20% of P)? Assume the rest  60% still operate at  60% capacity as the barely alive” servers can quickly restart.

In the new system, the power consumption is 0.6N0.9P + 0.4N0.2P = 0.62NP

Power saving = (0.9NP –0.62NP) / 0.9NP    0.311= 31. 1%


c)  How much power savings would be achieved by placing 20% of the servers in the barely alive” state and 20% off? Assume the rest 60% have to operate at 80% capacity and consume 97% of maximum power.

In the new system, the power consumption is 0.2N0.2P + 0.6N0.97P = 0.622NP

Power saving = (0.9NP –0.622NP) / 0.9NP    0.309 = 30.9%


Consider the execution of a program which results in the execution of 2 million instructions on a 400-MHz processor. The program consists of four major types of instructions. The instruction mix and the CPI for each instruction type are given below based on the result of a program trace experiment:


Compute the average CPI when the program is executed on a uniprocessor with the above trace results. Compute the CPU time to execute the program.


The average CPI when the program is executed on a uniprocessor with the above trace results is:

CPI = 0.6 + (2 × 0. 18) + (4 × 0. 12) + (8 × 0. 1) = 2.24.



Therefore,  CPU  time  =  1/(400  x  106)  ×  2.24  ×  2×106   =  0.0112  sec  =  11.2 milliseconds = 11.2 x 10-3  sec


Suppose we have two implementations of the same instruction set architecture. Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program and by how much?

We know that each computer executes the same number of instructions for the program; let’s call this number I.

First, find the number of processor clock cycles for each computer:


CPU clock cyclesA = × 2 CPU

clock cyclesB = × 1.2

Now we can compute the CPU time for each computer:


CPU timeA =  CPU clock cyclesA × Clock cycle time = I ×2 ×250 ps = I ×500 ps Similarly, CPU time of B can be calculated as:

CPU timeB =  CPU clock cyclesB × Clock cycle time = I ×1.2 ×500 ps = I ×600 ps Clearly, computer A is faster. The amount faster is given by the ratio of the         execution times:


We can conclude that computer A is  1.2 times as fast as computer B for this program.



10 A compiler designer is trying to decide between two code sequences for a particular

computer. The hardware designers have supplied the following facts:

For a particular high-level language statement, the compiler writer is considering

two code sequences that require the following instruction counts:

Which code sequence executes the most instructions? Which will be faster? What is the CPI for each sequence?

Sequence 1 executes 2 + 1 + 2 = 5 instructions. Sequence 2 executes 4 + 1 + 1 = 6 instructions. Therefore, sequence 1 executes fewer instructions. We can use the    equation for CPU clock cycles based on instruction count and CPI to find the total number of clock cycles for each sequence:

This yields

CPU clock cycles1 = (2 × 1) + (1 × 2) + (2 × 3) = 2 + 2 + 6 = 10 cycles CPU clock cycles2 = (4 × 1) + (1 × 2) + (1 × 3) = 4 + 2 + 3 = 9 cycles

So code sequence 2 is faster, even though it executes one extra instruction. Since code sequence 2 takes fewer overall clock cycles but has more instructions, it must have a lower CPI. The CPI values can be computed by:




11        We wish to compare the performance of two different machines: M1 and M2. The following measurements have been made on these machines:



Programs

Computer M1 (sec)

Computer M2 (sec)

P1

10

5

P2

3

4


a)  Which machine is faster for each program and how much?





b)  If the  following  additional  measurements  were  made,  find  the  instruction

execution rate (instruction per second) for each machine when running P1.


Program

Instructions executed on M1

Instructions executed on M2

P1

200 × 106

160 × 106


c)   If the clock rates of the machines M1 and M2 are 200 MHz and 300 MHz, respectively, find the clock cycles per instruction (CPI) for M1 and M2.


(a) For program P1, M2 is (10/5) = 2.0 times faster than M1.

For program P2, M1 is (4/3) = 1.33 times faster than M2.

(b) Since we know the number of instructions executed and the time it took to execute the instructions; we can  easily calculate the number  of instructions per  second while running program P1 as:

Instructions per second for M1: (200x106)/10s = 20x106

Instructions per second for M2: (160x106)/5s = 32x106

(c) CPI can be calculated as:


For M1,

For M2,


12        Assume that multiply instructions take 12 cycles (CPI) and account for 10% of the instructions in a typical program and that the other 90% of the instructions require an average of 4 cycles for each instruction (CPI). What percentage of time does the CPU spend doing multiplication?



Another, alternative way to look into this:

Assume 100 instructions, then the number of cycles will be 90 × 4 + 10 × 12 = 480 cycles. Among these, 120 cycles are spent doing multiplication, and thus 25% of the time is spent doing multiplication.


13        Suppose a program (or a program task) takes 1 billion instructions to execute on a processor running at 2 GHz. Suppose also that 50% of the instructions execute in 3 clock cycles, 30% execute in 4 clock cycles, and 20% execute in 5 clock cycles. What is the execution time for the program or task?


We have the instruction count: 109 instructions.


The clock cycle time can be computed quickly from the clock rate as:

1⁄(2 × 109) =  0.5 × 10 −9seconds.


Execution time = Instruction Count × CPIoverall  × Clock cycle time =  109 × (0.5 × 3 + 0.3 × 4 + 0.2 × 5) × 0.5×10-9  sec = 1.85 sec.


14        Suppose the processor in the previous question is redesigned so that all instructions that initially executed in 5 cycles now execute in 4 cycles. Due to changes in the circuitry, the clock rate has to be decreased from 2.0 GHz to 1.9 GHz. No changes are made to the instruction set. What is the overall percentage improvement?




15        A  common  transformation  required  in  graphics  processors  is  square  root.    Implementations   of  floating-point   (FP)   square   root   vary   significantly   in    performance,  especially among processors  designed  for  graphics.  Suppose  FP    square root (FPSQR) is responsible for 20% of the execution time of a critical    graphics benchmark. One proposal is to enhance the FPSQR hardware and speed       up this operation by a factor of 10. The other alternative is just to try to make all FP  instructions in the graphics processor run faster by a factor of 1.6; FP instructions are responsible for half of the execution time for the application. The design team           believes that they can make all FP instructions run 1.6 times faster with the same       effort as required for the fast square root. Compare these two design alternatives.

SpeedupFPSQR

SpeedupFP


16        Complete the following table. Show your working towards the final answer.


What is the overall speedup if you make  10% of a program  90 times faster?

Speedupoverall

What is the overall speedup if you make  90% of a program  10 times faster?

Speedupoverall




The new CPU is  20 times faster on search queries than the old processor. The old processor is busy with search queries 70% of the time, what is the speedup gained by integrating  the enhanced CPU?

Speedupoverall

We are considering an enhancement to the processor ofa server. The new CPU 10X faster. It is an I/O bound server, so 60% time waiting for I/O.

Speedupoverall


17        Solve Question 6 using Amdahl’s law.


The calculation of the failure rates of the original disk subsystem (in question 5 of tut 1) was:

1                     1                   1                   1                    1

Therefore, the fraction ofthe failure rate that could be improved from power supply improvement is 5 per million hours out of 23 for the whole system, or about 0.22.


The reliability improvement would be (Amdahl’s Law):

Improvement power supply pair

Despite an impressive 4150 times improvement in reliability of one module, from the system’s perspective, the change has a measurable but small benefit of about 28%.

18        A machine needs to be enhanced, and there are two possible improvements: either (a) makes multiply instructions run 4 times faster than before, or (b) make memory access instructions run 2 times faster than before. A program is repeatedly run and found  that  it  takes  100  seconds  to  execute.  Of this  time,  20%  is  used  for multiplication, 50% for memory access instructions, and 30% for other tasks.


a)  What will be the speedup if the multiplication is improved only?

b)  What will be the speedup if the memory access is improved only?

c)  What will be the speedup if both improvements are made?


a. Speedup if we improve only multiplication:

Speedup multiplication Speedup multiplication