CSE142 S223 A00 Assignment #1
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
CSE142 Assignment 1
0.1 Question 1
Consider the following C function
and its compileed x86 code
Please answer the following questions.
0.1.1 1.A.
If the variable size is set to 1,000,000, how many dynamic instructions will the function take when finishing?
size =1,000,000
Instructions before the loop (outside .L3): endbr64, testl %edc,%edx,jle .L2,leal - 1(%rdx),%ecx,xorl %eax,% Instructions inside the loop (.L3):movl(%rdi,%rax,4),%edx,addl %edx,(%rsi,%rax,4),movq %rax,% Instructions after the loop (.L2):ret
assembly language code:(ignore).cfi startproc,.c fi endproc
Total Instructions =6×1,000,000+6=6,000,006
0.1.2 1.B.
Assume: 1. the processor runs at 3 GHz 2. the movl instruction takes 5 cycles to execute 3. the addl instruction with memory access takes 6 cycles to execute 4. each branch instruction takes 3 cycles to execute 5. other instructions take 1 cycle to execute each
Please estimate the total execution time of vadd function when size is set to 1,000,000.
Hint: Execution Time =InstructionCount x Cycles Per Instruction x Cycle Time
Inner loop:
movl:1,000,000×5=5,000,000
addl:1,000,000×6=6,000,000
jne:1,000,000×3=3,000,000
other:1,000,000×3=3,000,000
outside loop:
jle:1×3=3
times
Execution
else:5×1=5
total:5,000,000+6,000,000+3,000,000+3,000,000+5+3=17,000,008
cycles:3
time:
0.1.3 1.C.
Continued from B. If the processor company decided to redesign the ISA and break up each addl and movl instructions into a memory instruction and an arithmetic instruction. The new memory instruction takes 4 cycles and the arithmetic instruction takes one cycle. If such change allows the clock rate to go up at 4GHz, please estimate the total execution time of vadd function when size is set to 1,000,000.
movl:4
addl:4
cycles +1 arithmetic instruction
cycles+1 arithmetic instruction
=1,000,000×(4+1)=5,000,000
=1,000,000×(4+1)=5,000,000
jne:1,000,000×3=3,000,000
other:1,000,000×3=3,000,000
Tota inLoop =5,000,000+5,000,000+3,000,000+3,000,000=16,000,000
outside loop:
jle:1×3=3
else:5×1=5
Total=16,000,000+5+3=16,000,008
0.1.4 1.D.
Continued from B and C. What's the speedup of C over B?
Speedup= New Execution Time / Old Execution Time = 0.005666669333s / 0.004000002s =
Hint: Execution Time =InstructionCount x Cycles Per Instruction x Cycle Time
Inner loop:
movl:1,000,000×5=5,000,000
addl:1,000,000×6=6,000,000
jne:1,000,000×3=3,000,000
other:1,000,000×3=3,000,000
outside loop:
jle:1×3=3
times
Execution
else:5×1=5
total:5,000,000+6,000,000+3,000,000+3,000,000+5+3=17,000,008
cycles:3
time:
0.1.3 1.C.
Continued from B. If the processor company decided to redesign the ISA and break up each addl and movl instructions into a memory instruction and an arithmetic instruction. The new memory instruction takes 4 cycles and the arithmetic instruction takes one cycle. If such change allows the clock rate to go up at 4GHz, please estimate the total execution time of vadd function when size is set to 1,000,000.
movl:4
addl:4
cycles +1 arithmetic instruction
cycles+1 arithmetic instruction
=1,000,000×(4+1)=5,000,000
=1,000,000×(4+1)=5,000,000
jne:1,000,000×3=3,000,000
other:1,000,000×3=3,000,000
Tota inLoop =5,000,000+5,000,000+3,000,000+3,000,000=16,000,000
outside loop:
jle:1×3=3
else:5×1=5
Total=16,000,000+5+3=16,000,008
0.1.4 1.D.
Continued from B and C. What's the speedup of C over B?
Speedup= New Execution Time / Old Execution Time = 0.005666669333s / 0.004000002s =
1.4166666249
Hint: Execution Time =InstructionCount x Cycles Per Instruction x Cycle Time
Inner loop:
movl:1,000,000×5=5,000,000
addl:1,000,000×6=6,000,000
jne:1,000,000×3=3,000,000
other:1,000,000×3=3,000,000
outside loop:
jle:1×3=3
times
Execution
else:5×1=5
total:5,000,000+6,000,000+3,000,000+3,000,000+5+3=17,000,008
cycles:3
time:
0.1.3 1.C.
Continued from B. If the processor company decided to redesign the ISA and break up each addl and movl instructions into a memory instruction and an arithmetic instruction. The new memory instruction takes 4 cycles and the arithmetic instruction takes one cycle. If such change allows the clock rate to go up at 4GHz, please estimate the total execution time of vadd function when size is set to 1,000,000.
movl:4
addl:4
cycles +1 arithmetic instruction
cycles+1 arithmetic instruction
=1,000,000×(4+1)=5,000,000
=1,000,000×(4+1)=5,000,000
jne:1,000,000×3=3,000,000
other:1,000,000×3=3,000,000
Tota inLoop =5,000,000+5,000,000+3,000,000+3,000,000=16,000,000
outside loop:
jle:1×3=3
else:5×1=5
Total=16,000,000+5+3=16,000,008
0.1.4 1.D.
Continued from B and C. What's the speedup of C over B?
Speedup= New Execution Time / Old Execution Time = 0.005666669333s / 0.004000002s =
1.4166666249
0.2 Question 2
When a program is adapted to run on multiple processors in a multiprocessor system, the execution time on each processor is comprised of computing time and the overhead time required for locked critical sections and/or to send data from one processor to another. Assume a program requires t =100 s of execution time on one processor. When run p processors, each processor requires t/p s, as well as an additional 4 s of overhead, irrespective of the number of processors. Compute the per-processor execution time for 2, 4,8, 16, 32, 64, and 128 processors. For each case, list the corresponding speedup relative to a single processor and the ratio between actual speedup versus ideal speedup (speedup if there was no overhead).
Please update the following table and show your work here
Hint:You may start by thinking and extending the following equation.
0.3 Question 3
Consider the following C function
and its x86 code
Please answer the following questions.
0.3.1 3.A.
Assume: 1. the processor runs at 3 GHz 2. the movss instruction takes 5 cycles to execute 3. the addss instruction with memory access takes 10 cycles to execute 4. each branch instruction takes 3 cycles to execute 5. other instructions take 1 cycle to execute each
Please estimate the total execution time of vadd function when size is set to 1,000,000.
vadd: LFB39: cfi startproc endbr64 # 1 cycle testl %edx,%edx # 1 cycle jle .L2 # 3 cycles leal- 1(%rdx),%ecx # 1 cycle xorl %eax,%eax # 1 cycle .p2align 4,,10 .p2align 3 .L3: movss (%rsi,%rax,4),%xmm0 # 5 cycles addss(%rdi,%rax,4),%xmm0# 10 cycles movq %rax,%rdx # 1 cycle movss %xmm0,(%rsi,%rax,4)# 5 cycles addq $1,%rax #1 cycle cmpq %rcx,%rdx#1 cycle jne .L3 #3 cycles .L2: ret #1 cycle .cfiendproc
Processors Real Ideal |
||
128 |
|
128 |
Hint:You may start by thinking and extending the following equation.
0.3 Question 3
Consider the following C function
and its x86 code
Please answer the following questions.
0.3.1 3.A.
Assume: 1. the processor runs at 3 GHz 2. the movss instruction takes 5 cycles to execute 3. the addss instruction with memory access takes 10 cycles to execute 4. each branch instruction takes 3 cycles to execute 5. other instructions take 1 cycle to execute each
Please estimate the total execution time of vadd function when size is set to 1,000,000.
vadd: LFB39: cfi startproc endbr64 # 1 cycle testl %edx,%edx # 1 cycle jle .L2 # 3 cycles leal- 1(%rdx),%ecx # 1 cycle xorl %eax,%eax # 1 cycle .p2align 4,,10 .p2align 3 .L3: movss (%rsi,%rax,4),%xmm0 # 5 cycles addss(%rdi,%rax,4),%xmm0# 10 cycles movq %rax,%rdx # 1 cycle movss %xmm0,(%rsi,%rax,4)# 5 cycles addq $1,%rax #1 cycle cmpq %rcx,%rdx#1 cycle jne .L3 #3 cycles .L2: ret #1 cycle .cfiendproc
IC=1,000,000 iterations(7 instructions per iteration)
CPI for each instruction type:
Total Cycles Per Iteration =1.43+1.43+0.429+0.429=3.718
Total Cycles =IC×Total Cycles Per Iteration =7×10⁶×3.718=26,026,000
conds
0.3.2 3.B.
Continued from 3.A, what's the giga-floating-point-operations per second (GFLOPS) for this func- tion?
[]:
IC=1,000,000 iterations(7 instructions per iteration)
CPI for each instruction type:
Total Cycles Per Iteration =1.43+1.43+0.429+0.429=3.718
Total Cycles =IC×Total Cycles Per Iteration =7×10⁶×3.718=26,026,000
conds
0.3.2 3.B.
Continued from 3.A, what's the giga-floating-point-operations per second (GFLOPS) for this func- tion?
[]:
2023-08-22
Computer architecture