City College of San Francisco - CS270 Computer Architecture
Module: Background
Sample Problems
In this section we will go through a few sample problems that are similar to some of the book problems in Chapter 1.
Problem 1:
Given a display with dimensions 1200x1600 pixels, how much RAM must be dedicated to hold a framebuffer for 24-bit color (8 bits per pixel for each of red, green and blue)?
Problem 2:
Given two processors CPU1 and CPU2 with three instruction classes, their clock speeds and their CPIs as follows
Processor | CPI Class 1 | CPI Class 2 | CPI Class 3 | Clock
CPU1 | 2 | 1.5 | 2.5 | 2 GHz
CPU2 | 1.5 | 2 | 3 | 1.5 GHz
and program P with the following instruction counts for each class
Class 1 | Class 2 | Class 3
1600 | 1000 | 400
Make the following calculations
Problem 3:
A recent Intel chip Ivy Bridge HE4 (quad-core) is reported to
have a die size of 160mm^2. It is interesting because it is the
first microprocessor to be made at a new 22nm technology on larger
wafers of 450mm diameter. Unfortunately, they have been having
problems with the yield. If the defect density is .9/cm^2 (there
is no reported defect density - the only report is that it is less
than 1 per cm^2),
Problem 4:
Assume that a particular program executing on a single CPU takes 10
sec of CPU time. If this program is 70% parallelizable and doing so
adds 1 sec of non-parallelizable execution time per additional CPU,
what is the best execution time on two CPUs? on four CPUs?
Answers
Problem 1:
1200 * 1600 pixels * 3 bytes/pixel = 5,760,000 bytes, or 5.76 MB (where MB is 10^6 bytes)
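The arithmetic above can be sketched in a few lines of Python. The function name `framebuffer_bytes` is made up for this illustration, and the sketch assumes pixels are stored packed, with no padding or alignment.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Return the bytes needed to store one full frame (no padding assumed)."""
    # Round up to whole bytes per pixel (24 bits -> 3 bytes).
    bytes_per_pixel = (bits_per_pixel + 7) // 8
    return width * height * bytes_per_pixel


size = framebuffer_bytes(1200, 1600, 24)
print(size, "bytes")
print(size / 10**6, "MB (10^6 bytes)")
print(size / 2**20, "MiB (2^20 bytes)")
```

If the binary convention (2^20 bytes) is used instead, the same framebuffer comes to about 5.49 MiB.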
Problem 2:
It is easier to work on subproblem B first, then derive A:
Time (sec) = (#Instrs1 * CPI1 + #Instrs2 * CPI2 + #Instrs3 * CPI3) (cycles) / Clock speed (cycles/sec)
For CPU1:
(2*1600 + 1.5*1000 + 2.5*400) / (2*10^9) = 5700 / (2*10^9) = 2850*10^-9 sec
For CPU2:
(1.5*1600 + 2*1000 + 3*400) / (1.5*10^9) = 5600 / (1.5*10^9) = 3733.3*10^-9 sec
The total number of instructions is 3000, so the overall CPI of each CPU for this program is
CPU1: (2*1600 + 1.5*1000 + 2.5*400)/3000 = 5700/3000 = 1.9
CPU2: (1.5*1600 + 2*1000 + 3*400)/3000 = 5600/3000 = 1.867
So, for this program, the CPI is nearly the same. It is just the clock rates that make the difference in performance.
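As a check on the numbers above, here is a minimal Python sketch of the execution-time and overall-CPI calculations. The dictionary layout and the function names (`total_cycles`, `cpu_time`, `overall_cpi`) are illustrative choices, not part of the problem statement.

```python
# Data from the Problem 2 tables.
counts = {"class1": 1600, "class2": 1000, "class3": 400}
cpus = {
    "CPU1": {"cpi": {"class1": 2.0, "class2": 1.5, "class3": 2.5}, "clock_hz": 2.0e9},
    "CPU2": {"cpi": {"class1": 1.5, "class2": 2.0, "class3": 3.0}, "clock_hz": 1.5e9},
}


def total_cycles(cpi, counts):
    """Sum of (instruction count * CPI) over all instruction classes."""
    return sum(cpi[c] * counts[c] for c in counts)


def cpu_time(cpu, counts):
    """Execution time in seconds: total cycles / clock rate."""
    return total_cycles(cpu["cpi"], counts) / cpu["clock_hz"]


def overall_cpi(cpu, counts):
    """Average cycles per instruction for this instruction mix."""
    return total_cycles(cpu["cpi"], counts) / sum(counts.values())


for name, cpu in cpus.items():
    print(name, "time (s):", cpu_time(cpu, counts), "CPI:", overall_cpi(cpu, counts))
```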
The peak performance is the performance of the CPU executing its most favorable instruction sequence, that is, only instructions from the class with the lowest CPI. For CPU1 this is Class 2 instructions, for which the CPI is 1.5. Using this to derive the IPS we get
2 GHz (cycles/sec) / 1.5 cycles/instruction = (2/1.5) * 10^9 = 1.333 * 10^9 instructions/sec or 1333 MIPS
For CPU2 this is Class 1 instructions for which the CPI is also
1.5. However, this CPU runs at 1.5 GHz, so its peak performance is
only 1000 MIPS.
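The peak figures can be reproduced with a short Python sketch. The helper name `peak_mips` is invented for this example; it takes the lowest per-class CPI as the best case, which is the assumption used in the answer above.

```python
def peak_mips(class_cpis, clock_hz):
    """Peak rate in millions of instructions/sec, using the lowest-CPI class."""
    best_cpi = min(class_cpis)
    return clock_hz / best_cpi / 1e6


cpu1_peak = peak_mips([2.0, 1.5, 2.5], 2.0e9)  # CPIs for Class 1, 2, 3
cpu2_peak = peak_mips([1.5, 2.0, 3.0], 1.5e9)
print("CPU1 peak MIPS:", round(cpu1_peak))
print("CPU2 peak MIPS:", round(cpu2_peak))
```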
Problem 3:
First, let's get the wafer area: pi*(450/2)^2 = 3.14 * 50625 = 158962.5mm^2. Disregarding edge problems, this means a single wafer will contain 158962.5/160, or about 1000 dies. Let's use 1000 for simplicity.
The yield is given by
1/(1 + (defects per area * die area)/2)^2
Converting our die area from mm^2 to cm^2 so the units match (160 mm^2 = 1.6 cm^2), we have
1/ (1 + (.9 * 1.6)/2)^2 = 1/1.72^2 = 1/2.9584 = .338
With 1000 dies per wafer, this yields about 338 working chips per wafer. If the wafer costs $1000, the cost is just under $3 per working die.
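The wafer, yield and cost steps can be put together in a small Python sketch. The function names are made up for illustration, and edge losses are ignored as in the text. Because the sketch uses the unrounded die count (about 994) rather than 1000, it gives roughly 336 good dies instead of 338; the cost still comes out just under $3 per working die.

```python
import math


def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Wafer area divided by die area, ignoring edge losses."""
    wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2
    return wafer_area_mm2 / die_area_mm2


def die_yield(defects_per_cm2, die_area_mm2):
    """Yield = 1 / (1 + (defect density * die area) / 2)^2, with area in cm^2."""
    die_area_cm2 = die_area_mm2 / 100  # 100 mm^2 per cm^2
    return 1 / (1 + defects_per_cm2 * die_area_cm2 / 2) ** 2


dies = dies_per_wafer(450, 160)
fraction_good = die_yield(0.9, 160)
good_dies = dies * fraction_good
cost_per_good_die = 1000 / good_dies  # $1000 wafer cost, as in the text

print("dies per wafer:", round(dies))
print("yield:", round(fraction_good, 3))
print("good dies:", round(good_dies))
print("cost per good die ($):", round(cost_per_good_die, 2))
```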
Problem 4:
Since 30% of the program is not parallelizable, 3 seconds of execution time must be sequential. This must be added to the parallelized portion. Further, adding each additional CPU adds 1 second of sequential time. Thus,
2 CPUs - 4 seconds of sequential time + 7 seconds executing in parallel on 2 CPUs (3.5 seconds), or 7.5 seconds.
4 CPUs - 6 seconds of sequential time + 7 seconds executing in parallel on 4 CPUs (1.75 seconds), or 7.75 seconds!
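In this case, 2 CPUs actually beat 4! (Under this model, 3 CPUs would do slightly better still: 5 seconds of sequential time + 7/3 seconds in parallel, or about 7.33 seconds. Beyond that, each extra CPU costs more in overhead than it saves.) Although this may be an extreme case, you can see how important overhead is in setting up multiple CPUs!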
Although the overhead figure used here is made up, a 70% parallelizable fraction is realistic, and is considered to be a very good level of parallelization. You can see that throwing more cores at the problem loses effectiveness quite quickly.
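The Problem 4 timing model can be written as a small Python function. The name `parallel_time` and its parameter names are made up for this sketch; the defaults are the values given in the problem (10 seconds total, 70% parallelizable, 1 second of overhead per additional CPU).

```python
def parallel_time(n_cpus, total_s=10.0, parallel_fraction=0.7,
                  overhead_per_extra_cpu_s=1.0):
    """Execution time on n_cpus under the Problem 4 model."""
    # The part that cannot be parallelized always runs sequentially.
    serial = total_s * (1 - parallel_fraction)
    # Each CPU beyond the first adds fixed sequential overhead.
    overhead = overhead_per_extra_cpu_s * (n_cpus - 1)
    # The parallelizable part is split evenly across the CPUs.
    parallel = total_s * parallel_fraction / n_cpus
    return serial + overhead + parallel


for n in (1, 2, 4):
    print(n, "CPU(s):", round(parallel_time(n), 2), "seconds")
```

The same function can be called with other CPU counts or overhead values to see how quickly the overhead term starts to dominate.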
This page was made entirely with free software on Linux: Kompozer, the Mozilla Project and OpenOffice.org.