City College of San Francisco - CS270
Computer Architecture

Module: Background

Sample Problems

In this section we will go through a few sample problems that are similar to some of the book problems in Chapter 1.

Problem 1:

Given a display with dimensions 1200x1600, how much RAM must be dedicated to hold a framebuffer for 24-bit color (8 bits per pixel for each of red, green, and blue)?

Problem 2:

Given two processors, CPU1 and CPU2, with three instruction classes, their clock speeds, and their CPIs as follows

Processor    CPI Class 1    CPI Class 2    CPI Class 3    Clock
CPU1         2              1.5            2.5            2 GHz
CPU2         1.5            2              3              1.5 GHz

and program P with the following instruction counts for each class

Class 1    Class 2    Class 3
1600       1000       400

Make the following calculations:

  1. Calculate the global CPI of each CPU for the program.
  2. Calculate the time required to run the program on each CPU.
  3. What is the peak performance of each CPU, expressed in instructions per second?

Problem 3:

A recent Intel chip, Ivy Bridge HE4 (quad-core), is reported to have a die size of 160 mm^2. It is interesting because it is the first microprocessor to be made with a new 22nm technology on larger wafers of 450mm diameter. Unfortunately, they have been having problems with the yield. If the defect density is .9/cm^2 (there is no reported defect density; the only report is that it is less than 1 per cm^2),

  1. What is the yield?
  2. What is the cost of a working die if the wafer costs $1000?

Problem 4:

Assume that a particular program executing on a single CPU takes 10 sec of CPU time. If this program is 70% parallelizable, and parallelizing it adds 1 sec of non-parallelizable execution time per additional CPU, what is the best execution time on two CPUs? On four CPUs?

Answers

Problem 1:

1200 * 1600 pixels * 3 bytes/pixel = 5,760,000 bytes, or 5.76 MB (where MB means 10^6 bytes)
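
As a quick check, here is a minimal Python sketch of the same arithmetic; the names width, height, and bytes_per_pixel are just illustrative:

    # Framebuffer size for a 1200x1600 display at 24-bit color (3 bytes/pixel).
    width = 1200          # pixels
    height = 1600         # pixels
    bytes_per_pixel = 3   # 8 bits each for red, green, and blue

    framebuffer_bytes = width * height * bytes_per_pixel
    print(framebuffer_bytes)           # 5760000
    print(framebuffer_bytes / 10**6)   # 5.76 (MB, using 10^6 bytes per MB)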

Problem 2:

It is easier to work on subproblem 2 (execution time) first, then derive subproblem 1 (the global CPI):

Time (sec) = (#Instrs1 * CPI1 + #Instrs2 * CPI2 + #Instrs3 * CPI3) (cycles) / Clock speed (cycles/sec)

For CPU1:

(2*1600 + 1.5*1000 + 2.5*400) / (2*10^9) = 5700 / (2*10^9) = 2850*10^-9 sec

For CPU2:

(1.5*1600 + 2*1000 + 3*400) / (1.5*10^9) = 5600 / (1.5*10^9) = 3733.3*10^-9 sec

The total number of instructions is 3000, so the overall CPI of each CPU for this program is

CPU1: (2*1600 + 1.5*1000 + 2.5*400)/3000 = 5700/3000 = 1.9

CPU2: (1.5*1600 + 2*1000 + 3*400)/3000 = 5600/3000 = 1.867

So, for this program, the CPI is nearly the same. It is just the clock rates that make the difference in performance.
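
Here is a minimal Python sketch of these two calculations, assuming the instruction counts and CPI values from the tables above (the variable and dictionary names are just illustrative):

    # Instruction counts for program P and per-class CPIs for each CPU.
    counts = [1600, 1000, 400]            # Class 1, Class 2, Class 3

    cpus = {
        "CPU1": {"cpi": [2.0, 1.5, 2.5], "clock_hz": 2.0e9},
        "CPU2": {"cpi": [1.5, 2.0, 3.0], "clock_hz": 1.5e9},
    }

    for name, cpu in cpus.items():
        cycles = sum(n * c for n, c in zip(counts, cpu["cpi"]))
        time_sec = cycles / cpu["clock_hz"]       # execution time for program P
        global_cpi = cycles / sum(counts)         # overall CPI for the program
        print(name, cycles, time_sec, global_cpi)

    # CPU1: 5700 cycles, 2.85e-06 sec, CPI 1.9
    # CPU2: 5600 cycles, about 3.73e-06 sec, CPI about 1.87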

The peak performance is the performance a CPU achieves when executing only its fastest (lowest-CPI) instruction class. For CPU1 this is Class 2 instructions, for which the CPI is 1.5. Using this to derive the instructions per second (IPS) we get

2 GHz (cycles/sec) / 1.5 cycles/instruction = 2/1.5 = 1.333 * 10^9 instructions/sec or 1333 MIPS

For CPU2 this is Class 1 instructions for which the CPI is also 1.5. However, this CPU runs at 1.5 GHz, so its peak performance is only 1000 MIPS.
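
Continuing the sketch above, the same peak figures can be checked with a couple of lines (again assuming the CPI values from the table):

    # Peak performance: clock rate divided by the lowest (fastest) CPI.
    for name, cpu in cpus.items():
        peak_mips = cpu["clock_hz"] / min(cpu["cpi"]) / 1e6
        print(name, round(peak_mips, 1), "MIPS")

    # CPU1: 1333.3 MIPS (2 GHz / 1.5)
    # CPU2: 1000.0 MIPS (1.5 GHz / 1.5)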

Problem 3:

First, let's get the wafer area: pi*(450/2)^2 = 3.14 * 50625 = 158,962.5 mm^2. Disregarding edge problems, this means a single wafer will contain 158962.5/160, or about 993 dies. Let's use 1000 for simplicity.

The yield is given by

  1/(1 + (defects per area * die area)/2)^2

Converting our die area from mm^2 to cm^2 (160 mm^2 = 1.6 cm^2) so the units match, we have

1/(1 + (.9 * 1.6)/2)^2 = 1/1.72^2 = 1/2.9584 = .338

This yields about 338 working chips per wafer. If the wafer costs $1000, that is a cost of just under $3 per working die.
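
Here is a minimal Python sketch of the whole calculation, assuming the yield model above and the $1000 wafer cost (the variable names are illustrative; it uses the exact die count rather than rounding to 1000, so the cost comes out a few cents different):

    import math

    wafer_diameter_mm = 450.0
    die_area_mm2 = 160.0
    defects_per_cm2 = 0.9
    wafer_cost = 1000.0

    wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2
    dies_per_wafer = wafer_area_mm2 / die_area_mm2       # ignores edge losses

    die_area_cm2 = die_area_mm2 / 100.0                  # 160 mm^2 = 1.6 cm^2
    yield_ = 1.0 / (1.0 + (defects_per_cm2 * die_area_cm2) / 2) ** 2

    working_dies = dies_per_wafer * yield_
    cost_per_working_die = wafer_cost / working_dies

    print(round(dies_per_wafer), round(yield_, 3), round(cost_per_working_die, 2))
    # roughly 994 dies/wafer, yield 0.338, about $2.98 per working die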

Problem 4:

Since 30% of the program is not parallelizable, 3 seconds of execution time must be sequential. This must be added to the parallelized portion. Further, each additional CPU adds 1 second of sequential time. Thus,

2 CPUs - 3 + 1 = 4 seconds of sequential time, plus 7 sec executing in parallel on 2 CPUs (7/2 = 3.5 sec), or 7.5 seconds.

4 CPUs - 3 + 3 = 6 seconds of sequential time, plus 7 sec executing in parallel on 4 CPUs (7/4 = 1.75 sec), or 7.75 seconds!

In this case, in fact, 2 CPUs gives the better time! Although this may be an extreme case, you can see how important the overhead of setting up multiple CPUs is!
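
Here is a minimal Python sketch of the model used above; the 3 sec/7 sec split and the 1 sec/CPU overhead come straight from the problem, and the function name run_time is just illustrative:

    # Execution time model: sequential portion + per-extra-CPU overhead
    # + parallel portion split across n CPUs.
    serial_sec = 3.0      # 30% of the 10 sec program cannot be parallelized
    parallel_sec = 7.0    # 70% of the 10 sec program runs in parallel
    overhead_sec = 1.0    # extra sequential time per additional CPU

    def run_time(n_cpus):
        return serial_sec + overhead_sec * (n_cpus - 1) + parallel_sec / n_cpus

    print(run_time(2))    # 7.5  seconds on two CPUs
    print(run_time(4))    # 7.75 seconds on four CPUs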

Although the overhead figure here is made up, the 70% parallelization factor is realistic and is considered a very good level of parallelization. You can see that throwing more cores at the problem loses effectiveness quite quickly.


Copyright 2014 Greg Boyd - All Rights Reserved.