Translating Code to Assembler

City College of San Francisco - CS270
Computer Architecture
Module: Simple Machine

[ the answer to the problem at the end of the last section is ARRAY[I] = VAL ]

Assembly code is organized (and written) very differently from high-level code. It has none of the programming constructs we are used to in high-level languages, and the higher the level of the language, the further assembly code is from it. We will not discuss the translation of any object-oriented code, only that of C, a comparatively low-level high-level language that is fairly straightforward to translate to assembler. Once we become familiar with such things as pointers, procedure calls, and exceptions, and how they are implemented, we can begin to generalize to higher-level code. (Many of the "tricks" that bring high-level code such as C++ down to the level of C are performed by the compiler in any case. Of course, since Java usually relies on a virtual machine interpreter at run-time, it is a different story.)

The most important technique to learn is how to translate control constructs (if-statements, loops, switch statements) to a form that is easy to translate into assembler. The problem is this:

in higher-level code	in assembler
if (some-condition) something;	if (some-condition) goto someplace;

The assembler form of this construct is implemented by a branch instruction, such as the JEQ instruction in our SM. Let's set aside for a moment the question of whether some-condition can be directly implemented in assembler, and just concentrate on the construct. Here is the straightforward way to do our transformation:

in higher-level code	in assembler
if (some-condition) something	if (!some-condition) goto x; something; x:

Many of you are not used to a goto statement or a label in C (or C++) code, or maybe you didn't even know they existed. (Maybe I'll get in trouble with your C++ instructor(s) for telling you about them, as they are horribly unstructured.) But that is the heart of the matter: Assembly code is not structured! It has no structure. It is simply a sequence of code with branch statements and labels - very ugly to write and to follow, but not so difficult to write if you do a translation such as that above.
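To make the transformation concrete, here is a minimal sketch in real C. The function names (clamp_structured, clamp_goto) and the choice of "clamp a negative number to zero" as the something are my own illustration, not part of the SM material; the label x is the one from the table above.

```c
/* Structured form: if (some-condition) something; */
int clamp_structured(int n)
{
    if (n < 0)
        n = 0;
    return n;
}

/* Branch-and-label form: if (!some-condition) goto x; something; x: */
int clamp_goto(int n)
{
    if (!(n < 0)) goto x;   /* skip "something" when the condition is false */
    n = 0;                  /* something */
x:
    return n;
}
```

Both functions return the same value for every input; only the second one is already shaped like a branch instruction followed by a label.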

Let's look at a slightly more complex situation:

in higher-level code	in assembler
if (some-condition) something; else somethingelse;	if (some-condition) goto dosomething; somethingelse; goto somethingdone; dosomething: something; somethingdone:

Another way to do this is:

in higher-level code	in assembler
if (some-condition) something; else somethingelse;	if (!some-condition) goto dosomethingelse; something; goto somethingdone; dosomethingelse: somethingelse; somethingdone:
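The two orderings above can be sketched side by side in compilable C. As an assumption for illustration only, the condition is a > b, something is r = a, and somethingelse is r = b, so each function computes a maximum; the names max_v1 and max_v2 are hypothetical.

```c
/* First ordering: branch to the "then" part when the condition is true. */
int max_v1(int a, int b)
{
    int r;
    if (a > b) goto dosomething;
    r = b;                      /* somethingelse */
    goto somethingdone;
dosomething:
    r = a;                      /* something */
somethingdone:
    return r;
}

/* Second ordering: branch to the "else" part when the condition is false. */
int max_v2(int a, int b)
{
    int r;
    if (!(a > b)) goto dosomethingelse;
    r = a;                      /* something */
    goto somethingdone;
dosomethingelse:
    r = b;                      /* somethingelse */
somethingdone:
    return r;
}
```

Which ordering to prefer usually depends on which form of the condition (the test or its negation) is easier to express with the branch instructions available.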

Let's do a real example:

in higher-level code	in assembler
if (N<0) RESULT=1; else RESULT=-1;	if (N<0) goto Nltz; RESULT=-1; goto Ngez; Nltz: RESULT=1; Ngez:

Of course, depending on the situation, you can start optimizing this code:

in higher-level code	in assembler
RESULT=-1; if (N<0) RESULT=1;	RESULT=1; if (N<0) goto Nltz; RESULT=-1; Nltz:

Believe it or not, this is the kind of transformation a compiler makes, and it is much uglier when we look at the actual assembly code, as we will see in the next section.
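If you want to convince yourself that the direct and the optimized goto forms agree, the following sketch wraps each one in a C function. The wrapper names (result_direct, result_optimized) are assumptions of mine; the labels and the variable names N and RESULT come from the tables above.

```c
/* Direct translation of: if (N<0) RESULT=1; else RESULT=-1; */
int result_direct(int N)
{
    int RESULT;
    if (N < 0) goto Nltz;
    RESULT = -1;
    goto Ngez;
Nltz:
    RESULT = 1;
Ngez:
    return RESULT;
}

/* Optimized form: assign one value up front, overwrite it only if needed. */
int result_optimized(int N)
{
    int RESULT = 1;
    if (N < 0) goto Nltz;
    RESULT = -1;
Nltz:
    return RESULT;
}
```

The optimized version needs one branch and one label fewer, at the cost of an assignment that is sometimes thrown away.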

Loops

If if-statements are ugly, loops are hideous. However, when we discuss loops we also have to discuss things such as arrays, which we will get to in a second. First, let's see what kind of transformation we can do to a simple loop, which sums the integers from 0 to (N-1):

in higher-level code	in assembler
for (SUM=0,i=0;i<N;i++) SUM += i;	SUM=0; i=0; loop: if (i>=N) goto loopdone; SUM = SUM + i; i++; goto loop; loopdone:
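Here is the same pair written as two C functions, as a sketch you can compile and compare. The function names sum_for and sum_goto are hypothetical; the bodies follow the two columns of the table.

```c
/* Structured form: the for loop from the left column. */
int sum_for(int N)
{
    int SUM, i;
    for (SUM = 0, i = 0; i < N; i++)
        SUM += i;
    return SUM;
}

/* Branch-and-label form: the right column, one statement per line. */
int sum_goto(int N)
{
    int SUM, i;
    SUM = 0;
    i = 0;
loop:
    if (i >= N) goto loopdone;   /* negated loop test exits the loop */
    SUM = SUM + i;
    i++;
    goto loop;                   /* unconditional branch back to the test */
loopdone:
    return SUM;
}
```

Note that the loop test is negated (i >= N instead of i < N), just as the condition was negated in the if-statement translation.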

Well, that wasn't so bad. But this is a silly loop. Real loops use arrays. Suppose our loop was

int myarray[N];
for (SUM=0,i=0;i<N;i++) SUM += myarray[i];

How do we translate myarray[i]? Well, in assembly code, an array is simply a label with some space (either initialized or uninitialized) attached. The label is the base of the array, and it is an address, in our case &myarray[0], or in C code, just myarray.

in higher-level code	in assembler
element = myarray[i]	temp=myarray; temp += i; element = *temp;
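The three-step form can be checked with a small C function. This is only a sketch: the name element_at is hypothetical, and it relies on C pointer arithmetic, which scales the index by the element size for us. In a hand translation to assembler for a byte-addressed machine, that scaling would have to be done explicitly.

```c
/* Fetch myarray[i] using only "base address plus offset, then load". */
int element_at(const int *myarray, int i)
{
    const int *temp;
    temp = myarray;     /* temp = base address of the array, &myarray[0] */
    temp += i;          /* advance by i elements (C scales by sizeof(int)) */
    return *temp;       /* load the value stored at that address */
}
```

For any valid index, element_at(myarray, i) and myarray[i] refer to the same value.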

Finally, let's look at our new loop:

in higher-level code	in assembler
int myarray[N]; for (SUM=0,i=0;i<N;i++) SUM += myarray[i];	SUM=0; i=0; loop: if (i>=N) goto loopdone; temp=myarray; temp+=i; SUM = SUM + *temp; i++; goto loop; loopdone:

This gets uglier still if our loop is a bit more complicated, with multiple arrays:

int myarray[N], otherarray[N+1];
for (i=0;i<N;i++)
    otherarray[i+1] = myarray[i];
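One possible "assembler" form of this two-array loop is sketched below, using the same base-plus-index steps as before. The function name shift_copy_goto and the temporaries src and dst are my own names, not part of the original example.

```c
/* Copy myarray[0..N-1] into otherarray[1..N], in branch-and-label form. */
void shift_copy_goto(int *otherarray, const int *myarray, int N)
{
    int i;
    const int *src;
    int *dst;

    i = 0;
loop:
    if (i >= N) goto loopdone;
    src = myarray;
    src += i;               /* src = &myarray[i] */
    dst = otherarray;
    dst += i;
    dst += 1;               /* dst = &otherarray[i+1] */
    *dst = *src;
    i++;
    goto loop;
loopdone:
    return;
}
```

Every pass through the loop rebuilds both addresses from scratch, which is exactly the kind of repeated work worth looking at when we start optimizing.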

Now is where optimization gets interesting. Before optimizing compilers, programmers were discouraged from writing code with array references in it. Instead, they were encouraged to write code using pointers. Suppose N is 1000. Drawing on what we have been learning about instruction timings, how many statements are executed by the array-summing loop (the one with the single array, myarray) in its "assembler" form? (I believe the answer is 6 per iteration * 1000 + 3 = 6003: two initializations, six statements per pass, and one final test that exits the loop.) But there is a lot of code in the loop that is unnecessary. Let's look at how the loop could be rewritten (and how programmers used to be encouraged to write):

int myarray[N]; int *temp, *arrayend;
for (SUM=0, temp=myarray, arrayend=&myarray[N] ; temp < arrayend ;temp++)
    SUM += *temp;

Rewriting this for our "assembler" version we get

SUM=0;
temp=myarray;
arrayend=&myarray[N];
loop: if (temp >= arrayend) goto loopdone;
      SUM += *temp;
      temp++;
      goto loop;
loopdone:

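We have reduced this code to 4 per iteration * 1000 + 4 = 4004 statements (three initializations, four statements per pass, and one final test that exits the loop). And this gets more significant if a second array is added.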
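The two statement counts can be checked mechanically. The sketch below instruments both "assembler" forms with a counter that is bumped once for every statement executed, counting each test and each goto as one statement. The function names, the executed counter, and the sum_out parameter are assumptions added for illustration.

```c
/* Indexed version: recomputes the element address on every pass. */
long count_indexed(const int *myarray, int N, int *sum_out)
{
    long executed = 0;
    int SUM, i;
    const int *temp;

    SUM = 0;            executed++;
    i = 0;              executed++;
loop:
    executed++;                         /* the loop test */
    if (i >= N) goto loopdone;
    temp = myarray;     executed++;
    temp += i;          executed++;
    SUM = SUM + *temp;  executed++;
    i++;                executed++;
    executed++;                         /* the goto */
    goto loop;
loopdone:
    *sum_out = SUM;
    return executed;
}

/* Pointer version: walks a pointer from the base to one past the end. */
long count_pointer(const int *myarray, int N, int *sum_out)
{
    long executed = 0;
    int SUM;
    const int *temp, *arrayend;

    SUM = 0;                 executed++;
    temp = myarray;          executed++;
    arrayend = &myarray[N];  executed++;
loop:
    executed++;                         /* the loop test */
    if (temp >= arrayend) goto loopdone;
    SUM += *temp;            executed++;
    temp++;                  executed++;
    executed++;                         /* the goto */
    goto loop;
loopdone:
    *sum_out = SUM;
    return executed;
}
```

With an array of 1000 elements, count_indexed returns 6003 and count_pointer returns 4004, and both produce the same sum. Keep in mind that these are counts of C-like statements, not of machine instructions or cycles.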

Optimizing compilers have made this kind of coding moot. In fact, hand optimizations today often get in the way of the compiler rather than helping it, as compilers expect the straightforward kind of coding. Only in special situations can hand optimization produce better code than an optimizing compiler. However, since we are doing the translation to assembler by hand, transformations like this often make the final translation of the transformed C-like code to assembler easier!
