City College of San Francisco - CS270 Computer Architecture
Module: MIPS-IV (Procedures)
Procedures
In the last topic we covered writing leaf functions with arguments. Although that is a good first step, there is a lot more to the procedure calling convention. We will cover the rest of it in this section.
[ Sample files for this topic are in the directory online/mipsIV beneath our public work area on hills. ]
The procedure call
Calling a procedure is a special kind of branch. The difference is that the callee must be able to return: after the call completes, execution should resume at the instruction following the call. That is what the special call instructions provide: the address to return to is recorded in a register when the call is made. The callee can then use that register to return.
This basic procedure call, where one procedure (the caller) branches to another procedure (the callee), is implemented using one of two special MIPS instructions: jal (jump and link) and jalr (jump and link register).
Both jal and jalr use the link register (the "al" in their names stands for "and link") to record the return address. The link register is also called the return address register and is named $ra. Some students affectionately refer to it as "rah".
Of course you may see an immediate problem. The procedure main is called using the jal instruction. If it calls another procedure using the jal instruction the $ra register is overwritten. This means the address main should return to is lost!
A similar situation occurs in each called procedure (the callee). Such state information must be saved at procedure call and restored upon return. This information must be saved in an area that is unique to each invocation of a procedure. This should be apparent if you think about a recursive procedure (one that calls itself). This is a simple illustration of the need for a stack. We have highlighted the basics already, but here they are once more:
The stack is used for three purposes
Use of the stack
A stack is simply a sequential piece of memory that is used to store data. Traditionally, data is pushed onto the stack using a pointer, the stack pointer, that points to the last word placed on the stack. Before data is pushed, the stack pointer is moved to make room for the new data. The memory pointed to by the current value of the stack pointer is referred to as the top of stack, since it is the last data to be placed on the stack. When the data is retrieved, it is popped from the stack: the data is copied from the top of stack, then the stack pointer is moved back. Implementations often had instructions to push and pop data values, adjusting the stack pointer and storing the data at the same time.
On MIPS, there are no push and pop instructions. Instead, when a piece of data is to be placed on the stack, the stack pointer is moved, then used to access the space made available. For example, if we wanted to place two words (8 bytes) of data on the stack, we could move the stack pointer down by 8, then store the data using store word instructions with offsets of 0 and 4 bytes from the stack pointer. If the data we wanted to place on the stack was in reg1 and reg2, and we wanted reg1 to be at the top of stack, it would look like this:
addiu $sp,$sp,-8
sw reg1,0($sp)
sw reg2,4($sp)
Because the stack grows downwards on MIPS, terms like top of stack are a bit confusing. It is always the last item placed on the stack, or the word stored at 0($sp). The terminology does not change, even though the stack grows to lower addresses. We will try to avoid using these terms, however.
If a procedure needs to store information on the stack, it allocates a chunk of stack space using the addiu instruction as above, with a negative adjustment. If a function needs stack space, the addiu instruction is the first instruction executed in the procedure. The chunk of stack space 'allocated' by this method for the procedure is called its stack frame or activation record.
Example: The function foo needs to allocate 5 words of stack space. Write its entry code.
foo: addiu $sp,$sp,-20
sw $ra,16($sp)
Here, foo's stack frame is 20 bytes long, occupying bytes 0 through 19 relative to $sp. The return address is always stored at the extreme end of the stack frame, next to the stack frame of foo's caller; hence foo stores $ra at 16($sp).
Stack alignment
For our purposes, the stack is word-aligned. This means that the smallest amount of space that can be allocated on the stack is four bytes, and every allocation is a multiple of four bytes.
(The treatment of stack alignment in the text is confusing. The examples in Chapter 2 use a stack that is word-aligned, while those in Appendix A use a stack that is doubleword-aligned. For our class, we will use a word-aligned stack for simplicity. In addition, we will not use a frame pointer, although we will discuss it briefly, time permitting.)
(Doubleword alignment of the stack is important in general so that you can easily move double-precision floating-point data and doubleword integers to and from the stack. Just as words must be word-aligned, doubles must be doubleword-aligned.)
This restriction of four-byte quantities fits our use of registers quite well. Suppose you are working with character data and you load a character value into a register using
lbu $t0,0($s0)
where $s0 is a character pointer. As we know, registers are 32 bits wide, so the lbu instruction zero-extends the byte for us: since we used lbu, the upper 24 bits of the register are 0. If we need to save our register on the stack, we can simply use the sw instruction instead of the sb instruction. This is exactly what happens when small data (a char or a short) is passed as an argument. The argument is promoted to its 4-byte counterpart before being stored on the stack. Thus scalar data (individual arguments) that are chars and shorts are promoted to ints, and unsigned chars and unsigned shorts are promoted to unsigned ints. In the receiving routine these arguments are retrieved as four-byte quantities and the high-order bits are ignored.
How do we allocate the space for our local data?
Consider the following sequence of code
Here, we have a character array and an individual character that must be allocated as local variables. The solution is pretty simple: the character ch is allocated as a word on the stack. If the address of that word is in $t0, then the character ch would be located at 0($t0). For the array carray, we allocate sufficient words to hold the entire array of chars, four chars per word. In this case we allocate (NB+3)/4 words on the stack for the array, where NB is the number of bytes in the array, (and 3 is added to adjust the number of bytes to the next word boundary before we truncate by integer division). Thus, in our case we allocate 13/4 or 3 words (12 bytes) for carray. carray would actually start at the beginning of this area, and there would be two bytes of padding at the end.
Who saves registers?
When a call is made, registers must be saved (and later restored) so that the registers in use by the caller are not overwritten by the registers used by the callee. Of course, it benefits performance if the fewest possible instructions are executed to do this save and restore, but the convention requires that the call be blind, i.e., neither side can "see" the register use of the other.
Traditionally, either the caller or the callee was responsible for saving all the registers it uses. These conventions are referred to as caller-save and callee-save. Each mechanism has pros and cons:
MIPS tries to get the best of both worlds by adopting a hybrid scheme between caller- and callee-save:
Putting it all together
Uses #1 and 2 occur in any non-leaf function. The last two uses are
optional - depending on your function - you may decide to use
s-registers (or not) and you may or may not have local variables.
A simple example of a procedure call
Consider the code fragment below.
We are going to consider what would happen in the dumbest
compilation possible, where we ignore the fact (and obvious optimization) that a and b are
constants in proc1(). We will instead translate the code just as it
is written.
Looking at this function, we notice the following:
Since it is not a leaf, it needs a stack frame with room for $ra.
Since the maximum number of arguments it passes to a function is less than 4, it requires the minimum amount of space for arguments (4 words).
In our first iteration of writing code for this function, we will also allocate two words for our temporaries. Adding this all together gives us 1 ($ra) + 4 (args) + 2 (temps) words of stack space or 7 words = 28 bytes. This gives us our first instruction inside proc1:
proc1: addiu $sp,$sp,-28
But how is our new space of 28 bytes organized?
Order of items on the stack
If you are inside the function proc1 and looking up the stack, you should see the following, in order
data item | address
$ra | 24($sp)
b | 20($sp)
a | 16($sp)
argument area | 0($sp)
Now that we have the offsets, we can write our function. Again, here is the code:
This first really silly translation is fine - it obeys all the calling convention rules. Even ignoring the constants, there are a few obvious optimizations we can do.
Using t-registers effectively
There are two obvious optimizations we can make to our function by remembering a few facts:
All our temporaries go away when we return.
You can use a t-register to hold a temporary (instead of the stack) if the temporary is not needed after a function call.
Looking over our function again
There are two situations here that qualify under the optimizations above
This allows us to simplify our code. Here is the result:
This is significantly simpler. You should notice something very important, however: by optimizing this function we modified (at least some of) the stack offsets! This
is not always necessary - in our case we could have left the unused
space for b in our stack frame. But if you modify the registers you
want to save, you will have to do some work on your stack offsets, and if you add space, you always need to change your offsets.
Non-leaf functions that use arguments
We have one final issue we need to illustrate simply - a function that uses arguments. We will modify our simple function to do that. Here is the new version:
int proc2(int);
int proc1(int a) {
Here, we have moved a to an argument - the first argument, which is passed in $a0. In this case, we need our argument a after a function call, so we must save it on the stack. But, remember, a already has a place allocated for it by the function that called it - a's home location. In our example in the last module, the home location of the first argument was 0($sp). But that was before we added our stack frame. So, a's home location is now 0+N where N is the size of our frame.
Our stack frame has changed now, too, since a is no longer a temporary. Since it is an argument, we will simply home a before we call proc2. Here is the code:
That seemed like a lot, but we have already covered most of the issues. There are just a few left, but we need an example with a bit more complexity to examine them.
A slightly more complicated example
First, let's transform this code to our ugly version:
[ Ugly, is right, yes? ] These programs are nmatches.c and nmatches1.c.
Next, let's do an analysis of the variables here:
The obvious things to do here would be to home str and comp before the function call and restore them afterwards. Similarly, we would allocate a temporary location for n and save and restore it across the function call. Let's see what the code looks like:
These decisions result in a lot of saving and restoring - of homing the arguments and n each iteration. We can avoid that by using a few s-registers.
Using s-registers
If you have a temporary variable (or an argument) that you need throughout the function and there are intervening procedure calls, placing the variable in an s-register is a good choice. Remember, s-registers are callee-save. This means it is the responsibility of the callee to save and restore the value across the function. As long as we do that, the value in the s-register is "safe" for the duration of our function, because any other function (that we call) that wants to use it must save and restore it for us.
In this function we will allocate three s-registers: $s0 for str, $s1 for comp and $s2 for n. Let's do a side-by-side comparison of the function before and after using s-registers:
(These examples are nmatches1.s and nmatches1_sregs.s. Complete
programs with test main's for each are nmatches_all.s and
nmatches_all_sregs.s.)
without using s-registers:
        .globl nmatches
nmatches:
        addiu $sp,$sp,-24
        sw $ra,20($sp)
        # char c;
        # $t0 is c
        # int n=0;
        # n is 16($sp)
        sw $zero,16($sp)
#Lnmatchesloop:
Lnmatchesloop:
        # c = *str;
        lb $t0,0($a0)
        # if (c == 0) goto Lnmatchesloopend;
        beq $t0,$zero,Lnmatchesloopend
        # if (equiv(c,comp) == 0) goto Lnmatchesskip;
        # home str and comp
        sw $a0,24($sp)
        sw $a1,28($sp)
        move $a0,$t0
        jal equiv
        # we need to reload the arguments
        lw $a0,24($sp)
        lw $a1,28($sp)
        beq $v0,$zero,Lnmatchesskip
        # n++;
        lw $t1,16($sp)
        addi $t1,$t1,1
        sw $t1,16($sp)
#Lnmatchesskip:
Lnmatchesskip:
        # str++;
        addi $a0,$a0,1
        # goto Lnmatchesloop;
        b Lnmatchesloop
#Lnmatchesloopend:
Lnmatchesloopend:
        # return (n);
        lw $v0,16($sp)
        #}
        lw $ra,20($sp)
        addiu $sp,$sp,24
        jr $ra

with using s-registers:
        .globl nmatches
nmatches:
        addiu $sp,$sp,-32
        sw $ra,28($sp)
        sw $s2,24($sp) # n
        sw $s1,20($sp) # comp
        sw $s0,16($sp) # str
        move $s1,$a1
        move $s0,$a0
        # char c;
        # $t0 is c
        # int n=0;
        # n is $s2
        move $s2,$zero
#Lnmatchesloop:
Lnmatchesloop:
        # c = *str;
        lb $t0,0($s0)
        # if (c == 0) goto Lnmatchesloopend;
        beq $t0,$zero,Lnmatchesloopend
        # if (equiv(c,comp) == 0) goto Lnmatchesskip;
        move $a1,$s1
        move $a0,$t0
        jal equiv
        beq $v0,$zero,Lnmatchesskip
        # n++;
        addi $s2,$s2,1
#Lnmatchesskip:
Lnmatchesskip:
        # str++;
        addi $s0,$s0,1
        # goto Lnmatchesloop;
        b Lnmatchesloop
#Lnmatchesloopend:
Lnmatchesloopend:
        # return (n);
        move $v0,$s2
        #}
        lw $ra,28($sp)
        lw $s2,24($sp)
        lw $s1,20($sp)
        lw $s0,16($sp)
        addiu $sp,$sp,32
        jr $ra
The first thing you might notice is that the sregs version is slightly longer. What matters, however, is the number of memory references. As we know, this is what slows down a program. We will adopt the simple rule that register movements and calculations are free, but each memory reference costs 1 unit. The sregs version does 6 memory references that do not appear in the old version (the saves and restores of $s0, $s1 and $s2). The old version has 8 memory references that do not appear in the sregs version. It doesn't seem like a big difference until you notice that most of the old version's extra references occur inside the loop, whereas the ones in the sregs version are all outside the loop.
If we assume that the function is given a string 20 bytes long with 3 matches, the old version adds 4*20 + 2*3 + 2 = 88 memory references, while the sregs version adds 6.
Besides the speed improvement, the sregs version is easier to program. All you have to do is save and restore the registers, then initialize them. After that you do not have to worry about function calls and what you need to save and restore.
Aside - What does a real compiler do?
Just for fun, let's run the code for our last example through a real MIPS compiler. We will tell it not to use a frame pointer (we will discuss the frame pointer later) and to apply only light optimization (-O1). After deleting the extraneous compiler directives, here is what we are left with:
(This was compiled with mips-linux-gnu-gcc -O1 -fomit-frame-pointer -S nmatches.c. I had to rearrange a couple of instructions, since the branch delay slot was used, and we haven't learned about it yet.) I have compared it to our sregs version, after stripping out the comments.
MIPS compiler code:
nmatches:
        addiu $sp,$sp,-40
        sw $31,36($sp)
        sw $18,32($sp)
        sw $17,28($sp)
        sw $16,24($sp)
        move $16,$4
        move $18,$5
        lb $4,0($4)
        bne $4,$0,$L2
        move $17,$0
        j $L3
$L2:
        move $17,$0
$L5:
        move $5,$18
        jal equiv
        beq $2,$0,$L4
        addiu $17,$17,1
$L4:
        addiu $16,$16,1
        lb $4,0($16)
        bne $4,$0,$L5
$L3:
        move $2,$17
        lw $31,36($sp)
        lw $18,32($sp)
        lw $17,28($sp)
        lw $16,24($sp)
        addiu $sp,$sp,40
        j $31

our hand-assembled code:
nmatches:
        addiu $sp,$sp,-32
        sw $ra,28($sp)
        sw $s2,24($sp)
        sw $s1,20($sp)
        sw $s0,16($sp)
        move $s1,$a1
        move $s0,$a0
        move $s2,$zero
Lnmatchesloop:
        lb $t0,0($s0)
        beq $t0,$zero,Lnmatchesloopend
        move $a1,$s1
        move $a0,$t0
        jal equiv
        beq $v0,$zero,Lnmatchesskip
        addi $s2,$s2,1
Lnmatchesskip:
        addi $s0,$s0,1
        b Lnmatchesloop
Lnmatchesloopend:
        move $v0,$s2
        lw $ra,28($sp)
        lw $s2,24($sp)
        lw $s1,20($sp)
        lw $s0,16($sp)
        addiu $sp,$sp,32
        jr $ra
When you realize that $s0 is $16, $a0 is $4, $ra is $31 and $v0 is
$2, the code is quite similar. (It is longer because they restructure
the loop.) In particular, you can see they used s-registers just like
we did - for n, str and comp.
Another example:
Consider the C function below to fill up to max elements of an integer array:
int fillarray(int ia[], int max) {
Just by looking at this function, and without translating it to assembly, a few things should be obvious:
Here is one translation of this function:
A version of this program, including the function get_element, is in the online/mipsIV directory. It uses the old-fashioned SPIM syscalls and a global for the array.
A note about main()
Although we will
observe this calling convention in our code, the MARS startup code
doesn't do so. Thus there is no space allocated on the stack prior to
calling main() for main to home its arguments. Since our code does not
use the arguments to main(), this should not be a problem, but I
mention it nonetheless.
The frame pointer ($fp)
The stack pointer may not be constant for the life of a procedure. It is adjusted at the beginning for the basic stack frame, but may be flexible for the life of the procedure. This creates two problems:
Because of this, some implementations use two pointers to reference the stack frame. The stack pointer is non-constant, and the frame pointer, set at the entry to the proc, always points to the first word allocated in the stack frame.
If this scheme is extended by always saving the caller's frame pointer
at a fixed location relative to the start of the callee's stack frame,
walking back the stack is simple. It also means that temporary
variables, saved registers and the home location of our arguments are
at a constant offset from the frame pointer for the life of the
procedure, even if we must move the stack pointer during the procedure.
For simplicity, we won't use a frame pointer in our
studies. There is an example in the book's appendix, but they have
mixed in the added complexity of a doubleword-aligned stack in the
same example making it unclear at best.