City College of San Francisco - CS270
Computer Architecture

Module: MIPS-IV (Procedures)

Procedures

In the last topic we covered writing leaf functions with arguments. Although that is a good first step, there is a lot more to the procedure calling convention. We will cover the rest of it in this section.

[ Sample files for this topic are in the directory online/mipsIV beneath our public work area on hills. ]

The procedure call

Calling a procedure is a special kind of branch. The difference is that this branch must be returned from. After the call, you want the callee to return to the next instruction after the call. That's what our special instructions do - the address to return to is recorded in a register when the call is made. The callee then can use the register to return.

This basic procedure call, where one procedure (the caller) branches to another procedure (the callee), is implemented using one of two special MIPS instructions: jal (jump and link) or jalr (jump and link register).

Both jal and jalr use the link register (the al in their names stands for "and link") to record the return address. The link register is also called the return address register and is named $ra. Some students affectionately refer to it as rah.

Of course you may see an immediate problem. The procedure main is called using the jal instruction. If it calls another procedure using the jal instruction the $ra register is overwritten. This means the address main should return to is lost!
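The problem can be modelled on a host machine in C. This is only a sketch: the global ra stands in for the $ra register, and jal_model and the addresses used are hypothetical names and values chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical model: one global stands in for the $ra register. */
uint32_t ra;

/* jal records the return address in $ra, overwriting whatever was there. */
void jal_model(uint32_t return_addr) {
    ra = return_addr;
}

/* A non-leaf procedure that does NOT save $ra. It returns the address
   it would jump back to at exit. */
uint32_t nonleaf_without_save(void) {
    jal_model(0x00400100u);   /* the nested call clobbers $ra */
    return ra;                /* our own return address is gone */
}

/* A non-leaf procedure that saves $ra at entry and restores it at exit. */
uint32_t nonleaf_with_save(void) {
    uint32_t saved = ra;      /* like: sw $ra,offset($sp) */
    jal_model(0x00400100u);   /* the nested call clobbers $ra */
    ra = saved;               /* like: lw $ra,offset($sp) */
    return ra;
}
```

If the caller's return address is 0x00400020, the first version would "return" to 0x00400100, while the second returns to 0x00400020 as intended.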

A similar situation occurs in each called procedure (the callee). Such state information must be saved at procedure call and restored upon return. This information must be saved in a unique area for each procedure. This should be apparent if you think about using a recursive procedure (one that calls itself). This is a simple illustration of the need for a stack. We have highlighted the basics already, but here they are once more:

The stack is used for three purposes

Note: in this discussion, the terms caller- and callee- are used a lot. Remember, most procedures are both callee and caller. They are the callee when they are entered. They are the caller when they call another procedure. Keep in mind that the terms are referring to a single call.

Use of the stack

A stack is simply a sequential piece of memory that is used to store data. Traditionally, data is pushed onto the stack using a pointer, the stack pointer, that points to the last word placed on the stack. Before data is pushed, the stack pointer is moved to make room for the new data. The memory pointed to by the current value of the stack pointer is referred to as the top of stack, since it is the last data to be placed on the stack. When the data is retrieved, it is popped from the stack - the data is copied from the top-of-stack, then the stack pointer is moved back. Implementations often had instructions to push and pop data values, adjusting the stack pointer and storing the data at the same time.

On MIPS, there are no push and pop instructions. Instead, when a piece of data is to be placed on the stack, the stack pointer is moved, then used to access the space made available. For example, if we wanted to place two words (8 bytes) of data on the stack, we could move the stack pointer down by 8 bytes, then store the data using store word instructions with offsets of 0 and 4 bytes from the stack pointer. If the data we wanted to place on the stack was in reg1 and reg2, and we wanted reg1 to be at the top-of-stack, it would look like this:

addiu $sp,$sp,-8
sw reg2,4($sp)
sw reg1,0($sp)

Because the stack grows downwards on MIPS, terms like top of stack are a bit confusing. It is always the last item placed on the stack, or the word stored at 0($sp). The terminology does not change, even though the stack grows to lower addresses. We will try to avoid using these terms, however.
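The same push and pop behaviour can be sketched in C with an array standing in for stack memory. The names stack_mem, sp, push_word, pop_word and peek_word are hypothetical; this is a model of a downward-growing stack, not MIPS code.

```c
#include <stdint.h>

#define STACK_WORDS 64

uint32_t stack_mem[STACK_WORDS];
uint32_t sp = STACK_WORDS * 4;   /* byte offset; starts just past the high end */

/* Models: addiu $sp,$sp,-4 followed by sw value,0($sp) */
void push_word(uint32_t value) {
    sp -= 4;                     /* make room first (stack grows downward) */
    stack_mem[sp / 4] = value;   /* then store at the new top of stack */
}

/* Models: lw value,0($sp) followed by addiu $sp,$sp,4 */
uint32_t pop_word(void) {
    uint32_t value = stack_mem[sp / 4];
    sp += 4;                     /* move the stack pointer back */
    return value;
}

/* Models: lw value,offset($sp) without moving the stack pointer */
uint32_t peek_word(uint32_t offset) {
    return stack_mem[(sp + offset) / 4];
}
```

Pushing the value for reg2 and then the value for reg1 leaves reg1 at offset 0 and reg2 at offset 4, matching the two-word example above.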

If a procedure needs to store information on the stack, it allocates a chunk of stack space using the addiu instruction as above, with a negative adjustment. If a function needs stack space, the addiu instruction is the first instruction executed in the procedure. The chunk of stack space 'allocated' by this method for the procedure is called its stack frame or activation record.

Example: The function foo needs to allocate 5 words of stack space. Write its entry code.

foo: addiu  $sp,$sp,-20
     sw $ra,16($sp)

Here, foo's stack frame is 20 bytes long - from 0-19 bytes relative to $sp. The return address is always stored at the extreme end of the stack frame, next to the stack frame for foo's caller, hence foo stores $ra at 16($sp).

Stack alignment

For our purposes, the stack is word-aligned. This means that the minimum amount of space that can be placed on the stack is four bytes.

The issue of alignment of the stack in the text is confusing. The examples in Chapter 2 use a stack that is word-aligned, while those in Appendix A use a stack that is doubleword-aligned. For our class, we will use a stack that is word-aligned for simplicity. In addition, we will not use a frame pointer, although we will discuss it briefly, time permitting.

Doubleword alignment of the stack is important in general so that you can easily move double-precision floating point data and doubleword integers to and from the stack. Just as words must be word-aligned, doubles must be doubleword-aligned.
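To see what the two alignment rules mean for a frame size, here is a small C sketch. The helper name align_up is hypothetical; it rounds a byte count up to the next multiple of a power-of-two alignment.

```c
#include <stdint.h>

/* Round size up to a multiple of align.
   Assumption: align is a power of two (4 for words, 8 for doublewords). */
uint32_t align_up(uint32_t size, uint32_t align) {
    return (size + align - 1) & ~(align - 1);
}
```

A 20-byte frame is already word-aligned, so align_up(20, 4) is 20. Under a doubleword-aligned convention the same frame would grow to align_up(20, 8), which is 24.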

This restriction of four-byte quantities fits our use of registers quite well. Suppose you are working with character data and you load a character value into a register using

lbu  $t0,0($s0)

where $s0 is a character pointer. As we know, registers are 32 bits wide, so the lbu instruction zero-extends the byte for us: the upper 24 bits of the register are 0. If we need to save our register on the stack, we can simply use the sw instruction instead of the sb instruction. This is exactly what happens when small data (a char or a short) is passed as an argument. The argument is promoted to its 4-byte counterpart before being stored on the stack. Thus scalar data (individual arguments) that are chars and shorts are promoted to ints, and unsigned chars and unsigned shorts are promoted to unsigned ints. In the receiving routine these arguments are retrieved as four-byte quantities and the high-order bits are ignored.
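The difference between lbu and lb when a byte is widened to a 32-bit register can be sketched in C. The function names are hypothetical; they model what each load leaves in the register.

```c
#include <stdint.h>

/* Models lbu: the byte is zero-extended, so the upper 24 bits are 0. */
uint32_t load_byte_unsigned(const uint8_t *p) {
    return (uint32_t)*p;
}

/* Models lb: the byte is sign-extended, so the upper 24 bits copy bit 7. */
int32_t load_byte_signed(const uint8_t *p) {
    int32_t v = *p;
    if (v & 0x80) {
        v -= 256;   /* negative byte values become negative words */
    }
    return v;
}
```

For a byte such as 'A' (65) both loads give the same word. They differ only when bit 7 of the byte is set.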

How do we allocate the space for our local data?

Consider the following sequence of code

foo() {
char ch, carray[10];
...

Here, we have a character array and an individual character that must be allocated as local variables. The solution is pretty simple: the character ch is allocated as a word on the stack. If the address of that word is in $t0, then the character ch would be located at 0($t0). For the array carray, we allocate sufficient words to hold the entire array of chars, four chars per word. In this case we allocate (NB+3)/4 words on the stack for the array, where NB is the number of bytes in the array, (and 3 is added to adjust the number of bytes to the next word boundary before we truncate by integer division). Thus, in our case we allocate 13/4 or 3 words (12 bytes) for carray. carray would actually start at the beginning of this area, and there would be two bytes of padding at the end.
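The rounding rule can be written as a one-line C function. The name words_for_bytes is hypothetical.

```c
/* Number of 4-byte stack words needed to hold nb bytes.
   Adding 3 before the integer division rounds up to the next word boundary. */
unsigned words_for_bytes(unsigned nb) {
    return (nb + 3) / 4;
}
```

For carray, words_for_bytes(10) is 3, which is 12 bytes with 2 bytes of padding. The single char ch needs words_for_bytes(1), which is 1 word.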

Who saves registers?

When a call is made, registers must be saved (and later restored) so that the registers in use by the caller are not overwritten by the callee. Of course, it benefits performance if the save and restore take the fewest instructions possible, but the convention requires that the call is blind, i.e., neither side can "see" the register use of the other.

Traditionally, either the caller or the callee was responsible for saving all the registers it uses. These conventions are referred to as caller-save and callee-save. Each mechanism has pros and cons:

MIPS tries to get the best of both worlds by adopting a hybrid scheme between caller- and callee-save:

Putting it all together

In the sections above, and in the previous module, we have highlighted several uses for the stack
  1. for the return address, $ra. This is saved at entry and restored at exit. Any function that allocates a stack frame must add room for $ra.
  2. for the home location of arguments for procedures you call. This must be sufficient to hold the maximum arguments of any function you call, with a minimum of four words of space.
  3. for callee-save registers - the s-registers. If  you want to keep a variable in a register for the duration of your function, you should use an s-register for it. The previous (caller's) value of the s-register must be saved at entry to your function and restored at exit.
  4. for temporary data - any locals that require a permanent location must be allocated on the stack.

Uses #1 and 2 occur in any non-leaf function. The last two uses are optional - depending on your function - you may decide to use s-registers (or not) and you may or may not have local variables.
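The four uses add up to a frame size. As a rough sketch, assuming a non-leaf function and the word-aligned stack used in this class, the arithmetic looks like this in C. The function name frame_bytes and its parameters are hypothetical.

```c
/* Frame size in bytes for a non-leaf function on a word-aligned stack.
   max_args     - most arguments passed to any function we call
   saved_sregs  - number of s-registers we save and restore
   local_words  - words of local (temporary) data kept on the stack */
unsigned frame_bytes(unsigned max_args, unsigned saved_sregs, unsigned local_words) {
    unsigned arg_words = max_args < 4 ? 4 : max_args;  /* minimum of four words */
    unsigned ra_words = 1;                             /* room for $ra */
    return 4 * (ra_words + arg_words + saved_sregs + local_words);
}
```

For example, a function that passes one argument, saves no s-registers and keeps two locals on the stack needs frame_bytes(1, 0, 2), which is 28 bytes.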

A simple example of a procedure call

Consider the code fragment below.

int proc2(int);
int proc1(void) {
int a=10, b=40;
a = a + proc2(b);
return (a);
}

We are going to consider what would happen in the dumbest compilation possible, where we ignore the fact (and obvious optimization) that a and b are constants in proc1(). We will instead translate the code just as it is written.

Looking at this function, we notice the following:

Since it is not a leaf, it needs a stack frame with room for $ra. Since the maximum number of arguments it passes to a function is less than 4, it requires the minimum amount of space for arguments (4 words)

In our first iteration of writing code for this function, we will also allocate two words for our temporaries. Adding this all together gives us 1 ($ra) + 4 (args) + 2 (temps) words of stack space or 7 words = 28 bytes. This gives us our first instruction inside proc1:

proc1: addiu  $sp,$sp,-28

But how is our new space of 28 bytes organized?

Order of items on the stack

If you are inside the function proc1 and looking up the stack, you should see the following, in order

data item        address
$ra              24($sp)
b                20($sp)
a                16($sp)
argument area    0($sp) (4 words)
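One way to double-check these offsets is to describe the frame as a C struct and let the compiler compute them. This is only a host-side model: the struct name proc1_frame is hypothetical, and 32-bit fields are used so that each member is exactly one MIPS word.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of proc1's 28-byte stack frame, lowest address first. */
struct proc1_frame {
    uint32_t arg_area[4];  /* 0($sp) through 12($sp): argument area */
    uint32_t a;            /* 16($sp) */
    uint32_t b;            /* 20($sp) */
    uint32_t ra;           /* 24($sp): saved return address */
};
```

Here offsetof(struct proc1_frame, ra) is 24 and sizeof(struct proc1_frame) is 28, which agrees with the frame size allocated by the addiu instruction.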

Now that we have the offsets, we can write our function. Again, here is the code:

int proc2(int);
int proc1(void) {
int a=10, b=40;
a = a + proc2(b);
return (a);
}

# int proc1(void) {

proc1: addiu  $sp,$sp,-28
    sw  $ra,24($sp)
# int a=10, b=40;
    li $t0,10
    sw $t0, 16($sp)
    li $t0, 40
    sw $t0, 20($sp)
# a = a + proc2(b);
    lw  $a0, 20($sp)
    jal proc2
    lw  $t0, 16($sp)
    add $t0,$t0,$v0
    sw $t0,16($sp)
# return (a);
    lw  $v0,16($sp)
# }
    lw  $ra,24($sp)
    addiu $sp,$sp,28
    jr $ra

This first really silly translation is fine - it obeys all the calling convention rules. Even ignoring the constants, there are a few obvious optimizations we can do.

Using t-registers effectively

There are two obvious optimizations we can do to our function by remembering a few facts:

  1. all our temporaries go away when we return.
  2. you can use a t-register to hold a temporary (instead of the stack) if the temporary is not needed after a function call.

Looking over our function again

int proc2(int);
int proc1(void) {
int a=10, b=40;
a = a + proc2(b);
return (a);
}

There are two situations here that qualify under the optimizations above

  1. the temporary b is only used to pass an argument to proc2. It is not used afterwards.
  2. there is no reason to store the final value of a. Just return it.

This allows us to simplify our code. Here is the result:

# int proc1(void) {
proc1: addiu  $sp,$sp,-24
    sw  $ra,20($sp)
# int a=10, b=40;
    li $t0,10
    sw $t0, 16($sp) # a
    li $t1, 40     # use $t1 for b
# a = a + proc2(b);
    move $a0,$t1
    jal proc2
    lw  $t0, 16($sp)
    # no need to store the final value of a. We are just returning it.
    add $v0,$t0,$v0
# return (a);
# }
    lw  $ra,20($sp)
    addiu $sp,$sp,24
    jr $ra

This is significantly simpler. You should notice something very important, however: by optimizing this function we modified (at least some of) the stack offsets! This is not always necessary - in our case we could have left the unused space for b in our stack frame. But if you modify the registers you want to save, you will have to do some work on your stack offsets, and if you add space, you always need to change your offsets.

Non-leaf functions that use arguments

We have one final issue we need to illustrate simply - a function that uses arguments. We will modify our simple function to do that. Here is the new version:

int proc2(int);
int proc1(int a) {

int b=40;
a = a + proc2(b);
return (a);
}

Here, we have moved a to an argument - the first argument, which is passed in $a0. In this case, we need our argument a after a function call, so we must save it on the stack. But, remember, a already has a place allocated for it by the function that called it - a's home location. In our example in the last module, the home location of the first argument was 0($sp). But that was before we added our stack frame. So, a's home location is now 0+N where N is the size of our frame.

Our stack frame has changed now, too, since a is no longer a temporary. Since it is an argument, we will simply home a before we call proc2. Here is the code:

# int proc1(int a) {
proc1: addiu  $sp,$sp,-20
    sw  $ra,16($sp)
# int b=40;
    li $t1, 40     # use $t1 for b
# a = a + proc2(b);
    sw  $a0,20($sp)  # home a
    move $a0,$t1
    jal proc2
    lw  $t0, 20($sp)  # reload a
    # no need to store the final value of a. We are just returning it.
    add $v0,$t0,$v0 
# return (a);
# }
    lw  $ra,16($sp)
    addiu $sp,$sp,20
    jr $ra
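The home-location arithmetic used above (0+N for the first argument) generalizes to the other argument registers. A minimal C sketch, assuming one word per argument; the name home_offset is hypothetical.

```c
/* Offset from the callee's $sp (after it has allocated its frame) to the
   home location of argument number arg_index (0 for $a0, 1 for $a1, ...). */
unsigned home_offset(unsigned frame_bytes, unsigned arg_index) {
    return frame_bytes + 4 * arg_index;
}
```

With the 20-byte frame above, home_offset(20, 0) is 20, which is where the code stores $a0.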

That seemed like a lot, but we have already covered most of the issues. There are just a few left, but we need an example with a bit more complexity to examine them.

A slightly more complicated example

int equiv (char, char);
int nmatches(char *str, char comp) {
char c;
int n=0;
while ((c = *str) != 0) {
if (equiv(c,comp) != 0) n++;
str++;
}
return (n);
}

First, let's transform this code to our ugly version:

int equiv (char, char);
int nmatches(char *str, char comp) {
char c;
int n=0;
Lnmatchesloop:
c = *str;
if (c == 0) goto Lnmatchesloopend;
if (equiv(c,comp) == 0) goto Lnmatchesskip;
n++;
Lnmatchesskip:
str++;
goto Lnmatchesloop;
Lnmatchesloopend:
return (n);
}

[ Ugly, is right, yes? ] These programs are nmatches.c and nmatches1.c.
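To convince yourself that the two versions behave the same, you can compile both on any machine. The sketch below supplies a hypothetical equiv (case-insensitive equality), since these notes do not define it, and renames the goto version to nmatches_goto so both can live in one file.

```c
#include <ctype.h>

/* Hypothetical equiv: treats two characters as equivalent if they
   match when case is ignored. The real equiv may differ. */
int equiv(char a, char b) {
    return tolower((unsigned char)a) == tolower((unsigned char)b);
}

/* Original structured version. */
int nmatches(char *str, char comp) {
    char c;
    int n = 0;
    while ((c = *str) != 0) {
        if (equiv(c, comp) != 0) n++;
        str++;
    }
    return (n);
}

/* "Ugly" version: same logic, written with labels and gotos. */
int nmatches_goto(char *str, char comp) {
    char c;
    int n = 0;
Lnmatchesloop:
    c = *str;
    if (c == 0) goto Lnmatchesloopend;
    if (equiv(c, comp) == 0) goto Lnmatchesskip;
    n++;
Lnmatchesskip:
    str++;
    goto Lnmatchesloop;
Lnmatchesloopend:
    return (n);
}
```

With this equiv, both functions count three matches of 'a' in "Banana" and zero matches in an empty string.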

Next, let's do an analysis of the variables here:

  1. c - this temporary variable is not used after the function call. We can just put it in a t-register. Nice.
  2. n - this temporary variable is used past the function call. We must allocate space for it so it is 'safe'
  3. str, comp - both of these arguments are used past the function call.

The obvious things to do here would be to home str and comp before the function call and restore them afterwards. Similarly, we would allocate a temporary location for n and save and restore it across the function call. Let's see what the code looks like:

    .globl nmatches
nmatches:
    addiu   $sp,$sp,-24
    sw      $ra,20($sp)
#    char c;
    # $t0 is c
#    int n=0;
    # n is 16($sp)
    sw      $zero,16($sp)
#Lnmatchesloop:
Lnmatchesloop:
#    c = *str;
    lb      $t0,0($a0)
#    if (c == 0) goto Lnmatchesloopend;
    beq     $t0,$zero,Lnmatchesloopend
#    if (equiv(c,comp) == 0) goto Lnmatchesskip;
    # home str and comp
    sw      $a0,24($sp)
    sw      $a1,28($sp)
    move    $a0,$t0
    jal     equiv
    # we need to reload the arguments
    lw      $a0,24($sp)
    lw      $a1,28($sp)
    beq     $v0,$zero,Lnmatchesskip
#    n++;
    lw      $t1,16($sp)
    addi    $t1,$t1,1
    sw      $t1,16($sp)
#Lnmatchesskip:
Lnmatchesskip:
#    str++;
    addi    $a0,$a0,1
#    goto Lnmatchesloop;
    b       Lnmatchesloop
#Lnmatchesloopend:
Lnmatchesloopend:
#    return (n);
    lw      $v0,16($sp)
    #}
    lw      $ra,20($sp)
    addiu   $sp,$sp,24
    jr      $ra

These decisions result in a lot of saving and restoring - of homing the arguments and n each iteration. We can avoid that by using a few s-registers.

Using s-registers

If you have a temporary variable (or an argument) that you need throughout the function and there are intervening procedure calls, placing the variable in an s-register is a good choice. Remember, s-registers are callee-save. This means it is the responsibility of the callee to save and restore the value across the function. As long as we do that, the value in the s-register is "safe" for the duration of our function, because any other function (that we call) that wants to use it must save and restore it for us.

In this function we will allocate three s-registers - $s0 for str, $s1 for comp and $s2 for n. Let's do a side-by-side comparison of the function before and after using s-registers:

(These examples are nmatches1.s and nmatches1_sregs.s. Complete programs with test main's for each are nmatches_all.s and nmatches_all_sregs.s.)

Without s-registers (first listing) and with s-registers (second listing):
    .globl nmatches
nmatches:
    addiu   $sp,$sp,-24
    sw      $ra,20($sp)
#    char c;
    # $t0 is c
#    int n=0;
    # n is 16($sp)
    sw      $zero,16($sp)
#Lnmatchesloop:
Lnmatchesloop:
#    c = *str;
    lb      $t0,0($a0)
#    if (c == 0) goto Lnmatchesloopend;
    beq     $t0,$zero,Lnmatchesloopend
#    if (equiv(c,comp) == 0) goto Lnmatchesskip;
    # home str and comp
    sw      $a0,24($sp)
    sw      $a1,28($sp)
    move    $a0,$t0
    jal     equiv
    # we need to reload the arguments
    lw      $a0,24($sp)
    lw      $a1,28($sp)
    beq     $v0,$zero,Lnmatchesskip
#    n++;
    lw      $t1,16($sp)
    addi    $t1,$t1,1
    sw      $t1,16($sp)
#Lnmatchesskip:
Lnmatchesskip:
#    str++;
    addi    $a0,$a0,1
#    goto Lnmatchesloop;
    b       Lnmatchesloop
#Lnmatchesloopend:
Lnmatchesloopend:
#    return (n);
    lw      $v0,16($sp)
    #}
    lw      $ra,20($sp)
    addiu   $sp,$sp,24
    jr      $ra

    .globl nmatches
nmatches:
    addiu   $sp,$sp,-32
    sw      $ra,28($sp)
    sw      $s2,24($sp) # n
    sw      $s1,20($sp) # comp
    sw      $s0,16($sp) # str
    move    $s1,$a1
    move    $s0,$a0
#    char c;
    # $t0 is c
#    int n=0;
    # n is $s2
    move    $s2,$zero
#Lnmatchesloop:
Lnmatchesloop:
#    c = *str;
    lb      $t0,0($s0)
#    if (c == 0) goto Lnmatchesloopend;
    beq     $t0,$zero,Lnmatchesloopend
#    if (equiv(c,comp) == 0) goto Lnmatchesskip;
    move    $a1,$s1
    move    $a0,$t0
    jal     equiv
    beq     $v0,$zero,Lnmatchesskip
#    n++;
    addi    $s2,$s2,1
#Lnmatchesskip:
Lnmatchesskip:
#    str++;
    addi    $s0,$s0,1
#    goto Lnmatchesloop;
    b       Lnmatchesloop
#Lnmatchesloopend:
Lnmatchesloopend:
#    return (n);
    move    $v0,$s2
    #}
    lw      $ra,28($sp)
    lw      $s2,24($sp)
    lw      $s1,20($sp)
    lw      $s0,16($sp)
    addiu   $sp,$sp,32
    jr      $ra

The first thing you might notice is that the sregs version is slightly longer. What matters, however, is the number of memory references. As we know, this is what slows down a program.

We will adopt the simple rule that register movements and calculations are free, but each memory reference costs 1 unit. The sregs version makes 6 memory references that do not appear in the old version: the saves and restores of $s0, $s1 and $s2. The old version has 8 memory references that do not appear in the sregs version. It doesn't seem like a big difference until you notice that most of the old version's references occur inside the loop, whereas the ones in the sregs version occur outside the loop. If we assume that the function is given a string 20 bytes long with 3 matches, the old version adds 4*20 + 2*3 + 2 = 88 memory references while the sregs version adds 6.
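The count can be written as a small C function so you can try other string lengths. The function names are hypothetical; the formula is the one used in the paragraph above.

```c
/* Memory references made only by the version without s-registers:
   4 per character (home and reload str and comp around the call),
   2 per match (load and store n), plus 2 (initialize n, load n to return). */
unsigned extra_refs_without_sregs(unsigned len, unsigned matches) {
    return 4 * len + 2 * matches + 2;
}

/* Memory references made only by the s-register version:
   save and restore $s0, $s1 and $s2, independent of the string. */
unsigned extra_refs_with_sregs(void) {
    return 6;
}
```

For a 20-byte string with 3 matches this gives 88 against 6. Only for an empty string, where the first count drops to 2, does the version without s-registers make fewer references.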

Besides the speed improvement, the sregs version is easier to program. All you have to do is save and restore the registers, then initialize them. After that you do not have to worry about function calls and what you need to save and restore.

Aside - What does a real compiler do?

Just for fun, let's run the code for our last example through a real MIPS compiler. We will tell it not to use a frame pointer (we will discuss the frame pointer later) and to optimize the code only lightly (-O1). After deleting the extraneous compiler directives, here is what we are left with.

This is compiled with mips-linux-gnu-gcc -O1 -fomit-frame-pointer -S nmatches.c. (I had to rearrange a couple of instructions, since the branch delay slot was used, and we haven't learned about it yet.) I have compared it to our sregs version, after stripping out the comments.


MIPS compiler code (first listing) and our hand-assembled code (second listing):
nmatches:
    addiu    $sp,$sp,-40
    sw    $31,36($sp)
    sw    $18,32($sp)
    sw    $17,28($sp)
    sw    $16,24($sp)
    move    $16,$4
    move    $18,$5
    lb    $4,0($4)
    bne    $4,$0,$L2
    move    $17,$0
    j    $L3
$L2:
    move    $17,$0
$L5:
    move    $5,$18
    jal    equiv
    beq    $2,$0,$L4
    addiu    $17,$17,1
$L4:
    addiu    $16,$16,1
    lb    $4,0($16)
    bne    $4,$0,$L5
$L3:
    move    $2,$17
    lw    $31,36($sp)
    lw    $18,32($sp)
    lw    $17,28($sp)
    lw    $16,24($sp)
    addiu    $sp,$sp,40
    j    $31

nmatches:
    addiu   $sp,$sp,-32
    sw      $ra,28($sp)
    sw      $s2,24($sp)
    sw      $s1,20($sp)
    sw      $s0,16($sp)
    move    $s1,$a1
    move    $s0,$a0
    move    $s2,$zero
Lnmatchesloop:
    lb      $t0,0($s0)
    beq     $t0,$zero,Lnmatchesloopend
    move    $a1,$s1
    move    $a0,$t0
    jal     equiv
    beq     $v0,$zero,Lnmatchesskip
    addi    $s2,$s2,1
Lnmatchesskip:
    addi    $s0,$s0,1
    b       Lnmatchesloop
Lnmatchesloopend:
    move    $v0,$s2
    lw      $ra,28($sp)
    lw      $s2,24($sp)
    lw      $s1,20($sp)
    lw      $s0,16($sp)
    addiu   $sp,$sp,32
    jr      $ra

When you realize that $s0 is $16, $a0 is $4, $ra is $31 and $v0 is $2, the code is quite similar. (It is longer because they restructure the loop.) In particular, you can see they used s-registers just like we did - for n, str and comp.
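If you want to translate the compiler's register numbers yourself, the standard MIPS register names can be kept in a lookup table. The C sketch below uses hypothetical function names; the table itself is the usual MIPS naming.

```c
#include <string.h>

/* Conventional name for MIPS general-purpose register number n (0-31). */
const char *mips_reg_name(unsigned n) {
    static const char *const names[32] = {
        "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
        "$t0",   "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
        "$s0",   "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
        "$t8",   "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"
    };
    return n < 32 ? names[n] : "?";
}

/* Returns 1 if register number n has the given name, 0 otherwise. */
int mips_reg_is(unsigned n, const char *name) {
    return strcmp(mips_reg_name(n), name) == 0;
}
```

Reading the compiler listing with this table, $16, $17 and $18 are $s0, $s1 and $s2, and $4 and $5 are $a0 and $a1.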

If you want to see the actual MIPS compiler output for this (before my modifications), it is in nmatches_gcc.s

Another example:

Consider the C function below to fill up to max elements of an integer array:

int fillarray(int ia[], int max) {

int index=0;
while (index<max) {
if (!get_element(&ia[index])) break;
index++;
}
return(index);
}

Just by looking at this function, and without translating it to assembly, a few things should be obvious:

  1. We need room for four argument registers on the stack, since we are not a leaf function. We will also need room for the return address.
  2. Keeping index in a register would be helpful. We can use an s-register for this. Then we will need room to save the s-register at procedure entry.
  3. It does not look like we will have any live temporary registers, but that depends on how the function is translated. So far we need at least 4*4 (args) + 4 (ra) + 4 (s-temp) bytes of stack space = 24 bytes.
  4. It has two arguments that are needed throughout the function. We could use s-registers, but we will home them instead for practice:
    1. the value of the second parameter, max, does not change. It may be homed at function entry and restored after the call to get_element
    2. the value of ia might change, depending on how we use it, so we will home it prior to calling get_element (for practice).

Here is one translation of this function:

    .globl fillarray
fillarray:
    # fillarray needs one save register (for the index)
    # and must save its arguments on the stack as well.
    addiu   $sp,$sp,-24     # allocate the stack frame
    # home $a0 and $a1 in the caller's argument area
    sw      $a1,28($sp)
    sw      $a0,24($sp)
    sw      $ra,20($sp)
    sw      $s0,16($sp)
    # use $s0 to hold the current value of index
    move    $s0,$zero
.Loop:
    bge     $s0,$a1,.Endloop
    sll     $t3,$s0,2       # byte offset of ia[index]
    add     $a0,$t3,$a0     # $a0 = &ia[index]
    jal     get_element
    beq     $v0,$zero,.Endloop
    # reload the address of ia and the value of max
    lw      $a0,24($sp)
    lw      $a1,28($sp)
    addi    $s0,$s0,1       # index++
    b       .Loop
.Endloop:
    move    $v0,$s0
    lw      $ra,20($sp)
    lw      $s0,16($sp)
    addiu   $sp,$sp,24
    jr      $ra

A version of this program, including the function get_element is in the online/mipsIV directory. It uses the old-fashioned SPIM syscalls and a global for the array.
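If you would like to check the logic of fillarray before assembling it, the C version can be run on any machine with a stand-in for get_element. The get_element below is hypothetical (it is not the one in the course directory): it hands out a fixed number of values and then reports failure by returning 0.

```c
/* Hypothetical stand-in for get_element: supplies elements_left values,
   then returns 0 to signal that no more elements are available. */
int elements_left = 3;

int get_element(int *slot) {
    if (elements_left == 0) {
        return 0;
    }
    *slot = 100 + elements_left;   /* arbitrary sample data */
    elements_left--;
    return 1;
}

/* Fill up to max elements of ia; return how many were stored. */
int fillarray(int ia[], int max) {
    int index = 0;
    while (index < max) {
        if (!get_element(&ia[index])) break;
        index++;
    }
    return (index);
}
```

With three values available and max set to 5, fillarray stops early and returns 3. With more values available than max, it returns max.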

A note about main()

Although we will observe this calling convention in our code, the MARS startup code doesn't do so. Thus there is no space allocated on the stack prior to calling main() for main to home its arguments. Since our code does not use the arguments to main(), this should not be a problem, but I mention it nonetheless.

The frame pointer ($fp)

The stack pointer may not be constant for the life of a procedure. It is adjusted at entry to allocate the basic stack frame, but it may move again during the life of the procedure. This creates two problems:

Because of this, some implementations use two pointers to reference the stack frame. The stack pointer is non-constant, and the frame pointer, set at the entry to the proc, always points to the first word allocated in the stack frame.

If this scheme is extended by always saving the caller's frame pointer at a fixed location relative to the start of the callee's stack frame, walking back the stack is simple. It also means that temporary variables, saved registers and the home location of our arguments are at a constant offset from the frame pointer for the life of the procedure, even if we must move the stack pointer during the procedure.

For simplicity, we won't use a frame pointer in our studies. There is an example in the book's appendix, but it mixes in the added complexity of a doubleword-aligned stack, which makes it unclear at best.


Copyright 2015 Greg Boyd - All Rights Reserved.