City College of San Francisco - CS270
Computer Architecture

Module: MIPS-III (Procedures)

A word on Strings

We have been using strings for a while but have not discussed them specifically, so let's take a moment to go over the basics. We will use C strings (null-terminated character strings) as our example, since they are easy to understand.

A C string is simply a sequence of characters with a zero byte at the end. That is exactly what the .asciiz directive generates. We are used to the construct:

welcome: .asciiz "Welcome to strings"
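As a rough C analogue (an illustration, not what Mars does internally), a C string literal occupies memory the same way the .asciiz directive lays it out: the characters followed by one zero byte. The helper name asciiz_size is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* The same text that the .asciiz directive above places in the .data section. */
static const char welcome[] = "Welcome to strings";

/* Hypothetical helper: the number of bytes .asciiz would emit for a string,
   i.e. its characters plus the terminating zero byte. */
size_t asciiz_size(const char *s)
{
    return strlen(s) + 1;
}
```

Here sizeof(welcome) and asciiz_size(welcome) both count the 18 visible characters plus the null byte, and welcome[18] is the zero byte itself.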

Here welcome is a label attached to our string. We also know that to output this string we must load its address into the register the print-string syscall expects ($a0), like this:

la $a0,welcome

What, exactly, is happening here? Mars (or, more generally, the assembler in any compiler toolchain) keeps a record of where each label is. When the .asciiz directive was encountered, Mars was filling a data area (the .data section), adding data sequentially in the order it appeared. At all times a counter indicated where in the data area we currently were (the "end" of the .data area). When the welcome: label was encountered, the current value of that counter was recorded and attached to the label welcome in an internal table called the symbol table. Immediately afterward, bytes in the .data section were initialized with the contents of the string, and the counter was advanced by its length (including the terminating null byte).

Later, when the la instruction was encountered, the symbol table was consulted, and the address of welcome was substituted for the label.
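A toy model of this bookkeeping can be sketched in C. It is only a sketch: the names struct assembler, define_asciiz and lookup are hypothetical, a starting address of 0x10000040 is assumed for illustration, and a real assembler such as Mars tracks far more than this.

```c
#include <stdint.h>
#include <string.h>

#define MAX_SYMBOLS 16

/* One symbol-table entry: a label and the address recorded for it. */
struct symbol {
    const char *label;
    uint32_t address;
};

/* Toy assembler state: the location counter plus the symbol table. */
struct assembler {
    uint32_t location;                  /* current "end" of the .data area */
    struct symbol table[MAX_SYMBOLS];
    int count;
};

/* Model "label: .asciiz text": record the label at the current location,
   then advance the counter past the characters and the zero byte. */
void define_asciiz(struct assembler *st, const char *label, const char *text)
{
    if (st->count < MAX_SYMBOLS) {
        st->table[st->count].label = label;
        st->table[st->count].address = st->location;
        st->count++;
    }
    st->location += (uint32_t)strlen(text) + 1;
}

/* Model what happens when la is assembled: look the label up.
   Returns 0 if the label is not in the table. */
uint32_t lookup(const struct assembler *st, const char *label)
{
    for (int i = 0; i < st->count; i++)
        if (strcmp(st->table[i].label, label) == 0)
            return st->table[i].address;
    return 0;
}
```

Starting the counter at 0x10000040 and defining welcome records 0x10000040 for the label and leaves the counter at 0x10000053, one past the null byte.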

For example, if the current end of the .data section was 0x10000040 when the .asciiz directive was encountered, the address 0x10000040 would be entered in the symbol table for welcome, 19 bytes (the 18 characters plus a null byte) would be initialized (0x10000040 through 0x10000052), and the new end of the .data section would be 0x10000053. Later, when the la instruction was encountered, the address 0x10000040 would be substituted for the label welcome. This would essentially become a load immediate instruction with a 32-bit constant:

li $a0,0x10000040

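and, because a 32-bit constant does not fit in the 16-bit immediate field of a single instruction, it would be translated into two MIPS instructions: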

lui $at,0x1000
addiu $a0,$at,0x40
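The split of the address into two 16-bit halves can be checked with a small C model. The function names are hypothetical. One detail matters: addiu sign-extends its immediate, so this particular expansion reproduces the address directly only when the low half is below 0x8000 (as 0x40 is); otherwise the upper half must be adjusted by one, or an assembler may emit ori, which zero-extends, instead.

```c
#include <stdint.h>

/* Upper 16 bits: the immediate that lui places in the top half of $at. */
uint16_t upper_half(uint32_t addr) { return (uint16_t)(addr >> 16); }

/* Lower 16 bits: the immediate used by the second instruction. */
uint16_t lower_half(uint32_t addr) { return (uint16_t)(addr & 0xFFFFu); }

/* Model "lui $at,hi" followed by "addiu $a0,$at,lo". */
uint32_t lui_addiu(uint16_t hi, uint16_t lo)
{
    uint32_t at = (uint32_t)hi << 16;             /* lui: low half becomes zero */
    uint32_t imm = (lo & 0x8000u)                 /* addiu sign-extends */
                       ? ((uint32_t)lo | 0xFFFF0000u)
                       : (uint32_t)lo;
    return at + imm;                              /* 32-bit wraparound add */
}
```

For 0x10000040 the halves are 0x1000 and 0x0040, and lui_addiu(0x1000, 0x0040) rebuilds 0x10000040; with a low half of 0x8000 the same pair would instead give 0x0FFF8000.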

Once the address of the string is in a register, you can perform the syscall OR write code to manipulate the string character by character. For example, to retrieve the first character of the string:

lb $t0,0($a0)

To advance the pointer to the next character, we simply add 1:

addi $a0,$a0,1

etc.
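In C, the same byte-by-byte walk is a pointer loop. This sketch uses a hypothetical name, count_chars (it does what the standard strlen does), and stops at the zero byte that .asciiz appends.

```c
#include <stddef.h>

/* Walk a C string one byte at a time, as the lb / addi $a0,$a0,1 pair does,
   counting the characters before the terminating zero byte. */
size_t count_chars(const char *p)
{
    size_t n = 0;
    while (*p != '\0') {   /* like lb $t0,0($a0), then a test for zero */
        n++;
        p++;               /* like addi $a0,$a0,1 */
    }
    return n;
}
```

For "Welcome to strings" the loop runs 18 times and returns 18.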

Let's take one more step and examine the C statement (using a global variable welcome):

char * welcome = "Welcome to strings";

Although this looks very similar, it is somewhat different. Again, a null-terminated ASCII string must be initialized and its address recorded. This time, however, the address is used to initialize a [pointer] variable with type char *. Let's look at the code a compiler would generate for Mars, then explain it:

.Lwelcome: .asciiz "Welcome to strings"
.align 2
welcome: .word .Lwelcome
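The .align 2 line is there because a .word should start on a 4-byte boundary (lw requires a word-aligned address on MIPS). In MIPS assemblers, .align n rounds the location counter up to a multiple of 2^n bytes. A minimal C model, with the hypothetical name align_up:

```c
#include <stdint.h>

/* Round a location counter up to the next multiple of 2^n bytes,
   which is the effect of ".align n" on the next data item. */
uint32_t align_up(uint32_t location, unsigned n)
{
    uint32_t size = (uint32_t)1 << n;        /* .align 2 -> 4-byte boundary */
    return (location + size - 1) & ~(size - 1);
}
```

With the string ending at 0x10000053, align_up(0x10000053, 2) gives 0x10000054, where the pointer word will be placed; an address that is already aligned is left unchanged.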

Just like before, a label is generated (this time by the compiler) to attach to the string. The [compiler-generated temporary] label (.Lwelcome) is entered into the symbol table with the address of the string. Then, to keep track of the string in the high-level language, a variable is initialized with this address. The variable is a char *, or a pointer to char. When Mars encounters the .word directive, it looks up the value of .Lwelcome and substitutes it. In our example, the .word directive would become:

welcome: .word 0x10000040

However, there is one more step. There is now a new label, welcome, which has its own address. This address (which in our case, after the .align directive, would be 0x10000054) is the address of the pointer. Once we have a memory word (our pointer) initialized with the correct address (of the string), we just use it like any other data value. Now, to output the string, we would load the pointer:

lw  $a0,welcome

Note the difference between la and lw. In the earlier case, welcome was the label attached to the string itself, so la loaded the label's address. In this case, welcome labels a data word that was initialized with the address of the string, so lw loads the contents of that word.
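The same distinction exists in C: la corresponds to taking the address of the pointer variable (&welcome), and lw to reading its value (welcome). The sketch below separates the two objects explicitly; welcome_chars, address_of_pointer and value_of_pointer are hypothetical names standing in for .Lwelcome and the two instructions.

```c
/* Storage laid out like the generated assembly. */
static const char welcome_chars[] = "Welcome to strings"; /* like .Lwelcome */
static const char *welcome = welcome_chars;               /* like welcome: .word .Lwelcome */

/* Like "la $a0,welcome": the address of the pointer variable itself. */
const char **address_of_pointer(void) { return &welcome; }

/* Like "lw $a0,welcome": the word stored there, i.e. the string's address. */
const char *value_of_pointer(void) { return welcome; }
```

value_of_pointer() returns the address of the characters, while address_of_pointer() returns a different address: that of the pointer word, which then has to be dereferenced once more to reach the string.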

Once you have the address of the string in a register, you manipulate the string the same way.

Note that the characters of a C string literal must be treated as constants. Modifying the characters of a string initialized this way is undefined behavior in C. (The pointer welcome above has type char *, but it is better declared as const char *, or pointer to constant char; in C++ the const is required.) Compilers commonly place string literals in read-only memory, so you may very well get a fault if you modify the characters.

If you want to modify the characters in a string, you must use a character array and initialize it in some way: by reading in a string or by copying a constant string into it.
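A minimal sketch of the copying approach, using a hypothetical helper upcase_copy: the constant string is copied into a caller-supplied writable array, and only the copy is modified.

```c
#include <ctype.h>
#include <string.h>

/* Copy a constant string into a writable array of the given size,
   then upper-case the copy. The source literal is never written to. */
void upcase_copy(char *buf, size_t size, const char *src)
{
    if (size == 0)
        return;
    strncpy(buf, src, size - 1);
    buf[size - 1] = '\0';                        /* strncpy may not terminate */
    for (char *p = buf; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);   /* safe: buf is our own array */
}
```

Declaring char greeting[] = "Welcome to strings"; also gives a writable array, because the characters of the literal are copied into the array's own storage.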

Other pointers

The initialization of integer pointers is exactly the same. The only difference is in how you use and increment the pointer: at that point you must know the size of the underlying data. If the register $a0 has been initialized with the correct address:

For a string (1-byte characters):

lb $t0,0($a0)
addi $a0,$a0,1

For an integer array (4-byte words):

lw  $t0,0($a0)
addi $a0,$a0,4
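C hides this scaling from the programmer: incrementing an int * moves it by sizeof(int) bytes (4 on MIPS32), while a char * moves by 1. A sketch with hypothetical function names:

```c
#include <stddef.h>

/* Sum n ints by walking a pointer, as the lw / addi $a0,$a0,4 pair does.
   The compiler scales p++ by sizeof(int), so the 4 never appears in C source. */
int sum_ints(const int *p, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += *p;   /* like lw $t0,0($a0) */
        p++;           /* advances by sizeof(int) bytes */
    }
    return total;
}

/* Bytes between consecutive int elements, measured through char pointers. */
ptrdiff_t int_stride(const int *p)
{
    return (const char *)(p + 1) - (const char *)p;
}
```

For an array {1, 2, 3}, sum_ints returns 6, and int_stride reports sizeof(int), which is the constant the assembly version must add by hand.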


This page was made entirely with free software on Linux: Kompozer, the Mozilla Project, and OpenOffice.org.

Copyright 2015 Greg Boyd - All Rights Reserved.