CSIS 480 - ARM Assembler

We will be using the ARM Cortex-A8 as the target for our compiler. You will need access to an ARM system (or emulator). Your compiler itself need not run on the ARM system, but you will need to assemble and execute the ARM assembly output from your compiler on an ARM system.

The steps to compilation are:

1. Write a program (e.g., in a file named foo.h) in our language
2. Run your compiler on the program to generate ARM assembly code (i.e., output a file named foo.s)
3. On the ARM system:
   a. Run the ARM assembler via gcc to produce an executable (e.g., $ gcc -o foo foo.s to produce the foo program)
   b. Execute your program (e.g., $ ./foo)

Our language is rather restricted, and so only a small portion of the ARM assembly language instructions will be relevant for our purposes. Relevant instructions include:

mov: loads a register with a value from another register or a constant
ldr: loads a register with a value from memory
str: stores a register value to memory
add: adds two register values together and stores the result in another register
sub: subtracts values in registers
mul: multiplies values in registers (for division, see note below)
cmp: compares values in registers
b[l] <label>: branches to the given label (bl also saves the return address in lr); we'll also use the conditional forms beq, bne, blt and bgt, which branch on "equal", "not equal", "less than" and "greater than" based on the result of a preceding cmp instruction
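
As a rough illustration of how cmp pairs with the conditional branches, here is a minimal sketch of a code-generation helper. Python is only an assumption here (the course does not prescribe a language for the compiler), and the names BRANCH_FOR and emit_compare_branch are hypothetical. Note that blt and bgt compare signed values.

```python
# Hypothetical helper: map a source-level comparison operator to the
# ARM conditional branch that should follow a cmp instruction.
BRANCH_FOR = {"==": "beq", "!=": "bne", "<": "blt", ">": "bgt"}


def emit_compare_branch(op, left_reg, right_reg, label):
    """Return the assembly lines that branch to `label` when `left op right` holds."""
    return [
        f"   cmp {left_reg}, {right_reg}",   # sets the condition flags
        f"   {BRANCH_FOR[op]} {label}",      # branches only if the condition holds
    ]


if __name__ == "__main__":
    # Example: jump to loop_top while r2 < r3
    for line in emit_compare_branch("<", "r2", "r3", "loop_top"):
        print(line)
```

If the branch is not taken, execution simply falls through to the next instruction, so the code for the "false" case can follow immediately.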

Implementation Notes

All "local" variables will be allocated on the stack - including those for the main program. This tutorial gives an example of how to use the stack for local variables.

Later we will be implementing subroutines using the stack (note that we will not be using the ARM convention of passing arguments via registers). In addition to register r13 (aka sp), r11 (aka fp) will therefore not be available as a general-purpose register: it anchors the call frame of the current function and is initially set to the value of the stack pointer. Parameters and local variables within a function are accessed relative to the address in fp. As local variables are allocated, we adjust the stack pointer to make room; since the stack grows "down" on the ARM, this means decrementing the stack pointer. For example, local variable 1 is accessed as [fp, #-4].
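
The frame layout described above can be sketched as a few code-generation helpers. This is only a sketch under stated assumptions: Python as the compiler's language, every local occupying one 4-byte word, locals numbered from 1, and few enough locals that the offsets fit in an immediate. All function names are hypothetical.

```python
WORD = 4  # bytes per local variable (assumption: every local is one 32-bit word)


def local_operand(index):
    """Memory operand for the index-th local (1-based), relative to fp."""
    return f"[fp, #-{WORD * index}]"


def emit_allocate_locals(count):
    """Make room for `count` locals; the stack grows down, so sp is decremented."""
    return [f"   sub sp, sp, #{WORD * count}"]


def emit_store_local(reg, index):
    """Store a register into the index-th local."""
    return [f"   str {reg}, {local_operand(index)}"]


def emit_load_local(reg, index):
    """Load the index-th local into a register."""
    return [f"   ldr {reg}, {local_operand(index)}"]


if __name__ == "__main__":
    code = emit_allocate_locals(2) + emit_store_local("r2", 1) + emit_load_local("r3", 2)
    print("\n".join(code))
```

Because fp starts out equal to sp, allocating two locals moves sp to fp-8, so [fp, #-4] and [fp, #-8] both fall inside the newly reserved space.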

Each assembly file we produce will need a template with a few pre-existing values defined. The following is a general template that you can use:

.text

.global main
main:
   ldr r0, =link  /* Store the lr value for graceful return at the end */
   str lr, [r0]
   mov fp, sp  /* set the fp to the base of the stack */

   /***
    * Generated code goes here 
    **/

   ldr r0, =link  /* Reset the lr */
   ldr lr, [r0]
   mov r0, #0     /* Return 0 for success */
   bx lr          /* return */


.data
.balign 4
link: .word 0
.balign 4
string_format: .asciz "%s"
.balign 4
int_format: .asciz "%d"
/***
 * Program-defined string constants here
 **/

Note that we'll generate the .data section at the end so that we can simply scan the entire "symbol table" for all constant strings and allocate them here. I recommend that you use a simple counter for the number of constant strings and create simple labels for each one as they're entered into the symbol table, e.g.:

.balign 4
s1: .asciz "Hello World!"
.balign 4
s2: .asciz "I love compilers!"
...
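
The counter-based labelling recommended above might look like the following sketch. It assumes Python as the compiler's language and a hypothetical StringTable class standing in for the string portion of the symbol table; it also assumes each string is already escaped the way the assembler expects (no raw quotes or backslashes).

```python
class StringTable:
    """Hypothetical holder for constant strings and their generated labels."""

    def __init__(self):
        self.labels = {}  # string constant -> label, kept in insertion order

    def intern(self, text):
        """Return the label for `text`, creating s1, s2, ... on first use."""
        if text not in self.labels:
            self.labels[text] = f"s{len(self.labels) + 1}"
        return self.labels[text]

    def emit(self):
        """Return the .data lines for every string seen so far."""
        lines = []
        for text, label in self.labels.items():
            lines.append(".balign 4")
            lines.append(f'{label}: .asciz "{text}"')
        return lines


if __name__ == "__main__":
    table = StringTable()
    table.intern("Hello World!")
    table.intern("I love compilers!")
    print("\n".join(table.emit()))
```

Interning the same string twice returns the same label, so repeated constants are allocated only once.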

Register Allocation

You can follow some simple rules for managing registers.

First, we will be calling the printf function for output. Since printf requires 2 registers (the format and the data) we need to either

reserve the use of r0 and r1 for printing, leaving us with registers r2 - r10 for general use, or
save the values of registers r0 and r1 before we set up and invoke printf; as long as we save registers r0-r3 before we begin to print and restore them after we print, we can use r0 and r1 as general-purpose registers.

Note that the ARM calling convention assumes that a function such as printf is free to use registers r0-r3, so if our code is using registers r2 or r3, they will need to be saved on the stack before the call and restored afterwards. It therefore makes sense to simply push r0-r3 before we begin to print and pop them afterwards, leaving us free to use r0-r10.
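
A sketch of the save/print/restore sequence follows, again assuming Python as the compiler's language; emit_print_int and emit_print_string are hypothetical names. The labels int_format and string_format come from the template above. The sketch also assumes sp is suitably aligned at the call (the ARM procedure call standard asks for 8-byte alignment at calls to external functions), which your frame layout may need to account for.

```python
def emit_print_int(value_reg):
    """Emit a printf("%d", value) call that preserves r0-r3."""
    return [
        "   push {r0-r3}",            # save the registers printf may clobber
        f"   mov r1, {value_reg}",    # data argument; done first in case the value is in r0
        "   ldr r0, =int_format",     # format argument
        "   bl printf",
        "   pop {r0-r3}",             # restore our values
    ]


def emit_print_string(label):
    """Emit a printf("%s", string) call for the constant at `label`."""
    return [
        "   push {r0-r3}",
        "   ldr r0, =string_format",  # format argument
        f"   ldr r1, ={label}",       # address of the string constant
        "   bl printf",
        "   pop {r0-r3}",
    ]


if __name__ == "__main__":
    print("\n".join(emit_print_int("r4") + emit_print_string("s1")))
```

Moving the value into r1 before loading the format into r0 matters: if the value to print happens to live in r0, loading the format first would overwrite it. The bl instruction overwrites lr, which is why the template stores the original lr in link.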

Register allocation is a matter of finding a free register to satisfy "load" operations and arithmetic operations that have a destination. The function that finds a register is often loosely referred to as getreg(). At the very least, our compiler will need to keep track of which symbols are in which registers; we may also want to keep track of the inverse, i.e., given a symbol, which register (if any) holds it. Allocation will generally follow these simple rules:

if the symbol is already in a register, return that register
if there is a free register, return that register
if there is a register holding a value that is "dead", return that register
if all registers are in use by "live" values, "spill" registers to get a free register; we will need to restore spilled registers later.
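
The four rules above can be sketched as follows. This is a deliberately naive illustration, not the strategy we will settle on in class: it assumes Python as the compiler's language, takes the set of live symbols and each symbol's stack slot as inputs, and always spills the first register. A real allocator must avoid spilling a register that the current instruction still needs, and the caller is still responsible for loading the value into the register that is returned. RegisterFile and its members are hypothetical names.

```python
class RegisterFile:
    """Hypothetical register descriptor implementing a simple getreg()."""

    def __init__(self, names):
        self.holds = {name: None for name in names}  # register -> symbol (or None)
        self.where = {}                              # symbol -> register
        self.code = []                               # spill code emitted so far

    def _bind(self, reg, symbol):
        """Record that `reg` now holds `symbol`, forgetting its old contents."""
        old = self.holds[reg]
        if old is not None:
            del self.where[old]
        self.holds[reg] = symbol
        self.where[symbol] = reg
        return reg

    def getreg(self, symbol, live, slot_of):
        # Rule 1: the symbol is already in a register
        if symbol in self.where:
            return self.where[symbol]
        # Rule 2: there is a free register
        for reg, held in self.holds.items():
            if held is None:
                return self._bind(reg, symbol)
        # Rule 3: some register holds a dead value
        for reg, held in self.holds.items():
            if held not in live:
                return self._bind(reg, symbol)
        # Rule 4: everything is live, so spill (naively, the first register)
        reg = next(iter(self.holds))
        victim = self.holds[reg]
        self.code.append(f"   str {reg}, {slot_of[victim]}")
        return self._bind(reg, symbol)


if __name__ == "__main__":
    regs = RegisterFile(["r4", "r5"])
    slots = {"a": "[fp, #-4]", "b": "[fp, #-8]"}
    regs.getreg("a", set(), slots)
    regs.getreg("b", {"a"}, slots)
    print(regs.getreg("c", {"a", "b"}, slots), regs.code)
```

Keeping both maps (register to symbol and symbol to register) makes rule 1 a single lookup, at the cost of updating two structures whenever a register is rebound.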

The trick is to determine when the value in a register is (or is no longer) "live"; we will talk about simple strategies in class. There are a number of ways to optimize register allocation to minimize load/store operations and mov operations. Those optimizations will be discussed separately.

Last modified: , by David M. Hansen