Procedure Calls – Computer Architecture

Below is an outline of what happens when a procedure P calls a procedure Q. Note that most of the tasks are optional, depending on the situation.

P preserves caller-saved registers by pushing their values onto the stack.
P passes arguments to Q via registers and the stack
- The first 6 integral and address arguments are passed in registers.
- Additional arguments are pushed onto the stack.
P passes control to Q using a call instruction.
- The address of the instruction to execute (in P) after Q returns is pushed on the stack.
- The program counter is set to the address of the first instruction in Q.
Instructions in Q are executed
- Q preserves any callee-saved registers by pushing their values onto the stack.
- Q is free to access arguments put on the stack and allocate new space on the stack.
- …
- Q passes data back to P in %rax
- Q frees up any memory it allocated on the stack
- Q resets any callee-saved registers by popping values off the stack and into the registers.
- Q passes control back to P with a ret instruction.
  - The return address at the top of the stack (at (%rsp)) is popped and loaded into the program counter, so P resumes at its next instruction.
P deallocates stack memory used to pass arguments to Q
P resets caller-saved registers by popping values off the stack and into the registers.

The Run-Time Stack

Reasons for using the stack in a procedure:

Preserve the state of registers.
There are not enough registers to hold the arguments of a procedure call.
The address-of operator (&) is applied to a local variable, so the variable must live in memory where it has an address.
Some local variables are arrays or structures and must be accessible through array and structure references.

When a procedure uses space on the stack, the space is referred to as the procedure’s stack frame. The stack pointer (%rsp) points to the top of the stack and the stack grows toward lower memory addresses.

Data can be stored on and popped off of the stack using pushq and popq instructions, respectively.

Space for data can also be allocated on the stack by decrementing %rsp by the size of the data and moving the data into that memory. Similarly, space can be deallocated by incrementing %rsp.
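
The push and pop behavior described above can be modeled in C. The sketch below is only a simulation: the Stack type, the 64-byte region size, and the helper names push_q and pop_q are illustrative assumptions, not part of any real API. It treats rsp as a byte offset into a small array and shows that a push decrements the stack pointer by 8 before storing, while a pop loads the value and then increments it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64   // hypothetical size of the simulated stack region

typedef struct {
    unsigned char mem[STACK_BYTES];
    size_t rsp;          // offset of the current top of stack
} Stack;

void stack_init(Stack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->rsp = STACK_BYTES;            // empty stack: rsp at the highest address
}

// Models pushq: decrement rsp by 8, then store the value at the new top.
void push_q(Stack *s, int64_t value) {
    assert(s->rsp >= 8);             // guard against overflowing the region
    s->rsp -= 8;
    memcpy(&s->mem[s->rsp], &value, 8);
}

// Models popq: load the value at the top, then increment rsp by 8.
int64_t pop_q(Stack *s) {
    int64_t value;
    assert(s->rsp + 8 <= STACK_BYTES);   // guard against popping an empty stack
    memcpy(&value, &s->mem[s->rsp], 8);
    s->rsp += 8;
    return value;
}
```

After stack_init, pushing 1 and then 2 lowers rsp from 64 to 48, and two calls to pop_q return 2 and then 1, leaving rsp back at 64.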

Preserving Register Data

Since registers are used by all procedures, when a caller calls another procedure (the callee), they both have to take care to preserve the information in the registers.

All registers except %rsp are either callee-saved or caller-saved registers. Registers %rbx, %rbp and %r12 – %r15 are classified as callee-saved registers. All other registers are caller-saved.

When P calls Q, P must preserve the data in any caller-saved registers it still needs by pushing the data on the stack before calling Q and restoring the data to the registers after Q returns.

pushq  %rsi          // save caller-saved registers before making procedure call
...                  // put arguments on stack if necessary
call   proc
...                  // free up argument space on stack if necessary
popq   %rsi          // restore caller-saved registers after procedure call

Q must preserve the data in the callee-saved registers. Q can preserve them by not using them or by pushing their values on the stack before using them and popping them before returning.

pushq  %rbx          // saves value first thing when the callee starts
...                  // use %rbx inside the procedure
popq   %rbx          // restores value right before callee returns

Data Transfer

With x86_64, most data passed to and from procedures travels in registers. Conventions specify which registers are used for arguments and return values. Up to six integral values (integers and pointers) can be passed in registers, and by convention these registers are used in a specific order.

Remember the order: disi dxcx 89

%rdi

%rsi

%rdx

%rcx

%r8

%r9

Arg1 is stored in %rdi, arg2 is stored in %rsi, etc.

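For arguments smaller than 8 bytes, the correspondingly sized portion of the register is used under its own name (for example, %edi for a 4-byte first argument).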
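
As a reference for the smaller register names, the sketch below maps an argument position and an operand size to the register name used for it. The function name arg_register and its interface are hypothetical and exist only for illustration; the register names in the table are the standard x86-64 ones.

```c
#include <string.h>

// Returns the register holding integer argument `arg` (1..6) when the
// argument is `size` bytes wide (8, 4, 2, or 1). Returns NULL otherwise.
// This helper is illustrative and not part of any library.
const char *arg_register(int arg, int size) {
    static const char *names[6][4] = {
        // 8 bytes, 4 bytes, 2 bytes, 1 byte
        { "%rdi", "%edi", "%di",  "%dil" },
        { "%rsi", "%esi", "%si",  "%sil" },
        { "%rdx", "%edx", "%dx",  "%dl"  },
        { "%rcx", "%ecx", "%cx",  "%cl"  },
        { "%r8",  "%r8d", "%r8w", "%r8b" },
        { "%r9",  "%r9d", "%r9w", "%r9b" },
    };
    int col;
    switch (size) {
        case 8: col = 0; break;
        case 4: col = 1; break;
        case 2: col = 2; break;
        case 1: col = 3; break;
        default: return NULL;    // unsupported operand size
    }
    if (arg < 1 || arg > 6) return NULL;   // only six register arguments
    return names[arg - 1][col];
}
```

For instance, arg_register(1, 4) yields "%edi" and arg_register(5, 2) yields "%r8w", while arg_register(7, 8) yields NULL because the seventh argument is passed on the stack.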

When a function has more than 6 arguments, the remaining arguments are pushed on the stack. Suppose P calls Q and Q has n > 6 arguments. Then P must

Copy arguments 1 through 6 into the registers
Allocate space in its stack frame for (n – 6) arguments (all data sizes are rounded up to multiples of 8 bytes)
Copy arguments 7 through n onto the stack with the 7th argument on top
Execute a call instruction (the return address is automatically pushed onto the stack)
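
To make the layout concrete, the sketch below computes how many bytes P needs for the stack-passed arguments and where each one sits relative to %rsp. The helper names are hypothetical. The arithmetic assumes, as stated above, that every stack argument occupies an 8-byte slot and that the call instruction pushes an 8-byte return address.

```c
// Bytes P must reserve for arguments 7..n (0 when n <= 6).
long stack_arg_bytes(int n) {
    return n > 6 ? 8L * (n - 6) : 0;
}

// Offset from %rsp, in P just before the call, of argument i (i >= 7).
// Argument 7 is on top of the stack, so it sits at offset 0.
long offset_in_caller(int i) {
    return 8L * (i - 7);
}

// Offset from %rsp, on entry to Q, of argument i (i >= 7).
// The return address pushed by call occupies the first 8 bytes.
long offset_in_callee(int i) {
    return offset_in_caller(i) + 8;
}
```

For a call with 8 arguments, stack_arg_bytes(8) is 16. P stores argument 7 at offset 0 and argument 8 at offset 8, and Q finds them at offsets 8 and 16.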

Allocating and Deallocating Stack Space

Values can be stored on the stack in different sizes and are referenced by offsets to %rsp.

To allocate, use and deallocate 8 bytes on the stack, we do the following:

sub   $8, %rsp        // allocate 8 bytes on the stack
movq  $1, (%rsp)      // move the value 1 onto the stack
...                   // use the variable by dereferencing (%rsp)
add   $8, %rsp        // deallocate 8 bytes on the stack

Control Transfer

When P passes control to Q, it executes a call Q instruction. The call Q instruction automatically pushes onto the stack the address (A) of the instruction that should be executed after Q returns. When Q executes ret, A is popped off the stack and loaded into the PC.

The general forms of call and ret are as follows:

call  Label          Direct procedure call
call  *Operand       Indirect procedure call
ret                  Return from call
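
The effect of call and ret on the program counter and the stack can be simulated in a few lines of C. Everything here is a hypothetical model: the Machine type, its field names, and the fixed-size return-address stack are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEPTH 16     // hypothetical limit on nested calls in this model

typedef struct {
    uint64_t pc;                 // program counter
    uint64_t stack[MAX_DEPTH];   // holds return addresses only
    int depth;                   // number of return addresses currently stored
} Machine;

// Models `call target`: push the address of the instruction after the call,
// then jump to the target. `insn_len` is the size of the call instruction.
void do_call(Machine *m, uint64_t target, uint64_t insn_len) {
    assert(m->depth < MAX_DEPTH);
    m->stack[m->depth++] = m->pc + insn_len;   // return address A
    m->pc = target;
}

// Models `ret`: pop the return address and load it into the program counter.
void do_ret(Machine *m) {
    assert(m->depth > 0);
    m->pc = m->stack[--m->depth];
}
```

With pc at 0x400549 and a 5-byte call to 0x400540, do_call saves 0x40054e as the return address and sets pc to 0x400540. A following do_ret restores pc to 0x40054e.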

Example

Below is an example of a procedure (call_proc) calling another procedure (proc) that has 8 parameters.

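// proc is declared before use so that call_proc compiles
void proc(long x1, long *x1p, int x2, int *x2p,
          short x3, short *x3p, char x4, char *x4p);

long call_proc() {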
    long x1 = 1; int x2 = 2;
    short x3 = 3; char x4 = 4;
    proc(x1, &x1, x2, &x2, x3, &x3, x4, &x4);
    return (x1+x2)*(x3-x4);
}

void proc(long x1, long *x1p, int x2, int *x2p, 
          short x3, short *x3p, char x4, char *x4p){ 
    *x1p += x1; 
    *x2p += x2; 
    *x3p += x3; 
    *x4p += x4; 
}

Assembly Code

call_proc:
    subq    $32, %rsp               // allocate 32-byte stack frame
    movq    $1, 24(%rsp)            // store 1 in x1
    movl    $2, 20(%rsp)            // store 2 in x2
    movw    $3, 18(%rsp)            // store 3 in x3
    movb    $4, 17(%rsp)            // store 4 in x4
    leaq    17(%rsp), %rax          // get &x4
    movq    %rax, 8(%rsp)           // store &x4 as arg8
    movl    $4, (%rsp)              // store 4 as arg7
    leaq    18(%rsp), %r9           // pass &x3 as arg6
    movl    $3, %r8d                // pass 3 as arg5
    leaq    20(%rsp), %rcx          // pass &x2 as arg4
    movl    $2, %edx                // pass 2 as arg3
    leaq    24(%rsp), %rsi          // pass &x1 as arg2
    movl    $1, %edi                // pass 1 as arg1

    call    proc

    movslq  20(%rsp), %rdx          // get x2 and convert to long
    addq    24(%rsp), %rdx          // compute x1 + x2
    movswl  18(%rsp), %eax          // get x3 and convert to int
    movsbl  17(%rsp), %ecx          // get x4 and convert to int
    subl    %ecx, %eax              // compute x3 - x4
    cltq                            // convert to long
    imulq   %rdx, %rax              // compute (x1+x2)*(x3-x4)
    addq    $32, %rsp               // deallocate stack frame
    ret

proc:
    addq    %rdi, (%rsi)            // *x1p += x1 (reads and writes the same address)
    addl    %edx, (%rcx)            // *x2p += x2
    addw    %r8w, (%r9)             // *x3p += x3
    movq    16(%rsp), %rax          // fetch x4p from 16(%rsp), not 8(%rsp) (see below)
    movl    8(%rsp), %edx           // fetch x4
    addb    %dl, (%rax)             // *x4p += x4
    ret

Notice that call_proc() stored &x4 and x4 at 8(%rsp) and (%rsp), respectively, but proc() retrieves them from 16(%rsp) and 8(%rsp). This is because the call instruction pushed onto the stack the address of the next instruction in call_proc() to execute after proc() returns. proc() needs to skip over the 8 bytes holding that address when accessing x4 and &x4.

Example 2

long caller() {
    long arg1 = 534;
    long arg2 = 1057;
    long sum = swap_add(&arg1, &arg2);
    long diff = arg1 - arg2;
    return sum * diff;
}

Assembly Code

caller:
    subq     $16, %rsp            // allocate 16-byte stack frame
    movq     $534, (%rsp)         // store 534 in arg1
    movq     $1057, 8(%rsp)       // store 1057 in arg2
    movq     %rsp, %rdi           // pass &arg1 as first argument to swap_add
    leaq     8(%rsp), %rsi        // pass &arg2 as second argument to swap_add
    call     swap_add
    movq     (%rsp), %rdx         // get value in arg1
    subq     8(%rsp), %rdx        // %rdx = arg1 - arg2
    imulq    %rdx, %rax           // %rax = sum * diff
    addq     $16, %rsp            // deallocate stack frame
    ret
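
The C source for caller() does not show swap_add. The sketch below supplies one plausible definition as an assumption (it exchanges the two values and returns their sum) so that the value returned by caller() can be checked. A different swap_add would give a different result.

```c
// Assumed definition: the text does not show swap_add.
// Exchanges *xp and *yp and returns the sum of the two values.
long swap_add(long *xp, long *yp) {
    long x = *xp;
    long y = *yp;
    *xp = y;
    *yp = x;
    return x + y;
}

long caller(void) {
    long arg1 = 534;
    long arg2 = 1057;
    long sum = swap_add(&arg1, &arg2);   // sum = 1591; arg1 and arg2 are now swapped
    long diff = arg1 - arg2;             // 1057 - 534 = 523
    return sum * diff;                   // 1591 * 523 = 832093
}
```

Under this assumed definition, caller() returns 832093. Because arg1 and arg2 have their addresses taken, the compiler must keep them in memory, which is why the assembly allocates 16 bytes on the stack for them.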

Example 3

Below is an assembly program that makes nested procedure calls. Initially, %rdi holds 100 when main() calls bar(). bar() subtracts 5 from the 100 in %rdi, leaving the result (95) in %rdi, and then calls foo(). foo() adds 2 to the 95 in %rdi and stores the result in %rax. foo() returns with 97 in %rax and bar() resumes. bar() doubles the value in %rax, leaving 194 in %rax, and returns to main(). main() then copies 194 into %rdx.

foo(long y)

Note: y in %rdi

400540 <foo>:
400540: 48 8d 47 02       lea    0x2(%rdi),%rax    // 95+2=97
400544: c3                retq                     // return

bar(long x)

Note: x in %rdi

400545 <bar>:
400545: 48 83 ef 05       sub     $0x5,%rdi        // 100-5=95
400549: e8 f2 ff ff ff    callq   400540 <foo>     // call foo(95)
40054e: 48 01 c0          add     %rax,%rax        // 2*97=194
400551: c3                retq                     // return

main()

Note: 100 in %rdi

40055b: e8 e5 ff ff ff    call    400545 <bar>     // call bar(100)
400560: 48 89 c2          mov     %rax,%rdx        // 194 in %rdx
...
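
For comparison, the disassembly of foo and bar corresponds to C source along the following lines. This is a reconstruction inferred from the instructions, not the original source, which is not shown.

```c
// Reconstructed from the disassembly; the original C source is an assumption.
long foo(long y) {
    return y + 2;            // lea 0x2(%rdi),%rax
}

long bar(long x) {
    return 2 * foo(x - 5);   // sub $0x5,%rdi; callq foo; add %rax,%rax
}
```

Calling bar(100) follows the same path as the trace: foo receives 95 and returns 97, and bar returns 194.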