Linking

The Static Keyword in C

An external variable is one that is declared outside of a function.

A static external variable is only accessible within the module in which it is declared (ie. private). Same with static methods.

External variables that are not static are called global variables.

A static internal variable remains in memory and retains its value between function calls.

Linking is the process of collecting and combining various pieces of code and data into a single file that can be loaded into memory and executed.

Linking can be performed at compile time, load time, or at run time.

Linking is performed by programs called linkers.

Benefits of linking separate compilation units

Resusability
Modular (separate development, separate testing)

7.1 Compiler Drivers

To compile:

$ gcc -Og main.c sum.c -o prog

Runs cpp (c preprocessor). Translates main.c to ASCII file main.i
Runs cc1 (compiler). Translates main.i to ASCII file main.s.
Runs as (assembler). Translates main.s to binary relocatable object file main.o.
Steps 1, 2, and 3, are run for sum.c
Runs ld (linker) to combine main.o, sum.o and necessary system object files to create a single binary executable object.

To run:

$./prog

The shell invokes a loader which:

Copies the code and data in prog into memory
transfers control to the beginning of the prog

7.2 Static Linking

Static linkers (ld) take as input a collection of relocatable object files and command line args and generates a fully linked executable file.

Steps to build the executable

Symbol Resolution – Object files define and reference symbols (functions, global variables, static variables). Symbol resolution associates each symbol reference with exactly one symbol definition.
Relocation – Compilers and assemblers generate code and data sections that start at address 0. The linker relocates these sections by associating a memory location with each symbol definition and then modifying all of the references to those symbols. The linker performs these relocations using detailed instructions generated by the assembler, called relocation entries.

7.3 Object Files

Object files come in 3 forms.

Relocatable object files – contains code and data that can be combined w/ other relocatable object files at compile time to create an executable file.
Executable object file – binary code and data that can be copied into memory and executed.
Shared object file – a relocatable object file that can be loaded into memory and linked dynamically at load time or run time.

Each OS has a different format for their object files. Linux uses ELF (executable and linkable format).

7.4 Format of ELF Relocatable Object Files

ELF Header
- size of this ELF header
- offset (# bytes) to the sections header table at end of file.
- object file type (relocatable, executable, or shared)
- byte order (big endian | little endian)
- word size (i.e. size of address)
- machine type on which it was compiled (e.g. x86_64)
.text
- The machine code of the compiled program
.rodata
- Read only data including format strings for printf and jump tables for switch statements.
.data
- values for initialized global and static variables
.bss
- values for uninitialized global and static variables
- Just a placeholder for relocatable object files. Since the program is not being run, no space is actually needed to store values.
.symtab
- A symbol table with info about functions and global and static variables that are defined and referenced in the program. No entries for local variables.
.rel.text
- A list of relocation entries (see below) for objects in the .text section that will need to be modified (external function calls and references to global variables) when this object file is linked with others. No data is needed in this section for executable files.
.rel.data
- Relocation information for global variables that need to be initialized to the address of a global variable or external function.
.debug
- Debug symbol table with entries for
  - local variables
  - typedefs
  - global variables defined and referenced in the module
  - the original C source code
- Present if compiled with -g.
.line
- A mapping between line numbers in the C code and the machine instructions in the .text section. Present if compiled with -g.
.strtab
- A table of null terminated strings for symbols in .symtab and .debug and for section names in the section headers.
Section header table
- Location and size of the various sections above.

BLUE (for relocatable object files)
Magenta (for debugging when compiled with -g flag)

7.5 Symbols and Symbol Tables

Each relocatable object module m, has a symbol table (.symtab).

3 types of symbols:

Global symbols that are defined in m and that can be referenced by other modules. (e.g. global variables and non-static external variables)
Global symbols that are referenced in m, but defined in another module.
Local symbols defined and referenced exclusively in m.

Recall that .symtab is a table with info about functions and global and static variables that are defined and referenced in the program. No entries for local variables. The symbol table can also contain entries for the individual sections and for the path name of the original source code file.

Symbol Table Entries

typedef struct {
    int name;       // String table offset to the name of the symbol
    char type:4,    // function or data?
         binding:4; // local or global?
    char reserved;  // unused
    short section;  // section header index
    long value;     // section offset or absolute address
    long size;      // object size (in bytes)
} Elf64_Symbol;

The name field holds the offset in .strtab to the string holding the symbol’s name.
The type fields usually holds either data or function.
The section field holds the index in the section header table to indicate which section the symbol is located in. For example, with gcc, section=1 denotes .text and section=3 denotes .data.
- There are 3 special pseudo-sections for relocatable object files:
  - ABS – for symbols that should not be relocated
  - UNDEF – for symbols defined in other modules
  - COMMON – for uninitialized data objects that are not yet allocated
The value field hold either
- an offset from the beginning of the section to the symbol (for relocatable object files) or
- an absolute runtime address (for executable object files)
The size field holds the size of the object (in bytes)

7.6 Symbol Resolution

The linker resolves symbol references by associating each reference with exactly one symbol definition from the symbol tables of its relocatable object files.

Since the compiler ensure that local variable names are unique within the module (even static internal variables) references to local variables are easy.

If the compiler encounters a symbol (variable or function name) that is not defined in the module, it adds an entry to the linker symbol table (rel.text ?) and leaves it up to the linker to resolve. If the linker can not find a definition for the symbol, it prints a message and terminates.

If the linker finds multiple definitions for a global variable, it can choose one or report an error.

7.6.1 How Linkers Resolve Duplicate Symbol Names

On Linux, the compiler tags functions and globals

Functions and initialized globals get tagged as strong
uninitialized functions get tagged as weak

Multiple strong symbols with the same name are not allowed
Given a strong and multiple weak symbols, choose the strong.
Given multiple weak symbols, choose one.

The books shows where Rule 2 and Rule 3 can produce unwelcome bugs.

Use linker flag gcc -fno-common to prohibit multiply defined global symbols or use -Werror which turns all warnings to errors.

7.6.2 Linking with Static Libraries

In practice, all compilation systems provide static libraries that combine related object files into a single file. For example, ISO C99 provides libc.a that includes common library functions: printf, strcpy, scanf, etc.

On Linux, static libraries are stored in archive files that have the .a extension.

At link time, the linker only copies the object modules that are referenced by the program. GCC includes .libc.a automatically.

To create an archive file we use the ar tool.

$ gcc -c addvec.c multvec.c
$ ar rcs libvector.a addvec.o multvec.o

To compile and link main.o and libvector.a we do the following. We assume main2.c include a header with the vector function prototypes.

$ gcc -c main2.c
$ gcc -static main2.o ./libvector.a -o prog
or equivalently
$ gcc -static main2.o -L. -lvector -o prog

The -static flag tells the compiler driver that the linker should create a runnable executable that doesn’t require linking at run time.

7.6.3 How Linkers Use Static Libraries to Resolve References

During the symbol resolution phase the linker scans the relocatable object files in the order listed on the command line (.c files are converted to .o files). During the scan, the linker maintains a set E of relocatable object files that will be merged into a single executable, a set U of unresolved symbols, and a set D of previously defined symbols.

Note: the procedure “update U and D” below includes considering duplicates as discussed in the previous section.

For each file f on the command line:
    if f is a .o file:
        add f to E
        update U and D
    if f is a .a file:
        until U and D are unchanged:
            for each module m in f:
                if m is in U:
                    add m to E
                    update U and D
if U is not empty:
    report error and terminate
Merge and relocate object files in E into single executable

Order of object and archive files on the command line matters.

7.7 Relocation

Once the linker has completed the symbol resolution step, it has associated each symbol reference in the code with exactly one symbol definition in one of the object module’s symbol table.

At this point the linker knows the exact size of the code (.text) and data (.data and .bss) sections in its input object modules.

It is now ready to begin the relocation process where it merges the input modules and assigns run-time addresses to each symbol to produce an executable.

2 steps

Relocating sections and symbol definitions.
1. The linker merges all sections of the same type into a single section of the same type (e.g. all .data sections are merged into a single .data section).
2. The linker then assigns run-time memory addresses to the
  1. new aggregate sections
  2. to each section defined by the input modules
  3. to each symbol defined by the input modules
3. When this is complete, every instruction and global variable has a unique run-time memory address.

Relocating symbol references within sections.
1. The linker modifies every symbol reference in the bodies of the code and data sections so that they point to the correct run-time addresses. To perform this step, the linker relies on relocation entries in the .rel.data sections of the relocation object modules.

7.71. Relocation Entries

When an assembler generates an object module, it does not know where the code and data will ultimately be in memory nor does is know where externally defined functions and global variables will be. So whenever it encounters a symbol whose ultimate location is unknown it creates a relocation entry that tells the linker how to modify the reference when it merges the object files into an executable.

Below is the format of a ELF relocation entry.

typedef struct {
    long offset;     // offset of the reference to relocate
    long type: 32,   // relocation type
        symbol:32;   // symbol table index
    long addend;     // constant part of the relocation expression
} Elf64_Rela;

The offset field holds the section offset of the reference that will need to be modified.
The symbol field holds the index in the symbol table that the modified reference should point to.
The addend field holds a signed constant that is used by some types of relocations to bias the value of the modified reference.
The type field informs the linker how to perform the relocation.

2 Types

R_X86_64_PC32

Relocate a reference that uses a 32-bit PC relative address. A PC relative address is an offset from the current run-time value fo the PC. When the CPU executes an instruction using PC-relative addressing, it forms an effective address by adding the 32-bit value encoded in the instruction to the current run-time value of the PC (which is the address of the next instruction).

R_X86_64_32

Relocate a reference that uses a 32-bit absolute address. With absolute addressing, the CPU uses the 32-bit value encoded in the instruction as the effective address, without further modifications.

In the final executable, the CPU expects absolute addresses for references to global variables and PC relative addresses for function references.

During the relocation phase, the linker processes each relocation entry in the relocatable object files’ .rel.text sections. When processing, if a symbol in a relocation entry is a reference to a function, the linker computes the offset to the function object and replace the placeholder with the offset. If a symbol in a relocation entry is a reference to a global variable, the linker replace the placeholder with the address of the global variable.

7.7.2 Relocating Symbol References

Consider the following code for main.o

0: 48 83 ec 08             sub $0x8,%rsp
4: be 02 00 00 00          mov $0x2,%esi
9: bf 00 00 00 00          mov $0x0,%edi            %edi = &array
                      a: R_X86_64_32 array          relocation record

e: e8 00 00 00 00          callq 13<main+0x13>      sum()
                      f: R_X86_64_PC32 sum-0x4

13: 48 83 c4 08            add $0x8,%rsp
17: c3                     retq

We assume that the linker knows the final runtime address of each section s, denoted ADDR(s), and of each symbol t, denoted ADDR(t).

Relocating Absolute Addresses

Since the linker knows the final location of all the symbols, the placeholders in instructions that access a global variables are replaced by the actual addresses.

In the example above, a relocation entry was created for the reference to the global array variable.

r.offset = 0xa
r.symbol = array (actually index in symbol table)
r.type = R_X86_64_32
r.addend = 0

The mov instruction was at offset 0x9, leaving the first byte of the placeholder for the array address at 0xa. The placeholder consists of the byte at 0xa, 0xb, 0xc, and 0xd.

Lets suppose ADDR(s) is at 0x4004d0. The the location of the placeholder is at:

refaddr = ADDR(s) + r.offset = 0x4004d0 + 0xa = 0x4004da

The linker will update the byte at that address and the 3 bytes that follow.

Suppose ADDR(array) = 0x601018. The value that will be placed in the placeholder’s memory will is computed as follows:

*refaddr = (unsigned) (ADDR(r.symbol) + r.addend)
         = (unsigned) (0x601018 + 0)
         = (unsigned) 0x601018

Updating PC-Relative References

The CPU expects the operand of a callq instruction to be an offset to the address in the program counter register [PC].

Let’s suppose that ADDR(.text) = 0x4004d0. Then since callq is at offset 0xe then callq would be at address 0x4004de and the first byte of the placeholder is at 0x4004df. Since sum.o’s .text section would be inserted after main.o’s .text section, the memory addresses can be viewed as a linear array as shown below.

Before the CPU processes the callq instruction it would set the program counter [PC] to the address of the next instruction, namely to 0x4004e3. So the offset to sum() from the address in [PC] is 0x5 (5 bytes).

		sum()
		sum()
		sum()
	0x4004e8	beginning of sum()
c3	0x4004e7
08	0x4004e6
c4	0x4004e5
83	0x4004e4
48	0x4004e3	[PC]
00	0x4004e2	placeholder 4
00	0x4004e1	placeholder 3
00	0x4004e0	placeholder 2
00	0x4004df	placeholder 1
e8	0x4004de	callq

Since we’ve assumed that ADDR(.text) = 0x4004d0, the address of the first byte of the placeholder is:

refaddr = ADDR(s) + r.offset = 0x4004d0 + 0xf
                             = 0x4004df

The value to be stored in the placeholder is:

*refaddr = (unsigned) (ADDR(r.sym) + r.addend - refaddr)
         = (unsigned) (0x4004e8 + (-4) - 0x4004df)
         = (unsigned) (0x4004e8 - 0x4004e3)
         = (unsigned) 0x5

7.8 Executable Object Files

An ELF executable file contains the following sections:

ELF header (includes the program header table that describes the format of file which is used when loading the file into memory. Also contains the address of first instruction to execute)
Segment header table (maps contiguous file sections to run-time memory sections)
.init (contains small function called _init that will be called by OS initialization routine)
.text
.rodata
.data
.bss
.symtab
.debug
.line
.strtab
Section header table

The Segment header table maps contiguous file sections to run-time memory segments.

The first 5 sections make up the code segment (in blue). The code segment is loaded into read-only memory segments.
The .data and .bss sections make up the data segment (in red) and is loaded into read/write memory segments.
The others are not loaded into memory.

The program header table specifies information about the sections that are loaded into memory.

offset to the start of the section
size of the section
size required in physical memory (may be different than size of location, e.g. space needed for .bss)
starting virtual address
permissions (read-only for .text, read-write for .data)

7.9 Loading Executable Object Files

When a user runs a program from the command line,

The shell forks a child process that is a duplicate of the shell’s
The child process invokes the loader via execve
The loader deletes the child’s existing virtual memory segments and creates new code, data, heap and stack segments.
The new stack and heap segments are initialized to 0.
The code and data segments are initialized to the contents of the executable file by mapping pages in the virtual address space to page-size chunks of the executable file.
The loader jumps to the _start address which eventually calls the application’s main routine.

Aside from some header information, there is no copying from disk to memory during loading. The copying is deferred until the CPU references a mapped virtual page, at which point the operating system automatically transfers the page from disk to memory using its paging routine.

Practice Problems

7.1, 7.2, 7.3, 7.4, 7.5

Homework Problems

7.6, 7.8, 7.10, 7.11, 7.12