A Tour of Computer Systems – Computer Architecture

In this part of the course, we’ll look at a real instruction set, see how our c programs are translated to machine code and investigate how we can write better code.

Questions We’ll Answer

Is a switch statement more efficient than a sequence of if-else statments?
How much overhead is incurred by a function call?
Is a while-loop more efficient than a for-loop?
Are pointer references more efficient than array indexes?
Why does a function run faster when it uses a local variable rather than a reference passed in.
How can we make a function run faster by simply rearranging the parenthesis in an arithmetic expression?

Information Consists of Bits and Context

Everything: data in CPU, data in RAM, contents of files, data transfering over network is stored as bits. Context affects the meaning.

bits, bytes, word
binary representation of characters
binary representation of numbers

Computer representations use a limited number of bits to encode a number, and hence some operations can overflow when the results are too large. Take for example the product: 200 * 300 * 400 * 500. You’d expect a positive number.

#include <stdio.h>
int main(void) {
    printf("%d\n", 200*300*400*500);
    return 0;
}

macbook:~ eric$ ./test
-884901888

Floating point operations also have limitations due to the finite precision of the representation. For example, the result of the program shown below should be 3.14 but is 0.

#include <stdio.h>
int main(void) {
    printf"%lf\n", (3.14 + 1e20) - 1e20);
    return 0;
}

macbook:~ eric$ ./test
0.000000

By studying the number representations, we can understand the limitations of the different arithmetic operations. This is critical for writing programs that work correctly.

Hardware

The von-Neumann Architecture is a model for any modern stored-program computer.

The CPU at its core has a word-size register called the instruction address register (IAR). At any given time, the IAR holds the address of the next instruction in memory to execute. The CPU repeatedly fetches an instruction, updates the IAR register, and executes the instruction.

Many of the instructions are arithmetic or logical operations (e.g. ADD, SHIFT, AND, CMP). These are performed in the Arithmetic/Logic Unit (ALU) of the CPU. Other instructions (e.g. JUMP, LOAD, STORE) are performed in the control unit of the CPU.

Main memory consists of dynamic random access memory (DRAM). Logically, memory is organized as a linear array of bytes, each with its own address, starting at 0. DRAM is accessed by passing an address into the memory address register (MAR). When DRAM is enabled, the byte stored at the address in MAR is placed on the bus.

The bus carries information between components. Busses typically transfer chunks of bytes known as words. Most machines today have a word size of 4 bytes or 8 bytes. The Bus size is determined by the CPU design.

Categories of Instructions

Load – copy a byte or word from main memory into a register.
Store – copy a byte or word from a register to a location in memory.
Jump – Extract a word from the instruction itself and copy that word into the IAR.
Operate – copy the contents of one or two registers to the ALU, perform some arithmetic operation, and store the results in a register.

Hierarchy of Storage

Each level is a cache for the level below it.

Registers
L1 cache – on the CPU (nearly as fast as register on CPU)
L2 cache – connected to CPU via special bus (maybe 5-10x slower than register)
DRAM
Local Disk
Remote storage