Paging – Operating Systems

Recall that …

Each OS sets a frame size, which is used to partition RAM.
At compile time, the compiler identifies the frame size for the system and partitions the program’s virtual space into pages of the same size, adding padding to fill out the last page.
Symbolic addresses within the program are bound to virtual address pairs (p,o) where p is the page number and o is the offset within the given page.
When the program is run, the dispatcher establishes the memory needed for the process (including pages for the compiled binary), finds free frames in the frame table, then maps the executable’s pages to these frames and records the mapping in the process’ page table.
When the CPU needs to access a region of memory referenced by a pair (p,o), the address translation hardware finds the frame number at index p in the page table. The memory address then computed using

physical address = frame_number x frame_size + offset

Paging ensures no external fragmentation, that is, all free frames are usable at any time. There may be, however, internal fragmentation inside the frames when programs don’t use all the space allocated to them.

Page Table Allocation and TLBs

Frames are typically between 4KB and 8KB and always of size that is a power of 2.

Most modern computers allow the page table to be very large (over 1 million entries). Here the page table is kept in memory and a page table base register (PTBR) points to the page table.

To find a memory address bound to (p,o), we need to do 2 memory accesses. One to find the frame number in the page table and one to access the memory bound to (p,o).

To avoid having to check the page table each time, most CPUs contain translation look-aside buffers (TLB). This is high speed cache that stores recently used page table entries in (key/value) pairs. A query of the TLB checks all keys simultaneously, so it is very fast. A TLB is usually very small, typically between 32 and 1,024 entries. When the CPU needs to translate a (p,o) pair, the TLB is checked for a corresponding page/frame entry. If found then the translation can be done immediately. If not, a TLB miss occurs. Then the frame number is retrieved from the page table in memory and is stored in the TLB. Then the address can be translated as described.

To allow multiple processes to have entries in the TLB at the same time, some systems include an address-space identifier (ASID) for each entry in the TLB.

If a system does not use ASIDs then each time a context switch is performed, the TLB must be flushed.

Memory Protection

A page table may be longer than the number of pages allocated to the process. A page-table length register (PTLR) is used to indicate the size of the page table. Access to pages outside the value in the PTLR cause a trap instruction to the OS.

The page table, aside from specifying the frame, often includes protection bits and a valid-invalid bit. The protection bits (e.g. 1 bit for writeable and 1 bit for user_mode accessible) are used to protect regions of a process’ memory from inappropriate access. When the address is being translated, the protection bits are checked to make sure that the program has read, write, or execute access to the frame. The valid-invalid bits are used to identify which pages are currently being used (i.e. have valid frames associated with them). Unauthorized actions on frames causes a trap instruction to be sent to the OS.

Virtual Memory

Many programs include routines that are rarely (or never used) and some programs require more memory that is provided in a computer. The ability to execute a program that is only partially in memory would help in these instances.

Recall that …

a program may allocate space on the heap
as nested function calls are made, additional stack frames are allocated in the stack.
dynamically linked libraries are copied into the address space when needed.
pages can be shared when using shared memory
fork() shares a parent’s executable with child processes

This requires additional pages to be allocated dynamically when needed and that pages be shared.

On Demand Paging

Rather than loading an entire program into memory at one time, pages can be loaded when needed using a technique called demand paging.

The component that copies in pages from the swap space or RAM on demand is called a pager. When a context switch is being performed, the pager guesses the pages that will be used next time it is run, before the process is swapped out.

To distinguish which pages are in memory and which are in the swap space or RAM, we can utilize additional bits in the page table. Valid-invalid bits are used to indicate whether pages have been allocated a frame and if they are currently loaded into memory. If a page is referenced and it is in memory, it is called a page hit. Otherwise it is called a page fault.

When a process is first created,

A page table is created as normal
The number of pages needed for the process is stored (usually in the process control block).
The dispatcher loads a subset of the process’ pages into frames and the remaining into virtual memory and sets the page table accordingly.
- All pages that have frames allocated have the valid-invalid bits set to valid in their page table entries.
- All other entries have the valid-invalid bit set to invalid.

When the CPU needs to translate an address,

The OS checks to determine if the page requested is an allowable page number (checks the PCB).
- If the page number is not allowable, a trap instruction is sent to the OS.

If the page number is allowable, the page table (or TLB) is checked to see if a frame has been allocated checking the (valid-invalid bit).
- If a frame has been allocated, the frame number is used to determine the physical address.
If a frame has not been allocated, a page fault occurs.
The frame table is checked to see if a free frame is available.
- If a frame is not available, a frame is swapped out to the swapping store.
The pager then swapped the page into the frame and the CPU can translate the address.

When repeated page faults occur, it is called thrashing.

Copy-on-Write

Recall that a fork() system call creates a child process that is a duplicate of its parent. Older systems created a copy of the parent’s address space for the child. Since most child processes call exec() immediate, a copy of the address space is wasteful. So most implementations, allow the child process to share the parent’s pages until the exec() method is called. When the child process is created, the pages that are shared and are writable are set to copy-on-write. Once a child or parent attempts to write to a shared writable page, the system makes a separate copy for the child.

Linux offers vfork() which does not set the shared pages to copy-on-write. Changes made by a parent or child are visible to the other. The function is intended to be used for child processes that immediately call exec(), saving the need to set all of the shared writable pages to copy-on-write.