Threads

A thread is the smallest unit of instructions that can be scheduled by an operating system. Every process has at least one thread of execution, but may have many.

Many applications are multi-threaded. For example,

  • web browser with multiple tabs
  • IDE with code intelligence
  • web server that can service multiple requests concurrently

Benefits

  1. Scalability – a multi-threaded application can run on multiple cores.
  2. Responsiveness – if a thread is blocked the application can perform other tasks in other threads.
  3. Resource sharing – threads share memory and process resources, avoiding the need for other forms of IPC.
  4. Economy – a multithreaded application typically uses fewer resources than a multi-process application.
  5. Manageability – it is typically easier to coordinate threads to work concurrently than it is to manage processes.

Resources

Some resources are shared among all threads in a process and some resources are provided to each thread.

Per Process                             Per Thread
Address space (code section & heap)     Program counter
Global variables                        Registers
Open files                              State info
Child processes                         Stack
Pending alarms
Signals and signal handlers
Accounting information

Amdahl’s Law

A system is concurrent if it performs context switches between execution units, making it appear as if each is running on the CPU at the same time. This can occur even on a CPU with a single core.

A system is parallel if it can perform more than one task simultaneously. Multicore processors provide an environment for parallel processing.

Threads allow a program running in a single process to take advantage of a processor’s parallel computing capabilities.

Amdahl’s Law is a formula that identifies the potential performance gains from running a program on fixed input using additional processing cores.

[latex] A(s,N) \leq \frac{1}{s+\frac{1-s}{N}}[/latex]

where s is the fraction (between 0 and 1) of the application that must be performed serially and N is the number of processing cores.

Almost any sequence of code can be parallelized to some degree.  But consider the following example.  Suppose we have a function that prompts the user to enter their name, address and phone number and then reads the data into memory.  Since the user has access to only one keyboard and monitor, it makes no sense to divide this task up into smaller tasks.  Therefore 100% of the code needs to be performed serially.

Using Amdahl’s Law, we see that we gain nothing by adding more cores.

  • Speedup from 2 cores: A(1,2) ≤ 1
  • Speedup from 4 cores: A(1,4) ≤ 1
  • Speedup from 8 cores: A(1,8) ≤ 1

Now consider a function that takes an array of k integers as a parameter and prints each of the numbers to the screen (in any order). Here, each one of the print statements can be performed by a separate thread, so we can say that 0% of the code needs to be computed serially or rather 100% can be computed in parallel.  In this case, adding additional cores increases the speed of the program.

  • Speedup from 2 cores: A(0,2) ≤ 2
  • Speedup from 4 cores: A(0,4) ≤ 4
  • Speedup from 8 cores: A(0,8) ≤ 8

Now, consider a program that runs serially 25% of the time and in parallel the other 75% of the time. On two processing cores the program can potentially see a speed up of

[latex]A(0.25,2) \leq \frac{1}{0.25+\frac{1-0.25}{2}} = \frac{1}{0.25 + 0.375} = 1.6[/latex]

Amdahl’s Law implies there is a limit to the potential performance gains when a portion of the code must be performed serially, regardless of the number of CPU cores.

  • with 4 cores: 2.29
  • with 8 cores: 2.9
  • with 16 cores: 3.37
  • with 32 cores: 3.66
  • with 64 cores: 3.82
  • with 128 cores: 3.91
  • with 256 cores: 3.95
  • with 512 cores: 3.98
  • with 1024 cores: 3.99

As N grows, this formula converges to [latex]1/s = 1/0.25 = 4[/latex].  If a program runs 5% serially, the most we can gain is [latex]1/s = 1/0.05 = 20[/latex].

This formula suggests that even the slightest amount of serial processing limits the advantages of multicore processors, whereas if it were running completely in parallel the gains continuously increase with an increase in the number of cores.

Gustafson-Barsis’ Law

Amdahl’s Law is predicated on a fixed input size. Gustafson-Barsis’ Law, however, states that as the size of the input dataset and the number of cores increase, the limited gain from the serial computations is overshadowed by the gains achieved by the parallel computations.

So, parallel computing can have limited gains on fixed input sizes but can have significant gains as the size of the dataset increases.

Two Types of Parallelism

Data parallelism divides a dataset into equal parts and runs the same task on each individual subset.

Task parallelism divides an algorithm into separate parts that can be performed in parallel.

Programming Challenges & Solutions

As the number of cores on a computer increases, we as programmers need to take advantage of them to speed up our programs. But designing a multi-threaded program can be challenging.

Things we have to consider:

  • Should we use data parallelism, task parallelism, both or none?
  • What are the individual data segments or tasks?
  • How do we ensure the workload is balanced?
  • How do we coordinate data sharing?
  • How do we debug?

Challenges When Sharing Resources

  • Global variables are shared by threads. So global variables in code that was not originally written to be multi-threaded, but now is, can cause bugs.
  • The current working directory is shared by threads.
    • T1 changes the current working directory and reads a file into memory.
    • T2 changes the current working directory and opens, writes to a file.
    • T1 writes the contents in memory to a new file in the current working directory.
      Where did T1 write the new file?

Reentrant Functions

A reentrant function is a function that can be safely interrupted in the middle of execution, called in another thread, and “re-entered” by the initial thread at a later time.

The following function is not reentrant.

int count = 1;    // global

int foo() {
    count = count + 2;
    return count; 
}

Consider what happens if foo() is called by two different threads at about the same time.  The first call to foo() can return either 3 or 5, depending on whether the thread was preempted during the execution of foo().  This is problematic.

Reentrant functions are considered thread-safe.  A list of POSIX functions that are not thread-safe can be found on the pthreads man page. Some examples are:

  • getdate()
  • readdir()
  • system()
  • strtok()
  • setenv()
  • rand()

POSIX Threads (pthreads)

Threads are implemented in a program using a thread library.

IEEE has defined a standard for threads (POSIX.1c). The package is called pthreads. There are over 60 functions to control threads. Some are as follows:

Thread call            Description
pthread_attr_init      Create and initialize a thread’s attribute structure
pthread_create         Create a new thread of execution, returns thread id
pthread_equal          Determine if two threads are equal, returns nonzero if equal
pthread_self           Return thread id of the calling thread
pthread_join           Wait for a specific thread to exit
pthread_exit           Terminate the calling thread, release stack
pthread_attr_destroy   Remove a thread’s attribute structure

Programs using pthreads should be compiled using gcc -pthread.

#include <pthread.h>

int pthread_create(
    pthread_t *restrict tidp,
    const pthread_attr_t * restrict attr, 
    void *(*start_routine)(void *), 
    void * arg);

pthread_create() takes a pointer to a pthread_t object and sets its value to the new thread id.

The second argument is a pointer to a thread attribute structure. If NULL is passed, a thread with default settings is created. If the programmer wants to create a thread with customized attributes, the programmer can initialize an attribute structure using pthread_attr_init() and then call additional functions to set the values of the specific attributes.

The newly created thread starts running at the address of the start_routine function. The function takes a single argument, arg, which is a void pointer.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

#define MAX_THREADS 16

void printids(const char *s) {
    pid_t pid = getpid();
    pthread_t tid = pthread_self();

    printf("%s: pid %u tid (0x%x)\n",
        s, (unsigned int)pid, (unsigned int)tid);
    fflush(NULL);
}

void *thread_do(void *arg) {
    int i = *(int *)arg;
    printf("thread %d\n", i);
    fflush(NULL);

    char str[32];
    sprintf(str, "new thread: %d", i);
    printids(str);

    return ((void *)0);
}

int main(int argc, char *argv[]) {
    int err;
    pthread_t tid;

    if (argc != 2 || atoi(argv[1]) < 1 || atoi(argv[1]) > MAX_THREADS) {
        printf("usage: %s num_threads (1-%d)\n", argv[0], MAX_THREADS);
        exit(1);
    }
    int num_threads = atoi(argv[1]);

    int thread_args[MAX_THREADS];

    int i;
    for (i = 0; i < num_threads; i++)
        thread_args[i] = i;

    for (i = 0; i < num_threads; i++) {
        err = pthread_create(&tid, NULL, thread_do, &thread_args[i]);
        if (err != 0)
            perror("Error creating thread");
    }

    printids("main thread");
    sleep(2); // wait for the threads to terminate
    exit(0);
}

Thread Termination

If any thread calls an exit() function, the entire process is terminated along with all other threads.

There are 3 ways that a thread can terminate.

  1. It can return from the start routine.
  2. It can be canceled by another thread in the same process.
  3. The thread can call pthread_exit().

pthread_exit() makes the pointer passed in value_ptr available to any thread that successfully joins the calling thread.

#include <pthread.h>
void pthread_exit(void *value_ptr);

Waiting for a Thread to Terminate

#include <pthread.h>
int pthread_join(pthread_t id, void **rval_ptr);

pthread_join() blocks until the thread with the specified id terminates. If the terminating thread called pthread_exit(), the value it passed to pthread_exit() is made available to the waiting thread through rval_ptr. If the thread was canceled, PTHREAD_CANCELED is stored in the location referenced by rval_ptr.

Requesting Another Thread to be Canceled

A thread can request that another thread be canceled by calling pthread_cancel().

#include <pthread.h>
int pthread_cancel(pthread_t id);

By default all threads are cancelable. A thread can change the cancelability state by calling pthread_setcancelstate().

#include <pthread.h>
int pthread_setcancelstate(int state, int *oldstate);

The state is either PTHREAD_CANCEL_ENABLE or PTHREAD_CANCEL_DISABLE.

If a thread is cancelable and has been canceled, it continues running until it reaches a cancelation point. A cancelation point is a point where the thread checks whether it has been canceled and, if so, exits. POSIX specifies 55 functions that are cancelation points and 32 functions that may be implemented as cancelation points. When a thread that is cancelable and has a pending cancel calls one of the cancelation point functions, it is terminated.

© 2017 – 2019, Eric. All rights reserved.