Table of Contents

Study Guide 1: Fundamentals of Operating Systems

While not a well-defined concept, operating systems (OS) provide hardware abstractions (e.g. file systems, processes) to software applications and manage hardware resources (e.g. memory, CPU). It can be seen as a special layer of software that provides application software access to hardware resources. An OS serves its purpose by taking on three roles.

Role Purpose
Referee Manage protection, isolation, and sharing of resources
Illusionist Provide clean, easy-to-use abstractions of physical resources
Glue Provide common services

Address Space

An address space is the set of accessible addresses and the state associated with them. For a 32-bit processor, there are \(2^{32} \approx 4 billion\) addresses.

The entire address space doesn’t necessarily represent real locations but rather potential spaces. A user program reading or writing to an address may result in a regular memory access, exception or fault if not allowed due to protection, I/O operation from a memory-mapped I/O device, or more. However, the kernel can access the entire address space without limitations.

In most modern operating systems, programs operate with virtual memory. Instead of accessing a physical memory directly, programs will request to access memory at a virtual memory address which is translated into a physical memory address.

Process

A process is an execution environment with restricted rights. Each process has its own address space and resources such as code, global data, and files.

Processes are protected from each other due to differing address spaces. If a bug were to corrupt a process, it would generally avoid compromising the entire system due to the protections from the address space.

Dual Mode Operation

The underlying hardware that the OS interfaces with provides at least two modes: kernel and user. Kernel mode, also known as supervisor or privileged mode, has the most privileges, meaning the kernel itself and other parts of the OS operate in this mode. On the other hand, user mode prohibits certain operations. This is where programs are normally executed. The restricted access is important in user mode to make sure processes running in user mode cannot maliciously corrupt another process.

There are three main ways the system switches from a user mode to kernel mode (mode transfer). When processes request a system service, this is known as a system call (syscall). While similar to a function call, syscalls occur outside the process, meaning it is executed in the kernel mode. Therefore, syscalls encompass functionality that requires the privileges or abstractions of being in the kernel mode. Interrupts, sometimes referred to as hardware interrupts, are external asynchronous events (e.g. timer, I/O) that trigger a mode switch (from user mode to kernel mode). Traps, also known as exceptions or software interrupts, are internal synchronous events (e.g. segmentation fault, divide by zero) that trigger a mode switch.

All three types of mode transfers are unprogrammed control transfers. Instead of the process specifying the specific address like in a regular function call, the process specifies an index into the interrupt vector table (IVT), which is a table that contains the address and properties of each interrupt handler. The interrupt in IVT is used as a general term that’s not just limited to the interrupts mentioned in the previous paragraph. The location of the IVT is stored in a designated processor register.

Concept Check

What is the importance of address translation?

Solution

Address translation is necessary for the idea of virtual memory which provides an isolation abstraction. This gives the illusion that each process is the sole user of the address space. It also provides protection between different processes since virtual addresses will not translate to the same physical address, preventing processes from accessing and potentially corrupting each other’s memory.

Similar to what’s done in the prologue at calling convention, what needs to happen before a mode transfer occurs?

Solution

The processor’s state (e.g. registers) need to be saved in the process control block (PCB) because the kernel may overwrite the registers when it executes its own code.

How does the syscall handler protect the kernel from corrupt or malicious user code?

Solution

When executing a syscall, the user program specifies an index instead of the direct address of the handler, meaning the user program cannot directly execute in kernel mode. The arguments will be validated by the handler to make sure the user is not intending a malicious attack. Moreover, the handler will copy over the arguments instead of using them from the user stack directly. This is necessary because the user program could change the arguments after the handler performs initial checks for malicious purposes (i.e. TOCTTOU). After the syscall finishes, the results are copied back in to user memory. The user process is not allowed to access the results stored in kernel memory for security reasons.

Processes

Processes are instances of a running program with their own protected address spaces.

Process Control Block

An OS needs to run many programs, meaning there will be many processes. As a result, basic mechanisms such as switching between user processes and the kernel, switching among user processes through the kernel, and protecting the OS from user processes and among themselves need to be present in an operating system.

To accomplish this, the kernel represents each process with a process control block (PCB), a data structure that keeps track of the various properties of a process including (but not limited to) status, register state, process id (pid). The kernel scheduler allocates the CPU to different processes by maintaining a data structure containing the aforementioned PCBs. Other resources, such as memory and I/O, need to be managed and allocated as well, though not necessarily by the kernel scheduler.

Syscalls

Generally, an OS provides a library or API that implements process management syscalls. For Unix-based operating systems, this usually resides as part of the C standard library (libc). As a result, full documentation is available through the man pages.

void exit(int status)

exit terminates the calling process with the exit code specified by status. Generally speaking, exit code 0 means the code executed without any error, while non-zero exit codes indicate otherwise. Rarely will you see a C program where main directly calls exit since the OS library will call exit for you once main returns.

pid_t fork(void)

fork creates a new process by copying the current process. Typically, the process created from fork is known as the child process, while the process calling fork is known as the parent process. The parent and child processes are identical in many ways (e.g. address space) except for the pid. More differences are listed on the man pages. The return type of fork is pid_t, which is a signed integer. If the return value is greater than 0, this means the current process is the parent process. On the other hand, if the return value is 0, this means the current process is the child process. When the return value is -1, an error has occurred; the current process remains (i.e. no new child process has been created).

A typical workflow you might see is:

// fork_simple.c
int main() {
    pid_t fork_ret = fork();
    if (fork_ret > 0) {
        /* parent process logic */
    } else if (fork_ret == 0) {
        /* child process logic */
    } else {
        /* error handling */
    }
}
exec

exec changes the program being run by the current process. A key distinction is that unlike fork, exec does not create a new process. In Unix-based operating systems, exec is a family of functions with different parameters, which is why a full method header is not given. For a full list, visit the man page.

pid_t wait(int *wstatus)

wait waits for a child process to finish. On success, it returns the pid of the terminated child process, while returning -1 on an error. wstatus is used to store status information if not NULL.

int kill(pid_t pid, int sig)

kill sends a signal, an interrupt-like notification, to another process. SIGINT (ctrl-C), SIGTERM (kill on command line), and SIGSTOP (Ctrl-Z) are all types of signals you may be familiar with. When a process receives a signal, it has a defined behavior known as the signal handler. A custom signal handler can be defined for most signals except for SIGKILL and SIGSTOP using sigaction.

Fork and Friends

How many new processes (not including the original process) are created when the following program is run? Assume all fork calls succeed.

// fork_loop.c
int main(void) {
    for (int i = 0; i < 3; ++i)
        pid_t fork_ret = fork();
    return 0;
}
Solution

Seven processes will be created. Newly forked processes will continue to execute the loop from wherever the parent process left off.

What are the possible outputs when the following program is run?

// fork_stack.c
int main(void) {
    int stuff = 5;
    pid_t fork_ret = fork();
    printf("The last digit of pi is %d\n", stuff);
    if (fork_ret == 0)
        stuff = 6;
    return 0;
}
Solution

fork creates a new process, meaning the address spaces are identical but not shared. As a result, changing stuff in the child process will have no effect on the value of stuff in the parent process, regardless of the execution order of the two processes. This means the following will be printed:

The last digit of pi is 5.
The last digit of pi is 5.

However, it is possible the fork doesn’t succeed as no such assumption was made. As a result, the child process would never be created, only resulting in one statement being printed.

What are the possible outputs when the following program is run?

// fork_heap.c
int main(void) {
    int* stuff = malloc(sizeof(int));
    *stuff = 5;
    pid_t fork_ret = fork();
    printf("The last digit of pi is %d\n", *stuff);
    if (fork_ret == 0)
        *stuff = 6;
    return 0;
}
Solution

The heap is part of the address space like the stack, meaning the heap will not be shared between the parent and child process after a fork As a result, the possible outputs are the exact same as the previous question: two lines of The last digit of pi is 5 if fork succeeds, one line otherwise.

What are the possible outputs when the following program is run? Assume the child process has pid of 162162.

// fork_wait.c
int main(void) {
    pid_t fork_ret = fork();
    int exit;
    if (fork_ret != 0)
        wait(&exit)
    printf("Hello World: %d\n", fork_ret);
    return 0;
}
Solution

The parent process will wait until the child process completes, meaning it will not print until after the child process. As a result, the program will output:

Hello World: 0
Hello World: 162162

Note that the order matters here. If fork fails, then the program will print:

Hello World: -1

since fork returns -1.

Does the following program print all numbers from 0 to 9 as well as the output of running ls? If not, what is the minimal code change to accomplish this? Assume all syscalls succeed.

// fork_exec.c
int main(void) {
    char** argv = (char**) malloc(3 * sizeof(char*));
    argv[0] = "/bin/ls";
    argv[1] = ".";
    argv[2] = NULL;
    for (int i = 0; i < 10; i++) {
        printf("%d\n", i);
        if (i == 3) {
            execv("/bin/ls", argv);
        }
    }
    return 0;
}
Solution

Currently, the program stops after printing 3, giving an output of:

0
1
2
3
<output of ls>

When execv is run, the entire process image is overwritten, meaning the rest of the loop will not execute. To achieve the desired output, we can fork and exec only in the child process to make sure the parent process continues the loop.

int main(void) {
    char** argv = (char**) malloc(3 * sizeof(char*));
    argv[0] = "/bin/ls";
    argv[1] = ".";
    argv[2] = NULL;
    for (int i = 0; i < 10; i++) {
        printf("%d\n", i);
        if (i == 3) {
            pid_t fork_ret = fork();
            if (fork_ret == 0)
                execv("/bin/ls", argv);
        }
    }
    free(argv);
    return 0;
}

Signal Handling

The following are a list of standard POSIX signals.

Signal Value Action Comment
SIGHUP 1 Terminate Hangup detected on controlling terminal or death of controlling process
SIGINT 2 Terminate Interrupt from keyboard (Ctrl-C)
SIGQUIT 3 Core Dump Quit from keyboard (Ctrl-)
SIGILL 4 Core Dump Illegal Instruction
SIGABRT 6 Core Dump Abort signal from abort(3)
SIGFPE 8 Core Dump Floating point exception
SIGKILL 9 Terminate Kill signal
SIGSEGV 11 Core Dump Invalid memory reference
SIGPIPE 13 Terminate Broken pipe: write to pipe with no readers
SIGALRM 14 Terminate Timer signal from alarm(2)
SIGTERM 15 Terminate Termination signal
SIGUSR1 30,10,16 Terminate User-defined signal 1
SIGUSR2 31,12,17 Terminate User-defined signal 2
SIGCHLD 20,17,18 Ignore Child stopped or terminated
SIGCONT 19,18,25 Continue Continue if stopped
SIGSTOP 17,19,23 Stop Stop process
SIGTSTP 18,20,24 Stop Stop typed at tty
SIGTTIN 21,21,26 Stop tty input for background process
SIGTTOU 22,22,27 Stop tty output for background process

Overriding SIGSTOP and SIGKILL is disabled. Why?

Solution

If a process were to override their signal handlers to ignore SIGSTOP and SIGKILL, it can run a malicious process forever, posing substantial threats to the operating system and rest of the processes.

What are the different ways you can use the keyboard to cause the program to exit? Assume the program is run in a bash shell.

// exit_signals.c
void sigquit_handler(int sig) {
    if (sig == SIGINT || sig == SIGQUIT)
        exit(1);
}
void sigint_handler(int sig) {
    if (sig == SIGINT)
        signal(SIGINT, sigquit_handler);
}
int main() {
    signal(SIGQUIT, sigquit_handler);
    signal(SIGINT, sigint_handler);
    while (1) {
        printf("Sleeping for a second ...\n");
        sleep(1);
    }
}
Solution

In main, we initialize SIGINT and SIGQUIT to custom handlers. SIGQUIT’s new handler exits after checking sig, so pressing Ctrl-\ will cause the program to exit. When the program receives SIGINT the first time, it will redefine the signal handler to be sigquit handler. As a result, Ctrl-C needs to be pressed twice or Ctrl-C and Ctrl-\.

PintOS Lists

Consider the following code, which sums up the items of a traditional linked list.

// ll.c
struct ll_node {
    int value;
    struct ll_node *next;
};

/* Returns the sum of a linked list. */
int ll_sum(ll_node *start) {
    ll_node *iter;
    int total = 0;
    for (iter = start; iter != NULL; iter = iter->next)
        total += iter->value;
    return total;
}

Fill in the blanks below that emulates the above code but for PintOS lists (i.e. a list summer).

// pl.c
/* Given a struct list, returns a reference to the first list_elem in the list. */
struct list_elem *list_begin(struct list *lst);

/* Given a struct list, returns a reference to the last list_elem in the list. */
struct list_elem *list_end(struct list *lst);

/* Given a list_elem, finds the next list_elem in the list. */
struct list_elem *list_next(struct list_elem *elem);

/* Converts a pointer to list element LIST_ELEM into a pointer to the structure
   that LIST_ELEM is embedded inside. You must also provide the name of the
   outer structure STRUCT and the member name MEMBER of the list element. */
STRUCT *list_entry(LIST_ELEM, STRUCT, MEMBER);

struct pl_node {
    int value;
    struct list_elem elem;
};

/* Returns the sum of a pintos-style list of pl_nodes. */
int pl_sum(struct list *lst) {
    struct list_elem *iter;
    struct pl_node *temp;
    int total = 0;
    for (______________________; _____________________; ______________________) {
        temp = list_entry(__________________________);
        ____________________;
    }
    return total;
}
Solution
    for (iter = list_begin(lst); iter != list_end(lst); iter = list_next(iter)) {
        temp = list_entry(iter, struct pl_node, elem);
        total += temp->value;
    }

If you need more help with PintOS Lists abstraction, check out lib/kernel/list.h in your PintOS source code. For a deeper dive into the design, check out this article on Linux’s doubly linked lists, which are what PintOS lists are based off of.