Table of Contents

Study Guide 0: C, x86

C

C is often the programming language of choice in operating systems. C grants programmers low-level access to memory which is very useful. You’ll be reading and writing a lot of C code throughout this class.

Types

C is statically typed where types are known at compile time. However, C is also weakly typed meaning you can cast between any types. This gives the necessary flexibility for working with low-level memory, but it also opens up many avenues for errors if you’re not careful.

The primitive types include char, short, int, long, float, and double. The size of data types may vary depending on the operating system, so it’s best to check using the operator sizeof.

Arrays, denoted with [] (e.g. int[]), are contiguous regions of memory of fixed size. Each element is the size of the data type corresponding to that array. A string in C is just an array of characters (char) with a last element as null to indicate the end of the string.

Users can define compound data types using structs which are contiguous pieces of memory comprised of multiple other data types.

Pointers are references that hold the address of an object in memory. Fundamentally, pointers are just unsigned integers of size equal to the number of bits supported by the operating system. Prefixing a pointer with * will return the value at the memory address that the pointer is holding. On the other hand, prefixing a variable with & will return the memory address of the variable. Incrementing (decrementing) a pointer will change the pointer value by the corresponding number multiplied by the size of the type of the pointer.

Memory

A typical C program’s memory is divided into five segments.

Segment	Purpose
Text	Machine code of the compiled program
Initialized Data	Initialized global and static memory
Uninitialized Data	Uninitialized global and static memory
Heap	Dynamically allocated memory
Stack	Local variables and argument passing

In general, memory can be thought of as a giant array with elements of one byte where a memory address indexes into this array.

Unlike stack memory, heap memory needs to be explicitly managed by the user. Memory can be allocated in the heap segment using malloc, calloc, or realloc. These all return a pointer to a chunk of memory in the heap that is the size of the amount requested. When the memory is no longer being used, it needs to be explicitly released using free.

GNU Debugger (GDB)

GNU Debugger (GDB) is a powerful tool to debug your programs. While you may have skidded by CS 61C without using it, the complicated codebase for this class will require you to be proficient with GDB. Help will not be given to those who are not able to use GDB.

The general workflow of using GDB is as follows.

Compile the program using the -g flag.
Start GDB using gdb <executable>.
Set breakpoints using break <linenum>. You can also break at functions by using break <func>.
Run program with run. If the program takes in arguments, pass in those (i.e. run arg1 arg2).
Once you hit your breakpoint, examine using print. Other commands like display, watch, set, and many more will come in handy. You can also step line by line using next.

While you don’t have to memorize all the GDB commands, you’ll grow familiar with them the more you use them. When looking for certain functionality, check out the GDB User Manual.

Make

GNU Make is program that is commonly used to build other programs. When you run make, GNU Make looks in your current directory for a file named Makefile and executes the commands inside, according to the make file language.

my_first_makefile_rule:
    echo "Hello world"

The building block of GNU Make is a rule. We just created a rule, whose target is my_first_makefile_rule and the recipe is echo "Hello world". When we run make my_first_makefile_rule, GNU Make will execute the steps in the recipe and print "Hello world".

Rules can also contain a list of dependencies, which are other targets that must be executed before the rule. In this example, the task_two rule has a single dependency: task_one. If we run make task_two, then GNU Make will run task_one and then task_two.

task_one:
    echo 1
task_two: task_one
    echo 2

More details about Make

If you just run make with no specified target, then GNU Make will build the first target.
By convention, target names are also file names. If a rule’s file exists and the file is newer than all of its dependencies, then GNU Make will skip the recipe. If a rule’s file does not exist, then the timestamp of the target would be the beginning of time. Otherwise, the timestamp of the target is the modification time of the corresponding file.
When you run make clean, the clean recipe is executed every time, because a corresponding file named clean is never actually created. (You can also use the .PHONY feature of the make file language to make this more robust.)
Makefile recipes must be indented with tabs, not spaces.
You can run recipes in parallel with make -j 4 (specify the number of parallel tasks).
GNU Make creates automatic rules if you don’t specify them. For example, if you create a file named my_program.c, GNU Make will know how to compile it if you run make my_program.
There are many features of the make file language. Special variables like $@ and $< are commonly used in Makefiles. Look up the documentation online for more!

PintOS, the educational operating system that you will use for projects, has a complex build system written with Makefiles. Understanding GNU Make will help you navigate the PintOS build system.

C Concept Check

Consider a valid double pointer char** dbl_char in a 32-bit system. What is sizeof(*dbl_char)?

Solution

sizeof(*dbl_char) == 4; dbl_char is a double pointer, so dereferencing it once will still give a pointer. A 32-bit system means memory addresses will be 32 bits or equivalently 4 bytes.

Bonus question: Does sizeof(*dbl char) error if dbl char == NULL

Solution

No, since type sizes are known at compile time

Consider strings initialized as char* a = "4103 is the best"; char b[] = "4103 is the best"; Are a and b different?
Solution
Yes. Since it’s defined as a literal, a points to a string literal in a read only section of memory. On the other hand, b resides on the stack. It is equivalent to declaring:
char b[] = {'4', '1', '0', '3', ' ', 'i', 's', ' ', 'b', 'e', 's', 't', 0}.

Consider the following struct declaration:
struct point {
    int x;
    int y;
};
Point out a few differences between
struct point p;
printf("%d", p.x = 1);
and
struct point* p;
printf("%d", p->x = 1);
Solution

The first one uses the stack to allocate a struct that is the size of two ints. On the other hand, the second one declares a single pointer to a struct point on the stack, which is the size of a single int (assuming sizeof(struct point*) == sizeof(int)). Also, the second one will likely segfault, since the pointer is uninitialized.

Suppose you have an integer array int nums[3] = {152, 161, 162}. What are the differences between nums, &nums, and &nums[0]?

Solution

The three will all return memory addresses that are equivalent to each other. However, there is a subtle difference in that nums and &nums[0] point to the first element (i.e. an integer), while &nums points to the entire array (i.e. an array of integers). As a result, if you increment each of them by one, nums and &nums[0] will be incremented by sizeof(int), while &nums will be incremented by 3 * sizeof(int).

Calling a Function in another File

Consider a C program consisting of two files:

// my_app.c
#include <stdio.h>
int main(int argc, char** argv) {
    char* result = my_helper_function(argv[0]);
    printf("%s\n", result);
    return 0;
}

// my_lib.c
char* my_helper_function(char* string) {
    int i;
    for (i = 0; string[i] != '\0'; ++i) {
        if (string[i] == '/') {
            return &string[i + 1];
        }
    }
    return string;
}

You build the program with gcc my_app.c my_lib.c -o my_app.

What is the bug in the above program? (Hint: it’s in my_app.c.)

Solution

my_helper_function is not declared in my_app.c, so the compiler (incorrectly) guesses that its return type is int. Because sizeof(int) == 4 but sizeof(char*) may be not equal to 4, this may result in a segfault.

How can we fix the bug?

Solution

Declare my_helper_function with the proper signature above main.

Including a Header File

Suppose we add a header file to the above program and revise my_app.c to #include it.

// my_app.c
#include <stdio.h>
#include "my_lib.h"
int main(int argc, char** argv) {
    char* result = my_helper_function(argv[0]);
    printf("%s\n", result);
    return 0;
}

// my_lib.h
char* my_helper_function(char* string);

You build the program with gcc my_app.c my_lib.c -o my_app.

Suppose that we made a mistake in my_lib.h, and declared the function as char* my_helper_function(void);. Additionally, the author of my_app.c sees the header file and invokes the function as my_helper_function(). Would the program still compile? What would happen when the function is called?

Solution

The program would compile but the compiler would not pass an argument to the callee even though it is expecting one, causing it to read some value on the stack (%ebp offset by 8).

What could the author of my_lib.c do to make such a mistake less likely? Also

Solution

#include "my_lib.h" at the top of my_lib.c.

Using `#define`

Assume the following file contents:

//  app.c
#include <stdio.h>
#include "lib.h"

int main(int argc, char** argv) {
    helper_args_t helper_args;
    helper_args.string = argv[0];
    helper_args.target = '/';
    char* result = helper_func(&helper_args);
    printf("%s\n", result);
    return 0;
}

// lib.h
typedef struct helper_args {
#ifdef ABC
    char* aux;
#endif
    char* string;
    char target;
} helper_args_t;

char* helper_func(helper_args_t* args);

// lib.c
#include "lib.h"

char* helper_func(helper_args_t* args) {
    int i;
    for (i = 0; args->string[i] != '\0'; i++)
        if (args->string[i] == args->target)
            return &args->string[i + 1];
    return args->string;
}

You build the program on a 64-bit machine as follows.
> gcc -c app.c -o app.o
> gcc -c lib.c -o lib.o
> gcc app.o lib.o -o app
What is the size of a struct helper_args_t?

Solution

16 bytes. Since ABC is not defined, there is only char* string and char target, which is a total of 8 + 1 = 9 bytes. However, GCC pads the structure for alignment, so it fills in the remaining 7 bytes after char target.

Suppose you add #define ABC at the top of lib.h. What is the size of a struct helper_args_t?

Solution

24 bytes. Now that ABC is defined, there is an additional 8 bytes from char* aux.

Suppose we build the program in a different way with the original files (i.e. none of the changes from previous question apply):
> gcc -DABC -c app.c -o app.o
> gcc -c lib.c -o lib.o
> gcc app.o lib.o -o app
The program will now exhibit undefined behavior. What is the issue?

Solution

app.c is compiled with ABC defined (as seen from the -DABC flag), but lib.c is compiled without it. As a result, they have differing definitions of helper_args_t. In main, argv[0] is put in to address of helper_args + 8 bytes since app.c was compiled with the char* aux member, making the char* string member be 8 bytes offset from args. However, in helper_func, the program will access the address of args when it accesses args->string since it was compiled without the char* aux member, making the char* string the first member. This leads to undefined behavior since char* aux was never initialized in main, so it contains a garbage value.

An important observation is that this won’t always lead to a segfault. The value in char* aux member is garbage, meaning it could potentially be a non-illegal memory access by complete chance.

Using #include Guards

Suppose we split my_lib.h into two files:

// my_helper_function.h
#include "my_helper_args.h"
char* my_helper_function(helper_args_t* args);

// my_helper_args.h
typedef struct helper_args {
    char* string;
    char target;
} helper_args_t;

What happens if we include the following two lines at the top of my_app.c?
#include "my_helper_function.h"
#include "my_helper_args.h"
Solution

The compiler reports an error because helper_args_t is now defined twice.

How can we fix this? (Hint: look up #include guards)
Solution
Use an #include guard:
// my_helper_function.h
#ifndef MY_HELPER_FUNCTION_H_
#define MY_HELPER_FUNCTION_H_
#include "my_helper_args.h"
char* my_helper_function(helper_args_t* args);
#endif
Similar for my_helper_args.h, but using a different, unique identifier.

Debugging Segmentation Faults

Observe the following program from singer.c that aims to sort a string using quicksort. The program will use a string provided as the argument or defaults to “IU is the best singer!” if none is provided.

When the program is compiled and run, we get the following output. Use GDB to fix the issue.

> ./singer "Taeyeon is the best singer!"
Unsorted: "Taeyeon is the best singer!"
Sorted : "   !Tabeeeeeghiinnorssstty"
> ./singer
Unsorted: "IU is the best singer!"
Segmentation fault (core dumped)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
// singer.c
void swap(char* a, int i, int j) {
    char t = a[i];
    a[i] = a[j];
    a[j] = t;
}

int partition(char* a, int l, int r) {
    int pivot = a[l];
    int i = l, j = r+1;

    while (1) {
        do {
            ++i;
        } while (a[i] <= pivot && i <= r);

        do {
            --j;
        } while (a[j] > pivot);

        if (i >= j)
            break;

        swap(a, i, j);
    }

    swap(a, l, j);
    return j;
}

void sort(char* a, int l, int r) {
    if (l < r) {
        int j = partition(a, l, r);
        sort(a, l, j-1);
        sort(a, j+1, r);
    }
}

int main(int argc, char** argv) {
    char* a = NULL;

    if (argc > 1)
        a = argv[1];
    else
        a = "IU is the best singer!";

    printf("Unsorted: \"%s\"\n", a);
    sort(a, 0, strlen(a) - 1);
    printf("Sorted : \"%s\"\n", a);
}

We want to debug the program using GDB. How should we compile the program?

Solution

gcc -g singer.c -o singer. We need a -g flag for debugging information. The -o simply controls the name of the executable that is created which defaults to a.out if not specified.

When running the program without any arguments (i.e. using the default argument), what line does the segfault happen? Describe the memory operations happening in that line.
Solution
First, we compile the program using the command from the previous problem.

gcc -g singer.c -o singer

The first step in debugging a segfault is seeing which line it occurred in. You can get to the line by letting the program run until it encounters the fault. To get a more holistic view, you can also get the backtrace of the error using backtrace.
> gdb singer
(gdb) run
Starting program: /home/runner/intro/singer
Unsorted: "IU is the best singer!"
Program received signal SIGSEGV, Segmentation fault.
0x5646308006c8 in swap (a=0x564630800904 "IU is the best singer!", i=1, j=21) at singer.c:4
4 a[i] = a[j];
(gdb) backtrace
#0 0x5646308006c8 in swap (a=0x564630800904 "IU is the best singer!", i=1, j=21) at singer.c:4
#1 0x564630800773 in partition (a=0x564630800904 "IU is the best singer!", l=0, r=21) at singer.c:24
#2 0x5646308007bd in sort (a=0x564630800904 "IU is the best singer!", l=0, r=21) at singer.c:34
#3 0x564630800861 in main (argc=1, argv=0x7ffd04ac7098) at singer.c:48
This tells us the segfault occurred on line 4 in partition function, which is a[i] = a[j]. This line performs two memory operations: a read from a[j] and a write to a[i].

Run the program with and without an argument and observe the memory address of a in the seg-faulting line. Why are the memory addresses so different?
Solution
Using GDB, break at line 4 where the segfault occurs.
gdb singer
(gdb) break 4
Breakpoint 1 at 0x6ab: file singer.c, line 4.
(gdb) run
Starting program: /home/runner/intro/singer
Unsorted: "IU is the best singer!"
Breakpoint 1, swap (
a=0x5624e4600904 "IU is the best singer!", i=1, j=21) at singer.c:4
4 a[i] = a[j];
(gdb) print a
$1 = 0x5624e4600904 "IU is the best singer!"
(gdb) run "Taeyeon is the best singer!"
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: singer "Taeyeon is the best singer!"
Unsorted: "Taeyeon is the best singer!"
breakpoint 1, swap (
  a=0x7ffcce01bfe5 "Taeyeon is the best singer!", i=1, j=26)
  at singer.c:4
(gdb) print a
$2 = 0x7ffcce01bfe5 "Taeyeon is the best singer!"
When no argument is passed in, the program defaults to a statically defined string, meaning it resides in the read-only segment. On the other hand, when a specific string is passed in, it is passed in on the stack as an argument. As a result, the two addresses differ significantly.

How should the code be changed to fix the segfault?

Solution

The segfaulting line occurs in the swap function which is called by partition in lines 24 and 27. However, notice that the read from a[j] is not a problem as the same read is performed in line 19. As a result, this indicates that a write to a[i] is the issue.

This lines up with our observations from the previous question. Since the data is in a read-only segment, it makes sense that the program cannot write to it. As a result, we need to move the initial string to somewhere else on the address space (e.g. heap). To do this, we need to allocate the appropriate amount of space and then copy over the string. This could be accomplished with a malloc followed by a strcpy, or strdupa which does the same thing.

Replacing line 48 with a = strdup("IU is the best singer!") will fix the segfault. Note, that in this case we should call free(a) at the very end of main as well. Managing memory in C correctly is difficult and often quite convoluted…

x86

x86 is a family of instruction set architectures (ISA) developed by Intel. x86 is based on the complex instruction set computer (CISC) architecture. x86 being a family of ISAs means there are a variety of different dialects of this language. In this class, we will focus on the 32-bit ISA called IA-32 or i386 which is the common denominator for all 32-bit x86 processors and hence used in PintOS. While heavily related, this should not be confused with the 32-bit microprocessor i386, also known as Intel 80386). However, we will still occasionally mention the 64-bit ISA x86-64 as you may come across during some non-PintOS assignments.

Registers

Registers are small storage spaces directly on the processor, allowing for fast memory access. The x86 architecture has only has 8 general purpose registers (GPRs).

Register	Name	Purpose
`ax`	Accumulator	I/O port access, arithmetic, interrupt calls
`bx`	Base	Base pointer for memory access
`cx`	Counter	Loop counting, bit shifts
`dx`	Data	I/O port access, arithmetic, interrupt calls
`sp`	Stack Pointer	Top address of stack
`bp`	Base Pointer	Base address of stack
`si`	Source Index	Source for stream operations (e.g. string)
`di`	Destination Index	Destination for stream operations (e.g. string)

Due to x86’s 16-bit history, the GPRs started as 16-bits and were extended to 32-bits with the e prefix (e.g. eax for ax) and 64-bits with the r prefix (e.g. rax for ax). Each 16-bit GPR can be addressed by the 8-bit LSB (i.e. lower 8 bits) by replacing the last letter with l (e.g. al for ax). ax, cx, dx, and bx can also be addressed by the 8-bit MSB (i.e. higher 8 bits) by replacing the last letter with h (e.g. ah for ax).

x86 has an instruction pointer register ip. Like the GPRs, it is extended to 32-bits with the e prefix and 64-bits with r prefix. ip is a special register since it cannot be read and modified like a GPR (i.e. cannot use vanilla memory instructions). The address stored in ip always points to the next instruction to be executed. The value is updated by the processor based on the executed commands.

There are other registers such as segment, EFLAGS, control, debug, test, floating point and many more. However, the GPRs and the instruction pointer register are the main ones you will work with in this class.

Syntax

Although IA-32 specifies the registers and instructions, there are two different syntaxes: Intel and AT&T. In this class, we will use the AT&T syntax because it is used by the GNU Assembler, the assembler for GCC and thus the standard for PintOS and most Unix-like operating systems like Linux. The two syntaxes have significant differences, so make sure to check which syntax is being used when referencing documentation.

Registers are preceded by a percent sign (e.g. %eax for eax). Immediate values such as constants are preceded by a dollar sign (e.g. $4103 for the constant 4103).

The general structure of a line of code is inst src, dest. For example, movl %ebx, %eax will move the contents of %ebx into %eax.

Addressing memory uses the syntax of offset(base, index, scale) where base and index are registers, and offset and scale are integers (scale can only take on values of 1, 2, 4, or 8). This accesses the data at memory address base + index * scale + offset. All parameters are optional, though most cases you will see will have base and offset. The following are some use cases of addressing memory with different instructions.

Command	Explanation
`mov 8(%ebx), %eax`	Move contents from the address `ebx + 8` into `eax`
`mov %ecx, -4(%esi, %ebx, 8)`	Move contents in `ecx` into address `esi + 8 * ebx - 4`

One exception to the above syntax is when using the lea instruction which stands for “load effective address”. lea operates directly on the memory addresses themselves and not the contents contained in the memory addresses. For instance, lea 8(%ebx), %eax would put the address ebx + 8 into eax, not the contents at that address.

Each instruction also has a suffix that signifies the operand size. b means byte (8 bits), w means word (16 bits), and l means long (32 bits). These are used when the the intended data size is ambiguous (e.g. mov $0, (%esp)).

Command	Explanation
`movb $0, (%esp)`	Zero out a single byte from the stack pointer
`movw $0, (%esp)`	Zero out two bytes from the stack pointer
`movl $0, (%esp)`	Zero out four bytes from the stack pointer

The suffixes aren’t always necessary when the intended data size can be inferred in some cases (e.g. using a 32-bit register as an operand means a 32-bit operation), but it is good practice to use them regardless.

Calling Convention

Calling convention is a procedure for how to call and return from functions. They specify stack management, passing in parameters, any registers that need to be saved, stack management, returning values, and more. There are two sets of rules: one for the caller of the function and one for the callee of the function.

Calling conventions are heavily tied into the language that’s being compiled. In this class, we will use the calling convention defined by the i386 System V ABI as the default calling convention.

Caller

Before calling the function (i.e. prologue), the caller needs to:

Save caller-saved GPRs (eax, ecx, edx) onto the stack if needed after the function call.
Push parameters onto the stack in reverse order (i.e. store first parameter at the lowest address). Add necessary padding before the parameters to ensure a 16-byte alignment.

Then the caller calls the function by pushing the return address onto the stack and jumping to the function. Once the function call returns (i.e. epilogue), the caller needs to

Remove the parameters from the stack.
Restore caller-saved GPRs (if any) from the prologue.

Callee

Before executing any function logic (i.e. prologue), the callee needs to:

Push ebp onto the stack and set ebp to be the new esp (i.e. stack pointer after pushing the ebp). This marks the start of a new stack frame.
Allocate stack space for any local variables (i.e. decrement esp).
Save callee-saved GPRs (ebx, edi, esi) onto the stack if used during the function call.

Then the callee performs the function logic. Before returning (i.e. epilogue), the callee needs too:

Store the return value in eax.
Restore callee-saved GPRs (if any) from the prologue.
Deallocate local variables. While subtracting the correct amount will technically work, a less error prone way is to set esp to be the current ebp, effectively clearing the stack frame.
Restore caller’s ebp from the stack.
Return from the function by popping the return address pushed by the caller in its prologue and jumping to it.

Instructions

There are a few commonly used instructions that will show up in nearly every assembly code due to the calling convention.

Instruction	Purpose	Equivalent commands
`pushl src`	Push src onto stack	`subl $4, %esp` `movl src, (%esp)`
`popl dest`	Pop from stack into dest	`movl (%esp), dest` `addl $4, %esp`
`call addr`	Push return address onto stack and jump to addr	`pushl %eip` `jump addr`
`leave`	Restore EBP and ESP to previous stack frame	`movl %ebp, %esp` `popl %ebp`
`ret`	Pop return address from stack and jump to it	`popl %eip`

Keep in mind the effective column shows what the instruction is doing, but it may not exactly be what the processor does. In fact, call and ret access eip, while accessing it directly using mov is not allowed.

Optimizations

When compiling with some optimizations using GCC (e.g. -O3 flag), you may notice some violations of these calling conventions, most notably the saving of the base pointer. This is because the base pointer is not a necessity: the stack pointer is sufficient to address anything we need. See also this document for more information.

x86 Concept Check

Between sp and bp, which has a higher memory address?

Solution

bp has a higher memory address. The stack grows downwards, meaning the top of the stack moves towards lower addresses.

Write three different ways to clear the eax register (i.e. store a 0).
Solution
movl $0, %eax
subl %eax, %eax
xorl %eax, %eax

True or False: Right before the caller jumps to the desired function, the stack must be 16-byte aligned.

Solution

False. The stack needs to be 16-byte aligned right after the parameters have been pushed onto the sack. Return address is pushed right before jumping.

Reverse Engineering

Reading Disassembly

// file.c
int global = 0;
int callee(int x, int y) {
    int local = x + y;
    return local + 1;
}
void caller(void) {
    global = callee(3, 4);
}

When gcc compiles this file, with optimizations off, it outputs:

; file.s:
callee:
    pushl %ebp
    movl %esp, %ebp
    subl $16, %esp
    movl 8(%ebp), %edx
    movl 12(%ebp), %eax
    addl %edx, %eax
    movl %eax, -4(%ebp)
    movl -4(%ebp), %eax
    addl $1, %eax
    leave
    ret
caller:
    pushl %ebp
    movl %esp, %ebp
    pushl $4
    pushl $3
    call callee
    addl $8, %esp
    movl %eax, global
    nop
    leave
    ret

What does each instruction do? Mark the prologue(s), epilogue(s), and call sequence(s).

Solution

First three instructions of callee are the prologue: save esp and allocate space for locals.

The next two movl instructions read the function arguments off of the stack into registers.

The addl instruction computes x + y.

The next movl instruction stores the result at ebp - 4, the stack memory allocated for the local variable.

The next movl reads the value of local into a register, and the following addl instruction adds one to it. Now the return value is in eax.

The final two instructions are the epilogue: restore esp, pop the return address off the stack, and jump to it.

The first two lines of caller are the prologue: save esp.

The next four lines are the call sequence for calling callee: set up stack, call function, and clean up stack.

The next movl instruction stores the return value into the address of the global variable.

The nop appears to be an artifact of gcc; we’re compiling with optimizations off, so the compiler doesn’t optimize this out (although it would on any other optimization level)

The last two instructions are the the epilogue: restore esp, pop the return address of the stack and jump to it.

Pythagorean triplets

A Pythagorean triplet a < b < c satisfies the property $a^2 + b^2 = c^2$. triplet.s shown below returns the product of a Pythagorean triplet for which a + b + c = 1000. triplet.s has been assembled without optimizations based on a complete version of triplet.c given below as well.

Unused labels and directives have been omitted in triplet.s for simplicity.

// triplet.c
int main(void) {
    for (int a = 1; _____________) {
        int a2 = _____;
        for (int b = ________________) {
            int b2 = _____;
            int c = ____________;
            int c2 = _____;
            if (_____________) {
                return _________;
            }
        }
    }
    return 0;
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
; triplet.s
main:
    pushl %ebp
    movl %esp, %ebp
    subl $32, %esp
    movl $1, -4(%ebp)
    jmp .L2
.L7:
    movl -4(%ebp), %eax
    imull %eax, %eax
    movl %eax, -12(%ebp)
    movl -4(%ebp), %eax
    movl %eax, -8(%ebp)
    jmp .L3
.L6:
    movl -8(%ebp), %eax
    imull %eax, %eax
    movl %eax, -16(%ebp)
    movl $1000, %eax
    subl -4(%ebp), %eax
    subl -8(%ebp), %eax
    movl %eax, -20(%ebp)
    movl -20(%ebp), %eax
    imull %eax, %eax
    movl %eax, -24(%ebp)
    movl -12(%ebp), %edx
    movl -16(%ebp), %eax
    addl %edx, %eax
    cmp %eax, -24(%ebp)
    jne .L4
    movl -4(%ebp), %eax
    imull -8(%ebp), %eax
    imull -20(%ebp), %eax
    jmp .L5
.L4:
    addl $1, -8(%ebp)
.L3:
    cmpl $666, -8(%ebp)
    jle .L6
    addl $1, -4(%ebp)
.L2:
    cmpl $333, -4(%ebp)
    jle .L7
.L5:
    movl $0, %eax
    leave
    ret

What is the memory address of a relative to the base pointer?

Solution

triplet.c shows that a is defined to be a value of 1. In line 6 of triplet.s, we see a local variable being defined on the stack with the constant 1. Since a is the first variable that needs to be defined in the code, line 6 must correspond to a, meaning a is defined at ebp - 4 (i.e. four bytes below the base pointer).

What is the end condition for the outer loop using a?

Solution

After a is defined on line 6, we jump to .L2 in line 7. This brings us to line 41 where we compare a (i.e. -4(%ebp)) with 333 using the cmpl instruction, which subtracts 333 from a. In line 43, we jump to .L7 if a - 333 is less than or equal to 0. Otherwise, we proceed with the rest of the code which loads 0 into eax and performs a return procedure by executing .L5. Therefore, a must be greater than 333 for us to exit the outer loop.

What are the memory addresses of the rest of the local variables (a2, b, b2, c, c2) relative to the base pointer?

Solution

Since the code is unoptimized, we trace the code with the appropriate jumps to see where each local variable is defined.

After a is defined, we jump to .L7 where we see in line 11 that a2 is the next variable defined at ebp - 12. Afterwards, b is defined at ebp - 8 in line 13. This is confirmed by the similar loop structure as the code jumps to .L3 where cmpl and jle exists as seen for a.

The code then jumps to .L6 where b2, c, and c2 are defined at 16, 20, and 24 bytes below the base pointer in lines 18, 22, and 25, respectively. The stack viewed from the current base pointer looks like:

stack

ebp

a

b

a2

b2

c

An important observation is how local variables are not always placed on the stack in the order they are defined in, so it’s important to not make such an assumption and instead trace the code. This is due to GCC’s stack slot assignment optimizations.

stack
ebp
a
b
a2
b2
c

Fill in the missing code for triplet.c.
Solution
// triplet.c
int main(void) {
    for (int a = 1; a <= 333; ++a) {
        int a2 = a * a;
        for (int b = a; b <= 666; ++b) {
            int b2 = b * b;
            int c = 1000 - a - b;
            int c2 = c * c;
            if (a2 + b2 == c2) {
                return a * b * c;
            }
        }
    }
    return 0;
}
A breakdown of how this code was compiled can be found on this Compiler Explorer snippet which highlights corresponding lines between C and x86 and shows the exact compiler flags used. Feel free to play around with different optimizations and other compiler flags to see how the corresponding x86 code changes.

Stack Frame

// foobar.c
int p = 0;

int bar(int x, int y, int z) {
    int w = x + y - z;
    return w + 1;
}

void foo(int a, int b) {
    p = a + b + bar(3, 4, 5);
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
; foobar.s
p:
    .zero 4
bar:
    pushl %ebp
    movl %esp, %ebp
    subl $16, %esp
    movl 8(%ebp), %edx
    movl 12(%ebp), %eax
    addl %edx, %eax
    subl 16(%ebp), %eax
    movl %eax, -4(%ebp)
    movl -4(%ebp), %eax
    addl $1, %eax
    leave
    ret
foo:
    pushl %ebp
    movl %esp, %ebp
    pushl %ebx
    subl $4, %esp
    movl 8(%ebp), %edx
    movl 12(%ebp), %eax
    leal (%edx,%eax), %ebx
    subl $4, %esp
    pushl $5
    pushl $4
    pushl $3
    call bar
    addl $16, %esp
    addl %ebx, %eax
    movl %eax, p
    nop
    movl -4(%ebp), %ebx
    leave
    ret

Which lines of the code correspond to a caller/callee prologue/epilogue?

Solution

The most apparent caller/callee pair is foo/bar. Lines 25-28 are the prologue of the caller that setup the arguments. Line 30 is the epilogue of the caller that cleans up the stack by simply moving up the stack pointer. Lines 5-7 are the prologue of the callee that push the base pointer and make room for local variables. While there’s only one local variable w in bar, the compiler still makes room for 16 bytes due to the default stack boundary of 16 bytes per GCC’s -mpreferred-stack-boundary flag. Lines 14-16 are the epilogue of the callee that restores the stack pointer and returns back to the caller.

It’s important to recognize foo is a function as well, meaning it will be in the callee position when it’s called by another function. Similar to when bar was the caller, lines 18-21 and lines 34-36 serve as the prologue and epilogue respectively when foo is the callee.

What does line 20 do in foobar.s? Why is it necessary?

Solution

Line 20 saves the ebx register by pushing it onto the stack. ebx is a callee-saved register, so foo is responsible for saving it before using it.

Why is edx not saved by foo before calling bar despite being used in bar?

Solution

General purpose registers (GPRs) are saved only if the register content needs to persist after the function call. edx was used as a temporary register for computation in lines 22-24. As a result, its value does not need to persist past the call of bar.

Draw the stack frame and contents of relevant registers after executing line 17 of foobar.s.

Solution

Stack Frame of foo’s Caller

Padding

b

a

Return Addr of foo’s Caller

EBP of foo’s Caller

EBX of foo’s Caller

5

4

3

Return Addr of foo

EBP of foo

2