Table of Contents
Study Guide 0: C, x86
C
C is often the programming language of choice in operating systems. C grants programmers low-level access to memory which is very useful. You’ll be reading and writing a lot of C code throughout this class.
Types
C is statically typed where types are known at compile time. However, C is also weakly typed meaning you can cast between any types. This gives the necessary flexibility for working with low-level memory, but it also opens up many avenues for errors if you’re not careful.
The primitive types include char
, short
, int
, long
, float
, and double
. The size of data types may vary depending on the operating system, so it’s best to check using the operator sizeof.
Arrays, denoted with []
(e.g. int[]
), are contiguous regions of memory of fixed size. Each element is the size of the data type corresponding to that array. A string in C is just an array of characters (char
) with a last element as null to indicate the end of the string.
Users can define compound data types using struct
s which are contiguous pieces of memory comprised of multiple other data types.
Pointers are references that hold the address of an object in memory. Fundamentally, pointers are just unsigned integers of size equal to the number of bits supported by the operating system. Prefixing a pointer with *
will return the value at the memory address that the pointer is holding. On the other hand, prefixing a variable with &
will return the memory address of the variable. Incrementing (decrementing) a pointer will change the pointer value by the corresponding number multiplied by the size of the type of the pointer.
Memory
A typical C program’s memory is divided into five segments.
Segment | Purpose |
---|---|
Text | Machine code of the compiled program |
Initialized Data | Initialized global and static memory |
Uninitialized Data | Uninitialized global and static memory |
Heap | Dynamically allocated memory |
Stack | Local variables and argument passing |
In general, memory can be thought of as a giant array with elements of one byte where a memory address indexes into this array.
Unlike stack memory, heap memory needs to be explicitly managed by the user. Memory can be allocated in the heap segment using malloc
, calloc
, or realloc
. These all return a pointer to a chunk of memory in the heap that is the size of the amount requested. When the memory is no longer being used, it needs to be explicitly released using free
.
GNU Debugger (GDB)
GNU Debugger (GDB) is a powerful tool to debug your programs. While you may have skidded by CS 61C without using it, the complicated codebase for this class will require you to be proficient with GDB. Help will not be given to those who are not able to use GDB.
The general workflow of using GDB is as follows.
- Compile the program using the
-g
flag. - Start GDB using
gdb <executable>
. - Set breakpoints using
break <linenum>
. You can also break at functions by usingbreak <func>
. - Run program with run. If the program takes in arguments, pass in those (i.e.
run arg1 arg2
). - Once you hit your breakpoint, examine using
print
. Other commands likedisplay
,watch
,set
, and many more will come in handy. You can also step line by line usingnext
.
While you don’t have to memorize all the GDB commands, you’ll grow familiar with them the more you use them. When looking for certain functionality, check out the GDB User Manual.
Make
GNU Make is program that is commonly used to build other programs. When you run make
, GNU Make looks in your current directory for a file named Makefile
and executes the commands inside, according to the make
file language.
my_first_makefile_rule:
echo "Hello world"
The building block of GNU Make is a rule. We just created a rule, whose target is my_first_makefile_rule
and the recipe is echo "Hello world"
. When we run make my_first_makefile_rule
, GNU Make will execute the steps in the recipe and print "Hello world"
.
Rules can also contain a list of dependencies, which are other targets that must be executed before the rule. In this example, the task_two
rule has a single dependency: task_one
. If we run make task_two
, then GNU Make will run task_one
and then task_two
.
task_one:
echo 1
task_two: task_one
echo 2
More details about Make
- If you just run make with no specified target, then GNU Make will build the first target.
- By convention, target names are also file names. If a rule’s file exists and the file is newer than all of its dependencies, then GNU Make will skip the recipe. If a rule’s file does not exist, then the timestamp of the target would be the beginning of time. Otherwise, the timestamp of the target is the modification time of the corresponding file.
- When you run
make clean
, theclean
recipe is executed every time, because a corresponding file namedclean
is never actually created. (You can also use the.PHONY
feature of the make file language to make this more robust.) - Makefile recipes must be indented with tabs, not spaces.
- You can run recipes in parallel with
make -j 4
(specify the number of parallel tasks). - GNU Make creates automatic rules if you don’t specify them. For example, if you create a file named
my_program.c
, GNU Make will know how to compile it if you runmake my_program
. - There are many features of the make file language. Special variables like
$@
and$<
are commonly used inMakefile
s. Look up the documentation online for more!
PintOS, the educational operating system that you will use for projects, has a complex build system written with Makefile
s. Understanding GNU Make will help you navigate the PintOS build system.
C Concept Check
Consider a valid double pointer
char** dbl_char
in a 32-bit system. What issizeof(*dbl_char)
?Solution
sizeof(*dbl_char) == 4
;dbl_char
is a double pointer, so dereferencing it once will still give a pointer. A 32-bit system means memory addresses will be 32 bits or equivalently 4 bytes.
Bonus question: Does sizeof(*dbl char) error if dbl char == NULL
Solution
No, since type sizes are known at compile time
Consider strings initialized as
char* a = "4103 is the best"; char b[] = "4103 is the best";
Area
andb
different?Solution
Yes. Since it’s defined as a literal,
a
points to a string literal in a read only section of memory. On the other hand,b
resides on the stack. It is equivalent to declaring:char b[] = {'4', '1', '0', '3', ' ', 'i', 's', ' ', 'b', 'e', 's', 't', 0}.
Consider the following struct declaration:
struct point { int x; int y; };
Point out a few differences between
struct point p; printf("%d", p.x = 1);
and
struct point* p; printf("%d", p->x = 1);
Solution
The first one uses the stack to allocate a struct that is the size of two
int
s. On the other hand, the second one declares a single pointer to a struct point on the stack, which is the size of a singleint
(assumingsizeof(struct point*) == sizeof(int)
). Also, the second one will likely segfault, since the pointer is uninitialized.
Suppose you have an integer array
int nums[3] = {152, 161, 162}
. What are the differences betweennums
,&nums
, and&nums[0]
?Solution
The three will all return memory addresses that are equivalent to each other. However, there is a subtle difference in that
nums
and&nums[0]
point to the first element (i.e. an integer), while&nums
points to the entire array (i.e. an array of integers). As a result, if you increment each of them by one,nums
and&nums[0]
will be incremented bysizeof(int)
, while&nums
will be incremented by3 * sizeof(int)
.
Calling a Function in another File
Consider a C program consisting of two files:
// my_app.c
#include <stdio.h>
int main(int argc, char** argv) {
char* result = my_helper_function(argv[0]);
printf("%s\n", result);
return 0;
}
// my_lib.c
char* my_helper_function(char* string) {
int i;
for (i = 0; string[i] != '\0'; ++i) {
if (string[i] == '/') {
return &string[i + 1];
}
}
return string;
}
You build the program with gcc my_app.c my_lib.c -o my_app
.
What is the bug in the above program? (Hint: it’s in my_app.c.)
Solution
my_helper_function
is not declared inmy_app.c
, so the compiler (incorrectly) guesses that its return type isint
. Becausesizeof(int) == 4
butsizeof(char*)
may be not equal to4
, this may result in a segfault.
How can we fix the bug?
Solution
Declare
my_helper_function
with the proper signature abovemain
.
Including a Header File
Suppose we add a header file to the above program and revise my_app.c
to #include it.
// my_app.c
#include <stdio.h>
#include "my_lib.h"
int main(int argc, char** argv) {
char* result = my_helper_function(argv[0]);
printf("%s\n", result);
return 0;
}
// my_lib.h
char* my_helper_function(char* string);
You build the program with gcc my_app.c my_lib.c -o my_app
.
Suppose that we made a mistake in
my_lib.h
, and declared the function aschar* my_helper_function(void);
. Additionally, the author ofmy_app.c
sees the header file and invokes the function asmy_helper_function()
. Would the program still compile? What would happen when the function is called?Solution
The program would compile but the compiler would not pass an argument to the callee even though it is expecting one, causing it to read some value on the stack (
%ebp
offset by8
).
What could the author of
my_lib.c
do to make such a mistake less likely? AlsoSolution
#include "my_lib.h"
at the top ofmy_lib.c
.
Using #define
Assume the following file contents:
// app.c
#include <stdio.h>
#include "lib.h"
int main(int argc, char** argv) {
helper_args_t helper_args;
helper_args.string = argv[0];
helper_args.target = '/';
char* result = helper_func(&helper_args);
printf("%s\n", result);
return 0;
}
// lib.h
typedef struct helper_args {
#ifdef ABC
char* aux;
#endif
char* string;
char target;
} helper_args_t;
char* helper_func(helper_args_t* args);
// lib.c
#include "lib.h"
char* helper_func(helper_args_t* args) {
int i;
for (i = 0; args->string[i] != '\0'; i++)
if (args->string[i] == args->target)
return &args->string[i + 1];
return args->string;
}
You build the program on a 64-bit machine as follows.
> gcc -c app.c -o app.o > gcc -c lib.c -o lib.o > gcc app.o lib.o -o app
What is the size of a
struct helper_args_t
?Solution
16 bytes. Since
ABC
is not defined, there is onlychar* string
andchar target
, which is a total of8 + 1 = 9
bytes. However, GCC pads the structure for alignment, so it fills in the remaining7
bytes after char target.
Suppose you add
#define ABC
at the top oflib.h
. What is the size of astruct helper_args_t
?Solution
24 bytes. Now that
ABC
is defined, there is an additional 8 bytes fromchar* aux
.
Suppose we build the program in a different way with the original files (i.e. none of the changes from previous question apply):
> gcc -DABC -c app.c -o app.o > gcc -c lib.c -o lib.o > gcc app.o lib.o -o app
The program will now exhibit undefined behavior. What is the issue?
Solution
app.c
is compiled withABC
defined (as seen from the-DABC
flag), butlib.c
is compiled without it. As a result, they have differing definitions ofhelper_args_t
. Inmain
,argv[0]
is put in to address ofhelper_args + 8
bytes sinceapp.c
was compiled with thechar* aux
member, making thechar* string
member be 8 bytes offset fromargs
. However, inhelper_func
, the program will access the address ofargs
when it accessesargs->string
since it was compiled without thechar* aux
member, making thechar* string
the first member. This leads to undefined behavior sincechar* aux
was never initialized inmain
, so it contains a garbage value.An important observation is that this won’t always lead to a segfault. The value in
char* aux
member is garbage, meaning it could potentially be a non-illegal memory access by complete chance.
Using #include Guards
Suppose we split my_lib.h
into two files:
// my_helper_function.h
#include "my_helper_args.h"
char* my_helper_function(helper_args_t* args);
// my_helper_args.h
typedef struct helper_args {
char* string;
char target;
} helper_args_t;
What happens if we include the following two lines at the top of
my_app.c
?#include "my_helper_function.h" #include "my_helper_args.h"
Solution
The compiler reports an error because
helper_args_t
is now defined twice.
How can we fix this? (Hint: look up
#include
guards)Solution
Use an
#include
guard:// my_helper_function.h #ifndef MY_HELPER_FUNCTION_H_ #define MY_HELPER_FUNCTION_H_ #include "my_helper_args.h" char* my_helper_function(helper_args_t* args); #endif
Similar for
my_helper_args.h
, but using a different, unique identifier.
Debugging Segmentation Faults
Observe the following program from singer.c
that aims to sort a string using quicksort. The program will use a string provided as the argument or defaults to “IU is the best singer!” if none is provided.
When the program is compiled and run, we get the following output. Use GDB to fix the issue.
> ./singer "Taeyeon is the best singer!"
Unsorted: "Taeyeon is the best singer!"
Sorted : " !Tabeeeeeghiinnorssstty"
> ./singer
Unsorted: "IU is the best singer!"
Segmentation fault (core dumped)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
// singer.c
void swap(char* a, int i, int j) {
char t = a[i];
a[i] = a[j];
a[j] = t;
}
int partition(char* a, int l, int r) {
int pivot = a[l];
int i = l, j = r+1;
while (1) {
do {
++i;
} while (a[i] <= pivot && i <= r);
do {
--j;
} while (a[j] > pivot);
if (i >= j)
break;
swap(a, i, j);
}
swap(a, l, j);
return j;
}
void sort(char* a, int l, int r) {
if (l < r) {
int j = partition(a, l, r);
sort(a, l, j-1);
sort(a, j+1, r);
}
}
int main(int argc, char** argv) {
char* a = NULL;
if (argc > 1)
a = argv[1];
else
a = "IU is the best singer!";
printf("Unsorted: \"%s\"\n", a);
sort(a, 0, strlen(a) - 1);
printf("Sorted : \"%s\"\n", a);
}
We want to debug the program using GDB. How should we compile the program?
Solution
gcc -g singer.c -o singer
. We need a-g
flag for debugging information. The-o
simply controls the name of the executable that is created which defaults toa.out
if not specified.
When running the program without any arguments (i.e. using the default argument), what line does the segfault happen? Describe the memory operations happening in that line.
Solution
First, we compile the program using the command from the previous problem.
gcc -g singer.c -o singer
The first step in debugging a segfault is seeing which line it occurred in. You can get to the line by letting the program run until it encounters the fault. To get a more holistic view, you can also get the backtrace of the error using backtrace.
> gdb singer (gdb) run Starting program: /home/runner/intro/singer Unsorted: "IU is the best singer!" Program received signal SIGSEGV, Segmentation fault. 0x5646308006c8 in swap (a=0x564630800904 "IU is the best singer!", i=1, j=21) at singer.c:4 4 a[i] = a[j]; (gdb) backtrace #0 0x5646308006c8 in swap (a=0x564630800904 "IU is the best singer!", i=1, j=21) at singer.c:4 #1 0x564630800773 in partition (a=0x564630800904 "IU is the best singer!", l=0, r=21) at singer.c:24 #2 0x5646308007bd in sort (a=0x564630800904 "IU is the best singer!", l=0, r=21) at singer.c:34 #3 0x564630800861 in main (argc=1, argv=0x7ffd04ac7098) at singer.c:48
This tells us the segfault occurred on line 4 in partition function, which is
a[i] = a[j]
. This line performs two memory operations: a read froma[j]
and a write toa[i]
.
Run the program with and without an argument and observe the memory address of a in the seg-faulting line. Why are the memory addresses so different?
Solution
Using GDB, break at line 4 where the segfault occurs.
gdb singer (gdb) break 4 Breakpoint 1 at 0x6ab: file singer.c, line 4. (gdb) run Starting program: /home/runner/intro/singer Unsorted: "IU is the best singer!" Breakpoint 1, swap ( a=0x5624e4600904 "IU is the best singer!", i=1, j=21) at singer.c:4 4 a[i] = a[j]; (gdb) print a $1 = 0x5624e4600904 "IU is the best singer!" (gdb) run "Taeyeon is the best singer!" The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: singer "Taeyeon is the best singer!" Unsorted: "Taeyeon is the best singer!" breakpoint 1, swap ( a=0x7ffcce01bfe5 "Taeyeon is the best singer!", i=1, j=26) at singer.c:4 (gdb) print a $2 = 0x7ffcce01bfe5 "Taeyeon is the best singer!"
When no argument is passed in, the program defaults to a statically defined string, meaning it resides in the read-only segment. On the other hand, when a specific string is passed in, it is passed in on the stack as an argument. As a result, the two addresses differ significantly.
How should the code be changed to fix the segfault?
Solution
The segfaulting line occurs in the
swap
function which is called bypartition
in lines 24 and 27. However, notice that the read froma[j]
is not a problem as the same read is performed in line 19. As a result, this indicates that a write toa[i]
is the issue.This lines up with our observations from the previous question. Since the data is in a read-only segment, it makes sense that the program cannot write to it. As a result, we need to move the initial string to somewhere else on the address space (e.g. heap). To do this, we need to allocate the appropriate amount of space and then copy over the string. This could be accomplished with a
malloc
followed by astrcpy
, orstrdupa
which does the same thing.Replacing line 48 with
a = strdup("IU is the best singer!")
will fix the segfault. Note, that in this case we should callfree(a)
at the very end ofmain
as well. Managing memory in C correctly is difficult and often quite convoluted…
x86
x86 is a family of instruction set architectures (ISA) developed by Intel. x86 is based on the complex instruction set computer (CISC) architecture. x86 being a family of ISAs means there are a variety of different dialects of this language. In this class, we will focus on the 32-bit ISA called IA-32 or i386 which is the common denominator for all 32-bit x86 processors and hence used in PintOS. While heavily related, this should not be confused with the 32-bit microprocessor i386, also known as Intel 80386). However, we will still occasionally mention the 64-bit ISA x86-64 as you may come across during some non-PintOS assignments.
Registers
Registers are small storage spaces directly on the processor, allowing for fast memory access. The x86 architecture has only has 8 general purpose registers (GPRs).
Register | Name | Purpose |
---|---|---|
ax |
Accumulator | I/O port access, arithmetic, interrupt calls |
bx |
Base | Base pointer for memory access |
cx |
Counter | Loop counting, bit shifts |
dx |
Data | I/O port access, arithmetic, interrupt calls |
sp |
Stack Pointer | Top address of stack |
bp |
Base Pointer | Base address of stack |
si |
Source Index | Source for stream operations (e.g. string) |
di |
Destination Index | Destination for stream operations (e.g. string) |
Due to x86’s 16-bit history, the GPRs started as 16-bits and were extended to 32-bits with the e
prefix (e.g. eax
for ax
) and 64-bits with the r
prefix (e.g. rax
for ax
). Each 16-bit GPR can be addressed by the 8-bit LSB (i.e. lower 8 bits) by replacing the last letter with l
(e.g. al
for ax
). ax
, cx
, dx
, and bx
can also be addressed by the 8-bit MSB (i.e. higher 8 bits) by replacing the last letter with h
(e.g. ah
for ax
).
x86 has an instruction pointer register ip
. Like the GPRs, it is extended to 32-bits with the e
prefix and 64-bits with r
prefix. ip
is a special register since it cannot be read and modified like a GPR (i.e. cannot use vanilla memory instructions). The address stored in ip
always points to the next instruction to be executed. The value is updated by the processor based on the executed commands.
There are other registers such as segment
, EFLAGS
, control
, debug
, test
, floating point
and many more. However, the GPRs and the instruction pointer register are the main ones you will work with in this class.
Syntax
Although IA-32 specifies the registers and instructions, there are two different syntaxes: Intel and AT&T. In this class, we will use the AT&T syntax because it is used by the GNU Assembler, the assembler for GCC and thus the standard for PintOS and most Unix-like operating systems like Linux. The two syntaxes have significant differences, so make sure to check which syntax is being used when referencing documentation.
Registers are preceded by a percent sign (e.g. %eax
for eax
). Immediate values such as constants are preceded by a dollar sign (e.g. $4103
for the constant 4103
).
The general structure of a line of code is inst src, dest
. For example, movl %ebx, %eax
will move the contents of %ebx
into %eax
.
Addressing memory uses the syntax of offset(base, index, scale)
where base
and index
are registers, and offset
and scale
are integers (scale
can only take on values of 1, 2, 4, or 8). This accesses the data at memory address base + index * scale + offset
. All parameters are optional, though most cases you will see will have base
and offset
. The following are some use cases of addressing memory with different instructions.
Command | Explanation |
---|---|
mov 8(%ebx), %eax |
Move contents from the address ebx + 8 into eax |
mov %ecx, -4(%esi, %ebx, 8) |
Move contents in ecx into address esi + 8 * ebx - 4 |
One exception to the above syntax is when using the lea
instruction which stands for “load effective address”. lea
operates directly on the memory addresses themselves and not the contents contained in the memory addresses. For instance, lea 8(%ebx), %eax
would put the address ebx + 8
into eax
, not the contents at that address.
Each instruction also has a suffix that signifies the operand size. b
means byte
(8 bits), w
means word
(16 bits), and l
means long
(32 bits). These are used when the the intended data size is ambiguous (e.g. mov $0, (%esp)
).
Command | Explanation |
---|---|
movb $0, (%esp) |
Zero out a single byte from the stack pointer |
movw $0, (%esp) |
Zero out two bytes from the stack pointer |
movl $0, (%esp) |
Zero out four bytes from the stack pointer |
The suffixes aren’t always necessary when the intended data size can be inferred in some cases (e.g. using a 32-bit register as an operand means a 32-bit operation), but it is good practice to use them regardless.
Calling Convention
Calling convention is a procedure for how to call and return from functions. They specify stack management, passing in parameters, any registers that need to be saved, stack management, returning values, and more. There are two sets of rules: one for the caller of the function and one for the callee of the function.
Calling conventions are heavily tied into the language that’s being compiled. In this class, we will use the calling convention defined by the i386 System V ABI as the default calling convention.
Caller
Before calling the function (i.e. prologue), the caller needs to:
- Save caller-saved GPRs (
eax
,ecx
,edx
) onto the stack if needed after the function call. - Push parameters onto the stack in reverse order (i.e. store first parameter at the lowest address). Add necessary padding before the parameters to ensure a 16-byte alignment.
Then the caller calls the function by pushing the return address onto the stack and jumping to the function. Once the function call returns (i.e. epilogue), the caller needs to
- Remove the parameters from the stack.
- Restore caller-saved GPRs (if any) from the prologue.
Callee
Before executing any function logic (i.e. prologue), the callee needs to:
- Push
ebp
onto the stack and setebp
to be the newesp
(i.e. stack pointer after pushing theebp
). This marks the start of a new stack frame. - Allocate stack space for any local variables (i.e. decrement
esp
). - Save callee-saved GPRs (
ebx
,edi
,esi
) onto the stack if used during the function call.
Then the callee performs the function logic. Before returning (i.e. epilogue), the callee needs too:
- Store the return value in
eax
. - Restore callee-saved GPRs (if any) from the prologue.
- Deallocate local variables. While subtracting the correct amount will technically work, a less error prone way is to set
esp
to be the currentebp
, effectively clearing the stack frame. - Restore caller’s
ebp
from the stack. - Return from the function by popping the return address pushed by the caller in its prologue and jumping to it.
Instructions
There are a few commonly used instructions that will show up in nearly every assembly code due to the calling convention.
Instruction | Purpose | Equivalent commands |
---|---|---|
pushl src |
Push src onto stack | subl $4, %esp movl src, (%esp) |
popl dest |
Pop from stack into dest | movl (%esp), dest addl $4, %esp |
call addr |
Push return address onto stack and jump to addr | pushl %eip jump addr |
leave |
Restore EBP and ESP to previous stack frame | movl %ebp, %esp popl %ebp |
ret |
Pop return address from stack and jump to it | popl %eip |
Keep in mind the effective column shows what the instruction is doing, but it may not exactly be what the processor does. In fact, call
and ret
access eip
, while accessing it directly using mov
is not allowed.
Optimizations
When compiling with some optimizations using GCC (e.g. -O3
flag), you may notice some violations of these calling conventions, most notably the saving of the base pointer. This is because the base pointer is not a necessity: the stack pointer is sufficient to address anything we need. See also this document for more information.
x86 Concept Check
Between
sp
andbp
, which has a higher memory address?Solution
bp
has a higher memory address. The stack grows downwards, meaning the top of the stack moves towards lower addresses.
Write three different ways to clear the
eax
register (i.e. store a 0).Solution
movl $0, %eax subl %eax, %eax xorl %eax, %eax
True or False: Right before the caller jumps to the desired function, the stack must be 16-byte aligned.
Solution
False. The stack needs to be 16-byte aligned right after the parameters have been pushed onto the sack. Return address is pushed right before jumping.
Reverse Engineering
Reading Disassembly
// file.c
int global = 0;
int callee(int x, int y) {
int local = x + y;
return local + 1;
}
void caller(void) {
global = callee(3, 4);
}
When gcc compiles this file, with optimizations off, it outputs:
; file.s:
callee:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl 8(%ebp), %edx
movl 12(%ebp), %eax
addl %edx, %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
addl $1, %eax
leave
ret
caller:
pushl %ebp
movl %esp, %ebp
pushl $4
pushl $3
call callee
addl $8, %esp
movl %eax, global
nop
leave
ret
What does each instruction do? Mark the prologue(s), epilogue(s), and call sequence(s).
Solution
- First three instructions of
callee
are the prologue: saveesp
and allocate space for locals.- The next two
movl
instructions read the function arguments off of the stack into registers.- The
addl
instruction computesx + y
.- The next
movl
instruction stores the result atebp - 4
, the stack memory allocated for thelocal
variable.- The next
movl
reads the value oflocal
into a register, and the followingaddl
instruction adds one to it. Now the return value is ineax
.- The final two instructions are the epilogue: restore
esp
,pop
the return address off the stack, and jump to it.- The first two lines of
caller
are the prologue: saveesp
.- The next four lines are the call sequence for calling
callee
: set up stack, call function, and clean up stack.- The next
movl
instruction stores the return value into the address of theglobal
variable.- The
nop
appears to be an artifact of gcc; we’re compiling with optimizations off, so the compiler doesn’t optimize this out (although it would on any other optimization level)- The last two instructions are the the epilogue: restore
esp
,pop
the return address of the stack and jump to it.
Pythagorean triplets
A Pythagorean triplet a < b < c
satisfies the property \(a^2 + b^2 = c^2\). triplet.s
shown below returns the product of a Pythagorean triplet for which a + b + c = 1000
. triplet.s
has been assembled without optimizations based on a complete version of triplet.c
given below as well.
Unused labels and directives have been omitted in triplet.s
for simplicity.
// triplet.c
int main(void) {
for (int a = 1; _____________) {
int a2 = _____;
for (int b = ________________) {
int b2 = _____;
int c = ____________;
int c2 = _____;
if (_____________) {
return _________;
}
}
}
return 0;
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
; triplet.s
main:
pushl %ebp
movl %esp, %ebp
subl $32, %esp
movl $1, -4(%ebp)
jmp .L2
.L7:
movl -4(%ebp), %eax
imull %eax, %eax
movl %eax, -12(%ebp)
movl -4(%ebp), %eax
movl %eax, -8(%ebp)
jmp .L3
.L6:
movl -8(%ebp), %eax
imull %eax, %eax
movl %eax, -16(%ebp)
movl $1000, %eax
subl -4(%ebp), %eax
subl -8(%ebp), %eax
movl %eax, -20(%ebp)
movl -20(%ebp), %eax
imull %eax, %eax
movl %eax, -24(%ebp)
movl -12(%ebp), %edx
movl -16(%ebp), %eax
addl %edx, %eax
cmp %eax, -24(%ebp)
jne .L4
movl -4(%ebp), %eax
imull -8(%ebp), %eax
imull -20(%ebp), %eax
jmp .L5
.L4:
addl $1, -8(%ebp)
.L3:
cmpl $666, -8(%ebp)
jle .L6
addl $1, -4(%ebp)
.L2:
cmpl $333, -4(%ebp)
jle .L7
.L5:
movl $0, %eax
leave
ret
What is the memory address of
a
relative to the base pointer?Solution
triplet.c
shows thata
is defined to be a value of1
. In line 6 oftriplet.s
, we see a local variable being defined on the stack with the constant 1. Since a is the first variable that needs to be defined in the code, line 6 must correspond toa
, meaninga
is defined atebp - 4
(i.e. four bytes below the base pointer).
What is the end condition for the outer loop using
a
?Solution
After
a
is defined on line 6, we jump to.L2
in line 7. This brings us to line 41 where we comparea
(i.e.-4(%ebp)
) with333
using thecmpl
instruction, which subtracts333
froma
. In line 43, we jump to.L7
ifa - 333
is less than or equal to0
. Otherwise, we proceed with the rest of the code which loads0
intoeax
and performs a return procedure by executing.L5
. Therefore,a
must be greater than333
for us to exit the outer loop.
What are the memory addresses of the rest of the local variables (
a2
,b
,b2
,c
,c2
) relative to the base pointer?Solution
Since the code is unoptimized, we trace the code with the appropriate jumps to see where each local variable is defined.
After
a
is defined, we jump to.L7
where we see in line 11 thata2
is the next variable defined atebp - 12
. Afterwards,b
is defined atebp - 8
in line 13. This is confirmed by the similar loop structure as the code jumps to.L3
wherecmpl
andjle
exists as seen fora
.The code then jumps to
.L6
whereb2
,c
, andc2
are defined at16
,20
, and24
bytes below the base pointer in lines 18, 22, and 25, respectively. The stack viewed from the current base pointer looks like:
stack ebp a b a2 b2 c An important observation is how local variables are not always placed on the stack in the order they are defined in, so it’s important to not make such an assumption and instead trace the code. This is due to GCC’s stack slot assignment optimizations.
Fill in the missing code for
triplet.c
.Solution
// triplet.c int main(void) { for (int a = 1; a <= 333; ++a) { int a2 = a * a; for (int b = a; b <= 666; ++b) { int b2 = b * b; int c = 1000 - a - b; int c2 = c * c; if (a2 + b2 == c2) { return a * b * c; } } } return 0; }
A breakdown of how this code was compiled can be found on this Compiler Explorer snippet which highlights corresponding lines between C and x86 and shows the exact compiler flags used. Feel free to play around with different optimizations and other compiler flags to see how the corresponding x86 code changes.
Stack Frame
// foobar.c
int p = 0;
int bar(int x, int y, int z) {
int w = x + y - z;
return w + 1;
}
void foo(int a, int b) {
p = a + b + bar(3, 4, 5);
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
; foobar.s
p:
.zero 4
bar:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl 8(%ebp), %edx
movl 12(%ebp), %eax
addl %edx, %eax
subl 16(%ebp), %eax
movl %eax, -4(%ebp)
movl -4(%ebp), %eax
addl $1, %eax
leave
ret
foo:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $4, %esp
movl 8(%ebp), %edx
movl 12(%ebp), %eax
leal (%edx,%eax), %ebx
subl $4, %esp
pushl $5
pushl $4
pushl $3
call bar
addl $16, %esp
addl %ebx, %eax
movl %eax, p
nop
movl -4(%ebp), %ebx
leave
ret
Which lines of the code correspond to a caller/callee prologue/epilogue?
Solution
The most apparent caller/callee pair is
foo
/bar
. Lines 25-28 are the prologue of the caller that setup the arguments. Line 30 is the epilogue of the caller that cleans up the stack by simply moving up the stack pointer. Lines 5-7 are the prologue of the callee that push the base pointer and make room for local variables. While there’s only one local variablew
inbar
, the compiler still makes room for 16 bytes due to the default stack boundary of16
bytes per GCC’s-mpreferred-stack-boundary
flag. Lines 14-16 are the epilogue of the callee that restores the stack pointer and returns back to the caller.It’s important to recognize
foo
is a function as well, meaning it will be in the callee position when it’s called by another function. Similar to whenbar
was the caller, lines 18-21 and lines 34-36 serve as the prologue and epilogue respectively whenfoo
is the callee.
What does line 20 do in
foobar.s
? Why is it necessary?Solution
Line 20 saves the
ebx
register by pushing it onto the stack.ebx
is a callee-saved register, sofoo
is responsible for saving it before using it.
Why is
edx
not saved byfoo
before callingbar
despite being used inbar
?Solution
General purpose registers (GPRs) are saved only if the register content needs to persist after the function call.
edx
was used as a temporary register for computation in lines 22-24. As a result, its value does not need to persist past the call ofbar
.
Draw the stack frame and contents of relevant registers after executing line 17 of
foobar.s
.Solution
Stack Frame of foo’s Caller Padding b a Return Addr of foo’s Caller EBP of foo’s Caller EBX of foo’s Caller 5 4 3 Return Addr of foo EBP of foo 2