How does the stack work?

1/ Memory Layout

The memory of a running program can be represented as several layers. It is composed of the text segment, the initialized data segment (data), the uninitialized data segment (bss), the heap and the stack.

  • Text Segment: The text segment contains the machine code, that is, the program instructions. It is located at the bottom of the memory layout to avoid overwriting the instructions in case of stack or heap overflows.
  • Initialized Data Segment: Also called data, it contains the global and static variables that are initialized in the program.
  • Uninitialized Data Segment: Also called BSS, it contains the global and static variables that are not initialized in the program.
  • The heap: The heap is the layer of the memory layout where dynamic memory allocation takes place. It stores dynamically allocated variables.
  • The stack: It contains the local variables of the functions, together with the return addresses and saved base pointers described below.
In this tutorial, the focus is on how the stack works. We will analyze it step by step to make it easier to understand.

2/ Understanding the stack

The assembly code below will serve as an example. For this example, we will monitor 3 registers (eip, esp, ebp).

  • eip: Extended Instruction Pointer. It holds the address of the next instruction to execute.
  • esp: Extended Stack Pointer. It points to the top of the stack.
  • ebp: Extended Base Pointer. It points to the base of the current stack frame.
The base pointer and the stack pointer mark the limits of a stack frame. A new stack frame is created when the program calls a function.

Before reaching the first instruction, the registers hold the following values:
eip = 0xb7eadc70
ebp = 0xbffff900
esp = 0xbffff7e0

We now follow the program instruction by instruction:
  • First instruction: 0xb7eadc73 <__libc_start_main+227>: call 0x080483f4 <main>
  • When the program reaches the first instruction, eip holds its address.

    Registers information:
    eip = 0xb7eadc73
    ebp = 0xbffff900
    esp = 0xbffff7e0

    Calling a function pushes the address of the next instruction (the return address) onto the stack, so the value at the top of the stack is now 0xb7eadc76. The register esp is the stack pointer and always points to the top of the stack, so esp = 0xbffff7dc. The program then jumps to the address of the function (0x080483f4).

    Registers information:
    eip = 0x080483f4
    ebp = 0xbffff900
    esp = 0xbffff7dc

    Why is esp = 0xbffff7dc? The stack grows from high addresses to low addresses. Each memory address designates a single byte, and a return address on 32-bit x86 is 4 bytes wide, so storing it at the top of the stack lowers esp by 4.
    esp - 4 = 0xbffff7e0 - 4 = 0xbffff7dc
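The effect of call on the registers can be sketched in a few lines of Python. This is only a model under stated assumptions, not how the CPU is implemented: the stack is represented by a dictionary that maps an address to a 32-bit value, stack slots are assumed to be 4 bytes wide (32-bit x86), and the helper name call and the dictionaries regs and stack are hypothetical names chosen for this illustration.

```python
def call(regs, stack, target, return_addr):
    """Model `call target`: push the return address, then jump."""
    regs["esp"] -= 4                   # the stack grows toward lower addresses
    stack[regs["esp"]] = return_addr   # the return address is now the top of the stack
    regs["eip"] = target               # execution continues in the called function


# esp starts 4 bytes above 0xbffff7dc, the value observed once main is entered.
regs = {"eip": 0xb7eadc73, "ebp": 0xbffff900, "esp": 0xbffff7dc + 4}
stack = {}

call(regs, stack, target=0x080483f4, return_addr=0xb7eadc76)

for name, value in regs.items():
    print(f"{name} = {value:#010x}")
```

Only esp and eip change: ebp still describes the caller's stack frame until the called function sets up its own.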
  • Second instruction: 0xb7eadc76 <__libc_start_main+230>: mov DWORD PTR [esp], eax
  • This second instruction copies the value of the register eax into the memory at the address held in esp (DWORD PTR [esp]). DWORD means double word (32 bits), so 4 bytes. It is not executed yet, because the call instruction first jumps to the address 0x080483f4.

  • Third instruction: 0x080483f4 <main+0>: push ebp
  • This is the first instruction of the main function (“main+0”). push ebp puts the value of ebp on the top of the stack. Since the stack pointer always points to the top of the stack, esp decreases by 4. The pushed value is the base pointer of the previous stack frame (0xbffff900): push ebp saves it so that it can be restored when main returns.

    Registers information:
    eip = 0x080483f5
    ebp = 0xbffff900
    esp = 0xbffff7d8
  • Fourth instruction: 0x080483f5 <main+1>: mov ebp, esp
  • In this instruction, the base pointer will take the value of the stack pointer.
    ebp = esp

    Registers information:
    eip = 0x080483f7
    ebp = 0xbffff7d8
    esp = 0xbffff7d8
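The two instructions push ebp and mov ebp, esp can be replayed with the register values shown above. The sketch below is a simplified model: the stack is a dictionary from address to 32-bit value, and the helper push and the names regs and stack are hypothetical, introduced only for this example.

```python
def push(regs, stack, value):
    """Model `push value`: make room for 4 bytes, then store the value."""
    regs["esp"] -= 4
    stack[regs["esp"]] = value


# State on entry to main: the top of the stack holds the return address.
regs = {"ebp": 0xbffff900, "esp": 0xbffff7dc}
stack = {0xbffff7dc: 0xb7eadc76}

push(regs, stack, regs["ebp"])   # push ebp: save the caller's base pointer
regs["ebp"] = regs["esp"]        # mov ebp, esp: start the new stack frame

print(f"ebp = {regs['ebp']:#010x}")
print(f"esp = {regs['esp']:#010x}")
print(f"saved ebp = {stack[regs['ebp']]:#010x}")
```

After these two steps the old base pointer sits at the address ebp points to, and the return address sits 4 bytes above it.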
  • Fifth instruction: 0x080483f7 <main+3>: and esp, 0xfffffff0
  • This is a bitwise AND between the stack pointer and 0xfffffff0. It sets the last four bits (the last hexadecimal digit) of the stack pointer to 0, which rounds the stack pointer down to the nearest multiple of 16.
    Difference between the bitwise AND and the logical AND: the logical operator returns 1 (true) or 0 (false), whereas the bitwise operator compares the operands bit by bit (1011 & 0101 = 0001).
    esp & 0xfffffff0 = 0xbffff7d8 & 0xfffffff0 = 0xbffff7d0
    To calculate a bitwise AND by hand, convert the hexadecimal numbers to binary, then AND each pair of bits.
    10111111111111111111011111011000
    11111111111111111111111111110000
    --------------------------------
    10111111111111111111011111010000 = BFFFF7D0

    Registers information:
    eip = 0x080483fa
    ebp = 0xbffff7d8
    esp = 0xbffff7d0
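The alignment step can be checked with a short Python snippet. It reproduces the calculation above with Python's own & operator and also contrasts the bitwise AND with the logical and; the variable names are illustrative only.

```python
esp = 0xbffff7d8
mask = 0xfffffff0

aligned = esp & mask   # clears the four lowest bits

# Print the three values in binary, one under the other.
print(f"{esp:032b}")
print(f"{mask:032b}")
print(f"{aligned:032b} = {aligned:#x}")

# The result is a multiple of 16 and is never above the original value.
is_multiple_of_16 = aligned % 16 == 0
rounded_down = aligned <= esp

# Bitwise AND works bit by bit; logical AND only answers true or false.
bitwise = 0b1011 & 0b0101
logical = bool(0b1011 and 0b0101)
```

Because the mask only clears bits, the result can never be larger than the original stack pointer, which is why the operation rounds down rather than up.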

  • Sixth instruction: 0x080483fa <main+6>: sub esp,0x60
  • This subtracts a value from the stack pointer.
    esp = esp - 0x60
    The value 0x60 (96 bytes) is the space reserved on the stack for main. After the subtraction the new stack frame is complete: the base pointer (ebp) points to the base of the frame, whereas the stack pointer (esp) points to the top of the stack. The computer has allocated memory for the function, and every local variable will be stored in that memory space, called the stack frame.

    Registers information:
    eip = 0x080483fd
    ebp = 0xbffff7d8
    esp = 0xbffff770
  • Seventh instruction: 0x080483fd <main+9>: mov DWORD PTR [esp+0x5c],0x0
  • It writes the value 0x0, as a double word (4 bytes), to the memory at the address esp+0x5c, that is 0xbffff770 + 0x5c = 0xbffff7cc. The stack pointer and the base pointer do not change.

    Registers information:
    eip = 0x08048405
    ebp = 0xbffff7d8
    esp = 0xbffff770
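The allocation of the frame and the write to [esp+0x5c] can be sketched as follows. This is an illustration under assumptions: the frame is modelled as a bytearray whose index 0 corresponds to the address held in esp, the filler byte 0xcc stands for whatever leftover data the memory contained (a hypothetical value), and the 32-bit value is stored in little-endian order as on x86.

```python
import struct

FRAME_SIZE = 0x60   # bytes reserved by `sub esp, 0x60`

esp = 0xbffff7d0 - FRAME_SIZE   # sub esp, 0x60

# Index 0 of the bytearray stands for the address in esp.
# 0xcc is an arbitrary filler representing leftover data.
frame = bytearray(b"\xcc" * FRAME_SIZE)

offset = 0x5c
struct.pack_into("<I", frame, offset, 0x0)   # mov DWORD PTR [esp+0x5c], 0x0

address = esp + offset
print(f"esp = {esp:#010x}")
print(f"4 bytes written at {address:#010x}: {frame[offset:offset + 4].hex()}")
```

The write touches exactly 4 bytes, the last 4 of the 96 reserved ones; the rest of the frame keeps its previous content until the program stores something there.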
  • Eighth instruction: 0x08048433 <main+63>: leave
  • The instruction “leave” does two things. First it sets esp to ebp, which is equivalent to mov esp, ebp. The registers esp and ebp then hold the same value, which releases the stack frame. This explains why we cannot use a local variable of a function from another function: once the frame is released, its memory is no longer reserved. The data is not actually erased, but it can be overwritten by the next push or function call, so it must be treated as gone.

    Registers information:
    eip = 0x08048434
    ebp = 0xbffff7d8
    esp = 0xbffff7d8

    The second part removes the value at the top of the stack and puts it in ebp. The instruction is pop ebp. The value popped is the one saved by push ebp at the start of main, so ebp gets back the value it had at the beginning (0xbffff900) and again points to the base of the previous stack frame. Popping 4 bytes also raises esp by 4.

    Registers information:
    eip = 0x08048434
    ebp = 0xbffff900
    esp = 0xbffff7dc
  • Ninth instruction: 0x08048434 <main+64>: ret
  • The ret instruction behaves like pop eip: it copies the value at the top of the stack into eip, so the program jumps to that address. This address is the return address pushed by call, that is, the instruction that follows the call to main (the second instruction).

    Registers information:
    eip = 0xb7eadc76
    ebp = 0xbffff900
    esp = 0xbffff7e0
    This mirrors the first instruction: removing the 4-byte return address from the stack adds 4 bytes to esp.
    esp + 4 = 0xbffff7dc + 4 = 0xbffff7e0
    After that, the program runs the instruction whose address is stored in eip: 0xb7eadc76 <__libc_start_main+230>: mov DWORD PTR [esp], eax.
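
The way leave and ret unwind the frame can be sketched in Python. As before, this is a simplified model and not a description of the hardware: the stack is a dictionary from address to 32-bit value, stack slots are assumed to be 4 bytes wide (32-bit x86), and the helpers pop, leave and ret are hypothetical names used only for this illustration.

```python
def pop(regs, stack):
    """Model `pop`: read the top of the stack, then move esp up by 4 bytes."""
    value = stack[regs["esp"]]
    regs["esp"] += 4
    return value


def leave(regs, stack):
    regs["esp"] = regs["ebp"]        # mov esp, ebp: release the frame
    regs["ebp"] = pop(regs, stack)   # pop ebp: restore the caller's base pointer


def ret(regs, stack):
    regs["eip"] = pop(regs, stack)   # behaves like pop eip


# State just before leave: main's frame is still allocated.
regs = {"eip": 0x08048433, "ebp": 0xbffff7d8, "esp": 0xbffff770}
stack = {
    0xbffff7d8: 0xbffff900,   # base pointer saved by push ebp
    0xbffff7dc: 0xb7eadc76,   # return address pushed by call
}

leave(regs, stack)
esp_after_leave = regs["esp"]
ret(regs, stack)

print(f"eip = {regs['eip']:#010x}")
print(f"ebp = {regs['ebp']:#010x}")
```

Note that the dictionary still contains both entries after ret: popping only moves esp, it does not wipe the memory that was read.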