x86 Assembly

x86 Assembly Introduction
The intent of this section is to provide foundational knowledge which will be used for x86 assembly, as well as other architectures. You will install and configure your tools, write an x86 assembly program, assemble it, link it, run it, debug it, and understand every single byte contained in its code.
Memory Types
A computer provides several levels of data storage. Ordered from the largest and slowest to the smallest and fastest, they are:

Cold storage such as magnetic or solid state flash drives:
  • Used for long term storage of data
  • Do not require electrical power to maintain data
  • Generally have the largest capacity but lowest seek and transfer speeds

Dynamic Random Access Memory (RAM or DRAM):
  • Stores data for use by the Operating System with currently running programs
  • Must have the data constantly refreshed with power (uses capacitors)
  • Faster seek and transfer speeds than cold storage, but lower capacity

Static Random Access Memory (SRAM or cache RAM):
  • Doesn't require constant refreshing
  • Will lose data with loss of voltage (uses transistors for storage)
  • Typically stores data for the CPU to operate on quickly and repeatedly
  • Typically organized in levels (L1, L2, L3) in decreasing proximity to the CPU die (but increasing capacity)

CPU Registers:
  • Provides the most immediate storage for the processor
  • Located directly in the CPU core (also transistors)
  • Smallest of all memory systems (sized based on the CPU architecture)

The original 8086 processor used 16-bit registers with four different function types:
  • General purpose registers, used for temporary data storage from other operations, addresses, variables, counters, etc.
  • Memory segment registers, used for managing segmented memory addressing (this is not used for modern RAM layouts which are flat)
  • Index and pointer registers, used to track memory locations for instructions, stack memory, and string operations
  • Flag register, used to hold state flags for operations


x86 Registers
Below is a breakdown of the CPU registers found in the 8086.

General Purpose x86 Registers:
  16-bit word registers:       ax       bx       cx       dx
  High and low byte portions:  ah al    bh bl    ch cl    dh dl

The GP registers aren't dedicated to any single function, but they do have common uses, including:
  • ax - accumulator register typically stores results from other operations
  • bx - typically used to store the base address of memory for an array or an offset address
  • cx - typically used as a counter register
  • dx - data register, typically used to extend ax for multiplication and division operations (it holds the high half of a product or dividend, and the remainder after a division)
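The relationship between a 16-bit register and its two 8-bit halves can be modelled with shifts and masks. The following is only an illustrative Python sketch, not part of the assembly toolchain used in this tutorial; the helper names split_word and join_bytes are made up for this example.

```python
def split_word(ax):
    """Return (ah, al): the high and low bytes of a 16-bit value."""
    ax &= 0xFFFF            # keep only 16 bits, like a 16-bit register
    ah = (ax >> 8) & 0xFF   # high byte
    al = ax & 0xFF          # low byte
    return ah, al


def join_bytes(ah, al):
    """Rebuild the 16-bit value from its high and low bytes."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)


ah, al = split_word(0x1234)
print(hex(ah), hex(al))            # 0x12 0x34
print(hex(join_bytes(ah, 0xFF)))   # 0x12ff
```

The last line models writing a new value to al only: the high byte (ah) is unchanged, so the full register becomes 0x12ff.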

x86 Memory Segment Registers:
  cs (code segment)   ds (data segment)   ss (stack segment)   es (extra segment)
The segment registers were used to address more than 64 KB of memory in the 8086 and 80286:
  • cs - held the base address of the current code segment
  • ds - held the base address of the current data segment
  • ss - held the base address of the current stack segment
  • es - held the base address of an additional data segment
These registers are no longer used for memory segmentation, since modern x86 systems use a flat memory model.
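As a rough illustration of how a segment register let a 16-bit CPU reach beyond 64 KB, the sketch below assumes the 8086 real-mode scheme, in which the segment value is shifted left by 4 bits (multiplied by 16) and added to a 16-bit offset to form a 20-bit address. The function name physical_address is hypothetical and the code is Python used purely for illustration.

```python
def physical_address(segment, offset):
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))   # 0x12350

# Different segment:offset pairs can refer to the same byte of memory.
print(physical_address(0x1234, 0x0010) == physical_address(0x1235, 0x0000))  # True
```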
Pointer and Index Registers:
  bp (base pointer)   sp (stack pointer)
  • sp - always stores the address of the newest element on the stack, which is the lowest memory address in use by the stack
  • bp - points to the base of the current function's stack frame; when functions are nested, the calling function's base pointer is typically saved at this address
The stack will be examined in detail when we explore functions.

  ip (instruction pointer)   si (source index)   di (destination index)
  • ip - stores the address of the next instruction to be executed; this is an essential register
  • si - typically used to store a memory location for string or memory operations, the address where the value is copied from
  • di - typically used to store a memory location for string or memory operations, the address where the value is copied to
Flag register:
  • 16-bit register that stores states for arithmetic logic unit (ALU) operations
  • Contains a bitmap showing what flags are set
  Bit (hex): 0F 0E 0D 0C 0B 0A 09 08 07 06 05 04 03 02 01 00
  Flag:      00 00 00 00 OF DF IF TF SF ZF 00 AF 00 PF 00 CF
  • 00: Not used or reserved
  • OF: Overflow Flag, indicates whether (1) or not (0) a signed overflow occurred during a math operation
  • DF: Direction Flag, indicates the direction in which strings should be processed in memory, lower to higher memory (0) or higher to lower (1)
  • IF: Interrupt Flag, indicates whether (1) or not (0) maskable (optional) hardware interrupts should be processed
  • TF: Trap Flag, indicates whether (1) or not (0) commands should be executed a single step at a time
  • SF: Sign Flag, indicates the sign of a number for signed operations, negative (1), or positive (0)
  • ZF: Zero Flag, indicates whether the result of a logic test is zero (1) or not zero (0)
  • AF: Auxiliary Flag, indicates whether there is a carry (1) or not (0) from the low nibble to the high nibble of the low byte of a result
  • PF: Parity Flag, indicates whether there is an even (1) or odd (0) number of set bits in the low byte of a result
  • CF: Carry Flag, indicates whether (1) or not (0) there is a value that is carried or borrowed by math operations
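To make the flag definitions above concrete, here is a small Python model of how the status flags could be derived for an 8-bit addition. It is a sketch for illustration only: add8_flags is a hypothetical helper, it covers addition only, and it ignores the control flags (DF, IF, TF).

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags) for the status flags."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF                                 # what fits in 8 bits
    flags = {
        "CF": int(total > 0xFF),                          # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),       # even number of 1 bits
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),        # carry out of bit 3
        "ZF": int(result == 0),                           # result is zero
        "SF": result >> 7,                                # copy of the sign bit
        # Signed overflow: both operands differ in sign from the result.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


print(add8_flags(0x7F, 0x01))
print(add8_flags(0xFF, 0x01))
```

Adding 0x7F and 0x01 gives 0x80, which sets SF and OF (the signed value went from +127 to -128) but not CF. Adding 0xFF and 0x01 wraps to 0x00, which sets CF and ZF but not OF.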

A complete reference to the x86 architecture can be downloaded from Intel.
x86 Extended Registers
The 80386 processor extended the registers from the 16 bits used by the 80286 and 8086 to 32 bits. An "e" was added to the register names to indicate that they are used for the extended 32-bit operations.
GP Registers:
  32-bit (dword/long)   16-bit (word) (8086)   8-bit (byte) (8086)
  eax                   ax                     ah al
  ebx                   bx                     bh bl
  ecx                   cx                     ch cl
  edx                   dx                     dh dl
Pointer and Index Registers:
  esp   sp
  ebp   bp
  esi   si
  edi   di
  eip   ip
Segment Registers:
The 80386 also added two additional segment registers.
  • fs - an additional data segment register (the letter simply follows es), used in x86 for managing protected mode segmented memory
  • gs - an additional data segment register, used for x86 protected memory, and still used by x86_64 for special-purpose OS tasks
Extended Flags (EFLAGS):
The 80386 also extended the x86 flag register to 32 bits.
  Bit (hex): 1F 1E 1D 1C 1B 1A 19 18 17 16 15 14 13 12 11 10
  Flag:      00 00 00 00 00 00 00 00 00 00 00 00 00 00 VM RF
  Bit (hex): 0F 0E 0D 0C 0B 0A 09 08 07 06 05 04 03 02 01 00
  Flag:      00 NT PL PL OF DF IF TF SF ZF 00 AF 00 PF 00 CF
  • 00: Not used or reserved
  • PL: I/O Privilege Level, a 2-bit field giving the privilege level (0-3) required to execute I/O instructions (system defined)
  • NT: Nested task flag, indicates whether (1) or not (0) the current task is nested within a previously executing task (system defined)
  • RF: Resume flag, indicates whether (1) or not (0) execution should resume after a debug exception
  • VM: Virtual mode flag, indicates whether (1) or not (0) commands should be executed in virtual-8086 mode
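The flag tables above can be turned into a small decoder. The Python sketch below is illustrative only; decode_eflags and FLAG_BITS are names invented for this example. The value 0x202 is the eflags value that GDB reports later in this section, where GDB shows it as [ IF ]. Bit 1 is also set in that value; it is a reserved bit that reads as 1 on real hardware, which is why it has no name in the table.

```python
# Bit positions taken from the flag tables above.
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8,
    "IF": 9, "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17,
}


def decode_eflags(value):
    """Return (names of the single-bit flags that are set, I/O privilege level)."""
    names = [name for name, bit in FLAG_BITS.items() if (value >> bit) & 1]
    iopl = (value >> 12) & 0b11        # PL occupies bits 12 and 13
    return names, iopl


print(decode_eflags(0x202))   # (['IF'], 0)
```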

x86 Assembly Source Code
Assembly source code is processed by an assembler, which converts it into machine code that the target CPU architecture understands and writes it to an object file that can be linked into a format the OS is able to execute.

The assembler:
  • Translates mnemonic opcodes into machine code
  • Resolves symbolic addresses and labels to actual memory addresses
  • Calculates relative offsets for branching instructions (calls, jumps, etc.)
  • Processes assembler directives that define data, set memory alignment, and specify output file sections
  • Generates output files that can be linked together to create an executable program
The assembler follows this order when processing assembly source files:
  1. Process assembler directives
  2. Assign memory addresses to labels
  3. Calculate the size and layout of the program in memory
  4. Translate opcodes to machine code
  5. Resolve symbolic addresses
  6. Generate an output file
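Steps 2 and 3 can be sketched in a few lines of Python. This is a deliberately simplified model, not how GAS is implemented: it assumes the instruction sizes described in the Hello World comments later in this section (5 bytes for mov with a 32-bit immediate, 6 for lea with a 32-bit address, 2 for int) and the load address 0x8049000 that appears in the GDB output. The names SIZES, PROGRAM, and assign_label_addresses are made up for this example.

```python
# Instruction sizes in bytes (assumed, see the lead-in above).
SIZES = {"mov": 5, "lea": 6, "int": 2}

# A simplified listing of the Hello World program: labels end with ':'.
PROGRAM = [
    "_start:", "print_message:",
    "mov", "mov", "lea", "mov", "int",
    "print_hex_message:",
    "mov", "mov", "lea", "mov", "int",
    "exit_program:",
    "mov", "mov", "int",
]


def assign_label_addresses(program, base):
    """Walk the listing once, recording each label's address and the total size."""
    labels = {}
    address = base
    for item in program:
        if item.endswith(":"):
            labels[item[:-1]] = address     # a label takes no space itself
        else:
            address += SIZES[item]          # an instruction advances the counter
    return labels, address - base


labels, size = assign_label_addresses(PROGRAM, 0x8049000)
for name, addr in labels.items():
    print(f"{addr:#x} {name}")
print("program size:", size, "bytes")
```

With these assumptions the sketch places print_hex_message at 0x8049017 and exit_program at 0x804902e, with a total size of 58 bytes, which agrees with the addresses GDB displays for the assembled program.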

Throughout this course we will be using the GNU Assembler (GAS). GAS has many directives, but some of the more common ones include:
  • .align n: Align the next data item on an n-byte boundary.
  • .ascii "string": Store the string in memory without a null terminator.
  • .asciz "string": Store the string in memory with a null terminator.
  • .balign n: Like .align, but n is always interpreted as a number of bytes regardless of the target architecture.
  • .byte n1, n2, ...: Store a sequence of 8-bit bytes in memory.
  • .comm symbol, length: Declare a common block of the specified length for symbol.
  • .data: Switch to the data section for subsequent data items.
  • .equ symbol, expression: Set the value of a symbol to a constant expression.
  • .fill repeat, size, value: Generate repeat copies of a block of size bytes, each initialized to the given value.
  • .globl symbol: Mark a symbol as global, making it accessible by other object files during the linking process.
  • .local symbol: Mark a symbol as local, meaning it will not be accessible by other object files.
  • .long n1, n2, ...: Store a sequence of 32-bit integers in memory.
  • .org new_location: Set the assembly location counter to the specified new_location.
  • .section name, flags: Switch to a named section with the specified flags.
  • .short n1, n2, ...: Store a sequence of 16-bit integers in memory.
  • .size symbol, expression: Set the size of a symbol to the given expression.
  • .space n: Insert n bytes of zero-initialized space into the output.
  • .string "string": Same as .asciz, store the string in memory with a null terminator.
  • .text: Switch to the text section for subsequent instructions.
  • .word n1, n2, ...: Store a sequence of 16-bit or 32-bit integers in memory, depending on the target architecture.
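To show what a few of these data directives place in memory, the Python sketch below builds the equivalent byte sequences. It is a model for illustration, assuming the little-endian byte order used by x86; the function names are hypothetical and simply mirror the directive names.

```python
import struct


def ascii_directive(text):
    """.ascii: the characters only, no terminator."""
    return text.encode("ascii")


def asciz_directive(text):
    """.asciz / .string: the characters followed by a 0x00 byte."""
    return text.encode("ascii") + b"\x00"


def byte_directive(*values):
    """.byte: one byte per value."""
    return bytes(values)


def short_directive(*values):
    """.short: 16-bit integers, least significant byte first."""
    return struct.pack("<%dH" % len(values), *values)


def long_directive(*values):
    """.long: 32-bit integers, least significant byte first."""
    return struct.pack("<%dI" % len(values), *values)


print(ascii_directive("Hi").hex(" "))        # 48 69
print(asciz_directive("Hi").hex(" "))        # 48 69 00
print(byte_directive(0x48, 0x69).hex(" "))   # 48 69
print(short_directive(0x1234).hex(" "))      # 34 12
print(long_directive(0x12345678).hex(" "))   # 78 56 34 12
```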

x86 Hello World
It is time to write some code. The following code was written for GAS to demonstrate a basic Hello World program for x86. It was written with the Intel syntax (AT&T is the default for GAS). You can copy this code into your favorite editor and read the very thorough comments to get a complete breakdown of the code.
.intel_syntax noprefix      /* Directive for GAS (GNU Assembler) to use Intel syntax instead of AT&T for x86 */

.section .data              /* The .section directive is used to define or switch to an existing section in the object file */
    /* The .data section is used to store static variables that are stored in memory
       for the entire duration of a program's execution.
       - Values for variables in this section are stored in the program binary
       - Variables are initialized with their stored values at run time
       - The .data section is writeable in memory, so values here can be changed during execution */

message:                    /* A colon after a name is used to define a label */
    /* Labels are user-defined names given to memory addresses
       When this code is assembled, the message label will be replaced with its
       memory address in the data section */
    .ascii "Hello World!\n" /* The .ascii directive lets the assembler know that the following data should be interpreted as an ASCII string */
    /* This value could be defined using the actual hex bytes, but it is easier to
       let the assembler convert the ASCII string to the byte values */

hex_message:
    .byte 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 0x72, 0x6c, 0x64, 0x21, 0x0a
    /* This stores the same message without using the .ascii directive as a short-cut
       Instead it defines a list of bytes to be stored in the .data section of memory
       These bytes have the same value as .ascii "Hello World!\n" */

.section .text              /* The .text section stores the executable instructions of the program */
                            /* The text section is loaded as read-only */

.global _start              /* The .global directive defines a global scope for a label */
    /* _start is a special label that the GNU linker uses by default as the entry point,
       the starting point to begin executing code
       The .global directive indicates that the label _start should be accessible to other
       object files outside of the object file generated from this assembly source */

_start:                     /* This is the label for the memory address for the beginning of our code */

print_message:
    /* Print the message stored in the .data section using a write syscall
       Linux syscalls for x86 can be referenced from these sources:
           https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md#x86-32_bit
           https://syscalls.w3challs.com/
       x86 header file definitions for Linux syscalls can typically be found in:
           /usr/include/x86_64-linux-gnu/asm/unistd_32.h */

    mov eax, 0x00000004     /* set the eax register to 0x4 which is the value for the write syscall */
    /* mov is an opcode mnemonic that instructs the processor to copy a value
       In this instance mov is used to copy the immediate (hardcoded) value 0x00000004 to the eax register
       Although 0x00000004 is used here, 0x4 or just 4 in decimal would write the same value
       When copying an immediate value to a 32-bit register, the machine code for mov is
       b8 + the 3-bit register bitmask
       The register masks for the GP registers are eax 000 (0x0), ecx 001 (0x1), edx 010 (0x2), and ebx 011 (0x3)
       mov eax, 0x00000004 would be written in machine code as:
           b8 04 00 00 00
       With little endian data values, the digits within each byte keep their normal order,
       but the bytes themselves are stored in reverse order (least significant byte first)
       For example: mov eax, 0x12345678 would be written as:
           b8 78 56 34 12 */

    mov ebx, 0x00000001     /* ebx stores the first argument for write */
    /* This argument passes the file descriptor (fd) we want to write to
       FD 1 is stdout in Linux which will write the output to the terminal
       mov ebx, 0x00000001 will be written as bb 01 00 00 00 */

    lea ecx, message        /* ecx stores the second argument for write */
    /* The Load Effective Address (lea) opcode loads the address of the message label into ecx
       This is similar to using a pointer or reference rather than passing by value in C/C++
       The machine opcode for lea is 8d
       lea uses a 1 byte ModR/M bitmask to represent the addressing mode, the destination
       register, and the source register or displacement mode
       In this instance, the address of the message label is resolved and encoded directly
       in the instruction as a 32-bit value
       The bitmask for this method of loading is:
           bit       7 6   5 4 3   2 1 0
           function  Mod   Reg     R/M
       For direct displacement addressing both Mod bits are 0, and the R/M bits are 101
       We know the ecx register mask is 001, so the complete bitmask would be:
           0 0 0 0 1 1 0 1   which is 0x0d
       Therefore, this operation would be stored in machine code as:
           8d 0d xx xx xx xx   where xx is the address for message */

    mov edx, 0x0000000d     /* edx stores the third and final argument for write */
    /* This argument passes how many bytes we want to write, starting at the address
       passed in argument 2 with ecx
       The machine opcode for mov for immediate values is b8 and the bitmask for edx is 010,
       so this instruction would be stored as ba 0d 00 00 00 in the binary */

    int 0x80                /* interrupt (int) 0x80 invokes the x86 syscall */
    /* This will execute the write syscall with the parameters that we loaded into
       registers ebx, ecx, and edx
       The return value will be written back to eax
       The interrupt machine code is 0xcd and the interrupt is 0x80, so this would be
       stored in the binary as cd 80 */

print_hex_message:
    /* This writes the same message, but uses the hex_message label instead of the
       .ascii message label */
    mov eax, 0x00000004
    mov ebx, 0x00000001
    lea ecx, hex_message
    mov edx, 0x0000000d
    int 0x80

exit_program:
    /* This invokes the exit syscall which returns the exit code for the program */
    mov eax, 0x00000001     /* eax is set to 1 for an exit syscall */
    /* /usr/include/sysexits.h defines system exit codes
       If the program is terminated with a signal, then it will return 128 + SIGNAL
       /usr/include/asm-generic/signal.h defines the signals */
    mov ebx, 0x00000000     /* ebx is set to our desired exit code - 0 is a successful exit */
    int 0x80                /* invoke syscall */
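These are links to the syscall tables referenced in the code comments:
Google syscall reference: https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md#x86-32_bit
W3 syscall reference: https://syscalls.w3challs.com/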
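The machine code described in the code comments can be reproduced with a short Python sketch. It only covers the three instruction forms used in this program and follows the encoding rules stated in the comments; the function names and the REG table are invented for this illustration, and 0x0804a000 is the address that GDB later shows for the message label.

```python
import struct

# 3-bit register numbers listed in the code comments.
REG = {"eax": 0b000, "ecx": 0b001, "edx": 0b010, "ebx": 0b011}


def encode_mov_imm32(reg, value):
    """mov r32, imm32: opcode 0xb8 + register number, then 4 little-endian bytes."""
    return bytes([0xB8 + REG[reg]]) + struct.pack("<I", value)


def encode_lea_disp32(reg, address):
    """lea r32, [address]: opcode 0x8d, then ModR/M with Mod=00 and R/M=101."""
    modrm = (0b00 << 6) | (REG[reg] << 3) | 0b101
    return bytes([0x8D, modrm]) + struct.pack("<I", address)


def encode_int(vector):
    """int imm8: opcode 0xcd followed by the interrupt number."""
    return bytes([0xCD, vector])


print(encode_mov_imm32("eax", 0x00000004).hex(" "))    # b8 04 00 00 00
print(encode_mov_imm32("eax", 0x12345678).hex(" "))    # b8 78 56 34 12
print(encode_mov_imm32("edx", 0x0000000D).hex(" "))    # ba 0d 00 00 00
print(encode_lea_disp32("ecx", 0x0804A000).hex(" "))   # 8d 0d 00 a0 04 08
print(encode_int(0x80).hex(" "))                       # cd 80
```

The encoded lengths (5, 6, and 2 bytes) match the spacing between instruction addresses that GDB displays when the program is disassembled.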

Assemble, Link, and Run
Once you have copied the code and have thoroughly read through the comments, it is time to make it executable. The GNU Assembler for x86_64 which was installed with the gcc-12 package can be used to assemble the source code into an object file. To assemble the source, navigate to the directory where the source is saved and enter the command:
x86_64-linux-gnu-as --32 -o hello_x86.o hello_x86.asm
This command assumes that you saved the source file as hello_x86.asm. It directs GAS to create a 32-bit object file named hello_x86.o and to use the hello_x86.asm file as an input source.

Once the object file is created, we can use the GNU linker to create the final executable binary. Enter the command:
x86_64-linux-gnu-ld -m elf_i386 -o hello_x86 hello_x86.o
The -m elf_i386 parameter directs the linker to create a legacy i386 32-bit binary in Executable and Linkable Format (ELF). The -o parameter specifies the output file as hello_x86, and hello_x86.o is the only object file to be linked into the binary.

Once the executable is created, it can be run with ./hello_x86. It should produce the output:
Hello World!
Hello World!

You can check the return value of the program immediately after it exits with:
echo $?
0

Debugging x86 in GDB
GDB is a powerful text-based debugger that is run from the terminal. We will use it extensively throughout this tutorial, and start by examining the hello_x86 binary we just created. To begin, invoke the gdb debugger with:
gdb hello_x86
pete@framework16:~/Documents/ASM/hello_world/x86$ gdb hello_x86
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from hello_x86...
(No debugging symbols found in hello_x86)
(gdb)
We are greeted with the gdb prompt.

We wrote our assembly using the Intel syntax, so we will switch our disassembly output to that syntax with:
set disassembly-flavor intel

We want to view the disassembly and the CPU registers while we are running our program. We can enable both views with:
lay asm
lay reg

Your terminal window should now look similar to this:
----------------------------------------------------------------------------------------------------

                                 [ Register Values Unavailable ]

----------------------------------------------------------------------------------------------------

                                    [ No Assembly Available ]

----------------------------------------------------------------------------------------------------
exec No process In:                                                L??   PC: ??
(gdb) lay reg
(gdb)
Note: If at any time the screen output appears corrupted, you can enter ctrl + l to redraw the screen.
You can use ctrl + x then o or p to switch between the assembly, register, and gdb command frames.

We will now set a breakpoint for debugging our code by entering:
break _start
This will set a breakpoint at the beginning of the _start label.


We can now begin our program execution by entering:
run


Your terminal should now look similar to the output below:
-Register group: general----------------------------------------------------------------------------
eax    0x0        0           ecx    0x0        0           edx    0x0        0
ebx    0x0        0           esp    0xffffd950 0xffffd950  ebp    0x0        0x0
esi    0x0        0           edi    0x0        0           eip    0x8049000  0x8049000 <_start>
eflags 0x202      [ IF ]      cs     0x23       35          ss     0x2b       43
ds     0x2b       43          es     0x2b       43          fs     0x0        0
gs     0x0        0           k0     0x0        0           k1     0x0        0
k2     0x0        0           k3     0x0        0           k4     0x0        0
k5     0x0        0           k6     0x0        0           k7     0x0        0
----------------------------------------------------------------------------------------------------
B+> 0x8049000 <_start>                mov    eax,0x4
    0x8049005 <_start+5>              mov    ebx,0x1
    0x804900a <_start+10>             lea    ecx,ds:0x804a000
    0x8049010 <_start+16>             mov    edx,0xd
    0x8049015 <_start+21>             int    0x80
    0x8049017 <print_hex_message>     mov    eax,0x4
    0x804901c <print_hex_message+5>   mov    ebx,0x1
    0x8049021 <print_hex_message+10>  lea    ecx,ds:0x804a00d
    0x8049027 <print_hex_message+16>  mov    edx,0xd
    0x804902c <print_hex_message+21>  int    0x80
    0x804902e <exit_program>          mov    eax,0x1
    0x8049033 <exit_program+5>        mov    ebx,0x0
    0x8049038 <exit_program+10>       int    0x80
    0x804903a                         add    BYTE PTR [eax],al
    0x804903c                         add    BYTE PTR [eax],al
    0x804903e                         add    BYTE PTR [eax],al
    0x8049040                         add    BYTE PTR [eax],al
    0x8049042                         add    BYTE PTR [eax],al
    0x8049044                         add    BYTE PTR [eax],al
    0x8049046                         add    BYTE PTR [eax],al
----------------------------------------------------------------------------------------------------
native process 84922 In: _start                                    L??   PC: 0x8049000
(gdb) lay reg
(gdb) break _start
Breakpoint 1 at 0x8049000
(gdb) run
Starting program: /home/pete/Documents/ASM/hello_world/x86/hello_x86
Breakpoint 1, 0x08049000 in _start ()
(gdb)
Our program is now running inside the debugger. We can see that our breakpoint was set at the beginning of the _start label, which is at memory address 0x08049000. The first instruction at that address should be highlighted in the assembly frame, and if we look at the register group, we can see that our eip register holds the address of the next instruction, which will mov 0x4 into eax.

We will now execute a single instruction by entering:
si

-Register group: general----------------------------------------------------------------------------
eax    0x4        4           ecx    0x0        0           edx    0x0        0
ebx    0x0        0           esp    0xffffd950 0xffffd950  ebp    0x0        0x0
esi    0x0        0           edi    0x0        0           eip    0x8049005  0x8049005 <_start+5>
eflags 0x202      [ IF ]      cs     0x23       35          ss     0x2b       43
ds     0x2b       43          es     0x2b       43          fs     0x0        0
gs     0x0        0           k0     0x0        0           k1     0x0        0
k2     0x0        0           k3     0x0        0           k4     0x0        0
k5     0x0        0           k6     0x0        0           k7     0x0        0
----------------------------------------------------------------------------------------------------
B+  0x8049000 <_start>                mov    eax,0x4
  > 0x8049005 <_start+5>              mov    ebx,0x1
    0x804900a <_start+10>             lea    ecx,ds:0x804a000
    0x8049010 <_start+16>             mov    edx,0xd
    0x8049015 <_start+21>             int    0x80
    0x8049017 <print_hex_message>     mov    eax,0x4
    0x804901c <print_hex_message+5>   mov    ebx,0x1
    0x8049021 <print_hex_message+10>  lea    ecx,ds:0x804a00d
    0x8049027 <print_hex_message+16>  mov    edx,0xd
    0x804902c <print_hex_message+21>  int    0x80
    0x804902e <exit_program>          mov    eax,0x1
    0x8049033 <exit_program+5>        mov    ebx,0x0
    0x8049038 <exit_program+10>       int    0x80
    0x804903a                         add    BYTE PTR [eax],al
    0x804903c                         add    BYTE PTR [eax],al
    0x804903e                         add    BYTE PTR [eax],al
    0x8049040                         add    BYTE PTR [eax],al
    0x8049042                         add    BYTE PTR [eax],al
    0x8049044                         add    BYTE PTR [eax],al
    0x8049046                         add    BYTE PTR [eax],al
----------------------------------------------------------------------------------------------------
native process 84922 In: _start                                    L??   PC: 0x8049005
(gdb) lay reg
(gdb) break _start
Breakpoint 1 at 0x8049000
(gdb) run
Starting program: /home/pete/Documents/ASM/hello_world/x86/hello_x86
Breakpoint 1, 0x08049000 in _start ()
(gdb) si
0x08049005 in _start ()
(gdb)
The first instruction was executed, and looking at the registers, we can see that eax now holds the value 0x4, and eip holds the address of the next instruction at 0x08049005. This is also highlighted in our assembly frame and pointed to with the > symbol. The <_start+5> tag indicates that this memory location is offset 5 bytes from the beginning of our _start label, which means our first instruction was 5 bytes long. Our gdb command window indicates we are at memory address 0x08049005 in the _start label.
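The same reasoning works for every instruction: the length of an instruction is the difference between its address and the address of the one that follows. As a small illustrative sketch in Python (the helper name instruction_lengths is made up), using the first six addresses from the assembly frame:

```python
# Addresses of consecutive instructions, copied from the assembly frame.
addresses = [0x8049000, 0x8049005, 0x804900A, 0x8049010, 0x8049015, 0x8049017]


def instruction_lengths(addrs):
    """Length of each instruction = address of the next one minus its own."""
    return [nxt - cur for cur, nxt in zip(addrs, addrs[1:])]


print(instruction_lengths(addresses))   # [5, 5, 6, 5, 2]
```

The three mov instructions are 5 bytes each, the lea is 6 bytes, and the int is 2 bytes, matching the machine code described in the source comments.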

Enter si to execute the next instruction:
-Register group: general----------------------------------------------------------------------------
eax    0x4        4           ecx    0x0        0           edx    0x0        0
ebx    0x1        1           esp    0xffffd950 0xffffd950  ebp    0x0        0x0
esi    0x0        0           edi    0x0        0           eip    0x804900a  0x804900a <_start+10
eflags 0x202      [ IF ]      cs     0x23       35          ss     0x2b       43
ds     0x2b       43          es     0x2b       43          fs     0x0        0
gs     0x0        0           k0     0x0        0           k1     0x0        0
k2     0x0        0           k3     0x0        0           k4     0x0        0
k5     0x0        0           k6     0x0        0           k7     0x0        0
----------------------------------------------------------------------------------------------------
B+  0x8049000 <_start>                mov    eax,0x4
    0x8049005 <_start+5>              mov    ebx,0x1
  > 0x804900a <_start+10>             lea    ecx,ds:0x804a000
    0x8049010 <_start+16>             mov    edx,0xd
    0x8049015 <_start+21>             int    0x80
    0x8049017 <print_hex_message>     mov    eax,0x4
    0x804901c <print_hex_message+5>   mov    ebx,0x1
    0x8049021 <print_hex_message+10>  lea    ecx,ds:0x804a00d
    0x8049027 <print_hex_message+16>  mov    edx,0xd
    0x804902c <print_hex_message+21>  int    0x80
    0x804902e <exit_program>          mov    eax,0x1
    0x8049033 <exit_program+5>        mov    ebx,0x0
    0x8049038 <exit_program+10>       int    0x80
    0x804903a                         add    BYTE PTR [eax],al
    0x804903c                         add    BYTE PTR [eax],al
    0x804903e                         add    BYTE PTR [eax],al
    0x8049040                         add    BYTE PTR [eax],al
    0x8049042                         add    BYTE PTR [eax],al
    0x8049044                         add    BYTE PTR [eax],al
    0x8049046                         add    BYTE PTR [eax],al
----------------------------------------------------------------------------------------------------
native process 84922 In: _start                                    L??   PC: 0x804900a
(gdb) lay reg
(gdb) break _start
Breakpoint 1 at 0x8049000
(gdb) run
Starting program: /home/pete/Documents/ASM/hello_world/x86/hello_x86
Breakpoint 1, 0x08049000 in _start ()
(gdb) si
0x08049005 in _start ()
(gdb) si
0x0804900a in _start ()
(gdb)
Notice that ebx has been set to 0x1 and the eip register has been updated again to point to the next instruction.

Enter si again:
-Register group: general----------------------------------------------------------------------------
eax    0x4        4           ecx    0x804a000  134520832   edx    0x0        0
ebx    0x1        1           esp    0xffffd950 0xffffd950  ebp    0x0        0x0
esi    0x0        0           edi    0x0        0           eip    0x8049010  0x8049010 <_start+16
eflags 0x202      [ IF ]      cs     0x23       35          ss     0x2b       43
ds     0x2b       43          es     0x2b       43          fs     0x0        0
gs     0x0        0           k0     0x0        0           k1     0x0        0
k2     0x0        0           k3     0x0        0           k4     0x0        0
k5     0x0        0           k6     0x0        0           k7     0x0        0
----------------------------------------------------------------------------------------------------
B+  0x8049000 <_start>                mov    eax,0x4
    0x8049005 <_start+5>              mov    ebx,0x1
    0x804900a <_start+10>             lea    ecx,ds:0x804a000
  > 0x8049010 <_start+16>             mov    edx,0xd
    0x8049015 <_start+21>             int    0x80
    0x8049017 <print_hex_message>     mov    eax,0x4
    0x804901c <print_hex_message+5>   mov    ebx,0x1
    0x8049021 <print_hex_message+10>  lea    ecx,ds:0x804a00d
    0x8049027 <print_hex_message+16>  mov    edx,0xd
    0x804902c <print_hex_message+21>  int    0x80
    0x804902e <exit_program>          mov    eax,0x1
    0x8049033 <exit_program+5>        mov    ebx,0x0
    0x8049038 <exit_program+10>       int    0x80
    0x804903a                         add    BYTE PTR [eax],al
    0x804903c                         add    BYTE PTR [eax],al
    0x804903e                         add    BYTE PTR [eax],al
    0x8049040                         add    BYTE PTR [eax],al
    0x8049042                         add    BYTE PTR [eax],al
    0x8049044                         add    BYTE PTR [eax],al
    0x8049046                         add    BYTE PTR [eax],al
----------------------------------------------------------------------------------------------------
native process 84922 In: _start                                    L??   PC: 0x8049010
(gdb) lay reg
(gdb) break _start
Breakpoint 1 at 0x8049000
(gdb) run
Starting program: /home/pete/Documents/ASM/hello_world/x86/hello_x86
Breakpoint 1, 0x08049000 in _start ()
(gdb) si
0x08049005 in _start ()
(gdb) si
0x0804900a in _start ()
(gdb) si
0x08049010 in _start ()
(gdb)
Now we can see our ecx register has been loaded with a memory address. This should be the memory address of our message label.

We can view a list of variable labels in our program by entering the command:
info variables

Which outputs:
(gdb) info variables
All defined variables:

Non-debugging symbols:
0x0804a000  message
0x0804a00d  hex_message
0x0804a01a  __bss_start
0x0804a01a  _edata
0x0804a01c  _end
(gdb)
We can see that the address for our message matches what is loaded into the ecx register. Let's look at the actual data stored at this address. To view the data, we need to let gdb know how many bytes we want to view. We can see that message starts at 0x0804a000, and hex_message is the next label at 0x0804a00d, which means message should be 0xd (or 13) bytes long.
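The same subtraction can be applied to every symbol in the list. The Python sketch below is only an illustration: it assumes each symbol's data runs up to the address of the next symbol, and it uses _edata as the end marker for the last label. The name symbol_sizes is invented for this example.

```python
# Addresses copied from the "info variables" output.
symbols = {
    "message": 0x0804A000,
    "hex_message": 0x0804A00D,
    "_edata": 0x0804A01A,
}


def symbol_sizes(table):
    """Size of each symbol = address of the next symbol minus its own address."""
    ordered = sorted(table.items(), key=lambda item: item[1])
    return {
        name: next_addr - addr
        for (name, addr), (_, next_addr) in zip(ordered, ordered[1:])
    }


print(symbol_sizes(symbols))   # {'message': 13, 'hex_message': 13}
```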

To view 13 bytes starting at the message label, enter:
x /13xb 0x0804a000

Which outputs:
(gdb) x /13xb 0x0804a000
0x804a000:      0x48    0x65    0x6c    0x6c    0x6f    0x20    0x57    0x6f
0x804a008:      0x72    0x6c    0x64    0x21    0x0a
This shows us the 13 bytes of data stored at 0x804a000 in hexadecimal, but since we know that the data contains ASCII characters, let's output the data in character format.

Enter:
x /13cb 0x0804a000

This outputs:
(gdb) x /13cb 0x0804a000
0x804a000:      72 'H'  101 'e' 108 'l' 108 'l' 111 'o' 32 ' '  87 'W'  111 'o'
0x804a008:      114 'r' 108 'l' 100 'd' 33 '!'  10 '\n'
(gdb)
We can easily recognize our "Hello World!\n" message.
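The conversion that gdb performs between the hexadecimal and character views can be checked independently. This short Python sketch (with a hypothetical helper named bytes_to_text) decodes the 13 bytes from the hexadecimal dump as ASCII:

```python
# The 13 byte values shown by "x /13xb 0x0804a000".
dumped = [0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0x57, 0x6F,
          0x72, 0x6C, 0x64, 0x21, 0x0A]


def bytes_to_text(values):
    """Interpret a list of byte values as an ASCII string."""
    return bytes(values).decode("ascii")


print(repr(bytes_to_text(dumped)))   # 'Hello World!\n'
print(len(dumped))                   # 13
```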

Now let's view the contents of the entire .data section of our program. To do so, we need to find the memory range that it occupies.

To view the memory ranges for our program's sections, enter:
info file

This outputs:
Symbols from "/home/pete/Documents/ASM/hello_world/x86/hello_x86".
Native process:
        Using the running image of child process 84922.
        While running this, GDB does not access memory from...
Local exec file:
        `/home/pete/Documents/ASM/hello_world/x86/hello_x86', file type elf32-i386.
        Entry point: 0x8049000
        0x08049000 - 0x0804903a is .text
        0x0804a000 - 0x0804a01a is .data
        0xf7ffc0b4 - 0xf7ffc0f4 is .hash in system-supplied DSO at 0xf7ffc000
        0xf7ffc0f4 - 0xf7ffc140 is .gnu.hash in system-supplied DSO at 0xf7ffc000
        0xf7ffc140 - 0xf7ffc1f0 is .dynsym in system-supplied DSO at 0xf7ffc000
        0xf7ffc1f0 - 0xf7ffc2b0 is .dynstr in system-supplied DSO at 0xf7ffc000
        0xf7ffc2b0 - 0xf7ffc2c6 is .gnu.version in system-supplied DSO at 0xf7ffc000
        0xf7ffc2c8 - 0xf7ffc31c is .gnu.version_d in system-supplied DSO at 0xf7ffc000
        0xf7ffc31c - 0xf7ffc3ac is .dynamic in system-supplied DSO at 0xf7ffc000
        0xf7ffc3ac - 0xf7ffc3b8 is .rodata in system-supplied DSO at 0xf7ffc000
        0xf7ffc3b8 - 0xf7ffc40c is .note in system-supplied DSO at 0xf7ffc000
        0xf7ffc40c - 0xf7ffc430 is .eh_frame_hdr in system-supplied DSO at 0xf7ffc000
        0xf7ffc430 - 0xf7ffc53c is .eh_frame in system-supplied DSO at 0xf7ffc000
        0xf7ffc540 - 0xf7ffd262 is .text in system-supplied DSO at 0xf7ffc000
--Type <RET> for more, q to quit, c to continue without paging--
The .text and .data entries are the two entries we care about right now. The entries listed below them belong to a system-supplied Dynamic Shared Object (DSO) that we are not currently interested in. Looking at the entry for our .data section, we can see that it occupies memory addresses 0x0804a000 - 0x0804a01a.

We can have gdb calculate the size of the .data section for us by entering the command:
print (0x0804a01a - 0x0804a000)

This subtracts the starting address of .data from the ending address and prints the result, which should be 26 bytes.
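
The same arithmetic can be checked outside of gdb. The following is a minimal Python sketch, assuming the two addresses reported by info file above; the variable names are only illustrative:

```python
# Addresses copied from the `info file` output for the .data section.
data_start = 0x0804a000
data_end = 0x0804a01a  # first address past the end of .data

# The section size is simply the difference between the two addresses.
size = data_end - data_start
print(size)       # 26
print(hex(size))  # 0x1a
```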

To view all 26 bytes stored in the .data section, enter:
x /26xb 0x0804a000

This outputs:
(gdb) x /26xb 0x0804a000
0x804a000:      0x48    0x65    0x6c    0x6c    0x6f    0x20    0x57    0x6f
0x804a008:      0x72    0x6c    0x64    0x21    0x0a    0x48    0x65    0x6c
0x804a010:      0x6c    0x6f    0x20    0x57    0x6f    0x72    0x6c    0x64
0x804a018:      0x21    0x0a
(gdb)

Again, since we know this should contain ASCII characters, we can output it in character format with:

x /26cb 0x0804a000

Which outputs:
0x804a000:      72 'H'  101 'e' 108 'l' 108 'l' 111 'o' 32 ' '  87 'W'  111 'o'
0x804a008:      114 'r' 108 'l' 100 'd' 33 '!'  10 '\n' 72 'H'  101 'e' 108 'l'
0x804a010:      108 'l' 111 'o' 32 ' '  87 'W'  111 'o' 114 'r' 108 'l' 100 'd'
0x804a018:      33 '!'  10 '\n'
(gdb)

This shows the data for both our message label which starts at 0x804a000, and our hex_message label at 0x804a00d.
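
To see how the two labels share the section, the hex dump can be decoded by hand. This is a rough Python sketch, assuming the 26 bytes shown by x /26xb above; the names dump, message, and hex_message_addr are only illustrative and are not part of the program:

```python
# The 26 bytes of .data, copied from the `x /26xb 0x0804a000` output.
data_start = 0x0804a000
dump = bytes.fromhex(
    "48 65 6c 6c 6f 20 57 6f"
    "72 6c 64 21 0a 48 65 6c"
    "6c 6f 20 57 6f 72 6c 64"
    "21 0a"
)

# The first message is 13 bytes long (0xd), so the second one starts at offset 13.
message = dump[:13]
hex_message = dump[13:]
hex_message_addr = data_start + 13

print(message)                # b'Hello World!\n'
print(hex_message)            # b'Hello World!\n'
print(hex(hex_message_addr))  # 0x804a00d
```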

Let's continue to step through our program execution by entering si:
|-Register group: general----------------------------------------------------|
|eax            0x4                 4                                         |
|ecx            0x804a000           134520832                                 |
|edx            0xd                 13                                        |
|ebx            0x1                 1                                         |
|esp            0xffffd950          0xffffd950                                |
|ebp            0x0                 0x0                                       |
|esi            0x0                 0                                         |
|edi            0x0                 0                                         |
|eip            0x8049015           0x8049015 <_start+21>                     |
|eflags         0x202               [ IF ]                                    |
|cs             0x23                35                                        |
|ss             0x2b                43                                        |
|ds             0x2b                43                                        |
|es             0x2b                43                                        |
|fs             0x0                 0                                         |
|gs             0x0                 0                                         |
|k0             0x0                 0                                         |
|k1             0x0                 0                                         |
|k2             0x0                 0                                         |
|k3             0x0                 0                                         |
|k4             0x0                 0                                         |
|k5             0x0                 0                                         |
|k6             0x0                 0                                         |
|k7             0x0                 0                                         |
|-----------------------------------------------------------------------------|
|B+ 0x8049000 <_start>                   mov    eax,0x4                       |
|   0x8049005 <_start+5>                 mov    ebx,0x1                       |
|   0x804900a <_start+10>                lea    ecx,ds:0x804a000              |
|   0x8049010 <_start+16>                mov    edx,0xd                       |
|  >0x8049015 <_start+21>                int    0x80                          |
|   0x8049017 <print_hex_message>        mov    eax,0x4                       |
|   0x804901c <print_hex_message+5>      mov    ebx,0x1                       |
|   0x8049021 <print_hex_message+10>     lea    ecx,ds:0x804a00d              |
|   0x8049027 <print_hex_message+16>     mov    edx,0xd                       |
|   0x804902c <print_hex_message+21>     int    0x80                          |
|   0x804902e <exit_program>             mov    eax,0x1                       |
|   0x8049033 <exit_program+5>           mov    ebx,0x0                       |
|   0x8049038 <exit_program+10>          int    0x80                          |
|   0x804903a                            add    BYTE PTR [eax],al             |
|   0x804903c                            add    BYTE PTR [eax],al             |
|   0x804903e                            add    BYTE PTR [eax],al             |
|   0x8049040                            add    BYTE PTR [eax],al             |
|   0x8049042                            add    BYTE PTR [eax],al             |
|   0x8049044                            add    BYTE PTR [eax],al             |
|   0x8049046                            add    BYTE PTR [eax],al             |
|-----------------------------------------------------------------------------|
native process 84922 In: _start                        L??   PC: 0x8049015
(gdb) si
0x08049015 in _start ()
(gdb)

The edx register has been set to 0xd now, which reflects the length of our message, whose address is stored in ecx.

Enter si again:
|eax 0x4 4                ecx 0x804a000 134520832        edx 0xd 13           |
|eax 0xd 13               esp 0xffffd950 0xffffd950      edx 0xd 13           |
|esi 0x0 0                edi 0x0 0                      eip 0x8049015 0x8049015 <_start+21 |
|eflags 0x202 [ IF ]      cs 0x23 35                     ss 0x2b 7 43 7 <print_hex |
|ds 0x2b 43               es 0x2b 43                     fs 0x0 0             |
|gs 0x0 0                 k0 0x0 0                       k1 0x0 0             |
|k2 0x0 0                 k3 0x0 0                       k4 0x0 0             |
|k5 0x0 0                 k6 0x0 0                       k7 0x0 0             |
|-----------------------------------------------------------------------------|
|B+ 0x8049000 <_start>                   mov    eax,0x4                       |
|   0x8049005 <_start+5>                 mov    ebx,0x1                       |
|   0x804900a <_start+10>                lea    ecx,ds:0x804a000              |
|   0x8049010 <_start+16>                mov    edx,0xd                       |
|  >0x8049015 <_start+21>                int    0x80                          |
|   0x8049015 <_start+21>                int    0x800x4                       |
|  >0x8049017 <print_hex_message>        mov    eax,0x4                       |
|   0x8049021 <print_hex_message+10>     lea    ecx,ds:0x804a00d              |
|   0x8049027 <print_hex_message+16>     mov    edx,0xd                       |
|   0x804902c <print_hex_message+21>     int    0x80                          |
|   0x804902e <exit_program>             mov    eax,0x1                       |
|   0x8049033 <exit_program+5>           mov    ebx,0x0                       |
|   0x8049038 <exit_program+10>          int    0x80                          |
|   0x804903a                            add    BYTE PTR [eax],al             |
|   0x804903c                            add    BYTE PTR [eax],al             |
|   0x804903e                            add    BYTE PTR [eax],al             |
|   0x8049040                            add    BYTE PTR [eax],al             |
|   0x8049042                            add    BYTE PTR [eax],al             |
|   0x8049044                            add    BYTE PTR [eax],al             |
|   0x8049046                            add    BYTE PTR [eax],al             |
|-----------------------------------------------------------------------------|
native process 96349 In: _start                        L??   PC: 0x8049015
(gdb) si
0x08049015 in _start ()
(gdb) si
Hello World!
0x08049017 in print_hex_message ()
(gdb)

Our interrupt 0x80 instruction was reached, and it invoked the write syscall (selected by the 0x4 in eax), passing the parameters we set in the ebx, ecx, and edx registers. ebx was set to 0x1, for stdout, so our program wrote "Hello World!\n" to the terminal. Note that this output may be injected into the gdb command frame, which can corrupt the display. Press Ctrl + L to redraw the screen and fix this. In my example, there appear to be duplicate 0x8049015 instruction lines and a phantom "int 0x800x4" instruction. Redrawing the screen corrects this.
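
For comparison, the same request can be made from a high-level language. The sketch below is only an illustration in Python: it assumes the register values shown in the register window, and it calls os.write, which reaches the kernel's write syscall through the C library rather than through int 0x80. The registers dictionary is a hypothetical stand-in for the CPU registers:

```python
import os

# Register values as they were just before int 0x80 executed.
registers = {
    "eax": 0x4,         # syscall number: write
    "ebx": 0x1,         # file descriptor: stdout
    "ecx": 0x0804a000,  # address of the message buffer
    "edx": 0xd,         # number of bytes to write
}

# The 13 bytes stored at the address held in ecx.
message = b"Hello World!\n"

# Write edx bytes of the message to the file descriptor held in ebx.
written = os.write(registers["ebx"], message[:registers["edx"]])

# os.write returns the number of bytes actually written.
print(written)
```

As with the syscall in our program, the return value is the count of bytes written, which gdb would show in eax after the interrupt returns.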

GDB shows that we have reached the print_hex_message label, which will execute the same steps as before to invoke a write syscall.
We can continue the program to completion by entering:
continue

Our gdb command window should display:
(gdb) continue
Continuing.
[Inferior 1 (process 96349) exited normally]
(gdb)

This indicates that our program has completed without error.

We can exit gdb by entering:
quit


x86 Hello World Exercises

Exercise 1.

The GNU assembler can embed debugging symbols into object files. This can facilitate debugging your programs, and allows you to step through your source code when debugging.

Use the following commands to re-assemble and re-link the hello_x86 program:
x86_64-linux-gnu-as --32 -g -o hello_x86.o hello_x86.asm
x86_64-linux-gnu-ld -m elf_i386 -o hello_x86 hello_x86.o

Note: the -g switch enables gdb debugging symbols.
Open the executable with gdb and step through the source.

Exercise 2.

Examine a disassembly of the hello_x86.o object file by entering:
x86_64-linux-gnu-objdump -d -M intel hello_x86.o
What happened to the print_message label? Why doesn't it appear?
Why is there no address in the lea ecx instruction?

Exercise 3.

Examine the binary file's header info with:
readelf -h hello_x86
What are the magic byte(s)?
What are the other flags in the ELF header?

Exercise 4.


Load the hello_x86 executable in gdb and examine the machine code with the following commands:
(gdb) info file
Symbols from "/home/pete/Documents/ASM/hello_world/x86/hello_x86".
Local exec file:
        `/home/pete/Documents/ASM/hello_world/x86/hello_x86', file type elf32-i386.
        Entry point: 0x8049000
        0x08049000 - 0x0804903a is .text
        0x0804a000 - 0x0804a01a is .data
(gdb) set $code_start = 0x08049000
(gdb) set $code_end = 0x0804903a
(gdb) print ($code_end - $code_start)
$1 = 58
(gdb) x /58xb $code_start
0x8049000 <_start>:                 0xb8    0x04    0x00    0x00    0x00    0xbb    0x01    0x00
0x8049008 <_start+8>:               0x00    0x00    0x8d    0x0d    0x00    0xa0    0x04    0x08
0x8049010 <_start+16>:              0xba    0x0d    0x00    0x00    0x00    0xcd    0x80    0xb8
0x8049018 <print_hex_message+1>:    0x04    0x00    0x00    0x00    0xbb    0x01    0x00    0x00
0x8049020 <print_hex_message+9>:    0x00    0x8d    0x0d    0x0d    0xa0    0x04    0x08    0xba
0x8049028 <print_hex_message+17>:   0x0d    0x00    0x00    0x00    0xcd    0x80    0xb8    0x01
0x8049030 <exit_program+2>:         0x00    0x00    0x00    0xbb    0x00    0x00    0x00    0x00
0x8049038 <exit_program+10>:        0xcd    0x80
(gdb)

How many machine code instructions can you recognize?

Challenge Exercise:

Demonstrate your mastery of this section by re-writing the Hello World binary entirely in machine code. Write and execute the program without the use of an assembler or linker.