x86 Assembly — Pete's Tech Blog

x86 Assembly Introduction

x86 Hello World

It is time to write some code. The following code was written for GAS to demonstrate a basic Hello World program for x86. It uses Intel syntax (AT&T is the default for GAS). You can copy this code into your favorite editor and read the very thorough comments to get a complete breakdown of the code.

.intel_syntax noprefix
/* Directive for GAS (GNU Assembler) to use Intel syntax instead of AT&T for x86 */

.section .data
/* The .section directive is used to define a new section or switch to an
   existing section in the object file */
/* The .data section is used to store static variables that are stored in
   memory for the entire duration of a program's execution.
   - Values for variables in this section are stored in the program binary
   - Variables are initialized with their stored values at run time
   - The .data section is writable in memory, so values here can be changed
     during execution */

message:
/* A name followed by a colon defines a label */
/* Labels are user-defined names given to memory addresses
   When this code is assembled and linked, the message label will be replaced
   with its memory address in the data section */
    .ascii "Hello World!\n"
    /* The .ascii directive lets the assembler know that the following data
       should be interpreted as an ASCII string */
    /* This value could be defined using the actual hex bytes, but it is
       easier to let the assembler convert the ASCII string to the byte
       values */

hex_message:
    .byte 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 0x72, 0x6c, 0x64, 0x21, 0x0a
    /* This stores the same message without using the .ascii directive as a
       shortcut
       Instead it defines a list of bytes to be stored in the .data section
       of memory
       These bytes have the same value as .ascii "Hello World!\n" */

.section .text
/* The .text section stores the executable instructions of the program */
/* The text section is loaded as read-only */

.global _start
/* The .global directive defines a global scope for a label */
/* _start is the default entry-point symbol that the GNU linker looks for;
   it marks the point where code execution begins
   The .global directive indicates that the label _start should be
   accessible to other object files outside of the object file generated
   from this assembly source */

_start:
/* This is the label for the memory address for the beginning of our code */

print_message:
/* Print the message stored in the .data section using a write syscall
   Linux syscalls for x86 can be referenced from these sources:
   https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md#x86-32_bit
   https://syscalls.w3challs.com/
   x86 header file definitions for Linux syscalls can typically be found in:
   /usr/include/x86_64-linux-gnu/asm/unistd_32.h */

    mov eax, 0x00000004
    /* set the eax register to 0x4 which is the value for the write syscall */
    /* mov is an opcode mnemonic that instructs the processor to copy a value
       In this instance mov is used to copy the immediate (hardcoded) value
       0x00000004 to the eax register
       Although 0x00000004 is written out in full here, 0x4 or just 4 in
       decimal would write the same value
       When copying an immediate value to a 32-bit register, the machine code
       for mov is b8 + the 3-bit register bitmask
       The register masks for the GP registers are eax 000 (0x0),
       ecx 001 (0x1), edx 010 (0x2), and ebx 011 (0x3)
       mov eax, 0x00000004 would be written in machine code as:
       b8 04 00 00 00
       With little-endian data values, the bits within each byte are read
       from left to right, but the bytes themselves are stored right to left
       (least significant byte first)
       For example: mov eax, 0x12345678 would be written as:
       b8 78 56 34 12 */

    mov ebx, 0x00000001
    /* ebx stores the first argument for write */
    /* This argument passes the file descriptor (fd) we want to write to
       FD 1 is stdout in Linux which will write the output to the terminal
       mov ebx, 0x00000001 will be written as bb 01 00 00 00 */

    lea ecx, message
    /* ecx stores the second argument for write */
    /* The Load Effective Address (lea) opcode loads the address for the
       message label into ecx
       This is similar to using a pointer or reference rather than passing by
       value in C/C++
       The machine opcode for lea is 8d
       lea uses a 1-byte ModR/M bitmask to represent the addressing mode, the
       destination register, and the source register or displacement mode
       In this instance, the address of the message label is resolved and
       loaded directly into ecx from a 32-bit displacement encoded in the
       instruction
       The bitmask for this method of loading is:
         bit       7 6 | 5 4 3 | 2 1 0
         function  Mod | Reg   | R/M
       For direct displacement addressing both Mod bits are 0, and the R/M
       bits are 101
       We know the ecx register mask is 001, so the complete bitmask would be:
         0 0 0 0 1 1 0 1
       which is 0x0d
       Therefore, this operation would be stored in machine code as:
       8d 0d xx xx xx xx
       where xx xx xx xx is the address for message */

    mov edx, 0x0000000d
    /* edx stores the third and final argument for write */
    /* This argument passes how many bytes we want to write, starting at the
       address passed in argument 2 with ecx
       The machine opcode for mov for immediate values is b8 and the bitmask
       for edx is 010, so this instruction would be stored as ba 0d 00 00 00
       in the binary */

    int 0x80
    /* interrupt (int) 0x80 invokes the x86 syscall */
    /* This will execute the write syscall with the parameters that we loaded
       into registers ebx, ecx, and edx
       The return value will be written back to eax
       The opcode for int is 0xcd and the interrupt number is 0x80, so this
       would be stored in the binary as cd 80 */

print_hex_message:
/* This writes the same message, but uses the hex_message label instead of
   the .ascii message label */
    mov eax, 0x00000004
    mov ebx, 0x00000001
    lea ecx, hex_message
    mov edx, 0x0000000d
    int 0x80

exit_program:
/* This invokes the exit syscall which returns the exit code for the program */
    mov eax, 0x00000001
    /* eax is set to 1 for an exit syscall */
    /* /usr/include/sysexits.h defines system exit codes
       If the program is terminated with a signal, then it will return
       128 + SIGNAL
       /usr/include/asm-generic/signal.h defines the signals */

    mov ebx, 0x00000000
    /* ebx is set to our desired exit code - 0 is a successful exit */

    int 0x80
    /* invoke syscall */

These are links to the syscall tables referenced in the code comments:
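Google syscall reference: https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md#x86-32_bit
W3 syscall reference: https://syscalls.w3challs.com/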
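
If you want to double-check the machine code bytes quoted in the code comments without opening a disassembler, a few lines of Python can rebuild them. This is only a sketch of the two encodings the comments describe (mov with a 32-bit immediate, and lea with direct displacement addressing). The names REG, mov_r32_imm32, modrm, and lea_r32_disp32 are made up for this example, and the address passed to lea is a placeholder, since the real address of message is only known after linking.

```python
import struct

# 3-bit register numbers quoted in the listing's comments.
REG = {"eax": 0b000, "ecx": 0b001, "edx": 0b010, "ebx": 0b011}

def mov_r32_imm32(reg, imm):
    """Encode `mov reg, imm32`: opcode 0xb8 + register number,
    followed by the immediate as 4 little-endian bytes."""
    return bytes([0xB8 + REG[reg]]) + struct.pack("<I", imm)

def modrm(mod, reg, rm):
    """Pack a ModR/M byte: Mod in bits 7-6, Reg in bits 5-3, R/M in bits 2-0."""
    return (mod << 6) | (reg << 3) | rm

def lea_r32_disp32(reg, address):
    """Encode `lea reg, [disp32]`: opcode 0x8d, a ModR/M byte with Mod=00 and
    R/M=101, then the 32-bit address as little-endian bytes."""
    return bytes([0x8D, modrm(0b00, REG[reg], 0b101)]) + struct.pack("<I", address)

print(mov_r32_imm32("eax", 4).hex(" "))            # b8 04 00 00 00
print(mov_r32_imm32("eax", 0x12345678).hex(" "))   # b8 78 56 34 12
print(mov_r32_imm32("ebx", 1).hex(" "))            # bb 01 00 00 00
print(mov_r32_imm32("edx", 0x0D).hex(" "))         # ba 0d 00 00 00

# 0x0804a000 is a placeholder address, used only to show the byte order.
print(lea_r32_disp32("ecx", 0x0804A000).hex(" "))  # 8d 0d 00 a0 04 08

# The .ascii string and the .byte list hold the same 13 (0x0d) bytes.
hex_message = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0x57,
                     0x6F, 0x72, 0x6C, 0x64, 0x21, 0x0A])
print(hex_message == b"Hello World!\n", len(hex_message))  # True 13
```

The edx register number comes out as 010, which is 0x2, so the opcode byte for mov edx is b8 + 2 = ba.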

Assemble, Link, and Run

Once you have copied the code and read through the comments, it is time to make it executable. The GNU Assembler for x86_64, which was installed with the gcc-12 package, can be used to assemble the source code into an object file. To assemble the source, navigate to the directory where the source is saved and enter the command:

x86_64-linux-gnu-as --32 -o hello_x86.o hello_x86.asm

This command assumes that you saved the source file as hello_x86.asm. It directs GAS to create a 32-bit object file named hello_x86.o and to use the hello_x86.asm file as an input source.

Once the object file is created, we can use the GNU linker to create the final executable binary. Enter the command:

x86_64-linux-gnu-ld -m elf_i386 -o hello_x86 hello_x86.o

The -m elf_i386 parameter directs the linker to create a legacy i386 32-bit binary in Executable and Linkable Format (ELF). The -o parameter specifies the output file as hello_x86, and hello_x86.o is the only object file to be linked into the binary.
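
To confirm that the linker really produced a 32-bit i386 ELF file, you can look at the first few bytes of the binary. The Python sketch below reads the standard ELF header fields (the class and data-encoding bytes at offsets 4 and 5, and the 16-bit type and machine fields at offsets 16 and 18). The function name describe_elf and the shape of its return value are made up for this example.

```python
import struct

def describe_elf(header):
    """Summarize the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4]   # 1 = 32-bit, 2 = 64-bit
    ei_data = header[5]    # 1 = little endian, 2 = big endian
    byte_order = "<" if ei_data == 1 else ">"
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return {
        "bits": {1: 32, 2: 64}.get(ei_class),
        "little_endian": ei_data == 1,
        "e_type": e_type,         # 2 is ET_EXEC (executable file)
        "i386": e_machine == 3,   # 3 is EM_386
    }

# Usage, assuming the binary from the link step is in the current directory:
# with open("hello_x86", "rb") as f:
#     print(describe_elf(f.read(20)))
```

For a binary linked with -m elf_i386, the expectation is bits equal to 32, little_endian True, and i386 True.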

Once the executable is created, it can be run with ./hello_x86. It should produce the output:

Hello World!
Hello World!

You can check the return value of the program immediately after it exits with:

echo $?
0
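
The run-and-check steps above can also be scripted. The following Python sketch runs a command, captures what it printed, and converts the return code into the value the shell would show for echo $?, including the 128 + SIGNAL convention mentioned in the code comments. The helper names run_and_report and shell_status are made up for this example, and the usage lines assume that hello_x86 is in the current directory.

```python
import subprocess

def run_and_report(command):
    """Run a command and return (stdout bytes, return code)."""
    result = subprocess.run(command, stdout=subprocess.PIPE, check=False)
    return result.stdout, result.returncode

def shell_status(returncode):
    """Convert a subprocess return code to what `echo $?` would show.
    subprocess reports death by signal N as -N; shells report 128 + N."""
    return 128 - returncode if returncode < 0 else returncode

# Usage, assuming the binary built earlier is in the current directory:
# out, code = run_and_report(["./hello_x86"])
# print(out == b"Hello World!\nHello World!\n", shell_status(code))
```

If the program ran correctly, the comparison should be True and the status 0.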