It is time to write some code. The following code was written for GAS (the GNU Assembler) to demonstrate a basic Hello World program for x86.
It uses Intel syntax (AT&T is the default for GAS). You can copy this code into your favorite
editor and read the very thorough comments to get a complete breakdown of the code.
.intel_syntax noprefix /* Directive for GAS (GNU Assembler) to use Intel syntax instead of AT&T for x86 */
.section .data /* The .section directive is used to define or switch to an existing section in the object file */
/*
The .data section stores static variables that remain in memory for the entire duration of a program's execution.
- Values for variables in this section are stored in the program binary
- Variables are initialized with their stored values when the program is loaded
- The .data section is writable in memory, so values here can be changed during execution
*/
message: /* A name followed by a colon defines a label */
/*
Labels are user-defined names given to memory addresses
When this code is assembled, the message label will be replaced with its memory address in the data section
*/
.ascii "Hello World!\n" /* The .ascii directive lets the assembler know that the following data should be interpreted as an ASCII string */
/*
This value could be defined using the actual hex bytes, but it is easier to let the assembler
convert the ASCII string to the byte value
*/
hex_message:
.byte 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 0x72, 0x6c, 0x64, 0x21, 0x0a
/*
This stores the same message without using the .ascii directive as a short-cut
Instead it defines a list of bytes to be stored in the .data section of memory
These bytes have the same value as .ascii "Hello World!\n"
*/
.section .text /* The .text section stores the executable instructions of the program */
/*
The .text section is normally loaded as read-only and executable
*/
.global _start /* The .global directive defines a global scope for a label */
/*
_start is the default entry point symbol that the GNU linker (ld) looks for, so execution begins at this label
The .global directive indicates that the label _start should be accessible to other object files outside of the
object file generated from this assembly source
*/
_start: /* This is the label for the memory address for the beginning of our code */
print_message:
/*
Print the message stored in the .data section using a write syscall
Linux syscalls for x86 can be referenced from these sources:
https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md#x86-32_bit
https://syscalls.w3challs.com/
x86 header file definitions for Linux syscalls can typically be found in: /usr/include/x86_64-linux-gnu/asm/unistd_32.h
*/
mov eax, 0x00000004 /* set the eax register to 0x4 which is the value for the write syscall */
/*
mov is an opcode mnemonic that instructs the processor to copy a value
In this instance mov is used to copy the immediate (hardcoded) value 0x00000004 to the eax register
Although 0x00000004 is written out in full, 0x4 or just 4 in decimal would produce the same value
When copying an immediate value to a 32-bit register, the machine code for mov is b8 + the 3-bit register bitmask
The register masks for the GP registers are eax 000 (0x0), ecx 001 (0x1), edx 010 (0x2), and ebx 011 (0x3)
mov eax, 0x00000004 would be written in machine code as: b8 04 00 00 00
With little-endian data values, the two hex digits within each byte keep their normal left-to-right order, but the bytes themselves are stored in reverse order, least significant byte first
For example: mov eax, 0x12345678 would be written as: b8 78 56 34 12
*/
mov ebx, 0x00000001 /* ebx stores the first argument for write */
/*
This argument passes the file descriptor (fd) we want to write to
FD 1 is stdout in Linux which will write the output to the terminal
mov ebx, 0x00000001 will be written as bb 01 00 00 00
*/
lea ecx, message /* ecx stores the second argument for write */
/*
The Load Effective Address (lea) opcode loads the address of the message label into ecx
This is similar to using a pointer or reference rather than passing by value in C/C++
The machine opcode for lea is 8d; lea uses a 1-byte ModR/M bitmask to represent the addressing mode,
the destination register, and the source register or displacement mode
In this instance, the address of the message label is resolved and encoded in the instruction as a 32-bit displacement, which lea loads into ecx
The bitmask for this method of loading is:
bit       7 6 | 5 4 3 | 2 1 0
function  Mod | Reg   | R/M
For direct displacement addressing, both Mod bits are 0 and the R/M bits are 101
We know the ecx register mask is 001, so the complete bitmask would be:
0 0 0 0 1 1 0 1 which is 0x0d
Therefore, this operation would be stored in machine code as: 8d 0d xx xx xx xx where xx xx xx xx is the 4-byte little-endian address of message
*/
mov edx, 0x0000000d /* edx stores the third and final argument for write */
/*
This argument passes how many bytes we want to write, starting at the address passed in argument 2 with ecx
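0x0000000d is 13 in decimal, the length in bytes of "Hello World!\n"
The machine opcode for mov with a 32-bit immediate value is b8 and the bitmask for edx is 010 (0x2),
so b8 + 2 = ba and this instruction would be stored as ba 0d 00 00 00 in the binary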
*/
int 0x80 /* interrupt (int) 0x80 invokes the x86 syscall */
/*
This will execute the write syscall with the parameters that we loaded into registers ebx, ecx, and edx
The return value will be written back to eax
The interrupt machine code is 0xcd and the interrupt is 0x80, so this would be stored in the binary as cd 80
*/
print_hex_message:
/*
This writes the same message, but uses the hex_message label instead of the .ascii message label
*/
mov eax, 0x00000004
mov ebx, 0x00000001
lea ecx, hex_message
mov edx, 0x0000000d
int 0x80
exit_program:
/*
This invokes the exit syscall, which terminates the program and returns its exit code
*/
mov eax, 0x00000001 /* eax is set to 1 for an exit syscall */
/*
/usr/include/sysexits.h defines system exit codes
If the program is terminated by a signal, the shell typically reports an exit status of 128 + the signal number
/usr/include/asm-generic/signal.h defines the signals
*/
mov ebx, 0x00000000 /* ebx is set to our desired exit code - 0 is a successful exit */
int 0x80 /* invoke syscall */
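As a quick sanity check on the data section above, the following Python sketch (not part of the original program) confirms that the bytes listed under hex_message match what .ascii produces for message, and that the length loaded into edx equals the byte count. The variable names are chosen here for illustration only.

```python
# Bytes listed after the hex_message label in the .data section.
hex_message = bytes([0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57,
                     0x6f, 0x72, 0x6c, 0x64, 0x21, 0x0a])

# The bytes the assembler emits for: .ascii "Hello World!\n"
message = "Hello World!\n".encode("ascii")

# Both labels refer to identical byte sequences.
same_bytes = hex_message == message

# The value moved into edx (0x0000000d) should equal the byte count.
length = len(message)

print(same_bytes, hex(length))
```

Because .ascii does not append a terminating zero byte, the count is exactly the 13 visible characters including the newline.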
These are links to the syscall tables referenced in the code comments:
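Google syscall reference: https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md#x86-32_bit
W3 syscall reference: https://syscalls.w3challs.com/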
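The instruction encodings worked out in the code comments can be reproduced with a short Python sketch. This is only an illustration of the arithmetic, not an assembler: the function names are invented here, only the four registers named in the comments are covered, and the address 0x0804a000 is a hypothetical placeholder for wherever the linker ends up placing message.

```python
import struct

# 3-bit register numbers as listed in the code comments.
REG = {"eax": 0b000, "ecx": 0b001, "edx": 0b010, "ebx": 0b011}


def encode_mov_imm32(reg, value):
    """mov reg, imm32: opcode b8 + register number, then the immediate
    as four little-endian bytes."""
    return bytes([0xB8 + REG[reg]]) + struct.pack("<I", value)


def modrm(mod, reg, rm):
    """Pack the Mod (bits 7-6), Reg (bits 5-3) and R/M (bits 2-0) fields."""
    return (mod << 6) | (reg << 3) | rm


def encode_lea_disp32(reg, address):
    """lea reg, [disp32]: opcode 8d, ModR/M with Mod=00 and R/M=101,
    then the address as four little-endian bytes."""
    return bytes([0x8D, modrm(0b00, REG[reg], 0b101)]) + struct.pack("<I", address)


def encode_int(vector):
    """int imm8: opcode cd followed by the interrupt number."""
    return bytes([0xCD, vector])


def show(data):
    """Format bytes the way the comments write machine code."""
    return " ".join(f"{b:02x}" for b in data)


# Hypothetical address for the message label (assumption for illustration).
MESSAGE_ADDR = 0x0804A000

print(show(encode_mov_imm32("eax", 0x00000004)))    # b8 04 00 00 00
print(show(encode_mov_imm32("ebx", 0x00000001)))    # bb 01 00 00 00
print(show(encode_lea_disp32("ecx", MESSAGE_ADDR)))  # 8d 0d 00 a0 04 08
print(show(encode_mov_imm32("edx", 0x0000000D)))    # ba 0d 00 00 00
print(show(encode_int(0x80)))                       # cd 80
print(show(encode_mov_imm32("eax", 0x12345678)))    # b8 78 56 34 12
```

The last line shows the little-endian reversal most clearly: the bytes 12 34 56 78 appear in the opposite order, while the digits inside each byte are unchanged.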