Welcome to the Multi-architecture Assembly Tutorial

x86 Assembly Introduction

The intent of this section is to provide foundational knowledge which will be used for x86 assembly, as well as other architectures. You will install and configure your tools, write an x86 assembly program, assemble it, link it, run it, debug it, and understand every single byte contained in its code.

Tool Installation

This tutorial assumes you are running a Debian-based distribution of Linux such as Ubuntu with a processor that has the x86_64 architecture. While you can certainly use other configurations, this tutorial provides instructions using the GNU Assembler, GNU Debugger, and QEMU suite of emulation tools. As long as you can install the same tools, then you should be able to follow along.

For this section you will need to install gcc and gdb with:

sudo apt install gcc-12 gdb

Memory Types

There are multiple levels of data storage available in a computer, generally ordered from the largest and slowest to the smallest and fastest:

Cold storage such as magnetic or solid state flash drives:

Used for long term storage of data
Do not require electrical power to maintain data
Generally have the largest capacity but lowest seek and transfer speeds

Dynamic Random Access Memory (RAM or DRAM):

Stores data for use by the Operating System with currently running programs
Must have the data constantly refreshed with power (uses capacitors)
Faster seek and transfer speeds than cold storage, but lower capacity

Static Random Access Memory (SRAM or cache RAM):

Doesn't require constant refreshing
Will lose data with loss of voltage (uses transistors for storage)
Typically stores data for the CPU to operate on quickly and repeatedly
Typically organized in levels (L1, L2, L3) in decreasing proximity to the CPU die (but increasing capacity)

CPU Registers:

Provides the most immediate storage for the processor
Located directly in the CPU core (also transistors)
Smallest of all memory systems (sized based on the CPU architecture)

The original 8086 processor used 16-bit registers with four different function types:

General purpose registers, used for temporary data storage from other operations, addresses, variables, counters, etc.
Memory segment registers, used for managing segmented memory addressing (this is not used for modern RAM layouts which are flat)
Index and pointer registers, used to track memory locations for instructions, stack memory, and string operations
Flag register, used to hold state flags for operations

x86 Registers

Below is a breakdown of the CPU registers found in the 8086.

General Purpose x86 Registers:

ax bx cx dx (16-bit word registers)
ah al bh bl ch cl dh dl (high and low byte portions)

The GP registers aren't dedicated to any single function, but they do have common uses, including:

ax - accumulator register, typically stores results from other operations
bx - typically used to store the base address of memory for an array or an offset address
cx - typically used as a counter register
dx - data register, typically used to extend ax for multiplication and division operations (it holds the high word of a product or dividend, and the remainder after a division)

x86 Memory Segment Registers:

cs (code segment)
ds (data segment)
ss (stack segment)
es (extra segment)

The segment registers were used to address more than 64 KB of memory in the 8086 and 80286:

cs - held the base address of the current code segment
ds - held the base address of the current data segment
ss - held the base address of the current stack segment
es - held the base address of any additional data segment

These registers are no longer used for memory segmentation, since modern operating systems on x86 CPUs use a flat memory model.
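For historical context, the way a 16-bit segment register extended the address range of the 8086 can be sketched in a few lines of Python. The function name real_mode_address and the sample values are made up for this illustration; only the shift-by-four rule and the 20-bit address width come from the 8086 design.

```python
def real_mode_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset as the 8086 did in real mode."""
    # The segment value is shifted left by 4 bits (multiplied by 16) and the
    # offset is added. The 8086 had 20 address lines, so the result wraps at 1 MiB.
    return ((segment << 4) + offset) & 0xFFFFF


# Hypothetical example values: segment 0x1234 with offset 0x0010
print(hex(real_mode_address(0x1234, 0x0010)))  # 0x12350
```

Because each segment value moves the base address by only 16 bytes, many different segment and offset pairs refer to the same physical address.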

Pointers and Index Registers:

bp (base pointer)
sp (stack pointer)

sp - always stores the address of the newest element in the stack; this will be the lowest memory address in the stack
bp - stores the base address of the current function's stack frame; when multiple functions are nested, the calling function's base pointer is conventionally saved at this address

The stack will be examined in detail when we explore functions.

ip (instruction pointer)
si (source index)
di (destination index)

ip - stores the address of the next instruction that should be executed, this is an essential register
si - typically used to store a memory location for string or memory operations, the address where the value is copied from
di - typically used to store a memory location for string or memory operations, the address where the value is copied to

Flag register:

16-bit register that stores states for arithmetic logic unit (ALU) operations
Contains a bitmap showing what flags are set

Bit (hex)  0F 0E 0D 0C 0B 0A 09 08 07 06 05 04 03 02 01 00
Flag       00 00 00 00 OF DF IF TF SF ZF 00 AF 00 PF 00 CF

00: Not used or reserved
OF: Overflow Flag, indicates whether (1) or not (0) an overflow occurred during a math operation
DF: Direction Flag, indicates the direction that strings should be processed in memory, lower to higher memory (0) or higher to lower (1)
IF: Interrupt Flag, indicates whether (1) or not (0) maskable (optional) hardware interrupts should be processed
TF: Trap Flag, indicates whether (1) or not (0) instructions should be executed a single step at a time
SF: Sign Flag, indicates the sign of a number for signed operations, negative (1), or positive (0)
ZF: Zero Flag, indicates whether the result of an operation is zero (1) or not zero (0)
AF: Auxiliary Flag, indicates whether there is a carry or borrow (1) or not (0) between the low nibble and high nibble of the low byte of a result
PF: Parity Flag, indicates whether there is an even (1) or odd (0) number of set bits in the low byte of a result
CF: Carry Flag, indicates whether (1) or not (0) there is a value that is carried or borrowed by math operations
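To make the bitmap concrete, here is a minimal Python sketch that decodes a flag register value into the names of the flags that are set. The names FLAG_BITS and decode_flags are invented for this example. The value 0x202 is the eflags value GDB reports later in this section; bit 1 is a reserved bit that always reads as 1, which is why the value is 0x202 rather than 0x200.

```python
# Bit positions of the 8086 flags described above (reserved bits are omitted).
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
    "TF": 8, "IF": 9, "DF": 10, "OF": 11,
}


def decode_flags(value):
    """Return the names of the flags set in a flag register value, lowest bit first."""
    ordered = sorted(FLAG_BITS.items(), key=lambda item: item[1])
    return [name for name, bit in ordered if value & (1 << bit)]


print(decode_flags(0x202))  # ['IF']
print(decode_flags(0x246))  # ['PF', 'ZF', 'IF']
```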

A complete reference to the x86 architecture, the Intel 64 and IA-32 Architectures Software Developer's Manual, can be downloaded from Intel.

Extended Registers

The 80386 processor extended the registers from the 16 bits used by the 80286 and 8086 to 32 bits. An "e" was added to the register names to indicate that they are used for the extended 32-bit operations.

GP Registers:

32-bit (dword/long)    16-bit (word) (8086)    8-bit (byte) (8086)
eax                    ax                      ah al
ebx                    bx                      bh bl
ecx                    cx                      ch cl
edx                    dx                      dh dl

Pointer and Index Registers:

32-bit    16-bit
esp       sp
ebp       bp
esi       si
edi       di
eip       ip
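The smaller register names are not separate storage; they are views onto parts of the extended register. The Python sketch below illustrates this with bit masks. The function name sub_registers and the sample value 0x12345678 are chosen only for this example.

```python
def sub_registers(eax):
    """Show which bits of a 32-bit eax value are visible through ax, ah, and al."""
    return {
        "eax": eax & 0xFFFFFFFF,  # all 32 bits
        "ax": eax & 0xFFFF,       # low 16 bits
        "ah": (eax >> 8) & 0xFF,  # high byte of ax (bits 8-15)
        "al": eax & 0xFF,         # low byte of ax (bits 0-7)
    }


for name, value in sub_registers(0x12345678).items():
    print(name, hex(value))
```

With the sample value, ax is 0x5678, ah is 0x56, and al is 0x78. The upper 16 bits of eax have no register name of their own.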

Segment Registers:
The 80386 also added two additional segment registers.

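fs - an additional segment register (the letter simply follows es; it is sometimes informally expanded as "frame segment"), used in x86 for managing protected mode segmented memory
gs - an additional segment register (sometimes informally expanded as "general segment"), used for x86 protected memory; fs and gs are both still used on x86_64 for special purpose OS tasks such as thread-local storage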

Extended Flags (EFLAGS):
The 80386 also extended the x86 flag register to 32 bits

Bit (hex)  1F 1E 1D 1C 1B 1A 19 18 17 16 15 14 13 12 11 10
Flag       00 00 00 00 00 00 00 00 00 00 00 00 00 00 VM RF

Bit (hex)  0F 0E 0D 0C 0B 0A 09 08 07 06 05 04 03 02 01 00
Flag       00 NT PL PL OF DF IF TF SF ZF 00 AF 00 PF 00 CF

00: Not used or reserved
PL: I/O privilege level (IOPL), a 2-bit field holding the privilege level from 0-3 that code needs in order to execute I/O instructions (system defined)
NT: Nested task flag, indicates whether (1) or not (0) the current task is nested within a previously executed task (system defined)
RF: Resume flag, indicates whether (1) or not (0) execution should resume after a debug exception
VM: Virtual mode flag, indicates whether (1) or not (0) commands should be executed in virtual-8086 mode

Assembly Source Code

Assembly source code is processed by an assembler, which converts it into machine code that the target CPU architecture understands and writes it to a file format that the OS is able to execute.

The assembler:

Translates mnemonic opcodes into machine code
Resolves symbolic addresses and labels to actual memory addresses
Calculates relative offsets for branching instructions (calls, jumps, etc.)
Processes assembler directives that define data, set memory alignment, and specify output file sections
Generates output files that can be linked together to create an executable program

The assembler follows this order when processing assembly source files:

Process assembler directives
Assign memory addresses to labels
Calculate the size and layout of the program in memory
Translate opcodes to machine code
Resolve symbolic addresses
Generate an output file

Throughout this course we will be using the GNU Assembler (GAS). GAS has many directives, but some of the more common ones include:

.align n: Align the next data item on an n-byte boundary (this is the meaning on x86 ELF targets; on some other targets n is interpreted as a power of two).
.ascii "string": Store the string in memory without a null terminator.
.asciz "string": Store the string in memory with a null terminator.
.balign n: Align the next data item on an n-byte boundary on every target, unlike .align, whose argument some targets interpret as a power of two.
.byte n1, n2, ...: Store a sequence of 8-bit bytes in memory.
.comm symbol, length: Declare a common block of the specified length for symbol.
.data: Switch to the data section for subsequent data items.
.equ symbol, expression: Set the value of a symbol to a constant expression.
.fill repeat, size, value: Generate a block of data with the specified size, repeat times, initialized to the given value.
.globl symbol: Mark a symbol as global, making it accessible by other object files during the linking process.
.local symbol: Mark a symbol as local, meaning it will not be accessible by other object files.
.long n1, n2, ...: Store a sequence of 32-bit integers in memory.
.org new_location: Set the assembly location counter to the specified new_location.
.section name, flags: Switch to a named section with the specified flags.
.short n1, n2, ...: Store a sequence of 16-bit integers in memory.
.size symbol, expression: Set the size of a symbol to the given expression.
.space n: Insert n bytes of zero-initialized space into the output.
.string "string": Same as .asciz, store the string in memory with a null terminator.
.text: Switch to the text section for subsequent instructions.
.word n1, n2, ...: Store a sequence of 16-bit or 32-bit integers in memory, depending on the target architecture.
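As a rough illustration of what a few of these data directives emit, the Python sketch below builds the same byte sequences by hand. The helper names are invented for this example, and it assumes a little endian target such as x86; it is a model of the output bytes, not of how GAS is implemented.

```python
import struct


def ascii_directive(text):
    """.ascii: the characters only, no terminator."""
    return text.encode("ascii")


def asciz_directive(text):
    """.asciz / .string: the characters followed by a null byte."""
    return text.encode("ascii") + b"\x00"


def short_directive(*values):
    """.short: each value as a 16-bit little endian integer."""
    return b"".join(struct.pack("<H", v) for v in values)


def long_directive(*values):
    """.long: each value as a 32-bit little endian integer."""
    return b"".join(struct.pack("<I", v) for v in values)


print(ascii_directive("Hi").hex(" "))       # 48 69
print(asciz_directive("Hi").hex(" "))       # 48 69 00
print(short_directive(1, 2).hex(" "))       # 01 00 02 00
print(long_directive(0x12345678).hex(" "))  # 78 56 34 12
```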

Hello World

It is time to write some code. The following code was written for GAS to demonstrate a basic Hello World program for x86. It was written with the Intel syntax (AT&T is the default for GAS). You can copy this code into your favorite editor and read the very thorough comments to get a complete breakdown of the code.

.intel_syntax noprefix
    /* Directive for GAS (GNU Assembler) to use Intel syntax instead of AT&T for x86 */

.section .data
    /* The .section directive is used to define a new section or switch to an
       existing section in the object file */
    /* The .data section is used to store static variables that are stored in
       memory for the entire duration of a program's execution.
       - Values for variables in this section are stored in the program binary
       - Variables are initialized with their stored values at run time
       - The .data section is writable in memory, so values here can be changed
         during execution */

message:
    /* A colon after a name is used to define a label */
    /* Labels are user-defined names given to memory addresses
       When this code is assembled, the message label will be replaced with its
       memory address in the data section */
    .ascii "Hello World!\n"
    /* The .ascii directive lets the assembler know that the following data
       should be interpreted as an ASCII string */
    /* This value could be defined using the actual hex bytes, but it is easier
       to let the assembler convert the ASCII string to the byte values */

hex_message:
    .byte 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 0x72, 0x6c, 0x64, 0x21, 0x0a
    /* This stores the same message without using the .ascii directive as a shortcut
       Instead it defines a list of bytes to be stored in the .data section of memory
       These bytes have the same value as .ascii "Hello World!\n" */

.section .text
    /* The .text section stores the executable instructions of the program */
    /* The text section is loaded as read-only */

.global _start
    /* The .global directive defines a global scope for a label */
    /* _start is a special label: the GNU linker uses it by default as the entry
       point where execution of the program begins
       The .global directive indicates that the label _start should be accessible
       to other object files outside of the object file generated from this
       assembly source */

_start:
    /* This is the label for the memory address for the beginning of our code */

print_message:
    /* Print the message stored in the .data section using a write syscall
       Linux syscalls for x86 can be referenced from these sources:
       https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md#x86-32_bit
       https://syscalls.w3challs.com/
       x86 header file definitions for Linux syscalls can typically be found in:
       /usr/include/x86_64-linux-gnu/asm/unistd_32.h */

    mov eax, 0x00000004
    /* set the eax register to 0x4 which is the value for the write syscall */
    /* mov is an opcode mnemonic that instructs the processor to copy a value
       In this instance mov is used to copy the immediate (hardcoded) value
       0x00000004 to the eax register
       While 0x00000004 is used here, 0x4 or just 4 in decimal would write the
       same value
       When copying an immediate value to a 32-bit register, the machine code for
       mov is b8 + the 3-bit register bitmask
       The register masks for the GP registers are eax 000 (0x0), ecx 001 (0x1),
       edx 010 (0x2), and ebx 011 (0x3)
       mov eax, 0x00000004 would be written in machine code as: b8 04 00 00 00
       With little endian data values, the bits within each byte are read from
       left to right, but the bytes themselves are stored in reverse order (least
       significant byte first)
       For example: mov eax, 0x12345678 would be written as: b8 78 56 34 12 */

    mov ebx, 0x00000001
    /* ebx stores the first argument for write */
    /* This argument passes the file descriptor (fd) we want to write to
       FD 1 is stdout in Linux which will write the output to the terminal
       mov ebx, 0x00000001 will be written as bb 01 00 00 00 */

    lea ecx, message
    /* ecx stores the second argument for write */
    /* The Load Effective Address (lea) opcode loads the address for the message
       label into ecx
       This is similar to using a pointer or reference rather than passing by
       value in C/C++
       The machine opcode for lea is 8d
       lea uses a 1 byte ModR/M bitmask to represent the addressing mode, the
       destination register, and the source register or displacement mode
       In this instance, the address of the message label is resolved and loaded
       directly into ecx as a 32-bit displacement
       The bitmask for this method of loading is:

           bit       7 6 | 5 4 3 | 2 1 0
           function  Mod | Reg   | R/M

       For direct displacement addressing both Mod bits are 0, and the R/M bits
       are 101
       We know the ecx register mask is 001, so the complete bitmask would be:
       0 0 0 0 1 1 0 1 which is 0x0d
       Therefore, this operation would be stored in machine code as:
       8d 0d xx xx xx xx where xx xx xx xx is the address for message in little
       endian byte order */

    mov edx, 0x0000000d
    /* edx stores the third and final argument for write */
    /* This argument passes how many bytes we want to write, starting at the
       address passed in argument 2 with ecx
       The machine opcode for mov for immediate values is b8 and the bitmask for
       edx is 010, so this instruction would be stored as ba 0d 00 00 00 in the
       binary */

    int 0x80
    /* interrupt (int) 0x80 invokes the x86 syscall */
    /* This will execute the write syscall with the parameters that we loaded
       into registers ebx, ecx, and edx
       The return value will be written back to eax
       The interrupt machine code is 0xcd and the interrupt is 0x80, so this would
       be stored in the binary as cd 80 */

print_hex_message:
    /* This writes the same message, but uses the hex_message label instead of
       the .ascii message label */
    mov eax, 0x00000004
    mov ebx, 0x00000001
    lea ecx, hex_message
    mov edx, 0x0000000d
    int 0x80

exit_program:
    /* This invokes the exit syscall which returns the exit code for the program */
    mov eax, 0x00000001
    /* eax is set to 1 for an exit syscall */
    /* /usr/include/sysexits.h defines system exit codes
       If the program is terminated with a signal, then it will return 128 + SIGNAL
       /usr/include/asm-generic/signal.h defines the signals */
    mov ebx, 0x00000000
    /* ebx is set to our desired exit code - 0 is a successful exit */
    int 0x80
    /* invoke syscall */
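The encoding rules described in the comments can be checked with a small Python sketch that builds the machine code bytes by hand. The helper names (mov_imm32, lea_abs32, int_imm8) are invented for this illustration and cover only the three instruction forms used in this program. The address 0x0804a000 is the one the linker assigns to message in the GDB session later in this section; a different toolchain may place it elsewhere.

```python
import struct

# 3-bit register numbers from the code comments
REG = {"eax": 0b000, "ecx": 0b001, "edx": 0b010, "ebx": 0b011}


def mov_imm32(reg, value):
    """mov r32, imm32: opcode b8 + register number, then the value little endian."""
    return bytes([0xB8 + REG[reg]]) + struct.pack("<I", value)


def lea_abs32(reg, address):
    """lea r32, [disp32]: opcode 8d, ModR/M with Mod=00, Reg=register, R/M=101."""
    modrm = (0b00 << 6) | (REG[reg] << 3) | 0b101
    return bytes([0x8D, modrm]) + struct.pack("<I", address)


def int_imm8(vector):
    """int imm8: opcode cd followed by the interrupt number."""
    return bytes([0xCD, vector])


# The five instructions under print_message, assuming message is at 0x0804a000
code = (
    mov_imm32("eax", 0x4)
    + mov_imm32("ebx", 0x1)
    + lea_abs32("ecx", 0x0804A000)
    + mov_imm32("edx", 0xD)
    + int_imm8(0x80)
)
print(code.hex(" "))
# b8 04 00 00 00 bb 01 00 00 00 8d 0d 00 a0 04 08 ba 0d 00 00 00 cd 80
```

These 23 bytes can be compared against the start of the memory dump in Exercise 4.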

These are links to the syscall tables referenced in the code comments:
Google syscall reference
W3 syscall reference

Assemble, Link, and Run

Once you have copied the code and have thoroughly read through the comments, it is time to make it executable. The GNU Assembler for x86_64, which was installed alongside the gcc-12 package (it is part of binutils, a dependency of gcc), can be used to assemble the source code into an object file. To assemble the source, navigate to the directory where the source is saved and enter the command:

x86_64-linux-gnu-as --32 -o hello_x86.o hello_x86.asm

This command assumes that you saved the source file as hello_x86.asm. It directs GAS to create a 32-bit object file named hello_x86.o and to use the hello_x86.asm file as an input source.

Once the object file is created, we can use the GNU linker to create the final executable binary. Enter the command:

x86_64-linux-gnu-ld -m elf_i386 -o hello_x86 hello_x86.o

The -m elf_i386 parameter directs the linker to create a legacy i386 32-bit binary in Executable and Linkable Format (ELF). The -o parameter specifies the output file as hello_x86, and hello_x86.o is the only object file to be linked into the binary.

Once the executable is created, it can be run with:

./hello_x86

It should produce the output:

Hello World!
Hello World!

You can check the return value of the program immediately after it exits with:

echo $?
0

Debugging in GDB

GDB is a powerful text-based debugger that is run from the terminal. We will use it extensively throughout this tutorial, and start by examining the hello_x86 binary we just created. To begin, invoke the gdb debugger with:
gdb hello_x86

pete@XPS:~/Documents/ASM/hello_world/x86$ gdb hello_x86
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: .
Find the GDB manual and other documentation resources online at: .

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from hello_x86...
(No debugging symbols found in hello_x86)
(gdb)

We are greeted with the gdb prompt.

We wrote our assembly using the Intel syntax, so we will switch our disassembly output to that syntax with:

set disassembly-flavor intel

We want to view the disassembly code and the CPU registers while we are running our program. We can enable both views with:

lay asm
lay reg

Your terminal window should now look similar to this:

|--------------------------------------------------|
|                                                  |
|         [ Register Values Unavailable ]          |
|                                                  |
|--------------------------------------------------|
|                                                  |
|            [ No Assembly Available ]             |
|                                                  |
|--------------------------------------------------|
exec No process In:                   L??   PC: ??
(gdb) lay reg
(gdb)

Note: If at any time the screen output appears corrupted, you can enter ctrl + l to redraw the screen.
You can use ctrl + x then o or p to switch between the assembly, register, and gdb command frames.

We will now set a breakpoint for debugging our code by entering:

break _start

This will set a breakpoint at the beginning of the _start label.

We can now begin our program execution by entering:

run

Your terminal should now look similar to the output below:

Our program is now running inside the debugger. We can see that our breakpoint was set at the beginning of the _start label, which is at memory address 0x08049000. The first instruction at that address should be highlighted in the assembly frame, and if we look at the register group, we can see that our eip register has the address of the next instruction, which will mov 0x4 into eax.

We will now execute a single instruction by entering:

si

The first instruction was executed, and looking at the registers, we can see that eax now holds the value 0x4, and eip holds the address of the next instruction at 0x08049005. This is also highlighted in our assembly frame and pointed to with the > symbol. The <_start+5> tag indicates that this memory location is offset 5 bytes from the beginning of our _start label, which means our first instruction was 5 bytes long. Our gdb command window indicates we are at memory address 0x08049005 in the _start label.
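The same reasoning gives the length of every instruction: subtract each instruction's address from the address of the one that follows it. A short Python sketch, using the first six instruction addresses that appear in the disassembly frame shown later in this section (the function name instruction_lengths is made up for this example):

```python
def instruction_lengths(addresses):
    """Length of each instruction, from the addresses of consecutive instructions."""
    return [nxt - cur for cur, nxt in zip(addresses, addresses[1:])]


# Addresses of the instructions from _start up to print_hex_message
addresses = [0x08049000, 0x08049005, 0x0804900A, 0x08049010, 0x08049015, 0x08049017]
print(instruction_lengths(addresses))  # [5, 5, 6, 5, 2]
```

The three mov instructions are 5 bytes each (1 opcode byte plus a 4-byte immediate), the lea is 6 bytes (opcode, ModR/M byte, and a 4-byte address), and int 0x80 is 2 bytes.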

Enter si to execute the next instruction:

Notice that ebx has been set to 0x1 and the eip register has been updated again to point to the next instruction.

Enter si again:

-Register group: general---------------------------------------------------------------------
eax     0x4       4              ecx   0x804a000    134520832      edx   0x0         0
ebx     0x1       1              esp   0xffffd950   0xffffd950     ebp   0x0         0x0
esi     0x0       0              edi   0x0          0              eip   0x8049010   0x8049010 <_start+16
eflags  0x202     [ IF ]         cs    0x23         35             ss    0x2b        43
ds      0x2b      43             es    0x2b         43             fs    0x0         0
gs      0x0       0              k0    0x0          0              k1    0x0         0
k2      0x0       0              k3    0x0          0              k4    0x0         0
k5      0x0       0              k6    0x0          0              k7    0x0         0
---------------------------------------------------------------------------------------------
B+  0x8049000 <_start>                  mov    eax,0x4
    0x8049005 <_start+5>                mov    ebx,0x1
    0x804900a <_start+10>               lea    ecx,ds:0x804a000
  > 0x8049010 <_start+16>               mov    edx,0xd
    0x8049015 <_start+21>               int    0x80
    0x8049017 <print_hex_message>       mov    eax,0x4
    0x804901c <print_hex_message+5>     mov    ebx,0x1
    0x8049021 <print_hex_message+10>    lea    ecx,ds:0x804a00d
    0x8049027 <print_hex_message+16>    mov    edx,0xd
    0x804902c <print_hex_message+21>    int    0x80
    0x804902e <exit_program>            mov    eax,0x1
    0x8049033 <exit_program+5>          mov    ebx,0x0
    0x8049038 <exit_program+10>         int    0x80
    0x804903a                           add    BYTE PTR [eax],al
    0x804903c                           add    BYTE PTR [eax],al
    0x804903e                           add    BYTE PTR [eax],al
    0x8049040                           add    BYTE PTR [eax],al
    0x8049042                           add    BYTE PTR [eax],al
    0x8049044                           add    BYTE PTR [eax],al
    0x8049046                           add    BYTE PTR [eax],al
---------------------------------------------------------------------------------------------
native process 84922 In: _start                                   L??   PC: 0x8049010
(gdb) lay reg
(gdb) break _start
Breakpoint 1 at 0x8049000
(gdb) run
Starting program: /home/pete/Documents/ASM/hello_world/x86/hello_x86

Breakpoint 1, 0x08049000 in _start ()
(gdb) si
0x08049005 in _start ()
(gdb) si
0x0804900a in _start ()
(gdb) si
0x08049010 in _start ()
(gdb)

Now we can see our ecx register has been loaded with a memory address. This should be the memory address of our message label.

We can view a list of variable labels in our program by entering the command:

info variables

Which outputs:

(gdb) info variables
All defined variables:

Non-debugging symbols:
0x0804a000  message
0x0804a00d  hex_message
0x0804a01a  __bss_start
0x0804a01a  _edata
0x0804a01c  _end
(gdb)

We can see that the address for our message matches what is loaded into the ecx register. Let's look at the actual data stored at this address. To view the data, we need to let gdb know how many bytes we want to view. We can see that message starts at 0x0804a000, and hex_message is the next label at 0x0804a00d, which means message should be 0xd (or 13) bytes long.

To view 13 bytes starting at the message label, enter:

x /13xb 0x0804a000

Which outputs:

(gdb) x /13xb 0x0804a000
0x804a000:  0x48  0x65  0x6c  0x6c  0x6f  0x20  0x57  0x6f
0x804a008:  0x72  0x6c  0x64  0x21  0x0a

This shows us the 13 bytes of data stored at 0x804a000 in hexadecimal, but since we know that the data contains ASCII characters, let's output the data in character format.

Enter:

x /13cb 0x0804a000

This outputs:

(gdb) x /13cb 0x0804a000
0x804a000:  72 'H'   101 'e'  108 'l'  108 'l'  111 'o'  32 ' '   87 'W'   111 'o'
0x804a008:  114 'r'  108 'l'  100 'd'  33 '!'   10 '\n'
(gdb)

We can easily recognize our "Hello World!\n" message.
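The same conversion from hex bytes to characters can be reproduced outside the debugger. This Python sketch takes the thirteen byte values from the x /13xb output above and decodes them as ASCII; the function name dump_to_text is invented for this example.

```python
def dump_to_text(dump):
    """Convert space-separated hex byte values (as printed by gdb) to an ASCII string."""
    return bytes(int(token, 16) for token in dump.split()).decode("ascii")


dump = "0x48 0x65 0x6c 0x6c 0x6f 0x20 0x57 0x6f 0x72 0x6c 0x64 0x21 0x0a"
print(repr(dump_to_text(dump)))  # 'Hello World!\n'
```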

Now let's view the contents of the entire .data section of our program. To do so, we need to find the memory range that it occupies.

To view the memory ranges for our program's sections, enter:

info file

This outputs:

Symbols from "/home/pete/Documents/ASM/hello_world/x86/hello_x86".
Native process:
        Using the running image of child process 84922.
        While running this, GDB does not access memory from...
Local exec file:
        `/home/pete/Documents/ASM/hello_world/x86/hello_x86', file type elf32-i386.
        Entry point: 0x8049000
        0x08049000 - 0x0804903a is .text
        0x0804a000 - 0x0804a01a is .data
        0xf7ffc0b4 - 0xf7ffc0f4 is .hash in system-supplied DSO at 0xf7ffc000
        0xf7ffc0f4 - 0xf7ffc140 is .gnu.hash in system-supplied DSO at 0xf7ffc000
        0xf7ffc140 - 0xf7ffc1f0 is .dynsym in system-supplied DSO at 0xf7ffc000
        0xf7ffc1f0 - 0xf7ffc2b0 is .dynstr in system-supplied DSO at 0xf7ffc000
        0xf7ffc2b0 - 0xf7ffc2c6 is .gnu.version in system-supplied DSO at 0xf7ffc000
        0xf7ffc2c8 - 0xf7ffc31c is .gnu.version_d in system-supplied DSO at 0xf7ffc000
        0xf7ffc31c - 0xf7ffc3ac is .dynamic in system-supplied DSO at 0xf7ffc000
        0xf7ffc3ac - 0xf7ffc3b8 is .rodata in system-supplied DSO at 0xf7ffc000
        0xf7ffc3b8 - 0xf7ffc40c is .note in system-supplied DSO at 0xf7ffc000
        0xf7ffc40c - 0xf7ffc430 is .eh_frame_hdr in system-supplied DSO at 0xf7ffc000
        0xf7ffc430 - 0xf7ffc53c is .eh_frame in system-supplied DSO at 0xf7ffc000
        0xf7ffc540 - 0xf7ffd262 is .text in system-supplied DSO at 0xf7ffc000
--Type for more, q to quit, c to continue without paging--

The .text and .data entries are the two entries we care about right now. The entries listed below them are sections of a system-supplied Dynamic Shared Object (DSO) that we are not currently interested in. Looking at the entry for our .data section, we can see that it occupies memory addresses 0x0804a000 - 0x0804a01a.

We can have gdb calculate the size of the .data section for us by entering the command:

print (0x0804a01a - 0x0804a000)

This subtracts the starting address of .data from the ending address and prints the result, which should be 26 bytes.
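The same address arithmetic also gives the size of each label's data, since each label's data runs up to the next label or to the end of the section. A minimal Python sketch, using the addresses reported by info variables and info file above (the function name label_sizes is made up for this example):

```python
def label_sizes(labels, section_end):
    """Size of the data at each label: the distance to the next label or section end."""
    ordered = sorted(labels.items(), key=lambda item: item[1])
    ends = [address for _, address in ordered[1:]] + [section_end]
    return {name: end - start for (name, start), end in zip(ordered, ends)}


data_start, data_end = 0x0804A000, 0x0804A01A
labels = {"message": 0x0804A000, "hex_message": 0x0804A00D}

print(data_end - data_start)          # 26
print(label_sizes(labels, data_end))  # {'message': 13, 'hex_message': 13}
```

The two 13-byte messages account for all 26 bytes of the section.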

To view all 26 bytes stored in the .data section, enter:

x /26xb 0x0804a000

This outputs:

(gdb) x /26xb 0x0804a000
0x804a000:  0x48  0x65  0x6c  0x6c  0x6f  0x20  0x57  0x6f
0x804a008:  0x72  0x6c  0x64  0x21  0x0a  0x48  0x65  0x6c
0x804a010:  0x6c  0x6f  0x20  0x57  0x6f  0x72  0x6c  0x64
0x804a018:  0x21  0x0a
(gdb)

Again, since we know this should contain ASCII characters, we can output it in character format with:

x /26cb 0x0804a000

Which outputs:

0x804a000:  72 'H'   101 'e'  108 'l'  108 'l'  111 'o'  32 ' '   87 'W'   111 'o'
0x804a008:  114 'r'  108 'l'  100 'd'  33 '!'   10 '\n'  72 'H'   101 'e'  108 'l'
0x804a010:  108 'l'  111 'o'  32 ' '   87 'W'   111 'o'  114 'r'  108 'l'  100 'd'
0x804a018:  33 '!'   10 '\n'
(gdb)

This shows the data for both our message label which starts at 0x804a000, and our hex_message label at 0x804a00d.

Let's continue to step through our program execution by entering si:

-Register group: general---------------------------------------------------------------------
eax     0x4          4                       ecx     0x804a000    134520832
edx     0xd          13                      ebx     0x1          1
esp     0xffffd950   0xffffd950              ebp     0x0          0x0
esi     0x0          0                       edi     0x0          0
eip     0x8049015    0x8049015 <_start+21>   eflags  0x202        [ IF ]
cs      0x23         35                      ss      0x2b         43
ds      0x2b         43                      es      0x2b         43
fs      0x0          0                       gs      0x0          0
k0      0x0          0                       k1      0x0          0
k2      0x0          0                       k3      0x0          0
k4      0x0          0                       k5      0x0          0
k6      0x0          0                       k7      0x0          0
---------------------------------------------------------------------------------------------
B+  0x8049000 <_start>                  mov    eax,0x4
    0x8049005 <_start+5>                mov    ebx,0x1
    0x804900a <_start+10>               lea    ecx,ds:0x804a000
    0x8049010 <_start+16>               mov    edx,0xd
  > 0x8049015 <_start+21>               int    0x80
    0x8049017 <print_hex_message>       mov    eax,0x4
    0x804901c <print_hex_message+5>     mov    ebx,0x1
    0x8049021 <print_hex_message+10>    lea    ecx,ds:0x804a00d
    0x8049027 <print_hex_message+16>    mov    edx,0xd
    0x804902c <print_hex_message+21>    int    0x80
    0x804902e <exit_program>            mov    eax,0x1
    0x8049033 <exit_program+5>          mov    ebx,0x0
    0x8049038 <exit_program+10>         int    0x80
    0x804903a                           add    BYTE PTR [eax],al
    0x804903c                           add    BYTE PTR [eax],al
    0x804903e                           add    BYTE PTR [eax],al
    0x8049040                           add    BYTE PTR [eax],al
    0x8049042                           add    BYTE PTR [eax],al
    0x8049044                           add    BYTE PTR [eax],al
    0x8049046                           add    BYTE PTR [eax],al
---------------------------------------------------------------------------------------------
native process 84922 In: _start                                   L??   PC: 0x8049015
        0xf7ffc430 - 0xf7ffc53c is .eh_frame in system-supplied DSO at 0xf7ffc000
        0xf7ffc540 - 0xf7ffd262 is .text in system-supplied DSO at 0xf7ffc000
--Type for more, q to quit, c to continue without paging--
        0xf7ffd262 - 0xf7ffd2c2 is .altinstructions in system-supplied DSO at 0xf7ffc000
        0xf7ffd2c2 - 0xf7ffd2e2 is .altinstr_replacement in system-supplied DSO at 0xf7ffc000
(gdb) x /26b 0x0804a000
0x804a000:  72 'H'   101 'e'  108 'l'  108 'l'  111 'o'  32 ' '   87 'W'   111 'o'
0x804a008:  114 'r'  108 'l'  100 'd'  33 '!'   10 '\n'  72 'H'   101 'e'  108 'l'
0x804a010:  108 'l'  111 'o'  32 ' '   87 'W'   111 'o'  114 'r'  108 'l'  100 'd'
0x804a018:  33 '!'   10 '\n'
(gdb) x /26xb 0x0804a000
0x804a000:  0x48  0x65  0x6c  0x6c  0x6f  0x20  0x57  0x6f
0x804a008:  0x72  0x6c  0x64  0x21  0x0a  0x48  0x65  0x6c
0x804a010:  0x6c  0x6f  0x20  0x57  0x6f  0x72  0x6c  0x64
0x804a018:  0x21  0x0a
(gdb) x /26cb 0x0804a000
0x804a000:  72 'H'   101 'e'  108 'l'  108 'l'  111 'o'  32 ' '   87 'W'   111 'o'
0x804a008:  114 'r'  108 'l'  100 'd'  33 '!'   10 '\n'  72 'H'   101 'e'  108 'l'
0x804a010:  108 'l'  111 'o'  32 ' '   87 'W'   111 'o'  114 'r'  108 'l'  100 'd'
0x804a018:  33 '!'   10 '\n'
(gdb) si
0x08049015 in _start ()
(gdb)

The edx register has been set to 0xd now, which reflects the length of our message, whose address is stored in ecx.

Enter si again:

eax     0x4       4              ecx   0x804a000    134520832      edx   0xd         13
eax     0xd       13             esp   0xffffd950   0xffffd950     edx   0xd         13
esi     0x0       0              edi   0x0          0              eip   0x8049015   0x8049015 <_start+21
eflags  0x202     [ IF ]         cs    0x23         35             ss    0x2b 7      43 7 <print_hex
ds      0x2b      43             es    0x2b         43             fs    0x0         0
gs      0x0       0              k0    0x0          0              k1    0x0         0
k2      0x0       0              k3    0x0          0              k4    0x0         0
k5      0x0       0              k6    0x0          0              k7    0x0         0
---------------------------------------------------------------------------------------------
B+  0x8049000 <_start>                  mov    eax,0x4
    0x8049005 <_start+5>                mov    ebx,0x1
    0x804900a <_start+10>               lea    ecx,ds:0x804a000
    0x8049010 <_start+16>               mov    edx,0xd
  > 0x8049015 <_start+21>               int    0x80
    0x8049015 <_start+21>               int    0x800x4
  > 0x8049017 <print_hex_message>       mov    eax,0x4
    0x8049021 <print_hex_message+10>    lea    ecx,ds:0x804a00d
    0x8049027 <print_hex_message+16>    mov    edx,0xd
    0x804902c <print_hex_message+21>    int    0x80
    0x804902e <exit_program>            mov    eax,0x1
    0x8049033 <exit_program+5>          mov    ebx,0x0
    0x8049038 <exit_program+10>         int    0x80
    0x804903a                           add    BYTE PTR [eax],al
    0x804903c                           add    BYTE PTR [eax],al
    0x804903e                           add    BYTE PTR [eax],al
    0x8049040                           add    BYTE PTR [eax],al
    0x8049042                           add    BYTE PTR [eax],al
    0x8049044                           add    BYTE PTR [eax],al
    0x8049046                           add    BYTE PTR [eax],al
---------------------------------------------------------------------------------------------
native process 96349 In: _start                                   L??   PC: 0x8049015
        0xf7ffc31c - 0xf7print_hex_messagec in system-supplied DSO at 0xf7ffc000 7
        0xf7ffc3b8 - 0xf7ffc40c is .note in system-supplied DSO at 0xf7ffc000
        0xf7ffc40c - 0xf7ffc430 is .eh_frame_hdr in system-supplied DSO at 0xf7ffc000
        0xf7ffc430 - 0xf7ffc53c is .eh_frame in system-supplied DSO at 0xf7ffc000
        0xf7ffc540 - 0xf7ffd262 is .text in system-supplied DSO at 0xf7ffc000
--Type for more, q to quit, c to continue without paging--
        0xf7ffd262 - 0xf7ffd2c2 is .altinstructions in system-supplied DSO at 0xf7ffc000
        0xf7ffd2c2 - 0xf7ffd2e2 is .altinstr_replacement in system-supplied DSO at 0xf7ffc000
(gdb) x /26xb 0x0804a000
0x804a000:  0x48  0x65  0x6c  0x6c  0x6f  0x20  0x57  0x6f
0x804a008:  0x72  0x6c  0x64  0x21  0x0a  0x48  0x65  0x6c
0x804a010:  0x6c  0x6f  0x20  0x57  0x6f  0x72  0x6c  0x64
0x804a018:  0x21  0x0a
(gdb) x /26cb 0x0804a000
0x804a000:  72 'H'   101 'e'  108 'l'  108 'l'  111 'o'  32 ' '   87 'W'   111 'o'
0x804a008:  114 'r'  108 'l'  100 'd'  33 '!'   10 '\n'  72 'H'   101 'e'  108 'l'
0x804a010:  108 'l'  111 'o'  32 ' '   87 'W'   111 'o'  114 'r'  108 'l'  100 'd'
0x804a018:  33 '!'   10 '\n'
(gdb) si
0x08049015 in _start ()
(gdb) si
Hello World!
0x08049017 in print_hex_message ()
(gdb)

Our int 0x80 instruction was reached, and it invoked the write syscall, passing the parameters we set in the ebx, ecx, and edx registers. ebx was set to 0x1, for stdout, so our program wrote "Hello World!\n" to the terminal. Note that this output may be injected into the gdb command frame, which can corrupt the display output. Enter ctrl + l (lower case L) to redraw the screen and fix this. In my example, it appears that there are duplicate 0x8049015 instruction lines and a phantom 0x800x4 interrupt instruction. Redrawing the output corrects this.
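To make the relationship between the ecx (buffer address) and edx (byte count) arguments concrete, here is a minimal Python sketch. It copies the 26 bytes shown by the x /26xb command and slices them the way the two write syscalls would read them. The helper name write_buffer is hypothetical and only models the memory read; it does not perform a real syscall.

```python
# Bytes copied from the `x /26xb 0x0804a000` output shown above.
DATA_BASE = 0x0804A000
data = bytes([
    0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0x57, 0x6F,
    0x72, 0x6C, 0x64, 0x21, 0x0A, 0x48, 0x65, 0x6C,
    0x6C, 0x6F, 0x20, 0x57, 0x6F, 0x72, 0x6C, 0x64,
    0x21, 0x0A,
])


def write_buffer(buf_addr, count):
    """Hypothetical helper: return the `count` bytes starting at `buf_addr`,
    which is what the write syscall reads from memory."""
    offset = buf_addr - DATA_BASE
    return data[offset:offset + count]


# ecx and edx values taken from the _start disassembly
first_message = write_buffer(0x0804A000, 0x0D)
# ecx and edx values taken from the print_hex_message disassembly
second_message = write_buffer(0x0804A00D, 0x0D)

print(first_message)
print(second_message)
```

Both slices contain the same 13 bytes, which is why the program prints the message twice.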

GDB shows that we have reached the print_hex_message label which will execute the same steps as before to invoke a write syscall.
We can continue the program to completion by entering:

continue

Our gdb command window should display:

(gdb) continue
Continuing.
[Inferior 1 (process 96349) exited normally]
(gdb)

This indicates that our program has completed without error.

We can exit gdb by entering:

quit

Hello World Exercises

Exercise 1.

The GNU assembler can embed debugging symbols into object files. This can facilitate debugging your programs, and allows you to step through your source code when debugging.

Use the following commands to re-assemble and re-link the hello_x86 program:

x86_64-linux-gnu-as --32 -g -o hello_x86.o hello_x86.asm
x86_64-linux-gnu-ld -m elf_i386 -o hello_x86 hello_x86.o

Note: the -g switch enables gdb debugging symbols.
Open the executable with gdb and step through the source.

Exercise 2.

Examine a disassembly of the hello_x86.o object file by entering:

x86_64-linux-gnu-objdump -d -M intel hello_x86.o

What happened to the print_message label? Why doesn't it appear?
Why is there no address in the lea ecx instruction?

Exercise 3.

Examine the binary file's header info with:

readelf -h hello_x86

What are the magic byte(s)?
What are the other flags in the ELF header?

Exercise 4.

Load the hello_x86 executable in gdb and examine the machine code with the following instructions:

(gdb) info file
Symbols from "/home/pete/Documents/ASM/hello_world/x86/hello_x86".
Local exec file:
`/home/pete/Documents/ASM/hello_world/x86/hello_x86', file type elf32-i386.
Entry point: 0x8049000
0x08049000 - 0x0804903a is .text
0x0804a000 - 0x0804a01a is .data
(gdb) set $code_start = 0x08049000
(gdb) set $code_end = 0x0804903a
(gdb) print ($code_end - $code_start)
$1 = 58
(gdb) x /58xb $code_start
0x8049000 <_start>: 0xb8 0x04 0x00 0x00 0x00 0xbb 0x01 0x00
0x8049008 <_start+8>: 0x00 0x00 0x8d 0x0d 0x00 0xa0 0x04 0x08
0x8049010 <_start+16>: 0xba 0x0d 0x00 0x00 0x00 0xcd 0x80 0xb8
0x8049018 <print_hex_message+1>: 0x04 0x00 0x00 0x00 0xbb 0x01 0x00 0x00
0x8049020 <print_hex_message+9>: 0x00 0x8d 0x0d 0x0d 0xa0 0x04 0x08 0xba
0x8049028 <print_hex_message+17>: 0x0d 0x00 0x00 0x00 0xcd 0x80 0xb8 0x01
0x8049030 <exit_program+2>: 0x00 0x00 0x00 0xbb 0x00 0x00 0x00 0x00
0x8049038 <exit_program+10>: 0xcd 0x80
(gdb)

How many machine code instructions can you recognize?

Challenge Exercise:

Demonstrate your mastery of this section by re-writing the Hello World binary entirely in machine code. Write and execute the program without the use of an assembler or linker.

ARM Assembly Introduction

Now that you have learned some assembly fundamentals, it is time to examine a different architecture.
ARM processors are an extremely popular choice for devices such as smartphones, tablets, TVs, routers, IoT systems, and other embedded devices. In this section we will examine the basic ARM 32-bit architecture, write a Hello World program, cross-assemble it, link it, run it by emulating an ARM processor on our x86_64 machine, and debug it with GDB.

Tool Installation

For this section you will need to install gcc for ARM, gdb for multiple architectures, and the QEMU user tools for emulation.
To install the necessary packages enter:

sudo apt install gcc-arm-linux-gnueabihf gdb-multiarch qemu-user

ARM Registers

While we haven't examined the x86_64 architecture yet, you will discover that the 32-bit implementation of ARM is much more similar to its 64-bit counterpart than the x86/i386 architecture is to its 64-bit x86_64/AMD64 counterpart. This is because ARM has kept much more parity developing the 32-bit and 64-bit implementations of its architecture. We will first be examining the common 32-bit registers used by ARMv7 or ARMv8 (when operating in 32-bit mode). As with Intel for the x86 processor, extensive documentation for the ARM architecture is available here.
It should also be noted that ARM divides its processors into 3 profiles:

a - Application profile, used for general purpose computing
m - Microcontroller profile, used for small low-power applications such as sensors
r - Real-time profile, used in applications that require predictable and consistent timing of processor results

ARM uses a version number to refer to the major revision of the architecture and instruction sets, such as v7, v8, v9, etc. Different versions may support either 32-bit or 64-bit operations, or both.

32-bit ARMv7 registers can be broken down as follows:

General purpose registers:

r0 through r12

Special function registers:

r13 sp (stack pointer) (equivalent to x86 esp)
r14 lr (link register) (holds the return address of a subroutine call; x86 has no equivalent register because it stores the return address on the stack)
r15 pc (program counter) (equivalent to x86 eip)

Program status registers (similar to the x86 eflags register):

cpsr (current program status register)
spsr (saved program status register)

Floating point registers:

32-bit (float): s0 to s31
64-bit (double): d0 to d15

The flags for the cpsr are shown below:

bit(s)    flag
31        N
30        Z
29        C
28        V
27        Q
26-25     IT (bits 1 to 0)
24        J
23        SSBS
22        PAN
21        DIT
20        reserved (0)
19-16     GE
15-10     IT (bits 7 to 2)
9         E
8         A
7         I
6         F
5         T
4-0       M

Below are the cpsr flag functions:

N (Negative) flag, indicates whether the result of the last operation was negative (1) or not (0)
Z (Zero) flag, indicates whether the result of the last operation was zero (1) or not zero (0)
C (Carry) flag, indicates whether there was a carry (1) or not (0) during the last arithmetic operation
V (Overflow) flag, indicates whether a signed overflow occurred (1) or not (0) during the last arithmetic operation
Q (Saturation) flag, indicates whether saturation occurred (1) or not (0) during the last operation
SSBS (Speculative Store Bypass Safe) flag, indicates whether speculative store bypass is permitted (1) or not (0)
PAN (Privileged Access Never) flag, indicates whether privileged code is prevented from accessing memory that is also accessible in User mode (1) or not (0)
DIT (Data Independent Timing) flag, indicates whether instructions must execute with timing that is independent of the data being processed (1) or not (0)
GE (Greater than or Equal) flags, set by the parallel (SIMD) add and subtract instructions to indicate the results for individual bytes or halfwords
IT (If-Then) flags, indicate the execution state of the If-Then instruction
J (Jazelle) flag, indicates whether the processor is executing in Jazelle (Java support) mode (1) or not (0)
E (Endianness) flag, indicates the endianness used for data accesses, either little-endian (0) or big-endian (1)
A (Asynchronous abort mask) flag, indicates whether asynchronous aborts are masked (1) or not (0)
I (Interrupt mask) flag, indicates whether IRQ hardware interrupts are masked (1) or processed (0)
F (Fast Interrupt mask) flag, indicates whether fast interrupt (FIQ) exceptions are masked (1) or processed (0)
T (Thumb) flag, indicates the execution state of the processor, either Thumb (1) or ARM (0)
M (Processor mode) flags, indicate the current processor mode, such as User, System, FIQ, IRQ, Supervisor, Abort, Undefined, or Monitor

While there are unique flags stored by ARM32 in the cpsr register, four of them have direct counterparts in the x86 eflags register, and a fifth is related:

Flag in ARM    Flag in x86    Flag
N              SF             Negative
Z              ZF             Zero
C              CF             Carry
V              OF             Overflow
I              IF             Interrupt

Note that the I and IF flags have opposite polarity: setting I in the cpsr masks (disables) IRQ interrupts, while setting IF in eflags enables maskable interrupts. Despite the shared letter, the ARM A flag is not related to the x86 AF (auxiliary carry) flag.

The spsr is used to save the state of the cpsr registers when the processor changes privilege modes. This frees the cpsr to load flags for the current state and allows the previous state to be restored later.
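As an illustration of how the cpsr fields are packed into a single 32-bit value, here is a minimal Python sketch that splits a cpsr value into a few named fields. It assumes the standard 32-bit ARM layout in which the mode field occupies bits 4 to 0, and the function name decode_cpsr is hypothetical. Only a subset of the flags is decoded to keep the sketch short.

```python
# Standard encodings of the 5-bit mode field (bits 4 to 0) for 32-bit ARM.
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10110: "Monitor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(value):
    """Hypothetical helper: split a 32-bit cpsr value into named fields."""
    def bit(position):
        return bool((value >> position) & 1)

    return {
        "N": bit(31),
        "Z": bit(30),
        "C": bit(29),
        "V": bit(28),
        "Q": bit(27),
        "E": bit(9),
        "A": bit(8),
        "I": bit(7),
        "F": bit(6),
        "T": bit(5),
        "mode": MODES.get(value & 0x1F, "unknown"),
    }


# 0x10 is the cpsr value that a user program running under qemu-arm
# shows in gdb's register layout.
fields = decode_cpsr(0x10)
print(fields["mode"], fields["T"])
```

A value of 0x10 has only bit 4 set, which decodes to User mode, ARM (not Thumb) state, with all condition flags clear.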

ARM Hello World

We are now ready to write a hello world program for ARM. We will build upon what we have already learned from our x86 hello world, and note the differences for GNU ARM assembly.

We will be using Linux syscall table references again. This time for ARM 32-bit.

.section .rodata
/* The .rodata section will be stored as read-only in memory.
   The constants below are assembler symbols: their values are
   substituted into the instructions that use them, so they emit
   no bytes into the section */

b_STDOUT = 0x01
/* This defines b_STDOUT as a byte sized constant with a value of 0x01 */
b_WRITE = 0x04
/* This defines b_WRITE as a byte sized constant with a value of 0x04 */

.section .data

hello_msg:
    .ascii "Hello World!\n"
end_hello_msg:

len_hello_msg = (end_hello_msg - hello_msg)
/* This declares a symbol len_hello_msg and assigns it the difference
   between the end_hello_msg label address and the hello_msg label address.
   Parentheses are not necessary in this instance,
   len_hello_msg = end_hello_msg - hello_msg would evaluate the same */

unused_label:
    .hword 0xbeef
/* This label is here to illustrate how GAS stores label addresses for ARM
   assembly that are not assigned during the program execution, vs. those
   that are.
   Note: The size of a word depends on the processor architecture. For ARM32
   a word is 32 bits (4 bytes), so to store 2 bytes of data, we use the
   half-word (.hword) directive */

.section .text
.global _start

_start:

/* Write "At start" and "Hello World!" to stdout
   Write syscall reference:
   r7      r0 (arg0)          r1 (arg1)          r2 (arg2)
   0x04    unsigned int fd    const char *buf    size_t count */

print_start_msg:
    ldr r7, =b_WRITE
    /* The load register (ldr) instruction is similar to the lea instruction
       for the x86 processor in that it loads a calculated memory address or
       immediate value into a register.
       Like the eax register for x86, r7 is used to determine the syscall
       function for ARM.
       Using the = character with ldr is an ARM specific pseudo-instruction
       that specifies a symbol name which represents a constant value or an
       address. The assembler will determine the type of value and modify the
       instruction to either load a relative memory address or an immediate
       value.
       For this instruction, ldr will load an immediate value into r7 because
       b_WRITE is a constant and not the label for a memory address.
       For more information on this instruction, refer to this reference:
       https://developer.arm.com/documentation/dui0041/c/Babbfdih */

    ldr r0, =b_STDOUT
    /* Another constant value loaded for the FD */

    adr r1, start_msg
    /* Address (adr) loads the address of a label into a register.
       The major functional difference between ldr and adr is that adr can
       only reference memory locations inside the .text section of code,
       while ldr can resolve addresses and values from any section.
       While both ldr and adr could load addresses from labels in the .text
       section, adr is more efficient for this specific task and should be
       used for that purpose */

    ldr r2, =len_start_msg
    /* This will resolve to the value of len_start_msg, and load it into r2 */

    svc #0x00000000
    /* When writing ARM assembly for GAS, the # character is used to prefix
       an immediate value.
       This SuperVisor Call (svc) is similar to the int 0x80 call for the x86.
       It will initiate the execution of the syscall by calling a system
       interrupt. svc creates an exception and passes the immediate value to
       the exception handler.
       In earlier versions of ARM svc was called swi (SoftWare Interrupt),
       but they are effectively the same */

write_hello_msg:
    ldr r7, =b_WRITE
    ldr r0, =b_STDOUT
    ldr r1, =hello_msg
    /* For this instruction, hello_msg is a label for a memory address
       located in the .data section. Using the ldr pseudo-operation, the
       assembler will create a literal value at the end of the .text section
       to store the label address in. It will then reference the memory
       location for that literal value and load it into r1. It uses the
       Program Counter (PC) register as a base address and offsets from PC
       to the address.
       This is essentially what we did with the adr instruction, except the
       assembler is copying the address for the label in .data and placing
       it in the .text section to load. */

    ldr r2, =len_hello_msg
    svc #0x00000000

exit_normally:
/* exit syscall reference:
   r7      r0 (arg0)
   0x01    int error_code */

    mov r7, #0x00000001
    /* Like with x86 assembly, mov can be used to load an immediate value
       into a register */
    mov r0, #0x00000000
    svc #0x00000000

/* The following section of code was added to show how you can also place
   variables and labels for data in the .text section after your code.
   They must be placed after your code, because they are not executable
   instructions. They should never be reached by your program's normal
   execution or it will crash. */

start_msg:
    .ascii "At start\n"
len_start_msg = . - start_msg
/* In GAS, the . character is used to reference the current position in
   memory, so instead of creating the label end_start_msg and writing
   "len_start_msg = (end_start_msg - start_msg)" we can just write this as
   shorthand. */
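The comments above note that the assembler decides whether ldr with = becomes an immediate move or a load from memory. For a constant that is already known, one part of that decision is whether the value fits the ARM immediate encoding: an 8-bit number rotated right by an even number of bits. The Python sketch below checks only that one rule. The function name is hypothetical, and the sketch does not model the other cases, such as label addresses that are only resolved at link time.

```python
def is_arm_immediate(value):
    """Hypothetical helper: return True if `value` can be encoded as an ARM
    data-processing immediate (an 8-bit number rotated right by an even
    number of bits)."""
    value &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        # Rotating left by `rotation` undoes a rotate right by the same amount.
        rotated = ((value << rotation) | (value >> (32 - rotation))) & 0xFFFFFFFF
        if rotated < 0x100:
            return True
    return False


# b_WRITE (0x04) fits, so a plain mov is possible for it.
print(is_arm_immediate(0x04))
# An address such as 0x200bc does not fit in a single immediate,
# so it has to be loaded from memory instead.
print(is_arm_immediate(0x200BC))
```

This is only a sketch of the encoding rule; the assembler itself is the authority on which instruction it finally emits.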

Assemble Link and Run

Once you have copied the code and have thoroughly read through the comments, it is time to make it executable. GNU provides a cross-assembler for the ARM instruction set which is included in the gcc-arm-linux-gnueabihf package. To assemble the source, navigate to the directory where the source is saved and enter the command:

arm-linux-gnueabihf-as -o hello_arm32.o hello_arm32.asm

This command assumes that you saved the source file as hello_arm32.asm. It directs GAS to create a 32-bit object file named hello_arm32.o and to use the hello_arm32.asm file as an input source.
To link the object file into an executable, enter the command:

arm-linux-gnueabihf-ld -o hello_arm32 hello_arm32.o

Now that the executable is created, we will use Quick Emulator (QEMU) to execute it on our x86 system by emulating an ARM processor. Enter the command:

qemu-arm hello_arm32

It should produce the output:

At start
Hello World!

Debugging in GDB

QEMU has an option that allows GDB to connect to it over a network socket. To run our program in QEMU as a GDB server enter:

qemu-arm -g 2345 hello_arm32 &

This will launch our program in the background with QEMU and bind to port 2345. Port 2345 is arbitrary and can be changed to whatever port you want to bind to.

Once QEMU is running, we will launch GDB for multiarchitectures, open our binary as a template, and then connect to the running process in QEMU. To do so enter:

$gdb-multiarch
(gdb) file hello_arm32
(gdb) target remote localhost:2345

You should see an output similar to the following:

pete@XPS:~/Documents/ASM/hello_world/ARM32$ gdb-multiarch
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: .
Find the GDB manual and other documentation resources online at: .
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file hello_arm32
Reading symbols from hello_arm32...
(No debugging symbols found in hello_arm32)
(gdb) target remote localhost:2345
Remote debugging using localhost:2345
0x00010074 in _start ()
(gdb)

Notice that we do not need to set a breakpoint and run the program, because QEMU has already loaded the program and paused it at its entry point, the _start label, while it waits for GDB to connect.

We can now open our layouts with:

lay asm
lay reg

You should now have the familiar layout of registers, assembly, and commands.
Note, there will be no register values loaded as we haven't stepped into an instruction yet.
Let's examine our first instruction:

| > 0x10074 <_start> mov r7, #4

The assembly source was ldr r7, =b_WRITE, but because b_WRITE was a constant value, the assembler translated this to just moving its immediate value into the register.
The next instruction follows the same pattern as the first, so let's step into our instructions until we reach the third line:

|-Register group: general------------------------------------------------------------------------------------------------------------------------------------------------| |r0 0x1 1 r1 0x40800b39 1082133305 r2 0x0 0 | |r3 0x0 0 r4 0x0 0 r5 0x0 0 | |r6 0x0 0 r7 0x4 4 r8 0x0 0 | |r9 0x0 0 r10 0x200bc 131260 r11 0x0 0 | |r12 0x0 0 sp 0x408009d0 0x408009d0 lr 0x0 0 | |pc 0x1007c 0x1007c <_start+8> cpsr 0x10 16 fpscr 0x0 0 | |fpsid 0x410430f0 1090793712 fpexc 0x40000000 1073741824 AFSR0_EL1 0x0 0 | |AFSR1_EL1 0x0 0 DBGDIDR 0x3515f021 890630177 DBGDSAR 0x0 0 | |DBGBVR 0x0 0 DBGBCR 0x0 0 DBGWVR 0x0 0 | |DBGWCR 0x0 0 PAR 0x0 0 DBGBVR 0x0 0 | |DBGBCR 0x0 0 DBGWVR 0x0 0 DBGWCR 0x0 0 | |TEECR 0x0 0 MIDR_EL1 0x412fc0f1 1093648625 CTR 0x8444c004 -2075869180 | |TCMTR 0x0 0 TTBR0_EL1 0x0 0 PMCCNTR 0x0 0 | |TLBTR 0x0 0 TTBR1_EL1 0x0 0 MIDR 0x412fc0f1 1093648625 | |TTBCR 0x0 0 MPIDR_EL1 0x80000000 -2147483648 TTBCR2 0x0 0 | |REVIDR_EL1 0x0 0 MIDR 0x412fc0f1 1093648625 JIDR 0x0 0 | |CLIDR 0xa200023 169869347 DFAR 0x0 0 WFAR 0x0 0 | |IFAR 0x0 0 JMCR 0x0 0 AIDR 0x0 0 | |CSSELR 0x0 0 ID_PFR2 0x10 16 VBAR 0x0 0 | |------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 0x10074 <_start> mov r7, #4 | | 0x10078 <_start+4> mov r0, #1 | | > 0x1007c <_start+8> add r1, pc, #36 ; 0x24 | | 0x10080 <_start+12> ldr r2, [pc, #44] ; 0x100b4 <start_msg+12> | | 0x10084 <_start+16> svc 0x00000000 | | 0x10088 <write_hello_msg> mov r7, #4 | | 0x1008c <write_hello_msg+4> ldr r1, [pc, #36] ; 0x100b8 <start_msg+16> | | 0x10090 <write_hello_msg+8> mov r0, #1 | | 0x10094 <write_hello_msg+12> mov r2, #13 | | 0x10098 <write_hello_msg+16> svc 0x00000000 | | 0x1009c <exit_normally> mov r7, #1 | | 0x100a0 <exit_normally+4> mov r0, #0 | | 0x100a4 <exit_normally+8> svc 0x00000000 | | 0x100a8 <start_msg> ; <UNDEFINED> instruction: 0x73207441 | | 0x100ac <start_msg+4> ldrbtvc r6, [r2], #-372 ; 0xfffffe8c 
| | 0x100b0 <start_msg+8> andeq r0, r0, r10 | | 0x100b4 <start_msg+12> andeq r0, r0, r9 | | 0x100b8 <start_msg+16> strheq r0, [r2], -r12 | | 0x100bc cfstr64vs mvdx6, [r12], #-288 ; 0xfffffee0 | | 0x100c0 svcvs 0x0057206f | |------------------------------------------------------------------------------------------------------------------------------------------------------------------------| remote Thread 1.383292 In: _start L?? PC: 0x1007c (gdb) lay reg (gdb) si 0x00010078 in _start () (gdb) si 0x0001007c in _start () (gdb)

Our instruction:
adr r1, start_msg
has been translated to:
add r1, pc, #36

The add instruction takes the destination register to store the result, and the two arguments to add. In this instance, the immediate value #36 is being added to the pc (program counter) register's value.
This is where things can be confusing. While GDB lists the pc register currently as 0x1007c, an ARM instruction that reads the pc register sees a value two instructions ahead of itself. Since these are 32-bit instructions, the value read from the pc register will be 8 bytes more than the address of our current line (32 bits * 2 = 64 bits = 8 bytes).

While you might expect add r1, pc, #36 to store 0x100a0 (0x1007c + 36) in the register, if we step forward one instruction:

r1 0x100a8 65704

We see that 0x100a8 is in fact stored in r1.
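The arithmetic behind this result can be written out as a short Python sketch. The constant and function names are hypothetical. The only assumption is the rule described above: in ARM state, an instruction that reads pc sees its own address plus 8.

```python
# In ARM state, reading pc yields the address of the current instruction + 8.
ARM_PC_READ_AHEAD = 8


def pc_relative_target(instruction_address, offset):
    """Hypothetical helper: address produced by `add rX, pc, #offset`
    when the instruction is located at `instruction_address`."""
    return instruction_address + ARM_PC_READ_AHEAD + offset


naive = 0x1007C + 36                        # ignores the read-ahead
actual = pc_relative_target(0x1007C, 36)    # what the processor computes

print(hex(naive), hex(actual))
```

The first value is the address you might expect at first glance (0x100a0), and the second is the address that actually lands in r1 (0x100a8).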

If we look further down our assembly layout, we can see that it is our start_msg label:

0x100a8 <start_msg> ; <UNDEFINED> instruction: 0x73207441

Notice that the disassembler is attempting to interpret the data as instructions. This is because the data resides in the .text section with our code, but it does not contain valid assembly instructions.
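To confirm that the "undefined instruction" is really the start of our string, we can take the 32-bit word reported by the disassembler and view its bytes in memory order. This is a minimal Python sketch, assuming the little-endian byte order that our ARM target uses.

```python
# The 32-bit word that gdb reported as an undefined instruction at 0x100a8.
word = 0x73207441

# On a little-endian target the least significant byte is stored first.
raw = word.to_bytes(4, "little")

print(raw)
```

The four bytes are 0x41, 0x74, 0x20, and 0x73, which are the ASCII codes for the first four characters of "At start\n".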

Now let's examine our ldr instruction:

> 0x10080 <_start+12> ldr r2, [pc, #44]

Our original instruction was:
ldr r2, =len_start_msg
This was resolved by the assembler to:
ldr r2, [pc, #44]

len_start_msg is a variable symbol, so =len_start_msg will evaluate to loading the value for that symbol.
pc, #44 takes the current pc register value and adds 44 to it.
[pc, #44] evaluates the data at that address and loads it to the destination register r2.
We know that the pc value will be two instructions ahead, so if we add 0x10080 + 8 + 44:

(gdb) print/x (0x10080 + 8 + 44)
$2 = 0x100b4

And we know that the length of "At start\n" should be 9 bytes, so 0x09 should be stored at 0x100b4:

(gdb) x /1xb $2
0x100b4 : 0x09

And we can see that 0x09 is indeed stored at that location.
Notice when we printed our address calculation, GDB automatically stored it in the variable $2 to allow for easy referencing.

Let's step forward in our program to the next ldr instruction:

0x1008c ldr r1, [pc, #36] ; 0x100b8

For this instruction, the square brackets [] indicate that the processor loads the value stored in memory at the address of the pc register + 36 bytes.
This should evaluate to the value stored at:

(gdb) print /x (0x1008c + 8 + 36)
$3 = 0x100b8

What is stored at 0x100b8 ?

(gdb) x /4xb $3
0x100b8 : 0xbc 0x00 0x02 0x00

This is the memory address 0x000200bc in little endian.
Where is this address?

(gdb) info file
Symbols from "/home/pete/Documents/ASM/hello_world/ARM32/hello_arm32".
Remote target using gdb-specific protocol:
`/home/pete/Documents/ASM/hello_world/ARM32/hello_arm32', file type elf32-littlearm.
Entry point: 0x10074
0x00010074 - 0x000100bc is .text
0x000200bc - 0x000200cb is .data
While running this, GDB does not access memory from...
Local exec file:
`/home/pete/Documents/ASM/hello_world/ARM32/hello_arm32', file type elf32-littlearm.
Entry point: 0x10074
0x00010074 - 0x000100bc is .text
0x000200bc - 0x000200cb is .data
(gdb)

We can see it is in our data section:

(gdb) x /13cb 0x000200bc
0x200bc: 72 'H' 101 'e' 108 'l' 108 'l' 111 'o' 32 ' ' 87 'W' 111 'o'
0x200c4: 114 'r' 108 'l' 100 'd' 33 '!' 10 '\n'

And there is our Hello World! message.

The assembler retrieved the address of our hello_msg label from the .data section,
then it appended that address value to the end of our .text section of code,
then it loaded that address into the r1 register by offsetting from the pc register
to the memory address in the .text section that contained the memory address for the actual data.
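The whole lookup can be replayed in a few lines of Python. This is a sketch only: the memory dictionary is a hypothetical stand-in that holds just the four bytes we examined with x /4xb, and load_word is a hypothetical helper name.

```python
# The four bytes gdb showed at 0x100b8, in memory order.
memory = {0x100B8: bytes([0xBC, 0x00, 0x02, 0x00])}


def load_word(address):
    """Hypothetical helper: read a 32-bit little-endian word from `memory`."""
    return int.from_bytes(memory[address], "little")


# ldr r1, [pc, #36] located at 0x1008c: pc reads as the instruction address + 8.
literal_address = 0x1008C + 8 + 36
r1 = load_word(literal_address)

print(hex(literal_address), hex(r1))
```

The computed literal address is 0x100b8 in the .text section, and the value loaded from it is 0x200bc, the start of the .data section where hello_msg lives.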

ARM Loops and Stack Intro

The Stack

The stack is a special area of RAM that is reserved for a program by the Operating System. It is used primarily as memory that the program can use to organize and store local variables and function arguments which require more space than can be stored in available CPU registers.

The maximum stack size for a program is determined by the operating system. In Linux, the default maximum stack size in kilobytes can be output with:

ulimit -s
8192

The stack limit shown above is 8192 KB, or 8 MB.

We will examine the stack in greater detail in future sections, but for now understand these characteristics:

The stack is a linear data structure that follows a Last-In, First-Out (LIFO) principle
The last element added is always the first to be removed
New data can be "pushed" onto the stack or "popped" off the stack
The stack "grows down" in memory, which can be confusing because the "top" of the stack will always have the lowest memory address
The sp register stores the memory address for the top of the stack
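The characteristics listed above can be modeled with a small Python sketch. The class name, base address, and size are hypothetical values chosen for illustration, and the model stores 32-bit little-endian values with no overflow or underflow checks.

```python
import struct


class DescendingStack:
    """Hypothetical model of a stack that grows down in memory."""

    def __init__(self, base, size):
        self.base = base                # lowest address of the stack region
        self.memory = bytearray(size)
        self.sp = base + size           # empty stack: sp starts at the high end

    def push(self, value):
        self.sp -= 4                    # growing the stack lowers sp
        offset = self.sp - self.base
        self.memory[offset:offset + 4] = struct.pack("<I", value)

    def pop(self):
        offset = self.sp - self.base
        (value,) = struct.unpack("<I", self.memory[offset:offset + 4])
        self.sp += 4                    # shrinking the stack raises sp
        return value


stack = DescendingStack(base=0x1000, size=64)
start_sp = stack.sp
stack.push(0x11111111)
stack.push(0x22222222)
sp_after_pushes = stack.sp
first_popped = stack.pop()              # last in, first out
second_popped = stack.pop()

print(hex(start_sp), hex(sp_after_pushes), hex(first_popped), hex(second_popped))
```

Two pushes move sp down by 8 bytes, and the values come back off in the reverse of the order they went on.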

Loops

A loop is a simple logical construct which repeatedly executes instructions until a condition is met.
To demonstrate this functionality, we will write a program which will execute a block of code 10 times.
The code will print the counter for the loop, showing what iteration it is on, and will utilize the stack to facilitate this:

.section .rodata

/* Linux Syscall constants */
b_STDOUT = 0x01
b_WRITE = 0x04

/* Offset to convert a value to a single digit ASCII character decimal */
b_ASCII_OFFSET = 0x30

.section .data

begin_msg:
    .ascii "Starting while loop:\n"
len_begin_msg = ( . - begin_msg)

end_msg:
    .ascii "Loop ended.\n"
len_end_msg = ( . - end_msg)

.section .text
.global _start

_start:

print_begin_msg:
    ldr r7, =b_WRITE
    ldr r0, =b_STDOUT
    ldr r1, =begin_msg
    ldr r2, =len_begin_msg
    svc #0

    mov r3, #0x0
    /* This sets r3 to 0 to prepare it to use as our counter for the loop.
       Use of r3 is arbitrary, any GP register will do, but r3 is the next
       register not used by the write syscall, which will be used in the
       loop */

begin_while:

print_counter:
    ldr r7, =b_WRITE
    ldr r0, =b_STDOUT
    ldr r1, =b_ASCII_OFFSET
    /* Start with a value of 0x30 */

    add r1, r1, r3
    /* Add our counter value to 0x30 to get the ASCII decimal number for the
       counter. 0x30 is the hex value for the decimal ASCII 0, 0x31 is 1
       etc. */

    orr r1, r1, #0x0a00
    /* The orr instruction performs a bitwise OR between a register and a
       second operand, which can be a register or an immediate value.
       This effectively combines the 0x0a value for an ASCII newline
       character with our original value for the ASCII value of the loop
       counter and stores both in r1 */

    push {r1}
    /* The push instruction will store the values in a list of registers in
       the memory stack. The values will be placed on the stack in order of
       the register numbers, so the lowest number register will be at the
       top of the stack and the highest number at the bottom.
       This command can be written in several different forms:

           sub sp, sp, #4
           str r1, [sp]

       This subtracts 4 bytes from the stack pointer address, then it stores
       (str) the value of r1 at the stack pointer address

           str r1, [sp, #-4]!

       This executes the same thing in a single instruction: it stores r1 at
       the stack pointer address minus 4 bytes, and the ! character writes
       that decremented address back to the stack pointer */

    mov r1, sp
    /* The stack pointer stores the memory address of the last data placed
       on the stack. The last data placed on the stack was the value stored
       in r1, which contains our two ASCII character codes. This instruction
       will store the memory address of that location in the stack in r1.
       This is necessary, because the write syscall takes a memory address
       as an argument for a string to write, not the actual value. */

    mov r2, #0x2
    /* We will set the count argument (r2) for the syscall to 2 bytes,
       because we will print both the number character and the newline
       character. */

    svc #0

    add sp, sp, #0x4
    /* This will move the stack pointer back up to its original position.
       This will allow us to overwrite the previous characters every time
       the loop runs. If we did not include this instruction, the stack
       would continue to grow every time the loop ran. */

    cmp r3, #0x9
    /* This instruction performs a subtraction operation between the value
       in register r3 and the immediate value 9.
       The compare instruction (cmp) discards the result of the subtraction
       operation, but it updates the zero (Z) and negative (N) flags in the
       cpsr appropriately:
       If the values are equal, the zero flag will be set to 1.
       If the result of the subtraction is negative, then the negative flag
       is set to 1.
       The carry (C) and overflow (V) flags are also set based on the
       result.
       This instruction sets the same flags as writing:

           subs r0, r3, #0x09

       The subtract and set flags (subs) instruction performs the same
       operation as cmp, except it also stores the result in its destination
       register. In the example above the result overwrites r0, which is
       harmless here only because r0 is reloaded at the start of every loop
       iteration.
       For a simple comparison, this isn't useful, but if we wanted to
       compare values and store the result in r1, we could write:

           subs r1, r3, #0x09 */

    bge end_while
    /* The branch if greater than or equal (bge) instruction checks the cpsr
       flags that were set by cmp. It branches when the negative (N) flag is
       equal to the overflow (V) flag, which is the case when r3 is greater
       than or equal to 9 as a signed comparison.
       If the values are equal, the result is zero, so N and V are both 0 and
       the program branches to the end_while label by setting the pc to the
       end_while label's memory address.
       If r3 is less than 9, the result is negative, so N is 1 while V is 0,
       and execution continues with the next instruction. */

    add r3, r3, #0x01
    /* Increment our counter by 1 */

    b begin_while
    /* branch (b) is an unconditional branch instruction, this will always
       change the pc to the address of the begin_while label and continue
       execution. */

end_while:

print_end_msg:
    ldr r7, =b_WRITE
    ldr r0, =b_STDOUT
    ldr r1, =end_msg
    ldr r2, =len_end_msg
    svc #0x00000000

exit_normally:
    mov r7, #0x00000001
    mov r0, #0x00000000
    svc #0x00000000
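To see how cmp and bge control the loop, the following Python sketch computes the four condition flags for a 32-bit subtraction and then applies the bge condition, which is that the N flag equals the V flag. The function names are hypothetical, and the sketch models only the flag arithmetic, not the processor.

```python
MASK = 0xFFFFFFFF


def cmp_flags(rn, operand):
    """Hypothetical helper: return the (N, Z, C, V) flags that
    `cmp rn, operand` produces for 32-bit register values."""
    rn &= MASK
    operand &= MASK
    result = (rn - operand) & MASK
    n = result >> 31                    # sign bit of the result
    z = int(result == 0)                # result was zero
    c = int(rn >= operand)              # carry set means no borrow occurred
    # Signed overflow: the operands have different signs and the sign of
    # the result differs from the sign of rn.
    v = ((rn ^ operand) & (rn ^ result)) >> 31
    return n, z, c, v


def bge_taken(n, z, c, v):
    """bge branches when N equals V (signed greater than or equal)."""
    return n == v


# Replay the comparison for every value the loop counter takes.
taken = [bge_taken(*cmp_flags(counter, 9)) for counter in range(10)]
print(taken)
```

The branch is not taken for counter values 0 through 8 and is taken once the counter reaches 9, which is what ends the loop after ten iterations.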

After reading through the source code and studying the comments, assemble and link it. Run it in qemu so that we can debug it with gdb.

ARM Debugging Loops

Once we are attached to our remote program in gdb, we will open our assembly and register layouts as before.
We are familiar with the write syscall already, so we will advance forward to the start of our loop.
Enter:

advance print_counter

Your output should look similar to this:

|-Register group: general------------------------------------------------------------------------------------------------------------------------------------------------| |r0 0x15 21 r1 0x200ec 131308 | |r2 0x15 21 r3 0x0 0 | |r4 0x0 0 r5 0x0 0 | |r6 0x0 0 r7 0x4 4 | |r8 0x0 0 r9 0x0 0 | |r10 0x200ec 131308 r11 0x0 0 | |r12 0x0 0 sp 0x408009c0 0x408009c0 | |lr 0x0 0 pc 0x1008c 0x1008c <print_counter> | |cpsr 0x10 16 fpscr 0x0 0 | |fpsid 0x410430f0 1090793712 fpexc 0x40000000 1073741824 | |AFSR0_EL1 0x0 0 AFSR1_EL1 0x0 0 | |DBGDIDR 0x3515f021 890630177 DBGDSAR 0x0 0 | |DBGBVR 0x0 0 DBGBCR 0x0 0 | |DBGWVR 0x0 0 DBGWCR 0x0 0 | |PAR 0x0 0 DBGBVR 0x0 0 | |DBGBCR 0x0 0 DBGWVR 0x0 0 | |DBGWCR 0x0 0 TEECR 0x0 0 | |MIDR_EL1 0x412fc0f1 1093648625 CTR 0x8444c004 -2075869180 | |TCMTR 0x0 0 TTBR0_EL1 0x0 0 | |------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 0x10074 <_start> mov r7, #4 | | 0x10078 <_start+4> mov r0, #1 | | 0x1007c <_start+8> ldr r1, [pc, #96] ; 0x100e4 <exit_normally+12> | | 0x10080 <_start+12> mov r2, #21 | | 0x10084 <_start+16> svc 0x00000000 | | 0x10088 <_start+20> mov r3, #0 | | > 0x1008c <print_counter> mov r7, #4 | | 0x10090 <print_counter+4> mov r0, #1 | | 0x10094 <print_counter+8> mov r1, #48 ; 0x30 | | 0x10098 <print_counter+12> add r1, r1, r3 | | 0x1009c <print_counter+16> orr r1, r1, #2560 ; 0xa00 | | 0x100a0 <print_counter+20> push {r1} ; (str r1, [sp, #-4]!) 
| | 0x100a4 <print_counter+24> mov r1, sp | | 0x100a8 <print_counter+28> mov r2, #2 | | 0x100ac <print_counter+32> svc 0x00000000 | | 0x100b0 <print_counter+36> add sp, sp, #4 | | 0x100b4 <print_counter+40> subs r0, r3, #9 | | 0x100b8 <print_counter+44> bge 0x100c4 <print_end_msg> | | 0x100bc <print_counter+48> add r3, r3, #1 | | 0x100c0 <print_counter+52> b 0x1008c <print_counter> | |------------------------------------------------------------------------------------------------------------------------------------------------------------------------| remote Thread 1.30460 In: print_counter L?? PC: 0x1008c (gdb) lay reg (gdb) advance print_counter 0x0001008c in print_counter () (gdb)

Let's step into our instructions from here and examine the orr instruction at 0x1009c:

|-Register group: general------------------------------------------------------------------------------------------------------------------------------------------------| |r0 0x1 1 r1 0x30 48 | |r2 0x15 21 r3 0x0 0 | |r4 0x0 0 r5 0x0 0 | |r6 0x0 0 r7 0x4 4 | |r8 0x0 0 r9 0x0 0 | |r10 0x200ec 131308 r11 0x0 0 | |r12 0x0 0 sp 0x408009c0 0x408009c0 | |lr 0x0 0 pc 0x1009c 0x1009c <print_counter+16> | |cpsr 0x10 16 fpscr 0x0 0 | |fpsid 0x410430f0 1090793712 fpexc 0x40000000 1073741824 | |AFSR0_EL1 0x0 0 AFSR1_EL1 0x0 0 | |DBGDIDR 0x3515f021 890630177 DBGDSAR 0x0 0 | |DBGBVR 0x0 0 DBGBCR 0x0 0 | |DBGWVR 0x0 0 DBGWCR 0x0 0 | |PAR 0x0 0 DBGBVR 0x0 0 | |DBGBCR 0x0 0 DBGWVR 0x0 0 | |DBGWCR 0x0 0 TEECR 0x0 0 | |MIDR_EL1 0x412fc0f1 1093648625 CTR 0x8444c004 -2075869180 | |TCMTR 0x0 0 TTBR0_EL1 0x0 0 | |------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 0x10074 <_start> mov r7, #4 | | 0x10078 <_start+4> mov r0, #1 | | 0x1007c <_start+8> ldr r1, [pc, #96] ; 0x100e4 <exit_normally+12> | | 0x10080 <_start+12> mov r2, #21 | | 0x10084 <_start+16> svc 0x00000000 | | 0x10088 <_start+20> mov r3, #0 | | 0x1008c <print_counter> mov r7, #4 | | 0x10090 <print_counter+4> mov r0, #1 | | 0x10094 <print_counter+8> mov r1, #48 ; 0x30 | | 0x10098 <print_counter+12> add r1, r1, r3 | | > 0x1009c <print_counter+16> orr r1, r1, #2560 ; 0xa00 |

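As we step through the instruction, we can see that the value of r1 changes from 0x30 to 0x0a30.
It now contains the bytes for the ASCII string "0\n": 0x30 ('0') in the lowest byte and 0x0a (newline) in the next byte, which is the order they will have in little-endian memory.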

r1 0xa30 2608
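
The arithmetic behind this step can be checked outside the debugger. The following Python sketch is only an illustration (the variable names are ours and are not part of the program): it repeats the mov, add, and orr steps for the first loop iteration and shows the bytes in little-endian order.

```python
# Illustrative model of: mov r1, #48 ; add r1, r1, r3 ; orr r1, r1, #0xa00
r3 = 0                      # loop counter on the first iteration
r1 = 0x30 + r3              # ASCII '0' plus the counter
r1 |= 0xA00                 # place '\n' (0x0a) in the second byte

# The program runs little-endian, so the lowest byte comes first in memory.
stored = r1.to_bytes(4, "little")

print(hex(r1))              # 0xa30
print(stored)               # b'0\n\x00\x00'
```

The first two bytes are exactly the two characters the write syscall will print.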

Now let's step to the push instruction:

|-Register group: general----------------------------------------------------------------------|
|r0 0x1 1                      r1 0xa30 2608                    r2 0x15 21                     |
|r3 0x0 0                      r4 0x0 0                         r5 0x0 0                       |
|r6 0x0 0                      r7 0x4 4                         r8 0x0 0                       |
|r9 0x0 0                      r10 0x200ec 131308               r11 0x0 0                      |
|r12 0x0 0                     sp 0x408009c0 0x408009c0         lr 0x0 0                       |
|pc 0x100a0 0x100a0 <print_count cpsr 0x10 16                   fpscr 0x0 0                    |
|fpsid 0x410430f0 1090793712   fpexc 0x40000000 1073741824      AFSR0_EL1 0x0 0                |
|AFSR1_EL1 0x0 0               DBGDIDR 0x3515f021 890630177     DBGDSAR 0x0 0                  |
|DBGBVR 0x0 0                  DBGBCR 0x0 0                     DBGWVR 0x0 0                   |
|DBGWCR 0x0 0                  PAR 0x0 0                        DBGBVR 0x0 0                   |
|DBGBCR 0x0 0                  DBGWVR 0x0 0                     DBGWCR 0x0 0                   |
|TEECR 0x0 0                   MIDR_EL1 0x412fc0f1 1093648625   CTR 0x8444c004 -2075869180     |
|TCMTR 0x0 0                   TTBR0_EL1 0x0 0                  PMCCNTR 0x0 0                  |
|TLBTR 0x0 0                   TTBR1_EL1 0x0 0                  MIDR 0x412fc0f1 1093648625     |
|TTBCR 0x0 0                   MPIDR_EL1 0x80000000 -2147483648 TTBCR2 0x0 0                   |
|REVIDR_EL1 0x0 0              MIDR 0x412fc0f1 1093648625       JIDR 0x0 0                     |
|CLIDR 0xa200023 169869347     DFAR 0x0 0                       WFAR 0x0 0                     |
|IFAR 0x0 0                    JMCR 0x0 0                       AIDR 0x0 0                     |
|CSSELR 0x0 0                  ID_PFR2 0x10 16                  VBAR 0x0 0                     |
|----------------------------------------------------------------------------------------------|
|    0x10074 <_start>            mov r7, #4                                                    |
|    0x10078 <_start+4>          mov r0, #1                                                    |
|    0x1007c <_start+8>          ldr r1, [pc, #96] ; 0x100e4 <exit_normally+12>                |
|    0x10080 <_start+12>         mov r2, #21                                                   |
|    0x10084 <_start+16>         svc 0x00000000                                                |
|    0x10088 <_start+20>         mov r3, #0                                                    |
|    0x1008c <print_counter>     mov r7, #4                                                    |
|    0x10090 <print_counter+4>   mov r0, #1                                                    |
|    0x10094 <print_counter+8>   mov r1, #48 ; 0x30                                            |
|    0x10098 <print_counter+12>  add r1, r1, r3                                                |
|    0x1009c <print_counter+16>  orr r1, r1, #2560 ; 0xa00                                     |
|  > 0x100a0 <print_counter+20>  push {r1} ; (str r1, [sp, #-4]!)                              |

Notice the value of the sp register before the push:

sp 0x408009c0

Now after the push:

sp 0x408009bc

This is 4 bytes lower than the previous address. Let's examine the data at that address:

(gdb) x /4xb 0x408009bc
0x408009bc: 0x30 0x0a 0x00 0x00

We can see the value of r1 is now at that address.
The instruction mov r1, sp will store that address in r1 to pass to the write syscall.
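
The effect of the push can be modeled in a few lines. This Python sketch is a simplification under stated assumptions: memory is a hypothetical dictionary of byte addresses, and the push helper is our own name for the str r1, [sp, #-4]! behavior shown in the disassembly.

```python
# Illustrative model of push {r1} on a descending stack.
memory = {}                 # hypothetical byte-addressed memory
sp = 0x408009C0             # sp before the push, taken from the session
r1 = 0x0A30                 # value produced by the orr instruction

def push(value, sp):
    """Decrement sp by 4, then store the 32-bit value little-endian at sp."""
    sp -= 4
    for offset, byte in enumerate(value.to_bytes(4, "little")):
        memory[sp + offset] = byte
    return sp

sp = push(r1, sp)
print(hex(sp))                                    # 0x408009bc
print([hex(memory[sp + i]) for i in range(4)])    # ['0x30', '0xa', '0x0', '0x0']
```

The new sp value is the address that mov r1, sp hands to the write syscall.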

Let's now examine the cmp instruction at 0x100b4:

> 0x100b4 cmp r3, #9

(gdb) info registers cpsr
cpsr 0x10 16
(gdb) si
0x000100b8 in print_counter ()
(gdb) info registers cpsr
cpsr 0x80000010 -2147483632

As we step through the instruction, we can see the value of the flags in cpsr change.

What flags are now set? We could print the value in binary with:

(gdb) print/t 0x80000010
$1 = 10000000000000000000000000010000

This is still difficult to read, and it is hard to tell which flags are set.

We know that the Z, N, C, and V flags are set by the cmp instruction, so let's format the output to show those flags.
Enter the following script to show a formatted output for the flags:

printf "N=%d Z=%d C=%d V=%d\n", (($cpsr & (1 << 31)) != 0), (($cpsr & (1 << 30)) != 0), (($cpsr & (1 << 29)) != 0), (($cpsr & (1 << 28)) != 0)

This is a C-style expression. For each flag, it performs a bitwise AND between the cpsr value and a 1 shifted left to the position of that flag's bit. If the flag is set, the result is non-zero, so the != 0 comparison evaluates to true and a 1 is printed; otherwise a 0 is printed.

We can see from this script that the negative bit was set by the comparison, because 0 - 9 = -9 which is negative.

N=1 Z=0 C=0 V=0
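
The same decoding can be reproduced outside gdb. The Python sketch below is an illustration only; decode_nzcv is a hypothetical helper name, and the bit positions (31 down to 28) are the ones used by the printf command.

```python
def decode_nzcv(cpsr):
    """Return the N, Z, C and V condition flags (bits 31..28) of a CPSR value."""
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
    }

flags = decode_nzcv(0x80000010)
print("N={N} Z={Z} C={C} V={V}".format(**flags))   # N=1 Z=0 C=0 V=0
```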

This would be lengthy to type out every time we want to check those flags, so let's open a text editor and save the script as cpsr_cmp.gdb.
We can now run the script inside gdb by entering:

source cpsr_cmp.gdb

This assumes you saved the script in the directory you started gdb from.
Otherwise you must give the path to the script.

We want to iterate through our loop, but we don't want to manually step through every instruction over and over again.
We can automate this process by using another script. First, we will set a breakpoint at the end of our loop with:

(gdb) break *0x100c0
Breakpoint 1 at 0x100c0

The * character lets gdb know that the value is a memory address and not a label name.

Enter the following to run the program until it reaches our breakpoint:

continue

Now we can write our script.
Enter:

(gdb) set $count = 0
(gdb) while $count < 8
>source cpsr_cmp.gdb
>continue
>set $count = $count +1
>end

We just wrote a while loop to debug our while loop.

Your output should be similar to this (note you may have to enter ctrl + l to re-draw your screen):

|-Register group: general------------------------------------------------------|
|r0 0x2 2                           r1 0x408009bc 1082132924                   |
|r2 0x2 2                           r3 0x9 9                                   |
|r4 0x0 0                           r5 0x0 0                                   |
|r6 0x0 0                           r7 0x4 4                                   |
|r8 0x0 0                           r9 0x0 0                                   |
|r10 0x200ec 131308                 r11 0x0 0                                  |
|r12 0x0 0                          sp 0x408009c0 0x408009c0                   |
|lr 0x0 0                           pc 0x100c0 0x100c0 <print_counter+52>      |
|cpsr 0x80000010 -2147483632        fpscr 0x0 0                                |
|fpsid 0x410430f0 1090793712        fpexc 0x40000000 1073741824                |
|AFSR0_EL1 0x0 0                    AFSR1_EL1 0x0 0                            |
|DBGDIDR 0x3515f021 890630177       DBGDSAR 0x0 0                              |
|DBGBVR 0x0 0                       DBGBCR 0x0 0                               |
|DBGWVR 0x0 0                       DBGWCR 0x0 0                               |
|PAR 0x0 0                          DBGBVR 0x0 0                               |
|DBGBCR 0x0 0                       DBGWVR 0x0 0                               |
|DBGWCR 0x0 0                       TEECR 0x0 0                                |
|MIDR_EL1 0x412fc0f1 1093648625     CTR 0x8444c004 -2075869180                 |
|TCMTR 0x0 0                        TTBR0_EL1 0x0 0                            |
|------------------------------------------------------------------------------|
|    0x10098 <print_counter+12>  add r1, r1, r3                                |
|    0x1009c <print_counter+16>  orr r1, r1, #2560 ; 0xa00                     |
|    0x100a0 <print_counter+20>  push {r1} ; (str r1, [sp, #-4]!)              |
|    0x100a4 <print_counter+24>  mov r1, sp                                    |
|    0x100a8 <print_counter+28>  mov r2, #2                                    |
|    0x100ac <print_counter+32>  svc 0x00000000                                |
|    0x100b0 <print_counter+36>  add sp, sp, #4                                |
|    0x100b4 <print_counter+40>  cmp r3, #9                                    |
|    0x100b8 <print_counter+44>  bge 0x100c4 <print_end_msg>                   |
|    0x100bc <print_counter+48>  add r3, r3, #1                                |
|B+> 0x100c0 <print_counter+52>  b 0x1008c <print_counter>                     |
|    0x100c4 <print_end_msg>     mov r7, #4                                    |
|    0x100c8 <print_end_msg+4>   mov r0, #1                                    |
|    0x100cc <print_end_msg+8>   ldr r1, [pc, #20] ; 0x100e8 <exit_normally+16>|
|    0x100d0 <print_end_msg+12>  mov r2, #12                                   |
|    0x100d4 <print_end_msg+16>  svc 0x00000000                                |
|    0x100d8 <exit_normally>     mov r7, #1                                    |
|    0x100dc <exit_normally+4>   mov r0, #0                                    |
|    0x100e0 <exit_normally+8>   svc 0x00000000                                |
|    0x100e4 <exit_normally+12>  andeq r0, r2, r12, ror #1                     |
|------------------------------------------------------------------------------|
remote Thread 1.39901 In: print_counter L?? PC: 0x100c0
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
(gdb)

Notice we are now on what should be the last iteration of the loop.
Let's advance to the bge instruction with:

advance *0x100b8

Let's look at what flags were set with our cmp instruction:

(gdb) source cpsr_cmp.gdb
N=0 Z=1 C=1 V=0

Notice that the zero bit is now set and the negative bit is no longer set.

The zero bit is set because r3 was equal to 9, so the result of 9 - 9 is zero.
The negative bit is not set because the result of 9 - 9 isn't negative.
The bge instruction branches when the N and V flags are equal. Both are now 0, so our branch condition should be met.
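
To see how these flag values come about, here is a small Python model of cmp and bge. It is a sketch under stated assumptions, not a description of the hardware: cmp_flags and bge_taken are hypothetical helper names, operands are treated as unsigned 32-bit values, and bge is modeled with the standard signed greater-or-equal test N == V.

```python
MASK = 0xFFFFFFFF

def cmp_flags(rn, op2):
    """Model the N, Z, C, V flags produced by `cmp rn, op2` (rn - op2)."""
    result = (rn - op2) & MASK
    n = result >> 31                      # sign bit of the 32-bit result
    z = int(result == 0)                  # result was exactly zero
    c = int(rn >= op2)                    # set when the subtraction needs no borrow
    # Signed overflow: operands have different signs and the result's sign
    # differs from the sign of rn.
    v = ((rn ^ op2) & (rn ^ result)) >> 31 & 1
    return n, z, c, v

def bge_taken(n, z, c, v):
    """bge (signed greater-or-equal) is taken when N equals V."""
    return n == v

for r3 in (0, 9):
    flags = cmp_flags(r3, 9)
    print(r3, flags, bge_taken(*flags))
```

For r3 = 0 the model gives N=1 Z=0 C=0 V=0 and the branch is not taken; for r3 = 9 it gives N=0 Z=1 C=1 V=0 and the branch is taken, matching the values printed by cpsr_cmp.gdb.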

Let's test this branch by setting a break point where we should jump to:

(gdb) break *0x100c4
Breakpoint 2 at 0x100c4

Enter continue to advance the program to the next break:

|------------------------------------------------------------------------------|
|    0x1008c <print_counter>     mov r7, #4                                    |
|    0x10090 <print_counter+4>   mov r0, #1                                    |
|    0x10094 <print_counter+8>   mov r1, #48 ; 0x30                            |
|    0x10098 <print_counter+12>  add r1, r1, r3                                |
|    0x1009c <print_counter+16>  orr r1, r1, #2560 ; 0xa00                     |
|    0x100a0 <print_counter+20>  push {r1} ; (str r1, [sp, #-4]!)              |
|    0x100a4 <print_counter+24>  mov r1, sp                                    |
|    0x100a8 <print_counter+28>  mov r2, #2                                    |
|    0x100ac <print_counter+32>  svc 0x00000000                                |
|    0x100b0 <print_counter+36>  add sp, sp, #4                                |
|    0x100b4 <print_counter+40>  cmp r3, #9                                    |
|    0x100b8 <print_counter+44>  bge 0x100c4 <print_end_msg>                   |
|    0x100bc <print_counter+48>  add r3, r3, #1                                |
|B+  0x100c0 <print_counter+52>  b 0x1008c <print_counter>                     |
|B+> 0x100c4 <print_end_msg>     mov r7, #4                                    |
|    0x100c8 <print_end_msg+4>   mov r0, #1                                    |
|    0x100cc <print_end_msg+8>   ldr r1, [pc, #20] ; 0x100e8 <exit_normally+16>|
|    0x100d0 <print_end_msg+12>  mov r2, #12                                   |
|    0x100d4 <print_end_msg+16>  svc 0x00000000                                |
|    0x100d8 <exit_normally>     mov r7, #1                                    |
|------------------------------------------------------------------------------|
remote Thread 1.40929 In: print_end_msg L?? PC: 0x100c4
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
(gdb) si
0x0001008c in print_counter ()
(gdb) advance *0x100b8
0x000100b8 in print_counter ()
(gdb) source cpsr_cmp.gdb
N=0 Z=1 C=1 V=0
(gdb) break *0x100c4
Breakpoint 2 at 0x100c4
(gdb) continue
Continuing.
Breakpoint 2, 0x000100c4 in print_end_msg ()
(gdb)

Notice that our program had two breakpoints set: one at 0x100c0, the instruction that branches back to the start of our loop,
and another at 0x100c4, where the rest of the program continues.
Our bge condition was met because the N and V flags were equal (both 0), so execution moved to 0x100c4 without reaching the breakpoint at 0x100c0.

Enter continue one last time to finish executing the remainder of the program:

(gdb) continue
Continuing.
[Inferior 1 (process 1) exited normally]
(gdb)

ARM Loop Exercises

Exercise 1.

Find the other flag bit that is set in the cpsr. Why is it set?

Exercise 2.

Write a gdb script that prints all of the cpsr flags in the format of the cpsr_cmp script.

Exercise 3.

Re-write the loop to allow for more than 10 iterations while printing the correct iteration number.

ARM ABI and Calling Convention

What is an ABI?

An Application Binary Interface (ABI) is a low-level interface between compiled pieces of software, such as executables, libraries, and the operating system.
ABIs are similar to APIs: an API is an interface at the source code level, while an ABI is an interface at the machine code level.
APIs are high-level and hardware independent. ABIs are low-level and hardware dependent.

ABIs determine:

How to pass arguments to a function
How a function passes back its return value
Which registers must be preserved and which registers can be clobbered (over-written)
How data is organized in memory
How system calls are performed

We will reference the Procedure Call Standard for ARM32 from the ARM32 ABI to write our next program.

This document defines the following calling convention:

The registers r4-r8, r10, and r11 are used to hold the values of local variables
Registers r12-r15 have special roles: IP, SP, LR, and PC
A subroutine must preserve the contents of the registers r4-r8, r10, r11 and SP
The first four registers r0-r3 (a1-a4) are used to pass argument values into a subroutine and to return values
r0-r3 may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls)

Register	Synonym	Special	Role in the procedure call standard
r15		PC	The Program Counter.
r14		LR	The Link Register.
r13		SP	The Stack Pointer.
r12		IP	The Intra-Procedure-call scratch register.
r11	v8	FP	Frame Pointer or Variable-register 8.
r10	v7		Variable-register 7.
r9	v6	SB TR	Platform register or Variable-register 6. The meaning of this register is defined by the platform standard.
r8	v5		Variable-register 5.
r7	v4		Variable-register 4.
r6	v3		Variable-register 3.
r5	v2		Variable-register 2.
r4	v1		Variable-register 1.
r3	a4		Argument / scratch register 4.
r2	a3		Argument / scratch register 3.
r1	a2		Argument / result / scratch register 2.
r0	a1		Argument / result / scratch register 1.
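
To make the preservation rule concrete, here is a toy Python model of a subroutine call. It is not ARM code and every name in it (CALLEE_SAVED, call, add_body) is hypothetical. It only illustrates the contract: arguments arrive in r0 and r1, the result comes back in r0, and the variable registers hold the same values after the call as before it.

```python
# Registers the callee must preserve, per the list above
# (r9 is left out because its role is platform defined).
CALLEE_SAVED = ["r4", "r5", "r6", "r7", "r8", "r10", "r11"]

def call(regs, stack, body):
    """Run `body` like a well-behaved subroutine: save, work, restore."""
    stack.append([regs[name] for name in CALLEE_SAVED])   # prologue: push
    body(regs)                                            # may clobber anything
    for name, value in zip(CALLEE_SAVED, stack.pop()):    # epilogue: pop
        regs[name] = value
    return regs["r0"]                                     # result is in r0

def add_body(regs):
    regs["r4"] = regs["r0"] + regs["r1"]   # uses a variable register as scratch
    regs["r0"] = regs["r4"]

regs = {f"r{i}": 0 for i in range(13)}
regs.update(r0=2, r1=3, r4=0xDEADBEEF)
stack = []
result = call(regs, stack, add_body)
print(result, hex(regs["r4"]), len(stack))   # 5 0xdeadbeef 0
```

The caller's r4 survives even though the body overwrote it, and the stack is back to its original depth, which corresponds to SP being preserved.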

ARM Functions and User Input

It's time to learn more about stack management in ARM assembly by creating a program with function calls and user input.
Examine the following source code:

.section .rodata

/* Linux Syscall constants */
STDIN  = 0x00
STDOUT = 0x01
EXIT   = 0x01
READ   = 0x03
WRITE  = 0x04

ERR_INVALID_INPUT = 0x01
ERR_BUFF_OVERFLOW = 0x02

/* Valid ASCII values for decimal numbers */
b_MIN_ASCII = 0x30
b_MAX_ASCII = 0x39

/* Termination character */
b_NEWLINE = 0x0a

/* Input buffer size */
b_BUFFER_SIZE = 0x08

.section .data

/* String variables used to prompt user and show output */
first_num_msg:
    .ascii "Enter the first number to add: "
len_first_num_msg = ( . - first_num_msg)

second_num_msg:
    .ascii "Enter the second number to add: "
len_second_num_msg = ( . - second_num_msg)

sum_msg:
    .ascii "The sum is: "
len_sum_msg = ( . - sum_msg)

invalid_msg:
    .ascii "ERROR: Invalid input detected\n"
len_invalid_msg = ( . - invalid_msg)

overflow_msg:
    .ascii "ERROR: Buffer overflow detected\n"
len_overflow_msg = ( . - overflow_msg)

.section .bss

/* The block started by symbol (bss) section stores uninitialized variables.
   They will be zero initialized in memory, so we will start with clean buffers */
first_number_buffer:
    .skip b_BUFFER_SIZE
second_number_buffer:
    .skip b_BUFFER_SIZE

.section .text
.global _start

_start:
    /* Prompt user to enter two integers, add them, and print the result */
    movw r4, #0xbeef    /* Load the lower-half (16-bits) of r4 */
    movt r4, #0xdead    /* Load the upper-half of r4 */
    movw r5, #0xbabe
    movt r5, #0xdeed
    movw r6, #0xface
    movt r6, #0xcafe
    movw r7, #0xdeaf
    movt r7, #0xfade
    movw r8, #0xbabe
    movt r8, #0xbead
    movw r9, #0xface
    movt r9, #0xdeaf
    movw r10, #0xbade
    movt r10, #0xcade
    /* The above instructions set all the variable registers. This is done only
       for demonstration purposes to provide easy data to view on the stack when
       debugging. The movw and movt instructions are used because ARM32 cannot
       load some 32-bit constants into registers with a single instruction, so
       you must load the lower and upper halves separately. */

prompt_for_first_number:
    ldr r7, =WRITE
    ldr r0, =STDOUT
    ldr r1, =first_num_msg
    ldr r2, =len_first_num_msg
    svc #0

get_first_number:
    ldr r0, =first_number_buffer
    ldr r1, =b_BUFFER_SIZE
    bl get_number       /* r0: buffer address r1: buffer length --> r0: unsigned integer */
    push {r0}           /* Save the first number to the stack */

prompt_for_second_number:
    ldr r7, =WRITE
    ldr r0, =STDOUT
    ldr r1, =second_num_msg
    ldr r2, =len_second_num_msg
    svc #0

get_second_number:
    ldr r0, =second_number_buffer
    ldr r1, =b_BUFFER_SIZE
    bl get_number
    push {r0}           /* Save the second number to the stack */

print_sum_msg:
    ldr r7, =WRITE
    ldr r0, =STDOUT
    ldr r1, =sum_msg
    ldr r2, =len_sum_msg
    svc #0
    pop {r0,r1}         /* Pop both numbers off the stack */
    bl print_sum        /* r0: unsigned integer r1: unsigned integer --> void */

exit_normally:
    ldr r7, =EXIT
    mov r0, #0
    svc #0

exit_with_invalid_error:
    ldr r7, =WRITE
    ldr r0, =STDOUT
    ldr r1, =invalid_msg
    ldr r2, =len_invalid_msg
    svc #0
    ldr r7, =EXIT
    ldr r0, =ERR_INVALID_INPUT
    svc #0

exit_with_overflow_error:
    ldr r7, =WRITE
    ldr r0, =STDOUT
    ldr r1, =overflow_msg
    ldr r2, =len_overflow_msg
    svc #0
    ldr r7, =EXIT
    ldr r0, =ERR_BUFF_OVERFLOW
    svc #0

get_number:
    /*
       purpose: read a natural number from user input
       usage:   arg0 (r0) the memory address to store ASCII input from STDIN
                arg1 (r1) the size of the memory buffer to store the input
       returns: r0: 32-bit positive integer
       error handling: Invalid input will result in no return and a program
                exit with error code 0x1
                Only characters 0123456789 \n (0x0a) and (0x00) are valid
    */
    push {fp, lr}       /* Preserve the caller's frame pointer and link register (previous pc) */
    mov fp, sp          /* Set the frame pointer to the current stack pointer */
    push {r4-r10}       /* Preserve the caller's variable registers */

    /* r1 and r0 are scratch registers that will get clobbered by the read
       syscall's arguments so we need to preserve them */
    push {r1}           /* Store r1 on the stack, which is the length of our buffer */
    push {r0}           /* Store r0 on the stack, which is the memory address we will save our input to */

    ldr r7, =READ
    ldr r0, =STDIN
    pop {r1}            /* r1=r0 from stack, this sets the address for the read syscall to our function's arg0 input */
    pop {r2}            /* r2=r1 from stack, this sets the size of the input buffer for the read syscall to our function's arg1 input */
    svc #0

    cmp r0, r2          /* Compare the number of bytes read to our buffer (r0) to the size of our buffer (r2) */
    bge if_newline_check    /* A full buffer should always end with a newline character */
    b endif_newline_check

if_newline_check:
    sub r6, r2, #0x1    /* The byte offset is 1 less than the length */
    ldrb r4, [r1, r6]   /* Load the last byte from the buffer */
    ldr r5, =b_NEWLINE
    cmp r4, r5          /* If the last character isn't a newline, then there was a buffer overflow */
    bne if_buffer_overflow_found
    b endif_buffer_overflow_found

if_buffer_overflow_found:
    b exit_with_overflow_error  /* Unconditional branch to exit the program with an overflow error */

endif_buffer_overflow_found:
endif_newline_check:
    /* Buffer length was valid */
    mov r0, r1          /* Move the buffer address into r0 to pass to validate_input */
    mov r1, r2          /* Move the buffer length into r1 to pass to validate_input */
    bl validate_input   /* r0: buffer address, r1: buffer length --> r0: 0x0 is valid 0x1 is invalid */
    cmp r0, #0x1        /* Test for invalid flag */
    beq invalid_number  /* branch to invalid number error handling */

valid_number:
    mov r0, r1          /* validate_input passes any valid number back in r1, get_number passes that back in r0 */
    pop {r4-r10}        /* This restores the original values of r4-r10 from the stack */
    pop {fp}            /* Restore the previous fp to the current fp */
    pop {pc}            /* This sets the pc to the lr value, so that execution resumes where this function was called from in the caller's function */

invalid_number:
    pop {r4-r10, fp}
    b exit_with_invalid_error   /* Unconditional branch to exit the program with an invalid input error */

validate_input:
    /*
       parameters: arg0 (r0) the memory address to validate ASCII decimal input from
                   arg1 (r1) the size of the input buffer
       returns:    r0: 0x0 for valid decimal number, 0x1 for invalid decimal number
                   r1: Unchanged if number was invalid, the value of the number if the number was valid
    */
    push {fp, lr}       /* Preserve the caller's fp and lr */
    mov fp, sp          /* Set the frame pointer to the current stack pointer */
    push {r4-r10}       /* Preserve the caller's variable registers */

    /* Register use: r0: buffer address passed to function */
    mov r3, #0x0        /* Loop counter for each byte stored in the input buffer */
    mov r6, #0x0        /* This will hold a flag which indicates a terminating character was found */

    /* We need to check that all characters are valid decimal characters, and count them */
validate_loop:
    ldrb r4, [r0]       /* Load one byte from the buffer memory location */

    /* Newline is a valid termination */
    ldr r5, =b_NEWLINE
    cmp r4, r5
    moveq r6, #0x1      /* Flag the termination character if the comparison was equal */
    beq valid

    ldr r5, =b_MAX_ASCII    /* If character is greater than b_MAX_ASCII then it is invalid */
    cmp r4, r5
    bgt invalid

    ldr r5, =b_MIN_ASCII    /* If character is less than b_MIN_ASCII then it is invalid */
    cmp r4, r5
    blt invalid

    add r3, r3, #0x01   /* Increment counter by 1 */
    cmp r3, r1          /* Check if we have looped through all the characters */
    bge end_validate_loop   /* End loop */
    add r0, r0, #0x01   /* Increment the memory buffer address by 1 */
    b validate_loop     /* Continue loop */

end_validate_loop:
valid:
convert_to_decimal:
    /*
       If all characters were valid, we can convert them to a decimal value.
       Register use:
           r0: buffer address passed to function
           r1: length of buffer passed to function
           r3: counter (starting with actual length)
           r4: current character from buffer
           r5: min ASCII value
           r6: running total
           r7: exponent
           r8: base
           r9: product of base and exponent
           r10: temp var
    */
    cmp r6, #0x1
    beq if_terminating_char
    b endif_terminating_char

if_terminating_char:
    /* Check if a number wasn't entered and only enter was pressed */
    cmp r3, #0x0
    beq empty
    sub r0, r0, #0x1    /* Point to the previous character before the newline */

endif_terminating_char:
    mov r1, r3          /* Clobber r1 with the actual length of our number string */
    ldr r5, =b_MIN_ASCII    /* Reset r5 to the minimum ASCII value */
    mov r6, #0          /* Reset r6 to 0 for the running total */
    mov r8, #10         /* Set r8 to base 10 */
    mov r9, #1          /* Set r9 to 1 for the first exponent multiplication */

    /* The first digit doesn't need to be multiplied by the base and exponent */
    ldrb r4, [r0]
    sub r4, r4, r5      /* subtract b_MIN_ASCII value from the current character to get the decimal digit */
    add r6, r6, r4      /* Add the decimal digit to the running total */
    sub r3, r3, #1      /* Decrement our counter by one */
    sub r0, r0, #1      /* Decrement our buffer address by one */

digit_loop:
    cmp r3, #0
    ble end_digit_loop
    ldrb r4, [r0]       /* Load one character from the current memory position from the buffer */
    sub r4, r4, r5      /* subtract b_MIN_ASCII value from the current character to get the decimal digit */
    sub r10, r1, r3     /* Get the exponent value */
    mov r7, r10

    /* Exponent loop */
exponent_loop:
    mul r10, r9, r8     /* Find the product of the exponent and base */
    mov r9, r10
    sub r7, r7, #1      /* Decrement the exponent counter */
    cmp r7, #0          /* Our exponent will increase for each digit */
    ble end_exponent_loop
    b exponent_loop

end_exponent_loop:
    mul r10, r4, r9     /* The new digit value is the product of the exponent product and the digit */
    add r6, r6, r10     /* Add the digit to the running total */
    mov r9, #1          /* Reset r9 to 1 */
    sub r3, r3, #1      /* Decrement our counter by one */
    sub r0, r0, #1      /* Decrement our buffer address by one */
    b digit_loop

end_digit_loop:
    mov r0, #0x0        /* Return code of 0 indicates a valid number */
    mov r1, r6          /* The final running total is passed back as the number */
    pop {r4-r10}        /* Restore variable registers */
    pop {fp, pc}        /* Restore fp and resume execution from lr address */

invalid:
    mov r0, #0x1        /* Return code of 1 indicates an invalid number */
    pop {r4-r10}        /* Restore variable registers */
    pop {fp, pc}        /* Restore fp and resume execution from lr address */

empty:
    /* The user entered an empty number */
    mov r0, #0x0        /* It is valid, but equivalent to zero */
    mov r1, #0x0
    pop {r4-r10}        /* Restore variable registers */
    pop {fp, pc}        /* Restore fp and resume execution from lr address */

print_sum:
    /*
       parameters: arg0 (r0) the first number to add
                   arg1 (r1) the second number to add
       returns:    void
    */
    push {fp, lr}       /* Preserve the caller's fp and lr */
    mov fp, sp          /* Set the frame pointer to the current stack pointer */
    push {r4-r10}       /* Preserve the caller's variable registers */

    add r0, r0, r1      /* Adds both numbers and clobbers r0 with the sum */

    /*
       Variable registers:
           r4: counter
           r5: b_MIN_ASCII / b_NEWLINE
           r6: divisor / newline flag
           r7: quotient
           r8: divisor * quotient product
           r9: remainder/decimal digit/null pad
           r10: the base address of our string on the stack
    */
    sub sp, sp, #0x0c   /* Make room on the stack, 12 bytes can hold 10 characters for a 32-bit integer */
    mov r10, sp         /* Store the base address of our string */
    mov r4, #0x0b       /* We are storing little endian, so we need to start at the end of the stack for our loop */
    ldr r5, =b_NEWLINE
    mov r6, #10         /* Set the divisor to 10 */

    /* Store a newline which will be read last for little endian */
    strb r5, [sp, r4]   /* Store the newline character on the stack */
    sub r4, r4, #0x1    /* Decrement our counter to reflect writing the newline character */
    ldr r5, =b_MIN_ASCII

digit_to_ASCII_loop:
    cmp r0, #0x0
    bne if_more_digits

else_no_more_digits:
    /* Null pad the rest of the string */
    mov r5, #0x00
    strb r5, [sp, r4]
    b endif_more_digits

if_more_digits:
    sdiv r7, r0, r6     /* Divide the hex number by 10 */
    mul r8, r7, r6      /* Multiply the quotient by 10 */
    sub r9, r0, r8      /* Find the remainder */
    add r9, r9, r5      /* Add b_MIN_ASCII to the remainder to convert it to the ASCII decimal */
    strb r9, [sp, r4]   /* Store the ASCII character on the stack */
    mov r0, r7          /* Overwrite the original number with the quotient */

endif_more_digits:
    sub r4, r4, #0x1
    cmp r4, #0x0
    bge digit_to_ASCII_loop

print_sum_syscall:
    ldr r7, =WRITE
    ldr r0, =STDOUT
    mov r1, r10         /* Set the write address to the base of our string */
    mov r2, #0x0c
    svc #00000000
    add sp, sp, #0x0c   /* Move the stack pointer back */
    pop {r4-r10, fp, pc}    /* Restore the stack and return to the caller */
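
The two conversions in this program, ASCII digits to an integer in validate_input and an integer back to ASCII digits in print_sum, may be easier to follow in a high-level sketch first. The Python below is an approximation, not a translation. parse_number and format_number are hypothetical names. parse_number accumulates left to right (multiply the total by 10, then add the digit) instead of walking the buffer from the right with powers of ten as the assembly does. format_number omits the null padding that print_sum writes in front of the digits.

```python
MIN_ASCII, MAX_ASCII, NEWLINE = 0x30, 0x39, 0x0A

def parse_number(buffer):
    """Validate ASCII decimal input and return its value, or None if invalid."""
    total = 0
    for byte in buffer:
        if byte == NEWLINE:                    # newline terminates the number
            break
        if not MIN_ASCII <= byte <= MAX_ASCII:
            return None                        # mirrors the `invalid` path
        total = total * 10 + (byte - MIN_ASCII)
    return total                               # empty input yields 0

def format_number(value):
    """Build the decimal string by repeated division by 10, last digit first."""
    digits = []
    while value != 0:
        quotient = value // 10
        remainder = value - quotient * 10      # same remainder trick as print_sum
        digits.append(MIN_ASCII + remainder)
        value = quotient
    return bytes(reversed(digits)) + bytes([NEWLINE])

first = parse_number(b"40\n")
second = parse_number(b"2\n")
print(format_number(first + second))           # b'42\n'
```

As in the assembly, a sum of zero produces no digit characters at all, only the trailing newline.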