Now that you have learned some assembly fundamentals, it is time to examine a different architecture.
ARM processors are an extremely popular choice for devices such as smartphones, tablets, TVs, routers, IoT systems, and other embedded devices.
In this section we will examine the basic ARM 32-bit architecture, write a Hello World program, cross-assemble it, link it, run it by emulating
an ARM processor on our x86_64 machine, and debug it with GDB.
ARM Assembly Introduction
ARM Tool Installation
For this section you will need to install GCC for ARM, GDB for multiple architectures, and the QEMU user-mode tools for emulation.
To install the necessary packages enter:
sudo apt install gcc-arm-linux-gnueabihf gdb-multiarch qemu-user
ARM Registers
While we haven't examined the x86_64 architecture yet, you will discover that the 32-bit implementation of ARM
is much more similar to its 64-bit counterpart than the x86/i386 architecture is to its 64-bit x86_64/AMD64 counterpart.
This is because ARM has kept much more parity developing the 32-bit and 64-bit implementations of its architecture.
We will first be examining the common 32-bit registers used by ARMv7 or ARMv8 (when operating in 32-bit mode).
As with Intel for the x86 processor, extensive documentation for the
ARM architecture is available here.
It should also be noted that ARM divides its processors into three profiles:
- a - Application profile, used for general purpose computing
- m - Microcontroller profile, used for small low-power applications such as sensors
- r - Real-time profile, used in applications that require predictable and consistent timing
ARM uses a version number to refer to the major revision of the architecture and instruction sets, such as v7, v8, v9, etc. Different versions may support either 32-bit or 64-bit operations, or both.
32-bit ARMv7 registers can be broken down as follows:
| General purpose registers | Special function registers | Program status registers (similar to the x86 eflags register) | Floating point registers |
|---|---|---|---|
| r0 to r12 | r13: sp (stack pointer), equivalent to x86 esp | cpsr (current program status register) | 32-bit (float): s0 to s31 |
| | r14: lr (link register), holds the return address of a function call; x86 keeps the return address on the stack instead of in a register | spsr (saved program status register) | 64-bit (double): d0 to d15 |
| | r15: pc (program counter), equivalent to x86 eip | | |
The flags for the cpsr are shown below:
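| Bit(s) | Flag |
|---|---|
| 0x1F | N |
| 0x1E | Z |
| 0x1D | C |
| 0x1C | V |
| 0x1B | Q |
| 0x1A to 0x18 | 00 |
| 0x17 | SSBS |
| 0x16 | PAN |
| 0x15 | DIT |
| 0x14 | 00 |
| 0x13 to 0x10 | GE |
| 0x0F to 0x0A | 00 |
| 0x09 | E |
| 0x08 | A |
| 0x07 | I |
| 0x06 | F |
| 0x05 | T |
| 0x04 to 0x00 | M |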
Below are the cpsr flag functions:
- N (Negative) flag, indicates whether the result of the last operation was negative (1) or not (0)
- Z (Zero) flag, indicates whether the result of the last operation was zero (1) or not zero (0)
- C (Carry) flag, indicates whether there was a carry (1) or not (0) during the last arithmetic operation
- V (Overflow) flag, indicates whether a signed overflow occurred (1) or not (0) during the last arithmetic operation
- Q (Saturation) flag, indicates whether saturation has occurred (1) or not (0) since the flag was last cleared
- SSBS (Speculative Store Bypass Safe) flag, indicates whether speculative store bypassing is permitted (1) or not (0)
- PAN (Privileged Access Never) flag, indicates whether privileged code is blocked from accessing memory that is accessible to User mode (1) or not (0)
- DIT (Data Independent Timing) flag, indicates whether instructions must execute with timing that does not depend on the data being processed (1) or not (0)
- GE (Greater than or Equal) flags, set by the parallel (SIMD) add and subtract instructions to record the result for each byte or halfword
- IT (If-Then) flags, indicate the execution state of the If-Then instruction
- J (Jazelle) flag, indicates whether the processor is executing in Jazelle (Java support) mode (1) or not (0)
- E (Endianness) flag, indicates the endianness used for data accesses, either little-endian (0) or big-endian (1)
- A (Asynchronous abort mask) flag, indicates whether asynchronous aborts are masked (1) or not (0)
- I (IRQ mask) flag, indicates whether normal hardware interrupts (IRQs) are masked (1) or will be processed (0)
- F (FIQ mask) flag, indicates whether fast interrupts (FIQs) are masked (1) or will be processed (0)
- T (Thumb) flag, indicates the execution state of the processor, either Thumb (1) or ARM (0)
- M (Processor mode) flags, indicate the current processor mode, such as User, System, FIQ, IRQ, Supervisor, Abort, Undefined, or Monitor
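| Flag in ARM | Flag in x86 | Flag |
|---|---|---|
| N | SF | Negative |
| Z | ZF | Zero |
| C | CF | Carry |
| V | OF | Overflow |
| A | (none) | Asynchronous abort mask; ARM has no counterpart to the x86 auxiliary carry flag (AF) |
| I | IF | Interrupt (opposite polarity: ARM I = 1 masks IRQs, while x86 IF = 1 enables interrupts) |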
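To make the bit layout more concrete, here is a minimal Python sketch that splits a cpsr value, such as one read from GDB, into the flags described above. The function name decode_cpsr and both dictionaries are illustrative names, not part of any tool. The sketch assumes the ARMv7-A layout in which the mode field M occupies bits 4 to 0.

```python
# Single-bit cpsr flags and their bit positions (assumed ARMv7-A layout).
CPSR_FLAG_BITS = {
    "N": 31, "Z": 30, "C": 29, "V": 28, "Q": 27,
    "E": 9, "A": 8, "I": 7, "F": 6, "T": 5,
}

# Mode encodings for the 5-bit M field.
CPSR_MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10110: "Monitor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(value):
    """Split a 32-bit cpsr value into single-bit flags, the GE field, and the mode name."""
    fields = {name: (value >> bit) & 1 for name, bit in CPSR_FLAG_BITS.items()}
    fields["GE"] = (value >> 16) & 0xF          # bits 19..16
    fields["mode"] = CPSR_MODES.get(value & 0x1F, "unknown")
    return fields


if __name__ == "__main__":
    # 0x10 is the cpsr value a freshly started user-mode program typically shows in GDB.
    print(decode_cpsr(0x10))
```

A value of 0x10 decodes to User mode with every condition flag clear, and 0x80000010 decodes to the same mode with only N set.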
The spsr is used to save the state of the cpsr registers when the processor changes privilege modes. This frees the cpsr to load flags for the current state and allows the previous state to be restored later.
ARM Hello World
We are now ready to write a hello world program for ARM. We will build upon what we have already learned
from our x86 hello world, and note the differences for GNU ARM assembly.
We will use the Linux syscall table references again, this time for 32-bit ARM.
.section .rodata
/*
The .rodata section holds read-only data; at run time it is mapped into memory without write permission.
The constants below are assembler symbols, so they do not take up any space in the section itself.
*/
b_STDOUT = 0x01 /* This defines the symbol b_STDOUT as a constant with a value of 0x01 (the b_ prefix marks it as a byte sized value) */
b_WRITE = 0x04 /* This defines the symbol b_WRITE as a constant with a value of 0x04 */
.section .data
hello_msg:
.ascii "Hello World!\n"
end_hello_msg:
len_hello_msg = (end_hello_msg - hello_msg)
/*
This declares a variable len_hello_msg and assigns it the difference between
the end_hello_msg label address and the hello_msg label address.
Parentheses are not necessary in this instance; len_hello_msg = end_hello_msg - hello_msg would
evaluate to the same value
*/
unused_label:
.hword 0xbeef
/*
This label is here to illustrate how GAS stores label addresses for ARM assembly
that are not assigned during the program execution, vs. those that are.
Note: The size of a word depends on the processor architecture, for ARM32 a word
is 32 bits (4 bytes), so to store 2 bytes of data, we use the half-word (.hword) directive
*/
.section .text
.global _start
_start:
/*
Write "At start" and "Hello World!" to stdout
Write syscall reference:
r7      r0 (arg0)          r1 (arg1)          r2 (arg2)
0x04    unsigned int fd    const char *buf    size_t count
*/
print_start_msg:
ldr r7, =b_WRITE
/*
The load register (ldr) instruction normally loads a value from memory into a register.
In the = form used here it resembles the lea instruction for the x86 processor,
in that it places a memory address or constant value into a register.
Like the eax register for x86, r7 is used to select the syscall function for ARM.
Using the = character with ldr is an ARM specific pseudo-instruction that
specifies a symbol name which represents a constant value or an address.
The assembler will determine the type of value and emit either a mov with
an immediate value or a pc-relative load of the value from memory.
For this instruction, the assembler will load an immediate value into r7 because
b_WRITE is a constant and not the label for a memory address.
For more information on this instruction, refer to this reference:
https://developer.arm.com/documentation/dui0041/c/Babbfdih
*/
ldr r0, =b_STDOUT /* Another constant value loaded for the FD */
adr r1, start_msg
/*
Address (adr) loads the address of a label into a register.
The major functional difference between ldr and adr is that adr can only reference
nearby labels in the same section as the instruction (here, the .text section),
while ldr can resolve addresses and values from any section.
While both ldr and adr could load addresses from labels in the .text section,
adr is more efficient for this specific task and should be used for that purpose.
*/
ldr r2, =len_start_msg /* This will resolve to the value of len_start_msg, and load it into r2 */
svc #0x00000000
/*
When writing ARM assembly for GAS, the # character is used to prefix an immediate value.
This SuperVisor Call (svc) is similar to the int 0x80 call for the x86.
It initiates the execution of the syscall by raising an exception that transfers control to the kernel.
svc creates an exception and makes the immediate value available to the exception handler.
In earlier versions of ARM svc was called swi (SoftWare Interrupt), but they are effectively the same.
*/
write_hello_msg:
ldr r7, =b_WRITE
ldr r0, =b_STDOUT
ldr r1, =hello_msg
/*
For this instruction, hello_msg is a label for a memory address located in the .data section.
Using the ldr pseudo-instruction, the assembler will store the label address as
a literal value at the end of the .text section.
It will then load that literal from its memory location and assign it to r1.
It uses the Program Counter (PC) register as a base address and an offset from PC to reach the literal.
This is similar to what we did with the adr instruction, except the assembler is copying the
address for the label in .data and placing it in the .text section to load.
*/
ldr r2, =len_hello_msg
svc #0x00000000
exit_normally:
/*
exit syscall reference:
r7 r0 (arg0)
0x01 int error_code
*/
mov r7, #0x00000001 /* Like with x86 assembly, mov can be used to load an immediate value into a register */
mov r0, #0x00000000
svc #0x00000000
/*
The following section of code was added to show how you can also place
variables and labels for data in the .text section after your code.
They must be placed after your code, because they are not executable instructions.
They should never be reached by your program's normal execution or it will crash.
*/
start_msg:
.ascii "At start\n"
len_start_msg = . - start_msg
/*
In GAS, the . character is used to reference the current position in memory,
so instead of creating the label end_start_msg and writing "len_start_msg = (end_start_msg - start_msg)"
we can just write this as shorthand.
*/
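For comparison, the same write syscall can be reached from Python through os.write, which takes the file descriptor and buffer and derives the count from the buffer length. This is only an illustrative sketch of how the three arguments in the syscall reference fit together; write_message and STDOUT_FD are names made up for this example.

```python
import os

STDOUT_FD = 1  # same value as b_STDOUT in the assembly source


def write_message(fd, message):
    """Issue write(fd, buf, count) and return the number of bytes written."""
    return os.write(fd, message)


if __name__ == "__main__":
    write_message(STDOUT_FD, b"At start\n")       # 9 bytes, like len_start_msg
    write_message(STDOUT_FD, b"Hello World!\n")   # 13 bytes, like len_hello_msg
```

The return value is the byte count, which corresponds to what the kernel leaves in r0 after svc returns in the assembly version.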
ARM Assemble Link and Run
Once you have copied the code and have thoroughly read through the comments, it is time to make it executable.
GNU provides a cross-assembler for the ARM instruction set which is included in the gcc-arm-linux-gnueabihf package.
To assemble the source, navigate to the directory where the source is saved and enter the command:
arm-linux-gnueabihf-as -o hello_arm32.o hello_arm32.asm
This command assumes that you saved the source file as hello_arm32.asm. It directs GAS to create a 32-bit object
file named hello_arm32.o and to use the hello_arm32.asm file as an input source.
To link the object file into an executable, enter the command:
arm-linux-gnueabihf-ld -o hello_arm32 hello_arm32.o
Now that the executable is created, we will use Quick Emulator (QEMU) to run it on our x86_64 system by emulating an ARM processor.
Enter the command:
qemu-arm hello_arm32
It should produce the output:
At start
Hello World!
ARM Debugging in GDB
QEMU has an option that allows GDB to connect to it over a network socket.
To run our program in QEMU as a GDB server enter:
qemu-arm -g 2345 hello_arm32 &
This will launch our program in the background with QEMU and bind to port 2345.
Port 2345 is an arbitrary choice and can be changed to any free port you prefer.
Once QEMU is running, we will launch GDB for multiple architectures, load our binary so that GDB can read its symbols, and then connect to the running process in QEMU. To do so enter:
$ gdb-multiarch
(gdb) file hello_arm32
(gdb) target remote localhost:2345
You should see an output similar to the following:
pete@framework16:~/Documents/ASM/hello_world/ARM32$ gdb-multiarch
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file hello_arm32
Reading symbols from hello_arm32...
(No debugging symbols found in hello_arm32)
(gdb) target remote localhost:2345
Remote debugging using localhost:2345
0x00010074 in _start ()
(gdb)
Notice that we do not need to set a break point and run the program,
because QEMU has already paused the program at the _start label, before executing its first instruction.
We can now open our layouts with:
lay asm
lay reg
You should now have the familiar layout of registers, assembly, and commands. Note, there will be no register values loaded as we haven't stepped into an instruction yet.
Let's examine our first instruction:
| > 0x10074 <_start> mov r7, #4
The assembly source was ldr r7, =b_WRITE, but because b_WRITE was a constant value,
the assembler translated this to just moving its immediate value into the register.
The next instruction was translated in the same way, so let's step into our instructions until we reach the third line:
|-Register group: general------------------------------------------------------------------------------------------------------------------------------------------------|
|r0 0x1 1 r1 0x40800b39 1082133305 r2 0x0 0 |
|r3 0x0 0 r4 0x0 0 r5 0x0 0 |
|r6 0x0 0 r7 0x4 4 r8 0x0 0 |
|r9 0x0 0 r10 0x200bc 131260 r11 0x0 0 |
|r12 0x0 0 sp 0x408009d0 0x408009d0 lr 0x0 0 |
|pc 0x1007c 0x1007c <_start+8> cpsr 0x10 16 fpscr 0x0 0 |
|fpsid 0x410430f0 1090793712 fpexc 0x40000000 1073741824 AFSR0_EL1 0x0 0 |
|AFSR1_EL1 0x0 0 DBGDIDR 0x3515f021 890630177 DBGDSAR 0x0 0 |
|DBGBVR 0x0 0 DBGBCR 0x0 0 DBGWVR 0x0 0 |
|DBGWCR 0x0 0 PAR 0x0 0 DBGBVR 0x0 0 |
|DBGBCR 0x0 0 DBGWVR 0x0 0 DBGWCR 0x0 0 |
|TEECR 0x0 0 MIDR_EL1 0x412fc0f1 1093648625 CTR 0x8444c004 -2075869180 |
|TCMTR 0x0 0 TTBR0_EL1 0x0 0 PMCCNTR 0x0 0 |
|TLBTR 0x0 0 TTBR1_EL1 0x0 0 MIDR 0x412fc0f1 1093648625 |
|TTBCR 0x0 0 MPIDR_EL1 0x80000000 -2147483648 TTBCR2 0x0 0 |
|REVIDR_EL1 0x0 0 MIDR 0x412fc0f1 1093648625 JIDR 0x0 0 |
|CLIDR 0xa200023 169869347 DFAR 0x0 0 WFAR 0x0 0 |
|IFAR 0x0 0 JMCR 0x0 0 AIDR 0x0 0 |
|CSSELR 0x0 0 ID_PFR2 0x10 16 VBAR 0x0 0 |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0x10074 <_start> mov r7, #4 |
| 0x10078 <_start+4> mov r0, #1 |
| > 0x1007c <_start+8> add r1, pc, #36 ; 0x24 |
| 0x10080 <_start+12> ldr r2, [pc, #44] ; 0x100b4 <start_msg+12> |
| 0x10084 <_start+16> svc 0x00000000 |
| 0x10088 <write_hello_msg> mov r7, #4 |
| 0x1008c <write_hello_msg+4> ldr r1, [pc, #36] ; 0x100b8 <start_msg+16> |
| 0x10090 <write_hello_msg+8> mov r0, #1 |
| 0x10094 <write_hello_msg+12> mov r2, #13 |
| 0x10098 <write_hello_msg+16> svc 0x00000000 |
| 0x1009c <exit_normally> mov r7, #1 |
| 0x100a0 <exit_normally+4> mov r0, #0 |
| 0x100a4 <exit_normally+8> svc 0x00000000 |
| 0x100a8 <start_msg> ; <UNDEFINED> instruction: 0x73207441 |
| 0x100ac <start_msg+4> ldrbtvc r6, [r2], #-372 ; 0xfffffe8c |
| 0x100b0 <start_msg+8> andeq r0, r0, r10 |
| 0x100b4 <start_msg+12> andeq r0, r0, r9 |
| 0x100b8 <start_msg+16> strheq r0, [r2], -r12 |
| 0x100bc cfstr64vs mvdx6, [r12], #-288 ; 0xfffffee0 |
| 0x100c0 svcvs 0x0057206f |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
remote Thread 1.383292 In: _start L?? PC: 0x1007c
(gdb) lay reg
(gdb) si
0x00010078 in _start ()
(gdb) si
0x0001007c in _start ()
(gdb)
Our instruction:
adr r1, start_msg
has been translated to:
add r1, pc, #36
The add instruction takes the destination register to store the result, and the two arguments to add. In this instance, the immediate value #36 is being added to the pc (program counter) register's value.
This is where things can be confusing. While GDB currently lists the pc register as 0x1007c, an ARM instruction that reads pc sees a value two instructions ahead of itself. Since these are 32-bit instructions, the value read from pc is 8 bytes more than the address of our current line (32 bits * 2 = 64 bits = 8 bytes).
You might expect add r1, pc, #36 to store 0x100a0 (0x1007c + 36) in the register, but if we step forward one instruction:
r1 0x100a8 65704
We see that 0x100a8 is in fact stored in r1. If we look further down our assembly layout, we can see that it is our start_msg label:
0x100a8 <start_msg> ; <UNDEFINED> instruction: 0x73207441
Notice that the disassembler is attempting to interpret the data as instructions. This is
because the data resides in the .text section with our code, but it does not contain valid
assembly instructions.
Now let's examine our ldr instruction:
> 0x10080 <_start+12> ldr r2, [pc, #44]
Our original instruction was: ldr r2, =len_start_msg
This was resolved by the assembler to:
ldr r2, [pc, #44]
len_start_msg is a variable symbol, so =len_start_msg will evaluate to loading the value for that symbol.
pc, #44 takes the current pc register value and adds 44 to it.
[pc, #44] evaluates the data at that address and loads it to the destination register r2.
We know that the pc value will be two instructions ahead, so if we add 0x10080 + 8 + 44:
(gdb) print/x (0x10080 + 8 + 44)
$2 = 0x100b4
And we know that the length of "At start\n" should be 9 bytes, so 0x09 should be stored at 0x100b4:
(gdb) x /1xb $2
0x100b4 <start_msg+12>: 0x09
And we can see that 0x09 is indeed stored at that location. Notice when we printed our address calculation, GDB automatically stored it in the variable $2 to allow for easy referencing.
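The address arithmetic used for both instructions so far can be written as a small Python sketch. The helper name pc_relative_address is made up for this example, and it assumes the processor is executing 32-bit ARM (not Thumb) instructions, where reading pc yields the instruction address plus 8.

```python
ARM_PC_READ_AHEAD = 8  # in ARM state, pc reads as the current instruction address + 8


def pc_relative_address(instruction_address, offset):
    """Return the address that [pc, #offset] or add rX, pc, #offset refers to."""
    return instruction_address + ARM_PC_READ_AHEAD + offset


if __name__ == "__main__":
    print(hex(pc_relative_address(0x1007C, 36)))  # add r1, pc, #36   -> 0x100a8
    print(hex(pc_relative_address(0x10080, 44)))  # ldr r2, [pc, #44] -> 0x100b4
```

Both results match the values GDB showed: 0x100a8 for the start_msg label and 0x100b4 for the stored length.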
Let's step forward in our program to the next ldr instruction:
0x1008c <write_hello_msg+4> ldr r1, [pc, #36] ; 0x100b8 <start_msg+16>
For this instruction, the square brackets [] mean that the processor loads the value stored at the address given by the pc register + 36 bytes.
This should evaluate to the value stored at:
(gdb) print /x (0x1008c + 8 + 36)
$3 = 0x100b8
What is stored at 0x100b8 ?
(gdb) x /4xb $3
0x100b8 <start_msg+16>: 0xbc 0x00 0x02 0x00
This is the memory address 0x000200bc stored in little-endian byte order.
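If the byte order is hard to read at a glance, the conversion can be checked with a short Python sketch. The function name little_endian_word is only an illustrative choice; the bytes are the four values GDB printed at 0x100b8.

```python
def little_endian_word(raw_bytes):
    """Combine bytes stored lowest-address-first into a single integer."""
    return int.from_bytes(raw_bytes, byteorder="little")


if __name__ == "__main__":
    stored = bytes([0xBC, 0x00, 0x02, 0x00])  # as shown by x /4xb
    print(hex(little_endian_word(stored)))    # 0x200bc
```

The least significant byte (0xbc) sits at the lowest address, so reading the bytes from last to first gives the familiar 0x000200bc.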
Where is this address?
(gdb) info file
Symbols from "/home/pete/Documents/ASM/hello_world/ARM32/hello_arm32".
Remote target using gdb-specific protocol:
`/home/pete/Documents/ASM/hello_world/ARM32/hello_arm32', file type elf32-littlearm.
Entry point: 0x10074
0x00010074 - 0x000100bc is .text
0x000200bc - 0x000200cb is .data
While running this, GDB does not access memory from...
Local exec file:
`/home/pete/Documents/ASM/hello_world/ARM32/hello_arm32', file type elf32-littlearm.
Entry point: 0x10074
0x00010074 - 0x000100bc is .text
0x000200bc - 0x000200cb is .data
(gdb)
We can see it is in our data section:
(gdb) x /13cb 0x000200bc
0x200bc: 72 'H' 101 'e' 108 'l' 108 'l' 111 'o' 32 ' ' 87 'W' 111 'o'
0x200c4: 114 'r' 108 'l' 100 'd' 33 '!' 10 '\n'
And there is our Hello World! message.
To summarize: the assembler took the address of our hello_msg label in the .data section
and appended that address value to the end of our .text section of code.
The ldr instruction then loads that stored address into the r1 register by offsetting from the pc register
to the location in the .text section that holds the address of the actual data.
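One reason an address such as 0x200bc has to be stored in memory, while a constant such as 4 became a plain mov, is that a 32-bit ARM mov can only encode an immediate made of an 8-bit value rotated right by an even number of bits. The Python sketch below checks that rule; the function names are made up for this example. It is not a model of everything the assembler considers: the listing above also shows a memory load for len_start_msg, whose value of 9 would fit.

```python
def rotate_left_32(value, amount):
    """Rotate a 32-bit value left by amount bits (amount between 0 and 31)."""
    value &= 0xFFFFFFFF
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def fits_arm_immediate(value):
    """Return True if value is an 8-bit constant rotated right by an even amount."""
    # Undo every possible even rotation and see whether 8 bits are enough.
    return any(rotate_left_32(value, rotation) < 0x100
               for rotation in range(0, 32, 2))


if __name__ == "__main__":
    for constant in (0x4, 0xA00, 0x200BC):
        print(hex(constant), fits_arm_immediate(constant))
```

Here 0x4 and 0xa00 can be encoded directly, while 0x200bc cannot, which is consistent with the address being loaded from the end of the .text section.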
ARM Loops and Stack Intro
The Stack
The stack is a special area of RAM that is reserved for a program by the operating system. It is used primarily as memory that the program can use to organize and store local variables and function arguments which require more space than is available in the CPU registers. The maximum stack size for a program is determined by the operating system. In Linux, the default maximum stack size in KB can be output with:
ulimit -s
8192
The stack limit above is 8 MB (8192 KB). We will examine the stack in greater detail in future sections, but for now understand these characteristics:
- The stack is a linear data structure that follows a Last-In, First-Out (LIFO) principle
- The last element added is always the first to be removed
- New data can be "pushed" onto the stack or "popped" off the stack
- The stack "grows down" in memory, which can be confusing because the "top" of the stack will always have the lowest memory address
- The sp register stores the memory address for the top of the stack
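The characteristics listed above can be modelled with a toy stack in Python. This is only a sketch for building intuition: the class name DescendingStack is made up, memory is represented by a dictionary, and the starting address is just a sample value.

```python
class DescendingStack:
    """Toy model of a stack of 32-bit words that grows toward lower addresses."""

    def __init__(self, initial_sp):
        self.sp = initial_sp   # address of the current top of the stack
        self.memory = {}       # address -> 32-bit word

    def push(self, value):
        self.sp -= 4                               # growing the stack lowers sp
        self.memory[self.sp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory.pop(self.sp)           # read the word at the top
        self.sp += 4                               # shrinking the stack raises sp
        return value


if __name__ == "__main__":
    stack = DescendingStack(0x408009C0)
    stack.push(0x11)
    stack.push(0x22)
    print(hex(stack.sp))        # 8 bytes lower than the starting address
    print(hex(stack.pop()))     # the last value pushed comes off first
```

After two pushes the top of the stack is 8 bytes below where it started, and the first pop returns the value that was pushed last.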
Loops
A loop is a simple logical construct which repeatedly executes instructions until a condition is met. To demonstrate this functionality, we will write a program which will execute a block of code 10 times.
The code will print the counter for the loop, showing what iteration it is on, and will utilize the stack to facilitate this:
.section .rodata
/* Linux Syscall constants */
b_STDOUT = 0x01
b_WRITE = 0x04
/* Offset to convert a value to a single digit ASCII character decimal */
b_ASCII_OFFSET = 0x30
.section .data
begin_msg:
.ascii "Starting while loop:\n"
len_begin_msg = ( . - begin_msg)
end_msg:
.ascii "Loop ended.\n"
len_end_msg = ( . - end_msg)
.section .text
.global _start
_start:
print_begin_msg:
ldr r7, =b_WRITE
ldr r0, =b_STDOUT
ldr r1, =begin_msg
ldr r2, =len_begin_msg
svc #0
mov r3, #0x0
/*
This sets r3 to 0 to prepare it to use as our counter for the loop.
Use of r3 is arbitrary, any GP register will do, but r3 is the next register
not used by the write syscall, which will be used in the loop
*/
begin_while:
print_counter:
ldr r7, =b_WRITE
ldr r0, =b_STDOUT
ldr r1, =b_ASCII_OFFSET /* Start with a value of 0x30 */
add r1, r1, r3
/*
Add our counter value to 0x30 to get the ASCII decimal number for the counter
0x30 is the hex value for the decimal ASCII 0, 0x31 is 1 etc.
*/
orr r1, r1, #0x0a00
/*
The orr instruction performs a logical or between two registers or immediate values.
This effectively combines the 0x0a value for an ASCII newline character with our
original value for the ASCII value of the loop counter and stores both in r1
*/
push {r1}
/*
The push instruction stores the values of a list of registers on the stack.
The values are placed on the stack in order of the register numbers,
so the lowest numbered register will be at the top of the stack and the highest numbered at the bottom.
This command can be written in several different forms:
sub sp, sp, #4
str r1, [sp]
This subtracts 4 bytes from the stack pointer address, then it stores (str) the value of
r1 at the stack pointer address
str r1, [sp, #-4]!
This executes the same thing in a single instruction: it stores r1 at the stack pointer
address minus 4 bytes, and the ! character writes that decremented address back to the stack pointer
*/
mov r1, sp
/*
The stack pointer stores the memory address of the last data placed on the stack.
The last data placed on the stack was the value stored in r1, which contains our
two ASCII character codes. This instruction will store the memory address of that
location in the stack in r1. This is necessary because the write syscall takes
a memory address as an argument for a string to write, not the actual value.
*/
mov r2, #0x2
/*
We will set arg3 for the syscall to 2 bytes, because we will print both the number character and the newline character.
*/
svc #0
add sp, sp, #0x4
/*
This will move the stack pointer back up to its original position.
This will allow us to overwrite the previous characters every time the loop runs.
If we did not include this instruction, the stack would continue to grow every time the
loop ran.
*/
cmp r3, #0x9
/*
This instruction subtracts the immediate value 9 from the value in register r3.
The compare instruction (cmp) discards the result of the subtraction, but it updates the zero (Z) and
negative (N) flags in the cpsr appropriately:
If the values are equal, the zero flag will be set to 1.
If the result of the subtraction is negative, then the negative flag is set to 1.
The carry (C) and overflow (V) flags are also set based on the result.
This instruction sets the flags in the same way as writing:
subs r0, r3, #0x09
The subtract and set flags (subs) instruction performs the same operation as cmp, except that it
stores the result in its destination register instead of discarding it. In the example above the
result overwrites r0, which is harmless here only because r0 is reloaded at the top of the loop.
For a simple comparison, this isn't useful, but if we wanted to compare values
and keep the result in r1, we could write:
subs r1, r3, #0x09
*/
bge end_while
/*
The branch if greater than or equal (bge) instruction checks the cpsr flags set by cmp.
It branches when the negative (N) and overflow (V) flags are equal, which is the case
whenever r3 was greater than or equal to 9 in a signed comparison.
If r3 equals 9, the result is zero, so Z is 1 and N and V are both 0: the branch is taken
by setting the pc to the end_while label's memory address.
If r3 is less than 9, the result is negative, so N is 1 while V is 0, and
execution falls through to the next instruction.
*/
add r3, r3, #0x01 /* Increment our counter by 1 */
b begin_while
/*
branch (b) is an unconditional branch instruction, this will always change the pc to the address of
the begin_while label and continue execution.
*/
end_while:
print_end_msg:
ldr r7, =b_WRITE
ldr r0, =b_STDOUT
ldr r1, =end_msg
ldr r2, =len_end_msg
svc #00000000
exit_normally:
mov r7, #0x00000001
mov r0, #0x00000000
svc #0x00000000
After reading through the source code and studying the comments, assemble and link it. Run it in qemu so that we can debug it with gdb.
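Before stepping through the program in the debugger, it can help to see the same logic at a higher level. The Python sketch below mirrors the loop: it builds the word held in r1, keeps the two bytes that the write syscall prints, and uses the N and V flags to decide when to stop. All function names are made up for this example, and it assumes a little-endian target, as in the earlier listings.

```python
MASK_32 = 0xFFFFFFFF


def cmp_flags(left, right):
    """Return the (N, Z, C, V) flags that cmp left, right would produce."""
    left &= MASK_32
    right &= MASK_32
    result = (left - right) & MASK_32
    n = result >> 31                      # sign bit of the result
    z = int(result == 0)
    c = int(left >= right)                # ARM sets C when no borrow is needed
    # Signed overflow: operand signs differ and the result sign differs from left.
    v = ((left ^ right) & (left ^ result)) >> 31
    return n, z, c, v


def counter_bytes(counter):
    """Return the two bytes printed for one loop iteration."""
    word = (0x30 + counter) | 0x0A00      # add r1, r1, r3 ; orr r1, r1, #0x0a00
    stored = word.to_bytes(4, "little")   # push {r1}: lowest byte at lowest address
    return stored[:2]                     # mov r2, #2: only two bytes are written


def run_loop():
    """Emulate the loop and return everything it would write to stdout."""
    output = b""
    counter = 0                           # mov r3, #0
    while True:
        output += counter_bytes(counter)
        n, z, c, v = cmp_flags(counter, 9)    # cmp r3, #9
        if n == v:                            # bge: branch when N equals V
            break
        counter += 1                          # add r3, r3, #1
    return output


if __name__ == "__main__":
    print(run_loop().decode("ascii"), end="")
```

Because the digit is in the low byte of the word, it lands at the lower address and is printed before the newline.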
ARM Debugging Loops
Once we are attached to our remote program in gdb, we will open our assembly and register layouts as before.
We are familiar with the write syscall already, so we will advance forward to the start of our loop.
Enter:
advance print_counter
Your output should look similar to this:
|-Register group: general------------------------------------------------------------------------------------------------------------------------------------------------|
|r0 0x15 21 r1 0x200ec 131308 |
|r2 0x15 21 r3 0x0 0 |
|r4 0x0 0 r5 0x0 0 |
|r6 0x0 0 r7 0x4 4 |
|r8 0x0 0 r9 0x0 0 |
|r10 0x200ec 131308 r11 0x0 0 |
|r12 0x0 0 sp 0x408009c0 0x408009c0 |
|lr 0x0 0 pc 0x1008c 0x1008c <print_counter> |
|cpsr 0x10 16 fpscr 0x0 0 |
|fpsid 0x410430f0 1090793712 fpexc 0x40000000 1073741824 |
|AFSR0_EL1 0x0 0 AFSR1_EL1 0x0 0 |
|DBGDIDR 0x3515f021 890630177 DBGDSAR 0x0 0 |
|DBGBVR 0x0 0 DBGBCR 0x0 0 |
|DBGWVR 0x0 0 DBGWCR 0x0 0 |
|PAR 0x0 0 DBGBVR 0x0 0 |
|DBGBCR 0x0 0 DBGWVR 0x0 0 |
|DBGWCR 0x0 0 TEECR 0x0 0 |
|MIDR_EL1 0x412fc0f1 1093648625 CTR 0x8444c004 -2075869180 |
|TCMTR 0x0 0 TTBR0_EL1 0x0 0 |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0x10074 <_start> mov r7, #4 |
| 0x10078 <_start+4> mov r0, #1 |
| 0x1007c <_start+8> ldr r1, [pc, #96] ; 0x100e4 <exit_normally+12> |
| 0x10080 <_start+12> mov r2, #21 |
| 0x10084 <_start+16> svc 0x00000000 |
| 0x10088 <_start+20> mov r3, #0 |
| > 0x1008c <print_counter> mov r7, #4 |
| 0x10090 <print_counter+4> mov r0, #1 |
| 0x10094 <print_counter+8> mov r1, #48 ; 0x30 |
| 0x10098 <print_counter+12> add r1, r1, r3 |
| 0x1009c <print_counter+16> orr r1, r1, #2560 ; 0xa00 |
| 0x100a0 <print_counter+20> push {r1} ; (str r1, [sp, #-4]!) |
| 0x100a4 <print_counter+24> mov r1, sp |
| 0x100a8 <print_counter+28> mov r2, #2 |
| 0x100ac <print_counter+32> svc 0x00000000 |
| 0x100b0 <print_counter+36> add sp, sp, #4 |
| 0x100b4 <print_counter+40> subs r0, r3, #9 |
| 0x100b8 <print_counter+44> bge 0x100c4 <print_end_msg> |
| 0x100bc <print_counter+48> add r3, r3, #1 |
| 0x100c0 <print_counter+52> b 0x1008c <print_counter> |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
remote Thread 1.30460 In: print_counter L?? PC: 0x1008c
(gdb) lay reg
(gdb) advance print_counter
0x0001008c in print_counter ()
(gdb)
Let's step into our instructions from here and examine the orr instruction at 0x1009c:
|-Register group: general------------------------------------------------------------------------------------------------------------------------------------------------|
|r0 0x1 1 r1 0x30 48 |
|r2 0x15 21 r3 0x0 0 |
|r4 0x0 0 r5 0x0 0 |
|r6 0x0 0 r7 0x4 4 |
|r8 0x0 0 r9 0x0 0 |
|r10 0x200ec 131308 r11 0x0 0 |
|r12 0x0 0 sp 0x408009c0 0x408009c0 |
|lr 0x0 0 pc 0x1009c 0x1009c <print_counter+16> |
|cpsr 0x10 16 fpscr 0x0 0 |
|fpsid 0x410430f0 1090793712 fpexc 0x40000000 1073741824 |
|AFSR0_EL1 0x0 0 AFSR1_EL1 0x0 0 |
|DBGDIDR 0x3515f021 890630177 DBGDSAR 0x0 0 |
|DBGBVR 0x0 0 DBGBCR 0x0 0 |
|DBGWVR 0x0 0 DBGWCR 0x0 0 |
|PAR 0x0 0 DBGBVR 0x0 0 |
|DBGBCR 0x0 0 DBGWVR 0x0 0 |
|DBGWCR 0x0 0 TEECR 0x0 0 |
|MIDR_EL1 0x412fc0f1 1093648625 CTR 0x8444c004 -2075869180 |
|TCMTR 0x0 0 TTBR0_EL1 0x0 0 |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0x10074 <_start> mov r7, #4 |
| 0x10078 <_start+4> mov r0, #1 |
| 0x1007c <_start+8> ldr r1, [pc, #96] ; 0x100e4 <exit_normally+12> |
| 0x10080 <_start+12> mov r2, #21 |
| 0x10084 <_start+16> svc 0x00000000 |
| 0x10088 <_start+20> mov r3, #0 |
| 0x1008c <print_counter> mov r7, #4 |
| 0x10090 <print_counter+4> mov r0, #1 |
| 0x10094 <print_counter+8> mov r1, #48 ; 0x30 |
| 0x10098 <print_counter+12> add r1, r1, r3 |
| > 0x1009c <print_counter+16> orr r1, r1, #2560 ; 0xa00 |
As we step through the instruction, we can see the value of r1 change from 0x30 to 0x0a30.
Read in little-endian byte order, it now holds the ASCII characters "0\n":
r1 0xa30 2608
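The arithmetic behind the mov, add, and orr instructions can be modeled outside of the debugger. The following is a minimal Python sketch, not part of the original program; the function name pack_digit_and_newline is hypothetical and only mirrors the three instructions shown in the disassembly.

```python
def pack_digit_and_newline(counter):
    """Model `mov r1, #48`, `add r1, r1, r3`, `orr r1, r1, #0xa00` for a counter of 0-9."""
    r1 = 0x30            # ASCII '0'
    r1 = r1 + counter    # ASCII digit for the current counter value
    r1 = r1 | 0x0a00     # place the newline byte (0x0a) in the second byte
    return r1


value = pack_digit_and_newline(0)
print(hex(value))                    # 0xa30
print(value.to_bytes(4, "little"))   # b'0\n\x00\x00'
```

Because the digit occupies the lowest byte, it is the first byte in memory once the word is stored, which is why the digit is printed before the newline.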
Now let's step to the push instruction:
|-Register group: general------------------------------------------------------------------------------------------------------------------------------------------------|
|r0 0x1 1 r1 0xa30 2608 r2 0x15 21 |
|r3 0x0 0 r4 0x0 0 r5 0x0 0 |
|r6 0x0 0 r7 0x4 4 r8 0x0 0 |
|r9 0x0 0 r10 0x200ec 131308 r11 0x0 0 |
|r12 0x0 0 sp 0x408009c0 0x408009c0 lr 0x0 0 |
|pc 0x100a0 0x100a0 <print_count cpsr 0x10 16 fpscr 0x0 0 |
|fpsid 0x410430f0 1090793712 fpexc 0x40000000 1073741824 AFSR0_EL1 0x0 0 |
|AFSR1_EL1 0x0 0 DBGDIDR 0x3515f021 890630177 DBGDSAR 0x0 0 |
|DBGBVR 0x0 0 DBGBCR 0x0 0 DBGWVR 0x0 0 |
|DBGWCR 0x0 0 PAR 0x0 0 DBGBVR 0x0 0 |
|DBGBCR 0x0 0 DBGWVR 0x0 0 DBGWCR 0x0 0 |
|TEECR 0x0 0 MIDR_EL1 0x412fc0f1 1093648625 CTR 0x8444c004 -2075869180 |
|TCMTR 0x0 0 TTBR0_EL1 0x0 0 PMCCNTR 0x0 0 |
|TLBTR 0x0 0 TTBR1_EL1 0x0 0 MIDR 0x412fc0f1 1093648625 |
|TTBCR 0x0 0 MPIDR_EL1 0x80000000 -2147483648 TTBCR2 0x0 0 |
|REVIDR_EL1 0x0 0 MIDR 0x412fc0f1 1093648625 JIDR 0x0 0 |
|CLIDR 0xa200023 169869347 DFAR 0x0 0 WFAR 0x0 0 |
|IFAR 0x0 0 JMCR 0x0 0 AIDR 0x0 0 |
|CSSELR 0x0 0 ID_PFR2 0x10 16 VBAR 0x0 0 |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0x10074 <_start> mov r7, #4 |
| 0x10078 <_start+4> mov r0, #1 |
| 0x1007c <_start+8> ldr r1, [pc, #96] ; 0x100e4 <exit_normally+12> |
| 0x10080 <_start+12> mov r2, #21 |
| 0x10084 <_start+16> svc 0x00000000 |
| 0x10088 <_start+20> mov r3, #0 |
| 0x1008c <print_counter> mov r7, #4 |
| 0x10090 <print_counter+4> mov r0, #1 |
| 0x10094 <print_counter+8> mov r1, #48 ; 0x30 |
| 0x10098 <print_counter+12> add r1, r1, r3 |
| 0x1009c <print_counter+16> orr r1, r1, #2560 ; 0xa00 |
| > 0x100a0 <print_counter+20> push {r1} ; (str r1, [sp, #-4]!) |
Notice the value of the sp register before the push:
sp 0x408009c0
Now after the push:
sp 0x408009bc
This is 4 bytes lower than the previous address. Let's examine the data at that address:
(gdb) x /4xb 0x408009bc
0x408009bc: 0x30 0x0a 0x00 0x00
We can see the value of r1 is now stored at that address, least significant byte first.
The next instruction, mov r1, sp, copies that address into r1 so it can be passed to the write syscall as the buffer pointer.
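The effect of the push on sp and on memory can be sketched in Python. This is only an illustrative model, assuming memory is represented as a dictionary of byte addresses; the helper name push_word is hypothetical.

```python
def push_word(memory, sp, value):
    """Model a single-register ARM push: decrement sp by 4, then store the word little-endian."""
    sp -= 4
    for offset, byte in enumerate(value.to_bytes(4, "little")):
        memory[sp + offset] = byte
    return sp


memory = {}
sp = push_word(memory, 0x408009c0, 0xa30)
print(hex(sp))                                   # 0x408009bc
print([hex(memory[sp + i]) for i in range(4)])   # ['0x30', '0xa', '0x0', '0x0']
```

The byte order matches the x /4xb output above: 0x30 sits at the lowest address, followed by 0x0a.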
Let's now examine the cmp instruction at 0x100b4:
> 0x100b4 <print_counter+40> cmp r3, #9
(gdb) info registers cpsr
cpsr 0x10 16
(gdb) si
0x000100b8 in print_counter ()
(gdb) info registers cpsr
cpsr 0x80000010 -2147483632
As we step through the instruction, we can see the value of the flags in cpsr change. What flags are now set? We could print the value in binary with:
(gdb) print/t 0x80000010
$1 = 10000000000000000000000000010000
It is still difficult to tell which flags are set from this output. We know that the cmp instruction updates the N, Z, C, and V flags, so let's format the output to show those flags.
Enter the following script to show a formatted output for the flags:
printf "N=%d Z=%d C=%d V=%d\n", (($cpsr & (1 << 31)) != 0), (($cpsr & (1 << 30)) != 0), (($cpsr & (1 << 29)) != 0), (($cpsr & (1 << 28)) != 0)
This C-style expression takes the value of the cpsr register and performs a bitwise AND with a 1 shifted left to the position of each flag bit (N is bit 31, Z is bit 30, C is bit 29, and V is bit 28). If the flag is set, the result is non-zero, so the != 0 comparison evaluates to true and a 1 is printed.
We can see from the output that the negative bit was set by the comparison, because 0 - 9 = -9, which is negative:
N=1 Z=0 C=0 V=0
This would be lengthy to type out every time we want to check those flags, so let's open a text editor and save the script as cpsr_cmp.gdb.
We can now run the script inside gdb by entering:
source cpsr_cmp.gdb
This is assuming you placed it in the same path as the current executable. Otherwise you must use the path to the script.
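The bit tests performed by cpsr_cmp.gdb can be cross-checked outside of gdb. The following Python sketch is an assumption-labeled equivalent rather than part of the tutorial's toolchain; the function name decode_nzcv is hypothetical.

```python
def decode_nzcv(cpsr):
    """Return the N, Z, C, and V condition flags (bits 31-28) of a cpsr value."""
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
    }


flags = decode_nzcv(0x80000010)
print("N={N} Z={Z} C={C} V={V}".format(**flags))   # N=1 Z=0 C=0 V=0
```

Passing in the raw cpsr value shown by info registers gives the same four flags that the gdb printf script reports.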
We want to iterate through our loop, but we don't want to manually step through every instruction over and over again.
We can automate this process by using another script. First, we will set a breakpoint at the end of our loop with:
(gdb) break *0x100c0
Breakpoint 1 at 0x100c0
The * character tells gdb that the value is a memory address and not a label name. To run to our breakpoint, enter:
continue
Now we can write our script.
Enter:
(gdb) set $count = 0
(gdb) while $count < 8
>source cpsr_cmp.gdb
>continue
>set $count = $count +1
>end
We just wrote a while loop to debug our while loop. Your output should be similar to this (note that you may have to press Ctrl+L to redraw your screen):
|-Register group: general------------------------------------------------------------------------------------------------------------------------------------------------|
|r0 0x2 2 r1 0x408009bc 1082132924 |
|r2 0x2 2 r3 0x9 9 |
|r4 0x0 0 r5 0x0 0 |
|r6 0x0 0 r7 0x4 4 |
|r8 0x0 0 r9 0x0 0 |
|r10 0x200ec 131308 r11 0x0 0 |
|r12 0x0 0 sp 0x408009c0 0x408009c0 |
|lr 0x0 0 pc 0x100c0 0x100c0 <print_counter+52> |
|cpsr 0x80000010 -2147483632 fpscr 0x0 0 |
|fpsid 0x410430f0 1090793712 fpexc 0x40000000 1073741824 |
|AFSR0_EL1 0x0 0 AFSR1_EL1 0x0 0 |
|DBGDIDR 0x3515f021 890630177 DBGDSAR 0x0 0 |
|DBGBVR 0x0 0 DBGBCR 0x0 0 |
|DBGWVR 0x0 0 DBGWCR 0x0 0 |
|PAR 0x0 0 DBGBVR 0x0 0 |
|DBGBCR 0x0 0 DBGWVR 0x0 0 |
|DBGWCR 0x0 0 TEECR 0x0 0 |
|MIDR_EL1 0x412fc0f1 1093648625 CTR 0x8444c004 -2075869180 |
|TCMTR 0x0 0 TTBR0_EL1 0x0 0 |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0x10098 <print_counter+12> add r1, r1, r3 |
| 0x1009c <print_counter+16> orr r1, r1, #2560 ; 0xa00 |
| 0x100a0 <print_counter+20> push {r1} ; (str r1, [sp, #-4]!) |
| 0x100a4 <print_counter+24> mov r1, sp |
| 0x100a8 <print_counter+28> mov r2, #2 |
| 0x100ac <print_counter+32> svc 0x00000000 |
| 0x100b0 <print_counter+36> add sp, sp, #4 |
| 0x100b4 <print_counter+40> cmp r3, #9 |
| 0x100b8 <print_counter+44> bge 0x100c4 <print_end_msg> |
| 0x100bc <print_counter+48> add r3, r3, #1 |
|B+> 0x100c0 <print_counter+52> b 0x1008c <print_counter> |
| 0x100c4 <print_end_msg> mov r7, #4 |
| 0x100c8 <print_end_msg+4> mov r0, #1 |
| 0x100cc <print_end_msg+8> ldr r1, [pc, #20] ; 0x100e8 <exit_normally+16> |
| 0x100d0 <print_end_msg+12> mov r2, #12 |
| 0x100d4 <print_end_msg+16> svc 0x00000000 |
| 0x100d8 <exit_normally> mov r7, #1 |
| 0x100dc <exit_normally+4> mov r0, #0 |
| 0x100e0 <exit_normally+8> svc 0x00000000 |
| 0x100e4 <exit_normally+12> andeq r0, r2, r12, ror #1 |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
remote Thread 1.39901 In: print_counter L?? PC: 0x100c0
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
(gdb)
Notice we are now on what should be the last iteration of the loop.
Let's advance to the bge instruction with:
advance *0x100b8
Let's look at what flags were set with our cmp instruction:
(gdb) source cpsr_cmp.gdb
N=0 Z=1 C=1 V=0
Notice that the zero bit is now set and the negative bit is no longer set. The zero bit is set because r3 was equal to 9.
The negative bit is not set because the result of 9 - 9 isn't negative.
The bge instruction branches when the N and V flags are equal. Both are now 0, so our branch condition should be met.
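How cmp produces these flags, and how bge interprets them, can be modeled in a few lines. This is a minimal Python sketch under the assumption that cmp behaves as a 32-bit subtraction that discards its result; the names cmp_flags and bge_taken are hypothetical.

```python
def cmp_flags(rn, operand):
    """Model the NZCV flags produced by `cmp rn, operand` on 32-bit values."""
    mask = 0xFFFFFFFF
    rn &= mask
    operand &= mask
    result = (rn - operand) & mask
    n = result >> 31                  # sign bit of the result
    z = int(result == 0)              # result was zero
    c = int(rn >= operand)            # carry set means no borrow occurred
    # Signed overflow: operands differ in sign and the result's sign differs from rn's
    v = ((rn ^ operand) & (rn ^ result)) >> 31
    return {"N": n, "Z": z, "C": c, "V": v}


def bge_taken(flags):
    """bge (signed greater than or equal) branches when N equals V."""
    return flags["N"] == flags["V"]


for r3 in (0, 9):
    flags = cmp_flags(r3, 9)
    print("r3={} N={N} Z={Z} C={C} V={V}".format(r3, **flags), "bge taken:", bge_taken(flags))
# r3=0 N=1 Z=0 C=0 V=0 bge taken: False
# r3=9 N=0 Z=1 C=1 V=0 bge taken: True
```

The two printed lines correspond to the first and last iterations of the loop seen in the debugger.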
Let's test this branch by setting a breakpoint where we should jump to:
(gdb) break *0x100c4
Breakpoint 2 at 0x100c4
Enter continue to advance the program to the next break:
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0x1008c <print_counter> mov r7, #4 |
| 0x10090 <print_counter+4> mov r0, #1 |
| 0x10094 <print_counter+8> mov r1, #48 ; 0x30 |
| 0x10098 <print_counter+12> add r1, r1, r3 |
| 0x1009c <print_counter+16> orr r1, r1, #2560 ; 0xa00 |
| 0x100a0 <print_counter+20> push {r1} ; (str r1, [sp, #-4]!) |
| 0x100a4 <print_counter+24> mov r1, sp |
| 0x100a8 <print_counter+28> mov r2, #2 |
| 0x100ac <print_counter+32> svc 0x00000000 |
| 0x100b0 <print_counter+36> add sp, sp, #4 |
| 0x100b4 <print_counter+40> cmp r3, #9 |
| 0x100b8 <print_counter+44> bge 0x100c4 <print_end_msg> |
| 0x100bc <print_counter+48> add r3, r3, #1 |
|B+ 0x100c0 <print_counter+52> b 0x1008c <print_counter> |
|B+> 0x100c4 <print_end_msg> mov r7, #4 |
| 0x100c8 <print_end_msg+4> mov r0, #1 |
| 0x100cc <print_end_msg+8> ldr r1, [pc, #20] ; 0x100e8 <exit_normally+16> |
| 0x100d0 <print_end_msg+12> mov r2, #12 |
| 0x100d4 <print_end_msg+16> svc 0x00000000 |
| 0x100d8 <exit_normally> mov r7, #1 |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
remote Thread 1.40929 In: print_end_msg L?? PC: 0x100c4
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
N=1 Z=0 C=0 V=0
Breakpoint 1, 0x000100c0 in print_counter ()
(gdb) si
0x0001008c in print_counter ()
(gdb) advance *0x100b8
0x000100b8 in print_counter ()
(gdb) source cpsr_cmp.gdb
N=0 Z=1 C=1 V=0
(gdb) break *0x100c4
Breakpoint 2 at 0x100c4
(gdb) continue
Continuing.
Breakpoint 2, 0x000100c4 in print_end_msg ()
(gdb)
Notice that our program had two breakpoints set: one at 0x100c0, the instruction that branches back to the start of our loop, and another at 0x100c4, where the rest of the program continues.
Our branch condition was met because the negative bit (0) matched the overflow bit (0), so execution moved to 0x100c4 without stopping at 0x100c0.
Enter continue one last time to finish executing the remainder of the program:
(gdb) continue
Continuing.
[Inferior 1 (process 1) exited normally]
(gdb)
ARM Loop Exercises
Exercise 1.
Find the other flag bit that is set in the cpsr. Why is it set?
Exercise 2.
Write a gdb script that prints all of the cpsr flags in the format of the cpsr_cmp script.
Exercise 3.
Re-write the loop to allow for more than 10 iterations while printing the correct iteration number.
ARM ABI and Calling Convention
What is an ABI?
An Application Binary Interface (ABI) is a low-level interface between compiled binaries, such as a program and the libraries or operating system it calls.
ABIs are similar to APIs: an API is an interface between pieces of source code, while an ABI is an interface between pieces of machine code.
APIs are high-level and hardware independent, while ABIs are low-level and hardware dependent.
ABIs determine:
- How to pass arguments to a function
- How to pass a function's return value
- Which registers must be preserved and which registers can be clobbered (overwritten)
- How data is organized in memory
- How system calls are performed
We will reference the Procedure Call Standard for ARM32 from the ARM32 ABI to write our next program.
This document defines the following calling convention:
- The registers r4-r8, r10, and r11 are used to hold the values of local variables
- Registers r12-r15 have special roles: IP, SP, LR, and PC
- A subroutine must preserve the contents of the registers r4-r8, r10, r11 and SP
- The first four registers r0-r3 (a1-a4) are used to pass argument values into a subroutine and to return values
- r0-r3 may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls)
Register | Synonym | Special | Role in the procedure call standard
---|---|---|---
r15 | | PC | The Program Counter.
r14 | | LR | The Link Register.
r13 | | SP | The Stack Pointer.
r12 | | IP | The Intra-Procedure-call scratch register.
r11 | v8 | FP | Frame Pointer or Variable-register 8.
r10 | v7 | | Variable-register 7.
r9 | v6 | SB, TR | Platform register or Variable-register 6. The meaning of this register is defined by the platform standard.
r8 | v5 | | Variable-register 5.
r7 | v4 | | Variable-register 4.
r6 | v3 | | Variable-register 3.
r5 | v2 | | Variable-register 2.
r4 | v1 | | Variable-register 1.
r3 | a4 | | Argument / scratch register 4.
r2 | a3 | | Argument / scratch register 3.
r1 | a2 | | Argument / result / scratch register 2.
r0 | a1 | | Argument / result / scratch register 1.
ARM Functions and User Input
It's time to learn more about stack management in ARM assembly by creating a program with function calls and user input.
Examine the following source code:
.section .rodata
// Linux Syscall constants
STDIN = 0x00
STDOUT = 0x01
EXIT = 0x01
READ = 0x03
WRITE = 0x04
ERR_INVALID_INPUT = 0x01
ERR_BUFF_OVERFLOW = 0x02
// Valid ASCII values for decimal numbers
b_MIN_ASCII = 0x30
b_MAX_ASCII = 0x39
// Termination character
b_NEWLINE = 0x0a
// Input buffer size
b_BUFFER_SIZE = 0x08
.section .data
// String variables used to prompt user and show output
first_num_msg:
.ascii "Enter the first number to add: "
len_first_num_msg = ( . - first_num_msg)
second_num_msg:
.ascii "Enter the second number to add: "
len_second_num_msg = ( . - second_num_msg)
sum_msg:
.ascii "The sum is: "
len_sum_msg = ( . - sum_msg)
invalid_msg:
.ascii "ERROR: Invalid input detected\n"
len_invalid_msg = ( . - invalid_msg)
overflow_msg:
.ascii "ERROR: Buffer overflow detected\n"
len_overflow_msg = ( . - overflow_msg)
.section .bss
/* The block started by symbol (bss) section stores uninitialized variables.
They will be zero initialized in memory, so we will start with clean buffers */
first_number_buffer:
.skip b_BUFFER_SIZE
second_number_buffer:
.skip b_BUFFER_SIZE
.section .text
.global _start
_start:
/*
Prompt the user to enter two integers, add them, and print the result
*/
movw r4, #0xbeef // Load the lower-half (16-bits) of r4
movt r4, #0xdead // Load the upper-half of r4
movw r5, #0xbabe
movt r5, #0xdeed
movw r6, #0xface
movt r6, #0xcafe
movw r7, #0xdeaf
movt r7, #0xfade
movw r8, #0xbabe
movt r8, #0xbead
movw r9, #0xface
movt r9, #0xdeaf
movw r10, #0xbade
movt r10, #0xcade
/*
The above instructions set all the variable registers
this is done only for demonstration purposes to provide
easy data to view on the stack when debugging.
The movw and movt instructions are used because ARM32 cannot load
some 32-bit constants into registers with a single instruction,
so you must load the lower and upper halves separately.
*/
prompt_for_first_number:
ldr r7, =WRITE
ldr r0, =STDOUT
ldr r1, =first_num_msg
ldr r2, =len_first_num_msg
svc #0
get_first_number:
ldr r0, =first_number_buffer
ldr r1, =b_BUFFER_SIZE
bl get_number // r0: buffer address r1: buffer length --> r0: unsigned integer
push {r0} // Save the first number to the stack
prompt_for_second_number:
ldr r7, =WRITE
ldr r0, =STDOUT
ldr r1, =second_num_msg
ldr r2, =len_second_num_msg
svc #0
get_second_number:
ldr r0, =second_number_buffer
ldr r1, =b_BUFFER_SIZE
bl get_number
push {r0} // Save the second number to the stack
print_sum_msg:
ldr r7, =WRITE
ldr r0, =STDOUT
ldr r1, =sum_msg
ldr r2, =len_sum_msg
svc #0
pop {r0, r1} // Pop both numbers off the stack
bl print_sum // r0: unsigned integer r1: unsigned integer --> void
exit_normally:
ldr r7, =EXIT
mov r0, #0
svc #0
exit_with_invalid_error:
ldr r7, =WRITE
ldr r0, =STDOUT
ldr r1, =invalid_msg
ldr r2, =len_invalid_msg
svc #0
ldr r7, =EXIT
ldr r0, =ERR_INVALID_INPUT
svc #0
exit_with_overflow_error:
ldr r7, =WRITE
ldr r0, =STDOUT
ldr r1, =overflow_msg
ldr r2, =len_overflow_msg
svc #0
ldr r7, =EXIT
ldr r0, =ERR_BUFF_OVERFLOW
svc #0
get_number:
/*
purpose:
read a natural number from user input
usage:
arg0 (r0) the memory address to store ASCII input from STDIN
arg1 (r1) the size of the memory buffer to store the input
returns:
r0: 32-bit positive integer
error handling: Invalid input will result in no return and a program exit with error code 0x1
Only characters 0123456789 \n (0x0a) and (0x00) are valid
*/
push {fp, lr} // Preserve the caller's frame pointer and link register (previous pc)
mov fp, sp // Set the frame pointer to the current stack pointer
push {r4-r10} // Preserve the caller's variable registers
// r1 and r0 are scratch registers that will get clobbered by the read syscall's arguments, so we need to preserve them
push {r1} // Store r1 on the stack, which is the length of our buffer
push {r0} // Store r0 on the stack, which is the memory address we will save our input to
ldr r7, =READ
ldr r0, =STDIN
pop {r1} // r1=r0 from stack, this sets the address for the read syscall to our function's arg0 input
pop {r2} // r2=r1 from stack, this sets the size of the input buffer for the read syscall to our functions arg1 input
svc #0
cmp r0, r2 // Compare the number of bytes read to our buffer (r0) to the size of our buffer (r2)
bge if_newline_check // A full buffer should always end with a newline character
b endif_newline_check
if_newline_check:
sub r6, r2, #0x1 // The byte offset is 1 less than the length
ldrb r4, [r1, r6] // Load the last byte from the buffer
ldr r5, =b_NEWLINE
cmp r4, r5 // If the last character isn't a newline, then there was a buffer overflow
bne if_buffer_overflow_found
b endif_buffer_overflow_found
if_buffer_overflow_found:
b exit_with_overflow_error // Unconditional branch to exit the program with an overflow error
endif_buffer_overflow_found:
endif_newline_check:
// Buffer length was valid
mov r0, r1 // Move the buffer address into r0 to pass to validate_input
mov r1, r2 // Move the buffer length into r1 to pass to validate_input
bl validate_input // r0: buffer address, r1: buffer length --> r0: 0x0 is valid 0x1 is invalid
cmp r0, #0x1 // Test for invalid flag
beq invalid_number // branch to invalid number error handling
valid_number:
mov r0, r1 // validate_input passes any valid number back in r1, get number passes that back in r0
pop {r4-r10} // This restores the original values of r4-r10 from the stack
pop {fp} // Restore the previous fp to the current fp
pop {pc} // This sets the pc to the lr value, so that execution resumes where this function was called from in the caller's function
invalid_number:
pop {r4-r10, fp}
b exit_with_invalid_error // Unconditional branch to exit the program with an invalid input error
validate_input:
/*
parameters:
arg0 (r0) the memory address to validate ASCII decimal input from
arg1 (r1) the size of the input buffer
returns:
r0: 0x0 for valid decimal number,
0x1 for invalid decimal number
r1: Unchanged if number was invalid,
the value of the number if the number was valid
*/
push {fp, lr} // Preserve the caller's fp and lr (the return address)
mov fp, sp // Set the frame pointer to the current stack pointer
push {r4-r10} // Preserve the caller's variable registers
/*
Register use:
r0: buffer address passed to function
*/
mov r3, #0x0 // Loop counter for each byte stored in the input buffer
mov r6, #0x0 // This will hold a flag which indicates a terminating character was found
// We need to check that all characters are valid decimal characters, and count them
validate_loop:
ldrb r4, [r0] // Load one byte from the buffer memory location
// Newline is a valid termination
ldr r5, =b_NEWLINE
cmp r4, r5
moveq r6, #0x1 // Flag the termination character if the comparison was equal
beq valid
ldr r5, =b_MAX_ASCII
// If character is greater than b_MAX_ASCII then it is invalid
cmp r4, r5
bgt invalid
ldr r5, =b_MIN_ASCII
// If character is less than b_MIN_ASCII then it is invalid
cmp r4, r5
blt invalid
add r3, r3, #0x01 // Increment counter by 1
cmp r3, r1 // Check if we have looped through all the characters
bge end_validate_loop // End loop
add r0, r0, #0x01 // Increment the memory buffer address by 1
b validate_loop // Continue loop
end_validate_loop:
valid:
convert_to_decimal:
/*
If all characters were valid, we can convert them to a decimal value.
Register use:
r0: buffer address passed to function
r1: length of buffer passed to function
r3: counter (starting with actual length)
r4: current character from buffer
r5: min ASCII value
r6: running total
r7: exponent
r8: base
r9: product of base and exponent
r10: temp var
*/
cmp r6, #0x1
beq if_terminating_char
b endif_terminating_char
if_terminating_char:
// Check if a number wasn't entered and only enter was pressed
cmp r3, #0x0
beq empty
sub r0, r0, #0x1 // Point to the previous character before the newline
endif_terminating_char:
mov r1, r3 // Clobber r1 with the actual length of our number string
ldr r5, =b_MIN_ASCII // Reset r5 to the minimum ASCII value
mov r6, #0 // Reset r6 to 0 for the running total
mov r8, #10 // Set r8 to base 10
mov r9, #1 // Set r9 to 1 for the first exponent multiplication
// The first digit doesn't need to be multiplied by the base and exponent
ldrb r4, [r0]
sub r4, r4, r5 // subtract b_MIN_ASCII value from the current character to get the decimal digit
add r6, r6, r4 // Add the decimal digit to the running total
sub r3, r3, #1 // Decrement our counter by one
sub r0, r0, #1 // Decrement our buffer address by one
digit_loop:
cmp r3, #0
ble end_digit_loop
ldrb r4, [r0] // Load one character from the current memory position from the buffer
sub r4, r4, r5 // subtract b_MIN_ASCII value from the current character to get the decimal digit
sub r10, r1, r3 // Get the exponent value
mov r7, r10
// Exponent loop
exponent_loop:
mul r10, r9, r8 // Find the product of the exponent and base
mov r9, r10
sub r7, r7, #1 // Decrement the exponent counter
cmp r7, #0 // Our exponent will increase for each digit
ble end_exponent_loop
b exponent_loop
end_exponent_loop:
mul r10, r4, r9 // The new digit value is the product of the exponent product and the digit
add r6, r6, r10 // Add the digit to the running total
mov r9, #1 // Reset r9 to 1
sub r3, r3, #1 // Decrement our counter by one
sub r0, r0, #1 // Decrement our buffer address by one
b digit_loop
end_digit_loop:
mov r0, #0x0 // Return code of 0 indicates a valid number
mov r1, r6 // The final running total is passed back as the number
pop {r4-r10} // Restore variable registers
pop {fp, pc} // Restore variable fp and resume execution from lr address
invalid:
mov r0, #0x1 // Return code of 1 indicates an invalid number
pop {r4-r10} // Restore variable registers
pop {fp, pc} // Restore variable fp and resume execution from lr address
empty: // The user entered an empty number
mov r0, #0x0 // It is valid, but equivalent to zero
mov r1, #0x0
pop {r4-r10} // Restore variable registers
pop {fp, pc} // Restore variable fp and resume execution from lr address
print_sum:
/*
parameters:
arg0 (r0) the first number to add
arg1 (r1) the second number to add
returns:
void
*/
push {fp, lr} // Preserve the caller's fp and lr (the return address)
mov fp, sp // Set the frame pointer to the current stack pointer
push {r4-r10} // Preserve the caller's variable registers
add r0, r0, r1 // Adds both numbers and clobbers r0 with the sum
/*
Variable registers:
r4: counter
r5: b_MIN_ASCII / b_NEWLINE
r6: divisor / newline flag
r7: quotient
r8: divisor * quotient product
r9: remainder/decimal digit/null pad
r10: the base address of our string on the stack
*/
sub sp, sp, #0x0c // Make room on the stack, 12 bytes can hold 10 characters for a 32-bit integer
mov r10, sp // Store the base address of our string
mov r4, #0x0b // Digits are produced least significant first, so we fill the buffer from its last byte backwards
ldr r5, =b_NEWLINE
mov r6, #10 // Set the divisor to 10
// Store a newline in the last byte so that it is written after the digits
strb r5, [sp, r4] // Store the newline character on the stack
sub r4, r4, #0x1 // Decrement our counter to reflect writing the newline character
ldr r5, =b_MIN_ASCII
digit_to_ASCII_loop:
cmp r0, #0x0
bne if_more_digits
else_no_more_digits: // Null pad the rest of the string
mov r5, #0x00
strb r5, [sp, r4]
b endif_more_digits
if_more_digits:
sdiv r7, r0, r6 // Divide the number by 10
mul r8, r7, r6 // Multiply the quotient by 10
sub r9, r0, r8 // Find the remainder
add r9, r9, r5 // Add b_MIN_ASCII to the remainder to convert it to the ASCII decimal
strb r9, [sp, r4] // Store the ASCII character on the stack
mov r0, r7 // Overwrite the original number with the quotient
endif_more_digits:
sub r4, r4, #0x1
cmp r4, #0x0
bge digit_to_ASCII_loop
print_sum_syscall:
ldr r7, =WRITE
ldr r0, =STDOUT
mov r1, r10 // Set the write address to the base of our string
mov r2, #0x0c
svc #00000000
add sp, sp, #0x0c // Move the stack pointer back
pop {r4-r10, fp, pc} // Restore the stack and return to the caller
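The two conversions performed in the listing above, ASCII digits to an integer in validate_input and an integer back to ASCII digits in print_sum, can be modeled at a higher level. The following Python sketch is an illustrative approximation and not a translation of every instruction; the function names ascii_to_uint and uint_to_ascii are hypothetical, and returning None for an invalid number is a simplification of leaving r1 unchanged.

```python
def ascii_to_uint(buffer):
    """Model validate_input: read ASCII digits up to a newline and return (status, value).

    A status of 0 means valid and 1 means invalid, mirroring the r0 return codes.
    """
    digits = []
    for byte in buffer:
        if byte == 0x0a:                 # newline terminates the number
            break
        if byte < 0x30 or byte > 0x39:   # outside b_MIN_ASCII..b_MAX_ASCII
            return 1, None
        digits.append(byte - 0x30)
    total = 0
    # Walk from the last digit to the first, scaling by increasing powers of 10
    for exponent, digit in enumerate(reversed(digits)):
        total += digit * 10 ** exponent
    return 0, total & 0xFFFFFFFF         # registers are 32 bits wide


def uint_to_ascii(number, width=12):
    """Model print_sum: fill a 12-byte buffer from the end with a newline, then digits."""
    out = bytearray(width)               # zero-filled, like the null padding in the listing
    index = width - 1
    out[index] = 0x0a                    # newline goes in the last byte
    index -= 1
    while number != 0 and index >= 0:
        number, remainder = divmod(number, 10)
        out[index] = 0x30 + remainder    # convert the remainder to an ASCII digit
        index -= 1
    return bytes(out)


status, first = ascii_to_uint(b"123\n")
status, second = ascii_to_uint(b"456\n")
print(uint_to_ascii(first + second))     # b'\x00\x00\x00\x00\x00\x00\x00\x00579\n'
```

As in the listing, a sum of 0 produces only null padding followed by a newline, because the digit loop stops emitting characters as soon as the remaining number is zero.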