ARM Assembly

ARM Debugging in GDB Part 1
QEMU's -g option starts a GDB server that GDB can connect to over a network socket. To run our program in QEMU with the GDB server enabled, enter:
qemu-arm -g 2345 hello_arm32 &
This will launch our program in the background with QEMU and listen on port 2345. The port number is arbitrary and can be changed to any free port you want to bind to.
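
Since the port is arbitrary, one way to avoid a collision is to ask the operating system for a free one. The following is a minimal Python sketch and not part of the original workflow. The helper name find_free_port is hypothetical, and there is a small window in which another process could claim the port before QEMU binds to it.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the kernel pick any free port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

The printed number can then be used in place of 2345, both in the qemu-arm -g command and in the later target remote command.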

Once QEMU is running, we will launch gdb-multiarch (a GDB build that supports multiple architectures), load our binary so GDB can read its symbols, and then connect to the running process in QEMU. To do so enter:
$ gdb-multiarch
(gdb) file hello_arm32
(gdb) target remote localhost:2345

You should see output similar to the following:
pete@framework16:~/Documents/ASM/hello_world/ARM32$ gdb-multiarch
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
		<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file hello_arm32
Reading symbols from hello_arm32...
(No debugging symbols found in hello_arm32)
(gdb) target remote localhost:2345
Remote debugging using localhost:2345
0x00010074 in _start ()
(gdb)
Notice that we do not need to set a breakpoint and run the program: QEMU has already halted the program at the _start label and is waiting for GDB before executing anything.

We can now open our layouts with:
lay asm
lay reg
You should now have the familiar layout of registers, assembly, and commands.
Note that the register window will not show any values until we step through an instruction.
Let's examine our first instruction:
> 0x10074 <_start>                mov     r7, #4
The assembly source was ldr r7, =b_WRITE, but because b_WRITE is a constant (4) that fits in an immediate, the assembler translated the ldr into a plain mov of the immediate value into the register.
The next instruction is another mov of the same kind, so let's step through our instructions until we reach the third line:
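
Why a mov is possible here can be sketched in Python. In the classic 32-bit ARM encoding, a data-processing immediate is an 8-bit value rotated right by an even number of bits, so only some constants fit directly in a mov. The function name is_arm_immediate is hypothetical, and the sketch ignores other choices the assembler may make (such as mvn or movw). Values that cannot be encoded this way are typically loaded from a literal pool with a pc-relative ldr instead.

```python
def is_arm_immediate(value: int) -> bool:
    """Return True if value fits an ARM (A32) rotated 8-bit immediate."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate-right by rot.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 0x100:
            return True
    return False


if __name__ == "__main__":
    print(is_arm_immediate(4))           # the b_WRITE value seen in the listing
    print(is_arm_immediate(0x12345678))  # an arbitrary value that does not fit
```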
|-Register group: general------------------------------------------------------------------------------------------------------------------------------------------------|
|r0             0x1                 1                    r1             0x40800b39          1082133305           r2             0x0                 0                    |
|r3             0x0                 0                    r4             0x0                 0                    r5             0x0                 0                    |
|r6             0x0                 0                    r7             0x4                 4                    r8             0x0                 0                    |
|r9             0x0                 0                    r10            0x200bc             131260               r11            0x0                 0                    |
|r12            0x0                 0                    sp             0x408009d0          0x408009d0           lr             0x0                 0                    |
|pc             0x1007c             0x1007c <_start+8>   cpsr           0x10                16                   fpscr          0x0                 0                    |
|fpsid          0x410430f0          1090793712           fpexc          0x40000000          1073741824           AFSR0_EL1      0x0                 0                    |
|AFSR1_EL1      0x0                 0                    DBGDIDR        0x3515f021          890630177            DBGDSAR        0x0                 0                    |
|DBGBVR         0x0                 0                    DBGBCR         0x0                 0                    DBGWVR         0x0                 0                    |
|DBGWCR         0x0                 0                    PAR            0x0                 0                    DBGBVR         0x0                 0                    |
|DBGBCR         0x0                 0                    DBGWVR         0x0                 0                    DBGWCR         0x0                 0                    |
|TEECR          0x0                 0                    MIDR_EL1       0x412fc0f1          1093648625           CTR            0x8444c004          -2075869180          |
|TCMTR          0x0                 0                    TTBR0_EL1      0x0                 0                    PMCCNTR        0x0                 0                    |
|TLBTR          0x0                 0                    TTBR1_EL1      0x0                 0                    MIDR           0x412fc0f1          1093648625           |
|TTBCR          0x0                 0                    MPIDR_EL1      0x80000000          -2147483648          TTBCR2         0x0                 0                    |
|REVIDR_EL1     0x0                 0                    MIDR           0x412fc0f1          1093648625           JIDR           0x0                 0                    |
|CLIDR          0xa200023           169869347            DFAR           0x0                 0                    WFAR           0x0                 0                    |
|IFAR           0x0                 0                    JMCR           0x0                 0                    AIDR           0x0                 0                    |
|CSSELR         0x0                 0                    ID_PFR2        0x10                16                   VBAR           0x0                 0                    |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|    0x10074 <_start>                mov     r7, #4                                                                                                                      |
|    0x10078 <_start+4>              mov     r0, #1                                                                                                                      |
|  > 0x1007c <_start+8>              add     r1, pc, #36     ; 0x24                                                                                                      |
|    0x10080 <_start+12>             ldr     r2, [pc, #44]   ; 0x100b4 <start_msg+12>                                                                                    |
|    0x10084 <_start+16>             svc     0x00000000                                                                                                                  |
|    0x10088 <write_hello_msg>       mov     r7, #4                                                                                                                      |
|    0x1008c <write_hello_msg+4>     ldr     r1, [pc, #36]   ; 0x100b8 <start_msg+16>                                                                                    |
|    0x10090 <write_hello_msg+8>     mov     r0, #1                                                                                                                      |
|    0x10094 <write_hello_msg+12>    mov     r2, #13                                                                                                                     |
|    0x10098 <write_hello_msg+16>    svc     0x00000000                                                                                                                  |
|    0x1009c <exit_normally>         mov     r7, #1                                                                                                                      |
|    0x100a0 <exit_normally+4>       mov     r0, #0                                                                                                                      |
|    0x100a4 <exit_normally+8>       svc     0x00000000                                                                                                                  |
|    0x100a8 <start_msg>                             ; <UNDEFINED> instruction: 0x73207441                                                                               |
|    0x100ac <start_msg+4>           ldrbtvc r6, [r2], #-372 ; 0xfffffe8c                                                                                                |
|    0x100b0 <start_msg+8>           andeq   r0, r0, r10                                                                                                                 |
|    0x100b4 <start_msg+12>          andeq   r0, r0, r9                                                                                                                  |
|    0x100b8 <start_msg+16>          strheq  r0, [r2], -r12                                                                                                              |
|    0x100bc                         cfstr64vs       mvdx6, [r12], #-288     ; 0xfffffee0                                                                                |
|    0x100c0                         svcvs   0x0057206f                                                                                                                  |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
remote Thread 1.383292 In: _start                                                                                                                       L??   PC: 0x1007c
(gdb) lay reg
(gdb) si
0x00010078 in _start ()
(gdb) si
0x0001007c in _start ()
(gdb)
Our instruction adr r1, start_msg has been translated to add r1, pc, #36.

The add instruction takes a destination register for the result, followed by the two operands to add. In this instance, the immediate value #36 is being added to the value of the pc (program counter) register.
This is where things can get confusing. GDB currently lists pc as 0x1007c, the address of the instruction about to execute, but when an ARM instruction reads pc, the value it gets is two instructions ahead. Since these are 32-bit instructions, that value is 8 bytes more than the address of our current line (32 bits * 2 = 64 bits = 8 bytes).

You might expect add r1, pc, #36 to store 0x100a0 (0x1007c + 36) in the register, but if we step forward one instruction:
r1             0x100a8             65704                
We see that 0x100a8 (0x1007c + 8 + 36) is in fact stored in r1.
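
The same arithmetic can be checked with a few lines of Python. This is only a sketch: the function name pc_relative_target is hypothetical, and it assumes ARM (not Thumb) state, where reading pc yields the address of the current instruction plus 8. The addresses and offsets are taken from the disassembly above.

```python
PC_READ_AHEAD = 8  # in ARM state, reading pc gives the instruction address + 8


def pc_relative_target(instr_addr: int, offset: int) -> int:
    """Address produced by a pc-relative operand at instr_addr (hypothetical helper)."""
    return instr_addr + PC_READ_AHEAD + offset


if __name__ == "__main__":
    # add r1, pc, #36 at 0x1007c
    print(hex(pc_relative_target(0x1007C, 36)))  # 0x100a8
    # ldr r2, [pc, #44] at 0x10080
    print(hex(pc_relative_target(0x10080, 44)))  # 0x100b4
    # ldr r1, [pc, #36] at 0x1008c
    print(hex(pc_relative_target(0x1008C, 36)))  # 0x100b8
```

The last two results match the addresses GDB prints in the comments next to the two ldr instructions.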

If we look further down our assembly layout, we can see that this is the address of our start_msg label:
0x100a8 <start_msg>                             ; <UNDEFINED> instruction: 0x73207441
Notice that the disassembler is attempting to interpret the data as instructions. This is because the data resides in the .text section alongside our code, even though it does not contain valid instructions.
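
To see what that "instruction" really is, we can unpack the 32-bit word GDB printed back into its bytes. This is a small Python sketch; the function name word_to_bytes is hypothetical, and it assumes a little-endian target, which is the usual case for qemu-arm.

```python
import struct


def word_to_bytes(word: int) -> bytes:
    """Lay out a 32-bit word as it sits in little-endian memory (hypothetical helper)."""
    return struct.pack("<I", word)


if __name__ == "__main__":
    # 0x73207441 is the word GDB shows at start_msg
    print(word_to_bytes(0x73207441))  # b'At s'
```

The four bytes are plain ASCII text, the first characters of the string stored at start_msg, rather than an encoded instruction.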