ARM Assembly

ARM Hello World
We are now ready to write a hello world program for ARM. We will build upon what we have already learned from our x86 hello world, and note the differences for GNU ARM assembly.

We will again use the Linux syscall table as a reference, this time for 32-bit ARM.
.section .rodata 
/* 
The .rodata section will be stored as read-only in memory.
It is a separate section from .data, and the linker typically places it in a segment that is mapped read-only.
*/
b_STDOUT = 0x01 /* This defines b_STDOUT as a constant with a value of 0x01 (small enough to fit in a byte) */
b_WRITE  = 0x04 /* This defines b_WRITE as a constant with a value of 0x04 (small enough to fit in a byte) */

.section .data
	hello_msg: 
	.ascii "Hello World!\n" 
	end_hello_msg:

	len_hello_msg = (end_hello_msg - hello_msg)
	/*  
		This declares a symbol len_hello_msg and assigns it the difference between
		the end_hello_msg label address and the hello_msg label address.
		Parentheses are not necessary in this instance; len_hello_msg = end_hello_msg - hello_msg would
		evaluate the same.
	*/

	unused_label: 
	.hword 0xbeef
	/* 
		This label is never referenced by the code. It is here to illustrate how GAS stores
		label addresses for ARM assembly that are not used during program execution, vs. those that are.
		Note: The size of a word depends on the processor architecture. For ARM32 a word
		is 32 bits (4 bytes), so to store 2 bytes of data, we use the half-word (.hword) directive.
	*/

.section .text               
.global _start

_start:
/* 
	Write "At start" and "Hello World!" to stdout

	Write syscall reference:

	r7       r0 (arg0)         r1 (arg1)         r2 (arg2)
	0x04     unsigned int fd   const char *buf   size_t count
*/ 
	print_start_msg:

	ldr r7, =b_WRITE 
	/* 
		Used with the = syntax shown here, the load register (ldr) instruction is similar to the lea
		instruction for the x86 processor in that it loads a memory address or constant value into a register.

		Like the eax register for x86, r7 is used to select the syscall function for ARM.

		Using the = character with ldr is an ARM-specific pseudo-instruction that
		takes a symbol name representing a constant value or an address.
		The assembler will determine the type of value and modify the instruction
		to either move the value in as an immediate or load it from a PC-relative memory location.
		For this instruction, ldr will load an immediate value into r7 because
		b_WRITE is a constant and not the label for a memory address.

		For more information on this instruction, refer to this reference:
		https://developer.arm.com/documentation/dui0041/c/Babbfdih
	*/
	
	ldr r0, =b_STDOUT /* Another constant value loaded for the FD */
	adr r1, start_msg 
	/*  
		Address (adr) loads the address of a label into a register.
		The major functional difference between ldr and adr is that adr computes the address
		relative to the Program Counter (PC), so it can only reference nearby labels in the same
		section as the instruction (here, .text), while ldr can resolve addresses and values
		from any section.

		While both ldr and adr could load addresses from labels in the .text section,
		adr is more efficient for this specific task and should be used for that purpose.
	*/
	ldr r2, =len_start_msg /* This will resolve to the value of len_start_msg, and load it into r2 */

	svc #0x00000000
	/*
		When writing ARM assembly for GAS, the # character is used to prefix an immediate value.

		This SuperVisor Call (svc) is similar to the int 0x80 call for the x86.
		It will initiate the execution of the syscall by raising a software interrupt.
		svc creates an exception and passes the immediate value to the exception handler.

		In earlier versions of ARM, svc was called swi (SoftWare Interrupt), but they are effectively the same.
	*/   

	write_hello_msg:
	
	ldr r7, =b_WRITE 
	ldr r0, =b_STDOUT
	ldr r1, =hello_msg
	/*
	For this instruction, hello_msg is a label for a memory address located in the .data section.
	Using the ldr pseudo-instruction, the assembler will store the label's address as a
	literal value at the end of the .text section.
	It will then load that stored value into r1, locating it with the
	Program Counter (PC) register as a base address plus an offset.

	This is essentially what we did with the adr instruction, except the assembler is copying the
	address for the label in .data and placing it in the .text section to load from.
	*/  

	ldr r2, =len_hello_msg 
	svc #0x00000000      

	exit_normally:
	/*
		exit syscall reference:

		r7       r0 (arg0)         
		0x01	   int error_code
	*/
	mov r7, #0x00000001 /* Like with x86 assembly, mov can be used to load an immediate value into a register */ 
	mov r0, #0x00000000    
	svc #0x00000000      

	/* 
	The following section of code was added to show how you can also place
	variables and labels for data in the .text section after your code.
	They must be placed after your code, because they are not executable instructions.
	They should never be reached by your program's normal execution or it will crash.
	*/
	start_msg:
	.ascii "At start\n"
	len_start_msg = . - start_msg
	/*
		In GAS, the . character is used to reference the current position in memory,
		so instead of creating the label end_start_msg and writing "len_start_msg = (end_start_msg - start_msg)"
		we can just write this as shorthand.
	*/
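
The comments above describe how GAS handles ldr r1, =hello_msg by storing the label's address near the code and loading it relative to PC. As a minimal sketch of that idea, the short program below does the same thing by hand. The names msg, len_msg, and msg_addr are hypothetical and chosen only for this illustration, and the layout GAS actually generates for the pseudo-instruction may differ in detail.

```asm
.section .data
	msg:
	.ascii "Hi\n"
	len_msg = . - msg        /* length computed with the . shorthand */

.section .text
.global _start

_start:
	mov r7, #0x04            /* write syscall number */
	mov r0, #0x01            /* stdout file descriptor */
	ldr r1, msg_addr         /* PC-relative load of the word stored at msg_addr */
	ldr r2, =len_msg         /* constant value, as in the listing above */
	svc #0x00000000

	mov r7, #0x01            /* exit syscall number */
	mov r0, #0x00            /* exit status 0 */
	svc #0x00000000

	/* Hand-written stand-in for the literal the assembler would create. */
	/* It sits after the exit call, so it is never reached as an instruction. */
	msg_addr:
	.word msg                /* 4 bytes holding the address of msg in .data */
```

Here ldr r1, msg_addr reads the 4-byte word stored at msg_addr, which the linker fills in with the final address of msg. This is the same job the = form asks the assembler to arrange automatically.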
ARM Assemble, Link, and Run
Once you have copied the code and have thoroughly read through the comments, it is time to make it executable. GNU provides a cross-assembler for the ARM instruction set which is included in the gcc-arm-linux-gnueabihf package. To assemble the source, navigate to the directory where the source is saved and enter the command:
arm-linux-gnueabihf-as -o hello_arm32.o hello_arm32.asm
This command assumes that you saved the source file as hello_arm32.asm. It directs GAS to create a 32-bit object file named hello_arm32.o and to use the hello_arm32.asm file as an input source.
To link the object file into an executable named hello_arm32, enter the command:
arm-linux-gnueabihf-ld -o hello_arm32 hello_arm32.o
Now that the executable is created, we will use Quick Emulator (QEMU) to run it on our x86 system through emulation. Enter the command:
qemu-arm hello_arm32
At start
Hello World!