
ARM64 Assembly

Introduction to ARM64 assembly for compiler output.

What is assembly language?

Assembly is the thinnest possible layer above raw machine code. Each assembly instruction maps (almost) 1-to-1 to a single CPU operation. Where a high-level language says $x = $a + $b, assembly says:

ldr x1, [x29, #-16]     ; load $a from the stack into register x1
ldr x2, [x29, #-24]     ; load $b from the stack into register x2
add x0, x1, x2          ; add them, put the result in x0
str x0, [x29, #-32]     ; store the result back to the stack ($x)

Every line is one operation. The CPU reads them sequentially (unless a branch instruction says otherwise).

What is ARM64?

ARM64 (also called AArch64) is the instruction set used by Apple Silicon chips (M1, M2, M3, M4) and most modern smartphones. It’s a RISC (Reduced Instruction Set Computer) architecture — instructions are simple, fixed-size (4 bytes each), and uniform.

Compare this with x86-64 (Intel/AMD), a CISC (Complex Instruction Set Computer) architecture: its instructions range from 1 to 15 bytes long and come with many special cases. ARM64's uniform encoding makes it easier to learn, and simpler for a compiler to emit.
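Every A64 instruction is one 32-bit word, so encoding one takes only a few shifts and ORs. The sketch below encodes the 64-bit add Xd, Xn, Xm (shifted-register form with no shift) from its architectural bit layout. It covers only this one instruction form, and encode_add_x and to_bytes are hypothetical helpers for illustration, not part of elephc.

```rust
/// Encode `add Xd, Xn, Xm` (64-bit ADD, shifted-register form, LSL #0).
/// Bit layout: sf=1 | op=0 | S=0 | 01011 | shift=00 | 0 | Rm | imm6=0 | Rn | Rd
fn encode_add_x(rd: u32, rn: u32, rm: u32) -> u32 {
    assert!(rd < 32 && rn < 32 && rm < 32, "register numbers are 5-bit fields");
    0x8B00_0000       // sf=1, op=0, S=0, opcode 01011, shift=00, bit 21 = 0
        | (rm << 16)  // Rm in bits 20..16
        | (rn << 5)   // Rn in bits 9..5
        | rd          // Rd in bits 4..0
}

/// Every A64 instruction occupies exactly 4 bytes, stored little-endian.
fn to_bytes(word: u32) -> [u8; 4] {
    word.to_le_bytes()
}
```

For example, encode_add_x(0, 1, 2) returns 0x8B020020. That is the word an assembler produces for add x0, x1, x2 from the first listing on this page, and it is stored in memory as the bytes 20 00 02 8B.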

Registers: the CPU’s variables

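A register is a tiny, ultra-fast storage location inside the CPU. ARM64 has 31 general-purpose registers (x0-x30), each 64 bits (8 bytes) wide, plus a dedicated stack pointer (sp). The table lists the ones that matter most when reading elephc output: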

General-purpose registers

Register   Convention                             elephc usage
x0-x7      Function arguments and return values   Arguments passed to/from functions. x0 = integer/bool result
x8         Indirect result                        Scratch register
x9-x15     Temporary (caller-saved)               Scratch for intermediate computations
x16-x17    Intra-procedure scratch                x16 = syscall number on macOS
x29        Frame pointer (FP)                     Points to current function’s stack frame
x30        Link register (LR)                     Return address (where to go after ret)
sp         Stack pointer                          Top of the stack (grows downward)

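You can also use w0-w30 to access only the lower 32 bits of each register. Writing a w register clears the upper 32 bits of the matching x register. Byte stores such as strb w12, [x9] take a w register and store only its lowest 8 bits.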
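The sketch below models these rules with a hypothetical Reg type that holds one x register's 64-bit value. Reading the w view truncates to 32 bits, writing it zero-extends, and strb keeps only the lowest byte.

```rust
/// Hypothetical model of one 64-bit general-purpose register (xN).
struct Reg(u64);

impl Reg {
    /// Reading wN yields the low 32 bits of xN.
    fn read_w(&self) -> u32 {
        self.0 as u32
    }

    /// Writing wN zero-extends: the upper 32 bits of xN become 0.
    fn write_w(&mut self, value: u32) {
        self.0 = value as u64;
    }

    /// `strb wN, [...]` stores only the lowest 8 bits of the register.
    fn strb_byte(&self) -> u8 {
        self.0 as u8
    }
}
```

Take a register that held 0xFFFF_FFFF_1234_5678. Its w view reads 0x1234_5678, and strb would store 0x78. After write_w(0xABCD), the full x register reads 0xABCD.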

Floating-point registers

Register   Usage
d0-d7      Float arguments and return values
d8-d15     Callee-saved (preserved across function calls)
d16-d31    Temporary

In elephc, d0 holds float results. Float arguments to functions use d0-d7.

How elephc uses registers

elephc follows a simple convention (see The Code Generator for details):

Integer/Bool result  → x0
Float result         → d0
String result        → x1 (pointer to bytes), x2 (length)
Array result         → x0 (pointer to heap-allocated header)
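One way to picture this convention is as a lookup from a value's type to the registers that carry it. This is only a sketch: PhpType and result_registers are hypothetical names, not elephc's actual code generator types.

```rust
/// Hypothetical set of value kinds, mirroring the convention listed above.
#[derive(Debug, Clone, Copy)]
enum PhpType {
    Int,
    Bool,
    Float,
    Str,
    Array,
}

/// Registers that hold a value of the given type once it has been computed.
fn result_registers(ty: PhpType) -> &'static [&'static str] {
    match ty {
        PhpType::Int | PhpType::Bool => &["x0"],
        PhpType::Float => &["d0"],
        PhpType::Str => &["x1", "x2"], // x1 = pointer to bytes, x2 = length
        PhpType::Array => &["x0"],     // pointer to the heap-allocated header
    }
}
```

A string needs two registers because it is a pointer plus a length, much like a Rust &str.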

The stack: function-local storage

The stack is a region of memory that grows downward (from high addresses to low addresses). Each function call creates a stack frame — a block of memory for that function’s local variables.

High addresses
┌──────────────────────┐
│  caller's frame      │
├──────────────────────┤
│  saved x30 (LR)      │  [x29, #8]
│  saved x29 (old FP)  │  [x29]   ← x29 (frame pointer) points here
├──────────────────────┤
│  local variable 1    │  [x29, #-8]
│  local variable 2    │  [x29, #-16]
│  local variable 3    │  [x29, #-24]
│  ...                 │
├──────────────────────┤ ← sp (stack pointer) points here
Low addresses

Key concepts:

  • sp (stack pointer) marks the current top of the stack. You allocate space by subtracting from sp. ARM64 requires sp to stay 16-byte aligned, so frame sizes are rounded up to a multiple of 16.
  • x29 (frame pointer) marks the base of the current frame. Local variables are accessed at negative offsets from x29.
  • x30 (link register) holds the return address — where the CPU should jump when the function finishes.

Function prologue and epilogue

Every function starts with a prologue (set up the frame) and ends with an epilogue (tear it down):

; Prologue
sub sp, sp, #48          ; allocate 48 bytes: 32 for locals + 16 for saved x29/x30
stp x29, x30, [sp, #32]  ; save old frame pointer and return address at the top of the frame
add x29, sp, #32         ; set new frame pointer (x29 now points at the saved pair)

; ... function body ...

; Epilogue
ldp x29, x30, [sp, #32] ; restore frame pointer and return address
add sp, sp, #48          ; deallocate stack space
ret                      ; jump to address in x30

This is what elephc generates for every function. See The Code Generator for the full details.
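The frame arithmetic behind this prologue can be sketched as a small helper. It assumes, hypothetically, that every local gets one 8-byte slot below x29 and that the saved x29/x30 pair sits at the top of the frame. frame_layout, local_offset and prologue are illustrative names, not elephc's API.

```rust
/// Hypothetical frame layout for the prologue shape shown above.
struct FrameLayout {
    frame_size: u32, // bytes subtracted from sp in the prologue
    fp_offset: u32,  // offset used by `stp x29, x30, [sp, #..]` and `add x29, sp, #..`
}

fn frame_layout(num_locals: u32) -> FrameLayout {
    let locals_bytes = num_locals * 8; // one 8-byte slot per local
    let raw = locals_bytes + 16; // plus the saved x29 and x30
    let frame_size = (raw + 15) & !15; // round up: sp must stay 16-byte aligned
    FrameLayout { frame_size, fp_offset: frame_size - 16 }
}

/// Offset of local `index` (0-based) relative to x29: -8, -16, -24, ...
fn local_offset(index: u32) -> i32 {
    -8 * (index as i32 + 1)
}

/// Render the three prologue instructions for a layout.
fn prologue(layout: &FrameLayout) -> String {
    format!(
        "sub sp, sp, #{}\nstp x29, x30, [sp, #{}]\nadd x29, sp, #{}\n",
        layout.frame_size, layout.fp_offset, layout.fp_offset
    )
}
```

frame_layout(4) gives the 48/32 split used above. With three locals the frame is still 48 bytes, because 40 rounds up to the next multiple of 16. The sketch ignores encoding limits: stp's scaled immediate only reaches offsets up to 504, so larger frames need a different instruction sequence.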

Memory: load and store

ARM64 is a load/store architecture. Arithmetic instructions can't operate directly on memory: you must load values into registers first, operate on them, then store the results back:

ldr x0, [x29, #-8]      ; LOAD: read 8 bytes from stack into x0
add x0, x0, #1           ; OPERATE: add 1 to x0
str x0, [x29, #-8]       ; STORE: write x0 back to the stack

This is why $i++ in PHP becomes at least 3 instructions in assembly.
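To make the load/operate/store rule concrete, here is a toy interpreter in which arithmetic only touches registers and memory is reached only through explicit loads and stores. The Instr and Machine types are hypothetical. They model just the three instructions in the listing, with memory as a map of 8-byte words keyed by address.

```rust
use std::collections::HashMap;

/// The three instruction shapes from the listing above (toy model).
enum Instr {
    Ldr { rt: usize, base: usize, offset: i64 }, // ldr xT, [xBase, #offset]
    AddImm { rd: usize, rn: usize, imm: u64 },   // add xD, xN, #imm
    Str { rt: usize, base: usize, offset: i64 }, // str xT, [xBase, #offset]
}

struct Machine {
    x: [u64; 31],           // x0..x30
    mem: HashMap<u64, u64>, // 8-byte words keyed by address
}

impl Machine {
    fn new() -> Self {
        Machine { x: [0; 31], mem: HashMap::new() }
    }

    /// Base register plus signed offset, with 64-bit wraparound.
    fn addr(&self, base: usize, offset: i64) -> u64 {
        self.x[base].wrapping_add(offset as u64)
    }

    fn run(&mut self, program: &[Instr]) {
        for instr in program {
            match *instr {
                // Only loads read memory...
                Instr::Ldr { rt, base, offset } => {
                    let a = self.addr(base, offset);
                    self.x[rt] = *self.mem.get(&a).unwrap_or(&0);
                }
                // ...arithmetic works purely on registers...
                Instr::AddImm { rd, rn, imm } => {
                    self.x[rd] = self.x[rn].wrapping_add(imm);
                }
                // ...and only stores write memory.
                Instr::Str { rt, base, offset } => {
                    let a = self.addr(base, offset);
                    self.mem.insert(a, self.x[rt]);
                }
            }
        }
    }
}
```

Run the three-instruction $i++ sequence with x29 = 0x1000 and 41 stored at 0x1000 - 8. Afterwards, 42 is in x0 and also back in that stack slot.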

Addressing modes

Syntax              Meaning                 Example
[x29, #-16]         Base + offset           Load from 16 bytes below frame pointer
[x1]                Base only               Load from address in x1
[x0, x1, lsl #3]    Base + shifted index    Array access: base + (index × 8)
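Each mode computes its address with plain integer arithmetic. In this sketch, register contents are u64 values and the function names are illustrative. In the register-offset form, the shift amount must be either 0 or log2 of the access size, so lsl #3 is the only nonzero shift for 8-byte loads.

```rust
/// [base, #offset]: base plus a signed immediate (wrapping like 64-bit hardware).
fn ea_base_offset(base: u64, offset: i64) -> u64 {
    base.wrapping_add(offset as u64)
}

/// [base]: the address is the register value itself.
fn ea_base(base: u64) -> u64 {
    base
}

/// [base, index, lsl #shift]: base plus index shifted left (lsl #3 multiplies by 8).
fn ea_base_index_lsl(base: u64, index: u64, shift: u32) -> u64 {
    base.wrapping_add(index << shift)
}
```

With an array of 8-byte values at 0x2000, element 3 is at ea_base_index_lsl(0x2000, 3, 3), which is 0x2018.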

System calls: talking to the OS

The CPU can’t print to the screen or read files on its own — it needs to ask the operating system. On macOS ARM64, this is done with the svc (supervisor call) instruction:

mov x0, #1          ; file descriptor 1 = stdout
; x1 = pointer to string data (already set)
; x2 = string length (already set)
mov x16, #4         ; syscall number 4 = write
svc #0x80           ; invoke the kernel

This is how echo works in elephc — every echo ultimately becomes a write system call. See The Runtime for more details on how values are converted to strings before printing.
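The write listing comes down to four register values. The sketch below records them for an echo of a given string. The WriteSyscall struct and echo_registers function are hypothetical. The numbers (fd 1, syscall 4 in x16) are the macOS values from the listing.

```rust
/// Hypothetical record of the register state just before `svc #0x80`
/// for a macOS write(2) call.
#[derive(Debug, PartialEq)]
struct WriteSyscall {
    x0: u64,  // file descriptor (1 = stdout)
    x1: u64,  // pointer to the bytes
    x2: u64,  // number of bytes
    x16: u64, // BSD syscall number (4 = write)
}

const STDOUT_FD: u64 = 1;
const SYS_WRITE_MACOS: u64 = 4;

/// Register values an `echo` of `s` needs before invoking the kernel.
fn echo_registers(s: &str) -> WriteSyscall {
    WriteSyscall {
        x0: STDOUT_FD,
        x1: s.as_ptr() as u64,
        x2: s.len() as u64,
        x16: SYS_WRITE_MACOS,
    }
}
```

x1 and x2 carry the same pointer/length pair that elephc uses for string results. That appears to be why a string can flow into write without reshuffling registers.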

Branches: control flow

The CPU executes instructions sequentially unless a branch changes the flow:

Instruction           Meaning                                       Used for
b label               Unconditional jump                            else blocks, loop back-edges
b.eq label            Branch if equal                               After cmp, for ==
b.ne label            Branch if not equal                           For !=
b.lt label            Branch if less than (signed)                  For <
b.gt label            Branch if greater than (signed)               For >
b.le label            Branch if less or equal (signed)              For <=
b.ge label            Branch if greater or equal (signed)           For >=
b.lo label            Branch if lower (unsigned)                    Heap / pointer lower-bound checks
b.hs label            Branch if higher or same (unsigned)           Heap / pointer upper-bound checks
b.hi label            Branch if higher (unsigned)                   Unsigned range checks
b.ls label            Branch if lower or same (unsigned)            Unsigned range checks
b.cs label            Branch if carry set (same condition as b.hs)  Flag-setting arithmetic and unsigned carry checks
cbz x0, label         Branch if x0 is zero                          if conditions (falsy check)
cbnz x0, label        Branch if x0 is not zero                      Loop conditions
tbnz x0, #bit, label  Test bit and branch if non-zero               Runtime flag/tag checks
bl label              Branch with link (function call)              Saves return address in x30
blr xN                Branch with link to register (indirect call)  Call function at address in register (used for closures)
br xN                 Branch to register                            Tail jumps / runtime dispatch without saving a return address
brk #0                Breakpoint trap                               Runtime guard failures and hard traps
ret                   Return from function                          Jumps to address in x30
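The conditional branches read the N, Z, C and V flags that cmp sets (cmp a, b is an alias of subs xzr, a, b). The sketch below computes those flags and evaluates each condition suffix from the table, using the ARM architecture's condition definitions. cmp and cond_holds are illustrative names. The sketch shows why signed and unsigned branches can disagree about the same bits.

```rust
/// Flags produced by `cmp a, b` (that is, a - b with the result discarded).
#[derive(Debug, Clone, Copy)]
struct Flags {
    n: bool, // result negative (bit 63 set)
    z: bool, // result zero
    c: bool, // carry = no unsigned borrow, i.e. a >= b as unsigned
    v: bool, // signed overflow
}

fn cmp(a: u64, b: u64) -> Flags {
    let (result, borrow) = a.overflowing_sub(b);
    let (_, overflow) = (a as i64).overflowing_sub(b as i64);
    Flags {
        n: (result as i64) < 0,
        z: result == 0,
        c: !borrow,
        v: overflow,
    }
}

/// Evaluate a b.cond suffix such as "lt" or "hs" against the flags.
fn cond_holds(cond: &str, f: Flags) -> bool {
    match cond {
        "eq" => f.z,
        "ne" => !f.z,
        "lt" => f.n != f.v,         // signed <
        "ge" => f.n == f.v,         // signed >=
        "gt" => !f.z && f.n == f.v, // signed >
        "le" => f.z || f.n != f.v,  // signed <=
        "hs" | "cs" => f.c,         // unsigned >= (cs is the same condition)
        "lo" | "cc" => !f.c,        // unsigned <
        "hi" => f.c && !f.z,        // unsigned >
        "ls" => !f.c || f.z,        // unsigned <=
        _ => panic!("unknown condition: {}", cond),
    }
}
```

Comparing 0xFFFF_FFFF_FFFF_FFFF with 1 satisfies both lt and hi. As a signed value that is -1 < 1, while as an unsigned value it is the largest possible number.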

How an if becomes assembly

if ($x > 0) {
    echo "positive";
}

becomes (simplified):

ldr x0, [x29, #-8]      ; load $x
cmp x0, #0               ; compare $x with 0
b.le _end_if_1           ; if $x <= 0, skip the body
; ... emit "positive" ...
_end_if_1:

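For local stack slots, elephc usually emits ldur / stur (or computes the address with sub first) rather than raw ldr / str with negative immediates. The unsigned-offset form of ldr / str only encodes non-negative offsets, while ldur / stur take a signed offset in the range -256 to 255. The simplified examples on this page focus on the control-flow shape rather than the exact helper sequence.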
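The key move in the if listing is branching on the inverse of the PHP comparison, so a true condition falls through into the body. A hypothetical helper pair that produces the simplified shape is sketched below. It assumes signed integer operands and emits ldr rather than the ldur sequence elephc actually uses. skip_condition and emit_if are illustrative names.

```rust
/// b.cond suffix that skips an `if` body: the inverse of the PHP operator
/// (signed integer comparison assumed).
fn skip_condition(php_op: &str) -> Option<&'static str> {
    Some(match php_op {
        "==" => "ne",
        "!=" => "eq",
        "<" => "ge",
        "<=" => "gt",
        ">" => "le",
        ">=" => "lt",
        _ => return None,
    })
}

/// Emit the simplified shape for `if ($x OP imm) { body }`.
fn emit_if(x_offset: i32, php_op: &str, imm: u32, label: &str, body: &str) -> Option<String> {
    let cond = skip_condition(php_op)?;
    Some(format!(
        "ldr x0, [x29, #{}]\ncmp x0, #{}\nb.{} {}\n{}\n{}:\n",
        x_offset, imm, cond, label, body, label
    ))
}
```

emit_if(-8, ">", 0, "_end_if_1", ...) reproduces the listing above, with b.le as the skip branch.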

See ARM64 Instruction Reference for every instruction elephc uses, and The Code Generator for how each PHP construct maps to assembly.

Labels: named positions

Labels are names for positions in the code. They don’t generate instructions — they just mark addresses that branches can jump to:

_while_1:                ; ← this is a label
    ldr x0, [x29, #-8]
    cmp x0, #10
    b.ge _end_while_1   ; jump forward to end
    ; ... loop body ...
    b _while_1           ; jump back to start
_end_while_1:            ; ← another label

In elephc, labels are generated with a global counter to avoid collisions: _while_1, _while_2, _if_3, etc. See The Code Generator for how Context::next_label() works.
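A single counter shared by every label kind is enough to guarantee unique names. The sketch below is hypothetical. Only the naming scheme (_while_1, _while_2, _if_3) comes from the text. The real Context::next_label() may take different arguments and hold other state.

```rust
/// Hypothetical code-generation context with one label counter for all kinds.
struct Context {
    label_counter: usize,
}

impl Context {
    fn new() -> Self {
        Context { label_counter: 0 }
    }

    /// Return a fresh label such as "_while_1". The counter only increases,
    /// so two constructs can never receive the same name.
    fn next_label(&mut self, prefix: &str) -> String {
        self.label_counter += 1;
        format!("_{}_{}", prefix, self.label_counter)
    }
}
```

In the loop listing above, _while_1 and _end_while_1 share a number. That suggests one counter value is reused for a construct's related labels, whereas this sketch hands out a new number on every call.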

Data section: constants

The assembly output has two sections:

  • .text: executable code (instructions)
  • .data: initialized data (string literals, float constants). Despite holding constants here, .data is a writable section.

.data
_str_0: .ascii "Hello, world!\n"    ; 14 bytes
_float_0: .quad 0x400921FB54442D18  ; 3.14159... stored as raw bits

.text
; ... code that references _str_0 and _float_0 ...

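String literals are embedded directly in the binary. A single 4-byte instruction can't hold a full 64-bit address, so you load a literal's address in two steps with adrp + add. On macOS, adrp x1, _str_0@PAGE loads the address of the 4 KB page containing _str_0, and add x1, x1, _str_0@PAGEOFF adds its offset within that page (see ARM64 Instruction Reference).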
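Both constants in the data listing can be checked with a few lines of arithmetic. The sketch below checks that 0x400921FB54442D18 is the IEEE 754 bit pattern of π. It also models what adrp and add each contribute to an address. The helper names and the sample address are hypothetical, and real adrp works PC-relatively rather than from an absolute address.

```rust
const PAGE_MASK: u64 = 0xFFF; // low 12 bits = offset within a 4 KB page

/// Raw bits of an f64, as written by `.quad` for _float_0.
fn float_bits(value: f64) -> u64 {
    value.to_bits()
}

/// What `adrp` contributes: the address with its low 12 bits cleared.
fn adrp_page(addr: u64) -> u64 {
    addr & !PAGE_MASK
}

/// What the following `add ..., @PAGEOFF` contributes: the low 12 bits.
fn page_offset(addr: u64) -> u64 {
    addr & PAGE_MASK
}
```

For a sample address of 0x1_0000_4A38, adrp_page gives 0x1_0000_4000 and page_offset gives 0xA38. Their sum is the original address again.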