RISC-V References
RISC-V Assembly Language Overview
RISC-V is an open standard instruction set architecture (ISA) based on established reduced instruction set computer (RISC) principles. It is a simple, clean, and efficient assembly language. Assembly language is a human readable (mostly) form of the computer machine code (binary values). Ultimately, a computer only know how to execute machine code. We use an assembler (as) to translate assembly code into machine code, which is similar to how a C compiler generates machine code from C source code.
Instructions and Registers
When thinking about assembly or machine code we need to think about core elements of a computer and computer processor. These elements include registers, memory, and instructions. A processor can only operate directly on values in registers. Memory holds both code (machine code) and data (globals, locals on the stack, and the heap). A processor executes instruction by loading an instruction from memory, then decoding it, then executing it. If program needs to operate on data, we need to use instruction to load data from memory into registers. Once the data is loaded into register, we compute new register values and often write the new values back to memory.
RISC-V assembly language has a set of instructions that operate on a set of 32 registers. Each register is 64 bits wide in the 64-bit variant of the ISA. The registers are named x0
through x31
. However, the ABI names are more commonly used:
zero
(x0): Holds the constant value 0.ra
(x1): Return address.sp
(x2): Stack pointer.gp
(x3): Global pointer.tp
(x4): Thread pointer.t0-t6
(x5-x10): Temporary/alternate link registers.a0-a7
(x10-x17): Function arguments/return values.s0-s11
(x18-x31): Saved registers.t3-t6
(x28-x31): More temporary registers.
Instructions typically take the form opcode dst, src1, src2
, where opcode
is the operation to perform, and dst
, src1
, and src2
are the destination and source registers.
For example, to add the values in a1
and a2
and store the result in a0
, you would write:
add a0, a1, a2
Labels and .global Directives
Labels in RISC-V assembly provide a way to name a specific location in your code. This is useful for branching or jumping to different parts of your program. For example:
start: # This is a label
add a0, a0, a2
add a0, a1, a3
Instructions are executed one after the other unless we direct the processor to go to an instruction at a different location in memory (see below).
The .global
directive makes a label visible to the linker so it can be used from other files. This is typically used to define the entry point of your program:
.global foo
foo:
# Your code here
Types of Instructions
RISC-V instructions can be broadly categorized into three types: data processing, control, and memory.
Data Processing Instructions
These instructions are used to perform operations on data. Examples include add
(addition), addi
(add immediate), sub
(subtraction), mul
(multiplication), and div
(division). For example:
add a3, a1, a2 # a3 = a1 + a2
addi a3, a1, 10 # a3 = a1 + 10
sub a3, a1, a2 # a3 = a1 - a2
mul a3, a1, a2 # a3 = a1 * a2
div a3, a1, a2 # a3 = a1 / a2
We can load constant values into a register like this:
li t0, 9
addi t0, zero, 9
These both do the same thing.
Control Instructions
Control instructions are used to alter the flow of execution. Examples include j
(jump), beq
(branch if equal), bne
(branch if not equal), blt
(branch if less than), and bge
(branch if greater than or equal). For example:
j label # Jump to label
beq a1, a2, label # If a1 == a2, jump to label
bne a1, a2, label # If a1 != a2, jump to label
blt a1, a2, label # If a1 < a2, jump to label
bge a1, a2, label # If a1 >= a2, jump to label
Memory Instructions
Memory in RISC-V is byte-addressable, meaning each byte has a unique address. Larger values occupy multiple bytes. For example, a 64-bit word occupies 8 bytes.
Memory instructions are used to load and store data from and to memory. Examples include lw
(load word), sw
(store word), ld
(load doubleword), and sd
(store doubleword). For example:
lw a1, (a2) # Load word from memory at address in a2 into a1
sw a1, (a2) # Store word in a1 into memory at address in a2
ld a1, (a2) # Load doubleword from memory at address in a2 into a1
sd a1, (a2) # Store doubleword in a1 into memory at address in a2
You can add an offset to the memory instructions that can be added to the base address:
lw a1, 4(a2) # Load word from memory at address a2 + 4
Writing Functions
In RISC-V, functions are defined using labels. Here’s a simple function that adds two numbers:
add_numbers:
add a0, a0, a2 # a0 = a0 + a`
ret
This function takes its arguments in a0
and a1
, and returns the result in a0
.
RISC-V Function Call Conventions
When calling a function, arguments are passed in registers a0-a7
. The return address is stored in ra
, and the stack pointer in sp
must be preserved across function calls.
Registers a0-a7
, t0-t6
, and ra
are caller-saved, meaning if a function modifies these registers, it must save the old values on the stack and restore them before returning.
Registers s0-s11
and sp
are callee-saved, meaning if a function modifies these registers, it must save the old values on the stack and restore them before returning.
Here’s an example of a function that calls another function, following these conventions:
.global foo
foo:
addi a0, zero, 2 # First argument
addi a1, zero, 3 # Second argument
call add_numbers # Call function
# Result is now in a0
ret
add_numbers:
add a0, a0, a1 # a0 = a0 + a1
ret
Translating C to Assembly
Here’s how you might translate some common C constructs to RISC-V assembly:
If/Else
if (a == b) {
c = a;
} else {
c = b;
}
beq a0, a1, equal
add t0, zero, a1
j end
equal:
add t0, zero, a0
end:
For Loop
for (i = 0; i < 10; i++) {
a = a + i;
}
addi t0, zero, 0 # t0 = 0
addi t1, zero, 10 # t1 = 10
loop:
bge t0, t1, end # If i >= 10, jump to end
add a0, a0, t0 # a0 = a0 + t0
addi t0, t0, 1 # t0 = t0 + 1
j loop # Jump to start of loop
end:
Array Access
Here’s how you might implement array indexing in RISC-V assembly:
x = arr[i];
# a0 - int arr[]
# a1 - int i
li t0, 4
mul t0, a1, t0 # t0 = i * 4
add t0, a0, t0 # t0 = a0 + (i * 4)
lw t1, (t0) # x = arr[i]
Here is way to do the multiplication with a shift instead of multiply:
# a0 - int arr[]
# a1 - int i
slli t0, t0, 2 # t0 = t0 * 4
add t0, a0, t0 # t0 = a0 + (i * 4)
lw t1, (t0) # x = arr[i]