What happens when you load into x0 on RISC-V?
February 17, 2021
A small thing of which I am irrationally proud: I was the proximate cause for the addition of a sentence to the RISC-V ISA spec.
Here’s the sentence:
Loads with a destination of x0 must still raise any exceptions and cause any other side effects even though the load value is discarded.
It’s OK if you have no idea what that means. You will soon.
Here’s the story.
Background
In the summer of 2016, I wrote most of the initial RISC-V Go compiler implementation. (Michael Pratt and Benjamin Barenblat worked on the assembler, linker, and runtime, and other people jumped in and ultimately completed the port.)
I was writing the first version of the RISC-V SSA lowering rules. Those rules turn a generic, architecture-independent description of Go code into a RISC-V-specific set of operations that ultimately get lowered into RISC-V instructions.
One of those lowering rules specified how to lower a nil check.
Nil checks in the Go compiler
Consider this code:
type T struct {
	a [5000]byte // we'll explain this later
	b bool
}

func f(t *T) {
	_ = t.b
}
f does almost nothing. But not nothing. f evaluates t.b for side effects. If t is nil, f panics.
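As a quick check of that claim, the program below calls f with a valid pointer and with nil, recovering any panic. The helper name panics is hypothetical and exists only for this illustration; T and f are copied from above so the block stands alone.

```go
package main

import "fmt"

type T struct {
	a [5000]byte // large field: b sits 5000 bytes past the start of T
	b bool
}

// f evaluates t.b only for its side effect: the nil check.
func f(t *T) {
	_ = t.b
}

// panics reports whether calling f(t) panicked (hypothetical helper).
func panics(t *T) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f(t)
	return false
}

func main() {
	fmt.Println(panics(&T{})) // false
	fmt.Println(panics(nil))  // true
}
```

Even though the value of t.b is thrown away, the Go spec requires that evaluating a field through a nil pointer panics, so the compiler cannot drop the load entirely.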
In the Go compiler, this is (unsurprisingly) called a nil check. The compiler arranges to execute an instruction that will fault if t is nil.
On amd64, f compiles to three instructions:
MOVQ "".t+8(SP), AX
Get the value of t off of the stack and put it in the AX register.
TESTB AL, (AX)
Load the value pointed to by AX and do something with it. The parentheses around AX mean "dereference the pointer in the AX register." It doesn't matter here what the TESTB instruction does; it was chosen because it is short to encode. It's the dereferencing that matters. If the load faults, the runtime will receive a signal and turn that into a panic.
RET
Return from the function. We only reach this instruction if we don’t panic first.
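The panic the runtime produces from that fault is an ordinary, recoverable runtime.Error. This sketch recovers it and prints its message; the helper nilCheckPanic is a hypothetical name used only here.

```go
package main

import (
	"fmt"
	"runtime"
)

type T struct {
	a [5000]byte
	b bool
}

func f(t *T) {
	_ = t.b // nil check
}

// nilCheckPanic calls f(nil) and returns the recovered runtime error, if any.
func nilCheckPanic() (err runtime.Error) {
	defer func() {
		if re, ok := recover().(runtime.Error); ok {
			err = re
		}
	}()
	f(nil)
	return nil
}

func main() {
	fmt.Println(nilCheckPanic())
	// runtime error: invalid memory address or nil pointer dereference
}
```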
Implicit nil checks
Why does type T above contain a [5000]byte field?
There are lots of nil checks in a typical Go program. As an optimization, the runtime allocates a guard page at address 0, typically with size 4096 bytes. Any loads from an address < 4096 will fault.
As a result, if you’re dereferencing a struct field with a small offset, we can directly attempt to load from the calculated address of that struct field. If the pointer is zero, then the calculated address will be < 4096, and it’ll fault. There’s no need for a separate, explicit nil check.
For example, if I had used [20]byte above, then t.b requires loading from t plus 20. If t is nil (0), then that address is 20, which is located in the guard page.
Since we have a [5000]byte field above, the guard page isn't enough, so we need an explicit nil check.
This makes it sound like explicit nil checks are exceedingly rare. They're not; they show up in other ways too.
Back to RISC-V
I had to decide how RISC-V should lower explicit nil checks.
RISC-V has a dedicated zero register, x0. It always holds the value zero, and writes to it are discarded. It's like /dev/null and /dev/zero rolled into one.
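Those two properties can be modeled in a few lines. This is a toy sketch, not a real simulator: the type regFile and its methods are hypothetical, and only the x0 rules come from the text.

```go
package main

import "fmt"

// regFile is a hypothetical model of RISC-V's 32 integer registers.
type regFile struct {
	x [32]uint64
}

// read returns register i; x0 always reads as zero (like /dev/zero).
func (r *regFile) read(i int) uint64 {
	if i == 0 {
		return 0
	}
	return r.x[i]
}

// write stores v in register i; writes to x0 are discarded (like /dev/null).
func (r *regFile) write(i int, v uint64) {
	if i == 0 {
		return
	}
	r.x[i] = v
}

func main() {
	var r regFile
	r.write(0, 42)
	r.write(3, 42)
	fmt.Println(r.read(0), r.read(3)) // 0 42
}
```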
It sounds like just the thing for a nil check: we can dereference the pointer and load the value into x0.
Here's f, compiled for RISC-V:
LD "".t+8(SP), X3
LB (X3), X0
JALR X0, X1
It is almost identical to the amd64 version. The first instruction loads the pointer from the stack. The second instruction dereferences it into x0. The final instruction returns.
There was only one problem: Would it work?
An ambiguity in the spec
If you're loading a value in order to discard it, do you really need to load it at all? If you're writing to x0, maybe you can just skip it.
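To make the two readings concrete, here is a toy interpreter for a single LB instruction. Everything in it is hypothetical: cpu, lb, errLoadFault, and the elideX0Loads switch are invented for illustration, and the 4096-byte guard region is assumed from the discussion above. With elision on, the nil check LB (X3), X0 silently succeeds when X3 holds zero.

```go
package main

import (
	"errors"
	"fmt"
)

// errLoadFault stands in for the access fault real hardware would raise.
var errLoadFault = errors.New("load access fault")

// guardPage is an assumed unmapped region starting at address 0.
const guardPage = 4096

// cpu is a hypothetical toy model, not a real RISC-V simulator.
type cpu struct {
	// elideX0Loads selects the alternative reading: skip loads into x0.
	elideX0Loads bool

	regs [32]uint64
	mem  map[uint64]byte
}

// lb models "LB (rs1), rd": load a byte from the address in rs1 into rd.
func (c *cpu) lb(rd, rs1 int) error {
	if rd == 0 && c.elideX0Loads {
		return nil // load skipped: no fault, no side effects
	}
	addr := c.regs[rs1]
	if addr < guardPage {
		return errLoadFault
	}
	v := uint64(int64(int8(c.mem[addr]))) // LB sign-extends the byte
	if rd != 0 {
		c.regs[rd] = v // writes to x0 are discarded
	}
	return nil
}

func main() {
	for _, elide := range []bool{false, true} {
		c := &cpu{elideX0Loads: elide, mem: map[uint64]byte{}}
		c.regs[3] = 0     // t is nil
		err := c.lb(0, 3) // LB (X3), X0: the nil check
		fmt.Printf("elide=%v fault=%v\n", elide, err != nil)
	}
}
```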
There is an analog from amd64. The CMOV instruction does a conditional move. If a flag is set, then it loads or moves a value, and not otherwise. It shows up when compiling code like this:
func g(x int) int {
	y := 1
	if x == 0 {
		y = 3
	}
	return y
}
The core of this function compiled for amd64 is:
TESTQ AX, AX
MOVL $1, AX
MOVL $3, CX
CMOVQEQ CX, AX
TESTQ sets the EQ flag if x is 0. The next two instructions put 1 in AX and 3 in CX. Last, if the EQ flag is set, we move CX into AX. AX now holds the correct value of y to return.
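The same four steps can be transliterated into Go, with ordinary variables standing in for AX, CX, and the EQ flag. This is an illustrative model only; gCMOV is a hypothetical name, and nothing here controls what machine code Go itself emits.

```go
package main

import "fmt"

func g(x int) int {
	y := 1
	if x == 0 {
		y = 3
	}
	return y
}

// gCMOV mirrors the amd64 sequence; eq models the EQ (zero) flag.
func gCMOV(x int) int {
	ax := x
	eq := ax&ax == 0 // TESTQ AX, AX: AND AX with itself, set EQ if the result is zero
	ax = 1           // MOVL $1, AX
	cx := 3          // MOVL $3, CX
	if eq {          // CMOVQEQ CX, AX: move CX into AX only if EQ is set
		ax = cx
	}
	return ax
}

func main() {
	for _, x := range []int{0, 1, -5} {
		fmt.Println(x, g(x), gCMOV(x))
	}
}
```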
If a CMOV instruction includes a load from memory, that load is done unconditionally, even though the write of that value into the destination register is conditional.
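A rough Go model of that rule: the dereference of src stands in for the memory load and always happens, while only the assignment to the destination depends on the flag. The names cmovLoad and loadFaults are hypothetical, and a Go panic is only an analogy for a hardware fault.

```go
package main

import "fmt"

// cmovLoad models a CMOV with a memory source: load first, write conditionally.
func cmovLoad(eq bool, ax int, src *int) int {
	v := *src // the load is always performed, so a nil src panics
	if eq {
		ax = v // only the register write depends on the flag
	}
	return ax
}

// loadFaults reports whether cmovLoad panicked.
func loadFaults(eq bool, src *int) (faulted bool) {
	defer func() {
		if recover() != nil {
			faulted = true
		}
	}()
	cmovLoad(eq, 1, src)
	return false
}

func main() {
	n := 3
	fmt.Println(cmovLoad(true, 1, &n), cmovLoad(false, 1, &n)) // 3 1
	fmt.Println(loadFaults(false, nil))                         // true
}
```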
I knew (and know) approximately nothing about hardware, but I can guess why this is a good decision. If you're doing out-of-order execution, you might not know yet what the flags are going to be when you reach that CMOV instruction. But memory loads are slow. We want to start that memory load early for maximum benefit. So it is useful to be able to do the load unconditionally, even if it is inconvenient for compiler developers.
But the same consideration doesn't really apply to RISC-V. There's no uncertainty about whether the instruction writes to x0. Skipping the load would be easy and cheap.
Denouement
I asked my co-conspirators, and one of them asked Andrew Waterman.
He replied:
We debated this hole in the spec at length, but neglected to write down the conclusion.
The main reason we went with this definition is cleaner semantics for memory-mapped I/O loads that trigger side effects. The opposite choice is also defensible (it gives you a non-binding prefetch instruction for free).
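The memory-mapped I/O argument is easiest to see with a device whose reads have side effects. The sketch below assumes a hypothetical receive FIFO where each load pops one byte; fifoDevice, load, and loadToX0 are invented names. Under the spec's rule, a load into x0 still consumes a byte; under the elided reading it would not.

```go
package main

import "fmt"

// fifoDevice is a hypothetical memory-mapped FIFO: each load pops one byte.
type fifoDevice struct {
	queue []byte
}

// load returns the next byte and removes it, or 0 if the FIFO is empty.
func (d *fifoDevice) load() byte {
	if len(d.queue) == 0 {
		return 0
	}
	b := d.queue[0]
	d.queue = d.queue[1:]
	return b
}

// loadToX0 models "LB (dev), x0"; elide selects the semantics the spec rules out.
func loadToX0(d *fifoDevice, elide bool) {
	if elide {
		return // no access, so no side effect
	}
	_ = d.load() // value discarded, side effect kept
}

func main() {
	d := &fifoDevice{queue: []byte{'a', 'b', 'c'}}
	loadToX0(d, false)            // spec semantics: 'a' is consumed
	fmt.Println(string(d.load())) // b
	loadToX0(d, true)             // elided: 'c' is not consumed
	fmt.Println(string(d.load())) // c
}
```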
Light-years ahead of me, unsurprisingly. But convenient for Go’s nil checks. And me having asked did help tie up one little loose end.