Generating safer Go code

October 28, 2020

It’s easy to forget to run go generate when you need to, and failing to regenerate can cause nasty bugs.

Venerable gopher Rog Peppe found an excellent technique for guarding against this class of bugs. Like many good ideas, it is obvious in retrospect.

Generate code that will not compile if it needs to be regenerated.

I’ll illustrate this with two examples.

stringer

The first example comes directly from Rog, in the stringer command. stringer generates a String() string method for integer types that have defined constants.

type T int

const (
	One T = 1
	Two T = 2
)

stringer will generate a method that returns "One" for 1, "Two" for 2, and "T(3)" for 3.
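If you haven't seen stringer's output, here is a hand-written method with the same observable behavior. This is only an illustration. stringer's real generated code is organized differently: it typically packs all the names into a single string and slices it by index.

```go
package main

import (
	"fmt"
	"strconv"
)

type T int

const (
	One T = 1
	Two T = 2
)

// String mimics the behavior of a stringer-generated method:
// known constants map to their names, and any other value is
// rendered as "T(n)".
func (t T) String() string {
	switch t {
	case One:
		return "One"
	case Two:
		return "Two"
	}
	return "T(" + strconv.Itoa(int(t)) + ")"
}

func main() {
	fmt.Println(One, Two, T(3)) // One Two T(3)
}
```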

What if you now change the value of One to be 3 and forget to re-generate?

Well, stringer also generated this function:

func _() {
	var x [1]struct{}
	_ = x[One-1]
	_ = x[Two-2]
}

The function is named _, which means it is impossible to call it. The compiler won’t even bother generating code for it. It will, however, typecheck it. And typechecking is where the magic happens.

When the value of One is 1, x[One-1] evaluates to x[0]. Since x has length 1, that’s OK.

When the value of One is 3, x[One-1] evaluates to x[2]. But x only has length 1! Attempts to compile this generate a compiler error: invalid array index One - 1 (out of bounds for 1-element array).

The function recorded the values of the constants when stringer was run and fails to compile if those values change.

cloner

Now that we know the trick, we can apply it elsewhere.

Tailscale has a little bespoke tool to generate Clone methods for structs.

The output of cloner depends on the input struct fields. How can we trigger a compilation failure if we forget to re-run the tool after changing an input struct?

The trick is to duplicate the original struct in the generated code and then attempt to convert from the original struct to the current struct.

We start with this input code:

type T struct {
	X int
}

After generating a Clone method for T, cloner also generates:

var _ = T(struct {
	X int
}{})

Here we’ve written out the exact form T had when we generated the code, and assigned the result to _, which the compiler can discard. However, it must still be typechecked.
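Put together, a generated file might look roughly like the following. Tailscale's actual generated Clone code isn't shown in this post, so the Clone body is a hypothetical stand-in. Because T has only value fields, copying the struct is a complete clone. A struct with slices, maps, or pointers would need those copied too, which is exactly why the generator's output depends on the field list.

```go
package main

import "fmt"

type T struct {
	X int
}

// Clone returns a copy of t. Hypothetical generated code: since T has
// only value fields, a struct copy is sufficient.
func (t *T) Clone() *T {
	if t == nil {
		return nil
	}
	dst := new(T)
	*dst = *t
	return dst
}

// Guard: records the shape of T at generation time. If T's fields
// change, this conversion no longer typechecks.
var _ = T(struct {
	X int
}{})

func main() {
	a := &T{X: 1}
	b := a.Clone()
	b.X = 2
	fmt.Println(a.X, b.X) // 1 2
}
```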

Suppose we now change the type T. Let’s add a new field.

type T struct {
	X int
	Y string
}

The conversion now fails: It’s not possible to convert a struct { X int } to a struct { X int; Y string }.

Similar to stringer, cloner recorded the types when cloner was run and now fails to compile if those types change.

Compile-time assertion taxonomy

We’ve seen two forms of assertions that can trigger during typechecking: that a constant still has a particular value (One == 1), and that a struct’s fields are unchanged.

There are others. For example, you can use a conversion to assert that a type implements an interface. You can use a conversion to uint to assert that one untyped constant is greater than or equal to another. (You can’t convert a negative constant to uint.)
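Both of these fit in a few lines. The names below (`Celsius`, `minBuf`, `maxBuf`) are hypothetical. Each declaration compiles as written, and the comments say what would make it fail.

```go
package main

import (
	"fmt"
	"strconv"
)

type Celsius float64

func (c Celsius) String() string {
	return strconv.FormatFloat(float64(c), 'f', 1, 64) + "°C"
}

// Asserts that Celsius implements fmt.Stringer. If the String method
// is removed or its signature changes, this conversion fails to compile.
var _ = fmt.Stringer(Celsius(0))

const (
	minBuf = 64
	maxBuf = 4096
)

// Asserts maxBuf >= minBuf. A negative constant cannot be converted
// to uint, so this fails to compile if minBuf ever exceeds maxBuf.
const _ = uint(maxBuf - minBuf)

func main() {
	fmt.Println(Celsius(21.5)) // 21.5°C
}
```

The assignment form, var _ fmt.Stringer = Celsius(0), is an equally common way to spell the interface check.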

There are some obscure ones, of questionable utility. For example, you could assert that two concrete types are distinct by putting them both as cases in a type switch, which disallows duplicate types.

I don’t know of any attempt to exhaustively list compile-time assertions (aside from the spec) and how they can be used, with examples. Someone please make one!

Matthew Dempsky has proposed that Go add explicit compile-time assertions for boolean expressions. (That doesn’t cover relationships between types, although maybe generics would break some new ground here.) And I’ve written about a quirky way that you can write link-time assertions in Go.

Call to action

If you maintain a code generator, please check whether you can use this technique to protect your users from bugs. One obvious category is generated serialization/deserialization routines. There are almost certainly others.
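As one illustration of that category, here is what a generated binary encoder carrying the same kind of guard might look like. The `Point` type, the bodies of its MarshalBinary and UnmarshalBinary methods, and the fixed 8-byte layout are all hypothetical. What matters is the trailing guard. It turns a forgotten regeneration into a compile error instead of a new field that silently goes missing from the wire format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

type Point struct {
	X, Y int32
}

// MarshalBinary encodes p as two big-endian int32s.
// (Hypothetical generated code.)
func (p Point) MarshalBinary() ([]byte, error) {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint32(buf[0:4], uint32(p.X))
	binary.BigEndian.PutUint32(buf[4:8], uint32(p.Y))
	return buf, nil
}

// UnmarshalBinary decodes the format written by MarshalBinary.
func (p *Point) UnmarshalBinary(data []byte) error {
	if len(data) != 8 {
		return errors.New("point: want 8 bytes")
	}
	p.X = int32(binary.BigEndian.Uint32(data[0:4]))
	p.Y = int32(binary.BigEndian.Uint32(data[4:8]))
	return nil
}

// Guard: fails to compile if Point's fields are added, removed,
// renamed, retyped, or reordered without regenerating the code above.
var _ = Point(struct {
	X, Y int32
}{})

func main() {
	b, _ := Point{X: 1, Y: -2}.MarshalBinary()
	var q Point
	if err := q.UnmarshalBinary(b); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(q) // {1 -2}
}
```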