testing.B.ReportMetric
July 3, 2019
My favorite new API in Go 1.13 is testing.B.ReportMetric. It allows you to report custom metrics from within your benchmarks.
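Its signature is simple:

func (b *B) ReportMetric(n float64, unit string)

You pass a value and a unit, and the value shows up in the benchmark output alongside the built-in metrics.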
When the API was proposed, the motivating example was package sort. The package sort benchmarks, like all benchmarks, measure elapsed wall time and (optionally) allocations. But wall time measurements can be noisy, and the sort routines don’t allocate much. There is another really useful algorithm-level metric for a sorting routine: How many comparisons does it perform?
Let’s use our new API to find out. Here’s a simple sort benchmark.
package demo

import (
	"sort"
	"testing"
)

func BenchmarkSort(b *testing.B) {
	data := make([]int, 1<<10)
	b.ResetTimer() // ignore big allocation
	for i := 0; i < b.N; i++ {
		b.StopTimer() // don't time refilling the data
		for i := range data {
			data[i] = i ^ 0x2cc
		}
		b.StartTimer()
		sort.Slice(data, func(i, j int) bool {
			return data[i] < data[j]
		})
	}
}
(Aside: Some of the sort benchmarks in the standard library allocate inside the loop. Oops. Carefully fixing this might make a nice first contribution for someone wanting to try their hand at the Go standard library.)
On my machine, executing this looks like:
$ go test -bench=. -count=5
goos: darwin
goarch: amd64
pkg: github.com/commaok/blog/content/post
BenchmarkSort-8 19536 60568 ns/op
BenchmarkSort-8 19666 60592 ns/op
BenchmarkSort-8 19569 60674 ns/op
BenchmarkSort-8 19724 61363 ns/op
BenchmarkSort-8 19705 60792 ns/op
PASS
ok github.com/commaok/blog/content/post 11.772s
If the number of iterations looks funny to you, it’s because you are used to seeing round numbers there. That also changed in Go 1.13.
Now let’s use the new testing.B.ReportMetric API to report the number of comparisons. To do this, we need to count the number of comparisons. Then we’ll divide by b.N. The reason we must do the division ourselves is to allow metrics that aren’t naturally represented as per-op counts, such as ratios or percentiles.
Here’s the updated code, with added lines marked:
func BenchmarkSort(b *testing.B) {
	var cmps int64 // <--------
	data := make([]int, 1<<10)
	b.ResetTimer() // ignore big allocation
	for i := 0; i < b.N; i++ {
		b.StopTimer() // don't time refilling the data
		for i := range data {
			data[i] = i ^ 0x2cc
		}
		b.StartTimer()
		sort.Slice(data, func(i, j int) bool {
			cmps++ // <--------
			return data[i] < data[j]
		})
	}
	b.ReportMetric(float64(cmps)/float64(b.N), "cmps/op") // <--------
}
Running this yields:
$ go test -bench=. -count=5
goos: darwin
goarch: amd64
pkg: github.com/commaok/blog/content/post
BenchmarkSort-8 18523 69874 ns/op 10091 cmps/op
BenchmarkSort-8 18654 64621 ns/op 10091 cmps/op
BenchmarkSort-8 18657 64589 ns/op 10091 cmps/op
BenchmarkSort-8 18774 64367 ns/op 10091 cmps/op
BenchmarkSort-8 18704 64408 ns/op 10091 cmps/op
PASS
ok github.com/commaok/blog/content/post 12.062s
Note that the number of comparisons is stable. This is a wonderful property when benchmarking. It means you can run your benchmarks a small number of times, without quitting your browser or trying to disable thermal scaling or worrying about how your laptop CPU may be different from your production CPUs.
Now let’s reach into the standard library and tweak something! There’s a function called maxDepth inside package sort:
// maxDepth returns a threshold at which quicksort should switch
// to heapsort. It returns 2*ceil(lg(n+1)).
func maxDepth(n int) int {
	var depth int
	for i := n; i > 0; i >>= 1 {
		depth++
	}
	return depth * 2
}
(Aside: maxDepth should probably use math/bits.Len to calculate ceil(lg(n+1)). This is another candidate for a good first contribution to the Go core.)
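A rough sketch of that change, for the curious (it needs an import of math/bits; for n >= 0, bits.Len(uint(n)) equals ceil(lg(n+1)), which is exactly what the loop above computes):

func maxDepth(n int) int {
	return 2 * bits.Len(uint(n))
}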
Let’s make maxDepth just return 0 instead:
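func maxDepth(n int) int {
	return 0
}

Re-running our benchmark: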
$ go test -bench=. -count=5 -run=NONE
goos: darwin
goarch: amd64
pkg: github.com/commaok/blog/content/post
BenchmarkSort-8 12618 94792 ns/op 17426 cmps/op
BenchmarkSort-8 12668 94769 ns/op 17426 cmps/op
BenchmarkSort-8 12642 94645 ns/op 17426 cmps/op
BenchmarkSort-8 12738 94623 ns/op 17426 cmps/op
BenchmarkSort-8 12654 94431 ns/op 17426 cmps/op
PASS
ok github.com/commaok/blog/content/post 12.867s
We can see not only that it is slower, but also that it is clearly an algorithmic degradation: the number of comparisons has increased markedly.
benchstat supports custom metrics, so we can use it to tell us exactly how much:
$ benchstat a b
name old time/op new time/op delta
Sort-8 64.5µs ± 0% 94.7µs ± 0% +46.76% (p=0.016 n=4+5)
name old cmps/op new cmps/op delta
Sort-8 10.1k ± 0% 17.4k ± 0% +72.69% (p=0.008 n=5+5)
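Here a and b are just files of saved benchmark output, captured before and after the change along these lines:

$ go test -bench=. -count=5 > a
$ # tweak maxDepth
$ go test -bench=. -count=5 > b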
There’s one more trick worth mentioning. testing.B.ReportMetric generally overrides built-in metrics. If you call b.ReportMetric(5, "allocs/op"), then go test will report 5 allocs/op, regardless of how much your benchmark actually allocated. There’s one special case: ns/op. If set to 0, then ns/op won’t be reported at all. This is useful to avoid confusion or distraction if your benchmark’s elapsed time is meaningless. For example, one of the earliest uses of testing.B.ReportMetric was for a package sync benchmark that reported garbage collector stop-the-world time, in percentiles. It is irrelevant how long it takes that benchmark to run; only the percentiles matter.
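To make the pattern concrete, here’s a minimal sketch (the workload and metric names are invented for illustration, not taken from the sync benchmark; it assumes imports of sort, testing, and time):

// workload is a hypothetical stand-in for whatever is being measured.
func workload() {
	time.Sleep(10 * time.Microsecond)
}

func BenchmarkLatencyPercentiles(b *testing.B) {
	samples := make([]float64, 0, b.N)
	for i := 0; i < b.N; i++ {
		start := time.Now()
		workload()
		samples = append(samples, float64(time.Since(start)))
	}
	sort.Float64s(samples)
	b.ReportMetric(0, "ns/op") // elapsed time is beside the point; suppress it
	b.ReportMetric(samples[len(samples)/2], "p50-ns")
	b.ReportMetric(samples[len(samples)*99/100], "p99-ns")
}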
On a sad note, there’s currently no good way to get profiling tools to tell you where a particular custom metric increased.
What else could you use testing.B.ReportMetric for? Obvious use cases include cache hit rates, I/O amounts, fast path/slow path heuristics, percentile reporting, and algorithmic improvements. (We measured comparisons during sorting here; how about swaps?) I’m sure there will be non-obvious ones as well.
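For instance, a cache hit rate might be reported like this (a sketch; the map is a stand-in for a real cache):

func BenchmarkCacheHitRate(b *testing.B) {
	cache := make(map[int]int)
	var hits int64
	for i := 0; i < b.N; i++ {
		k := i % 256
		if _, ok := cache[k]; ok {
			hits++
		} else {
			cache[k] = k
		}
	}
	// One lookup per iteration, so hits/b.N is the hit rate.
	b.ReportMetric(float64(hits)/float64(b.N), "hit-rate")
}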