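What you want to know about Go benchmarks

Benchmarks evaluate the performance of the code under test, mainly in terms of how efficiently it uses CPU and memory, and so help us optimize it.

The testing package in the Go standard library has built-in support for benchmarks, so measuring the performance of a piece of code is easy.

Performance tests are highly sensitive to their environment. To keep results repeatable, keep the test environment as stable as possible:

Keep the test machine as idle as possible and do not run other tasks on it;
Avoid running benchmarks on virtual machines and cloud hosts.

To maximize resource utilization, the CPU and memory of virtual machines and cloud hosts are usually over-committed, and the performance of over-committed machines is very unstable.

Using benchmarks

An example

We start with the function under test, which computes the Nth Fibonacci number. Straight to the code: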

// fib.go
package main

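// fib returns the Nth Fibonacci number (fib(0) = 0, fib(1) = 1).
// It assumes n >= 0; the naive recursion takes exponential time.
func fib(n int) int {
    if n == 0 || n == 1 {
        return n
    }
    return fib(n-1) + fib(n-2)
}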

Create a file named fib_test.go and implement a benchmark:

// fib_test.go
package main

import "testing"

func BenchmarkFib(b *testing.B) {
    for n := 0; n < b.N; n++ {
        fib(30) // run fib(30) b.N times
    }
}

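Note that:

Benchmark functions live in *_test.go files;
A benchmark function is named "BenchmarkXxx", where "Xxx" is an identifier of your choice that starts with an uppercase letter and is usually the name of the function under test;
The parameter is b *testing.B, which provides fields and methods that support benchmarking, such as the iteration count b.N used in the example (explained below);
There is no return value.

Running benchmarks

To run the tests in a package, use the go test command. The usage is as follows (here example stands for the module name):

Run the tests in the current package:

go test example or go test .
Run the tests in a sub-package:

go test example/<package name> or go test ./<package name>
Recursively test every package under the current directory:

go test ./... or go test example/...
Run all the tests in a specific package:

go test <module name>/<package name>

By default go test does not run benchmarks; add the -bench flag to run them. For example: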

$ go test -bench BenchmarkFib
goos: windows
goarch: amd64
pkg: xxx/benchmark
cpu: 13th Gen Intel(R) Core(TM) i5-1340P
BenchmarkFib-16              349           3461418 ns/op
PASS
ok      xxx/benchmark     1.546s

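Besides a benchmark name as above, -bench accepts any regular expression, and every benchmark whose name matches it is run. For example, to run only the benchmarks whose names end in Fib: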

$ go test -bench='Fib$' .
goos: windows
goarch: amd64
pkg: xxx/benchmark
cpu: 13th Gen Intel(R) Core(TM) i5-1340P
BenchmarkFib-16              357           3472680 ns/op
PASS
ok      xxx/benchmark     1.614s
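
To get a feel for which names a pattern selects, here is a minimal sketch that applies an unanchored regular expression to a list of names. It only approximates how -bench filters top-level benchmarks (it ignores the slash-separated patterns used for sub-benchmarks). The helper matchBenchmarks and the name BenchmarkFibMemo are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchBenchmarks returns the names selected by an unanchored regular
// expression. It is a hypothetical helper that approximates how -bench
// filters top-level benchmark names.
func matchBenchmarks(pattern string, names []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var matched []string
	for _, name := range names {
		if re.MatchString(name) {
			matched = append(matched, name)
		}
	}
	return matched, nil
}

func main() {
	// Benchmark names used only for illustration.
	names := []string{"BenchmarkFib", "BenchmarkFibMemo", "BenchmarkGenerate"}
	for _, pattern := range []string{"Fib$", "Fib", "."} {
		matched, err := matchBenchmarks(pattern, names)
		if err != nil {
			fmt.Println("bad pattern:", err)
			continue
		}
		fmt.Printf("%q -> %v\n", pattern, matched)
	}
}
```

Because the match is unanchored, Fib selects both BenchmarkFib and BenchmarkFibMemo, Fib$ selects only the first, and a single dot selects every benchmark.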

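How benchmarks work

The parameter b *testing.B has a field b.N, the number of iterations the benchmark has to run. b.N is adjusted dynamically and differs from benchmark to benchmark. The final count is printed when the run finishes.

How is b.N chosen?

b.N starts at 1. If the benchmark finishes within the time given by -benchtime (explained below; the default is 1s), b.N is increased and the benchmark runs again. Older Go releases increased b.N roughly along the sequence 1, 2, 3, 5, 10, 20, 30, 50, 100, growing faster and faster; recent releases instead estimate the next b.N from the time the previous run took, which is why the counts printed in this article (349, 357, and so on) are not round numbers. In both cases the process stops once the run is long enough to give a reliable per-operation time.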
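
The growth of b.N in recent Go releases can be modeled roughly as follows. This is a simplified sketch, not the source of the testing package: nextN and simulate are hypothetical names, and the fixed cost of 3.5ms per call is an assumption chosen to be close to fib(30) in this article.

```go
package main

import "fmt"

// nextN predicts the next iteration count from the previous run.
// It is a simplified model of the strategy used by recent Go releases.
func nextN(goalNs, prevN, prevNs int64) int64 {
	if prevNs <= 0 {
		prevNs = 1
	}
	n := goalNs * prevN / prevNs // iterations that would fill the time budget
	n += n / 5                   // aim about 20% high
	if n > 100*prevN {
		n = 100 * prevN // grow at most 100x per round
	}
	if n < prevN+1 {
		n = prevN + 1 // always run at least one more iteration
	}
	if n > 1e9 {
		n = 1e9 // upper bound on the iteration count
	}
	return n
}

// simulate returns every b.N value tried for a function whose cost per call
// is fixed at perOpNs nanoseconds (an idealized assumption).
func simulate(goalNs, perOpNs int64) []int64 {
	n := int64(1)
	tried := []int64{n}
	for n*perOpNs < goalNs && n < 1e9 {
		n = nextN(goalNs, n, n*perOpNs)
		tried = append(tried, n)
	}
	return tried
}

func main() {
	// 1s budget, 3.5ms per call.
	fmt.Println(simulate(1_000_000_000, 3_500_000)) // prints [1 100 342]
}
```

Under these assumptions the final count is 342, in the same range as the 349 and 357 iterations reported in the runs above.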

Let us look closely at the output of the example above:

BenchmarkFib-16              357           3472680 ns/op

The -16 in BenchmarkFib-16 is the value of GOMAXPROCS, which defaults to the number of logical CPUs. It can be changed with the -cpu flag, for example:

$ go test -bench='Fib$' -cpu=4 .    
goos: windows
goarch: amd64
pkg: xxx/benchmark
cpu: 13th Gen Intel(R) Core(TM) i5-1340P
BenchmarkFib-4               350           3477851 ns/op
PASS
ok      xxx/benchmark     1.714s

In this example the calls to fib are sequential, so changing the number of CPUs has almost no effect on the result.

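350 and 3477851 ns/op mean that the benchmark ran 350 iterations and each one took about 0.0035s. The timed loop therefore lasted about 350 × 0.0035s ≈ 1.2s, slightly more than 1s.

Increasing the number of runs

An important way to make a performance test more accurate is to run it more times. Below we do this in two ways.

-benchtime: set the run time or the iteration count

The default benchmark time is 1s. The -benchtime flag changes it, for example to 5s: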

$ go test -bench='Fib$' -benchtime=5s .
goos: windows
goarch: amd64
pkg: xxx/benchmark
cpu: 13th Gen Intel(R) Core(TM) i5-1340P
BenchmarkFib-16             1758           3459473 ns/op
PASS
ok      xxx/benchmark     6.467s

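The run actually took about 6.5s, longer than the 5s benchtime. b.N is chosen so that the timed loop lasts at least the requested time (here 1758 × 0.0035s ≈ 6.1s), and the trial runs used to choose b.N, plus starting and tearing down the test, take time as well.

With -benchtime set to 5s, the iteration count is about 5 times the original (1758 versus roughly 350), and the time per call is still about 0.0035s, almost unchanged.

-benchtime can also set an exact iteration count. For example, -benchtime=30x runs 30 iterations: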

$ go test -bench='Fib$' -benchtime=30x .
goos: windows
goarch: amd64
pkg: xxx/benchmark
cpu: 13th Gen Intel(R) Core(TM) i5-1340P
BenchmarkFib-16               30           3322773 ns/op
PASS
ok      xxx/benchmark     0.138s

Calling fib(30) 30 times took only 0.138s for the whole run.

-count: set the number of benchmark rounds

For example, to run 3 rounds of the benchmark:

$ go test -bench='Fib$' -count=3 .                           
goos: windows
goarch: amd64
pkg: xxx/benchmark
cpu: 13th Gen Intel(R) Core(TM) i5-1340P
BenchmarkFib-16              349           3460694 ns/op
BenchmarkFib-16              350           3440665 ns/op
BenchmarkFib-16              349           3445845 ns/op
PASS
ok      xxx/benchmark     4.697s

The results of the 3 rounds are close to each other.

Of course, -benchtime and -count can be combined to increase the number of runs further.
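
To put a number on how close the rounds are, the sketch below computes the mean and the relative spread (max minus min, divided by the mean) of the three ns/op values from the -count=3 run. The helper summarize is a hypothetical name; for serious comparisons a dedicated tool such as benchstat is likely a better fit.

```go
package main

import "fmt"

// summarize returns the mean of the samples and their relative spread,
// defined here as (max - min) / mean.
func summarize(samples []float64) (mean, spread float64) {
	if len(samples) == 0 {
		return 0, 0
	}
	lo, hi, sum := samples[0], samples[0], 0.0
	for _, s := range samples {
		sum += s
		if s < lo {
			lo = s
		}
		if s > hi {
			hi = s
		}
	}
	mean = sum / float64(len(samples))
	return mean, (hi - lo) / mean
}

func main() {
	// ns/op values copied from the -count=3 run in this article.
	samples := []float64{3460694, 3440665, 3445845}
	mean, spread := summarize(samples)
	// prints: mean = 3449068 ns/op, spread = 0.58%
	fmt.Printf("mean = %.0f ns/op, spread = %.2f%%\n", mean, spread*100)
}
```

A spread of about 0.6% across the three rounds suggests the machine was reasonably quiet during the run.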

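Memory allocation

Use the -benchmem flag to measure the size and number of memory allocations. Memory allocation is closely tied to performance: for example, a poorly chosen slice capacity forces the slice to be reallocated, which adds unnecessary overhead.

In the example below, generateWithCap and generate do the same thing: they produce a random sequence of length n. The only difference is that generateWithCap sets the slice capacity to n when it creates the slice, so the memory for all n integers is requested at once.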

// generate_test.go
package main

import (
    "math/rand"
    "testing"
    "time"
)

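// generateWithCap returns n random integers. The slice is created with
// capacity n, so append never has to grow it.
func generateWithCap(n int) []int {
    // rand.Seed is deprecated since Go 1.20; it is kept to match the results below.
    rand.Seed(time.Now().UnixNano())
    nums := make([]int, 0, n)
    for i := 0; i < n; i++ {
        nums = append(nums, rand.Int())
    }
    return nums
}

// generate returns n random integers. The slice starts with zero capacity,
// so append has to reallocate it repeatedly as it grows.
func generate(n int) []int {
    rand.Seed(time.Now().UnixNano())
    nums := make([]int, 0)
    for i := 0; i < n; i++ {
        nums = append(nums, rand.Int())
    }
    return nums
}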

func BenchmarkGenerateWithCap(b *testing.B) {
    for n := 0; n < b.N; n++ {
        generateWithCap(1000000)
    }
}

func BenchmarkGenerate(b *testing.B) {
    for n := 0; n < b.N; n++ {
        generate(1000000)
    }
}

Run the benchmarks. The results are as follows:

$ go test -bench="Generate" -benchtime=5s -benchmem .
goos: windows
goarch: amd64
pkg: xxx/benchmark
cpu: 13th Gen Intel(R) Core(TM) i5-1340P
BenchmarkGenerateWithCap-16          429          13750025 ns/op         8003584 B/op          1 allocs/op
BenchmarkGenerate-16                 325          18352395 ns/op        41678115 B/op         38 allocs/op
PASS
ok      xxx/benchmark     15.845s

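When generating a random sequence of 1,000,000 numbers, generateWithCap takes about 25% less time per call than generate, and generate allocates more than 5 times as much memory. With the capacity set, memory is allocated only once; without it, memory is allocated 38 times.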
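
To see where the extra allocations come from, here is a minimal sketch that counts how often append has to grow the slice while n values are added. The helper countGrowths is a hypothetical name, and the exact count without a preset capacity depends on the Go version's growth policy, so no specific number is claimed.

```go
package main

import "fmt"

// countGrowths appends n integers to a slice created with the given initial
// capacity and reports how many times the capacity changed, that is, how
// many times append had to move the data to a larger backing array.
func countGrowths(n, initialCap int) int {
	nums := make([]int, 0, initialCap)
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(nums)
		nums = append(nums, i)
		if cap(nums) != before {
			growths++
		}
	}
	return growths
}

func main() {
	const n = 1000000
	fmt.Println("without capacity:", countGrowths(n, 0))
	fmt.Println("with capacity:", countGrowths(n, n))
}
```

With the capacity preset to n the count is 0, which matches the single allocation made by make in the benchmark output; every growth in the other case copies the existing elements to a new, larger array.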

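Complexity testing

Different functions have different complexity: O(1), O(n), O(n^2), and so on. A benchmark can check the complexity by feeding the function inputs of different sizes.

Create a test file generate_test.go (replacing the one from the previous section, which also defines generate) with the following code: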

package main

import (
    "math/rand"
    "testing"
    "time"
)

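// generate returns n random integers without preallocating capacity.
func generate(n int) []int {
    rand.Seed(time.Now().UnixNano())
    nums := make([]int, 0)
    for i := 0; i < n; i++ {
        nums = append(nums, rand.Int())
    }
    return nums
}

// benchmarkGenerate is a helper that benchmarks generate with input size i.
func benchmarkGenerate(i int, b *testing.B) {
    for n := 0; n < b.N; n++ {
        generate(i)
    }
}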

func BenchmarkGenerate1000(b *testing.B)    { benchmarkGenerate(1000, b) }
func BenchmarkGenerate10000(b *testing.B)   { benchmarkGenerate(10000, b) }
func BenchmarkGenerate100000(b *testing.B)  { benchmarkGenerate(100000, b) }
func BenchmarkGenerate1000000(b *testing.B) { benchmarkGenerate(1000000, b) }

The helper benchmarkGenerate takes the input size i as a parameter, and four benchmarks with different inputs are built on top of it. The results are as follows:

$ go test -bench="0$" .
goos: windows
goarch: amd64
pkg: xxx/benchmark
cpu: 13th Gen Intel(R) Core(TM) i5-1340P
BenchmarkGenerate1000-16           52900             23253 ns/op
BenchmarkGenerate10000-16           6468            178180 ns/op
BenchmarkGenerate100000-16           700           1827157 ns/op
BenchmarkGenerate1000000-16           54          18553407 ns/op
PASS
ok      xxx/benchmark     5.832s

The results show that when the input grows 10 times, the time per call also grows roughly 10 times, which indicates that the complexity is linear.
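
The scaling can be checked directly from the reported numbers. The sketch below divides each ns/op value by the previous one; the helper growthRatios is a hypothetical name, and the input values are copied from the run above.

```go
package main

import "fmt"

// growthRatios returns the ratio between each measurement and the one before it.
func growthRatios(nsPerOp []float64) []float64 {
	ratios := make([]float64, 0, len(nsPerOp))
	for i := 1; i < len(nsPerOp); i++ {
		ratios = append(ratios, nsPerOp[i]/nsPerOp[i-1])
	}
	return ratios
}

func main() {
	// ns/op for n = 1000, 10000, 100000, 1000000.
	results := []float64{23253, 178180, 1827157, 18553407}
	for _, r := range growthRatios(results) {
		fmt.Printf("%.1fx\n", r) // prints 7.7x, 10.3x, 10.2x on separate lines
	}
}
```

The two larger steps are very close to 10x. The first step is smaller, probably because fixed per-call costs such as seeding the random generator weigh more when n is small.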

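DLC (extras)

The sections above cover the general use of benchmarks. This section discusses a few other things you may run into.

ResetTimer

If a benchmark needs time-consuming setup before it starts, the time spent on that setup can be excluded from the measurement. Here is an example: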

// Requires "time" in the import list of fib_test.go.
func BenchmarkFib2(b *testing.B) {
    time.Sleep(time.Second * 3) // simulate time-consuming setup
    for n := 0; n < b.N; n++ {
        fib(30) // run fib(30) b.N times
    }
}

Run the benchmark:

$ go test -bench='Fib2' -benchtime=50x .
goos: windows
goarch: amd64
pkg: xxx/benchmark
cpu: 13th Gen Intel(R) Core(TM) i5-1340P
BenchmarkFib2-16              50          63713120 ns/op
PASS
ok      xxx/benchmark     6.902s

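With 50 calls, each call takes about 0.064s, 18 times the earlier 0.0035s. The cause is the setup: the 3s sleep is counted in the measured time and spread over the 50 iterations (3s / 50 = 0.06s per iteration). ResetTimer excludes it: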

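func BenchmarkFib2(b *testing.B) {
    time.Sleep(time.Second * 3) // simulate time-consuming setup
    b.ResetTimer()              // reset the timer so the setup is not measured
    for n := 0; n < b.N; n++ {
        fib(30) // run fib(30) b.N times
    }
}

The result is back to normal, about 0.0034s per call. The whole run still takes about 6.9s, because the setup still runs; it is only excluded from the measurement.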

$ go test -bench='Fib2' -benchtime=50x .
goos: windows
goarch: amd64
pkg: xxx/benchmark
cpu: 13th Gen Intel(R) Core(TM) i5-1340P
BenchmarkFib2-16              50           3360844 ns/op
PASS
ok      xxx/benchmark     6.913s

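StopTimer & StartTimer

If each call needs setup before it or cleanup after it, use StopTimer to pause the timer and StartTimer to resume it.

For example, to benchmark a bubble sort, a random sequence has to be generated before every call, which is a time-consuming operation. In this case StopTimer and StartTimer keep that time out of the measurement.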

// sort_test.go
package main

import (
    "math/rand"
    "testing"
    "time"
)

func generateWithCap(n int) []int {
    rand.Seed(time.Now().UnixNano())
    nums := make([]int, 0, n)
    for i := 0; i < n; i++ {
        nums = append(nums, rand.Int())
    }
    return nums
}

func bubbleSort(nums []int) {
    for i := 0; i < len(nums); i++ {
        for j := 1; j < len(nums)-i; j++ {
            if nums[j] < nums[j-1] {
                nums[j], nums[j-1] = nums[j-1], nums[j]
            }
        }
    }
}

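func BenchmarkBubbleSort(b *testing.B) {
    for n := 0; n < b.N; n++ {
        b.StopTimer() // pause timing while the input is generated
        nums := generateWithCap(10000)
        b.StartTimer() // resume timing for the sort itself
        bubbleSort(nums)
    }
}

Running the benchmark shows that each sort takes about 0.037s.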

$ go test -bench='BubbleSort$' .
goos: windows
goarch: amd64
pkg: xxx/benchmark
cpu: 13th Gen Intel(R) Core(TM) i5-1340P
BenchmarkBubbleSort-16                36          37249725 ns/op
PASS
ok      xxx/benchmark     2.070s
