聊一聊 pprof

站长

2023年05月26日 13:58 · 阅读数 105

对于线上稳定运行的服务来说, 可能会遇到 cpu、mem 利用率升高的问题。有时可能增加机器配置或重启服务也解决不了问题。那我们就需要使用 pprof 工具来进行性能分析, 找出造成 cpu、mem 升高的问题。

pprof

pprof 是 golang 的性能分析工具。pprof 通过读取 profile.proto 格式的性能采样数据来生成相关报告。从而分析服务中存在的性能问题。

pprof profiles

存储采样的数据集合。在 golang 中, 采样数据分别包含这几种类型：

cpu

cpu 采样是最常用的采用类型。当开始 cpu 采样时, 每隔 10ms 会进行一次采样, 记录当前运行 goroutines 的堆栈信息。当采样完成之后, 可以分析哪些代码比较消耗 cpu。

memory

memory 采样主要用来分析服务的堆内存的分配和回收。程序运行时, 栈内存是可以直接分配和回收的。因此, 无需采集栈内存的使用情况。

block

采集阻塞数据。主要用于定位 channel 所导致的 goroutines 阻塞。

mutex

主要用来采集锁竞争相关数据, 分析锁竞争导致的系统的延迟。

生成 profile 文件

在开始进行性能相关分析之前, 需要生成对应的采样文件。然后通过 pprof 工具对 profile 文件进行分析, 找出影响性能的代码。在 golang 中, 可以使用以下三种方式生成 profile 文件：

测试函数

在使用基准测试时, 可以指定参数生成对应的 profile 文件。

go test -cpuprofile {proflie_name} -bench .

http 访问

对于持续运行的任务, 例如：http、rpc。可以导入 net/http/pprof 来注册 debug 路由, 生成对应的 proflie 文件。对于非 http 的服务, 需要注册一个 debug 端口。

直接生成 profile 文件

对于非持续性任务。例如：定时任务。可以通过指定方法生成对应的 proflie 文件。

import (
   "github.com/pkg/profile" // 对 runtime/pprof 进行的封装
   "fmt"
)

func main() {
   defer func() {
      profile.Start(profile.CPUProfile, profile.ProfilePath("."))
   }()
   func() {
      fmt.Println(5 + 6)
   }()
}

分析 profile 文件

下面将通过具体的例子, 来尝试分析对应的 cpu、mem 等指标。

cpu 分析

在下面的例子中, 尝试分析哪个函数比较占用 cpu.

package main

import (
   "log"
   "net/http"
   _ "net/http/pprof"
   "runtime"
   "time"
)
//go:noinline
func req(num int64) {
   A(num)
   B(num)
}
//go:noinline
func B(num int64) {
   for i := int64(0); i < num/2; i++ {

   }
}
//go:noinline
func A(num int64) {
   for i := int64(0); i < num; i++ {

   }
}
func main() {
   runtime.GOMAXPROCS(1)
   go func() {
      log.Println(http.ListenAndServe("localhost:3016", nil))
   }()
   for {
      time.Sleep(time.Second / 4)
      go func() {
         req(time.Now().Unix())
      }()
   }
}

下载并保存对应的 cpu profile, 采样时间 19s：

curl ``http://127.0.0.1:3016/debug/pprof/profile``?seconds=19 --output cpu.pprof

启动可视化界面：

go tool pprof -http=:8080 cpu.pprof

从最右侧可以看出, 此次采集共耗时 19.21s, 采样得到的 cpu 时间为 16.73s 「87.10%」。由于是采样, 所以 total samples 是小于 Duration 的。通过 view 来切换不同的试图。

聊一聊 pprof

Top, 用于展示不同函数的耗时占比。Flat: 当前函数运行耗时。Flat%: Flat 对应的占比。Sum%: 累积使用占比。Cum: 当前函数及子函数运行耗时。Cum%: Cum 对应的占比。

对于下面的例子：统计到的 cpu 时间为 16.73s, 函数 A 本身耗时 13.3s 「79.49%」; 函数 A 及其子函数耗时为 13.85s「82.78%」; 函数 req 本身耗时 0; 子函数耗时 16.73s「100%」。

聊一聊 pprof

Graph, 以图的形式展示不同函数的耗时占比。如下图所示：函数入口为 main.func2「匿名函数」; 0 of 16730ms「匿名函数不占用任何资源, cpu 采样总时长为 16730ms」; main.req 为 main.func2 的子函数「函数自身不占用任何资源」; main.A「13850ms 函数自身占用 13300ms」和 mian.B「2880ms 函数自身占用 2810ms」为 main.req「16730ms」的子函数。

通过 Graph 可以直观的看到哪些函数占用的资源比较高, 方框越大资源占比越高。同时可以直观的看到调用关系。

聊一聊 pprof

Flame Graph, 火焰图。通过火焰图可以看到各个函数的耗时, 以及子函数的耗时。横轴越长占用的资源越多, 纵轴越长, 调用层级越深。

聊一聊 pprof

Source, 查看对应的源码。在下图中, req 的子函数 A 和 B 共占用了 100% 的资源。在函数 B 中, 函数自身耗时占用了 2.81s, 其中 for 循环语句占用 2.18s。

聊一聊 pprof

mem 分析

下面是一段内存分配的代码。heap 采样并不需要像 cpu 采样一样确定固定周期。heap 关注的是内存的分配和回收, 只需采集对应的事件即可。

package main

import (
   "fmt"
   "log"
   "net/http"
   _ "net/http/pprof"
   "runtime"
   "time"
)

//go:noinline
func req() {
   buf = append(buf, allocate()...)
}

//go:noinline
func allocate() []byte {
   var b []byte
   for i := 0; i < 1024; i++ {
      temp := make([]byte, 1024)
      b = append(b, temp...)
   }
   return b
}

var buf []byte

func main() {
   runtime.GOMAXPROCS(1)
   go func() {
      log.Println(http.ListenAndServe("localhost:3016", nil))
   }()
   for {
      time.Sleep(time.Second)
      go func() {
         req()
         fmt.Println(len(buf))
      }()
   }
}

下载堆内存采样文件。

curl ``http://127.0.0.1:3016/debug/pprof/heap`` --output mem.pprof

使用 webui 打开文件。

go tool pprof -http=:8080 mem.pprof

共分为四种采样类型：alloc_objects: 所有分配的对象; alloc_space: 所有分配的空间; inuse_objects: 活跃的对象; inuse_space: 活跃的空间。这里需要注意的是, inuse_space 使用的空间是要小于当前进程使用的系统空间的「使用中的, gc 未释放给操作系统的, cgo 申请的空间」

聊一聊 pprof

通过 alloc 可以分析哪里在频繁的分配空间, inuse 可以用来分析内存没有释放。切到 inuse_sapce, 可以看到 req 函数以及它的子函数 allocate 在不断分配空间。

聊一聊 pprof

goroutine 分析

下面是一个 goroutine 泄漏的例子。

package main

import (
   "fmt"
   "log"
   "net/http"
   _ "net/http/pprof"
   "runtime"
   "time"
)

//go:noinline
func req() {
   c := make(chan struct{})
   tick := time.NewTicker(time.Millisecond)
   go func() {
      time.Sleep(time.Millisecond * 2)
      c <- struct{}{}
   }()
   select {
   case <-c:
      return
   case <-tick.C:
      return
   }
}

func main() {
   runtime.GOMAXPROCS(1)
   go func() {
      log.Println(http.ListenAndServe("localhost:3016", nil))
   }()
   for {
      time.Sleep(time.Second / 10)
      go func() {
         req()
      }()
      fmt.Println(runtime.NumGoroutine())
   }
}

下载堆 goroutine 采样文件。

curl ``http://127.0.0.1:3016/debug/pprof/``goroutine --output goroutine.pprof

使用 webui 打开文件。

go tool pprof -http=:8080 goroutine.pprof

切到火焰图, 可以看到 main.req.func1「匿名函数」占用了需要 goroutine。我们切到 Souce 定位到代码行数。可以看到, 卡在了往非缓存的 channel 写数据。由于接收端已经 return, 以此 goroutine 等待写入而无法释放。我们只需声明一个带缓冲的 channel 就可以解决问题。

聊一聊 pprof

总结

本文简单介绍了 golang 的性能分析工具。如何进行性能采样, 查看采样的数据并对其进行分析。介绍了 Graph、Flame Graph、Source、Top 的用法。最后, 通过几个简单的例子来进行 cpu、mem、goroutine 的分析。

其实, 在真实的的线上服务中, 如果没有遇到具体的问题「接口变慢、cpu 或 mem 持续升高、goroutine 泄漏、死锁」, 不太建议过度的进行优化。倒不如花精力聚焦在业务思考上。

转载自:https://juejin.cn/post/7236269878227238967