go 高并发下的数据结构

站长

2024年07月18日 16:06 · 阅读数 68

什么变量的大小是 0 字节

查看一个变量的字节大小

fmt.Println(unsafe.Sizeof(int(0))) // 8

int 类型的变量大小是 8 字节，int 类型的变量大小是不固定的，会因为不同的操作系统而改变
int32 类型的变量大小是 4 字节
int64 类型的变量大小是 8 字节
指针的字节长度是 8 字节，也会根据操作系统的不同而改变

空结构体的大小是 0 字节

type Person struct {}
func main() {
  p := Person{}
  fmt.Println(unsafe.Sizeof(p)) // 0
}

0 字节的变量在内存中的地址是相同的，称为 zerobase，这个变量在 runtime/malloc.go 文件中

fmt.Printf("%p\n", &Person{}) // 0x1164340 -> 是 zerobase

不过要注意的是，空结构体如果在其他结构体(这个结构体中的成员要有非空结构体)中，那么这个结构体的地址就不是 zerobase 了

type Person1 struct {
  p1 Person
}
p1 := Person1{}
fmt.Printf("%p\n", &p1.p1) // 0x1164340 -> 是 zerobase

type Person1 struct {
  p1 Person
  name string
}
p1 := Person1{}
fmt.Printf("%p\n", &p1.p1) // 0xc000014070 -> 不是 zerobase

空结构体的主要作用就是为了节省内存，使用场景：hashset、channel

字符串的底层

字符串在 go 中占用的是 16 字节，不管字符串的长度是多少，都是 16 字节

fmt.Println(unsafe.Sizeof("a")) // 16
fmt.Println(unsafe.Sizeof("高并发下的数据结构")) // 16

字符串的底层是一个结构体 stringStruct，结构体中有两个字段，一个是指向字符串的指针，一个是字符串的长度

这个结构体在 runtime/string.go 文件中

type stringStruct struct {
  str unsafe.Pointer
  len int
}

但是这个结构体不能被外部直接访问

在反射包里面有一个 StringHeader 结构体，和 stringStruct 结构体是类似

这个结构体在 reflect/value.go 文件中

type StringHeader struct {
  Data uintptr
  Len  int
}

StringHeader 中的 Len 字段是字符串的字节长度，这个长度和 len 方法返回的长度是一样的

s := "高并发"
fmt.Println((*reflect.StringHeader)(unsafe.Pointer(&s)).Len) // 9
len(s) // 9

字符串遍历应该使用 range 关键字，因为 range 关键字会自动处理 utf-8 编码的字符串

s := "高并发"
for _, char := range s {
  fmt.Println(string(char))
}

如果要切割字符串的话需要将字符串转成 rune 类型的切片

s := "高并发"
runeSlice := []rune(s)
fmt.Println(string(runeSlice[0])) // 高

map 的底层

map 的底层是 hmap，在 runtime.map.go 文件中，结构如下：

// A header for a Go map.
type hmap struct {
  // Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
  // Make sure this stays in sync with the compiler's definition.
  count     int // # live cells == size of map.  Must be first (used by len() builtin)
  flags     uint8
  B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
  noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
  hash0     uint32 // hash seed

  buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
  oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
  nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

  extra *mapextra // optional fields
}

buckets：有这个属性的就是拉链法
B：buckets 数量的 2 的对数
count：当前这个 hamp 中存储的key-value 的数量
hash0：哈希算法的种子
extra：溢出桶的信息

bucket 的结构如下：

// A bucket for a Go map.
type bmap struct {
  // tophash generally contains the top byte of the hash value
  // for each key in this bucket. If tophash[0] < minTopHash,
  // tophash[0] is a bucket evacuation state instead.
  tophash [bucketCnt]uint8
  // Followed by bucketCnt keys and then bucketCnt elems.
  // NOTE: packing all the keys together and then all the elems together makes the
  // code a bit more complicated than alternating key/elem/key/elem/... but it allows
  // us to eliminate padding which would be needed for, e.g., map[int64]int8.
  // Followed by an overflow pointer.
}

tophash：存储的是每一个 key 的前一个字节的哈希值
overflow：溢出指针

func makemap(t *maptype, hint int, h *hmap) *hmap {
  mem, overflow := math.MulUintptr(uintptr(hint), t.Bucket.Size_)
  if overflow || mem > maxAlloc {
    hint = 0
  }

  // initialize Hmap
  if h == nil {
    // new 了一个 hmap 结构体
    h = new(hmap)
  }
  h.hash0 = fastrand()

  // Find the size parameter B which will hold the requested # of elements.
  // For hint < 0 overLoadFactor returns false since hint < bucketCnt.
  B := uint8(0)
  // 算出 B 的值
  for overLoadFactor(hint, B) {
    B++
  }
  h.B = B

  // allocate initial hash table
  // if B == 0, the buckets field is allocated lazily later (in mapassign)
  // If hint is large zeroing this memory could take a while.
  if h.B != 0 {
    var nextOverflow *bmap
    // 创建一些 bucket 和 overflow bucket
    h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
    if nextOverflow != nil {
      h.extra = new(mapextra)
      // overflow bucket 保存在 extra 中
      h.extra.nextOverflow = nextOverflow
    }
  }

  return h
}

map 初始结构：

go 高并发下的数据结构

hash 值通过 hash0 和 key 计算出来
tophash 取 hash 值的高 8 位
放第几个 bucket，通过 B 来判断，B = 3 表示取 hash 值的低 3 位，然后把这 3 位换算成十进制

go 高并发下的数据结构

开放寻址法

计算 a:A 的 hash(根据数组长度取模)，假如结果为 1，就会被放入 1 的 slot 中，如果 1 已经被占用了，那么就往后放，后面也被占用了，就接着往后放，直到有个空的 slot

go 高并发下的数据结构

拉链法

计算 a:A 的 hash(根据数组长度取模)如果结果为 1，就在 1 的下面挂一个 bucket 组成一个链表，在拉链法中，slot 不直接存放数据，同一个 bucket，它的 hash 值一样

go 高并发下的数据结构

map 扩容

map 溢出桶太多时，会导致性能严重下降

go 高并发下的数据结构

go 为了提高性能，会对 map 进行扩容

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
  // If we hit the max load factor or we have too many overflow buckets,
  // and we're not already in the middle of growing, start growing.
  if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
    hashGrow(t, h)
    goto again // Growing the table invalidates everything, so try again
  }
}

map 扩容的时机：

装载因子超过 6.5（平均每个槽 6.5 个 key）
使用了太多溢出桶（溢出桶超过了普通桶）

map 扩容步骤一：

创建一组新桶
oldbuckets 指向原有的桶数组
buckets 指向新的桶数组

map 标记为扩容状态

扩容方法：

func hashGrow(t *maptype, h *hmap) {
  // If we've hit the load factor, get bigger.
  // Otherwise, there are too many overflow buckets,
  // so keep the same number of buckets and "grow" laterally.
  bigger := uint8(1)
  if !overLoadFactor(h.count+1, h.B) {
    bigger = 0
    h.flags |= sameSizeGrow
  }
  // 将 oldbuckets 指向原有的 bucket
  oldbuckets := h.buckets
  // 创建新的 bucket
  newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

  flags := h.flags &^ (iterator | oldIterator)
  if h.flags&iterator != 0 {
    flags |= oldIterator
  }
  // commit the grow (atomic wrt gc)
  // 将 B 更新为扩容后的大小
  h.B += bigger
  // 做标记
  h.flags = flags
  // 将 oldbuckets 指向原有的 bucket
  h.oldbuckets = oldbuckets
  // 将 buckets 指向新的 bucket
  h.buckets = newbuckets
  h.nevacuate = 0
  h.noverflow = 0

  // 如果有溢出桶，将溢出桶放到 extra.oldoverflow 中
  if h.extra != nil && h.extra.overflow != nil {
    // Promote current overflow buckets to the old generation.
    if h.extra.oldoverflow != nil {
      throw("oldoverflow is not nil")
    }
    h.extra.oldoverflow = h.extra.overflow
    h.extra.overflow = nil
  }
  if nextOverflow != nil {
    if h.extra == nil {
      h.extra = new(mapextra)
    }
    h.extra.nextOverflow = nextOverflow
  }

  // the actual copying of the hash table data is done incrementally
  // by growWork() and evacuate().
}

map 扩容步骤二：
1. 将所有的数据从旧的 bucket 中迁移到新的 bucket 中
2. 采用渐进式的方式迁移
3. 每次操作一个旧的 bucket 时，将旧的 bucket 中的数据迁移到新的 bucket 中
4. 读取时不进行迁移，只判断读取新的 bucket 还是旧的 bucket
map 扩容步骤三：
1. 所有旧的 buckets 都迁移完之后
2. oldBuckets 进行回收

map 并发问题

map 是存在并发问题的，go 为了解决这个问题，使用 锁 来解决

但是锁的问题会影响性能，go 为了提高性能，使用了 sync.Map 来解决这个问题

在 sync.Map 中，一套 value 有两套 key：一套是 read map，另一套是 dirty map

结构如下所示：

go 高并发下的数据结构

读写默认走 read map 结构体，大致过程如下

go 高并发下的数据结构

追加一个 key-value 时过程如下

追加时，不知道 key 是否存在，所以会先走 read map
在 read map 中没有找到 d 这个 key，说明要追加这个 key-value
回到 sync.Map，给它加一个锁 mu，这个锁是用来锁 dirty map，上锁之后就可以安全的操作 dirty map，同时只有一个协程能操作 dirty map
在 dirty map 中追加 d 这个 key-value
同时把 read map 中的 amended 设置为 true，表示 read map 中的数据已经过时了，最新的数据在 dirty map 中

go 高并发下的数据结构

追加后的读写操作过程如下

读取 d 时，先走 read map，在 read map 没有发现 d 可以 key，然后回过来发现 read map 中 amended 为 true，说明 dirty map 中的 d 值应该在 dirty map 中
进入 dirty map 把 d 对应的值读取出来
将 sync.Map 中的 misses 加 1，表示有一个 key 在 read map 中没有找到，需要到 dirty map 中找

go 高并发下的数据结构

dirty 提升

随着业务不断的进行 misses 的值会不断的增加，当 misses 的值达到一定的阈值时，dirty 会提升，这个阈值是 len(dirty map)

go 高并发下的数据结构

删除一个 key-value 两情况：

正常删除
- 正常删除的意思是 dirty map 和 read map 都有这个 key，把 Pointer 置为 nil，等待 GC 回收
追加后删除（提升后被删 key，需要特殊处理）
- d 是刚刚追加的，现在还没有提升上去就要被删除了
- 在 dirty map 找到 d，将 Pointer 设置为
- dirty map 需要提升为 read map，
- dirty map 重建时，被删除的 d 不会被迁移到 dirty map，并将 read map 中的 d 指向标记为 expunged

在 sync.Map 中不会引发扩容问题用 read map，会引发扩容问题用 dirty map

sync.Map 在追加不多的时候性能好

接口隐式还是显示好

下面这段代码 c 是什么类型

type Car interface {
  Driver()
}
type Truck struct {
  Model string
}
func (t Truck) Driver() {
  fmt.Println(t.Model)
}

var c Car = Truck{}

c 变量显示指明了 Car 类型，所以 c 的类型当然是 Car 类型

那就有个问题，我实例化的是 Truck 结构体，为什么 c 上面没有 Model 属性了呢？

我们先来看看 Car 接口底层是啥

Car 接口的底层是 iface，在 runtime/runtime2.go 文件中

type iface struct {
  tab  *itab
  data unsafe.Pointer
}

iface 结构体中有两个字段，一个是 tab，一个是 data，data 是一个指针，指向 Truck 结构体的地址

tab 是一个指向 itab 结构体的指针，结构如下：

// layout of Itab known to compilers
// allocated in non-garbage-collected memory
// Needs to be in sync with
// ../cmd/compile/internal/reflectdata/reflect.go:/^func.WriteTabs.
type itab struct {
  inter *interfacetype
  _type *_type
  hash  uint32 // copy of _type.hash. Used for type switches.
  _     [4]byte
  fun   [1]uintptr // variable sized. fun[0]==0 means _type does not implement inter.
}

inter：接口的类型
_type：接口装载的值具体是啥类型
fun：表示这个类型实现了哪些方法，默认是一个，在编译时，会根据实现的方法来变长

类型断言

类型断言就是这个应用，类型断言的时候，会根据 tab 中的 fun 来判断这个结构是否实现了这个接口，如果实现了就可以断言成功

var c Car = Truck{}
t := c.(Truck)

结构体和指针实现接口

Truck 值类型实现 Car 接口，go 会帮我们自动实现 Truck 指针类型实现 Car 接口

type Car interface {
  Driver()
}
type Truck struct {
  Model string
}
func (t Truck) Driver() {
  fmt.Println(t.Model)
}
// go 会隐式帮我们实现 Truck 指针类型实现 Car 接口
// func (t *Truck) Driver() {
//   fmt.Println(t.Model)
// }

空接口

go 给空接口一个全新的类型 eface，在 runtime/runtime2.go 文件中

eface 结构体中有两个字段：

_type：是一个指针，和 iface.tab._type 一样，其他信息，对于 eface 来说是没有用的
data：是一个指针，和 iface 的 data 一样

type eface struct {
  _type *_type
  data  unsafe.Pointer
}

nil，空结构体，空接口的区别

在 builtin/builtin.go 文件中，我们会看到下面这个代码：

// nil is a predeclared identifier representing the zero value for a
// pointer, channel, func, interface, map, or slice type.
var nil Type // Type must be a pointer, channel, func, interface, map, or slice type

由上面代码可知 nil 是有类型，它是 pointer、channel、func、interface、map、slice 这六种类型的零值

要注意两点：

nil 不是结构体的零值

var c struct{}
fmt.Println(c == nil) // 编译直接报错

不同类型的 nil 不能比较

var a *int
var b map[string]string
fmt.Println(a == b) // 编译直接报错

空接口不一定是 nil 接口，两个属性都是 nil 接口才是 nil 接口，因为空接口是 eface 类型，它的 type 和 data 都是空，但是下面的代码将 *int 赋值给 a，导致 a 的type 不是空了，而是*int 类型

var a interface{}
var b *int
a = b
fmt.Println(a == nil)  // fasle

内存对齐

为了方便内存对齐，go 提供的对齐系数
```
unsafe.Alignof()
```
对齐系数的含义是：变量的内存地址必须被对齐系数整除
如果对齐系数为 4，表示变量内存地址必须是 4 的倍数

通过下面代码我们可以看到，基础数据类型的对齐系数，和它自己本身的长度是一样的

fmt.Printf("bool size: %d align: %d\n", unsafe.Sizeof(bool(true)), unsafe.Alignof(bool(true))) // bool size: 1 align: 1
fmt.Printf("byte size: %d align: %d\n", unsafe.Sizeof(byte(0)), unsafe.Alignof(byte(0))) // byte size: 1 align: 1
fmt.Printf("int8 size: %d align: %d\n", unsafe.Sizeof(int8(0)), unsafe.Alignof(int8(0))) // int8 size: 1 align: 1
fmt.Printf("int16 size: %d align: %d\n", unsafe.Sizeof(int16(0)), unsafe.Alignof(int16(0))) // int16 size: 2 align: 2
fmt.Printf("int32 size: %d align: %d\n", unsafe.Sizeof(int32(0)), unsafe.Alignof(int32(0))) // int32 size: 4 align: 4
fmt.Printf("int64 size: %d align: %d\n", unsafe.Sizeof(int64(0)), unsafe.Alignof(int64(0))) // int64 size: 8 align: 8

string 类型的对齐系数是 8，string 类型的长度是 16

fmt.Printf("string size: %d align: %d\n", unsafe.Sizeof(string("uccs")), unsafe.Alignof(string("uccs"))) // string size: 16 align: 8

结构体对齐

结构体对齐分为内部对齐和结构体之间对齐

内部对齐：考虑成员大小和成员的对齐系数
- 结构体内部成员的相对位置（偏移量）
- 每个成员的偏移量是自身大小与其对齐系数较小值的倍数
结构体填充：考虑自身对齐系数和系统字长
- 结构体增加长度，对齐系统字长
- 结构体长度是最大成员长度与系统字长较小的整数倍

type Demo struct {
  a bool // 大小 1，对齐系数 1
  b string // 大小 16，对齐系数 8
  c int16 // 大小 2，对齐系数 2
}

每个成员的偏移量是自身大小与其对齐系数较小值的倍数的意思是：size 和 align 取最小值，然后取这个值的倍数

从什么基准位置偏移呢？就是结构体自己的地址，结构体自己的地址是内部第一个成员的地址

bool 的 size 和 align 都是 1，所以 a 的偏移量是 1
string 的 size 是 16，align 是 8，所以 b 的偏移量是 8
int16 的 size 是 2，align 是 2，所以 c 的偏移量是 24

所以结构体 Demo 在内存中的分布如下图：

go 高并发下的数据结构

结构体成员在内存中的顺序是按照定义的顺序来的，c 不会被放入 a 后面的空位，所以 a 和 b 之间的空位需要补 0

由于系统字长是 64bit，所以 c 后面需要补 0，补 0 的个数是结构体内部最大长度和系统字长较小那个的整数倍，这是内存对齐

结构体的对齐系数是结构体内部最大对齐系数，所以 Demo 结构体的对齐系数是 8，所以 a 必须分配在 0、8、16 等位置上

为了节省内存空间，可以将 c 放在 a 后面，这样 c 就可以被分配在 a 后面的空位上，最后可以省下 8 字节

type Demo struct {
  a bool // 大小 1，对齐系数 1
  c int16 // 大小 2，对齐系数 2
  b string // 大小 16，对齐系数 8
}

go 高并发下的数据结构

如果一个结构体中有空结构体，内存是怎么分配的呢？

空结构体独立存在时，是不在内存空间，内存地址是 zerobase，但是空结构体如果在其他结构体内部，是会根据其他结构体的对齐系数来分配内存的

type Demo struct {
  a bool // 大小 1，对齐系数 1
  z struct{}
  c int16 // 大小 2，对齐系数 2
  b string // 大小 16，对齐系数 8
}

z 在 a 的后面，它的地址就跟随 a 地址分配地址

go 高并发下的数据结构

如果空结构体出现在其他结构体的最后，就需要补齐字长

type Demo struct {
  a bool // 大小 1，对齐系数 1
  c int16 // 大小 2，对齐系数 2
  b string // 大小 16，对齐系数 8
  z struct{}
}

因为 string 是 Demo 这个结构体中最后一个有意义的类型，内存就分配到 0x18，后面可能会有其他的数据，如果这个时候有一个空结构体在最后，就需要补齐字长，避免空结构体的地址和其他数据的地址重叠

go 高并发下的数据结构

type User struct {
  A int32   // size 4 align 4
  B []int32 // size 24 align 8
  C string  // size 16 align 8
  D bool    // size 1 align 1
  E struct{}
}

User 对齐系数是 8，占用 56 个字节
A 占用 4 个字节，补 4 个字节的 0
B 占用 24 个字节
C 占用 16 个字节
D 占用 1 个字节，后面补 7 个字节的 0
E 占用 0 个字节，紧跟在 D 后面

go 高并发下的数据结构

转载自:https://juejin.cn/post/7383983223239196713