likes
comments
collection
share

听说map删除元素占用竟然不会减少?

作者站长头像
站长
· 阅读数 5

引言

知道了map的底层实现,下面这些问题也就水落石出啦~

  • map底层实现是什么?
  • map什么情况下会扩容,扩容原理是什么?
  • map为什么有时候会报并发读写错误?
  • 未初始化的map读取会发生什么?
  • map遍历顺序为什么是随机的?
  • map删除元素 内存占用会减少吗?

构成

hmap

map的底层实现

  • count len(map) map中已经存放了多少元素
  • flag 标志map所处于的阶段 下文有具体解释
  • B 2^B 代表桶的个数 一个桶中可以保存8对键值对
  • noverflow 溢出桶的数量
  • hash0 hash计算的种子 map的初始化的时候会使用随机值fastrand()作为hash种子
  • buckets 存放桶的数组 桶的实现是bmap
  • oldbuckets 扩容场景下 oldbuckets 存放的是扩容前的桶数组,buckets存放的是扩容完成的桶数组。
  • nevacuate 已经完成迁移操作的桶的下标 因为扩容操作是渐进式的,所以需要保存迁移进度
  • extra 一些额外的信息 mapextra 具体见下文
// A header for a Go map.
type hmap struct {
    // Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
    // Make sure this stays in sync with the compiler's definition.
    count     int // # live cells == size of map.  Must be first (used by len() builtin)
    flags     uint8
    B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
    hash0     uint32 // hash seed

    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

    extra *mapextra // optional fields
}

bmap

map底层中一个桶的实现

其中tophash存放的是 key hash值的高八位,用于快速定位元素

紧随其后的八组键值对 key/key/... elem/elem/elem/...

最后还有一个指针指向一个溢出桶(如果有)

也就是说 map[key]的大概流程如下(省略了部分逻辑):

  • key hash找到对应的桶下标
  • 基于hash的高八位快速判断元素是否存在该桶中
  • 如果桶中不存在,则看其是否有溢出桶,如果有则继续在溢出桶中寻找
  • 继续递归寻找,直至元素找到或者不再有溢出桶了
// A bucket for a Go map.
type bmap struct {
    // tophash generally contains the top byte of the hash value
    // for each key in this bucket. If tophash[0] < minTopHash,
    // tophash[0] is a bucket evacuation state instead.
    tophash [bucketCnt]uint8
    // Followed by bucketCnt keys and then bucketCnt elems.
    // NOTE: packing all the keys together and then all the elems together makes the
    // code a bit more complicated than alternating key/elem/key/elem/... but it allows
    // us to eliminate padding which would be needed for, e.g., map[int64]int8.
    // Followed by an overflow pointer.
}

mapextra

如果一个map中的key 和value都不包含指针 则可以标记这个map不包含指针,这样GC扫描的时候可以不去遍历整个map。但是bmap.overflow是指针(指向一个一个的溢出桶bmap),为了保证这些溢出桶存活,需要有一个地方记录其对应的指针。这就是overflow和oldoverflow的作用

nextOverflow 指向 预先分配的溢出桶的数组,当需要溢出桶的时候可以优先从这里获取,这里的桶用完了才会使用new关键字生成一个

// mapextra holds fields that are not present on all maps.
type mapextra struct {
    // If both key and elem do not contain pointers and are inline, then we mark bucket
    // type as containing no pointers. This avoids scanning such maps.
    // However, bmap.overflow is a pointer. In order to keep overflow buckets
    // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
    // overflow and oldoverflow are only used if key and elem do not contain pointers.
    // overflow contains overflow buckets for hmap.buckets.
    // oldoverflow contains overflow buckets for hmap.oldbuckets.
    // The indirection allows to store a pointer to the slice in hiter.
    overflow    *[]*bmap
    oldoverflow *[]*bmap

    // nextOverflow holds a pointer to a free overflow bucket.
    nextOverflow *bmap
}

关键代码解释

overLoadFactor

overLoadFactor 用于判断count/ 2^B 是否大于负载因子 6.5

map的实现中一个桶内可以存放8组 kv键值对,为什么负载因子设置为6.5呢?

该值设置的太大,会导致产生更多的溢出桶 ==> 特定key的时候需要检查更多的桶

设置的较小,虽然溢出桶数量减少了,但是会带来了较大的空间浪费

权衡之下,6.5是一个不错的选择

// Picking loadFactor: too large and we have lots of overflow
// buckets, too small and we waste a lot of space. I wrote
// a simple program to check some stats for different loads:
// (64-bit, 8 byte keys and elems)
//  loadFactor    %overflow  bytes/entry     hitprobe    missprobe
//        4.00         2.13        20.77         3.00         4.00
//        4.50         4.05        17.30         3.25         4.50
//        5.00         6.85        14.77         3.50         5.00
//        5.50        10.55        12.94         3.75         5.50
//        6.00        15.27        11.67         4.00         6.00
//        6.50        20.90        10.79         4.25         6.50
//        7.00        27.14        10.15         4.50         7.00
//        7.50        34.03         9.73         4.75         7.50
//        8.00        41.10         9.40         5.00         8.00
//
// %overflow   = percentage of buckets which have an overflow bucket
// bytes/entry = overhead bytes used per key/elem pair
// hitprobe    = # of entries to check when looking up a present key
// missprobe   = # of entries to check when looking up an absent key
 // overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.  func  overLoadFactor (count int , B uint8 )  bool {  return count > bucketCnt && uintptr (count) > loadFactorNum*(bucketShift(B)/loadFactorDen) }

// bucketShift returns 1<<b, optimized for code generation.
func bucketShift(b uint8) uintptr {
    // Masking the shift amount allows overflow checks to be elided.
    // 1 << (b&(0xFF))
    return uintptr(1) << (b & (goarch.PtrSize*8 - 1))
}

const (
    // Maximum number of key/elem pairs a bucket can hold.
    bucketCntBits = 3
    bucketCnt     = 1 << bucketCntBits. // 8

    // Maximum average load of a bucket that triggers growth is 6.5.
    // Represent as loadFactorNum/loadFactorDen, to allow integer math.
    loadFactorNum = 13
    loadFactorDen = 2
)

tooManyOverflowBuckets

判断溢出桶是否过多

一般来讲溢出桶内存使用是稀疏的(因为如果不是稀疏,早就会触发overLoadFactor扩容),所以过多的溢出桶会导致有很多内存得不到使用。所以过多的溢出桶应该触发等容量迁移,将map中的元素重新整理

// tooManyOverflowBuckets reports whether noverflow buckets is too many for a map with 1<<B buckets.
// Note that most of these overflow buckets must be in sparse use;
// if use was dense, then we'd have already triggered regular map growth.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
    // If the threshold is too low, we do extraneous work.
    // If the threshold is too high, maps that grow and shrink can hold on to lots of unused memory.
    // "too many" means (approximately) as many overflow buckets as regular buckets.
    // See incrnoverflow for more details.
    if B > 15 {
        B = 15
    }
    // The compiler doesn't see here that B < 16; mask B to generate shorter shift code.
    return noverflow >= uint16(1)<<(B&15)
}

tophash

截取hash值的高八位

如果计算出来的结果小于5 则人为地加上5。因为源码实现保留了一些值作为特定的标志位

  • emptyRest = 0 该单元为空(特指一个桶中的一个单元),并且该桶的更高索引上也没有元素,该桶的后面也没有溢出桶
  • emptyOne = 1 该单元为空
  • evacuatedX = 2 该单元有数据,且该数据已迁移到更大表的前半部分(扩容场景)
  • evacuatedY = 3 与上述相同,但迁移到更大表的后半部分(扩容场景)
  • evacuatedEmpty = 4 该单元为空,且该桶已经完成迁移
  • minTopHash = 5 正常计算出来的最小 tophash
// tophash calculates the tophash value for hash.
func tophash(hash uintptr) uint8 {
    top := uint8(hash >> (goarch.PtrSize*8 - 8))
    if top < minTopHash {
        top += minTopHash
    }
    return top
}

// Possible tophash values. We reserve a few possibilities for special marks.
// Each bucket (including its overflow buckets, if any) will have either all or none of its
// entries in the evacuated* states (except during the evacuate() method, which only happens
// during map writes and thus no one else can observe the map during that time).
emptyRest      = 0 // this cell is empty, and there are no more non-empty cells at higher indexes or overflows.
emptyOne       = 1 // this cell is empty
evacuatedX     = 2 // key/elem is valid.  Entry has been evacuated to first half of larger table.
evacuatedY     = 3 // same as above, but evacuated to second half of larger table.
evacuatedEmpty = 4 // cell is empty, bucket is evacuated.
minTopHash     = 5 // minimum tophash for a normal filled cell.

isEmpty

通过判断tophash 中的值 是否小于emptyOne 从而判断bmap中对应槽位是否为空

为什么可以这么判断 详见 tophash

// isEmpty reports whether the given tophash array entry represents an empty bucket entry.
func isEmpty(x uint8) bool {
    return x <= emptyOne
}

flags

底层实现中通过flag标志来代表map处于的阶段

  • iterator(1) map正在被迭代(for ... range map)

  • oldIterator(2) 正在oldbuckets迭代(for ... range map),说明扩容阶段也正在进行

  • hashWriting(4) map正在被写入元素,其他并发读请求如果观察到这个flag会报并发读写错误,程序异常退出。因为该map不是并发安全的,存在并发读写说明该map被滥用

  • sameSizeGrow(8) 说明map正在等容量迁移。具体来讲,触发map的扩容有两种情况:

    • 溢出桶过多(map找key需要遍历更多的桶),会触发等容量迁移(容量不变,元素重新分配)
    • count/(1^B) 超出负载因子6.5,说明容量不足,桶的数量会变为原来的两倍
    // flags
    iterator     = 1 // there may be an iterator using buckets
    oldIterator  = 2 // there may be an iterator using oldbuckets
    hashWriting  = 4 // a goroutine is writing to the map
    sameSizeGrow = 8 // the current map growth is to a new map of the same size

evacuated

判断一个桶是否已经完成迁移

具体见 tophash 中的解释

func evacuated(b *bmap) bool {
    h := b.tophash[0]
    return h > emptyOne && h < minTopHash
}

makeBucketArray

给定B, 初始化一个bmap数组

关键点

  • B<4 时(也就是说桶的数量小于16时),不额外预先分配溢出桶
  • B>4时,预分配溢出桶的数量为 1<<(B-4)
  • 正常的bmap数组和预分配的溢出桶,是一块连续的内存地址
// makeBucketArray initializes a backing array for map buckets.
// 1<<b is the minimum number of buckets to allocate.
// dirtyalloc should either be nil or a bucket array previously
// allocated by makeBucketArray with the same t and b parameters.
// If dirtyalloc is nil a new backing array will be alloced and
// otherwise dirtyalloc will be cleared and reused as backing array.
func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {
    base := bucketShift(b)
    nbuckets := base
    // For small b, overflow buckets are unlikely.
    // Avoid the overhead of the calculation.
    if b >= 4 {
        // Add on the estimated number of overflow buckets
        // required to insert the median number of elements
        // used with this value of b.
        nbuckets += bucketShift(b - 4)
        sz := t.bucket.size * nbuckets
        up := roundupsize(sz)
        if up != sz {
            nbuckets = up / t.bucket.size
        }
    }

    if dirtyalloc == nil {
        buckets = newarray(t.bucket, int(nbuckets))
    } else {
        // dirtyalloc was previously generated by
        // the above newarray(t.bucket, int(nbuckets))
        // but may not be empty.
        buckets = dirtyalloc
        size := t.bucket.size * nbuckets
        if t.bucket.ptrdata != 0 {
            memclrHasPointers(buckets, size)
        } else {
            memclrNoHeapPointers(buckets, size)
        }
    }

    if base != nbuckets {
        // We preallocated some overflow buckets.
        // To keep the overhead of tracking these overflow buckets to a minimum,
        // we use the convention that if a preallocated overflow bucket's overflow
        // pointer is nil, then there are more available by bumping the pointer.
        // We need a safe non-nil pointer for the last overflow bucket; just use buckets.
        nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))
        last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))
        last.setoverflow(t, (*bmap)(buckets))
    }
    return buckets, nextOverflow
}

func (b *bmap) setoverflow(t *maptype, ovf *bmap) {
    *(**bmap)(add(unsafe.Pointer(b), uintptr(t.bucketsize)-goarch.PtrSize)) = ovf
}

sameSizeGrow

用于判断 map是不是处于 等容量迁移阶段(溢出桶过多导致的)

// sameSizeGrow reports whether the current growth is to a map of the same size.
func (h *hmap) sameSizeGrow() bool {
    return h.flags&sameSizeGrow != 0
}

growing

用于判断 map是不是处于迁移阶段(等容量迁移/扩容迁移)

// growing reports whether h is growing. The growth may be to the same size or bigger.
func (h *hmap) growing() bool {
    return h.oldbuckets != nil
}

hashGrow

  1. 如果是因为overLoadFactor 触发的扩容 ,则申请的内存为原来两倍
  2. 如果是溢出桶过多触发的迁移,申请的内存与原来一致

该方法只是完成内存的申请与一些变量的赋值,但是并没有真正地触发桶的迁移

  • 如果是因为overLoadFactor触发的扩容,则设置flag |= sameSizeGrow
  • iterator -> oldIterator 如此迭代器便能够感知到map触发了迁移
  • 设置期望的 B
  • 清空统计的溢出桶数量noverflow
  • 设置当前桶迁移进度nevacuate为0
  • ...
func hashGrow(t *maptype, h *hmap) {
    // If we've hit the load factor, get bigger.
    // Otherwise, there are too many overflow buckets,
    // so keep the same number of buckets and "grow" laterally.
    bigger := uint8(1)
    if !overLoadFactor(h.count+1, h.B) {
        bigger = 0
        h.flags |= sameSizeGrow
    }
    oldbuckets := h.buckets
    newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

    flags := h.flags &^ (iterator | oldIterator)
    if h.flags&iterator != 0 {
        flags |= oldIterator
    }
    // commit the grow (atomic wrt gc)
    h.B += bigger
    h.flags = flags
    h.oldbuckets = oldbuckets
    h.buckets = newbuckets
    h.nevacuate = 0
    h.noverflow = 0

    if h.extra != nil && h.extra.overflow != nil {
        // Promote current overflow buckets to the old generation.
        if h.extra.oldoverflow != nil {
            throw("oldoverflow is not nil")
        }
        h.extra.oldoverflow = h.extra.overflow
        h.extra.overflow = nil
    }
    if nextOverflow != nil {
        if h.extra == nil {
            h.extra = new(mapextra)
        }
        h.extra.nextOverflow = nextOverflow
    }

    // the actual copying of the hash table data is done incrementally
    // by growWork() and evacuate().
}

growWork

真正开始执行桶的迁移

先迁移传入的bucket下标的桶

然后在尝试迁移下一个桶,推动整个map的迁移

func growWork(t *maptype, h *hmap, bucket uintptr) {
    // make sure we evacuate the oldbucket corresponding
    // to the bucket we're about to use
    evacuate(t, h, bucket&h.oldbucketmask())

    // evacuate one more oldbucket to make progress on growing
    if h.growing() {
        evacuate(t, h, h.nevacuate)
    }
}

迁移的逻辑位于 evacuate

  • 如果是等量迁移,则直接计算对应的hash值,往对应的桶中放置元素
  • 如果是扩容迁移,计算完hash值后,决定其最终放置在哪个桶

比如原来容量为4 hash值 0b111 和0b011 都在0b11号桶(hash & 0b11)

扩容后容量为8 hash值 0b111 和0b011 分别存放在 0b111号桶和0b11号桶(hash & 0b111)

  • 当该桶以及其溢出桶都迁移完毕,Unlink the overflow buckets & clear key/elem to help GC.(这里只是清空了 桶中的键值对以及指向溢出桶的指针,使得这些变量可以被GC回收,但是桶本身占据的内存空间并没有被回收)
  • oldbuckets中的所有桶都完成迁移后,才会设置 h.oldbuckets = nil。这时h.oldbuckets指向的内存空间才可以真正被释放
func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
    b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
    newbit := h.noldbuckets()
    if !evacuated(b) {
        // TODO: reuse overflow buckets instead of using new ones, if there
        // is no iterator using the old buckets.  (If !oldIterator.)

        // xy contains the x and y (low and high) evacuation destinations.
        var xy [2]evacDst
        x := &xy[0]
        x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
        x.k = add(unsafe.Pointer(x.b), dataOffset)
        x.e = add(x.k, bucketCnt*uintptr(t.keysize))

        if !h.sameSizeGrow() {
            // Only calculate y pointers if we're growing bigger.
            // Otherwise GC can see bad pointers.
            y := &xy[1]
            y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
            y.k = add(unsafe.Pointer(y.b), dataOffset)
            y.e = add(y.k, bucketCnt*uintptr(t.keysize))
        }

        for ; b != nil; b = b.overflow(t) {
            k := add(unsafe.Pointer(b), dataOffset)
            e := add(k, bucketCnt*uintptr(t.keysize))
            for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {
                top := b.tophash[i]
                if isEmpty(top) {
                    b.tophash[i] = evacuatedEmpty
                    continue
                }
                if top < minTopHash {
                    throw("bad map state")
                }
                k2 := k
                if t.indirectkey() {
                    k2 = *((*unsafe.Pointer)(k2))
                }
                var useY uint8
                if !h.sameSizeGrow() {
                    // Compute hash to make our evacuation decision (whether we need
                    // to send this key/elem to bucket x or bucket y).
                    hash := t.hasher(k2, uintptr(h.hash0))
                    if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) {
                        // If key != key (NaNs), then the hash could be (and probably
                        // will be) entirely different from the old hash. Moreover,
                        // it isn't reproducible. Reproducibility is required in the
                        // presence of iterators, as our evacuation decision must
                        // match whatever decision the iterator made.
                        // Fortunately, we have the freedom to send these keys either
                        // way. Also, tophash is meaningless for these kinds of keys.
                        // We let the low bit of tophash drive the evacuation decision.
                        // We recompute a new random tophash for the next level so
                        // these keys will get evenly distributed across all buckets
                        // after multiple grows.
                        useY = top & 1
                        top = tophash(hash)
                    } else {
                        if hash&newbit != 0 {
                            useY = 1
                        }
                    }
                }

                if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
                    throw("bad evacuatedN")
                }

                b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
                dst := &xy[useY]                 // evacuation destination

                if dst.i == bucketCnt {
                    dst.b = h.newoverflow(t, dst.b)
                    dst.i = 0
                    dst.k = add(unsafe.Pointer(dst.b), dataOffset)
                    dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
                }
                dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
                if t.indirectkey() {
                    *(*unsafe.Pointer)(dst.k) = k2 // copy pointer
                } else {
                    typedmemmove(t.key, dst.k, k) // copy elem
                }
                if t.indirectelem() {
                    *(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
                } else {
                    typedmemmove(t.elem, dst.e, e)
                }
                dst.i++
                // These updates might push these pointers past the end of the
                // key or elem arrays.  That's ok, as we have the overflow pointer
                // at the end of the bucket to protect against pointing past the
                // end of the bucket.
                dst.k = add(dst.k, uintptr(t.keysize))
                dst.e = add(dst.e, uintptr(t.elemsize))
            }
        }
        // Unlink the overflow buckets & clear key/elem to help GC.
        if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
            b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
            // Preserve b.tophash because the evacuation
            // state is maintained there.
            ptr := add(b, dataOffset)
            n := uintptr(t.bucketsize) - dataOffset
            // memclrHasPointers clears n bytes of typed memory starting at ptr.
            memclrHasPointers(ptr, n)
        }
    }

    if oldbucket == h.nevacuate {
        // 当 oldbuckets中的所有桶都完成迁移后,才会设置 h.oldbuckets = nil
        // 这时h.oldbuckets指向的内存空间才可以真正被释放
        advanceEvacuationMark(h, t, newbit)
    }
}

func advanceEvacuationMark(h *hmap, t *maptype, newbit uintptr) {
    h.nevacuate++
    // Experiments suggest that 1024 is overkill by at least an order of magnitude.
    // Put it in there as a safeguard anyway, to ensure O(1) behavior.
    stop := h.nevacuate + 1024
    if stop > newbit {
        stop = newbit
    }
    for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) {
        h.nevacuate++
    }
    if h.nevacuate == newbit { // newbit == # of oldbuckets
        // Growing is all done. Free old main bucket array.
        h.oldbuckets = nil
        // Can discard old overflow buckets as well.
        // If they are still referenced by an iterator,
        // then the iterator holds a pointers to the slice.
        if h.extra != nil {
            h.extra.oldoverflow = nil
        }
        h.flags &^= sameSizeGrow  // 清除 扩容标识
    } 
}

newoverflow

初始化一个可用的bmap溢出桶

优先尝试从预先分配的溢出桶数组中拿(参看makeBucketArray)

如果上述用尽,才调用newobject 初始化一个

func (h *hmap) newoverflow(t *maptype, b *bmap) *bmap {
    var ovf *bmap
    if h.extra != nil && h.extra.nextOverflow != nil {
        // We have preallocated overflow buckets available.
        // See makeBucketArray for more details.
        ovf = h.extra.nextOverflow
        if ovf.overflow(t) == nil {
            // We're not at the end of the preallocated overflow buckets. Bump the pointer.
            h.extra.nextOverflow = (*bmap)(add(unsafe.Pointer(ovf), uintptr(t.bucketsize)))
        } else {
            // This is the last preallocated overflow bucket.
            // Reset the overflow pointer on this bucket,
            // which was set to a non-nil sentinel value.
            ovf.setoverflow(t, nil)
            h.extra.nextOverflow = nil
        }
    } else {
        ovf = (*bmap)(newobject(t.bucket))
    }
    h.incrnoverflow()
    if t.bucket.ptrdata == 0 {
        h.createOverflow()
        *h.extra.overflow = append(*h.extra.overflow, ovf)
    }
    b.setoverflow(t, ovf)
    return ovf
}

核心实现

创建map

make(map[k]v, hint)

make(map[k]v)

主要流程

  • 根据传入的hint计算出合适的B(参看overLoadFactor)
  • 使用随机数初始化一个hash种子,用于后续hash计算
  • 如果计算出来的B==0,则先不初始化对应的bmap数组。如果B>0则申请一块连续的内存空间(参看makeBucketArray),初始化buckets(用于存放bmap数组)和extra.nextOverflow(预先分配的溢出桶)
// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
    mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
    if overflow || mem > maxAlloc {
        hint = 0
    }

    // initialize Hmap
    if h == nil {
        h = new(hmap)
    }
    h.hash0 = fastrand()

    // Find the size parameter B which will hold the requested # of elements.
    // For hint < 0 overLoadFactor returns false since hint < bucketCnt.
    B := uint8(0)
    for overLoadFactor(hint, B) {
        B++
    }
    h.B = B

    // allocate initial hash table
    // if B == 0, the buckets field is allocated lazily later (in mapassign)
    // If hint is large zeroing this memory could take a while.
    if h.B != 0 {
        var nextOverflow *bmap
        h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
        if nextOverflow != nil {
            h.extra = new(mapextra)
            h.extra.nextOverflow = nextOverflow
        }
    }

    return h
}

从map中Get元素

val, ok := h[key] 对应的源码实现为 mapaccess2_fat

val := h[key] 源码实现为mapaccess1_fat,原理类似不再赘述

  • 如果map为空或者map内元素数量为0,直接返回元素的默认值
  • 如果发现h.flags&hashWriting != 0 说明发生了并发读写 直接panic
  • 计算hash,找到对应的桶
  • 如果发现map正在扩容,且对应的桶没有迁移完毕,则从旧桶寻找,如果迁移完毕则从新桶寻找
  • 寻找路径为:先看第一个桶中元素是否存在,再看溢出桶是否存在,再看溢出桶的溢出桶...
func mapaccess2_fat(t *maptype, h *hmap, key, zero unsafe.Pointer) (unsafe.Pointer, bool) {
    e := mapaccess1(t, h, key)
    if e == unsafe.Pointer(&zeroVal[0]) {
        return zero, false
    }
    return e, true
}

// mapaccess1 returns a pointer to h[key].  Never returns nil, instead
// it will return a reference to the zero object for the elem type if
// the key is not in the map.
// NOTE: The returned pointer may keep the whole map live, so don't
// hold onto it for very long.
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
   // 省略一些非核心代码
    if h == nil || h.count == 0 {
        if t.hashMightPanic() {
            t.hasher(key, 0) // see issue 23734
        }
        return unsafe.Pointer(&zeroVal[0])
    }
    if h.flags&hashWriting != 0 {
        fatal("concurrent map read and map write")
    }
    hash := t.hasher(key, uintptr(h.hash0))
    m := bucketMask(h.B)
    b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
    if c := h.oldbuckets; c != nil {
        if !h.sameSizeGrow() {
            // There used to be half as many buckets; mask down one more power of two.
            m >>= 1
        }
        oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
        if !evacuated(oldb) {
            b = oldb
        }
    }
    top := tophash(hash)
bucketloop:
    for ; b != nil; b = b.overflow(t) {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                if b.tophash[i] == emptyRest {
                    break bucketloop
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey() {
                k = *((*unsafe.Pointer)(k))
            }
            if t.key.equal(key, k) {
                e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                if t.indirectelem() {
                    e = *((*unsafe.Pointer)(e))
                }
                return e
            }
        }
    }
    return unsafe.Pointer(&zeroVal[0])
}

向map中Set元素

h[key] = val 对应的源码实现为 mapassign

  • 如果map未初始化则直接panic
  • 如果发现h.flags&hashWriting != 0 说明发生了并发写 直接panic
  • 如果发现对应的桶未初始化,则先初始化之(对应于初始化map的时候未指定长度或者指定长度为0,在此处延迟创建)
  • 计算hash,找到对应的桶,如果发现map正在迁移,则调用growWork触发当前桶和隔壁桶的迁移工作
  • 如果发现map需要迁移(溢出桶过多tooManyOverflowBuckets或者overLoadFactor),会先触发map的迁移(具体参看hashGrow)
  • 遍历桶以及溢出桶,找到可插入的位置(已存在或者找到一个空闲可用的槽),如果没找到说明槽位都满了,初始化一个新的bmap(具体见newoverflow)使用之
// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    if h == nil {
        panic(plainError("assignment to entry in nil map"))
    }
    // 省略一些非核心代码
    if h.flags&hashWriting != 0 {
        fatal("concurrent map writes")
    }
    hash := t.hasher(key, uintptr(h.hash0))

    // Set hashWriting after calling t.hasher, since t.hasher may panic,
    // in which case we have not actually done a write.
    h.flags ^= hashWriting

    if h.buckets == nil {
        h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
    }

again:
    bucket := hash & bucketMask(h.B)
    if h.growing() {
        growWork(t, h, bucket)
    }
    b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
    top := tophash(hash)

    var inserti *uint8
    var insertk unsafe.Pointer
    var elem unsafe.Pointer
bucketloop:
    for {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                if isEmpty(b.tophash[i]) && inserti == nil {
                    inserti = &b.tophash[i]
                    insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
                    elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                }
                if b.tophash[i] == emptyRest {
                    break bucketloop
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey() {
                k = *((*unsafe.Pointer)(k))
            }
            if !t.key.equal(key, k) {
                continue
            }
            // already have a mapping for key. Update it.
            if t.needkeyupdate() {
                typedmemmove(t.key, k, key)
            }
            elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
            goto done
        }
        ovf := b.overflow(t)
        if ovf == nil {
            break
        }
        b = ovf
    }

    // Did not find mapping for key. Allocate new cell & add entry.

    // If we hit the max load factor or we have too many overflow buckets,
    // and we're not already in the middle of growing, start growing.
    if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
        hashGrow(t, h)
        goto again // Growing the table invalidates everything, so try again
    }

    if inserti == nil {
        // The current bucket and all the overflow buckets connected to it are full, allocate a new one.
        newb := h.newoverflow(t, b)
        inserti = &newb.tophash[0]
        insertk = add(unsafe.Pointer(newb), dataOffset)
        elem = add(insertk, bucketCnt*uintptr(t.keysize))
    }

    // store new key/elem at insert position
    if t.indirectkey() {
        kmem := newobject(t.key)
        *(*unsafe.Pointer)(insertk) = kmem
        insertk = kmem
    }
    if t.indirectelem() {
        vmem := newobject(t.elem)
        *(*unsafe.Pointer)(elem) = vmem
    }
    typedmemmove(t.key, insertk, key)
    *inserti = top
    h.count++

done:
    if h.flags&hashWriting == 0 {
        fatal("concurrent map writes")
    }
    h.flags &^= hashWriting
    if t.indirectelem() {
        elem = *((*unsafe.Pointer)(elem))
    }
    return elem
}

map迭代

具体细节暂按下不表

  • 每次迭代 返回元素的顺序并不是固定的
// mapiterinit initializes the hiter struct used for ranging over maps.
// The hiter struct pointed to by 'it' is allocated on the stack
// by the compilers order pass or on the heap by reflect_mapiterinit.
// Both need to have zeroed hiter since the struct contains pointers.
func mapiterinit(t *maptype, h *hmap, it *hiter) {
    // ...
    it.t = t
    if h == nil || h.count == 0 {
        return
    }

    // grab snapshot of bucket state
    it.B = h.B
    it.buckets = h.buckets
    if t.bucket.ptrdata == 0 {
        // Allocate the current slice and remember pointers to both current and old.
        // This preserves all relevant overflow buckets alive even if
        // the table grows and/or overflow buckets are added to the table
        // while we are iterating.
        h.createOverflow()
        it.overflow = h.extra.overflow
        it.oldoverflow = h.extra.oldoverflow
    }

    // decide where to start
    var r uintptr
    if h.B > 31-bucketCntBits {
        r = uintptr(fastrand64())
    } else {
        r = uintptr(fastrand())
    }
    it.startBucket = r & bucketMask(h.B)
    it.offset = uint8(r >> h.B & (bucketCnt - 1))

    // iterator state
    it.bucket = it.startBucket

    // Remember we have an iterator.
    // Can run concurrently with another mapiterinit().
    if old := h.flags; old&(iterator|oldIterator) != iterator|oldIterator {
        atomic.Or8(&h.flags, iterator|oldIterator)
    }

    mapiternext(it)
}

map删除

具体细节暂按下不表

  • map删除元素并不会导致map占用的内存减小
// Only clear key if there are pointers in it.
if t.indirectkey() {
    *(*unsafe.Pointer)(k) = nil
} else if t.key.ptrdata != 0 {
    memclrHasPointers(k, t.key.size)
}
e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
if t.indirectelem() {
    *(*unsafe.Pointer)(e) = nil
} else if t.elem.ptrdata != 0 {
    memclrHasPointers(e, t.elem.size)
} else {
    memclrNoHeapPointers(e, t.elem.size)
}
转载自:https://juejin.cn/post/7349124917378727948
评论
请登录