用go构建个简单的搜索（五）哥伦布Golomb编解码增强版

站长

2023年06月11日 20:09 · 阅读数 99

用go构建个简单的搜索（一）

用go构建个简单的搜索（二）最简单的索引文件

用go构建个简单的搜索（三）对上篇索引文件优化

用go构建个简单的搜索（四）哥伦布Golomb编解码实现

增强、注意事项、翻车地方

没看上篇的可以看这个公式

用go构建个简单的搜索（五）哥伦布Golomb编解码增强版

增强：把一个连续的数字数组通过编码顺序写进uin32[]数组中，然后根据编码反解析出原有的数组数据
注意一点 对于m的取数（不能随便设置）需要求数组的平均值，不然会翻车的（下午调的头疼发现这个问题） 有懂得可以留言
翻车描述
翻车修正：直接保存余数限定宽度为b

优势(随着数据量增大消攻效果会很大的

test := []uint32{182266, 44916, 144917, 244918, 284919, 244918}
uint8 : 0 to 255
uint16 : 0 to 65535
uint32 : 0 to 4294967295
上面的数组保存到文件如果以字符串保存：
字符串保存:6+5+6+6+6+6=35个字节
数字保存：int32:4*6=24字节
Golomb:位数:117位,字节数:14+1=15个字节

增强

go中用uint32[]连续性的保存编码，以及反解析数据参考下面代码

效果

用go构建个简单的搜索（五）哥伦布Golomb编解码增强版

// 压缩数组
func getGolombEncodeUnion(encode []uint32, num uint32) ([]uint32, uint32) {
	b, t := getBT(num)
	var result []uint32
	lenCount, currentLen := uint32(0), uint32(0)
	for index, en := range encode {
		testNum, len := getGolombEncode(en/num, en%num, b, t)
		if 0 == index {
			result = append(result, testNum)
			currentLen++
		} else {
			//int 64/8=8个字节
			if lenCount+len < currentLen*32 {
				result[currentLen-1] = result[currentLen-1] | testNum<<(lenCount%32)
			} else {
				if lenCount%32 == 0 { //刚好结束
					result = append(result, testNum)
					currentLen++
				} else { //一半一半的情况
					result[currentLen-1] = result[currentLen-1] | testNum<<(lenCount%32)
					result = append(result, testNum>>(32-lenCount%32))
					currentLen++
				}
			}
		}
		lenCount += len
	}
	return result, lenCount
}

// 解析数组
func getGolombDncodeUnion(encode []uint32, countLen uint32, num uint32) []uint32 {
	fmt.Printf("位数:%d,字节数:%d\n", countLen, countLen/8)
	b, t := getBT(num)
	var result []uint32
	currentLen := uint32(0)
	for {
		temp := encode[0]
		res, curLen := getGolombDecodeLen(temp, b, t, num)
		fmt.Printf("偏移：%d\n", curLen)
		currentLen += curLen //总长度

		encode[0] = (encode[0] >> curLen)
		for index := 1; index < len(encode); index++ {
			encode[index-1] = encode[index-1] | encode[index]<<(32-curLen)
			encode[index] = (encode[index] >> curLen)
		}
		if currentLen > countLen {
			break
		}
		result = append(result, res)
	}
	return result
}

转载自:https://juejin.cn/post/7243016090817232955