HashMap源码原理解析

站长

2023年05月21日 19:54 · 阅读数 118

提到HashMap，首先我们要搞明白它这个名字的由来，也就是首先看hash方法到底做了什么

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

首先一点，HashMap是接受null的，得到的hash值是0。这里我们粗略的认为key.hashCode()返回的是一个内存地址，>>> 是无符号右移高位补0 任何数跟0异或都是本身那么key != null的时候做的就是内存地址异或上内存地址右移16位

key.hashCode() ->        1111 1111 1111 1111 1111 0000 1111 0000
key.hashCode() >>>16   ->0000 0000 0000 0000 1111 1111 1111 1111
                                            ↓          
                         1111 1111 1111 1111 0000 1111 0000 1111

为什么要这么做呢？

因为每次计算下标索引的时候都是这么来算的，也就是怎么找自己应该存在数组的第几个的时候：

n = table.length
index = (n-1) & hash;

因为n总是2的倍数，也就是说hashmap的size永远是这样式儿的（1000,10000,100000,1000000）,减去一之后（111,1111,11111,111111）

那~ 1111 1111 1111 1111 0000 1111 0000 1110 0000 0000 0000 0000 0000 0000 0000 1111

得到的就是0001，永远只有最低的几位留下来了。

也就是说Value在HashMap中存储的位置由 Key的哈希值的高16位的后x位和低16位的后x位异或得出(x是hashmap数组长度length-1的二进制位数)。举个例子：当前hashmap有16位，16 - 1 = 15 -> 1111，所以x=4。有一个KV的key.hash=（1111 1111 1111 010 1111 1111 1111 1001），高16位为 (1111 1111 1111 1010) 低16位为(1111 1111 1111 1001) 这两个数的低4位异或 (1010)^(1001)->0011=5，所以这个数应该存在数组的第5位。

关于**(n-1) & hash**，其实是一个取模操作，不理解的看这里length = 2n 且X>0，X % length = X & (length - 1)

然后看它的源码结构

public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {

继承自AbstractMap，实现了Map接口关于这个问题；其实据java集合框架的创始人Josh Bloch描述，这样的写法是一个失误。在java集合框架中，类似这样的写法很多。stack overflow上面也进行过提问，而且找到了真正的答案stackoverflow.com/questions/2…

AbstractMap实现了Map接口，做了一些Map接口的默认实现

public abstract class AbstractMap<K,V> implements Map<K,V> {
public int size() { return entrySet().size();}
public boolean isEmpty() { return size() == 0;}
public V get(Object key) {
    Iterator<Entry<K,V>> i = entrySet().iterator();
    if (key==null) {
        while (i.hasNext()) {
            Entry<K,V> e = i.next();
            if (e.getKey()==null)
                return e.getValue();
        }
    } else {
        while (i.hasNext()) {
            Entry<K,V> e = i.next();
            if (key.equals(e.getKey()))
                return e.getValue();
        }
    }
    return null;
}

public abstract Set<Entry<K,V>> entrySet();

像get()方法，得到entrySet的迭代器，所以需要子类实现的仅仅是实现entrySet方法，就可以实现一个简单的Map，Like Below：

public class MyJavaMap extends AbstractMap<Integer, String> {
    @Override
    public Set<Entry<Integer, String>> entrySet() {
        return null;
    }
}

这里来看HashMap transient Node<K,V>[] table; transient Set<Map.Entry<K,V>> entrySet;

最主要的集合，首先，Node的数组。

支线一开始：这里看Node的源码：得知Node是实现了Map.Entry的HashMap的一个静态内部类。

static class Node<K,V> implements Map.Entry<K,V> {

    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }
    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }
    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

而Entry是Map接口的一个内部接口，Map.Entry表示Map的一个实体（一个KV的键值对），getKey getValue方法获取其中的内容

interface Entry<K, V> {
    K getKey();
    V getValue();
    V setValue(V value);
    boolean equals(Object o);
    int hashCode();

    public static <K extends Comparable<? super K>, V> Comparator<Map.Entry<K, V>> comparingByKey() {
        return (Comparator<Map.Entry<K, V>> & Serializable)
            (c1, c2) -> c1.getKey().compareTo(c2.getKey());
    }
...
}

支线一结束

继续支线开始之前的内容，创建了一个Node的数组table跟一个Map.Entry的Set-entrySet

现在来看最主要的方法get

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
//table不为空
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

发现在查找的时候使用了TreeNode来查找，JDK 1.8之后HashMap使用了红黑树来查找Value

在其他里面介绍红黑树，先往下看put，实际上是在putVal方法实现

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, I;
   //如果table还没有初始化，使用resize初始化
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
   //找不到对应的node，也就是数组index的为空，没有冲突，就newNode放进去
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
   //有冲突，hash冲突
        Node<K,V> e; K k;
       //当前p跟需要插入的一致，则替换
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
       //如果是TreeNode,则使用红黑树的方式去插入数据结构
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
           //如果不是TreeNode，则使用普通方式插入，启动一个循环
            for (int binCount = 0; ; ++binCount) {
               //因为p并不是我们想找的Node，则把p.next赋值给e，如果p.next直接是null，就直接newNode
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st  
                        treeifyBin(tab, hash); 
                    break;
              
               //如果要插入的跟我们链表上的某一个相等（hash相等或者eauqls()）,则有重复的，直接break循环。有一样的就不需要插入了~
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
               //指针下移，继续循环，直到到达链表的尾部
                p = e;
            }
        }
       //如果e插入之前就存在，则返回oldValue，返回插入之前的值
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }

    ++modCount;
    if (++size > threshold)
       //复杂的地方，在resize
        resize();
    afterNodeInsertion(evict);
    return null;
}

终于来到了resize方法，当总的size到达总的size * 扩容因子大小的时候，就需要扩容了，每次扩容都是乘以二，所以HashMap的size总是2的倍数

final Node<K,V>[] resize() {
   //老的数组存起来
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
   //老的扩容极限值 threshold->HashMap的size大于threshold时会执行扩容操作（threshold = capacity*loadFactor）
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
       //到达最大值，无法扩容
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }

       //扩容，乘以二
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
       //为0时初始化
        newCap = DEFAULT_INITIAL_CAPACITY;
       //初始化threshold 默认容量 * 扩容因子
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }

    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
   //扩容极限赋值
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
   //开始赋值操作
    if (oldTab != null) {
       //开启循环遍历老的table，这里用++j的好处是效率比较好，逻辑跟j++没有区别。。
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
           //oldTag 赋值给e 从0到table.size()
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
               //没有next直接赋值给新的table数组
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order  普通方式扩容 扩容表示size由 oldCap->(newCap=oldCap * 2)，新cap变成了旧cap的两倍
                    Node<K,V> loHead = null, loTail = null;    //hi lo表示扩容后是放在[0~oldCap]范围还是[oldCap+1 ~ newCap]范围里，所谓的高低之分
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;  //老的next赋值给next
                        if ((e.hash & oldCap) == 0) {  //得到的值为0，表示要放到[0~oldCap]范围，并把所有==0的串起来先赋head，然后把所有的e扔到next上
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {         //得到的值不为0，也就是oldCap，表示放到[oldCap+1 ~ newCap]范围内
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {  //把loTail.next置null
                        loTail.next = null;
                        newTab[j] = loHead;  //低位loHead放到[0~oldCap]范围
                    }
                    if (hiTail != null) {    //把hiTail.next置null
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;   //高位hiHead放到[oldCap+1 ~ newCap]范围
                    }
                }
            }
        }
    }
    return newTab;   
}

关于(e.hash & oldCap) == 0 & 都是1为1，否则为0，假如oldCap=16（10000）多个hash后5位分别为3（00011） 9（01001）13（01101）17（10001）27（11011）

hash二进制(e.hash & oldCap) 3000110 9010010 13011010 171000116（10000） 271101116（10000）

转换结果发现，结果不是oldCap就是0，所以说resize之后的Node要不然就放在原来的位置，要不然就是放在+oldCap的位置

附个HashMap相关的问题：

HashMap怎样实现相同key存入数据后不被覆盖或者说两个key的hashcode相同但是都能存进去？
为什么生成一个对象要同时实现hashcode()和equals()方法？

hashcode相同，但是equals不同

    public static void main(String[] args) throws IOException {
        HashMap<Key, String> map = new HashMap<>();
        map.put(new Key("a", "a"), "value1");
        map.put(new Key("a", "b"), "value2");
        LogUtil.d(map.size());
        for (Key key : map.keySet()) {
            LogUtil.d(map.get(key));
        }
/*
         * 结果：
         * value1
         * value2
         */
    }

    public static class Key {
        public String key = "";
        public String append = "";

        public Key(String key, String append) {
            this.key = key;
            this.append = append;
        }

        @Override
        public boolean equals(Object o) {
            Key key1 = (Key) o;
            return Objects.equals(key, key1.key) &&
                    Objects.equals(append, key1.append);
        }

        @Override
        public int hashCode() {
            return key != null ? key.hashCode() : 0;
        }
    }

关于红黑树：

www.cnblogs.com/mfrank/p/92…

转载自:https://juejin.cn/post/7213296062085660728