浅析ConcurrentHashMap | Wen Er Rd. NO. 48

这个类要从HashMap讲起，HashMap是一个简单的hash表，稍微详细一点的介绍可以在网上查一下资料，或者看我以前写的《java集合总结》。这个集合有一个缺点是线程不安全，也就是说，如果在多线程下同时读写可能会产生意想不到的结果。比如这样：

package thread;
import java.util.HashMap;
public class ConcurrentHashMapDemo {
    private static HashMap<Integer, Integer> map = new HashMap<Integer, Integer>(1);
    public static void main(String[] args) {
        Thread thread1 = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i =0;i < 1000; i++) {
                    map.put(i, i);
                }
                System.out.println(map.get(500));
            }
        });
        Thread thread2 = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i =1000;i < 2000; i++) {
                    map.put(i, i);
                }
                System.out.println(map.get(1500));
            }
        });
        thread1.start();
        thread2.start();
    }
}

他可能就会输出这样的结果：

1
2

null
1500

这是为什么？

导致HashMap线程不安全的根本原因是扩容。扩容就是在put加入元素的个数超过capacity * loadFactor的时候就会将内部Entry数组大小扩大至原来的2倍，然后将数组元素按照新的数组大小重新计算索引，放在新的数组中，同时修改每个节点的链表关系（主要是next和节点在链表中的位置）。在扩容的过程中，每个线程会申请扩容后大小的新数组，移动完成后将新数组复制到Entry数组中。如果多个线程同时进行扩容，由于线程的执行顺序的不确定性，导致多个线程操作节点的next值会发生不确定的变化，造成数据丢失，或者形成循环链表等问题，导致线程不安全。

这时我们急需一个线程安全的hashMap，HashTable刚好满足，如果上述代码中的HashMap改成HashTable，那么就不会再有问题。

public synchronized V put(K key, V value) {
        // Make sure the value is not null
        if (value == null) {
            throw new NullPointerException();
        }
        // Makes sure the key is not already in the hashtable.
        Entry<?,?> tab[] = table;
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % tab.length;
        @SuppressWarnings("unchecked")
        Entry<K,V> entry = (Entry<K,V>)tab[index];
        for(; entry != null ; entry = entry.next) {
            if ((entry.hash == hash) && entry.key.equals(key)) {
                V old = entry.value;
                entry.value = value;
                return old;
            }
        }
        addEntry(hash, key, value, index);
        return null;
    }

整个put都被synchronized包裹了，包括真正的addEntry里面涉及到扩容的代码，确实这样是解决了问题，但是是不是太粗糙了？想想在扩容的时候整段代码被锁住了，只有一个线程能访问，那是不是效率特别低？
幸好聪明地人们还是想出了解决办法：ConcurrentHashMap

final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
            }
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
                V oldVal = null;
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        if (fh >= 0) {
                            binCount = 1;
                            for (Node<K,V> e = f;; ++binCount) {
                                K ek;
                                if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                     (ek != null && key.equals(ek)))) {
                                    oldVal = e.val;
                                    if (!onlyIfAbsent)
                                        e.val = value;
                                    break;
                                }
                                Node<K,V> pred = e;
                                if ((e = e.next) == null) {
                                    pred.next = new Node<K,V>(hash, key,
                                                              value, null);
                                    break;
                                }
                            }
                        }
                        else if (f instanceof TreeBin) {
                            Node<K,V> p;
                            binCount = 2;
                            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                           value)) != null) {
                                oldVal = p.val;
                                if (!onlyIfAbsent)
                                    p.val = value;
                            }
                        }
                    }
                }
                if (binCount != 0) {
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
            }
        }
        addCount(1L, binCount);
        return null;
    }

这一段是ConcurrenthashMap的put代码，其中

1	int hash = spread(key.hashCode());

其实是对key的hashCode再做了一次分割，可以想象成一个二维数组，key的hashCode决定了第一维数组的下标，key的hashCode的spread值决定了二维数组的值，然后后面是针对不同场景的操作，比如空的时候直接插入，非空的时候对这某一个纬度加一个锁，确保影响的范围变小，这样就达到了既线程安全，效率又有一个明显的提升的效果，其中还有很多细节等待挖掘。