Performance Pitfall | Use Java 8 ConcurrentHashMap's computeIfAbsent with Caution
Preface
Let's start with a piece of code. When working with a Map, you might write something like this:
Map<String, Value> map;
// ...
Value result = map.get(key);
if (null == result) {
    result = this.calculateValue(key);
    map.put(key, result);
}
return result;
Java 8's java.util.Map has a computeIfAbsent method that simplifies the code above:
Map<String, Value> map;
// ...
return map.computeIfAbsent(key, this::calculateValue);
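Besides being concise, this form has another benefit when the map is a java.util.concurrent.ConcurrentHashMap: under concurrent calls it ensures that calculateValue is not invoked repeatedly for the same key, because the operation is atomic.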
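To make the atomicity claim concrete, here is a minimal sketch that is not taken from ShardingSphere: several threads call computeIfAbsent for the same key on a ConcurrentHashMap, and a counter records how often the mapping function runs. The class name ComputeOnceDemo, the counter CALLS, and the body of calculateValue are hypothetical and exist only for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ComputeOnceDemo {

    // Counts how often the mapping function actually runs.
    static final AtomicInteger CALLS = new AtomicInteger();

    // Hypothetical stand-in for calculateValue from the snippet above.
    static String calculateValue(String key) {
        CALLS.incrementAndGet();
        return "value-of-" + key;
    }

    // Starts `threads` workers that all call computeIfAbsent for the same key
    // and returns how many times calculateValue was invoked.
    static int run(int threads) {
        CALLS.set(0);
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    // Wait so that the workers hit the map at about the same time.
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                map.computeIfAbsent("key", ComputeOnceDemo::calculateValue);
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return CALLS.get();
    }

    public static void main(String[] args) {
        System.out.println("calculateValue invocations: " + run(8));
    }
}
```

The ConcurrentHashMap Javadoc states that the mapping function is applied at most once per key, so the reported count should be 1 regardless of the number of threads. A plain HashMap offers no such guarantee.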
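However, while stress-testing Apache ShardingSphere-Proxy a while ago, I ran into a problem: when BenchmarkSQL connected to ShardingSphere Proxy with a large number of Terminals, the execution latency of one very simple INSERT statement increased considerably. With the help of Async Profiler, I found that computeIfAbsent in Java 8's ConcurrentHashMap has a performance pitfall.
Readers unfamiliar with Apache ShardingSphere can refer to github.com/apache/shar….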
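Investigation
The symptom during the stress test was that the higher the BenchmarkSQL concurrency (Terminals), the longer the execution latency of a simple, repeatedly executed INSERT statement in the New Order transaction. Yet the CPU of the machine running ShardingSphere-Proxy was not saturated, so I suspected a bottleneck in the Proxy code and used async-profiler to sample the Proxy JVM under load.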
./profiler.sh -e lock --lock 1ms -d 180 -o jfr -f output.jfr $PID
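For more about async-profiler, see github.com/jvm-profili…; I am also considering writing some related articles later.
Opening the sampled JFR file in IDEA showed more than three million Java Monitor Blocked events!
Following the stack traces led to this ShardingSphere code that uses computeIfAbsent. An excerpt follows: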
// ...
private static final Map<String, SQLExecutionUnitBuilder> TYPE_TO_BUILDER_MAP = new ConcurrentHashMap<>(8, 1);
// ...
public DriverExecutionPrepareEngine(final String type, final int maxConnectionsSizePerQuery, final ExecutorDriverManager<C, ?, ?> executorDriverManager,
                                    final StorageResourceOption option, final Collection<ShardingSphereRule> rules) {
    super(maxConnectionsSizePerQuery, rules);
    this.executorDriverManager = executorDriverManager;
    this.option = option;
    sqlExecutionUnitBuilder = TYPE_TO_BUILDER_MAP.computeIfAbsent(type,
            key -> TypedSPIRegistry.getRegisteredService(SQLExecutionUnitBuilder.class, key, new Properties()));
}
// ...
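The code above runs before every interaction between the Proxy and the database, so every CRUD operation executed through the Proxy passes through it. Moreover, type currently has only two possible values, JDBC.STATEMENT and JDBC.PREPARED_STATEMENT, so under high concurrency a large number of threads call computeIfAbsent with the same key.
My assumption was that when the key already exists, computeIfAbsent has nothing to modify and can simply return the value the way get does. But is that really the case?
Let's look at the implementation of computeIfAbsent (the JDK is Oracle 8u311). The code below is an excerpt with some comments added: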
public V computeIfAbsent(K key, Function<? super K, ? extends V> mappingFunction) {
    if (key == null || mappingFunction == null)
        throw new NullPointerException();
    int h = spread(key.hashCode());
    V val = null;
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        if (tab == null || (n = tab.length) == 0)
            // Initialize the table
            tab = initTable();
        else if ((f = tabAt(tab, i = (n - 1) & h)) == null) {
            // The key is absent and the bin for this hash is still empty
            Node<K,V> r = new ReservationNode<K,V>();
            synchronized (r) {
                // Initialize the bin, insert the key-value pair, etc.
            }
        }
        else if ((fh = f.hash) == MOVED)
            // The map is being resized
            tab = helpTransfer(tab, f);
        else {
            // The bin for this hash already holds a linked list or a red-black tree
            boolean added = false;
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    if (fh >= 0) {
                        // Search the linked list for the key
                    }
                    else if (f instanceof TreeBin) {
                        // Search the red-black tree for the key
                    }
                }
            }
            // Part of the code omitted
        }
    }
    // Part of the code omitted
    return val;
}
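As I read the source, even when the key exists, computeIfAbsent always enters the synchronized block on the bin's head node while looking for the key.
Compared with ConcurrentHashMap's lock-free get, doesn't that hurt performance? Googling the topic turned up the following: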
bugs.openjdk.java.net/browse/JDK-…
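This issue was reported long ago and was addressed in JDK 9. As of this writing, JDK 17 has already been officially released.
Solution
With JDK 8 still widely used, it is worth considering how to avoid the problem above, which led to the following fix: github.com/apache/shar…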
SQLExecutionUnitBuilder result;
if (null == (result = TYPE_TO_BUILDER_MAP.get(type))) {
    result = TYPE_TO_BUILDER_MAP.computeIfAbsent(type, key -> TypedSPIRegistry.getRegisteredService(SQLExecutionUnitBuilder.class, key, new Properties()));
}
return result;
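Each time a value is fetched from the Map, get is used to check first, and computeIfAbsent is called to put the value only when it is absent. Because ConcurrentHashMap's computeIfAbsent guarantees atomicity, there is no need to add synchronized or implement double-checked locking yourself.
Problem solved.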
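If the same pattern is needed in several places, it could be wrapped in a small helper. The following is only a sketch under that assumption: the class MapLookups and the method getOrCompute are hypothetical names and are not part of ShardingSphere or the JDK.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class MapLookups {

    private MapLookups() {
    }

    // Hypothetical helper: try the lock-free get first and fall back to
    // computeIfAbsent only when the key is missing.
    static <K, V> V getOrCompute(final ConcurrentMap<K, V> map, final K key,
                                 final Function<? super K, ? extends V> mappingFunction) {
        V result = map.get(key);
        if (null == result) {
            // computeIfAbsent is atomic, so no extra locking is needed here.
            result = map.computeIfAbsent(key, mappingFunction);
        }
        return result;
    }

    public static void main(final String[] args) {
        ConcurrentMap<String, Integer> lengths = new ConcurrentHashMap<>();
        // First call: the key is missing, so the mapping function runs.
        System.out.println(getOrCompute(lengths, "JDBC.STATEMENT", String::length));
        // Second call: the key exists, so the mapping function is never invoked.
        System.out.println(getOrCompute(lengths, "JDBC.STATEMENT", key -> -1));
    }
}
```

Both calls in main return the same cached value. On Java 8 the second call is served by get alone and never reaches the synchronized block inside computeIfAbsent.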
Appendix: JMH Benchmark
Test environment
Test code
package icu.wwj.jmh.dangling;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Fork(3)
@Warmup(iterations = 3, time = 5)
@Measurement(iterations = 3, time = 5)
@Threads(16)
@State(Scope.Benchmark)
public class ConcurrentHashMapBenchmark {

    private static final String KEY = "key";

    private static final Object VALUE = new Object();

    private final Map<String, Object> concurrentMap = new ConcurrentHashMap<>(1, 1);

    @Setup(Level.Iteration)
    public void setup() {
        concurrentMap.clear();
    }

    @Benchmark
    public Object benchGetBeforeComputeIfAbsent() {
        Object result = concurrentMap.get(KEY);
        if (null == result) {
            result = concurrentMap.computeIfAbsent(KEY, __ -> VALUE);
        }
        return result;
    }

    @Benchmark
    public Object benchComputeIfAbsent() {
        return concurrentMap.computeIfAbsent(KEY, __ -> VALUE);
    }
}
JDK 8 results
# JMH version: 1.33
# VM version: JDK 1.8.0_311, Java HotSpot(TM) 64-Bit Server VM, 25.311-b11
# VM invoker: /usr/local/java/jdk1.8.0_311/jre/bin/java
# VM options: -Dvisualvm.id=172855224679674 -javaagent:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/lib/idea_rt.jar=38763:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchComputeIfAbsent
# Run progress: 0.00% complete, ETA 00:03:00
# Fork: 1 of 3
# Warmup Iteration 1: 11173878.242 ops/s
# Warmup Iteration 2: 8471364.065 ops/s
# Warmup Iteration 3: 8766401.960 ops/s
Iteration 1: 8776260.796 ops/s
Iteration 2: 8632907.974 ops/s
Iteration 3: 8557264.788 ops/s
# Run progress: 16.67% complete, ETA 00:02:33
# Fork: 2 of 3
# Warmup Iteration 1: 7757506.431 ops/s
# Warmup Iteration 2: 8176991.807 ops/s
# Warmup Iteration 3: 8795107.589 ops/s
Iteration 1: 8668883.337 ops/s
Iteration 2: 8866318.073 ops/s
Iteration 3: 8848517.540 ops/s
# Run progress: 33.33% complete, ETA 00:02:02
# Fork: 3 of 3
# Warmup Iteration 1: 8154698.571 ops/s
# Warmup Iteration 2: 8317945.491 ops/s
# Warmup Iteration 3: 8884286.732 ops/s
Iteration 1: 8912555.062 ops/s
Iteration 2: 8894750.001 ops/s
Iteration 3: 8780504.227 ops/s
Result "icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchComputeIfAbsent":
8770884.644 ±(99.9%) 210678.797 ops/s [Average]
(min, avg, max) = (8557264.788, 8770884.644, 8912555.062), stdev = 125371.573
CI (99.9%): [8560205.847, 8981563.442] (assumes normal distribution)
# JMH version: 1.33
# VM version: JDK 1.8.0_311, Java HotSpot(TM) 64-Bit Server VM, 25.311-b11
# VM invoker: /usr/local/java/jdk1.8.0_311/jre/bin/java
# VM options: -Dvisualvm.id=172855224679674 -javaagent:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/lib/idea_rt.jar=38763:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent
# Run progress: 50.00% complete, ETA 00:01:31
# Fork: 1 of 3
# Warmup Iteration 1: 1881091972.510 ops/s
# Warmup Iteration 2: 1843432746.197 ops/s
# Warmup Iteration 3: 2353506882.860 ops/s
Iteration 1: 2389458285.091 ops/s
Iteration 2: 2391001171.657 ops/s
Iteration 3: 2387181602.010 ops/s
# Run progress: 66.67% complete, ETA 00:01:01
# Fork: 2 of 3
# Warmup Iteration 1: 1872514017.315 ops/s
# Warmup Iteration 2: 1855584197.510 ops/s
# Warmup Iteration 3: 2342392977.207 ops/s
Iteration 1: 2378551289.692 ops/s
Iteration 2: 2374081014.168 ops/s
Iteration 3: 2389909613.865 ops/s
# Run progress: 83.33% complete, ETA 00:00:30
# Fork: 3 of 3
# Warmup Iteration 1: 1880210774.729 ops/s
# Warmup Iteration 2: 1804266170.900 ops/s
# Warmup Iteration 3: 2337740394.373 ops/s
Iteration 1: 2363741084.192 ops/s
Iteration 2: 2372565304.724 ops/s
Iteration 3: 2388015878.515 ops/s
Result "icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent":
2381611693.768 ±(99.9%) 16356182.057 ops/s [Average]
(min, avg, max) = (2363741084.192, 2381611693.768, 2391001171.657), stdev = 9733301.586
CI (99.9%): [2365255511.711, 2397967875.825] (assumes normal distribution)
# Run complete. Total time: 00:03:03
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark                                                  Mode  Cnt           Score          Error  Units
ConcurrentHashMapBenchmark.benchComputeIfAbsent           thrpt    9     8770884.644 ±   210678.797  ops/s
ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent  thrpt    9  2381611693.768 ± 16356182.057  ops/s
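As the results show, the two approaches differ by more than two orders of magnitude (roughly 270×): calling computeIfAbsent directly reaches millions of operations per second, while checking with get first reaches billions, and this is with only 16 threads.
As for resources, CPU utilization stayed at around 20% during the benchComputeIfAbsent run, whereas it stayed at 100% during the benchGetBeforeComputeIfAbsent run.
JDK 17 results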
# JMH version: 1.33
# VM version: JDK 17.0.1, Java HotSpot(TM) 64-Bit Server VM, 17.0.1+12-LTS-39
# VM invoker: /usr/local/java/jdk-17.0.1/bin/java
# VM options: -Dvisualvm.id=173221627574053 -javaagent:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/lib/idea_rt.jar=33189:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchComputeIfAbsent
# Run progress: 0.00% complete, ETA 00:03:00
# Fork: 1 of 3
# Warmup Iteration 1: 1544327446.565 ops/s
# Warmup Iteration 2: 1475077923.449 ops/s
# Warmup Iteration 3: 1565544222.606 ops/s
Iteration 1: 1564346089.698 ops/s
Iteration 2: 1560062375.891 ops/s
Iteration 3: 1552569020.412 ops/s
# Run progress: 16.67% complete, ETA 00:02:33
# Fork: 2 of 3
# Warmup Iteration 1: 1617143507.004 ops/s
# Warmup Iteration 2: 1433136907.916 ops/s
# Warmup Iteration 3: 1527623176.866 ops/s
Iteration 1: 1522331660.180 ops/s
Iteration 2: 1524798683.186 ops/s
Iteration 3: 1522686827.744 ops/s
# Run progress: 33.33% complete, ETA 00:02:02
# Fork: 3 of 3
# Warmup Iteration 1: 1671732222.173 ops/s
# Warmup Iteration 2: 1462966231.429 ops/s
# Warmup Iteration 3: 1553792663.545 ops/s
Iteration 1: 1549840468.944 ops/s
Iteration 2: 1549245571.349 ops/s
Iteration 3: 1554801575.735 ops/s
Result "icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchComputeIfAbsent":
1544520252.571 ±(99.9%) 27953594.118 ops/s [Average]
(min, avg, max) = (1522331660.180, 1544520252.571, 1564346089.698), stdev = 16634735.479
CI (99.9%): [1516566658.453, 1572473846.689] (assumes normal distribution)
# JMH version: 1.33
# VM version: JDK 17.0.1, Java HotSpot(TM) 64-Bit Server VM, 17.0.1+12-LTS-39
# VM invoker: /usr/local/java/jdk-17.0.1/bin/java
# VM options: -Dvisualvm.id=173221627574053 -javaagent:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/lib/idea_rt.jar=33189:/home/wuweijie/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/213.5744.223/bin -Dfile.encoding=UTF-8
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent
# Run progress: 50.00% complete, ETA 00:01:31
# Fork: 1 of 3
# Warmup Iteration 1: 1813078468.960 ops/s
# Warmup Iteration 2: 1944438216.902 ops/s
# Warmup Iteration 3: 2232703681.960 ops/s
Iteration 1: 2233727123.664 ops/s
Iteration 2: 2233657163.983 ops/s
Iteration 3: 2229008772.953 ops/s
# Run progress: 66.67% complete, ETA 00:01:01
# Fork: 2 of 3
# Warmup Iteration 1: 1767187585.805 ops/s
# Warmup Iteration 2: 1900420998.518 ops/s
# Warmup Iteration 3: 2175122268.840 ops/s
Iteration 1: 2180409680.029 ops/s
Iteration 2: 2181398523.091 ops/s
Iteration 3: 2176454597.329 ops/s
# Run progress: 83.33% complete, ETA 00:00:30
# Fork: 3 of 3
# Warmup Iteration 1: 1822355551.990 ops/s
# Warmup Iteration 2: 1832618832.110 ops/s
# Warmup Iteration 3: 2225265888.631 ops/s
Iteration 1: 2240765668.888 ops/s
Iteration 2: 2225847700.599 ops/s
Iteration 3: 2232257415.965 ops/s
Result "icu.wwj.jmh.dangling.ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent":
2214836294.056 ±(99.9%) 45190341.578 ops/s [Average]
(min, avg, max) = (2176454597.329, 2214836294.056, 2240765668.888), stdev = 26892047.412
CI (99.9%): [2169645952.478, 2260026635.633] (assumes normal distribution)
# Run complete. Total time: 00:03:03
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark                                                  Mode  Cnt           Score          Error  Units
ConcurrentHashMapBenchmark.benchComputeIfAbsent           thrpt    9  1544520252.571 ± 27953594.118  ops/s
ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent  thrpt    9  2214836294.056 ± 45190341.578  ops/s
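In the JDK 17 results, computeIfAbsent is still somewhat slower than calling get first (about 1.54 billion versus 2.21 billion ops/s), but the two are at least in the same order of magnitude. The CPU was also fully loaded while both benchmarks ran.
Summary
- If you use ConcurrentHashMap on Java 8, pay close attention to whether computeIfAbsent may be called concurrently for the same key. If it may, try get first.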
Object result = concurrentMap.get(KEY);
if (null == result) {
    result = concurrentMap.computeIfAbsent(KEY, __ -> VALUE);
}
return result;
- Or simply upgrade to Java 11 or Java 17.
Reposted from: https://juejin.cn/post/7094561581631012878