hive下merge into实现增量更新方法
一、hive下增量的两种方法
下面列出了两种hive批量增量的方法,我们详细讲一下merge into方法。
- union all方式
insert overwrite table 原表
select * from 增量表 union all
select a.* from 原表 a left join 增量表 b on 1=1 and a.业务不会变化的字段=b.业务不会变化的字段 where b.业务不会变化的字段 is null;
- merge into
MERGE INTO <target table> AS T USING <source expression/table> AS S
ON <``boolean` `expression1>
WHEN MATCHED [AND <``boolean` `expression2>] THEN UPDATE SET <set clause list>
WHEN MATCHED [AND <``boolean` `expression3>] THEN DELETE
WHEN NOT MATCHED [AND <``boolean` `expression4>] THEN INSERT VALUES<value list>
二、测试过程(hive: V 2.2.1)
- 创建存量表a和增量表b,准备测试数据;
如果要支持delete和update,则目标表需是AcidOutputFormat,必须分桶。
而且目前只有ORCFileformat支持AcidOutputFormat,不仅如此建表时必须指定参数
('transactional' = true)
。否则在merge into数据时会报如下错:
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table demo.a that does not use an AcidOutputFormat or is not bucketed
表a:
CREATE table demo.a (id int,name string) CLUSTERED by (id) into 2 buckets STORED as orc TBLPROPERTIES('transactional'='true');
表a插入存量数据:
insert into TABLE demo.a VALUES (1,'wf')
表b:
CREATE table demo.b (id int,name string) stored as orc;
表b插入增量数据(存在旧数据更新,新增数据两种场景):
insert into TABLE demo.b VALUES (1,'wf'),(3,'bb'),(4,'2132'),(8,'21'),(9,'12'),(11,'1')
- 将表b数据更新到表a:
merge into demo.a as T1 USING demo.b as T2
ON T1.id = T2.id
when matched then UPDATE set name = 'wf1'
when NOT matched then insert VALUES(T2.id,T2.name);
- 检查数据结果:
- 表a旧数据已更新为新数据wf1;
- 表b下新增数据已更新到表a。
select * from demo.a;
转载自:https://juejin.cn/post/7005501029250301988