在SQL中串联两个隐约相关表的快速方法
每隔一段时间,我都会遇到一个神秘的NATURAL JOIN
SQL操作符的用例,当我能把它变成NATURAL FULL JOIN
,我就更高兴了。过去的几篇关于这个主题的博文包括:
最近,我在Reddit上偶然发现了一个问题:
起初我想到了UNION CORRESPONDING
语法,这在大多数SQL方言中并不存在,即使它是一个标准功能。但后来,我想起这又是一个关于NATURAL FULL JOIN
的完美用例,这次与上面的例子稍有不同,即两个表的内容要进行比较。这一次,我们要确保两个连接的表永远没有匹配的行,以获得类似UNION
的行为。
考虑一下Sakila数据库,在该数据库中,我们有3个包含人的表,包括:
ACTOR
CUSTOMER
STAFF
这些表的定义如下:
CREATE TABLE actor (
actor_id integer NOT NULL PRIMARY KEY,
first_name varchar(45) NOT NULL,
last_name varchar(45) NOT NULL,
last_update timestamp NOT NULL
);
CREATE TABLE customer (
customer_id integer NOT NULL PRIMARY KEY,
store_id integer NOT NULL,
first_name varchar(45) NOT NULL,
last_name varchar(45) NOT NULL,
email varchar(50),
address_id integer NOT NULL,
active boolean NOT NULL,
create_date date NOT NULL,
last_update timestamp
);
CREATE TABLE staff (
staff_id integer NOT NULL,
first_name varchar(45) NOT NULL,
last_name varchar(45) NOT NULL,
address_id integer NOT NULL,
email varchar(50),
store_id integer NOT NULL,
active boolean NOT NULL,
username varchar(16) NOT NULL,
password varchar(40),
last_update timestamp NOT NULL,
picture bytea
);
可以看出,常见的列实际上只有(FIRST_NAME, LAST_NAME, LAST_UPDATE)
,所有其他的列都是针对表的。使用下面的查询,我们可以将所有的数据连接起来:
SELECT *
FROM (SELECT 'actor' AS source, * FROM actor) AS a
NATURAL FULL JOIN (SELECT 'customer' AS source, * FROM customer) AS c
NATURAL FULL JOIN (SELECT 'staff' AS source, * FROM staff) AS s;
结果看起来是这样的:
|source |first_name|last_name|last_update |actor_id|...|customer_id|...|staff_id|...|
|--------|----------|---------|-----------------------|--------|---|-----------|---|--------|---|
|actor |PENELOPE |GUINESS |2006-02-15 04:34:33.000|1 |...| | | | |
|actor |NICK |WAHLBERG |2006-02-15 04:34:33.000|2 |...| | | | |
|actor |ED |CHASE |2006-02-15 04:34:33.000|3 |...| | | | |
|customer|MARY |SMITH |2006-02-15 04:57:20.000| | |1 |...| | |
|customer|PATRICIA |JOHNSON |2006-02-15 04:57:20.000| | |2 |...| | |
|customer|LINDA |WILLIAMS |2006-02-15 04:57:20.000| | |3 |...| | |
|staff |Mike |Hillyer |2006-02-15 04:57:16.000| | | | |1 |...|
|staff |Jon |Stephens |2006-02-15 04:57:16.000| | | | |2 |...|
一些观察结果:
NATURAL JOIN
的匹配列(即具有相同名称的列)位于开头。它们包括合成的SOURCE
列,这对每个连接源来说都是不同的,所以我们从来没有匹配过,这是我们想要的。我们想要UNION
语义(即连接3个表),而不是匹配它们。- 之后列出了每个表所特有的列。它们只包含属于相关的数据。
SOURCE
这种技术显然不适合每天使用,但偶尔也会有用。所以,不要低估了 "数据 "的力量。NATURAL FULL JOIN
注意事项
我作了点小弊,实际的Sakila数据库模式在CUSTOMER
和STAFF
表之间有一个冲突:
CREATE TABLE customer (
customer_id integer NOT NULL PRIMARY KEY,
store_id integer NOT NULL,
first_name varchar(45) NOT NULL,
last_name varchar(45) NOT NULL,
email varchar(50),
address_id integer NOT NULL,
activebool boolean NOT NULL,
create_date date NOT NULL,
last_update timestamp
active integer NOT NULL, -- This is an integer
);
有了这个表的定义,我的查询就出现了这个错误:
SQL错误[42804]。ERROR:JOIN/USING类型整数和布尔值不能匹配
因此,为了解决这个问题,我修补了CUSTOMER
表的定义:
-- Patch
WITH customer AS (
SELECT
customer_id,
store_id,
first_name,
last_name,
email,
address_id,
activebool as active,
create_date,
last_update
FROM customer
)
-- Original query
SELECT *
FROM (SELECT 'actor' AS source, * FROM actor) AS a
NATURAL FULL JOIN (SELECT 'customer' AS source, * FROM customer) AS c
NATURAL FULL JOIN (SELECT 'staff' AS source, * FROM staff) AS s;
希望BigQuery的有用 [* REPLACE (...)](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_replace)
语法能够更广泛地使用。
转载自:https://juejin.cn/post/7126041305674678308