LeetCode数据库SQL练习(二)
- 练习题目来自:leetcode-cn.com/
删除重复的电子邮箱
编写一个 SQL 查询,来删除 Person
表中所有重复的电子邮箱,重复的邮箱里只保留 Id 最小 的那个。
+----+------------------+
| Id | Email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
| 3 | john@example.com |
+----+------------------+
Id 是这个表的主键。
例如,在运行你的查询语句之后,上面的 Person
表应返回以下几行:
+----+------------------+
| Id | Email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
+----+------------------+
提示:
-
执行 SQL 之后,输出是整个
Person
表。 -
使用
delete
语句。 -
找出所有重复的电子邮箱,删除Id大的重复邮箱
-
SQL1:
-
delete p1 from Person p1, Person p2 where p1.Email = p2.Email and p1.Id > p2.Id
-
-
SQL2:
-
delete p1 from Person p1 left join Person p2 on p1.Email = p2.Email where p1.Id > p2.Id
-
-
SQL3:
-
delete from Person where Id not in (select p.mId from (select min(Id) mId from Person group by Email) p)
-
上升的温度
表 Weather
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| recordDate | date |
| temperature | int |
+---------------+---------+
id 是这个表的主键
该表包含特定日期的温度信息
编写一个 SQL 查询,来查找与之前(昨天的)日期相比温度更高的所有日期的 id
。
返回结果 不要求顺序 。 查询结果格式如下例:
Weather
+----+------------+-------------+
| id | recordDate | Temperature |
+----+------------+-------------+
| 1 | 2015-01-01 | 10 |
| 2 | 2015-01-02 | 25 |
| 3 | 2015-01-03 | 20 |
| 4 | 2015-01-04 | 30 |
+----+------------+-------------+
Result table:
+----+
| id |
+----+
| 2 |
| 4 |
+----+
2015-01-02 的温度比前一天高(10 -> 25)
2015-01-04 的温度比前一天高(20 -> 30)
-
此题目中id与recordDate并不是正相关关系,所以必须从recordDate下手。
-
相关日期函数:
- TO_DAYS(date)
- 计算日期date距离0000年1月1日的天数。
- DATE_ADD()
- 函数向日期添加指定的时间间隔。
- DATE_ADD(date,INTERVAL expr type)
- date参数是合法的日期表达式。expr参数是您希望添加的时间间隔。
- DATEDIFF()
- 函数返回两个日期之间的天数。
- DATEDIFF(date1, date2)
- date1和date2参数是合法的日期或日期/时间表达式, date1 - date2
- TO_DAYS(date)
-
SQL1:
-
select w1.id from Weather w1, Weather w2 where w1.Temperature > w2.Temperature and TO_DAYS(w1.recordDate) = TO_DAYS(w2.RecordDate) + 1
-
-
SQL2:
-
select w1.id from Weather w1, Weather w2 where w1.Temperature > w2.Temperature and DATE_ADD(w2.recordDate, INTERVAL 1 DAY) = w1.recordDate
-
-
SQL3:
-
select w1.id from Weather w1, Weather w2 where w1.Temperature > w2.Temperature and DATEDIFF(w1.recordDate, w2.recordDate) = 1
-
行程和用户
表:Trips
+-------------+----------+
| Column Name | Type |
+-------------+----------+
| Id | int |
| Client_Id | int |
| Driver_Id | int |
| City_Id | int |
| Status | enum |
| Request_at | date |
+-------------+----------+
Id 是这张表的主键。
这张表中存所有出租车的行程信息。每段行程有唯一 Id ,其中 Client_Id 和 Driver_Id 是 Users 表中 Users_Id 的外键。
Status 是一个表示行程状态的枚举类型,枚举成员为(‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’) 。
表:Users
+-------------+----------+
| Column Name | Type |
+-------------+----------+
| Users_Id | int |
| Banned | enum |
| Role | enum |
+-------------+----------+
Users_Id 是这张表的主键。
这张表中存所有用户,每个用户都有一个唯一的 Users_Id ,Role 是一个表示用户身份的枚举类型,枚举成员为 (‘client’, ‘driver’, ‘partner’) 。
Banned 是一个表示用户是否被禁止的枚举类型,枚举成员为 (‘Yes’, ‘No’) 。
写一段 SQL 语句查出"2013-10-01"
至"2013-10-03"
期间非禁止用户(乘客和司机都必须未被禁止)的取消率。非禁止用户即 Banned 为 No 的用户,禁止用户即 Banned 为 Yes 的用户。
取消率 的计算方式如下:(被司机或乘客取消的非禁止用户生成的订单数量) / (非禁止用户生成的订单总数)。
返回结果表中的数据可以按任意顺序组织。其中取消率 Cancellation Rate
需要四舍五入保留 两位小数 。
查询结果格式如下例所示:
Trips 表:
+----+-----------+-----------+---------+---------------------+------------+
| Id | Client_Id | Driver_Id | City_Id | Status | Request_at |
+----+-----------+-----------+---------+---------------------+------------+
| 1 | 1 | 10 | 1 | completed | 2013-10-01 |
| 2 | 2 | 11 | 1 | cancelled_by_driver | 2013-10-01 |
| 3 | 3 | 12 | 6 | completed | 2013-10-01 |
| 4 | 4 | 13 | 6 | cancelled_by_client | 2013-10-01 |
| 5 | 1 | 10 | 1 | completed | 2013-10-02 |
| 6 | 2 | 11 | 6 | completed | 2013-10-02 |
| 7 | 3 | 12 | 6 | completed | 2013-10-02 |
| 8 | 2 | 12 | 12 | completed | 2013-10-03 |
| 9 | 3 | 10 | 12 | completed | 2013-10-03 |
| 10 | 4 | 13 | 12 | cancelled_by_driver | 2013-10-03 |
+----+-----------+-----------+---------+---------------------+------------+
Users 表:
+----------+--------+--------+
| Users_Id | Banned | Role |
+----------+--------+--------+
| 1 | No | client |
| 2 | Yes | client |
| 3 | No | client |
| 4 | No | client |
| 10 | No | driver |
| 11 | No | driver |
| 12 | No | driver |
| 13 | No | driver |
+----------+--------+--------+
Result 表:
+------------+-------------------+
| Day | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 | 0.33 |
| 2013-10-02 | 0.00 |
| 2013-10-03 | 0.50 |
+------------+-------------------+
2013-10-01:
- 共有 4 条请求,其中 2 条取消。
- 然而,Id=2 的请求是由禁止用户(User_Id=2)发出的,所以计算时应当忽略它。
- 因此,总共有 3 条非禁止请求参与计算,其中 1 条取消。
- 取消率为 (1 / 3) = 0.33
2013-10-02:
- 共有 3 条请求,其中 0 条取消。
- 然而,Id=6 的请求是由禁止用户发出的,所以计算时应当忽略它。
- 因此,总共有 2 条非禁止请求参与计算,其中 0 条取消。
- 取消率为 (0 / 2) = 0.00
2013-10-03:
- 共有 3 条请求,其中 1 条取消。
- 然而,Id=8 的请求是由禁止用户发出的,所以计算时应当忽略它。
- 因此,总共有 2 条非禁止请求参与计算,其中 1 条取消。
- 取消率为 (1 / 2) = 0.50
- 计算取消率 Cancellation Rate的几种方法:
round(sum(if (Status = 'completed', 0, 1)) / count(Status), 2)
round(avg(Status!='completed'), 2)
round (cast(sum(case when Status = 'completed' then 0 else 1 end) as float) / count(t.Status),2)
- SQL:
-
select Request_at Day, round(avg(Status!='completed'), 2) 'Cancellation Rate' from Trips t join Users u1 on (t.Client_Id = u1.Users_Id and u1.Banned = 'No') join Users u2 on (t.Driver_Id = u2.Users_Id and u2.Banned = 'No') where Request_at between '2013-10-01' and '2013-10-03' group by Request_at
-
游戏玩法分析系列
活动表 Activity
:
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
表的主键是 (player_id, event_date)。
这张表展示了一些游戏玩家在游戏平台上的行为活动。
每行数据记录了一名玩家在退出平台之前,当天使用同一台设备登录平台后打开的游戏的数目(可能是 0 个)。
1、写一条 SQL 查询语句获取每位玩家 第一次登陆平台的日期。 查询结果的格式如下所示:
Activity 表:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Result 表:
+-----------+-------------+
| player_id | first_login |
+-----------+-------------+
| 1 | 2016-03-01 |
| 2 | 2017-06-25 |
| 3 | 2016-03-02 |
+-----------+-------------+
- SQL:
-
select player_id, min(event_date) first_login from Activity group by player_id
-
2、请编写一个 SQL 查询,描述每一个玩家首次登陆的设备名称。 查询结果格式在以下示例中:
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Result table:
+-----------+-----------+
| player_id | device_id |
+-----------+-----------+
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
+-----------+-----------+
-
where后面跟两个列名,只有这两个列同时在第一个查询中才会输出
-
SQL:
-
select player_id, device_id from Activity where (player_id, event_date) in (select player_id, min(event_date) first_login from Activity group by player_id)
-
3、编写一个 SQL 查询,同时报告每组玩家和日期,以及玩家到目前为止玩了多少游戏。也就是说,在此日期之前玩家所玩的游戏总数。详细情况请查看示例。 查询结果格式如下所示:
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 1 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Result table:
+-----------+------------+---------------------+
| player_id | event_date | games_played_so_far |
+-----------+------------+---------------------+
| 1 | 2016-03-01 | 5 |
| 1 | 2016-05-02 | 11 |
| 1 | 2017-06-25 | 12 |
| 3 | 2016-03-02 | 0 |
| 3 | 2018-07-03 | 5 |
+-----------+------------+---------------------+
对于 ID 为 1 的玩家,2016-05-02 共玩了 5+6=11 个游戏,2017-06-25 共玩了 5+6+1=12 个游戏。
对于 ID 为 3 的玩家,2018-07-03 共玩了 0+5=5 个游戏。
请注意,对于每个玩家,我们只关心玩家的登录日期。
- SQL1:
- 窗口函数,以玩家分组按时间排序。
-
select player_id, event_date, sum(games_played) over(partition by player_id order by event_date) games_played_so_far from Activity
- SQL2:
-
select t1.player_id, t1.event_date, sum(t2.games_played) games_played_so_far from Activity t1, Activity t2 where t1.player_id = t2.player_id and t1.event_date >= t2.event_date group by t1.player_id, t1.event_date order by player_id, event_date
-
4、编写一个 SQL 查询,报告在首次登录的第二天再次登录的玩家的比率,四舍五入到小数点后两位。换句话说,您需要计算从首次登录日期开始至少连续两天登录的玩家的数量,然后除以玩家总数。 查询结果格式如下所示:
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-03-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Result table:
+-----------+
| fraction |
+-----------+
| 0.33 |
+-----------+
只有 ID 为 1 的玩家在第一天登录后才重新登录,所以答案是 1/3 = 0.33
-
题目要求计算从首次登录日期开始至少连续两天登录的玩家的数量,然后除以玩家总数。先找到玩家首次登录的日期,计算首次登录的日期与登录日期的差值,为1则代表连续两天登录,找出差值为1的玩家数。
-
首次登录的第二天再次登录:
sum(if(DATEDIFF(a.event_date, b.first_date) = 1, 1, 0))
sum(case when DATEDIFF(a.event_date, b.first_date) = 1) then 1 else 0 end)
-
SQL:
-
select round(sum(if(DATEDIFF(t1.event_date, t2.first_date) = 1, 1, 0)) / (select count(distinct player_id) from Activity), 2) fraction from Activity t1 left join (select player_id, min(event_date) first_date from Activity group by player_id) t2 on t1.player_id = t2.player_id
-
5、玩家的 安装日期 定义为该玩家的第一个登录日。
玩家的 第一天留存率 定义为:假定安装日期为 X
的玩家的数量为 N
,其中在 X
之后的某一天重新登录的玩家数量为 M
,M/N
就是第一天留存率,四舍五入到小数点后两位。
编写一个 SQL 查询,报告所有安装日期、当天安装游戏的玩家数量和玩家的第一天留存率。
查询结果格式如下所示:
Activity 表:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-03-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-01 | 0 |
| 3 | 4 | 2016-07-03 | 5 |
+-----------+-----------+------------+--------------+
Result 表:
+------------+----------+----------------+
| install_dt | installs | Day1_retention |
+------------+----------+----------------+
| 2016-03-01 | 2 | 0.50 |
| 2017-06-25 | 1 | 0.00 |
+------------+----------+----------------+
玩家 1 和 3 在 2016-03-01 安装了游戏,但只有玩家 1 在 2016-03-02 重新登录,所以 2016-03-01 的第一天留存率是 1/2=0.50
玩家 2 在 2017-06-25 安装了游戏,但在 2017-06-26 没有重新登录,因此 2017-06-25 的第一天留存率为 0/1=0.00
- SQL1:
- 次日留存率 = 第一天新增用户数在第二天活跃的用户数 / 当天新增用户数
- 输出结果中,install_dt为首次安装时间即首次登录时间,找到最小时间求出即可
- installs为当天的玩家数量,使用distinct避免重复
- Day1_retention为次日留存率,使用时间差值计算出连续两天登录的人数,除以第一天安装游戏的总人数,结果保留两位小数
- 这里注意需要用登陆时间分组
-
select t2.install_date install_dt, count(distinct t1.player_id) installs, round(count(distinct case when DATEDIFF(event_date,install_date) = 1 then t1.player_id else NULL end) / count(distinct t1.player_id), 2) Day1_retention from Activity t1 left join (select player_id, min(event_date) install_date from Activity group by player_id) t2 on t1.player_id = t2.player_id group by install_date
员工薪水中位数
Employee
表包含所有员工。Employee
表有三列:员工Id,公司名和薪水。
+-----+------------+--------+
|Id | Company | Salary |
+-----+------------+--------+
|1 | A | 2341 |
|2 | A | 341 |
|3 | A | 15 |
|4 | A | 15314 |
|5 | A | 451 |
|6 | A | 513 |
|7 | B | 15 |
|8 | B | 13 |
|9 | B | 1154 |
|10 | B | 1345 |
|11 | B | 1221 |
|12 | B | 234 |
|13 | C | 2345 |
|14 | C | 2645 |
|15 | C | 2645 |
|16 | C | 2652 |
|17 | C | 65 |
+-----+------------+--------+
请编写SQL查询来查找每个公司的薪水中位数。挑战点:你是否可以在不使用任何内置的SQL函数的情况下解决此问题。
+-----+------------+--------+
|Id | Company | Salary |
+-----+------------+--------+
|5 | A | 451 |
|6 | A | 513 |
|12 | B | 234 |
|9 | B | 1154 |
|14 | C | 2645 |
+-----+------------+--------+
-
中位数:
- 按公司分组、薪水排序,总数C为奇数,序号rk=(C+1)/2,总数C为偶数,序号rk为 C/2 和C/2+1,而 C/2 < (C+1)/2 < C/2+1 可合一简化
rk between (cnt/2) and (cnt/2 + 1)
(cnt%2 = 1 and rk = floor(cnt/2) + 1) or (cnt%2 = 0 and (rk = floor(cnt/2) or rk = floor(cnt/2) + 1))
rk in (floor((cnt + 1)/2), floor((cnt + 2)/2))
abs(rn - (cnt + 1)/2) < 1
-
SQL1:
-
select Id, Company, Salary from (select *, row_number() over(partition by Company order by Salary) rk, count(*) over (partition by Company) cnt from Employee) t where (cnt%2 = 1 and rk = floor(cnt/2) + 1) or (cnt%2 = 0 and (rk = floor(cnt/2) or rk = floor(cnt/2) + 1))
-
-
SQL2:
-
select Id, Company, Salary from (select *, row_number() over(partition by Company order by Salary) rk, count(*) over (partition by company) cnt from Employee) t where rk between (cnt/2) and (cnt/2 + 1)
-
-
SQL3:
- 中位数出现的频率一定 ≥ 大于它的数和小于它的数的差的绝对值:
-
select t1.Id, t1.Company, t1.Salary from Employee t1 left join Employee t2 on t1.Company = t2.Company group by t1.Company, t1.Salary having sum(case when t1.Salary = t2.Salary then 1 else 0 end) >= abs(sum(sign(t1.Salary - t2.Salary))) order by t1.Id
至少有5名直接下属的经理
Employee
表包含所有员工和他们的经理。每个员工都有一个 Id,并且还有一列是经理的 Id。
+------+----------+-----------+----------+
|Id |Name |Department |ManagerId |
+------+----------+-----------+----------+
|101 |John |A |null |
|102 |Dan |A |101 |
|103 |James |A |101 |
|104 |Amy |A |101 |
|105 |Anne |A |101 |
|106 |Ron |B |101 |
+------+----------+-----------+----------+
给定 Employee
表,请编写一个SQL查询来查找至少有5名直接下属的经理。对于上表,您的SQL查询应该返回:
+-------+
| Name |
+-------+
| John |
+-------+
注意: 没有人是自己的下属。
- SQL:
-
select Name from Employee where Id in (select ManagerId from Employee group by ManagerId having count(ManagerId) >= 5)
-
给定数字的频率查询中位数
Numbers
表保存数字的值及其频率。
+----------+-------------+
| Number | Frequency |
+----------+-------------|
| 0 | 7 |
| 1 | 1 |
| 2 | 3 |
| 3 | 1 |
+----------+-------------+
在此表中,数字为 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3
,所以中位数是 (0 + 0) / 2 = 0
。
+--------+
| median |
+--------|
| 0.0000 |
+--------+
请编写一个查询来查找所有数字的中位数并将结果命名为 median
。
-
SQL1:
- 如果 Number 为中位数,Number(包含本身)前累计的数字应大于等于总数/2,同时Number(不包含本身)前累计数字应小于等于总数/2:
-
select avg(cast(num as float)) median from (select num, Frequency, sum(Frequency) over(order by num) - Frequency prev_sum, sum(Frequency) over(order by num) curr_sum from Numbers) t1, (select sum(Frequency) total_sum from Numbers) t2 where t1.prev_sum <= (cast(t2.total_sum as float) / 2) and t1.curr_sum >= (cast(t2.total_sum as float) / 2)
-
SQL2:
-
select avg(num) median from (select n1.num from Numbers n1 join Numbers n2 on n1.num >= n2.num group by n1.num having sum(n2.Frequency) >= (select sum(Frequency) from Numbers)/2 and sum(n2.Frequency) - avg(n1.Frequency) <= (select sum(Frequency) from Numbers) / 2 )s
-
-
SQL3:
- 某一数的正序和逆序累计均大于等于整个序列的数字个数的一半,将最后选定的一个或两个中位数进行求均值即可:
-
select avg(num) median from ((select num, sum(frequency) over(order by num) sum1, sum(frequency) over(order by num desc) sum2 from Numbers) t1, (select sum(frequency) sum3 from Numbers) t2) where sum1 >= sum3 / 2 and sum2 >= sum3 / 2
转载自:https://juejin.cn/post/7075182703330459685