【学习笔记】阿里云天池龙珠计划SQL训练营-Task06:综合练习题-10道经典题目
本笔记为阿里云天池龙珠计划SQL训练营的学习内容
链接为:https://tianchi.aliyun.com/specials/promotion/aicampsql
练习题1:
请使用A股上市公司季度营收预测数据集《Income Statement.xls》和《Company Operating.xlsx》和《Market Data.xlsx》,以Market Data为主表,将三张表中的TICKER_SYMBOL为600383和600048的信息合并在一起。只需要显示以下字段。
| 表名 | 字段名 |
|---|---|
| Income Statement | TICKER_SYMBOL |
| Income Statement | END_DATE |
| Income Statement | T_REVENUE |
| Income Statement | T_COGS |
| Income Statement | N_INCOME |
| Market Data | TICKER_SYMBOL |
| Market Data | END_DATE_ |
| Market Data | CLOSE_PRICE |
| Company Operating | TICKER_SYMBOL |
| Company Operating | INDIC_NAME_EN |
| Company Operating | END_DATE |
| Company Operating | VALUE |
解题思路:先从三张表中找出TICKER_SYMBOL为600383和600048的信息,在以Market Data为主表,采用left join对三表进行合并
SELECT * FROM
(SELECT TICKER_SYMBOL,END_DATE,CLOSE_PRICE from `market data` where TICKER_SYMBOL in('600383','600048')) as m
left JOIN
(select TICKER_SYMBOL,END_DATE,T_REVENUE,T_COGS,N_INCOME FROM `income statement` where TICKER_SYMBOL in ('600383','600048')) as i
on m.TICKER_SYMBOL = i.TICKER_SYMBOL AND m.END_DATE = i.END_DATE
left JOIN
(SELECT TICKER_SYMBOL,INDIC_NAME_EN,END_DATE,`VALUE` from `company operating` where TICKER_SYMBOL in('600383','600048')) as c
on m.TICKER_SYMBOL = c.TICKER_SYMBOL and m.END_DATE = c.END_DATE
练习题2:
请使用 Wine Quality Data 数据集《winequality-red.csv》,找出 pH=3.03的所有红葡萄酒,然后,对其 citric acid 进行中式排名(相同排名的下一个名次应该是下一个连续的整数值。换句话说,名次之间不应该有“间隔”)
解题思路:1、采用where将ph=3.03的找出;2、中式排名:dense_rank()窗口函数
SELECT
*,
DENSE_RANK() over (ORDER BY `citric acid`) as NUM
FROM
`winequality-red`
WHERE pH = 3.03
练习题3:
使用Coupon Usage Data for O2O中的数据集《ccf_offline_stage1_test_revised.csv》,试分别找出在2016年7月期间,发放优惠券总金额最多和发放优惠券张数最多的商家。
这里只考虑满减的金额,不考虑打几折的优惠券。
解题思路:
1、时间范围:2016年7月;
2、优惠券金额格式处理:SUBSTRING_INDEX( Discount_rate, ':',- 1 ),同时需要剔除折扣优惠券
一、发放优惠券总金额最多的商家
SELECT
Merchant_id,
Date_received,
sum(sUBSTRING_INDEX( Discount_rate, ':',- 1 )) AS sum_amount
FROM
ccf_offline_stage1_test_revised
WHERE
DATE_FORMAT( Date_received, '%Y%m' ) = '201607'
AND SUBSTRING_INDEX( Discount_rate, ':',- 1 ) >= 1
GROUP BY Merchant_id
ORDER BY sum_amount desc
LIMIT 1
二、发放优惠券张数最多的商家
SELECT Merchant_id,Date_received,COUNT(Coupon_id) as count_coupon_id
from ccf_offline_stage1_test_revised
WHERE
DATE_FORMAT( Date_received, '%Y%m' ) = '201607'
AND SUBSTRING_INDEX( Discount_rate, ':',- 1 ) >= 1
GROUP BY Merchant_id
ORDER BY count_coupon_id desc
LIMIT 1
练习题4:
请使用A股上市公司季度营收预测中的数据集《Macro&Industry.xlsx》中的sheet-INDIC_DATA,请计算全社会用电量:第一产业:当月值 在2015年用电最高峰是发生在哪月?并且相比去年同期增长/减少了多少个百分比?
解题思路:
1、全社会用电量:第一产业——Total Electricity Consumption: Primary Industry,获取ID为indic_id =2020101522
2、筛选2015年最高峰的月份
SELECT PERIOD_DATE,max(DATA_VALUE) MAX_VALUE FROM `macro industry` WHERE indic_id = '2020101522' AND YEAR(PERIOD_DATE) = 2015
GROUP BY PERIOD_DATE ORDER BY MAX_VALUE DESC LIMIT 1
3、根据最高峰月份,提取去年同期值,进行同比
SELECT
a.PERIOD_DATE,a.max_value,
CONCAT(round((a.max_value-b.max_value)/b.max_value*100,2),'%') as g_rate
FROM
(
SELECT
PERIOD_DATE,
max( DATA_VALUE ) MAX_VALUE
FROM
`macro industry`
WHERE
indic_id = '2020101522'
AND YEAR ( PERIOD_DATE ) = 2015
GROUP BY
PERIOD_DATE
ORDER BY
MAX_VALUE DESC
LIMIT 1) as a
left join
(
SELECT
PERIOD_DATE,
max( DATA_VALUE ) MAX_VALUE
FROM
`macro industry`
WHERE
indic_id = '2020101522'
AND YEAR ( PERIOD_DATE ) = 2014
GROUP BY
PERIOD_DATE
ORDER BY
MAX_VALUE DESC
LIMIT 1) as b
ON YEAR ( a.PERIOD_DATE ) = YEAR ( b.PERIOD_DATE ) + 1
AND MONTH ( a.PERIOD_DATE ) = MONTH (b.PERIOD_DATE)
练习题5:
使用Coupon Usage Data for O2O中的数据集《ccf_online_stage1_train.csv》,试统计在2016年6月期间,线上总体优惠券弃用率为多少?并找出优惠券弃用率最高的商家。
弃用率 = 被领券但未使用的优惠券张数 / 总的被领取优惠券张数
解题思路:
1、时间端2016年6月
2、被领券但未使用的优惠券:date is null and Coupon_id is not null
3、总的被领取优惠券:Coupon_id is not null
一、线上总体优惠券弃用率:
SELECT
CONCAT(round(sum(case when date is null and Coupon_id is not null then 1 else 0 end)/
sum(case when Coupon_id is not null then 1 else 0 end)*100,2),'%') as discard_rate
FROM
ccf_online_stage1_train
WHERE
DATE_FORMAT( Date_received, '%Y%m' ) = '201606'
二、优惠券弃用率最高的商家
SELECT
Merchant_id,
CONCAT(round(sum(case when date is null and Coupon_id is not null then 1 else 0 end)/
sum(case when Coupon_id is not null then 1 else 0 end)*100,2),'%') as discard_rate
FROM
ccf_online_stage1_train
WHERE
DATE_FORMAT( Date_received, '%Y%m' ) = '201606'
GROUP BY Merchant_id
ORDER BY discard_rate desc
LIMIT 1
练习题6:
请使用 Wine Quality Data 数据集《winequality-white.csv》,找出 pH=3.63的所有白葡萄酒,然后,对其 residual sugar 量进行英式排名(非连续的排名)
解题思路:1、ph=3.63,2、英式排名:rank()窗口函数
SELECT
*,
RANK() over(ORDER BY `residual sugar`) num_rank
FROM
`winequality-white`
WHERE
ph = 3.63
练习题7:
请使用A股上市公司季度营收预测中的数据集《Market Data.xlsx》中的sheet-DATA,
计算截止到2018年底,市值最大的三个行业是哪些?以及这三个行业里市值最大的三个公司是哪些?(每个行业找出前三大的公司,即一共要找出9个)
解题思路:1、2018年年底,实际数据只到18年5月底
2、三大行业用sum进行求和排序,三个行业中最大市值的三个公司,是以行业为分组进行窗口函数
一、市值最大的三个行业
SELECT
TYPE_NAME_EN,
TYPE_NAME_CN,
sum( MARKET_VALUE ) as sum_value
FROM
`market data`
WHERE
DATE_FORMAT( END_DATE, '%Y%m' ) = '201805'
GROUP BY TYPE_NAME_EN
ORDER BY sum_value desc
LIMIT 3
二、市值最大的三个行业中的最大三家公司,共计9家
SELECT * from
(SELECT
TYPE_NAME_EN,
TYPE_NAME_CN,
TICKER_SYMBOL,
MARKET_VALUE,
DENSE_RANK() over(PARTITION by TYPE_NAME_EN ORDER BY MARKET_VALUE desc) num_value
FROM
`market data`
WHERE
DATE_FORMAT( END_DATE, '%Y%m' ) = '201805' and TYPE_NAME_EN in ('Bank','Medicine and Biology','Non-bank Finance')
GROUP BY TYPE_NAME_EN,TICKER_SYMBOL) as a
where num_value in (1,2,3)
练习题8:
使用Coupon Usage Data for O2O中的数据集《ccf_online_stage1_train.csv》和《ccf_offline_stage1_train.csv》,试找出在2016年6月期间,线上线下累计优惠券使用次数最多的顾客。
解题思路:1、时间2016年6月;2、分别统计线上跟线下使用次数,再进行合并求和得出使用次数最多的顾客
SELECT user_id,sum(action) as sum_action from
(SELECT user_id,count(user_id) as action from ccf_online_stage1_train
where DATE_FORMAT(date,'%Y%m') = 201606 AND Coupon_id is not null
GROUP BY User_id
union all
SELECT user_id,count(user_id) as action from ccf_offline_stage1_train
where DATE_FORMAT(date,'%Y%m') = 201606 AND Coupon_id is not null
GROUP BY user_id) b
GROUP BY user_id
ORDER BY sum_action desc
LIMIT 1
练习题9:
请使用A股上市公司季度营收预测数据集《Income Statement.xls》中的sheet-General Business和《Company Operating.xlsx》中的sheet-EN。
找出在数据集所有年份中,按季度统计,白云机场旅客吞吐量最高的那一季度对应的净利润 是多少?(注意,是单季度对应的净利润,非累计净利润。)
解题思路:1、先找吞吐量最高的季度;2、再匹配对应净利润最多的月份
3、用到的函数:QUARTER(),是对季度周期的时间函数
SELECT a.TICKER_SYMBOL,a.tuntu,b.amount FROM
(SELECT
TICKER_SYMBOL,
YEAR ( END_DATE ) YEAR,
QUARTER ( END_DATE ) Q,
sum( `VALUE` ) AS tuntu
FROM
`company operating`
WHERE
INDIC_NAME_EN = 'Baiyun Airport:Aircraft take-off and landing times'
GROUP BY TICKER_SYMBOL,YEAR,Q
ORDER BY tuntu desc
LIMIT 1) a
INNER JOIN
( select TICKER_SYMBOL,
YEAR(END_DATE) year,
QUARTER(END_DATE) Q,
SUM(N_INCOME) as amount
from `income statement`
group by TICKER_SYMBOL,year,Q) b
on a.TICKER_SYMBOL = b.TICKER_SYMBOL AND a.year = b.year and a.Q = b.Q
练习题10:
使用Coupon Usage Data for O2O中的数据集《ccf_online_stage1_train.csv》和《ccf_offline_stage1_train.csv》,试找出在2016年6月期间,线上线下累计被使用优惠券满减最多的前3名商家。
比如商家A,消费者A在其中使用了一张200减50的,消费者B使用了一张30减1的,那么商家A累计被使用优惠券满减51元。
解题思路:1、时间:2016年6月;2、分别合计线上、线下被使用的优惠券数量;3、合并计算优惠券数额
SELECT Merchant_id,sum(discount) as discount_amount from
(SELECT
Merchant_id,
SUBSTRING_INDEX( Discount_rate, ':',- 1 ) as discount
FROM
ccf_online_stage1_train
WHERE
DATE_FORMAT( date, '%Y%m' ) = 201606
AND date IS NOT NULL
AND Coupon_id IS NOT NULL
union all
SELECT Merchant_id,SUBSTRING_INDEX(Discount_rate,':',-1) as discount
from ccf_offline_stage1_train
where DATE_FORMAT(date,'%Y%m') = 201606
and date is not null
and Coupon_id is not null) a
GROUP BY Merchant_id
ORDER BY discount_amount desc
LIMIT 3
