Advertisement

【学习笔记】阿里云天池龙珠计划SQL训练营-Task06:综合练习题-10道经典题目

阅读量:

本笔记为阿里云天池龙珠计划SQL训练营的学习内容

链接为:https://tianchi.aliyun.com/specials/promotion/aicampsql

练习题1:

请使用A股上市公司季度营收预测数据集《Income Statement.xls》和《Company Operating.xlsx》和《Market Data.xlsx》,以Market Data为主表,将三张表中的TICKER_SYMBOL为600383和600048的信息合并在一起。只需要显示以下字段。

表名 字段名
Income Statement TICKER_SYMBOL
Income Statement END_DATE
Income Statement T_REVENUE
Income Statement T_COGS
Income Statement N_INCOME
Market Data TICKER_SYMBOL
Market Data END_DATE_
Market Data CLOSE_PRICE
Company Operating TICKER_SYMBOL
Company Operating INDIC_NAME_EN
Company Operating END_DATE
Company Operating VALUE

解题思路:先从三张表中找出TICKER_SYMBOL为600383和600048的信息,在以Market Data为主表,采用left join对三表进行合并

复制代码
 SELECT * FROM

    
 (SELECT TICKER_SYMBOL,END_DATE,CLOSE_PRICE from `market data` where TICKER_SYMBOL in('600383','600048')) as m 
    
 left JOIN
    
 (select TICKER_SYMBOL,END_DATE,T_REVENUE,T_COGS,N_INCOME FROM `income statement` where TICKER_SYMBOL in ('600383','600048')) as i
    
 on m.TICKER_SYMBOL = i.TICKER_SYMBOL AND m.END_DATE = i.END_DATE
    
 left JOIN
    
 (SELECT TICKER_SYMBOL,INDIC_NAME_EN,END_DATE,`VALUE` from `company operating` where TICKER_SYMBOL in('600383','600048')) as c
    
 on m.TICKER_SYMBOL = c.TICKER_SYMBOL and m.END_DATE = c.END_DATE

练习题2:

请使用 Wine Quality Data 数据集《winequality-red.csv》,找出 pH=3.03的所有红葡萄酒,然后,对其 citric acid 进行中式排名(相同排名的下一个名次应该是下一个连续的整数值。换句话说,名次之间不应该有“间隔”)

解题思路:1、采用where将ph=3.03的找出;2、中式排名:dense_rank()窗口函数

复制代码
 SELECT

    
 	*,
    
 	DENSE_RANK() over (ORDER BY `citric acid`) as NUM  
    
 FROM
    
 	`winequality-red` 
    
 WHERE pH = 3.03

练习题3:

使用Coupon Usage Data for O2O中的数据集《ccf_offline_stage1_test_revised.csv》,试分别找出在2016年7月期间,发放优惠券总金额最多和发放优惠券张数最多的商家。

这里只考虑满减的金额,不考虑打几折的优惠券。

解题思路:

1、时间范围:2016年7月;

2、优惠券金额格式处理:SUBSTRING_INDEX( Discount_rate, ':',- 1 ),同时需要剔除折扣优惠券

一、发放优惠券总金额最多的商家

复制代码
 SELECT

    
 	Merchant_id,
    
 	Date_received,
    
 	sum(sUBSTRING_INDEX( Discount_rate, ':',- 1 )) AS sum_amount 
    
 FROM
    
 	ccf_offline_stage1_test_revised 
    
 WHERE
    
 	DATE_FORMAT( Date_received, '%Y%m' ) = '201607' 
    
 	AND SUBSTRING_INDEX( Discount_rate, ':',- 1 ) >= 1
    
 GROUP BY Merchant_id
    
 ORDER BY sum_amount desc
    
 LIMIT 1

二、发放优惠券张数最多的商家

复制代码
 SELECT Merchant_id,Date_received,COUNT(Coupon_id) as count_coupon_id

    
 from ccf_offline_stage1_test_revised
    
 WHERE
    
 DATE_FORMAT( Date_received, '%Y%m' ) = '201607' 
    
 	AND SUBSTRING_INDEX( Discount_rate, ':',- 1 ) >= 1
    
 GROUP BY Merchant_id
    
 ORDER BY count_coupon_id desc
    
 LIMIT 1

练习题4:

请使用A股上市公司季度营收预测中的数据集《Macro&Industry.xlsx》中的sheet-INDIC_DATA,请计算全社会用电量:第一产业:当月值 在2015年用电最高峰是发生在哪月?并且相比去年同期增长/减少了多少个百分比?

解题思路:

1、全社会用电量:第一产业——Total Electricity Consumption: Primary Industry,获取ID为indic_id =2020101522

2、筛选2015年最高峰的月份

复制代码
 SELECT PERIOD_DATE,max(DATA_VALUE) MAX_VALUE FROM `macro industry` WHERE indic_id = '2020101522' AND YEAR(PERIOD_DATE) = 2015

    
 GROUP BY PERIOD_DATE ORDER BY MAX_VALUE DESC LIMIT 1

3、根据最高峰月份,提取去年同期值,进行同比

复制代码
 SELECT

    
 	a.PERIOD_DATE,a.max_value,
    
 	CONCAT(round((a.max_value-b.max_value)/b.max_value*100,2),'%') as g_rate
    
 FROM
    
 	(
    
 	SELECT
    
 		PERIOD_DATE,
    
 		max( DATA_VALUE ) MAX_VALUE 
    
 	FROM
    
 		`macro industry` 
    
 	WHERE
    
 		indic_id = '2020101522' 
    
 		AND YEAR ( PERIOD_DATE ) = 2015 
    
 	GROUP BY
    
 		PERIOD_DATE 
    
 	ORDER BY
    
 	MAX_VALUE DESC 
    
 	LIMIT 1) as a 
    
 	left join
    
 	(
    
 	SELECT
    
 		PERIOD_DATE,
    
 		max( DATA_VALUE ) MAX_VALUE 
    
 	FROM
    
 		`macro industry` 
    
 	WHERE
    
 		indic_id = '2020101522' 
    
 		AND YEAR ( PERIOD_DATE ) = 2014
    
 	GROUP BY
    
 		PERIOD_DATE 
    
 	ORDER BY
    
 	MAX_VALUE DESC 
    
 	LIMIT 1) as b
    
 ON YEAR ( a.PERIOD_DATE ) = YEAR ( b.PERIOD_DATE ) + 1 
    
 AND MONTH ( a.PERIOD_DATE ) = MONTH (b.PERIOD_DATE)

练习题5:

使用Coupon Usage Data for O2O中的数据集《ccf_online_stage1_train.csv》,试统计在2016年6月期间,线上总体优惠券弃用率为多少?并找出优惠券弃用率最高的商家。

弃用率 = 被领券但未使用的优惠券张数 / 总的被领取优惠券张数

解题思路:

1、时间端2016年6月

2、被领券但未使用的优惠券:date is null and Coupon_id is not null

3、总的被领取优惠券:Coupon_id is not null

一、线上总体优惠券弃用率:

复制代码
 SELECT

    
 	CONCAT(round(sum(case when date is null and Coupon_id is not null then 1 else 0 end)/
    
 	sum(case when Coupon_id is not null then 1 else 0 end)*100,2),'%') as discard_rate		
    
 FROM
    
 	ccf_online_stage1_train 
    
 WHERE
    
 	DATE_FORMAT( Date_received, '%Y%m' ) = '201606'

二、优惠券弃用率最高的商家

复制代码
 SELECT

    
 Merchant_id,
    
 	CONCAT(round(sum(case when date is null and Coupon_id is not null then 1 else 0 end)/
    
 	sum(case when Coupon_id is not null then 1 else 0 end)*100,2),'%') as discard_rate		
    
 FROM
    
 	ccf_online_stage1_train 
    
 WHERE
    
 	DATE_FORMAT( Date_received, '%Y%m' ) = '201606'
    
 GROUP BY Merchant_id
    
 ORDER BY discard_rate desc
    
 LIMIT 1

练习题6:

请使用 Wine Quality Data 数据集《winequality-white.csv》,找出 pH=3.63的所有白葡萄酒,然后,对其 residual sugar 量进行英式排名(非连续的排名)

解题思路:1、ph=3.63,2、英式排名:rank()窗口函数

复制代码
 SELECT

    
 	*,
    
 	RANK() over(ORDER BY `residual sugar`) num_rank
    
 FROM
    
 	`winequality-white` 
    
 WHERE
    
 	ph = 3.63

练习题7:

请使用A股上市公司季度营收预测中的数据集《Market Data.xlsx》中的sheet-DATA,

计算截止到2018年底,市值最大的三个行业是哪些?以及这三个行业里市值最大的三个公司是哪些?(每个行业找出前三大的公司,即一共要找出9个)

解题思路:1、2018年年底,实际数据只到18年5月底

2、三大行业用sum进行求和排序,三个行业中最大市值的三个公司,是以行业为分组进行窗口函数

一、市值最大的三个行业

复制代码
 SELECT

    
 	TYPE_NAME_EN,
    
 	TYPE_NAME_CN,
    
 	sum( MARKET_VALUE )  as sum_value
    
 FROM
    
 	`market data` 
    
 WHERE
    
 	DATE_FORMAT( END_DATE, '%Y%m' ) = '201805'
    
 GROUP BY TYPE_NAME_EN
    
 ORDER BY sum_value desc
    
 LIMIT 3

二、市值最大的三个行业中的最大三家公司,共计9家

复制代码
 SELECT * from

    
 (SELECT
    
 	TYPE_NAME_EN,
    
 	TYPE_NAME_CN,
    
 	TICKER_SYMBOL,
    
 	MARKET_VALUE,
    
 	DENSE_RANK() over(PARTITION by TYPE_NAME_EN ORDER BY MARKET_VALUE desc) num_value
    
 FROM
    
 	`market data` 
    
 WHERE
    
 	DATE_FORMAT( END_DATE, '%Y%m' ) = '201805' and TYPE_NAME_EN in ('Bank','Medicine and Biology','Non-bank Finance')
    
 GROUP BY TYPE_NAME_EN,TICKER_SYMBOL) as a
    
 where num_value in (1,2,3)

练习题8:

使用Coupon Usage Data for O2O中的数据集《ccf_online_stage1_train.csv》和《ccf_offline_stage1_train.csv》,试找出在2016年6月期间,线上线下累计优惠券使用次数最多的顾客。

解题思路:1、时间2016年6月;2、分别统计线上跟线下使用次数,再进行合并求和得出使用次数最多的顾客

复制代码
 SELECT user_id,sum(action) as sum_action from

    
 (SELECT user_id,count(user_id) as action from ccf_online_stage1_train 
    
 where DATE_FORMAT(date,'%Y%m') = 201606 AND Coupon_id is not null
    
 GROUP BY User_id
    
 union all 
    
 SELECT user_id,count(user_id) as action from ccf_offline_stage1_train 
    
 where DATE_FORMAT(date,'%Y%m') = 201606 AND Coupon_id is not null
    
 GROUP BY user_id) b
    
 GROUP BY user_id
    
 ORDER BY sum_action desc
    
 LIMIT 1

练习题9:

请使用A股上市公司季度营收预测数据集《Income Statement.xls》中的sheet-General Business和《Company Operating.xlsx》中的sheet-EN。

找出在数据集所有年份中,按季度统计,白云机场旅客吞吐量最高的那一季度对应的净利润 是多少?(注意,是单季度对应的净利润,非累计净利润。)

解题思路:1、先找吞吐量最高的季度;2、再匹配对应净利润最多的月份

3、用到的函数:QUARTER(),是对季度周期的时间函数

复制代码
 SELECT a.TICKER_SYMBOL,a.tuntu,b.amount FROM

    
 (SELECT
    
 	TICKER_SYMBOL,
    
 	YEAR ( END_DATE ) YEAR,
    
 	QUARTER ( END_DATE ) Q,
    
 	sum( `VALUE` ) AS tuntu 
    
 FROM
    
 	`company operating` 
    
 WHERE
    
 	INDIC_NAME_EN = 'Baiyun Airport:Aircraft take-off and landing times'
    
 GROUP BY TICKER_SYMBOL,YEAR,Q
    
 ORDER BY tuntu desc
    
 LIMIT 1) a
    
 INNER JOIN
    
 ( select TICKER_SYMBOL,
    
 					YEAR(END_DATE) year,
    
 					QUARTER(END_DATE) Q,
    
 					SUM(N_INCOME) as amount
    
 	from `income statement`
    
 	group by TICKER_SYMBOL,year,Q) b
    
 	on a.TICKER_SYMBOL = b.TICKER_SYMBOL AND a.year = b.year and a.Q = b.Q

练习题10:

使用Coupon Usage Data for O2O中的数据集《ccf_online_stage1_train.csv》和《ccf_offline_stage1_train.csv》,试找出在2016年6月期间,线上线下累计被使用优惠券满减最多的前3名商家。

比如商家A,消费者A在其中使用了一张200减50的,消费者B使用了一张30减1的,那么商家A累计被使用优惠券满减51元。

解题思路:1、时间:2016年6月;2、分别合计线上、线下被使用的优惠券数量;3、合并计算优惠券数额

复制代码
 SELECT Merchant_id,sum(discount) as discount_amount from

    
 (SELECT
    
 	Merchant_id,
    
 	SUBSTRING_INDEX( Discount_rate, ':',- 1 ) as discount 
    
 FROM
    
 	ccf_online_stage1_train 
    
 WHERE
    
 	DATE_FORMAT( date, '%Y%m' ) = 201606 
    
 	AND date IS NOT NULL 
    
 	AND Coupon_id IS NOT NULL
    
 union all
    
  SELECT Merchant_id,SUBSTRING_INDEX(Discount_rate,':',-1) as discount
    
 	from ccf_offline_stage1_train
    
 	where DATE_FORMAT(date,'%Y%m') = 201606
    
 	and date is not null
    
 	and Coupon_id is not null) a  
    
 GROUP BY Merchant_id
    
 ORDER BY discount_amount desc
    
 LIMIT 3

全部评论 (0)

还没有任何评论哟~