Coursera | Databases and SQL for Data Science(IBM) | Assignment
其他资源:
Coursera课程|数据库与SQL数据分析(IBM)|练习题解答
无
Coursera | Databases and SQL for Data Science | Assignment
- 课程作业
- 理解数据集
- 芝加哥的社会经济指标
- 芝加哥公立学校
- 芝加哥犯罪数据
- 芝加哥的社会经济指标
- 理解数据集
以下是对输入文本的同义改写版本
以下是对输入文本的同义改写版本
以上为改写后的文本内容
这个assignment也不算太难。除了最初的数据导入步骤外,其余的部分与quiz类似
!!!:
这里是在向您介绍如何导入本地数据库。具体来说,请按照教程设置好所需表结构及凭证信息,并确保所有设置均为本地环境参数!无需过多说明的是,默认情况下直接运行该操作应该就能顺利完成。请注意,在此过程中请确保所有设置均为本地环境参数!后续SQL语句的操作逻辑保持不变!😊
Course Assignment
Using this Python notebook you will:
- Grasp the details of three datasets from Chicago
- Import three datasets into three tables across a Db2 database
- Run SQL queries to address assignment-related queries
Understand the datasets
To finish the assignment issues in this notebook, you will access three data sources located on the City of Chicago's Data Portal platform.
- 数据集:社会经济指标在芝加哥
- 数据集:芝加哥公立学校进展报告卡(2011年)
- 数据集:芝加哥犯罪数据(从2001年至今)
Socioeconomic Indicators in Chicago
This dataset includes a collection of six socioeconomic metrics of public health importance in each Chicago community area from 2008 to 2012.
For this exercise, you are to be using a snapshot of this dataset, which can be accessed via:
A detailed description of this dataset and its original data can be retrieved from the Chicago Data Portal located at: https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2
Chicago Public Schools
This dataset includes all school-level performance data that were utilized to generate CPS School Report Cards for the 2011-2012 academic year. The dataset is made available by the Chicago Data Portal.
In this assignment, you will utilize a sample of this dataset, which is available for download from: Chicago Public School
The primary and foundational datasets, along with their comprehensive overview, are accessible through the Chicago Data Portal at https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ.
Chicago Crime Data
该数据集反映了自2001年以来在芝加哥发生的犯罪事件报告(除谋杀案件外,在每一起受害者的资料中存在的情况下)),不包括最近七天的数据。
This dataset is comprehensively extensive, exceeding 1.5 gigabytes in size and containing over 6.5 million entries. For the purposes of this assignment, we will utilize a significantly reduced sample from this dataset, which can be downloaded from: Chicago Crime Data.
A comprehensive overview of the specific data collection and its original source can be accessed via the Chicago Data Portal at: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
Download the datasets
In numerous instances, where data analysis is required, datasets are frequently accessible in .CSV format, particularly online. By clicking on the links provided below, you can conveniently access and store these .CSV files for further processing.
CENSUS_DATA: Census Dataset
CHICAGO_PUBLIC SCHOOL SYSTEM High-Performing Public Schools in Chicago
The Chicago Crime Data Repository provides access to the comprehensive dataset detailing crime statistics and incidents across the city of Chicago. (https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera/data/Chicago_Crime_Data-v2.csv)
NOTE: 请注意,请确保您使用上方提供的链接下载数据集而非直接从芝加哥数据港下载。这些链接中的版本是原始数据集的子集,并且对某些列名进行了修改以使其更适合数据库环境,这将使完成此任务更容易。
Store the datasets in database tables
To analyze the data using SQL, it first needs to be stored in the database.
While importing the dataset into a Pandas DataFrame and subsequently writing back to the database, as demonstrated in our Week 3 Lab exercise, this approach leads to mapping data types that might not be ideal for efficient SQL operations. For instance, a lengthy textual field could end up being stored as a CLOB rather than a VARCHAR.
Therefore, it is strongly advised to perform a manual load of the table using the database console interface, as detailed in Week 2 Lab 1 Part II. The primary variation from this lab compared to Week 2 Lab 1 Part II lies in Step 5 of the instructions, where you are required to click on create "(+) New Table" and specify the desired table name before proceeding.

启动Db2控制台,并导入LOAD工具,随后拖拽包含第一组数据的CSV文件。然后创建一个新的数据表,请按照屏幕上的操作步骤完成数据加载。请为新创建的表命名为以下名称:
- CENSUS_DATA
- CHICAGO_PUBLIC_SCHOOLS
- CHICAGO_CRIME_DATA
Connect to the database
First, we need to install or enable the SQL extension. Then, we should create a session with the database.
%load_ext sql
The sql extension is already loaded. To reload it, use:
%reload_ext sql
Next, input your DB2 connection string into the following cell. Remember that you previously set up Service Credentials for your DB2 instance during Week 3’s first lab. Refer to these credentials’ username field and enter everything after db2:// (excluding any trailing double quotes) into the cell below, following ibm_db_sa:// format.

# Remember the connection string is of the format:
# %sql ibm_db_sa://my-username:my-password@my-hostname:my-port/my-db-name
# Enter the connection string for your Db2 on Cloud database instance below
%sql ibm_db_sa://wls55462:2ls9wcg2s5l7n%4008@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
'Connected: wls55462@BLUDB'
Problems
Now write and execute SQL queries to solve assignment problems
Problem 1
Find the total number of crimes recorded in the CRIME table
%sql select COUNT(*) as TOTAL_CRIMES from CHICAGO_CRIME_DATA
| total_crimes |
|---|
| 533 |
Problem 2
Retrieve first 10 rows from the CRIME table
%sql SELECT * FROM CHICAGO_CRIME_DATA LIMIT 10
| id | case_number | DATE | block | iucr | primary_type | description | location_description | arrest | domestic | beat | district | ward | community_area_number | fbicode | x_coordinate | y_coordinate | YEAR | updatedon | latitude | longitude | location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3512276 | HK587712 | 08/28/2004 05:50:56 PM | 047XX S KEDZIE AVE | 890 | THEFT | FROM BUILDING | SMALL RETAIL STORE | FALSE | FALSE | 911 | 9 | 14 | 58 | 6 | 1155838 | 1873050 | 2004 | 02/10/2018 03:50:01 PM | 41.80744050 | -87.70395585 | (41.8074405, -87.703955849) |
| 3406613 | HK456306 | 06/26/2004 12:40:00 PM | 009XX N CENTRAL PARK AVE | 820 | THEFT | $500 AND UNDER | OTHER | FALSE | FALSE | 1112 | 11 | 27 | 23 | 6 | 1152206 | 1906127 | 2004 | 02/28/2018 03:56:25 PM | 41.89827996 | -87.71640551 | (41.898279962, -87.716405505) |
| 8002131 | HT233595 | 04/04/2011 05:45:00 AM | 043XX S WABASH AVE | 820 | THEFT | $500 AND UNDER | NURSING HOME/RETIREMENT HOME | FALSE | FALSE | 221 | 2 | 3 | 38 | 6 | 1177436 | 1876313 | 2011 | 02/10/2018 03:50:01 PM | 41.81593313 | -87.62464213 | (41.815933131, -87.624642127) |
| 7903289 | HT133522 | 12/30/2010 04:30:00 PM | 083XX S KINGSTON AVE | 840 | THEFT | FINANCIAL ID THEFT: OVER $300 | RESIDENCE | FALSE | FALSE | 423 | 4 | 7 | 46 | 6 | 1194622 | 1850125 | 2010 | 02/10/2018 03:50:01 PM | 41.74366532 | -87.56246276 | (41.743665322, -87.562462756) |
| 10402076 | HZ138551 | 02/02/2016 07:30:00 PM | 033XX W 66TH ST | 820 | THEFT | $500 AND UNDER | ALLEY | FALSE | FALSE | 831 | 8 | 15 | 66 | 6 | 1155240 | 1860661 | 2016 | 02/10/2018 03:50:01 PM | 41.77345530 | -87.70648047 | (41.773455295, -87.706480471) |
| 7732712 | HS540106 | 09/29/2010 07:59:00 AM | 006XX W CHICAGO AVE | 810 | THEFT | OVER $500 | PARKING LOT/GARAGE(NON.RESID.) | FALSE | FALSE | 1323 | 12 | 27 | 24 | 6 | 1171668 | 1905607 | 2010 | 02/10/2018 03:50:01 PM | 41.89644677 | -87.64493868 | (41.896446772, -87.644938678) |
| 10769475 | HZ534771 | 11/30/2016 01:15:00 AM | 050XX N KEDZIE AVE | 810 | THEFT | OVER $500 | STREET | FALSE | FALSE | 1713 | 17 | 33 | 14 | 6 | 1154133 | 1933314 | 2016 | 02/10/2018 03:50:01 PM | 41.97284491 | -87.70860008 | (41.972844913, -87.708600079) |
| 4494340 | HL793243 | 12/16/2005 04:45:00 PM | 005XX E PERSHING RD | 860 | THEFT | RETAIL THEFT | GROCERY FOOD STORE | TRUE | FALSE | 213 | 2 | 3 | 38 | 6 | 1180448 | 1879234 | 2005 | 02/28/2018 03:56:25 PM | 41.82387989 | -87.61350386 | (41.823879885, -87.613503857) |
| 3778925 | HL149610 | 01/28/2005 05:00:00 PM | 100XX S WASHTENAW AVE | 810 | THEFT | OVER $500 | STREET | FALSE | FALSE | 2211 | 22 | 19 | 72 | 6 | 1160129 | 1838040 | 2005 | 02/28/2018 03:56:25 PM | 41.71128051 | -87.68917910 | (41.711280513, -87.689179097) |
| 3324217 | HK361551 | 05/13/2004 02:15:00 PM | 033XX W BELMONT AVE | 820 | THEFT | $500 AND UNDER | SMALL RETAIL STORE | FALSE | FALSE | 1733 | 17 | 35 | 21 | 6 | 1153590 | 1921084 | 2004 | 02/28/2018 03:56:25 PM | 41.93929582 | -87.71092344 | (41.939295821, -87.710923442) |
Problem 3
How many crimes involve an arrest?
%sql SELECT COUNT(*) AS NUMBER_OF_ARRESTS \
FROM CHICAGO_CRIME_DATA \
WHERE ARREST = "TRUE"
| number_of_arrests |
|---|
| 163 |
Problem 4
What are the distinct kinds of crimes that have been uniquely recorded at the GAS STATION sites?
%sql SELECT DISTINCT PRIMARY_TYPE \
FROM CHICAGO_CRIME_DATA \
WHERE LOCATION_DESCRIPTION = 'GAS STATION'
| primary_type |
|---|
| CRIMINAL TRESPASS |
| NARCOTICS |
| ROBBERY |
| THEFT |
Hint: Which column lists types of crimes e.g. THEFT?
Problem 5
Enumerate all Community Areas in the CENUS_DATA table where their names begin with the letter ‘B’.
%sql SELECT COMMUNITY_AREA_NAME \
FROM CENSUS_DATA \
WHERE COMMUNITY_AREA_NAME LIKE 'B%'
| community_area_name |
|---|
| Belmont Cragin |
| Burnside |
| Brighton Park |
| Bridgeport |
| Beverly |
Problem 6
Which schools in Community Areas 10 to 15 are healthy school certified?
%sql SELECT NAME_OF_SCHOOL \
FROM CHICAGO_PUBLIC_SCHOOLS \
WHERE COMMUNITY_AREA_NUMBER BETWEEN 10 AND 15 \
AND HEALTHY_SCHOOL_CERTIFIED = 'Yes'
| name_of_school |
|---|
| Rufus M Hitch Elementary School |
Problem 7
What is the average school Safety Score?
%sql SELECT AVG(SAFETY_SCORE) AS AVERAGE \
FROM CHICAGO_PUBLIC_SCHOOLS
| average |
|---|
| 49.504873 |
Problem 8
The top five Community Areas are ranked by their average College Enrollment numbers.
%sql SELECT COMMUNITY_AREA_NAME, AVG(COLLEGE_ENROLLMENT) as Number_of_Students\
FROM CHICAGO_PUBLIC_SCHOOLS \
GROUP BY COMMUNITY_AREA_NAME \
ORDER BY Number_of_Students DESC LIMIT 5
| community_area_name | number_of_students |
|---|---|
| ARCHER HEIGHTS | 2411.500000 |
| MONTCLARE | 1317.000000 |
| WEST ELSDON | 1233.333333 |
| BRIGHTON PARK | 1205.875000 |
| BELMONT CRAGIN | 1198.833333 |
Problem 9
Apply a sub-query to be determined by the Community Area with the lowest value for school Safety Score.
%sql SELECT COMMUNITY_AREA_NAME, SAFETY_SCORE \
FROM CHICAGO_PUBLIC_SCHOOLS \
WHERE SAFETY_SCORE = \
(SELECT MIN(SAFETY_SCORE) FROM CHICAGO_PUBLIC_SCHOOLS)
| community_area_name | safety_score |
|---|---|
| WASHINGTON PARK | 1 |
Problem 10
Calculate the Per Capita Income for a Community Area that has a School Safety Score rated at 1.
%sql SELECT CD.COMMUNITY_AREA_NAME, CD.PER_CAPITA_INCOME \
FROM CENSUS_DATA CD, CHICAGO_PUBLIC_SCHOOLS CPS \
WHERE CD.COMMUNITY_AREA_NUMBER = CPS.COMMUNITY_AREA_NUMBER \
AND CPS.SAFETY_SCORE = 1
| community_area_name | per_capita_income |
|---|---|
| Washington Park | 13785 |
