Advertisement

Coursera | Databases and SQL for Data Science(IBM) | Assignment

阅读量:

其他资源:
Coursera课程|数据库与SQL数据分析(IBM)|练习题解答

Coursera | Databases and SQL for Data Science | Assignment

  • 课程作业
    • 理解数据集
      • 芝加哥的社会经济指标
        • 芝加哥公立学校
        • 芝加哥犯罪数据

以下是对输入文本的同义改写版本

以下是对输入文本的同义改写版本

以上为改写后的文本内容

这个assignment也不算太难。除了最初的数据导入步骤外,其余的部分与quiz类似

!!!:
这里是在向您介绍如何导入本地数据库。具体来说,请按照教程设置好所需表结构及凭证信息,并确保所有设置均为本地环境参数!无需过多说明的是,默认情况下直接运行该操作应该就能顺利完成。请注意,在此过程中请确保所有设置均为本地环境参数!后续SQL语句的操作逻辑保持不变!😊

Course Assignment

Using this Python notebook you will:

  1. Grasp the details of three datasets from Chicago
  2. Import three datasets into three tables across a Db2 database
  3. Run SQL queries to address assignment-related queries

Understand the datasets

To finish the assignment issues in this notebook, you will access three data sources located on the City of Chicago's Data Portal platform.

  1. 数据集:社会经济指标在芝加哥
  2. 数据集:芝加哥公立学校进展报告卡(2011年)
  3. 数据集:芝加哥犯罪数据(从2001年至今)

Socioeconomic Indicators in Chicago

This dataset includes a collection of six socioeconomic metrics of public health importance in each Chicago community area from 2008 to 2012.

For this exercise, you are to be using a snapshot of this dataset, which can be accessed via:

A detailed description of this dataset and its original data can be retrieved from the Chicago Data Portal located at: https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2

Chicago Public Schools

This dataset includes all school-level performance data that were utilized to generate CPS School Report Cards for the 2011-2012 academic year. The dataset is made available by the Chicago Data Portal.

In this assignment, you will utilize a sample of this dataset, which is available for download from: Chicago Public School

The primary and foundational datasets, along with their comprehensive overview, are accessible through the Chicago Data Portal at https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ.

Chicago Crime Data

该数据集反映了自2001年以来在芝加哥发生的犯罪事件报告(除谋杀案件外,在每一起受害者的资料中存在的情况下)),不包括最近七天的数据。

This dataset is comprehensively extensive, exceeding 1.5 gigabytes in size and containing over 6.5 million entries. For the purposes of this assignment, we will utilize a significantly reduced sample from this dataset, which can be downloaded from: Chicago Crime Data.

A comprehensive overview of the specific data collection and its original source can be accessed via the Chicago Data Portal at: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2

Download the datasets

In numerous instances, where data analysis is required, datasets are frequently accessible in .CSV format, particularly online. By clicking on the links provided below, you can conveniently access and store these .CSV files for further processing.

CENSUS_DATA: Census Dataset

CHICAGO_PUBLIC SCHOOL SYSTEM High-Performing Public Schools in Chicago

The Chicago Crime Data Repository provides access to the comprehensive dataset detailing crime statistics and incidents across the city of Chicago. (https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera/data/Chicago_Crime_Data-v2.csv)

NOTE: 请注意,请确保您使用上方提供的链接下载数据集而非直接从芝加哥数据港下载。这些链接中的版本是原始数据集的子集,并且对某些列名进行了修改以使其更适合数据库环境,这将使完成此任务更容易。

Store the datasets in database tables

To analyze the data using SQL, it first needs to be stored in the database.

While importing the dataset into a Pandas DataFrame and subsequently writing back to the database, as demonstrated in our Week 3 Lab exercise, this approach leads to mapping data types that might not be ideal for efficient SQL operations. For instance, a lengthy textual field could end up being stored as a CLOB rather than a VARCHAR.

Therefore, it is strongly advised to perform a manual load of the table using the database console interface, as detailed in Week 2 Lab 1 Part II. The primary variation from this lab compared to Week 2 Lab 1 Part II lies in Step 5 of the instructions, where you are required to click on create "(+) New Table" and specify the desired table name before proceeding.

启动Db2控制台,并导入LOAD工具,随后拖拽包含第一组数据的CSV文件。然后创建一个新的数据表,请按照屏幕上的操作步骤完成数据加载。请为新创建的表命名为以下名称:

  1. CENSUS_DATA
  2. CHICAGO_PUBLIC_SCHOOLS
  3. CHICAGO_CRIME_DATA

Connect to the database

First, we need to install or enable the SQL extension. Then, we should create a session with the database.

复制代码
    %load_ext sql
复制代码
    The sql extension is already loaded. To reload it, use:
      %reload_ext sql

Next, input your DB2 connection string into the following cell. Remember that you previously set up Service Credentials for your DB2 instance during Week 3’s first lab. Refer to these credentials’ username field and enter everything after db2:// (excluding any trailing double quotes) into the cell below, following ibm_db_sa:// format.

复制代码
    # Remember the connection string is of the format:
    # %sql ibm_db_sa://my-username:my-password@my-hostname:my-port/my-db-name
    # Enter the connection string for your Db2 on Cloud database instance below
    %sql ibm_db_sa://wls55462:2ls9wcg2s5l7n%4008@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB
复制代码
    'Connected: wls55462@BLUDB'

Problems

Now write and execute SQL queries to solve assignment problems

Problem 1

Find the total number of crimes recorded in the CRIME table

复制代码
    %sql select COUNT(*) as TOTAL_CRIMES from CHICAGO_CRIME_DATA
total_crimes
533

Problem 2

Retrieve first 10 rows from the CRIME table

复制代码
    %sql SELECT * FROM CHICAGO_CRIME_DATA LIMIT 10
id case_number DATE block iucr primary_type description location_description arrest domestic beat district ward community_area_number fbicode x_coordinate y_coordinate YEAR updatedon latitude longitude location
3512276 HK587712 08/28/2004 05:50:56 PM 047XX S KEDZIE AVE 890 THEFT FROM BUILDING SMALL RETAIL STORE FALSE FALSE 911 9 14 58 6 1155838 1873050 2004 02/10/2018 03:50:01 PM 41.80744050 -87.70395585 (41.8074405, -87.703955849)
3406613 HK456306 06/26/2004 12:40:00 PM 009XX N CENTRAL PARK AVE 820 THEFT $500 AND UNDER OTHER FALSE FALSE 1112 11 27 23 6 1152206 1906127 2004 02/28/2018 03:56:25 PM 41.89827996 -87.71640551 (41.898279962, -87.716405505)
8002131 HT233595 04/04/2011 05:45:00 AM 043XX S WABASH AVE 820 THEFT $500 AND UNDER NURSING HOME/RETIREMENT HOME FALSE FALSE 221 2 3 38 6 1177436 1876313 2011 02/10/2018 03:50:01 PM 41.81593313 -87.62464213 (41.815933131, -87.624642127)
7903289 HT133522 12/30/2010 04:30:00 PM 083XX S KINGSTON AVE 840 THEFT FINANCIAL ID THEFT: OVER $300 RESIDENCE FALSE FALSE 423 4 7 46 6 1194622 1850125 2010 02/10/2018 03:50:01 PM 41.74366532 -87.56246276 (41.743665322, -87.562462756)
10402076 HZ138551 02/02/2016 07:30:00 PM 033XX W 66TH ST 820 THEFT $500 AND UNDER ALLEY FALSE FALSE 831 8 15 66 6 1155240 1860661 2016 02/10/2018 03:50:01 PM 41.77345530 -87.70648047 (41.773455295, -87.706480471)
7732712 HS540106 09/29/2010 07:59:00 AM 006XX W CHICAGO AVE 810 THEFT OVER $500 PARKING LOT/GARAGE(NON.RESID.) FALSE FALSE 1323 12 27 24 6 1171668 1905607 2010 02/10/2018 03:50:01 PM 41.89644677 -87.64493868 (41.896446772, -87.644938678)
10769475 HZ534771 11/30/2016 01:15:00 AM 050XX N KEDZIE AVE 810 THEFT OVER $500 STREET FALSE FALSE 1713 17 33 14 6 1154133 1933314 2016 02/10/2018 03:50:01 PM 41.97284491 -87.70860008 (41.972844913, -87.708600079)
4494340 HL793243 12/16/2005 04:45:00 PM 005XX E PERSHING RD 860 THEFT RETAIL THEFT GROCERY FOOD STORE TRUE FALSE 213 2 3 38 6 1180448 1879234 2005 02/28/2018 03:56:25 PM 41.82387989 -87.61350386 (41.823879885, -87.613503857)
3778925 HL149610 01/28/2005 05:00:00 PM 100XX S WASHTENAW AVE 810 THEFT OVER $500 STREET FALSE FALSE 2211 22 19 72 6 1160129 1838040 2005 02/28/2018 03:56:25 PM 41.71128051 -87.68917910 (41.711280513, -87.689179097)
3324217 HK361551 05/13/2004 02:15:00 PM 033XX W BELMONT AVE 820 THEFT $500 AND UNDER SMALL RETAIL STORE FALSE FALSE 1733 17 35 21 6 1153590 1921084 2004 02/28/2018 03:56:25 PM 41.93929582 -87.71092344 (41.939295821, -87.710923442)

Problem 3

How many crimes involve an arrest?

复制代码
    %sql SELECT COUNT(*) AS NUMBER_OF_ARRESTS \
    FROM CHICAGO_CRIME_DATA \
    WHERE ARREST = "TRUE"
number_of_arrests
163

Problem 4

What are the distinct kinds of crimes that have been uniquely recorded at the GAS STATION sites?

复制代码
    %sql SELECT DISTINCT PRIMARY_TYPE \
    FROM CHICAGO_CRIME_DATA \
    WHERE LOCATION_DESCRIPTION = 'GAS STATION'
primary_type
CRIMINAL TRESPASS
NARCOTICS
ROBBERY
THEFT

Hint: Which column lists types of crimes e.g. THEFT?

Problem 5

Enumerate all Community Areas in the CENUS_DATA table where their names begin with the letter ‘B’.

复制代码
    %sql SELECT COMMUNITY_AREA_NAME \
    FROM CENSUS_DATA \
    WHERE COMMUNITY_AREA_NAME LIKE 'B%'
community_area_name
Belmont Cragin
Burnside
Brighton Park
Bridgeport
Beverly

Problem 6

Which schools in Community Areas 10 to 15 are healthy school certified?

复制代码
    %sql SELECT NAME_OF_SCHOOL \
    FROM CHICAGO_PUBLIC_SCHOOLS \
    WHERE COMMUNITY_AREA_NUMBER BETWEEN 10 AND 15 \
    AND HEALTHY_SCHOOL_CERTIFIED = 'Yes'
name_of_school
Rufus M Hitch Elementary School

Problem 7

What is the average school Safety Score?

复制代码
    %sql SELECT AVG(SAFETY_SCORE) AS AVERAGE \
    FROM CHICAGO_PUBLIC_SCHOOLS
average
49.504873

Problem 8

The top five Community Areas are ranked by their average College Enrollment numbers.

复制代码
    %sql SELECT COMMUNITY_AREA_NAME, AVG(COLLEGE_ENROLLMENT) as Number_of_Students\
    FROM CHICAGO_PUBLIC_SCHOOLS \
    GROUP BY COMMUNITY_AREA_NAME \
    ORDER BY Number_of_Students DESC LIMIT 5
community_area_name number_of_students
ARCHER HEIGHTS 2411.500000
MONTCLARE 1317.000000
WEST ELSDON 1233.333333
BRIGHTON PARK 1205.875000
BELMONT CRAGIN 1198.833333

Problem 9

Apply a sub-query to be determined by the Community Area with the lowest value for school Safety Score.

复制代码
    %sql SELECT COMMUNITY_AREA_NAME, SAFETY_SCORE \
    FROM CHICAGO_PUBLIC_SCHOOLS \
    WHERE SAFETY_SCORE = \
    (SELECT MIN(SAFETY_SCORE) FROM CHICAGO_PUBLIC_SCHOOLS)
community_area_name safety_score
WASHINGTON PARK 1

Problem 10

Calculate the Per Capita Income for a Community Area that has a School Safety Score rated at 1.

复制代码
    %sql SELECT CD.COMMUNITY_AREA_NAME, CD.PER_CAPITA_INCOME \
    FROM CENSUS_DATA CD, CHICAGO_PUBLIC_SCHOOLS CPS \
    WHERE CD.COMMUNITY_AREA_NUMBER = CPS.COMMUNITY_AREA_NUMBER \
    AND CPS.SAFETY_SCORE = 1
community_area_name per_capita_income
Washington Park 13785

全部评论 (0)

还没有任何评论哟~