
2-Introduction-to-Relational-Databases-in-SQL


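Contents

  • 1. Your First Database
    • 1.1 Introduction to Relational Databases (video)
    • 1.2 Attributes of Relational Databases
    • 1.3 Query Information_Schema with SELECT
    • 1.4 Tables: At the Core of Every Database (video)
    • 1.5 CREATE Your First Few TABLEs
    • 1.6 ADD a COLUMN with ALTER TABLEs
    • 1.7 Update Your Database as the Structure Changes (video)
    • 1.8 RENAME and DROP COLUMNs in Affiliations
    • 1.9 Migrate Data with INSERT INTO SELECT DISTINCT
    • 1.10 Delete tables with DROP TABLE
  • 2. Enforce Data Consistency with Attribute Constraints
    • 2.1 Better Data Quality with Constraints (video)
    • 2.2 Types of Database Constraints
    • 2.3 Conforming with Data Types
    • 2.4 Type CASTs
    • 2.5 Working with Data Types (video)
    • 2.6 Change Types with ALTER COLUMN
    • 2.7 Convert Types USING a Function
    • 2.8 The Not-Null and Unique Constraints (video)
    • 2.9 Disallow NULL values with SET NOT NULL
    • 2.10 What Happens If You Try to Enter NULLs?
    • 2.11 Make Your Columns UNIQUE with ADD CONSTRAINT
  • 3. Uniquely Identify Records with Key Constraints
    • 3.1 Keys and Superkeys (video)
    • 3.2 Get to Know SELECT COUNT DISTINCT
    • 3.3 Identify Keys with SELECT COUNT DISTINCT
    • 3.4 Primary Keys (video)
    • 3.5 Identify the Primary Key
    • 3.6 ADD Key CONSTRAINTs to the Tables
    • 3.7 Surrogate Keys (video)
    • 3.8 ADD A SERIAL Surrogate Key
    • 3.9 CONCATenate Columns to A Surrogate Key
    • 3.10 Test Your Knowledge before Advancing
  • 4. Glue Together Tables with Foreign Keys
    • 4.1 Model 1:N Relationships with Foreign Keys (video)
    • 4.2 REFERENCE A Table with A FOREIGN KEY
    • 4.3 Explore Foreign Key Constraints
    • 4.4 JOIN Tables Linked by A Foreign Key
    • 4.5 Model More Complex Relationships (video)
    • 4.6 Add Foreign Keys to the “Affiliations” Table
    • 4.7 Populate the “professor_id” Column
    • 4.8 Drop “firstname” and “lastname”
    • 4.9 Referential Integrity (video)
    • 4.10 Referential Integrity Violations
    • 4.11 Change the Referential Integrity Behavior of A Key
    • 4.12 Roundup (video)
    • 4.13 Count Affiliations Per University
    • 4.14 Join All the Tables Together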

1. Your First Database

1.1 Introduction to Relational Databases (video)

1.2 Attributes of Relational Databases

1.3 Query Information_Schema with SELECT

information_schema is a meta-database that holds information about your current database. It contains multiple tables that you can query with the familiar SELECT * FROM syntax:

  • tables: information about all tables in your current database
  • columns: information about all columns in all of the tables in your current database

In this exercise, you only need information from the 'public' schema, which is stored in the table_schema column of both the tables and columns tables. The 'public' schema holds information about user-defined tables and databases. The other values of table_schema hold system information; in this course, you are only interested in user-defined objects.

Instruction 1: Get the names of all tables in the current database, limiting your query to the 'public' table_schema.

    -- Query the right table in information_schema
    SELECT table_name 
    FROM information_schema.tables
    -- Specify the correct table_schema value
    WHERE table_schema = 'public';

Instruction 2: Now have a look at the columns of university_professors by selecting the entries in information_schema.columns that belong to that table.

    -- Query the right table in information_schema to get columns
    SELECT column_name, data_type 
    FROM information_schema.columns 
    WHERE table_name = 'university_professors' AND table_schema = 'public';

Instruction 3: Finally, print the first five rows of the university_professors table.

    -- Query the first five rows of our table
    SELECT * 
    FROM university_professors 
    LIMIT 5;
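Besides column_name and data_type, information_schema.columns also reports details such as character_maximum_length and is_nullable. A minimal sketch, assuming a hypothetical table demo_people that is created in the public schema only for illustration (it is not part of the course database and can be dropped afterwards):

```sql
-- Hypothetical demo table, not part of the course database
DROP TABLE IF EXISTS public.demo_people;
CREATE TABLE public.demo_people (
 firstname varchar(64) NOT NULL,
 lastname text
);

-- One row per column, in the order the columns were declared
SELECT column_name, data_type, character_maximum_length, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'demo_people'
ORDER BY ordinal_position;
```

For firstname this reports character varying, 64 and NO; for lastname it reports text, a NULL length and YES.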

1.4 Tables: At the Core of Every Database (video)

1.5 CREATE Your First Few TABLEs

You will now start implementing a better database model. For this, you will create tables for the professors and universities entity types. The other tables have already been created for you.

The syntax for creating simple tables is as follows:

    CREATE TABLE table_name (
     column_a data_type,
     column_b data_type,
     column_c data_type
    );

Attention: Table and column names, as well as data types, don't need to be surrounded by quotation marks.

Instruction 1: Create a table professors with two text columns: firstname and lastname.

    -- Create a table for the professors entity type
    CREATE TABLE professors (
     firstname text,
     lastname text
    );
    
    -- Print the contents of this table
    SELECT * 
    FROM professors;

Instruction 2: Create a table universities with three text columns: university_shortname, university, and university_city.

    -- Create a table for the universities entity type
    CREATE TABLE universities (
     university_shortname text,
     university text,
     university_city text
    );
    
    -- Print the contents of this table
    SELECT * 
    FROM universities;

1.6 ADD a COLUMN with ALTER TABLEs

Oops! We forgot to add the university_shortname column to the professors table. You may have already noticed it in the diagram:

[Figure: the professors entity type with the attributes firstname, lastname, and university_shortname]

In chapter 4 of this course, you will need this column to connect the professors table with the universities table.

Fortunately, adding a column to an existing table is easy, especially while the table is still empty.

To add columns you can use the following SQL query:

    ALTER TABLE table_name
    ADD COLUMN column_name data_type;

Instruction: Alter professors to add the text column university_shortname.

    -- Add the university_shortname column
    ALTER TABLE professors
    ADD COLUMN university_shortname text;
    
    -- Print the contents of this table
    SELECT * 
    FROM professors;

1.7 Update Your Database as the Structure Changes (video)

1.8 RENAME and DROP COLUMNs in Affiliations

As discussed in the video, the affiliations table still has some flaws, which you will fix in this exercise.

You’ll use the following queries:

  • To rename columns:
    ALTER TABLE table_name
    RENAME COLUMN old_name TO new_name;
  • To delete columns:
    ALTER TABLE table_name
    DROP COLUMN column_name;

Instruction 1: Rename the organisation column to organization in affiliations.

    -- Rename the organisation column
    ALTER TABLE affiliations
    RENAME COLUMN organisation TO organization;

Instruction 2:
Delete the university_shortname column in affiliations.

    -- Delete the university_shortname column
    ALTER TABLE affiliations
    DROP COLUMN university_shortname;

1.9 Migrate Data with INSERT INTO SELECT DISTINCT

Now it's finally time to migrate the data into the new tables. You will use the following pattern:

    INSERT INTO ... 
    SELECT DISTINCT ... 
    FROM ...;

It can be broken up into two parts:

First part:

    SELECT DISTINCT column_name1, column_name2, ... 
    FROM table_a;

This selects all distinct values in table table_a – nothing new for you.

Second part:

    INSERT INTO table_b ...;

Put this part in front of the first one: the rows returned by SELECT DISTINCT are then inserted into table_b.

One last tip: fill in all the blanks before running the code, because all of the code is executed at once.

Instruction 1:

  • Insert all DISTINCT professors from university_professors into professors.
  • Print all the rows in professors.

    -- Insert unique professors into the new table
    INSERT INTO professors 
    SELECT DISTINCT firstname, lastname, university_shortname 
    FROM university_professors;
    
    -- Doublecheck the contents of professors
    SELECT * 
    FROM professors;

Instruction 2: Insert all DISTINCT affiliations from university_professors into affiliations.

    -- Insert unique affiliations into the new table
    INSERT INTO affiliations 
    SELECT DISTINCT firstname, lastname, function, organization 
    FROM university_professors;
    
    -- Doublecheck the contents of affiliations
    SELECT * 
    FROM affiliations;
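To see what the two parts of INSERT INTO ... SELECT DISTINCT do together, here is a minimal sketch on hypothetical temporary tables (demo_prof_source and demo_prof_target are illustrative names, not course tables). Because no column list follows the target table name, INSERT fills the target columns by position, so the SELECT list has to follow the target's column order.

```sql
-- Hypothetical source table containing one duplicated row
DROP TABLE IF EXISTS demo_prof_source;
DROP TABLE IF EXISTS demo_prof_target;
CREATE TEMP TABLE demo_prof_source (firstname text, lastname text, university_shortname text);
INSERT INTO demo_prof_source VALUES
 ('Anna', 'Muster', 'ETH'),
 ('Anna', 'Muster', 'ETH'),
 ('Ben', 'Keller', 'UZH');

-- Hypothetical target table with the same column order
CREATE TEMP TABLE demo_prof_target (firstname text, lastname text, university_shortname text);

-- SELECT DISTINCT collapses the duplicate; INSERT INTO appends the remaining rows
INSERT INTO demo_prof_target
SELECT DISTINCT firstname, lastname, university_shortname
FROM demo_prof_source;

SELECT * FROM demo_prof_target;  -- 2 rows
```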

1.10 Delete tables with DROP TABLE

The university_professors table is now no longer needed and can be safely removed.

For table deletion, you can use the simple command:

    DROP TABLE table_name;

Instruction:
Delete the university_professors table.

    -- Delete the university_professors table
    DROP TABLE university_professors;

2. Enforce Data Consistency with Attribute Constraints

2.1 Better Data Quality with Constraints (video)

2.2 Types of Database Constraints

2.3 Conforming with Data Types

For demonstration purposes, a fictional database table has been created that contains three columns of type date, integer, and text, respectively.

    CREATE TABLE transactions (
     transaction_date date, 
     amount integer,
     fee text
    );

Have a look at the contents of the transactions table.

The transaction_date column accepts date values. According to the PostgreSQL documentation, it accepts input in forms such as YYYY-MM-DD, DD/MM/YY, and so on; how an ambiguous form such as 01/02/03 is read depends on the DateStyle setting.

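The columns amount and fee both appear to be numeric, but only amount is modeled as integer, while fee is modeled as text. You will deal with that in the next exercise.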

Instruction:

  • Run the provided sample code.
  • It doesn't work, so have a look at the error message, correct the statement accordingly, and execute it again.

    -- Let's add a record to the table
    INSERT INTO transactions (transaction_date, amount, fee) 
    VALUES ('2018-09-24', 5454, '30');
    
    -- Doublecheck the contents
    SELECT *
    FROM transactions;

2.4 Type CASTs

In the video, you saw that type casts are a possible solution for data type issues. If you know that a certain column stores numbers as text, you can cast the column to a numeric form, for example to integer.

    SELECT CAST(some_column AS integer)
    FROM table;

Now some_column is temporarily represented as integer instead of text, which means that you can perform numeric calculations on it.

Instruction:

Execute the given sample code. Because it doesn't work, add an integer type cast at the right place and execute it again.

    -- Calculate the net amount as amount + fee
    SELECT transaction_date, amount + CAST(fee AS integer) AS net_amount  
    FROM transactions;
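A minimal, self-contained sketch of the same cast on a hypothetical temporary stand-in for the transactions table (demo_transactions is an illustrative name). Besides the standard CAST(... AS ...) syntax, PostgreSQL also accepts the shorthand value::type; both expressions below compute the same sum.

```sql
-- Hypothetical stand-in for transactions, with fee stored as text
DROP TABLE IF EXISTS demo_transactions;
CREATE TEMP TABLE demo_transactions (transaction_date date, amount integer, fee text);
INSERT INTO demo_transactions VALUES ('2018-09-24', 5454, '30');

SELECT transaction_date,
       amount + CAST(fee AS integer) AS net_cast,      -- standard SQL: 5484
       amount + fee::integer         AS net_shorthand  -- PostgreSQL shorthand: 5484
FROM demo_transactions;
```

The cast only succeeds if every fee value is a valid integer literal; a value such as '30.5' or 'n/a' makes the whole query fail with an "invalid input syntax" error.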

2.5 Working with Data Types (video)

2.6 Change Types with ALTER COLUMN

Changing the data type of a column is simple. The following code changes the data type of column_name in table_name to varchar(10):

    ALTER TABLE table_name
    ALTER COLUMN column_name
    TYPE varchar(10);

Now it’s time to start adding constraints to your database.

Instruction 1: Examine all unique university_shortname values in the professors table, noting the length of each string.

    -- Select the university_shortname column
    SELECT DISTINCT(university_shortname) 
    FROM professors;

Instruction 2: Now specify a fixed-length character type with the correct length for university_shortname.

    -- Specify the correct fixed-length character type
    ALTER TABLE professors
    ALTER COLUMN university_shortname
    TYPE char(3);

Instruction 3:
Change the type of the firstname column to varchar(64).

    -- Change the type of firstname
    ALTER TABLE professors
    ALTER COLUMN firstname
    TYPE varchar(64);

2.7 Convert Types USING a Function

If you don't want to reserve too much space for a certain varchar column, you can truncate its values before converting its type.

For this, you can use the following syntax:

    ALTER TABLE table_name
    ALTER COLUMN column_name
    TYPE varchar(x)
    USING SUBSTRING(column_name FROM 1 FOR x);

It works as follows: because you want to reserve only x characters for column_name, you take a substring of every value, namely its first x characters, and throw away the rest. This way, all values fit into varchar(x).

Instruction:

  • Execute the sample code without modification and pay attention to any errors.
  • Now use the SUBSTRING() function to reduce firstname to its first 16 characters, so that its type can be altered to varchar(16).

    -- Convert the values in firstname to a max. of 16 characters
    ALTER TABLE professors 
    ALTER COLUMN firstname 
    TYPE varchar(16)
    USING SUBSTRING(firstname FROM 1 FOR 16);
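The following sketch replays the situation on a hypothetical temporary table demo_names with made-up values. Converting text to varchar(16) without USING would raise a "value too long" error for the 22-character name; with USING, each value is cut to its first 16 characters before the conversion.

```sql
-- Hypothetical demo table with one name longer than 16 characters
DROP TABLE IF EXISTS demo_names;
CREATE TEMP TABLE demo_names (firstname text);
INSERT INTO demo_names VALUES ('Maximilianus-Alexander'), ('Eva');

-- Without the USING clause this statement would fail,
-- because 'Maximilianus-Alexander' has 22 characters
ALTER TABLE demo_names
ALTER COLUMN firstname
TYPE varchar(16)
USING SUBSTRING(firstname FROM 1 FOR 16);

SELECT firstname, length(firstname) FROM demo_names;
-- 'Maximilianus-Ale' (16) and 'Eva' (3)
```

Truncation discards the cut-off characters for good, so it can be worth checking beforehand, e.g. with max(length(firstname)), how many values are affected.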

2.8 The Not-Null and Unique Constraints (video)

2.9 Disallow NULL values with SET NOT NULL

The professors table is almost ready. However, it still allows NULL values to be entered. Even though some information about individual professors might be missing, some columns certainly always need to be specified.

Instruction 1:
Add a not-null constraint for the firstname column.

    -- Disallow NULL values in firstname
    ALTER TABLE professors 
    ALTER COLUMN firstname SET NOT NULL;

Instruction 2:
Add a not-null constraint for the lastname column.

    -- Disallow NULL values in lastname
    ALTER TABLE professors 
    ALTER COLUMN lastname SET NOT NULL;

2.10 What Happens If You Try to Enter NULLs?

Execute the following statement:

    INSERT INTO professors (firstname, lastname, university_shortname)
    VALUES (NULL, 'Miller', 'ETH');

Why does this throw an error?

Because it violates the not-null constraint you just specified on firstname.
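To watch the constraint at work without aborting a longer script, the sketch below traps the error in a PL/pgSQL DO block. demo_prof_nn is a hypothetical temporary table; not_null_violation is PostgreSQL's built-in condition name for SQLSTATE 23502. Note also that SET NOT NULL itself fails if the column already contains NULL values.

```sql
-- Hypothetical demo table mirroring professors
DROP TABLE IF EXISTS demo_prof_nn;
CREATE TEMP TABLE demo_prof_nn (firstname text, lastname text, university_shortname char(3));
ALTER TABLE demo_prof_nn ALTER COLUMN firstname SET NOT NULL;

DO $$
BEGIN
  INSERT INTO demo_prof_nn (firstname, lastname, university_shortname)
  VALUES (NULL, 'Miller', 'ETH');
  RAISE NOTICE 'insert succeeded (unexpected)';
EXCEPTION WHEN not_null_violation THEN
  -- The INSERT is rolled back, so the table stays empty
  RAISE NOTICE 'rejected: firstname must not be NULL';
END $$;
```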

2.11 Make Your Columns UNIQUE with ADD CONSTRAINT

As seen in the video, you add the UNIQUE keyword after the definition of the column that should be unique. This only works for new tables:

    CREATE TABLE table_name (
     column_name data_type UNIQUE
    );

If you want to add a unique constraint to an existing table, you do it like this:

    ALTER TABLE table_name
    ADD CONSTRAINT some_name UNIQUE(column_name);

Note that this is different from the ALTER COLUMN syntax for the not-null constraint. Also, you have to give the constraint a name, some_name.

Instruction 1: Add a unique constraint to the university_shortname column in universities. Give it the name university_shortname_unq.

    -- Make universities.university_shortname unique
    ALTER TABLE universities
    ADD CONSTRAINT university_shortname_unq UNIQUE(university_shortname);

Instruction 2:
Add a unique constraint to the organization column in organizations. Give it the name organization_unq.

    -- Make organizations.organization unique
    ALTER TABLE organizations
    ADD CONSTRAINT organization_unq UNIQUE(organization);
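A minimal sketch of what such a constraint rejects, run on a hypothetical temporary table demo_universities rather than on the course table. A second row with an existing university_shortname raises unique_violation (SQLSTATE 23505), which the DO block traps. By default, PostgreSQL still accepts several NULL values in a UNIQUE column, because NULLs are not considered equal to each other.

```sql
-- Hypothetical demo table, not the course's universities table
DROP TABLE IF EXISTS demo_universities;
CREATE TEMP TABLE demo_universities (university_shortname char(3), university text);
ALTER TABLE demo_universities
ADD CONSTRAINT demo_university_shortname_unq UNIQUE(university_shortname);

INSERT INTO demo_universities VALUES ('ETH', 'ETH Zurich');

DO $$
BEGIN
  -- The same short name a second time
  INSERT INTO demo_universities VALUES ('ETH', 'Duplicate entry');
EXCEPTION WHEN unique_violation THEN
  RAISE NOTICE 'rejected: university_shortname ETH already exists';
END $$;
```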

3. Uniquely Identify Records with Key Constraints

3.1 Keys and Superkeys (video)

3.2 Get to Know SELECT COUNT DISTINCT

Your database doesn't have any defined keys so far, and you don't know yet which columns or combinations of columns are suited as keys.

There is a simple way of finding out whether a certain column (or a combination of columns) contains only unique values and thus identifies the records in the table.

You already know the SELECT DISTINCT query from the first chapter. Now you just have to wrap everything within the COUNT() function, and PostgreSQL will return the number of unique rows for the given columns:

    SELECT COUNT(DISTINCT(column_a, column_b, ...))
    FROM table;

Instruction 1:
First, find out the number of rows in universities.

    -- Count the number of rows in universities
    SELECT COUNT(*) 
    FROM universities;

Instruction 2:
Then, find out how many unique values there are in the university_city column.

    -- Count the number of distinct values in the university_city column
    SELECT COUNT(DISTINCT(university_city)) 
    FROM universities;

3.3 Identify Keys with SELECT COUNT DISTINCT

There is a very basic way of finding out what qualifies as a key in an existing, populated table:

  1. Count the distinct records for all possible combinations of columns. If the resulting number x equals the number of all rows in the table for a combination, you have discovered a superkey.
  2. Then remove one column after another until you can no longer remove columns without seeing the number x decrease. At that point, you have discovered a (candidate) key.

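The professors table has 551 rows. It has only one possible candidate key, which is a combination of two attributes. You might want to try different combinations using the "Run code" button. Once you have found the solution, submit your answer.

Instruction:
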
    -- Try out different combinations
    SELECT COUNT(DISTINCT(firstname, lastname)) 
    FROM professors;
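The two-step procedure above can be tried on a tiny hypothetical temporary table demo_candidates with three made-up rows. The comments show the counts this data produces; with the real professors table you would compare against its 551 rows instead.

```sql
-- Hypothetical demo table with made-up rows
DROP TABLE IF EXISTS demo_candidates;
CREATE TEMP TABLE demo_candidates (firstname text, lastname text, university_shortname char(3));
INSERT INTO demo_candidates VALUES
 ('Anna', 'Muster', 'ETH'),
 ('Anna', 'Keller', 'ETH'),
 ('Ben', 'Muster', 'ETH');

SELECT COUNT(*) AS n_rows,                                                       -- 3
       COUNT(DISTINCT (firstname, lastname, university_shortname)) AS all_three, -- 3: superkey
       COUNT(DISTINCT (firstname, lastname)) AS first_last,                      -- 3: still a superkey
       COUNT(DISTINCT firstname) AS first_only,                                  -- 2: too few
       COUNT(DISTINCT lastname) AS last_only                                     -- 2: too few
FROM demo_candidates;
```

Removing either column from {firstname, lastname} makes the count drop, so {firstname, lastname} is a candidate key for this data. Keep in mind that the counts only describe the rows stored right now; they cannot guarantee that a future row will not break uniqueness.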

3.4 Primary Keys (video)

3.5 Identify the Primary Key

Examine the sample table from the earlier video. As a database designer, you must choose wisely which column will be designated as the primary key.

license_no          | serial_no | make       | model   | year
--------------------|-----------|------------|---------|-----
Texas ABC-739       | A69352    | Ford       | Mustang | 2
Florida TVP-347     | B43696    | Oldsmobile | Cutlass | 5
New York MPO-22     | X83554    | Oldsmobile | Delta   | 1
California 432-TFY  | C43742    | Mercedes   | 190-D   | 99
California RSK-629  | Y82935    | Toyota     | Camry   | 4
Texas RSK-629       | U028365   | Jaguar     | XJS     | 4

Which column or combination of columns is best suited as the primary key?

Answer: PK = {license_no}

3.6 ADD Key CONSTRAINTs to the Tables

Two tables in your database already have well-suited candidate keys consisting of one column each: organizations and universities, with the organization and university_shortname columns, respectively.

In this exercise, you will rename these columns to id using the RENAME COLUMN command and then specify primary key constraints for them. This is as straightforward as adding unique constraints (see the last exercise of Chapter 2):

    ALTER TABLE table_name
    ADD CONSTRAINT some_name PRIMARY KEY (column_name);

Note that you can also specify more than one column in the brackets.

Instruction 1:

  • Rename the organization column to id in organizations.
  • Make id a primary key and name it organization_pk.

    -- Rename the organization column to id
    ALTER TABLE organizations
    RENAME COLUMN organization TO id;
    
    -- Make id a primary key
    ALTER TABLE organizations
    ADD CONSTRAINT organization_pk PRIMARY KEY (id);

Instruction 2:

  • Rename the university_shortname column to id in universities.
  • Make id a primary key and name it university_pk.

    -- Rename the university_shortname column to id
    ALTER TABLE universities
    RENAME COLUMN university_shortname TO id;
    
    -- Make id a primary key
    ALTER TABLE universities
    ADD CONSTRAINT university_pk PRIMARY KEY (id);
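Since the brackets after PRIMARY KEY may list more than one column, here is a minimal sketch of a composite primary key on a hypothetical temporary table demo_enrollments (student_id and course_id are illustrative names, not course columns). Only the combination has to be unique, and PRIMARY KEY additionally makes every listed column NOT NULL.

```sql
-- Hypothetical demo table for a two-column key
DROP TABLE IF EXISTS demo_enrollments;
CREATE TEMP TABLE demo_enrollments (student_id integer, course_id integer);
ALTER TABLE demo_enrollments
ADD CONSTRAINT demo_enrollments_pk PRIMARY KEY (student_id, course_id);

-- Repeated values in one column are fine as long as the pair is new
INSERT INTO demo_enrollments VALUES (1, 10), (1, 11), (2, 10);

DO $$
BEGIN
  INSERT INTO demo_enrollments VALUES (1, 10);  -- this pair already exists
EXCEPTION WHEN unique_violation THEN
  RAISE NOTICE 'rejected: (1, 10) is already present';
END $$;
```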

3.7 Surrogate Keys (video)

3.8 ADD A SERIAL Surrogate Key

There is no single-column candidate key in professors (only a composite key candidate consisting of firstname and lastname), so you will add a new column id to that table.

This column has the special data type serial, which turns it into an auto-incrementing number. Whenever you add a new professor to the table, the professor automatically gets an id that does not yet exist in the table, which makes the column an ideal primary key.

Instruction 1: Add a new column id with the data type serial to the professors table.

    -- Add the new column to the table
    ALTER TABLE professors 
    ADD COLUMN id serial;

Instruction 2:
Make id a primary key and name it professors_pkey.

    -- Make id a primary key
    ALTER TABLE professors 
    ADD CONSTRAINT professors_pkey PRIMARY KEY (id);

Instruction 3: Write a query that returns all the columns and the first 10 rows of professors.

    -- Have a look at the first 10 rows of professors
    SELECT *
    FROM professors
    LIMIT 10;
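A minimal sketch of the serial behavior on a hypothetical temporary table demo_serial with made-up names: adding the column numbers the existing rows from a new sequence, and later inserts that omit id draw the next value automatically. Sequence values are never handed out twice, so gaps can appear (for example after a rolled-back insert), and an id supplied by hand bypasses the sequence entirely.

```sql
-- Hypothetical demo table with two existing rows
DROP TABLE IF EXISTS demo_serial;
CREATE TEMP TABLE demo_serial (firstname text, lastname text);
INSERT INTO demo_serial VALUES ('Anna', 'Muster'), ('Ben', 'Keller');

-- The existing rows receive ids 1 and 2 from the column's new sequence
ALTER TABLE demo_serial ADD COLUMN id serial;
ALTER TABLE demo_serial ADD CONSTRAINT demo_serial_pkey PRIMARY KEY (id);

-- id is omitted, so its default (the sequence's next value, 3) is used
INSERT INTO demo_serial (firstname, lastname) VALUES ('Cleo', 'Frei');

SELECT * FROM demo_serial ORDER BY id;
```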

3.9 CONCATenate Columns to A Surrogate Key

Another strategy for adding a surrogate key to an existing table is to concatenate existing columns with the CONCAT() function.

Let’s think of the following example table:

    CREATE TABLE cars (
     make varchar(64) NOT NULL,
     model varchar(64) NOT NULL,
     mpg integer NOT NULL
    );

The table is populated with 10 rows of completely fictional data.

The table doesn't have a primary key yet. None of its columns consists of only unique values, so some columns have to be combined to form a key.

In the following steps, you will combine make and model into such a surrogate key.

Instruction 1: Count the number of distinct rows with a combination of the make and model columns.

    -- Count the number of distinct rows with columns make, model
    SELECT COUNT(DISTINCT(make, model))
    FROM cars;

Instruction 2:
Add a new column id with the data type varchar(128).

    -- Add the id column
    ALTER TABLE cars
    ADD COLUMN id varchar(128);

Instruction 3: Use an UPDATE query and the CONCAT() function to fill the id column with the concatenation of make and model.

    -- Update id with make + model
    UPDATE cars
    SET id = CONCAT(make, model);

Instruction 4:
Make id a primary key and name it id_pk.

    -- Make id a primary key
    ALTER TABLE cars
    ADD CONSTRAINT id_pk PRIMARY KEY(id);
    
    -- Have a look at the table
    SELECT * FROM cars;
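The four steps can be run end to end on a hypothetical temporary table demo_cars with three made-up rows. One design caveat: plain concatenation can map different pairs to the same id, as the last query shows; a separator (for example via CONCAT_WS) reduces that risk, but the combined value could then exceed varchar(128).

```sql
-- Hypothetical stand-in for the cars table
DROP TABLE IF EXISTS demo_cars;
CREATE TEMP TABLE demo_cars (
 make varchar(64) NOT NULL,
 model varchar(64) NOT NULL,
 mpg integer NOT NULL
);
INSERT INTO demo_cars VALUES ('Opel', 'Astra', 45), ('Opel', 'Corsa', 50), ('Skoda', 'Octavia', 48);

-- Step 1: (make, model) is unique if both counts are equal (3 and 3 here)
SELECT COUNT(*), COUNT(DISTINCT (make, model)) FROM demo_cars;

-- Steps 2 to 4: add, fill and promote the surrogate key
-- (two varchar(64) values always fit into varchar(128))
ALTER TABLE demo_cars ADD COLUMN id varchar(128);
UPDATE demo_cars SET id = CONCAT(make, model);
ALTER TABLE demo_cars ADD CONSTRAINT demo_cars_id_pk PRIMARY KEY (id);

-- Caveat: different pairs can concatenate to the same string
SELECT CONCAT('AB', 'C') = CONCAT('A', 'BC') AS collides;  -- true
```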

3.10 Test Your Knowledge before Advancing

Before moving on to the next chapter, let's review what you have learned so far about attributes and key constraints. If you are uncertain about the answer, please review chapters 2 and 3 respectively.

Let’s think of an entity type “student”. A student has:

  • a last name of at most 128 characters, which must always be specified,
  • a Social Security Number of exactly 9 digits, consisting only of numeric characters,
  • a phone number of exactly 12 characters, made up of digits and other symbols (some students don't have one).

Instruction:

  • Based on the outlined student entity description, construct a table named students that includes appropriate data types.
  • Implement a primary key constraint for the social security number field ssn.

Note that there is no formal length requirement for the integer column. The application has to make sure it's a valid Social Security Number!

    -- Create the table
    CREATE TABLE students (
      last_name varchar(128) NOT NULL,
      -- integer takes no length modifier; any 9-digit number fits its range
      ssn integer PRIMARY KEY,
      -- nullable, because some students have no phone number
      phone_no char(12)
    );

4. Glue Together Tables with Foreign Keys

4.1 Model 1:N Relationships with Foreign Keys (video)

4.2 REFERENCE A Table with A FOREIGN KEY

If you need the professors table to link to the universities table in your database, you should define a corresponding column in the professors table that establishes a link to a specific column in the universities table.

As just shown in the video, the syntax for that looks like this:

    ALTER TABLE a 
    ADD CONSTRAINT a_fkey FOREIGN KEY (b_id) REFERENCES b (id);

Table a now refers to table b via b_id, which points to id. a_fkey is, as usual, a constraint name you can choose yourself.

Pay attention to the naming convention used here: a foreign key that references another table's primary key named id is usually named x_id, where x is the name of the referenced table in singular form (e.g., university_id references universities.id).

Instruction 1: Rename the university_shortname column to university_id in professors.

    -- Rename the university_shortname column
    ALTER TABLE professors
    RENAME COLUMN university_shortname TO university_id;

Instruction 2:

  • Add a foreign key on the university_id column in professors that references the id column in universities.
  • Name this foreign key professors_fkey.

    -- Add a foreign key on professors referencing universities
    ALTER TABLE professors 
    ADD CONSTRAINT professors_fkey FOREIGN KEY (university_id) REFERENCES universities (id);

4.3 Explore Foreign Key Constraints

Foreign key constraints help you keep order in your database mini-world. In your database, for instance, only professors belonging to Swiss universities should be allowed, because only Swiss universities are part of the universities table.

The foreign key on professors references the universities table you just created. This makes sure that only existing universities can be specified when inserting new data. Let's test this!

Instruction:

  • Run the sample code and have a look at the error message.
  • Correct the university_id so that it reflects where Albert Einstein wrote his dissertation: the University of Zurich (UZH).

    -- Try to insert a new professor
    INSERT INTO professors (firstname, lastname, university_id)
    VALUES ('Albert', 'Einstein', 'UZH');
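A minimal sketch of the same check on hypothetical temporary tables (demo_fk_universities and demo_fk_professors stand in for the course tables; 'XYZ' is a made-up id). The first insert references a university that does not exist and raises foreign_key_violation (SQLSTATE 23503), which the DO block traps; the second insert succeeds.

```sql
-- Hypothetical stand-ins for universities and professors
DROP TABLE IF EXISTS demo_fk_professors;
DROP TABLE IF EXISTS demo_fk_universities;
CREATE TEMP TABLE demo_fk_universities (id char(3) PRIMARY KEY, university_city text);
INSERT INTO demo_fk_universities VALUES ('UZH', 'Zurich');

CREATE TEMP TABLE demo_fk_professors (firstname text, lastname text, university_id char(3));
ALTER TABLE demo_fk_professors
ADD CONSTRAINT demo_fk_professors_fkey FOREIGN KEY (university_id) REFERENCES demo_fk_universities (id);

DO $$
BEGIN
  -- 'XYZ' is not an id in demo_fk_universities
  INSERT INTO demo_fk_professors VALUES ('Albert', 'Einstein', 'XYZ');
EXCEPTION WHEN foreign_key_violation THEN
  RAISE NOTICE 'rejected: university_id XYZ does not exist';
END $$;

-- A referenced id that exists is accepted
INSERT INTO demo_fk_professors VALUES ('Albert', 'Einstein', 'UZH');
```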

4.4 JOIN Tables Linked by A Foreign Key

Let’s join these two tables to analyze the data further!

You might already know how SQL joins work from the Intro to SQL for Data Science course (last exercise) or from Joining Data in PostgreSQL.

Here’s a quick recap on how joins generally work:

    SELECT ...
    FROM table_a
    JOIN table_b
    ON ...
    WHERE ...

Foreign keys are not required for join queries, but they help you know what to expect. If a record in table A references a record in table B through a foreign key, you can be sure that a join from A to B finds a matching record in B (unless the foreign key value is NULL); otherwise, the foreign key constraint would be violated.

Instruction:

  • JOIN professors with universities on professors.university_id = universities.id, i.e., retain all records where the foreign key of professors equals the primary key of universities.
  • Filter for university_city = 'Zurich'.

    -- Select all professors working for universities in the city of Zurich
    SELECT professors.lastname, universities.id, universities.university_city
    FROM professors
    JOIN universities
    ON professors.university_id = universities.id
    WHERE universities.university_city = 'Zurich';
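The guarantee described above can be checked on hypothetical temporary tables demo_j_universities and demo_j_professors with made-up rows. Because university_id is a foreign key to a primary key and is additionally declared NOT NULL in this sketch, every professor matches exactly one university, so the inner join keeps all professor rows; the WHERE clause then filters them.

```sql
-- Hypothetical stand-ins for universities and professors
DROP TABLE IF EXISTS demo_j_professors;
DROP TABLE IF EXISTS demo_j_universities;
CREATE TEMP TABLE demo_j_universities (id char(3) PRIMARY KEY, university_city text);
INSERT INTO demo_j_universities VALUES ('ETH', 'Zurich'), ('UZH', 'Zurich'), ('EPF', 'Lausanne');

CREATE TEMP TABLE demo_j_professors (
 lastname text,
 university_id char(3) NOT NULL REFERENCES demo_j_universities (id)
);
INSERT INTO demo_j_professors VALUES ('Muster', 'ETH'), ('Keller', 'UZH'), ('Frei', 'EPF');

-- Join without a filter: 3 rows, one per professor
SELECT COUNT(*)
FROM demo_j_professors
JOIN demo_j_universities
ON demo_j_professors.university_id = demo_j_universities.id;

-- Same join, filtered to Zurich: Muster (ETH) and Keller (UZH)
SELECT demo_j_professors.lastname, demo_j_universities.id, demo_j_universities.university_city
FROM demo_j_professors
JOIN demo_j_universities
ON demo_j_professors.university_id = demo_j_universities.id
WHERE demo_j_universities.university_city = 'Zurich';
```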

4.5 Model More Complex Relationships (video)

4.6 Add Foreign Keys to the “Affiliations” Table

Currently, the affiliations table has the columns firstname, lastname, function, and organization.

You will transform the affiliations table in place, i.e., without creating a temporary table to hold intermediate results.

Instruction 1: Add a professor_id column of type integer to affiliations and declare it a foreign key that references the id column in professors.

    -- Add a professor_id column
    ALTER TABLE affiliations
    ADD COLUMN professor_id integer REFERENCES professors (id);

Instruction 2: Rename the organization column in affiliations to organization_id.

    -- Rename the organization column to organization_id
    ALTER TABLE affiliations
    RENAME COLUMN organization TO organization_id;

Instruction 3: Add a foreign key constraint on organization_id so that it references the id column in organizations.

    -- Add a foreign key on organization_id
    ALTER TABLE affiliations
    ADD CONSTRAINT affiliations_organization_fkey FOREIGN KEY (organization_id) REFERENCES organizations (id);

4.7 Populate the “professor_id” Column

Now it's time to also populate professor_id. You will take the ID directly from professors.

Here’s a way to update columns of a table based on values in another table:

    UPDATE table_a
    SET column_to_update = table_b.column_to_update_from
    FROM table_b
    WHERE condition1 AND condition2 AND ...;

This query does the following:

  1. For each row in table_a, find the corresponding row in table_b where condition1, condition2, etc., are met.
  2. Set the value of column_to_update to the value of column_to_update_from (from that corresponding row).
    The conditions usually compare other columns of both tables, e.g. table_a.some_column = table_b.some_column. This query only makes sense if there is exactly one matching row in table_b; if several rows match, PostgreSQL uses only one of them, and which one is not predictable.

Instruction 1: First, have a look at the current state of affiliations by fetching 10 rows and all columns.

    -- Have a look at the 10 first rows of affiliations
    SELECT *
    FROM affiliations
    LIMIT 10;

Instruction 2: Update the professor_id column with the corresponding value of the id column in professors. "Corresponding" means rows in professors where the firstname and lastname are identical to the ones in affiliations.

    -- Set professor_id to professors.id where firstname, lastname correspond to rows in professors
    UPDATE affiliations
    SET professor_id = professors.id
    FROM professors
    WHERE affiliations.firstname = professors.firstname AND affiliations.lastname = professors.lastname;

Instruction 3: Check the first 10 rows and all columns of affiliations again. Have the professor_ids been correctly matched?

    -- Have a look at the 10 first rows of affiliations again
    SELECT *
    FROM affiliations
    LIMIT 10;
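A self-contained sketch of the UPDATE ... FROM pattern on hypothetical temporary tables demo_u_professors and demo_u_affiliations (names and organizations are made up). A professor with two affiliations receives the same professor_id in both rows, while each affiliation row still matches exactly one professor, which is why the update is well defined.

```sql
-- Hypothetical stand-ins for professors and affiliations
DROP TABLE IF EXISTS demo_u_affiliations;
DROP TABLE IF EXISTS demo_u_professors;
CREATE TEMP TABLE demo_u_professors (id serial PRIMARY KEY, firstname text, lastname text);
INSERT INTO demo_u_professors (firstname, lastname) VALUES ('Anna', 'Muster'), ('Ben', 'Keller');

CREATE TEMP TABLE demo_u_affiliations (firstname text, lastname text, organization text, professor_id integer);
INSERT INTO demo_u_affiliations (firstname, lastname, organization) VALUES
 ('Anna', 'Muster', 'Org A'),
 ('Anna', 'Muster', 'Org B'),
 ('Ben', 'Keller', 'Org C');

-- Each affiliation matches exactly one professor on (firstname, lastname)
UPDATE demo_u_affiliations
SET professor_id = demo_u_professors.id
FROM demo_u_professors
WHERE demo_u_affiliations.firstname = demo_u_professors.firstname
  AND demo_u_affiliations.lastname = demo_u_professors.lastname;

-- Org A and Org B share Anna Muster's id; Org C gets Ben Keller's id
SELECT organization, professor_id FROM demo_u_affiliations ORDER BY organization;
```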

4.8 Drop “firstname” and “lastname”

In the previous exercise, the firstname and lastname columns of affiliations were used to establish a link to the professors table, so that the appropriate professor IDs could be copied over. This only worked because there is exactly one corresponding professor for each row in affiliations. In other words, {firstname, lastname} is a candidate key of professors: a unique combination of columns.

It is not a candidate key in affiliations, though, because, as mentioned in the video, professors can have more than one affiliation.

Because professors are now referenced via professor_id, the firstname and lastname columns are no longer needed, so it's time to drop them. After all, one of the goals of a database is to reduce redundancy where possible.

Instruction: Drop the firstname and lastname columns from the affiliations table.

    -- Drop the firstname column
    ALTER TABLE affiliations
    DROP COLUMN firstname;
    
    -- Drop the lastname column
    ALTER TABLE affiliations
    DROP COLUMN lastname;

4.9 Referential Integrity (video)

4.10 Referential Integrity Violations

4.11 Change the Referential Integrity Behavior of A Key

So far, you implemented three foreign key constraints:

  • professors.university_id to universities.id
  • affiliations.organization_id to organizations.id
  • affiliations.professor_id to professors.id

These foreign keys currently have the behavior ON DELETE NO ACTION. Here, you’re going to change that behavior for the column referencing organizations from affiliations. If an organization is deleted, all its affiliations (by any professor) should also be deleted.

Modify altering a key constraint won't work when using ALTER COLUMN. Instead, it's necessary to remove the existing primary key constraint before creating a new one with altered ON DELETE behavior.

To drop a constraint, though, you need to know its name. Constraint names are also stored in information_schema.

Have a look at the existing foreign key constraints by querying table_constraints in information_schema.

    -- Identify the correct constraint name
    SELECT constraint_name, table_name, constraint_type
    FROM information_schema.table_constraints
    WHERE constraint_type = 'FOREIGN KEY';
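table_constraints gives you the names, but not the current ON DELETE behavior. Joining it with information_schema.referential_constraints, which has delete_rule and update_rule columns, shows both at once. A sketch assuming PostgreSQL; on the course database it would list the three foreign keys with their rules.

```sql
-- Every foreign key together with its current ON DELETE / ON UPDATE rule
SELECT tc.table_name,
       tc.constraint_name,
       rc.delete_rule,
       rc.update_rule
FROM information_schema.table_constraints AS tc
JOIN information_schema.referential_constraints AS rc
  ON tc.constraint_schema = rc.constraint_schema
 AND tc.constraint_name   = rc.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
ORDER BY tc.table_name, tc.constraint_name;
```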

Drop the affiliations_organization_id_fkey foreign key constraint from the affiliations table.

    -- Drop the right foreign key constraint
    ALTER TABLE affiliations
    DROP CONSTRAINT affiliations_organization_id_fkey;

Add a new foreign key to affiliations that cascades deletion when a referenced record in organizations is deleted. Name it affiliations_organization_id_fkey.

    -- Add a new foreign key constraint from affiliations to organizations which cascades deletion
    ALTER TABLE affiliations
    ADD CONSTRAINT affiliations_organization_id_fkey FOREIGN KEY (organization_id) REFERENCES organizations (id) ON DELETE CASCADE;
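In PostgreSQL, the DROP and ADD steps can also be combined into one ALTER TABLE statement with comma-separated actions, so the column is never left without a foreign key in between. The sketch below uses hypothetical temporary tables (alt_organizations, alt_affiliations) rather than the course tables.

```sql
-- Hypothetical minimal tables standing in for organizations / affiliations
DROP TABLE IF EXISTS alt_affiliations;
DROP TABLE IF EXISTS alt_organizations;
CREATE TEMP TABLE alt_organizations (id text PRIMARY KEY);
CREATE TEMP TABLE alt_affiliations (
    organization_id text,
    CONSTRAINT alt_affiliations_organization_id_fkey
        FOREIGN KEY (organization_id) REFERENCES alt_organizations (id)
);

-- Drop and re-create the constraint in a single statement;
-- PostgreSQL runs the DROP before the ADD, so reusing the name is fine
ALTER TABLE alt_affiliations
    DROP CONSTRAINT alt_affiliations_organization_id_fkey,
    ADD CONSTRAINT alt_affiliations_organization_id_fkey
        FOREIGN KEY (organization_id) REFERENCES alt_organizations (id)
        ON DELETE CASCADE;
```

On the course database, the same pattern would use affiliations, organizations, and affiliations_organization_id_fkey.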

Verify that the deletion cascade functions properly by executing DELETE and SELECT statements.

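    -- Delete an organization
    DELETE FROM organizations
    WHERE id = 'CUREM';
    -- Check that no affiliations with this organization remain
    SELECT * FROM affiliations
    WHERE organization_id = 'CUREM';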
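The same check can be reproduced end to end on toy data. This sketch assumes PostgreSQL and uses hypothetical tables cas_organizations and cas_affiliations with made-up IDs.

```sql
-- Hypothetical minimal tables with a cascading foreign key
DROP TABLE IF EXISTS cas_affiliations;
DROP TABLE IF EXISTS cas_organizations;
CREATE TEMP TABLE cas_organizations (id text PRIMARY KEY);
CREATE TEMP TABLE cas_affiliations (
    professor_id    integer,
    organization_id text REFERENCES cas_organizations (id) ON DELETE CASCADE
);

INSERT INTO cas_organizations VALUES ('ORG_A'), ('ORG_B');
INSERT INTO cas_affiliations VALUES (1, 'ORG_A'), (2, 'ORG_A'), (3, 'ORG_B');

-- Deleting ORG_A also removes both of its affiliations
DELETE FROM cas_organizations WHERE id = 'ORG_A';

-- Only one row remains: (3, 'ORG_B')
SELECT * FROM cas_affiliations;
```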

4.12 Roundup (video)

4.13 Count Affiliations Per University

Now that your data is ready for analysis, let's run some exemplary SQL queries on the database. You will use techniques you have already learned, such as grouping by columns and joining tables.

In this exercise, you will find out which university has the most affiliations through its professors. For that, you need both the affiliations and professors tables, as the latter also holds the university_id.

As a quick reminder, joins have the following structure:

    SELECT table_a.column1, table_a.column2, table_b.column1, ... 
    FROM table_a
    JOIN table_b 
    ON table_a.column = table_b.column

This joins table_a and table_b, keeping only the rows where table_a.column matches table_b.column.
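A small self-contained illustration of that matching rule, assuming PostgreSQL and two hypothetical temporary tables, join_a and join_b, with made-up rows:

```sql
DROP TABLE IF EXISTS join_a;
DROP TABLE IF EXISTS join_b;
CREATE TEMP TABLE join_a (id integer, label text);
CREATE TEMP TABLE join_b (a_id integer, note text);

INSERT INTO join_a VALUES (1, 'one'), (2, 'two'), (3, 'three');
INSERT INTO join_b VALUES (1, 'x'), (1, 'y'), (3, 'z'), (4, 'orphan');

-- id 2 (join_a) and a_id 4 (join_b) have no partner, so they are dropped;
-- id 1 matches twice and therefore appears twice
-- returns (1, one, x), (1, one, y), (3, three, z)
SELECT join_a.id, join_a.label, join_b.note
FROM join_a
JOIN join_b
ON join_a.id = join_b.a_id
ORDER BY join_a.id, join_b.note;
```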

Instruction:

  • Count the total number of affiliations per university.
  • Sort the result by that count, in descending order.
    -- Count the total number of affiliations per university
    SELECT COUNT(*), professors.university_id 
    FROM affiliations
    JOIN professors
    ON affiliations.professor_id = professors.id
    -- Group by the university ids of professors
    GROUP BY professors.university_id 
    ORDER BY count DESC;
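ORDER BY count works here because PostgreSQL names an unaliased COUNT(*) column count. An explicit alias is easier to read and does not depend on that default. A sketch on hypothetical temporary tables cnt_professors and cnt_affiliations with made-up rows:

```sql
DROP TABLE IF EXISTS cnt_affiliations;
DROP TABLE IF EXISTS cnt_professors;
CREATE TEMP TABLE cnt_professors (id integer PRIMARY KEY, university_id text);
CREATE TEMP TABLE cnt_affiliations (
    professor_id integer REFERENCES cnt_professors (id)
);

INSERT INTO cnt_professors VALUES (1, 'UNI_A'), (2, 'UNI_A'), (3, 'UNI_B');
INSERT INTO cnt_affiliations VALUES (1), (1), (2), (3);

-- Same query as above, with an explicit alias for the count
-- returns (UNI_A, 3), (UNI_B, 1)
SELECT professors.university_id, COUNT(*) AS affiliation_count
FROM cnt_affiliations AS affiliations
JOIN cnt_professors AS professors
ON affiliations.professor_id = professors.id
GROUP BY professors.university_id
ORDER BY affiliation_count DESC;
```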

4.14 Join All the Tables Together

In this final exercise, you need to find the university city of the professor with the most affiliations in the sector "Media & communication".

To do this, you need to join all the tables, group by the relevant columns, and then filter so that only rows in the correct sector remain.

Join all tables in the database, starting with affiliations, then professors, organizations, and universities, and look at the result.

    -- Join all tables
    SELECT *
    FROM affiliations
    JOIN professors
    ON affiliations.professor_id = professors.id
    JOIN organizations
    ON affiliations.organization_id = organizations.id
    JOIN universities
    ON professors.university_id = universities.id;

Instruction 2:

  • Group the result by organization sector, professor, and university city.
  • Count the number of records in each group.
    SELECT COUNT(*), organizations.organization_sector, 
    professors.id, universities.university_city
    FROM affiliations
    JOIN professors
    ON affiliations.professor_id = professors.id
    JOIN organizations
    ON affiliations.organization_id = organizations.id
    JOIN universities
    ON professors.university_id = universities.id
    GROUP BY organizations.organization_sector, 
    professors.id, universities.university_city;

Only retain those rows where the organization sector is "Media & communication", and sort the table by count in descending order.

    SELECT COUNT(*), organizations.organization_sector, 
    professors.id, universities.university_city
    FROM affiliations
    JOIN professors
    ON affiliations.professor_id = professors.id
    JOIN organizations
    ON affiliations.organization_id = organizations.id
    JOIN universities
    ON professors.university_id = universities.id
    WHERE organizations.organization_sector = 'Media & communication'
    GROUP BY organizations.organization_sector, 
    professors.id, universities.university_city
    ORDER BY count DESC;
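To read off the answer directly, LIMIT 1 keeps only the top row. Below, the same query shape runs on hypothetical miniature tables (mini_affiliations, mini_professors, mini_organizations, mini_universities) with made-up data; PostgreSQL is assumed.

```sql
-- Hypothetical miniature versions of the four course tables
DROP TABLE IF EXISTS mini_affiliations, mini_professors,
                     mini_organizations, mini_universities;
CREATE TEMP TABLE mini_universities (id text PRIMARY KEY, university_city text);
CREATE TEMP TABLE mini_organizations (id text PRIMARY KEY, organization_sector text);
CREATE TEMP TABLE mini_professors (
    id            integer PRIMARY KEY,
    university_id text REFERENCES mini_universities (id)
);
CREATE TEMP TABLE mini_affiliations (
    professor_id    integer REFERENCES mini_professors (id),
    organization_id text REFERENCES mini_organizations (id)
);

INSERT INTO mini_universities VALUES ('U1', 'City A'), ('U2', 'City B');
INSERT INTO mini_organizations VALUES
    ('O1', 'Media & communication'),
    ('O2', 'Media & communication'),
    ('O3', 'Education & research');
INSERT INTO mini_professors VALUES (1, 'U1'), (2, 'U2');
INSERT INTO mini_affiliations VALUES (1, 'O1'), (2, 'O1'), (2, 'O2'), (1, 'O3');

-- Professor 2 has two Media & communication affiliations, professor 1 only one
-- returns (2, Media & communication, 2, City B)
SELECT COUNT(*) AS affiliation_count, organizations.organization_sector,
       professors.id, universities.university_city
FROM mini_affiliations AS affiliations
JOIN mini_professors AS professors
  ON affiliations.professor_id = professors.id
JOIN mini_organizations AS organizations
  ON affiliations.organization_id = organizations.id
JOIN mini_universities AS universities
  ON professors.university_id = universities.id
WHERE organizations.organization_sector = 'Media & communication'
GROUP BY organizations.organization_sector,
         professors.id, universities.university_city
ORDER BY affiliation_count DESC
LIMIT 1;
```

If several professors tie for the highest count, LIMIT 1 returns one of them without a guaranteed choice; adding a tie-breaking column such as professors.id to ORDER BY makes the result deterministic.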
