Home IDS 410

Notes Index

7 SQL

Charles E. Oyibo


Introduction

E. F. Codd, A Relational Model of Data for Large Shared Data Banks. 1970.

The Role of SQL in a Database Architecture

An SQL-based relational database application involves a user interface, a set of tables in the database, and a relational database management system (RDBMS) with an SQL capability.

A relational DBMS (RDBMS) is a data management system that implements a relational data model, one where data are stored in a collection of tables, and the data relationships are represented by common values in related tables, not links.

The original purposes of the SQL standard were:

  1. To specify the syntax and semantics of SQL data definition and manipulation languages
  2. To define the data structures and basic operations for designing, accessing, maintaining, controlling, and protecting and SQL database
  3. To provide a vehicle for portability of database definition and application modules between conforming DBMSs
  4. To specify both minimal (Level 1) and complete (Level 2) standards, which permit different degrees of adoption in products
  5. To provide an initial standard, although incomplete, that will be enhanced later to include specifications for handling such topics as referential integrity, transaction management, user-defined functions, join-operators beyond the equi-join, and national character sets (among others)

The benefits of using a standardized relational language include the following:

The SQL Environment

Catalog: A set of schemas that, when put together, constitute a description of a database.

Schema: That structure which contains descriptions of objects created by a user, such as base tables, views, and constraints, as part of the database.

SQL commands can be classified into three types:

Data Definition Language (DDL): Those commands used to define a database, including creating, altering, and dropping tables, and establishing constraints. [Used in the physical design and maintenance phases of the database development process]

Data Manipulation Language (DML): Those commands used to maintain and query a database, including updating, inserting, modifying, and querying data. [Used in the implementation phase of the database development process]

DML commands may be issued interactively, so that a result is returned immediately following execuation of the statement, or they may be included within programs written in a 3GL language such as C or COBOL. Embedding SQL commands may provide the programmer with more control over timing of report generation, interface appearance, error handling, and database security. This chapter focuses on basic DML commands in interactive (rather than embedded) format.

Data Control Language (DCL): Commands used to control a database, including administering (granting or revoking) priviledges and the committing (saving) of data. [Used in the implementation and maintenance phases of the database development process]

Sample Oracle8 Data Types

Defining a Database in SQL

The basis syntax to create a database is:

CREATE SCHEMA database_name;
AUTHORIZATION owner_userid;

Generating SQL Database Definitions

CREATE SCHEMA
  • defines that portion of a database that a particular user owns
  • dependent on catalog and contain schema objects, including base tables and views, domains, constraints, assertions, character sets, collation, etc.
CREATE TABLE
  • defines a new table and its columns
  • may be base or derived table
  • tables are dependent on schema
CREATE VIEW
  • defines a logical table from one or more tables or views
  • views may not be indexed
  • there are limitations to updating data through views

DROP SCHEMA
DROP TABLE
DROP VIEW

  • reverses CREATE SCHEMA, CREATE TABLE, CREATE VIEW respectively (i.e. deletes schema, table, or view)
  • usually, only the table creator may delete the table

ALTER SCHEMA
ALTER TABLE
ALTER VIEW

  • used to alter the definition of an existing base table by adding, dropping, or changing a column or by dropping a constraint

Other CREATE commands:

CREATE CHARACTER SET  
CREATE COLLATION  
CREATE TRANSLATION  
CREATE ASSERTION  
CREATE DOMAIN  

Creating Tables

See steps to follow when preparing to create a table:

  1. Identify data types for attributes
  2. Identify columns that can and cannot be null
  3. Identify columns that must be unique (candidate keys)
    • Pertinent SQL commands: CONSTRAINT, UNIQUE, PRIMARY KEY, NOT NULL
  4. Identify primary key-foreign key mates
  5. Determine default values
  6. Identify constraints on columns (domain specifications)
  7. Create the table and associated indexes
    • CREATE TABLE, CREATE INDEX

General syntax of the create table statement used in data definition language:

CREATE TABLE tablename
({column definition [table constraint]}.,..
[ON COMMIT {DELETE | PRESERVE} ROWS]);


where column definition ::=
column_name

	{domain name | datatype [(size)]}
	[column_constaint_clause...]
	[default value]
	[collate clause]

and table constraint ::=
	[CONSTRAINT constraint_name]
	Constraint_type [constraint_attributes]

Using and Defining Views

The purpose of a view is to simplify query commands; but a view may also provide valuable data security and significantly enhance programming productivity.

Base table: A table in the relational data model containing the inserted raw data. Base tables correspond to the relations that are identified in the database's conceptual schems.

Dynamic view: A virtual table that is created dynamically upon request by a user. A dynamic view is not a temporary table; rather, its definition is stored in the system catelog and the contents of the view are materialized as a result of an SQL query that uses the view.

Materialized view: Copies or replicas of data based on SQL queries created in the same manner as dynamic views. A materialized view exists as a table and must be synchronized with its associated base tables (refreshed) at appropriate intervals or events.

Consider an invoice that requires data from four tables: CUSTOMER_T, ORDER_T, ORDER_LINE_T, and PRODUCT_T. A (dynamic) view allows us to predefine this association into a single virtual table as part of the database so that if we need an invoice (a view), we do not have to reconstruct the joining of tables. Hence, we can define a view, INVOICE_V, by specifying an SQL query using the SELECT ... FROM ... WHERE construct:

Determine the data elements necessary to create an invoice for a customer. Save this query as a view named INVOICE_V.

CREATE VIEW INVOICE_V AS
  SELECT CUSTOMER_T.CUSTOMER_ID, CUSTOMER_ADDRESS,
         ORDER_T.ORDER_ID

         PRODUCT_T.PRODUCT_ID, ORDERED_QUANTITY,
		      // and other columns
		  FROM CUSTOMER_T, ORDER_T, ORDER_LINE_T, PRODUCT_T
         WHERE CUSTOMER_T.CUSTOMER_ID = ORDER_T.CUSTOMER_ID
           AND ORDER_T.ORDER_ID = ORDER_LINE_T.ORDER_ID
           AND PRODUCT_T.PRODUCT_ID = ORDER_LINE_T.PRODUCT_ID

Now we can use our newly-created view INVOICE_V in a query:

Generate an invoice for order number 1004.

SELECT CUSTOMER_ID, CUSTOMER_ADDRESS, PRODUCT_ID, QUANTITY, 
  // other columns as required
  FROM INVOICE_V
    WHERE ORDER_ID = 1004;

A view may join multiple tables or views together and may contain derived (or virtual) columns:

What is the total value of orders placed for each furniture product?

CREATE VIEW ORDER_TOTALS_V AS
SELECT PRODUCT_ID PRODUCT, SUM(UNIT_PRICE*QUANTITY) TOTAL
  FROM INVOICE_V
    GROUPBY PRODUCT_ID;

Idea to Ponder: "A view should be created for every single base table, even if that view is identical to the base table." Discuss the advantages and shortcomings of this position.See p. 270.

 

List all furniture products that have ever had a UNIT_PRICE over $300.

CREATE VIEW EXPENSIVE_STUFF_V AS
  SELECT PRODUCT_ID, PRODUCT_NAME, UNIT_PRICE
    FROM PRODUCT_T
      WHERE UNIT_PRICE > 300
        WITH CHECK OPTION;

We use the WITH CHECK OPTION clause to prevent a update that will remove a row from the view. (The view is designed to hold items that have had a unit price over $300 at least at one time. If we were to reduce the price of an item in the product table without the WITH CHECK OPTION constraint, that item will be removed from that view--obviously, not what we want.)

In Oracle8, the text of all views are stored in DBA_VIEWS. Users with system pridileges can find access information in DBA_VIEWS:

What information is available about the view named EXPENSIVE_STUFF?

SELECT * FROM DBA_VIEWS WHERE VIEW_NAME = 'EXPENSIVE_STUFF_V';

Materialized Views, like the dynamic views we have been discussing, can also be constructed in different ways for various purposes. Tables may be replicated in whole or in part, joined to create the materialized view, and updated on a predetermined time interval or triggered when the table needs to be accessed. The difference, to summarize, is that dynamic views are virtual views; they do not represent actual tables. Materialized views, on the other hand, are actual tables that are based on base tables.

Creating Data Integrity Controls

Referential Integrity: An integrity constraint specifying that the value (or existence) of an attribute in one relation depends on the value (or existence) of a primary key in the same or another relation (or be NULL). [See Ch5 Definition.]

The SQL command REFERENCES prevents a foreign key value from being added if it is not already a valid value in the referenced primary key column. Problem: If the value of the primary key changes after we refer to it using a foreign key elsewhere, the connection between the two tables is lost.

Solutions:

  1. The ON UPDATE RESTRICT clause asserts that primary key values cannot be changed once they are established, unless no foreign key references that value in any child table.
  2. Pass the change through to the child table(s) by using the ON UPDATE CASCADE option.
  3. Allow the update on the table with the primary key, but change the foreign key value to NULL using the ON UPDATE SET NULL option. This is not a real solution because we loss the connection between the two tables.

Were a customer record to be deleted (rather than updated, as above), ON DELETE RESTRICT, CASCADE, or SET NULL are also available. In addition, there is an ON DELETE SET DEFAULT clause.

Changing Table Definitions

ALTER TABLE CUSTOMER_T
  ADD (TYPE VARCHAR(2));

The ALTER TABLE command may include keywords such as ADD, DROP, or ALTER and allow changing the column's names, datatype, lenght, and constraints.

Removing Tables

To remove a table from a database, the owner of the table may use the DROP TABLE command. Views are dropped using the DROP VIEW COMMAND.

Drop the CUSTOMER_T table from its database schema:

DROP TABLE CUSTOMER_T;

Data Manipulation Language (DML): Inserting, Updatng, and Deleting Data

Four data manipulation language commands are used in SQL:

While the INSERT, UPDATE, and DELETE commands allow modification of the data in the tables, it is the SELECT command that allows one to query the data contained in the tables.

Inserting Data

If entering a value for every column in the table, we use a command like the following. (Note that the data values are ordered in the same order as the columns in the table.

Command: To insert a row of data into a table where value will be inserted for every attribute:

INSERT INTO CUSTOMER_T VALUES
(001, 'Contemporary Casuals', '13 S. Himes St.', 'Chicago', 'IL', 60565);

When data will not be entered into every column in the table, either enter the value NULL for the empty fields or specify those columns to which data are to be added.

To insert a row of data into a table where some attributes will be left null:

INSERT INTO PRODUCT_T (PRODUCT_ID, PRODUCT_NAME, PRODUCT_FINISH, 
UNIT_PRICE, ON_HAND)
  VALUES (1, 'End Table', 'Cherry', 175, 8);

In general, the INSERT command places a new row in a table based on values supplied in the statement, copies of one or more rows derived from other database data into the table, or extracts data from one table and inserts them into another.

Populating a table CA_CUSTOMER_T by using a subset of another table CUSTOMER_T that has the same structure:

 INSERT INTO CA_CUSTOMER_T
   SELECT * FROM CUSTOMER_T
     WHERE STATE = 'CA';

Batch Input

In Oracle: SQL * Loader.

Deleting Database Contents

To delete rows that meet a certain criterion from the customer table:

 DELETE FROM CUSTOMER_T
   WHERE STATE = 'HI';

To delete all rows from customer table:

 DELETE FROM CUSTOMER_T;

SQL will actually eliminate the records selected by a DELETE command. Therefore, always execute a SELECT command first to display the records that would be deleted and visually verify that only the desired rows are included.

Changing (Updating) Database COntents

To modify the unit price and quantity on hand of a record in the PRODUCT table to 775 and 4, respectively:

 UPDATE PRODUCT_T
   SET UNIT_PRICE = 775, On_hand = 4
     WHERE PRODUCT_ID = 7;

The SET command can also change a value to null; the syntax is SET columnname = NULL.

Internal Schema Definition in RDMSs

The internal schema of a relational database can be controlled for processing and storage efficiency. Some techniques used to tune the operational performance of the relational database internal model include:

Creating Indexes

Objective: To provide rapid random and sequential access to base-table data.

Command: To create an alphabetical index on customer name in the CUSTOMER table.

CREATE INDEX NAME_IDX ON CUSTOMER_T (CUSTOMER_NAME);

We can define an index as unique using the syntax: CREATE UNIQUE INDEX ...

Command: To remove the index on the customer name in the CUSTOMER table.

DROP INDEX NAME_INX;

Processing Single Tables

While the UPDATE, INSERT, and DELETE commands allow modification of the data in the tables, the select command, with its various clauses, allow us to query the data contained in the tables and ask many different questions or ad hoc queries.

Although the basic construction of an SQL command is simple, SQL is a powerful tool that enables one to specify complex data analysis processes.

Note: Because the basic syntax is relatively easy to learn, it is also easy to write SELECT queries that are syntactically correct but do not answer the exact question that is intended. There are a few recommendations here: (1) Before running a query against a large production database, always test queries carefully on a small test set of data to be sure that they are returning the correct results. (2) In addition to checking the query results manually, it is often possible to parse queries into smaller parts, examine the results of these simpler queries, and then recombine them.

Clauses of the SELECT Statement

Most SQL data retrieval statements include the following three clauses:

The first two are required; the third is necessary when:

  1. only certain table rows are to be retrieved, or
  2. multiple tables are to be joined

QUERY: Which product have a standard price of less than $275?

SELECT PRODUCT_NAME, STANDARD_PRICE FROM PRODUCT_V
  WHERE STANDARD_PRICE < 275;

Every SELECT statement returns a result table when it executes. Two special key words can be used along with the list of columns to display: DISTINCT and *.

NOTE: The order of the SELECT statement clauses is important. Also, if there is any ambiguity in the SQL command (e.g. if there is a CUSTOMER_ID in both the CUSTOMER_T and ORDER_T tables), we must indicate exactly from which table (or view) the requested data is to come (e.g. CUSTOMER_T.CUSTOMER_ID). Futher yet, when using data created by someone else, we must specify the owner of the table by adding the owner's user ID (e.g. OWNER_ID.CUSTOMER_T.CUSTOMER_ID.)

Query: What is the address of the customer named Home Furnishings? Use an alias, NAME, for customer name.

SELECT CUST.CUSTOMER_NAME AS NAME, CUST.CUSTOMER_ADDRESS
  FROM ownerid.CUSTOMER_V CUST
    WHERE NAME = 'Home Furnishings';

When using the SELECT statement, the columns can be rearranged so that they will be arranged differently in the result tbale than they were in the original table.

Query: List the unit price, product name, and product ID for all products in the PRODUCT table.

 SELECT UNIT_PRICE, PRODUCT_NAME, PRODUCT_ID
   FROM PRODUCT_T;

Using Expressions

Query: What is the total value for each product in inventory?

 SELECT PRODUCT_NAME, UNIT_PRICE, ON_HAND, UNIT_PRICE * ON_HAND AS VALUE
   FROM PRODUCT_T;

Using Functions

Functions in SQL include: COUNT, MIN, MAX, SUM, and AVG.

Query: How many different items were ordered on number 1004?

 SELECT COUNT (*)
   FROM ORDER_LINE_V
     WHERE ORDER_ID = 1004;

Consider the following query:

How many different items were ordered on number 1004? Also, show the Order_ID in the query result.

SELECT ORDER_ID, COUNT (*)
FROM ORDER_LINE_T
WHERE ORDER_ID = 1004;


The above query will generate an error message. The problem is that ORDER_ID returns a value, 1004, for each of the two rows selected, while COUNT returns one aggregate value, 2, for the set of rows with ID = 1004.

 

Query: Alphabetically, what is the first product name in the product table?

 SELECT MIN (PRODUCT_NAME)
   FROM PRODUCT_T;

 

Query: Alphabetically, what is the last product name in the product table?

 SELECT MAX (PRODUCT_NAME)
   FROM PRODUCT_T;

Using Wildcards

Wildcards may be used in the WHERE clause where the exact description of the desired item is unknown.

Query: Find all types of tables carried by Pine Valley Furniture

 SELECT PRODUCT_ID, PRODUCT_NAME
   FROM PRODUCT_T
     WHERE PRODUCT_NAME LIKE '%Table';

Query: Find any products with any number of drawers

SELECT PRODUCT_ID, PRODUCT_NAME
  FROM PRODUCT_T
    WHERE PRODUCT_NAME LIKE '_-Drawer';

Comparison Operators

Comparison operators: =, >, >=, <, <=, <>, and !=

Query: Which order have been placed since 10/24/98?

 SELECT ORDER_ID, ORDER_DATE
   FROM ORDER_V
     WHERE ORDER_DATE > '24-OCT-98';

 

Query: What furniture does Pine Valley carry that isn't made of cherry?

 SELECT PRODUCT_DESCRIPTION, PRODUCT_FINISH
   FROM PRODUCT_T
     WHERE PRODUCT_FINISH != 'Cherry';

Using Boolean Operators

Boolean operators:

If multiple Boolean operators are used in an SQL statement, NOT is evaluated first, then AND, then OR.

Query: List product name, finish, and unit price for all desks and all tables that cost more than $300 in the product table.

 SELECT PRODUCT_NAME, PRODUCT_FINISH, UNIT_PRICE
   FROM PRODUCT_T
     WHERE (PRODUCT_NAME LIKE ‘%Desk’ OR PRODUCT_NAME
   LIKE ‘%Table’) AND UNIT_PRICE > 300;

Notice the parenthesis after WHERE and before AND. We place the parentheses there because we want to override the order of operation (AND before OR) and evaluate the OR expression before the AND expression, otherwise we would get an incorrect result. Convince yourself of this fact.

Ranges

The comparison operators < and > are used to establish a range of values. The keywords BETWEEN and NOT BETWEEN can also be used.

Query: Which products in the PRODUCT_T have a unit price between $200 and $300?

SELECT PRODUCT_NAME, UNIT_PRICE
  FROM PRODUCT_T
    WHERE UNIT_PRICE BETWEEN 200 AND 300;

Or, alternatively:

SELECT PRODUCT_NAME, UNIT_PRICE
  FROM PRODUCT_T
    WHERE UNIT_PRICE > 199 AND UNIT_PRICE < 301;

Distinct

Query: What are the distinct order numbers included in the ORDER_LINE table?

SELECT DISTINCT ORDER_ID
  FROM ORDER_LINE_T;

DISTINCT and its counterpart, ALL, can only be used once in a SELECT statement. It comes after SELECT and before any columns or expressions listed. If a SELECT statement projects more than one column (as with the sample below), only rows that are identical for every column will be eliminated.

Query: What are the distinct order numbers included in the ORDER_LINE table?

SELECT DISTINCT ORDER_ID, ORDERED_QTY
  FROM ORDER_LINE_T;

IN and NOT IN Lists

To match a list of values, consider using IN.

Query: List all customers who live in warmer states

SELECT CUSTOMER_NAME, CITY, STATE
  FROM CUSTOMER_T
    WHERE STATE IN ('FL', 'TX', 'CA', 'HI');

Sorting Results: The ORDER BY Clause

Now we extend the previous example by adding an ORDER BY clause

Query: List customer, city, and state for all customers in the customer table whose address is Florida, Texas, California, or Hawaii. List the customers alphabetically by state, and alphabetically by customer within each state.

SELECT CUSTOMER_NAME, CITY, STATE
  FROM CUSTOMER_T
    WHERE STATE IN ('FL', 'TX', 'CA', 'HI')
      ORDER BY STATE, CUSTOMER_NAME;

In the result for this SQL statement, states are alphabetized first, then customer names. If sorting from high to low, we use the DESC keyword at the column used to sort. Furthermore, SQL-92 stipilates the null values (for the column used to sort) be placed first or last, before or after columns that have values. Where the NULLs will be placed will depend on the SQL implementation.

Categorizing Results: The GROUP BY Clause

GROUP BY is particularly useful when pair with aggregage functions like SUM or COUNT. GROUP BY divides a table into subsets (by groups); then an aggregate function can be used to provide summary information for that group. The single value retruned by our previous aggregate function examples is called a scalar aggregate. When aggregate functions are used in a GROUP BY clause and several values are returned, they are called vector aggregates.

Scalar aggregate: A single value returned from an SQL query that includes an aggregate function.

Vector aggregate: Multiple values returned from an SQL query that includes an aggregate function.

Query: Count the number of customers with addresses in each state to which we ship.

SELECT STATE, COUNT (STATE)
  FROM CUSTOMER_T
    GROUP BY STATE; 

It is also possible to nest groups within groups; the same logic is used as when sorting multiple items.

Query: Count the number of customers with addresses in each city to which we ship. List the city by state.

 SELECT STATE, CITY, COUNT (CITY)
   FROM CUSTOMER_T
     GROUP BY STATE, CITY;

Note that when a GROUP BY clause is included, the columns allowed to be specified in the SELECT clause are limited. Only those columns with a single value for each group can be included. In the query above, each group consists of a city and its state. The SELECT statement includes both the 'city' and 'state' column. This works because each combination of city and state is one value... In general, the column referenced in the SELECT statement must be referenced in the GROUP BY clause, unless the column is an argument for an aggregate function included in the SELECT clause.

Qualifying Results by Categories: The HAVING Clause

The HAVING clause acts like a WHERE clause, but it identifies groups that meet a criterion, rather than rows. Thus, one usually sees a HAVING clause following a GROUP BY clause.

Query: Find only states with more than one customer.

SELECT STATE, COUNT (STATE)
  FROM CUSTOMER_T
    GROUP BY STATE
      HAVING COUNT (STATE) > 1;

This query returns a result with all states with greater than one customer.

NOTE that using WHERE in the previous example would not work because WHERE doesn't allow aggregates; further WHERE qualified ros, whereas HAVING qualifies groups.

To include more than one condition in the HAVING clause, we use AND, OR, and NOT just as in the WHERE clause. In summary, here is one last command that includes all of the six clauses; remember that they must be used in this (strict) order.

Query: List the product finish, average unit price for each finish, and number of products on hand for the following finishes: Cherry, Ash, Maple, and White, where average unit price is less than 750 and the quantity on hand is more than 2.

 SELECT PRODUCT_FINISH, AVG (UNIT_PRICE), SUM (ON_HAND)
   FROM PRODUCT_T
     WHERE PRODUCT_FINISH IN ('Cherry', 'Ash', 'Maple', 'White')
       GROUP BY PRODUCT_FINISH
         HAVING AVG (UNIT_PRICE) < 750 AND SUM (ON_HAND) > 2
           ORDER BY PRODUCT_FINISH;

 

Top of Page

Charles E. Oyibo
IDS :: CBA :: UIC