SQL Guide

Joseph Harwood
5 min readMay 31, 2019

This is a brief guide on SQL that was written using IBM DB2 SQL, but the same principles can be applied to other languages. This will be a useful introduction for SQL novices and those that need to review before an interview.

Table of Contents

  • Joins
  • Subqueries
  • Unions
  • Order By
  • Group By
  • Having
  • Order By
  • Distinct
  • In
  • Between
  • Like
  • Not
  • Common Table Expression
  • Aggregate Functions
  • Scalar Functions
  • Static SQL vs. Dynamic SQL
  • SQL Best Practices

Joins

There are four different forms of Joins in SQL:

  • Inner
  • Left Outer
  • Right Outer
  • Full Outer

An Inner Join combines the rows of the first and second table, while only keeping the rows where the join conditions are true. An Inner Join is implied when just the Join operator is used.

Here are examples of multiple ways to do the same Inner Join:

SELECT COLNAME, COLNOFROM SYSIBM.SYSINDEXES, SYSIBM.SYSKEYSWHERE CREATOR = ‘SCITSP’ AND NAME = ‘XIE5IPP_MSG_SEARCH’AND NAME = IXNAME AND CREATOR = IXCREATOR--------------------------------------------------------------------SELECT COLNAME, COLNOFROM SYSIBM.SYSINDEXES INNER JOIN SYSIBM.SYSKEYSON CREATOR = ‘SCITSP’ AND NAME = ‘XIE5IPP_MSG_SEARCH’AND NAME = IXNAME AND CREATOR = IXCREATOR--------------------------------------------------------------------SELECT COLNAME, COLNOFROM SYSIBM.SYSINDEXES JOIN SYSIBM.SYSKEYSON CREATOR = ‘SCITSP’ AND NAME = ‘XIE5IPP_MSG_SEARCH’AND NAME = IXNAME AND CREATOR = IXCREATOR

A Left Inner Join includes the rows from the first table that were missing from the Inner Join.

SELECT COLNAME, COLNOFROM SYSIBM.SYSINDEXES LEFT INNER JOIN SYSIBM.SYSKEYSON NAME = IXNAME AND CREATOR = IXCREATOR

A Right Outer Join includes the rows from the second table that were missing from the Inner Join.

SELECT COLNAME, COLNOFROM SYSIBM.SYSINDEXES RIGHT OUTER JOIN SYSIBM.SYSKEYSON NAME = IXNAME AND CREATOR = IXCREATOR

A Full Outer Join includes the rows from both tables that were missing from the Inner Join.

SELECT COLNAME, COLNOFROM SYSIBM.SYSINDEXES FULL OUTER JOIN SYSIBM.SYSKEYSON NAME = IXNAME AND CREATOR = IXCREATOR

Subqueries

Subqueries are a query within another query that further restricts the data in the main query. It is useful when selecting criteria that only exist in another table while selecting criteria from a table. Up to 16 subqueries can be specified within a query and there can be subqueries within subqueries.

SELECT COLCOUNT, NAME, DBNAMEFROM SYSIBM.SYSTABLESWHERE COLCOUNT IN ( SELECT MAX(COLCOUNT)FROM SYSIBM.SYSTABLES)

Unions

Unions merge data from two or more tables into a single column on a report that removes duplicates.

SELECT DBNAMEFROM SYSIBM.SYSTABLESWHERE DBNAME = ‘ABRS001O’UNIONSELECT NAMEFROM SYSIBM.SYSDATABASEWHERE NAME =’BLNTSTAT’

Union All merges data from two or more tables and selects all of the rows that can result in duplicates.

SELECT DBNAMEFROM SYSIBM.SYSTABLESWHERE DBNAME = ‘ABRS001O’UNION ALLSELECT NAMEFROM SYSIBM.SYSDATABASEWHERE NAME =’BLNTSTAT’

Order By

Order by sorts records by the specified column(s) in ascending order by default. Adding the DESC keyword at the end will sort the results in descending order.

SELECT COLNO, IXNAMEFROM SYSIBM.SYSKEYSORDER BY COLNO--------------------------------------------------------------------SELECT COLNO, IXNAMEFROM SYSIBM.SYSKEYSORDER BY COLNO DESC

Group By

Group by is used to group the result-set by column(s) that aren’t in the aggregate function.

Group by follows the Where clause and precedes Order by if it is used.

SELECT MAX(TIMESTAMP), DBNAME, TSNAMEFROM SYSIBM.SYSCOPYGROUP BY DBNAME, TSNAME

Group by is used on the column(s) that aren’t in the MAX() function.

Having is used in conjunction with the Group by clause to filter results. It applies a search condition to each group.

SELECT COUNT(NAME), NAMEFROM SYSIBM.SYSTABLESGROUP BY NAMEHAVING COUNT(NAME) > 5

Keywords

Distinct removes duplicate rows in a column from a report.

SELECT DISTINCT DBNAMEFROM SYSIBM.SYSTABLESSELECT DBNAMEFROM SYSIBM.SYSTABLES

In is a replacement for the Or connector.

SELECT NAME, DBNAMEFROM SYSIBM.SYSTABLESWHERE DBNAME IN (‘A7AL701I’, ‘A7AM001I’)

Between is a comparison operator that allows for a range of values within a single predicate. The first value is the low value of the range and the second value is the high value of the range. Values can be numeric, character or date. The two values must be separated by an And condition. The match only considers values inclusively between the two values in the range.

SELECT COLCOUNT, DBNAME, TSNAME, NAMEFROM SYSIBM.SYSTABLESWHERE COLCOUNT BETWEEN 5 AND 20

Like is used when only part of a character value is known. The % sign is used to symbolize any number of characters. The _ character is used to symbolize any single character.

SELECT NAMEFROM SYSIBM.SYSTABLESWHERE NAME LIKE ‘BUS%’

Not makes the condition retrieve the opposite of any condition. It also takes precedence over And/Or.

SELECT COLNO, IXNAMEFROM SYSIBM.SYSKEYSWHERE NOT COLNO > 5ORDER BY COLNO

Common Table Expression

Common Table Expression (CTE) is a form of recursive SQL that makes a temporary result set of a query. It is an alternative to subqueries, views and user-defined functions that is useful for generating summary reports of a query.

WITH CTE AS (SELECT DBNAME, NAME, TOTALROWSFROM SYSIBM.SYSTABLESPACESTATSWHERE DBNAME LIKE ‘SCIT%’)SELECT DBNAME, NAME, MAX(TOTALROWS)FROM CTEGROUP BY DBNAME, NAME

Aggregate Functions

Aggregate Functions receive a set of values for each argument and returns a single result for the input values.

The most common aggregate functions:

  • Average- returns the average of a set of numbers.
  • Count- returns the number of rows or values in a set or rows or values.
  • Min- returns the minimum value in a set of values.
  • Max- returns the maximum value in a set of values.
  • Sum- returns the sum of a set of numbers.
SELECT MAX(TIMESTAMP), DBNAME, TSNAMEFROM SYSIBM.SYSCOPYGROUP BY DBNAME, TSNAME

Aggregate functions need a Group by to group the result-set by column(s) that aren’t in the MAX() function.

Scalar Functions

Scalar Functions are applied to a column or expression and operate on a single value.

Static SQL vs. Dynamic SQL

Static SQL is embedded within an application program in a language such as COBOL. It is compiled and optimized prior to its execution.

Dynamic SQL has a program dynamically allocate memory to receive the query results and is optimized during execution.

SQL Best Practices

  • Don’t use SELECT *
  • Don’t select data you know
  • Try to avoid using NOT, except with EXISTS
  • Be careful with LIKE (%EXAMPLE%)
  • Use built-in functions when available
  • Minimize the number of tables in a JOIN
  • Reduce the number of rows to JOIN
  • Consider NULL in query design

Thanks for reading!

--

--