Add new comment

Types of joins in SQL (cheatsheet for beginners)

This article is used in my book "Programming with databases".

Better to see once than to hear a hundred times. This small article explains what are the different kinds of joins in SQL.

At the first time you should take into account that SQL is a declarative 4th generation language. So the best way to use it (especially for beginners) is to write the query and let SQL optimizer do its job. That means thinking in set-oriented terms and avoiding sub-queries.

Let's take two example tables: Companies and Contacts.

CREATE TABLE companies (
  company_id INT NOT NULL,
  company_name VARCHAR(64) NOT NULL,
  phone VARCHAR(16) NULL,
  CONSTRAINT pk_companies PRIMARY KEY (company_id)
);
 
CREATE TABLE contacts (
  contact_id INT NOT NULL,
  contact_name VARCHAR(64),
  phone VARCHAR(16) NULL,
  company_id INT NULL,
  CONSTRAINT pk_contacts PRIMARY KEY (contact_id),
  CONSTRAINT fk_contact_company
    FOREIGN KEY (company_id)
    REFERENCES companies(company_id)
);

Fill these tables by following data:

INSERT INTO companies VALUES (1, 'AT&T', null);
INSERT INTO companies VALUES (2, 'Microsoft', '322-223');
INSERT INTO contacts VALUES (1, 'Bjarne Stroustrup', null, 1);
INSERT INTO contacts VALUES (2, 'Niklaus Wirth', '322-223', null);
INSERT INTO contacts VALUES (3, 'Bill Gates', '322-223', 2);

Here are the tables with data:

Companies

company_id company_name phone
1 AT&T NULL
2 Microsoft 322-223

Contacts

contact_id contact_name phone company_id
1 Bjarne Stroustrup NULL 1
2 Niklaus Wirth 322-223 NULL
3 Bill Gates 322-223 2

Now it's time to write queries.

INNER JOIN

SELECT contact_name, company_name
FROM contacts 
     INNER JOIN companies
     ON contacts.company_id = companies.company_id
ORDER BY contact_name

contact_name company_name
Bjarne Stroustrup AT&T
Bill Gates Microsoft

LEFT OUTER JOIN

SELECT contact_name, company_name
FROM contacts 
     LEFT OUTER JOIN companies
     ON contacts.company_id = companies.company_id
ORDER BY contact_name

contact_name company_name
Bjarne Stroustrup AT&T
Niklaus Wirth NULL
Bill Gates Microsoft

RIGHT OUTER JOIN

In this example tables are joint by a non-key attribute. In relational model you don't have to think of the physical link existence.

SELECT contact_name, company_name
FROM contacts 
     RIGHT OUTER JOIN companies
     ON contacts.phone = companies.phone
ORDER BY contact_name

contact_name company_name
NULL AT&T
Niklaus Wirth Microsoft
Bill Gates Microsoft

Let's write the "left outer join" too because it's required for the next example.

SELECT contact_name, company_name
FROM contacts 
     LEFT OUTER JOIN companies
     ON contacts.phone = companies.phone
ORDER BY contact_name

contact_name company_name
Bjarne Stroustrup NULL
Niklaus Wirth Microsoft
Bill Gates Microsoft

FULL JOIN

SELECT contact_name, company_name
FROM contacts 
     FULL OUTER JOIN companies
     ON contacts.phone = companies.phone
ORDER BY contact_name

As you can see, FULL JOIN is the union of results returned by LEFT and RIGHT outer joins.

contact_name company_name
NULL AT&T
Bjarne Stroustrup NULL
Niklaus Wirth Microsoft
Bill Gates Microsoft

CROSS JOIN is the Cartesian product

CROSS JOIN is also called "Cartesian product".

SELECT contact_name, company_name
FROM contacts 
     CROSS JOIN companies
ORDER BY contact_name

contact_name company_name
Bjarne Stroustrup AT&T
Bjarne Stroustrup Microsoft
Niklaus Wirth AT&T
Niklaus Wirth Microsoft
Bill Gates AT&T
Bill Gates Microsoft

CROSS APPLY

The CROSS APPLY is also called "lateral join". Let's add some data to see the difference.

INSERT INTO companies VALUES (3, 'Google', '333-222');
INSERT INTO contacts VALUES (4, 'Anders Hejlsberg', NULL, 2);

Now, if you run the following query, the result will be the same as for INNER JOIN.

-- inner join 
SELECT company_name, contact_name
FROM companies 
     INNER JOIN contacts
     ON contacts.company_id = companies.company_id
ORDER BY contact_name;
-- lateral join
SELECT company_name, contact_name
FROM companies
     CROSS APPLY(SELECT * FROM contacts 
                 WHERE contacts.company_id = companies.company_id
     ) contacts
ORDER BY contact_name

company_name  contact_name      
------------- ------------------
Microsoft     Anders Hejlsberg
Microsoft     Bill Gates
AT&T          Bjarne Stroustrup

So what the difference?

There are several scenario whether the CROSS APPLY is useful.

Join with a table function

Many DBMS allow to write user defined functions returning a dataset (table). For example, in SQL Server you may create following function which returns contacts for the given company.

CREATE FUNCTION dbo.company_get_contacts(@company_id int)
RETURNS TABLE
AS RETURN (
   SELECT contact_id, contact_name 
   FROM contacts
   WHERE company_id = @company_id
)

If you want to join companies with this function, you need to provide the value of the company_id column as an argument for @company_id parameters. Without CROSS APPLY the only SELECT section where you could call this function.

SELECT company_id,
       dbo.company_get_contacts(company_id) AS contact
FROM companies

However, the company_get_contacts() function returns the table, not the scalar value so the query is completely wrong!

The CROSS APPLY resolve the problem.

SELECT company_name, contact_name
FROM companies
     CROSS APPLY dbo.company_get_contacts(companies.company_id) contacts
ORDER BY contact_name

The result is still the same but we encapsulated some logic in the function that may be changed with a minimum modifications in the queries. For example, you may decide to filter and return only some contacts. In this case you have to modify only the function, the query stay unchanged.

Join with a first row only

In some scenarios (usually in OLAP) you may need to show only the first matched row. For example, show the companies that have at least one contact having a blank phone number.

SELECT company_name, contact_name
FROM companies
     CROSS APPLY(SELECT TOP 1 contact_name FROM contacts 
                 WHERE contacts.company_id = companies.company_id
                       AND contacts.phone IS NULL
     ) contacts
ORDER BY contact_name

Result

company_name  contact_name      
------------- ------------------
Microsoft     Anders Hejlsberg
AT&T          Bjarne Stroustrup

OUTER APPLY

Compared with CROSS APPLY, the OUTER APPLY has the same meaning that OUTER JOIN compared with INNER JOIN.

If you run again the query returning contacts having blank phone numbers but using OUTER APPLY instead, all companies will be selected regardless whether it has linked contacts or not.

SELECT company_name, contact_name
FROM companies
     OUTER APPLY(SELECT TOP 1 contact_name FROM contacts 
                 WHERE contacts.company_id = companies.company_id
                       AND contacts.phone IS NULL
     ) contacts
ORDER BY contact_name

Result

company_name  contact_name      
------------- ------------------
Google        NULL
Microsoft     Anders Hejlsberg
AT&T          Bjarne Stroustrup

Have a nice SQL!

It's a revised and completed translation of my article published in April 2006 in "PC World (Russian edition)", N°12, 2007