This article is used in my book "Programming with databases".
Better to see once than to hear a hundred times. This small article explains what are the different kinds of joins in SQL.
At the first time you should take into account that SQL is a declarative 4th generation language. So the best way to use it (especially for beginners) is to write the query and let SQL optimizer do its job. That means thinking in set-oriented terms and avoiding sub-queries.
Let's take two example tables: Companies and Contacts.
CREATE TABLE companies ( company_id INT NOT NULL, company_name VARCHAR(64) NOT NULL, phone VARCHAR(16) NULL, CONSTRAINT pk_companies PRIMARY KEY (company_id) ); CREATE TABLE contacts ( contact_id INT NOT NULL, contact_name VARCHAR(64), phone VARCHAR(16) NULL, company_id INT NULL, CONSTRAINT pk_contacts PRIMARY KEY (contact_id), CONSTRAINT fk_contact_company FOREIGN KEY (company_id) REFERENCES companies(company_id) );
Fill these tables by following data:
INSERT INTO companies VALUES (1, 'AT&T', null); INSERT INTO companies VALUES (2, 'Microsoft', '322-223'); INSERT INTO contacts VALUES (1, 'Bjarne Stroustrup', null, 1); INSERT INTO contacts VALUES (2, 'Niklaus Wirth', '322-223', null); INSERT INTO contacts VALUES (3, 'Bill Gates', '322-223', 2);
Here are the tables with data:
Companies
company_id | company_name | phone |
---|---|---|
1 | AT&T | NULL |
2 | Microsoft | 322-223 |
Contacts
contact_id | contact_name | phone | company_id |
---|---|---|---|
1 | Bjarne Stroustrup | NULL | 1 |
2 | Niklaus Wirth | 322-223 | NULL |
3 | Bill Gates | 322-223 | 2 |
Now it's time to write queries.
INNER JOIN
SELECT contact_name, company_name FROM contacts INNER JOIN companies ON contacts.company_id = companies.company_id ORDER BY contact_name
contact_name | company_name |
---|---|
Bjarne Stroustrup | AT&T |
Bill Gates | Microsoft |
LEFT OUTER JOIN
SELECT contact_name, company_name FROM contacts LEFT OUTER JOIN companies ON contacts.company_id = companies.company_id ORDER BY contact_name
contact_name | company_name |
---|---|
Bjarne Stroustrup | AT&T |
Niklaus Wirth | NULL |
Bill Gates | Microsoft |
RIGHT OUTER JOIN
In this example tables are joint by a non-key attribute. In relational model you don't have to think of the physical link existence.
SELECT contact_name, company_name FROM contacts RIGHT OUTER JOIN companies ON contacts.phone = companies.phone ORDER BY contact_name
contact_name | company_name |
---|---|
NULL | AT&T |
Niklaus Wirth | Microsoft |
Bill Gates | Microsoft |
Let's write the "left outer join" too because it's required for the next example.
SELECT contact_name, company_name FROM contacts LEFT OUTER JOIN companies ON contacts.phone = companies.phone ORDER BY contact_name
contact_name | company_name |
---|---|
Bjarne Stroustrup | NULL |
Niklaus Wirth | Microsoft |
Bill Gates | Microsoft |
FULL JOIN
SELECT contact_name, company_name FROM contacts FULL OUTER JOIN companies ON contacts.phone = companies.phone ORDER BY contact_name
As you can see, FULL JOIN
is the union of results returned by LEFT
and RIGHT
outer joins.
contact_name | company_name |
---|---|
NULL | AT&T |
Bjarne Stroustrup | NULL |
Niklaus Wirth | Microsoft |
Bill Gates | Microsoft |
CROSS JOIN is the Cartesian product
CROSS JOIN
is also called "Cartesian product".
SELECT contact_name, company_name FROM contacts CROSS JOIN companies ORDER BY contact_name
contact_name | company_name |
---|---|
Bjarne Stroustrup | AT&T |
Bjarne Stroustrup | Microsoft |
Niklaus Wirth | AT&T |
Niklaus Wirth | Microsoft |
Bill Gates | AT&T |
Bill Gates | Microsoft |
CROSS APPLY
The CROSS APPLY
is also called "lateral join". Let's add some data to see the difference.
INSERT INTO companies VALUES (3, 'Google', '333-222'); INSERT INTO contacts VALUES (4, 'Anders Hejlsberg', NULL, 2);
Now, if you run the following query, the result will be the same as for INNER JOIN
.
-- inner join SELECT company_name, contact_name FROM companies INNER JOIN contacts ON contacts.company_id = companies.company_id ORDER BY contact_name; -- lateral join SELECT company_name, contact_name FROM companies CROSS APPLY(SELECT * FROM contacts WHERE contacts.company_id = companies.company_id ) contacts ORDER BY contact_name
company_name contact_name ------------- ------------------ Microsoft Anders Hejlsberg Microsoft Bill Gates AT&T Bjarne Stroustrup
So what the difference?
There are several scenario whether the CROSS APPLY
is useful.
Join with a table function
Many DBMS allow to write user defined functions returning a dataset (table). For example, in SQL Server you may create following function which returns contacts for the given company.
CREATE FUNCTION dbo.company_get_contacts(@company_id int) RETURNS TABLE AS RETURN ( SELECT contact_id, contact_name FROM contacts WHERE company_id = @company_id )
If you want to join companies with this function, you need to provide the value of the company_id
column as an argument for @company_id
parameters. Without CROSS APPLY the only SELECT section where you could call this function.
SELECT company_id, dbo.company_get_contacts(company_id) AS contact FROM companies
However, the company_get_contacts()
function returns the table, not the scalar value so the query is completely wrong!
The CROSS APPLY
resolve the problem.
SELECT company_name, contact_name FROM companies CROSS APPLY dbo.company_get_contacts(companies.company_id) contacts ORDER BY contact_name
The result is still the same but we encapsulated some logic in the function that may be changed with a minimum modifications in the queries. For example, you may decide to filter and return only some contacts. In this case you have to modify only the function, the query stay unchanged.
Join with a first row only
In some scenarios (usually in OLAP) you may need to show only the first matched row. For example, show the companies that have at least one contact having a blank phone number.
SELECT company_name, contact_name FROM companies CROSS APPLY(SELECT TOP 1 contact_name FROM contacts WHERE contacts.company_id = companies.company_id AND contacts.phone IS NULL ) contacts ORDER BY contact_name
Result
company_name contact_name ------------- ------------------ Microsoft Anders Hejlsberg AT&T Bjarne Stroustrup
OUTER APPLY
Compared with CROSS APPLY
, the OUTER APPLY
has the same meaning that OUTER JOIN
compared with INNER JOIN
.
If you run again the query returning contacts having blank phone numbers but using OUTER APPLY instead, all companies will be selected regardless whether it has linked contacts or not.
SELECT company_name, contact_name FROM companies OUTER APPLY(SELECT TOP 1 contact_name FROM contacts WHERE contacts.company_id = companies.company_id AND contacts.phone IS NULL ) contacts ORDER BY contact_name
Result
company_name contact_name ------------- ------------------ Google NULL Microsoft Anders Hejlsberg AT&T Bjarne Stroustrup
Have a nice SQL!
It's a revised and completed translation of my article published in April 2006 in "PC World (Russian edition)", N°12, 2007
Comments
Firstly thank you for
Submitted by Raju (not verified) on
Firstly thank you for providing the data to insert into tables which is a tedious job. great article.