Monday, June 11, 2012

SQL Server CROSS APPLY and OUTER APPLY

For more detailed Information please Click Hare
ProblemSQL Server 2005 introduced the APPLY operator, which is very much like a join clause and which allows joining between two table expressions i.e. joining a left/outer table expression with a right/inner table expression. The difference between join and APPLY operator becomes evident when you have a table-valued expression on the right side and you want this table-valued expression to be evaluated for each row from the left table expression. In this tip I am going to demonstrate what APPLY operator is, how it differs from regular JOINs and what are few of its applications.
SolutionThe APPLY operator allows you to join two table expressions; the right table expression is processed every time for each row from the left table expression. As you might have guessed, the left table expression is evaluated first and then right table expression is evaluated against each row of the left table expression for final result-set. The final result-set contains all the selected columns from the left table expression followed by all the columns of right table expression.
The APPLY operator comes in two variants, the CROSS APPLY and the OUTER APPLY. The CROSS APPLY operator returns only those rows from left table expression (in its final output) if it matches with right table expression. In other words, the right table expression returns rows for left table expression match only. Whereas the OUTER APPLY operator returns all the rows from left table expression irrespective of its match with the right table expression. For those rows for which there are no corresponding matches in right table expression, it contains NULL values in columns of right table expression. So you might now conclude, the CROSS APPLY is semantically equivalent to INNER JOIN (or to be more precise its like a CROSS JOIN with a correlated sub-query) with a implicit join condition of 1=1 whereas OUTER APPLY is semantically equivalent to LEFT OUTER JOIN.

You might be wondering if the same can be achieved with regular JOIN clause then why and when to use APPLY operator? Though the same can be achieved with normal JOIN, the need of APPLY arises if you have table-valued expression on right part and also in some cases use of APPLY operator boost the performance of your query. Let me explain you with help of some examples.

Script #1 creates a Department table to hold information about departments. Then it creates an Employee table which hold information about the employees. Please note, each employee belongs to a department, hence the Employee table has referential integrity with the Department table.
USE [tempdb]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'[Employee]') AND type IN (N'U'
)) BEGIN
DROP TABLE
[Employee] END GO IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'[Department]') AND type IN (N'U')) BEGIN
DROP TABLE
[Department] END
CREATE TABLE
[Department]
(
[DepartmentID] [int] NOT NULL PRIMARY KEY
,
[Name] VARCHAR(250
) NOT NULL,
)
ON
[PRIMARY] INSERT [Department] ([DepartmentID], [Name]) VALUES (1, N'Engineering') INSERT [Department] ([DepartmentID], [Name]) VALUES (2, N'Administration') INSERT [Department] ([DepartmentID], [Name]) VALUES (3, N'Sales') INSERT [Department] ([DepartmentID], [Name]) VALUES (4, N'Marketing') INSERT [Department] ([DepartmentID], [Name]) VALUES (5, N'Finance') GO CREATE TABLE [Employee](
[EmployeeID] [int] NOT NULL PRIMARY KEY
,
[FirstName] VARCHAR(250
) NOT NULL,
[LastName] VARCHAR(250
) NOT NULL,
[DepartmentID] [int] NOT NULL REFERENCES [Department](DepartmentID
),
)
ON
[PRIMARY]
GO
INSERT [Employee] ([EmployeeID], [FirstName], [LastName], [DepartmentID]
) VALUES (1, N'Orlando', N'Gee', 1 ) INSERT [Employee] ([EmployeeID], [FirstName], [LastName], [DepartmentID]) VALUES (2, N'Keith', N'Harris', 2 ) INSERT [Employee] ([EmployeeID], [FirstName], [LastName], [DepartmentID]) VALUES (3, N'Donna', N'Carreras', 3 ) INSERT [Employee] ([EmployeeID], [FirstName], [LastName], [DepartmentID]) VALUES (4, N'Janet', N'Gates', 3
)

Department and employee table information


First query in Script #2 selects data from Department table and uses CROSS APPLY to evaluate the Employee table for each record of the Department table. Second query simply joins the Department table with the Employee table and all the matching records are produced.
SELECT * FROM Department D CROSS APPLY
(
SELECT * FROM
Employee E
WHERE E.DepartmentID = D
.DepartmentID
)
A
GO
SELECT * FROM
Department D INNER JOIN Employee E ON D.DepartmentID =
E.DepartmentID 


If you look at the results they produced, it is the exact same result-set; not only that even the execution plan for these queries are similar to each other and has equal query cost, as you can see in the image below. So what is the use of APPLY operator? How does it differ from a JOIN and how does it help in writing more efficient queries. I will discuss this later, but first let me show you an example of OUTER APPLY also.

The first query in Script #3 selects data from Department table and uses OUTER APPLY to evaluate the Employee table for each record of the Department table. For those rows for which there is not a match in Employee table, those rows contains NULL values as you can see in case of row 5 and 6. The second query simply uses a LEFT OUTER JOIN between the Department table and the Employee table. As expected the query returns all rows from Department table; even for those rows for which there is no match in the Employee table.

SELECT * FROM Department D OUTER APPLY
(
SELECT * FROM
Employee E
WHERE E.DepartmentID =
D.DepartmentID
)
A
GO
SELECT * FROM
Department D LEFT OUTER JOIN Employee E ON D.DepartmentID =
E.DepartmentID 


Even though the above two queries return the same information, the execution plan is a bit different. Although cost wise there is not much difference, the query with the OUTER APPLY uses a Compute Scalar operator (which has an estimated operator cost of 0.0000103 or almost 0% of total query cost) before Nested Loops operator to evaluate and produce the columns of Employee table.

Now comes the time to see where the APPLY operator is really required. In Script #4, I am creating a table-valued function which accepts DepartmentID as its parameter and returns all the employees who belong to this department. The next query selects data from Department table and uses CROSS APPLY to join with the function we created. It passes the DepartmentID for each row from the outer table expression (in our case Department table) and evaluates the function for each row similar to a correlated subquery. The next query uses the OUTER APPLY in place of CROSS APPLY and hence unlike CROSS APPLY which returned only correlated data, the OUTER APPLY returns non-correlated data as well, placing NULLs into the missing columns. 

IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'[fn_GetAllEmployeeOfADepartment]') AND type IN (N'IF')) BEGIN
DROP FUNCTION
dbo.
fn_GetAllEmployeeOfADepartment END GO CREATE FUNCTION dbo.fn_GetAllEmployeeOfADepartment(@DeptID AS INT) RETURNS TABLE
AS
RETURN
(
SELECT * FROM
Employee E
WHERE E.DepartmentID =
@DeptID
) GO SELECT * FROM Department D CROSS APPLY dbo.fn_GetAllEmployeeOfADepartment(D.DepartmentID) GO SELECT * FROM Department D OUTER APPLY dbo.fn_GetAllEmployeeOfADepartment(D.DepartmentID)
GO

Now the output is excately same as above figure.
So now if you are wondering, can we use a simple join in place of the above queries? Then the answer is NO, if you replace CROSS/OUTER APPLY in the above queries with INNER JOIN/LEFT OUTER JOIN, specify ON clause (something as 1=1) and run the query, you will get "The multi-part identifier "D.DepartmentID" could not be bound." error. This is because with JOINs the execution context of outer query is different from the execution context of the function (or a derived table), and you can not bind a value/variable from the outer query to the function as a parameter. Hence the APPLY operator is required for such queries.

No comments: