Tuesday, August 19, 2008

Hidden Power of UNION

I used to think that UNION was useful only when you needed to combine the results of two queries from different sources (e.g., tables, views) into one result set. However, using UNION is sometimes the quickest way to select from just one table.

Suppose you need to retrieve all OrderIDs from the Northwind Orders table where the CustomerID is VICTE or the EmployeeID is 5:

SELECT OrderID FROM Orders WHERE CustomerID = 'VICTE' OR EmployeeID = 5

The Orders table has indexes on the CustomerID and EmployeeID columns, but SQL Server doesn't use them to execute the SELECT statement, as the following execution plan shows:

|--Clustered Index Scan(OBJECT: ([Northwind].[dbo].[Orders].[PK_Orders] ), WHERE:([Orders].[CustomerID]='VICTE' OR [Orders].[EmployeeID]=5))

Instead, SQL Server scans the clustered index PK_Orders. SQL Server uses this plan because the OR operator in the query's WHERE clause makes the result satisfy both conditions at the same time, so SQL Server must double-scan the table. Scanning a clustered index in this case is almost the same as scanning the entire table; the server goes though the table record by record and checks for the specified predicate.

To improve the query's efficiency, you could create a composite index on CustomerID and EmployeeID:

CREATE INDEX IdxOrders001 ON Orders (CustomerID, EmployeeID)

The Index Tuning Wizard advises you to create just such an index to improve performance of the SELECT statement. SQL Server can use the new index to find all the records in which EmployeeID = 5, then scan only the resulting range of records to return the required result. The estimated execution plan for the new query shows that SQL Server uses the composite index:

|--Index Scan(OBJECT:([Northwind]    .[dbo].[Orders].[IdxOrders001]),    WHERE:([Orders].[CustomerID]='VICTE' OR    [Orders].[EmployeeID]=5))

But an employee can make thousands of deals with thousands of customers. So even if you create a composite index, SQL Server will need to scan a range of records in the index and won't perform a seek operation, which uses indexes to retrieve records and is the fastest way for SQL Server to find information.

You need to make SQL Server use a seek operation instead of a scan, but SQL Server won't perform a seek when an OR operator is in the WHERE clause. You can solve this dilemma by using UNION to rewrite the SELECT statement:

SELECT OrderID FROM Orders WHERE CustomerID = 'VICTE' UNION SELECT OrderID FROM Orders WHERE EmployeeID = 5

SQL Server 2000's estimated execution plan for this statement is

|--Merge Join(Union)     |--Index Seek(OBJECT:([Northwind]        .[dbo].[Orders].[CustomersOrders]),        SEEK:([Orders].[CustomerID]='VICTE'        ) ORDERED FORWARD)     |--Index Seek(OBJECT:([Northwind]        .[dbo].[Orders].[EmployeesOrders]),        SEEK:([Orders].[EmployeeID]=5)        ORDERED FORWARD)

This execution plan looks longer than the original one, but both operators are index-seek operators. SQL Server doesn't use the composite index; instead, it uses two single-column indexes. You might think that two seek operations would cost more than one seek, but performance improves when you use this method. You can check performance by using SQL Trace to analyze the three versions of the SELECT statement. For the UNION query, my SQL Server 2000 system performed four reads to return the result. The first SELECT query required 23 reads, and the second SELECT statement, which created the composite index, required 11 reads.

This special use of UNION can help you avoid the OR operator's slow performance, but use it carefully. If you don't have the appropriate indexes (CustomersOrders and EmployeesOrders, in this example), you can double-scan the table.

Sharpen Your Basic SQL Server Skills: Database backup demystified

Q: What are the types of backup methods in SQL Server?
A: SQL Server has four backup methods: full, transaction log or incremental, differential, and file or file group. You should always perform a full backup, in which you make a complete copy of the database. The transaction log or incremental backup method copies all the modifications made to a database. A differential backup makes a copy of all the changes in a database since the last full backup. The file or file group backup method copies the actual database files on disk.

A full backup quickly restores a database to its original state and is the baseline for the other types of backups. A full backup contains all of a database’s data, structures, and security objects, plus the transaction log, which you can use to restore any changes made to the database while it’s being backed up. A database restored by the full backup method doesn’t contain any uncommitted transactions (transactions which have begun but to which the administrator hasn’t explicitly committed in the database either by turning on auto commit or by using the COMMIT statement). The following code sample shows a full backup:

BACKUP DATABASE AdventureWorks
TO DISK = ‘C:full_bk_AdventureWorks.bak’

The transaction log or incremental backup helps recover a database to its latest working condition. To use this method, you need to set the database to full recovery or bulk recovery mode. To restore a database to a point in time, you must have an unbroken chain of transaction log backups. You need to first restore the database from a full backup before you can restore from a transaction log backup. Finally, you must restore each transaction log backup in the order in which it was taken. The following code sample shows a transaction log backup:

BACKUP LOG AdventureWorks
TO DISK = ‘C:log_bk_AdventureWorks.trn’

Performing a differential backup saves time and media space compared with performing a full backup. If the database is set to full-recovery mode, restoring the latest differential backup restores the database state to the time the last differential backup was completed. A differential backup can’t be restored independently; it can be restored only after a full backup is restored. The following code sample shows a differential backup:

BACKUP DATABASE AdventureWorks
TO DISK = ‘C:diff_bk_AdventureWorks.dif’
WITH DIFFERENTIAL

The file or file group backup method is an alternative way to perform a full backup. This method lets you quickly restore a database to working condition. However, you can perform a file or file group backup only under certain conditions. To use this method, the database must have been created with multiple files or file groups. The following code sample shows a file backup:

BACKUP DATABASE AdventureWorks FILE =
‘AdventureWorks_data’
TO DISK = ‘C:file_bk_AdventureWorks.
data’

Q: What are the recovery models in SQL Server?
A: SQL Server has three recovery models: full, bulk-logged, and simple.

In the full recovery model, all changes to the database are logged, so the database can be restored to the point of failure with full backup and log files. Use the full recovery model for essential databases that are updated frequently. The following code sample shows a full recovery:

ALTER DATABASE AdventureWorks
SET RECOVERY FULL

In the bulk-logged recovery model, all changes to the database except high-speed bulk insert operations are logged. (Don’t let the name confuse you—this method doesn’t actually log bulk processes.) Database performance isn’t compromised when bulk operations are in process. The following code sample shows a bulk-logged recovery:

ALTER DATABASE AdventureWorks
SET RECOVERY BULK_LOGGED

The simple recovery model lets you restore a database to the point of its last full backup. A database that isn’t updated frequently is a candidate for the simple recovery model. The following code sample shows a simple recovery:

ALTER DATABASE AdventureWorks
SET RECOVERY SIMPLE

Q: Does the RESTORE HEADERONLY command change or restore the header of a BACKUP file?
A: This command doesn’t change anything in the header. The clause simply returns all the backup header information for all the backup sets on a particular backup device. The following code sample illustrates using RESTORE HEADERONLY:

RESTORE HEADERONLY
FROM DISK = ‘C:file_bk_
AdventureWorks.data’
WITH NOUNLOAD;

Q: What’s a tail-log backup, and when do you use it?
A: The tail-log backup requirement first occurs in SQL Server 2005 and applies to all subsequent versions. When you recover a database using the full or bulk-logged recovery model, before you restore it, you might find that some transaction log entries don’t yet have backups. Backing up this set of transaction log entries is known as a tail-log backup. SQL Server 2005 requires a tail-log backup operation on a database before it’s restored, to ensure that no data is accidentally lost. The following code sample shows a tail-log backup:

BACKUP LOG AdventureWorks
TO DISK = ‘C:taillog_bk_
AdventureWorks.trn’
WITH NORECOVERY

For more SQL Server tips, check out the blog at sqlmag.com/go/SQLskills.

All the notes to learn SQL Server. includes my learnings too..