Thursday, February 28, 2008

[DRAFT]

Planning SQL Course



This list helps students plan their SQL self-training. A short introduction and good external links are included for each topic.


"adv:" means advanced sub-topics



Database operations (CREATE, ALTER, DROP, TRUNCATE, SELECT, INSERT, UPDATE, DELETE, ...)


Basic Topics

Database objects:
0) database basics

introduction to various db objects: database objects (Database, Tables , Views Stored procedures, Triggers, Functions, Constraints (PK, FK, UK, DF, NULL...), Indexes, Statistics...)

DDL DML

Variables (declaring, assigning value, scope)
Constants http://www.sqlservercentral.com/articles/T-SQL/61244/
Data types
1) Tables (create, alter, drop, insert, update, delete, truncate operations)
identity property basics
adv: identity property advanced topics (various related functions, the behaviour of identity column on delete, truncate, insert, update operations)
adv: identity_insert
2) Select statement
from clause
where clause
order by clause
Aliases
Group by clause
Having clause
Operators used in where clause:
AND
OR
In
Between
=, <, >, <>
IS, NOT, EXISTS,
IN
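The operators above can be combined in a single WHERE clause. A quick sketch (the Employees table and its columns are hypothetical, used only for illustration):

```sql
-- Hypothetical Employees table, used only to illustrate the operators
SELECT Name, Salary
FROM Employees
WHERE (Dept = 'HR' OR Dept = 'IT')       -- OR
  AND Salary BETWEEN 30000 AND 60000     -- BETWEEN
  AND Grade IN (1, 2, 3)                 -- IN
  AND ManagerId IS NOT NULL              -- IS NOT
ORDER BY Salary DESC
```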
3) Insert statement
with list of values
with results from a select statement
adv: complex insert having joins
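The two basic forms can be sketched as follows (table and column names are hypothetical):

```sql
-- Insert with an explicit list of values
INSERT INTO Employees (Name, Dept) VALUES ('John', 'HR')

-- Insert the results of a SELECT statement
-- (column count and types must match the target list)
INSERT INTO EmployeesArchive (Name, Dept)
SELECT Name, Dept FROM Employees WHERE Dept = 'HR'
```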
4) Update statement
simple update
updating multiple columns in a single query
adv: complex update having joins
5) Delete statement

simple delete

adv: complex delete having joins
6) Truncate
truncate vs delete vs drop


7) Constraints (PK, FK, UK, DF, NULL, ...)
types of constraints

significance of constraints while operating on parent-child tables.
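A minimal parent-child sketch showing why constraints matter when operating on related tables (hypothetical Dept/Emp tables):

```sql
-- Parent and child tables; names are illustrative only
CREATE TABLE Dept (DeptId int PRIMARY KEY)
CREATE TABLE Emp (
    EmpId  int PRIMARY KEY,
    DeptId int FOREIGN KEY REFERENCES Dept(DeptId)
)

INSERT INTO Dept VALUES (1)
INSERT INTO Emp VALUES (100, 1)      -- OK: the parent row exists
-- INSERT INTO Emp VALUES (101, 2)   -- fails: no such parent row
-- DELETE FROM Dept WHERE DeptId = 1 -- fails: child rows reference it
```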

8) system functions and exercises on system functions (to test if the student is able to pick the right function for the right purpose)

String Functions

Date Functions

Aggregate Functions

Exercises on system functions

* print statements like the following (given DOB and Name as inputs)
My Birthday is on <date>

My Birthday falls on <week day>

My Birthday is on <day>th of <month>

This year my birthday is on <week day>

My age is <age>

My first name is (using sub string)

My last name is (using sub string)

My middle name is (using sub string)

Length of my name is <length>
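One possible set of answers, assuming @DOB and @Name as inputs (the age calculation with DATEDIFF(yy, ...) is approximate, since it counts year boundaries rather than completed years):

```sql
DECLARE @DOB datetime, @Name varchar(50)
SET @DOB  = '1985-06-15'
SET @Name = 'John M Doe'

PRINT 'My Birthday is on ' + CONVERT(varchar, @DOB, 107)
PRINT 'My Birthday falls on ' + DATENAME(weekday, @DOB)
-- Approximate: DATEDIFF(yy) counts year boundaries crossed
PRINT 'My age is ' + CONVERT(varchar, DATEDIFF(yy, @DOB, GETDATE()))
PRINT 'My first name is ' + SUBSTRING(@Name, 1, CHARINDEX(' ', @Name) - 1)
PRINT 'Length of my name is ' + CONVERT(varchar, LEN(@Name))
```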


9) Sub-queries

Exercises on aggregate functions: compute a running total and a running count using sub-queries
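A running total with a correlated sub-query might look like this (hypothetical Orders table; note that this approach re-scans the table for every row, so it is slow on large tables):

```sql
-- Running total ordered by OrderId, via a correlated sub-query
SELECT o.OrderId,
       o.Amount,
       (SELECT SUM(o2.Amount)
        FROM Orders o2
        WHERE o2.OrderId <= o.OrderId) AS RunningTotal
FROM Orders o
ORDER BY o.OrderId
```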

10) JOINS (self, cross, inner, left outer, right outer)

Various types of joins and their applications

Alternative queries to various types of joins

JOIN vs IN vs EXISTS: http://mrdee.blogspot.com/2007/05/sql-server-join-vs-in-vs-exists-logical.html


2) Views
create view
adv: various operations on view and their effect on the underlying tables
adv: schema binding
adv: cascading property

adv: system views

Exercises on System Views:

1) Write a query to list all tables in the database that have a column whose name starts with 'emp'
2) Write a query to list all tables, and their columns, that have a column of type int

3) Write a SP that generates the insert
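Possible solutions for exercises 1 and 2, using the standard INFORMATION_SCHEMA views (the sys.tables/sys.columns catalog views on SQL 2005 work as well):

```sql
-- 1) Tables having a column whose name starts with 'emp'
SELECT DISTINCT TABLE_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE 'emp%'

-- 2) Tables and their columns of type int
SELECT TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE = 'int'
```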

3) Stored procedures

4) Triggers
trigger basics

types of events that can fire a trigger

virtual tables

Types of triggers

disabling and enabling a trigger

adv: trigger settings (db level and server level)

5) Functions
basic system functions (string functions, aggregate functions, ...)

functions basics

Write simple functions

* to add 2 values

* to find the remainder of the division

* to find the quotient of the division

Write moderate complex functions

* to return 'N' Perfect Numbers
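The simple functions might be sketched like this (names are illustrative):

```sql
-- Remainder and quotient of an integer division
CREATE FUNCTION dbo.Remainder (@a int, @b int)
RETURNS int AS BEGIN RETURN @a % @b END
GO
CREATE FUNCTION dbo.Quotient (@a int, @b int)
RETURNS int AS BEGIN RETURN @a / @b END
GO
SELECT dbo.Remainder(17, 5) AS R, dbo.Quotient(17, 5) AS Q  -- 2 and 3
```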



Intermediate topics:

) Basic topics marked as "adv:"

) user defined datatypes
) Union, Union ALL
) dynamic SQL
) Select Into vs Insert into
) Database (create, drop ... operations)

Assignment: design a database for a telephone bill loading system (for example, using my Reliance mobile bill)

Question on Database Design: How can we keep track of inserts and updates of records in the database... like who inserted a record, who updated it, when it was inserted, and when it was updated?
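One common answer, sketched below: audit columns with defaults for the insert side and a trigger for the update side (all table, column, and trigger names are hypothetical):

```sql
-- Audit columns: defaults capture the insert, a trigger captures updates
CREATE TABLE Bills (
    BillId     int PRIMARY KEY,
    Amount     money,
    InsertedBy sysname  DEFAULT SUSER_SNAME(),
    InsertedOn datetime DEFAULT GETDATE(),
    UpdatedBy  sysname  NULL,
    UpdatedOn  datetime NULL
)
GO
CREATE TRIGGER trBillsUpdate ON Bills AFTER UPDATE AS
    UPDATE b SET UpdatedBy = SUSER_SNAME(), UpdatedOn = GETDATE()
    FROM Bills b JOIN inserted i ON b.BillId = i.BillId
GO
```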
) cross db queries
) Correlated subqueries
) delete vs drop vs truncate

) DISTINCT vs GROUP BY

* www.sqlmag.com/Article/ArticleID/24282/sql_server_24282.html
* http://blog.sqlauthority.com/2007/03/29/sql-server-difference-between-distinct-and-group-by-distinct-vs-group-by/

) Admin: Backup & Restore

) Error handling exercises:
modify the following previous code to include error handling
generate Prime numbers
generate Perfect numbers
generate maths tables


Advanced topics:
1) Indexes

2) Performance tuning
3) Statistics
4) cascaded views and other advanced topics related to views
5) linked server

6) COLLATE


Monday, February 25, 2008

SQL Optimization Methods

January 3rd, 2007

SQL optimization is a transformation of SQL statements that produces exactly the same result as the original query. This process requires a lot of creativity and intuition; it is more an Art than a Science.

It is impossible to teach it in a small article, so I won't even try. Instead, I will limit myself to classifying the types of optimization, so that when you are asked to optimize a database, you know what to ask and what to expect.

The result produced by the optimized code must be exactly the same as it was before: the second worst nightmare of a DBA (the first one is losing data) is someone running to him crying "you know, after your last optimization all customer balances became negative!" So I will focus on the potential danger at every level.

For that reason, all types of optimization are sorted in order of increasing intrusiveness, and I begin with the non-intrusive kind.

0. Non-intrusive optimization

This is upgrading the SQL Server hardware and changing SQL Server parameters. Adding more memory or processor power won't do any harm, and you don't need to test your product again.

When upgrading hardware, you can spend a lot without any effect: it is possible that your server is suffering from locks, so the memory you add is added in vain. There is an easy way to determine what, in general, is the most serious bottleneck on the server: CPU power, IO, or locks. To do it, use Lakeside Upgrade Advisor.

You can accelerate your IO by putting LOG and DATA on different devices and separating indexes from data, but unfortunately it is more and more difficult to apply these good old recommendations in the modern world. In many companies, 'a server' is built from the same specification whether it is a file server or a database server. One big RAID-5 drive is almost standard for small servers. If you ask for separate drives for LOG and DATA, you will get 2 separate LUNs on the same RAID-5 drive and the same physical channel, which is absolutely useless.

For bigger servers, more and more companies use big disk arrays like EMC SANs – these arrays are very good for OLAP systems, but bad for OLTP.

Changing SQL Server parameters is also relatively safe, because you don't change the code. Unfortunately, there is no parameter 'TURBO=TRUE'; you can play with these parameters, but in most cases SQL Server's self-tuning is quite good, and the effect of tuning can be too small to measure.

For given hardware, applying all the recommendations you can find on the Internet, you can gain 20-30%, even less if there is nothing really wrong with your server. So if your queries are running several times slower than expected, you have no choice but to change your T-SQL code.

Potential danger: None

1. Indexes

On this level of optimization, we change the database by creating and changing indexes, but we don’t touch the code.

Of course, any DBA is very conservative about changing code, especially in production. In some cases it is possible to make modifications that are not really considered 'changes' by conventional programming standards.

First of all, anything you do with indexes can affect the execution plan, but the result is guaranteed to be the same (unless there are bugs in SQL Server – but Microsoft spent far more time testing it than is spent on any query you work with, so below we can treat SQL Server as a 'bug-free' product; not true, but almost true in comparison with the quality of our own code).

You can create and drop indexes, change index types (clustered/non-clustered), and include or exclude columns from indexes. On SQL 2005, check the new index option INCLUDE. Don't try to index everything – you will slow down updates. If SQL Server does not use an index, there may be a very good reason behind that decision.
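For example, on SQL 2005 a covering index using that option can be sketched like this (a hypothetical Sales(Date, Amount) table):

```sql
-- A covering non-clustered index: Date is the key,
-- Amount is carried in the leaf level via INCLUDE (SQL 2005+)
CREATE NONCLUSTERED INDEX Sales_Date_Amount
ON Sales (Date) INCLUDE (Amount)
```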

Potential danger: Is it absolutely safe? For most databases, it is. SQL Server will automatically adjust plans based on new indexes. However, if your database uses not only DML but also DDL – generating tables and indexes on the fly – then it is possible that something like

exec('drop index '+@indname)

would fail. So be careful.

2. Hints and options

To use a hint you must modify existing SQL code. Hence it is a modification, and theoretically you need to retest the whole system. But under the assumption that SQL Server is almost bug-free, a query with any hint produces exactly the same result as without it.

So check the execution plan and check the following:

(index=…) – but be careful, usually, when SQL server ignores an index, there is a very good reason why.
OPTION (HASH JOIN) – try also other types of joins.
OPTION (FORCE ORDER)
OPTION (FAST 1) – these 3 options above are good candidates to try even without thinking, sometimes it helps.
(NOEXPAND) for indexed views
WITH RECOMPILE for a stored procedure
and some others…
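A sketch of how such hints are attached to a query (the tables are hypothetical):

```sql
-- Hypothetical query forcing a hash join and optimizing for the first row
SELECT o.Id, c.Name
FROM Orders o JOIN Customers c ON c.Id = o.CustomerId
OPTION (HASH JOIN, FAST 1)
```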

But if that does not help, there is no choice other than to change the code.

Potential danger:
• Some modifications can have a catastrophic impact on the performance.
• Some options can result in an error like

‘Query processor could not produce a query plan because of the hints defined in this query.’

• Some queries can fail on some data sets, check

OPTION (ROBUST PLAN)

for details

3. Syntax-level optimization

The most primitive type of optimization is based only on the text of a query. We don't even know the underlying database structure, so we are limited to the subset of syntax transformations that do not change the result in any case. For example:

select 'Low',A,B from T
union
select 'High',A1,B1 from T1

In this case we merge both datasets together and remove duplicates. However, there can be no duplicates, because the first column differs between the 2 sub-queries. Hence we can rewrite it as:

select 'Low',A,B from T
union all
select 'High',A1,B1 from T1

Another example.

update T set Flag='Y'
update T set Ref=0

In both queries we affect all the records, so we can change both columns at the same time:

update T set Flag='Y',Ref=0

We can merge together updates in even more complicated cases:

update T set Flag='Y'
update T set Ref=0 where Comment like '%copied%'

Both updates use a full table scan, so let's do both updates during the same pass:

update T set Flag='Y',
Ref = case when Comment like '%copied%'
then 0
else Ref -- keep the same
end

A Trap:
Such type of optimization can be dangerous. For example,

update T set Ref=0
update T set Amount=StoredValue

It is almost the same case as we had before. So we can rewrite it as

update T set Ref=0,Amount=StoredValue

Right? Yes, this transformation is correct… in 99% of cases, but not always. The possible problem is that StoredValue could be a computed column that depends on Ref. For example, let's say it is defined as StoredValue = Ref + 1.

The original statements set Ref to 0 and Amount to 1. Our optimized query works differently, because UPDATE first calculates all the right-hand parts and only then assigns them to the columns. Therefore, it will set Ref to 0, but Amount will be set to the old value of Ref plus 1.

Potential danger: As demonstrated above, we must test the code after such a modification. This level and all subsequent levels are intrusive and require verification.

4. When we know database schema

When we know the database schema and have a list of indexes, we can make more serious changes. For example, when we see:

update T set Flag='A' where Opt=1
update T set Flag='B' where Opt=2

We can check whether there is an index on the column Opt. Assuming that indexes are created on columns with high selectivity, we can leave these statements as is. However, if there is no index on Opt, we can rewrite them as:

update T
set Flag=case when Opt=1 then 'A' else 'B' end
where Opt in (1,2)

to perform the update in a single scan.

Potential danger: Testing is required

5. When we have an access to the actual data and know the meaning of the data

We can do this only when we have access to the actual data. In the example above, we could determine whether Opt is a column with high selectivity. To do it, we compare the results of 2 queries:

select count(*) from T
select count(distinct Opt) from T

What is more important, we can find values with irregular selectivity and handle them properly. Check this article to read more about irregular selectivity: http://www.lakesidesql.com/articles/?p=8

But sometimes table and column names help, so we understand the situation even without looking at the actual data. In which case, do you think, does SQL Server use an index, and in which a full table scan?

update Patients set Flag='Y' where Gender='F'
update Sales set Flag='N' where Id=13848

In the first case SQL Server uses a table scan, because the Gender column (M/F) has extremely low selectivity (about 50%). We can even guess that Flag is a low-selectivity column in both cases, with values 'Y', 'N' and probably a few others. But syntactically, both queries are identical.

Another example. Assuming that Gender is NOT NULL, does this statement update all the records?

update Patients set Flag=0 where Gender in ('M','F')

Yes. Now let’s change the table name:

update Nouns set Flag=0 where Gender in ('M','F')

We update nouns, which have a grammatical category of gender that may be 'M', 'F' or Neuter! As you see, the only difference is the table name, so we actually guessed the meaning.

We can base our optimization on many real-world constraints, like
• age is less than 255 and non-negative,
• mass is positive,
• InternalId int identity correlates with a date of the document
• SSN is unique.

Potential danger: Requires not only testing but also verification that the constraints are not violated now (such verification is possible by querying the existing data) and will not be in the future (not all data may have been inserted yet). You may find that:
• Special value of -1 is used for age as some weird flag
• mass can be negative, because sometimes it is a delta between the actual and expected values
• InternalId int identity correlates with a date of the document except for the documents, imported from another database a year ago
• SSN is unique, except for the value ‘N/A’, used for all illegal immigrant workers.

6. Actual or expected workload

Some solutions have 2 sides: they accelerate data retrieval, for example, but slow down modifications. A very good example is the technique of 'indexed' or 'materialized' views. Indexed views can accelerate selects, but they slow down any updates significantly, especially massive ones.

So before you make any modifications, you should know whether the data is relatively static or not, how many updates you expect per select, etc.

On an actual data load, the best way to make such an analysis is to use Lakeside Trace Analyzer.


When SQL Server Query Optimizer Is Wrong

http://www.lakesidesql.com/articles/?p=18

In most cases, SQL Server Optimizer generates optimal plans. It is impossible to compete with its internal knowledge of average disk access cost, record length or page fill ratio. But, there is one area where human expertise is always superior.

To follow my example, execute the following scripts on an empty database to create 2 tables (you can skip this step and look directly at the results). Each sale in the master table (Sales) is associated with rows in the details table (SaleItems).

create table Sales (
Id int primary key,
Amount money not null,
Date datetime not null,
Comment varchar(128)
)
GO
create table SaleItems (
SaleId int,
ItemId int not null,
Quantity int
)
GO

The Sales table (master table) will have 10,000 random records. The SaleItems table (details table) will have up to 9 items for each sale record in the Sales table.

Execute the following scripts to populate both tables and create indexes:

set nocount on
GO
declare @n int, @m int
set @n=10000
while @n>0 begin
insert into Sales select @n,$10.0*(@n%100)+$100.,dateadd(hh,-@n,'20070507'),
'This is sale N '+convert(varchar,@n)
set @m=@n%10
while @m>0 begin
insert into SaleItems select @n,@m,@m+@n
set @m=@m-1
end
set @n=@n-1
end
GO
create index SaleItems_Id on SaleItems (SaleId)
GO
create index Sales_Dates on Sales (Date)
GO

Note that Sales.Date varies from 2006-03-16 08:00 to 2007-05-06 23:00.

Now, we make a first attempt to write a stored procedure that retrieves the maximum quantity of sales for a specified period of time.

create procedure GetMaxQuantity
@p1 datetime, @p2 datetime
as
select max(Quantity) from SaleItems where SaleId in (
select Id from Sales where Date between @p1 and @p2)
GO

Before running the procedure, let’s enable IO Statistics by executing:

set statistics io on

It is also important that we always clear the buffer cache before executing the procedure. Otherwise, the number of physical reads will not be accurate:

dbcc dropcleanbuffers

Now, let’s execute the stored procedure:

exec GetMaxQuantity '20070301','20070302'

In my test environment, I get the following IO statistics:

Table 'SaleItems'. Scan count 25, logical reads 191, physical reads 1, read-ahead reads 4.
Table 'Sales'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0.

Let’s again execute the procedure with a different date range:

dbcc dropcleanbuffers

exec GetMaxQuantity '20060301','20070302'

This time, the following IO statistics are returned:

Table 'SaleItems'. Scan count 8417, logical reads 55564, physical reads 90, read-ahead reads 173.
Table 'Sales'. Scan count 1, logical reads 17, physical reads 1, read-ahead reads 16.

It is important that you execute the first query before the second. It is no surprise that the second execution, covering a whole year, requires many more logical and physical reads. However, 55564 logical reads is too high. So what happened?

Let's examine the execution plan (the plan screenshot is not preserved here).

Along with the plan, SQL Server provides many counters. For most of these numbers we can just trust SQL Server, but there is one value we can verify: the estimated row count. SQL Server thinks that when we select data from Sales we will get around 23 records, and that after joining to SaleItems we will have about 118.

Fortunately, this can be verified:

select count(*) from Sales where Date between '20060301' and '20070302'

In fact, there are 8417 records instead of the 23 expected by SQL Server's query optimizer, and

select count(*) from SaleItems where SaleId in (
select Id from Sales where Date between '20060301' and '20070302')

37884 records instead of the expected 118! What a mistake. If only SQL Server had known this, it would have generated a totally different execution plan, as it does when the same query is run with constants:

select max(Quantity) from SaleItems where SaleId in (
select Id from Sales where Date between '20060301' and '20070302')

As you can see, when constants are supplied, SQL Server can obtain information from the statistics tables. When it realizes there are too many records, it changes the execution plan and, instead of costly Lookups, uses a Hash Join.

The SQL Server query optimizer is a black box. However, we can sometimes make some interesting experiments.

Execute the following queries:

select * from Sales where Comment like '%A%'
select * from Sales where Comment like '%AA%'
select * from Sales where Comment like '%AAA%'
select * from Sales where Comment like '%AAAA%'
select * from Sales where Comment like '%AAAAA%'
select * from Sales where Comment like '%AAAAAA%'
select * from Sales where Comment like '%AAAAAAA%'
select * from Sales where Comment like '%AAAAAAAA%'

None of them return any rows, and therefore all the estimations are absolutely wrong. But SQL Server's query optimizer still gives an estimate of the number of returned rows (the table of estimates is not preserved here). As you can see from those numbers, it is nothing more than a pure guess from a heuristic formula.

This is the most common scenario: SQL Server underestimates the number of rows and queries tables thousands or even millions of times. On the other hand, it can overestimate the number of rows, use a Hash Join, and read the whole table when it actually needs only a few rows.

SQL Server knows the physical layout of the data much better than we do. It even keeps some primitive per-column statistics. SQL Server 2005 can even trace the correlation of data in different tables. However, don't expect too much from it. It will never understand that Name LIKE '%Smith%' returns more rows than expected, or that joining sales tables from 2 different departments would hardly give us any matches when we did it just to confirm that fact.


Sunday, February 24, 2008

Temporary Tables vs. Table Variables and Their Effect on SQL Server Performance

September 13th, 2006
http://www.lakesidesql.com/articles/?p=12

There are 3 major theoretical differences between temporary tables:

create table #T (…)

and table-variables

declare @T table (…)

The first one is that transaction logs are not recorded for table-variables. Hence, they are outside the scope of the transaction mechanism, as is clearly visible from this example:

create table #T (s varchar(128))
declare @T table (s varchar(128))
insert into #T select 'old value #'
insert into @T select 'old value @'
begin transaction
update #T set s='new value #'
update @T set s='new value @'
rollback transaction
select * from #T
select * from @T

s
---------------
old value #

s
---------------
new value @

After declaring our temporary table #T and our table-variable @T, we assign each one with the same “old value” string. Then, we begin a transaction that updates their contents. At this point, both will now contain the same “new value” string. But when we rollback the transaction, as you can see, the table-variable @T retained its value instead of reverting back to the “old value” string. This happened because, even though the table-variable was updated within the transaction, it is not a part of the transaction itself.

The second major difference is that any procedure with a temporary table cannot be pre-compiled, while an execution plan for procedures with table-variables can be statically compiled in advance. Pre-compiling a script gives a major advantage to its speed of execution. This advantage can be dramatic for long procedures, where recompilation can be too pricey.

Finally, table-variables exist only in the same scope as variables. Contrary to the temporary tables, they are not visible in inner stored procedures and in exec(string) statements. Also, they cannot be used in insert/exec statement.

But let’s compare both in terms of performance.

At first, we prepare a test table with 1 million records:

create table NUM (n int primary key, s varchar(128))
GO
set nocount on
declare @n int
set @n=1000000
while @n>0 begin
insert into NUM
select @n,'Value: '+convert(varchar,@n)
set @n=@n-1
end
GO

Now we prepare our test procedure T1:

create procedure T1
@total int
as
create table #T (n int, s varchar(128))
insert into #T select n,s from NUM
where n%100>0 and n<=@total
declare @res varchar(128)
select @res=max(s) from NUM
where n<=@total and
not exists(select * from #T
where #T.n=NUM.n)
GO

Called with a parameter that we will vary over 10, 100, 1000, 10,000, 100,000 and 1,000,000, it copies the given number of records into a temporary table (with some exceptions: it skips records where n is divisible by 100), and then finds the max(s) of those missing records. Of course, the more records we give it, the longer the execution takes.

To measure the execution time precisely, I use the code:

declare @t1 datetime, @n int
set @t1=getdate()
set @n=100(**)
while @n>0 begin
exec T1 1000(*)
set @n=@n-1 end
select datediff(ms,@t1,getdate())
GO

(*) is the parameter to our procedure; it is varied from 10 to 1,000,000.
(**) if the execution time is too short, I repeat the same loop 10 or 100 times.
I run the code several times to get a result of a ‘warm’ execution.

The results can be found in Table 1 (see below).

Now let’s try to improve our stored procedure by adding a primary key to the temporary table:

create procedure T2
@total int
as
create table #T (n int primary key, s varchar(128))
insert into #T select n,s from NUM
where n%100>0 and n<=@total
declare @res varchar(128)
select @res=max(s) from NUM
where n<=@total and
not exists(select * from #T
where #T.n=NUM.n)
GO

Then let's create a third one. With a clustered index it works much better, but let's create the index AFTER we insert data into the temporary table – usually that is better:

create procedure T3
@total int
as
create table #T (n int, s varchar(128))
insert into #T select n,s from NUM
where n%100>0 and n<=@total
create clustered index Tind on #T (n)
declare @res varchar(128)
select @res=max(s) from NUM
where n<=@total and
not exists(select * from #T
where #T.n=NUM.n)
GO

Surprise! Not only does it take longer for big amounts of data; merely adding 10 records takes an additional 13 milliseconds. The problem is that a 'create index' statement forces SQL Server to recompile the stored procedure, which slows down the execution significantly.

Now let’s try the same using table-variables:

create procedure V1
@total int
as
declare @V table (n int, s varchar(128))
insert into @V select n,s from NUM
where n%100>0 and n<=@total
declare @res varchar(128)
select @res=max(s) from NUM
where n<=@total and
not exists(select * from @V V
where V.n=NUM.n)
GO

To our surprise, this version is not significantly faster than the version with the temporary table. This is a result of a special optimization SQL Server has for create table #T statements at the very beginning of a stored procedure. Over the whole range of values, V1 works better than or the same as T1.

Now let’s try the same with a primary key:

create procedure V2
@total int
as
declare @V table (n int primary key, s varchar(128))
insert into @V select n,s from NUM
where n%100>0 and n<=@total
declare @res varchar(128)
select @res=max(s) from NUM
where n<=@total and
not exists(select * from @V V
where V.n=NUM.n)
GO

The result is much better, but T2 outruns this version.

Table 1, using SQL Server 2000, time in ms (the results table is not preserved here)

But the real shock is when you try the same on SQL Server 2005:

Table 2 (the results table is not preserved here)

In some cases, SQL 2005 was much faster than SQL 2000 (marked with green in the original table). But in many cases, especially with huge amounts of data, procedures that used table variables took much longer (highlighted with red). In 4 cases, I even gave up waiting.

Conclusion:

1. There is no universal rule of when and where to use temporary tables or table variables. Try them both and experiment.

2. In your tests, verify both sides of the spectrum – small amount/number of records and the huge data sets.

3. Be careful with migrating to SQL 2005 when you use complicated logic in your stored procedures. The same code can run 10-100 times slower on SQL server 2005!


[DRAFT]

XML Basics for SQL Server


Contents:

XML Basics for SQL Server

Two types of XML mapping

Attribute-centric mapping (0, 1)

Element-centric mapping (2)

Main SQL keywords related to XML:



Two types of XML mapping

  1. Attribute-centric mapping (0, 1)

  2. Element-centric mapping (2)


Attribute-centric mapping (0, 1)

'<catalog>

<book id="bk101"

author="Gambardella, Matthew"

title="XML Developers Guide"

genre="Computer"

price="44.95"

publish_date="2000-10-01"

description="An in-depth look at creating applications with XML.">

</book>

</catalog>'


Code examples:

DECLARE @iDoc int

EXECUTE sp_xml_preparedocument @iDoc OUTPUT, '<catalog>

<book id="bk101"

author="Gambardella, Matthew"

title="XML Developers Guide"

genre="Computer"

price="44.95"

publish_date="2000-10-01"

description = "An in-depth look at creating applications with XML.">

</book>

</catalog>'


SELECT * FROM OpenXML(@iDoc, '/catalog/book', 0)

WITH

(author varchar(50),

title varchar(50),

genre varchar(50),

price varchar(50),

publish_date varchar(50),

description varchar(1000)

)


Element-centric mapping (2)

'<catalog>

<book id="bk101">

<author>Gambardella, Matthew</author>

<title>XML Developers Guide</title>

<genre>Computer</genre>

<price>44.95</price>

<publish_date>2000-10-01</publish_date>

<description>An in-depth look at creating applications

with XML.</description>

</book>

</catalog>'



Code examples:

DECLARE @iDoc int

EXECUTE sp_xml_preparedocument @iDoc OUTPUT, '<catalog>

<book id="bk101">

<author>Gambardella, Matthew</author>

<title>XML Developers Guide</title>

<genre>Computer</genre>

<price>44.95</price>

<publish_date>2000-10-01</publish_date>

<description>An in-depth look at creating applications with XML.</description>

</book>

</catalog>'


SELECT * FROM OpenXML(@iDoc, '/catalog/book', 2)

WITH

(author varchar(50),

title varchar(50),

genre varchar(50),

price varchar(50),

publish_date varchar(50),

description varchar(1000)

)



Main SQL keywords related to XML:

  1. OpenXML

  2. FOR XML

  3. OpenROWSET


  • OpenXML: reads (shreds) XML data into a rowset
  • FOR XML: generates XML from a query's result set
  • OpenROWSET: reads data from external sources (for example, loading an XML file with the BULK option)
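Minimal sketches of the other two keywords (the table name, file path, and data are hypothetical; OPENROWSET with the BULK option requires SQL 2005):

```sql
-- FOR XML: turn rows into XML
SELECT Id, Name FROM Customers FOR XML AUTO

-- OPENROWSET(BULK ...): load an XML file into an xml variable
DECLARE @x xml
SELECT @x = BulkColumn
FROM OPENROWSET(BULK 'C:\books.xml', SINGLE_BLOB) AS T
```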



Database Normalization:


Links :

Database Normalization Basics explained with examples: The best resource I have ever found on Normalization

  1. Database Normalization Basics: http://databases.about.com/od/specificproducts/a/normalization.htm
  2. Putting your Database in First Normal Form: http://databases.about.com/od/specificproducts/a/firstnormalform.htm
  3. Putting your Database in Second Normal Form: http://databases.about.com/od/specificproducts/a/2nf.htm
  4. Putting your Database in Third Normal Form: http://databases.about.com/od/specificproducts/a/3nf.htm

Friday, February 22, 2008


Identity Column:

Topics:

  • Identity
  • SET Identity_Insert <Table Name> ON|OFF

  • SCOPE_IDENTITY(function)
  • IDENT_CURRENT(function)
  • @@IDENTITY
  • IDENT_INCR (function)
  • IDENT_SEED (function)

Identity

IDENTITY ( data_type [ , seed , increment ] ) AS column_name


SET Identity_Insert

syntax:

SET Identity_Insert <Table Name> ON|OFF

In a session, only one table can have IDENTITY_INSERT set to ON. If we try to set IDENTITY_INSERT ON for one table while it is already ON for another table, the server throws an error like this:

Msg 8107, Level 16, State 1, Line 1
IDENTITY_INSERT is already ON for table 'table1'. Cannot perform SET operation for table 'table2'.
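A short sketch of the setting in action (a hypothetical temp table):

```sql
CREATE TABLE #t (id int IDENTITY(1,1), name varchar(10))
INSERT INTO #t (name) VALUES ('a')          -- id generated automatically

SET IDENTITY_INSERT #t ON
INSERT INTO #t (id, name) VALUES (100, 'b') -- explicit id allowed;
                                            -- an explicit column list is required
SET IDENTITY_INSERT #t OFF
```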

SCOPE_IDENTITY

Returns the last IDENTITY value inserted into an IDENTITY column in the same scope. A scope is a module -- a stored procedure, trigger, function, or batch. Thus, two statements are in the same scope if they are in the same stored procedure, function, or batch.

Syntax: SCOPE_IDENTITY( )

IDENT_CURRENT

Returns the last identity value generated for a specified table in any session and any scope.

Syntax: IDENT_CURRENT('table_name')

@@IDENTITY

Returns the last-inserted identity value (of any table).

Syntax: @@IDENTITY

SCOPE_IDENTITY vs IDENT_CURRENT vs @@IDENTITY

IDENT_CURRENT is similar to the Microsoft® SQL Server™ 2000 identity functions SCOPE_IDENTITY and @@IDENTITY. All three functions return last-generated identity values. However, the scope and session over which 'last' is defined differ between them.

  • IDENT_CURRENT returns the last identity value generated for a specific table in any session and any scope.
  • @@IDENTITY returns the last identity value generated for any table in the current session, across all scopes.
  • SCOPE_IDENTITY returns the last identity value generated for any table in the current session and the current scope.

Function         Table      Session    Scope
---------------  ---------  ---------  ---------
IDENT_CURRENT    Specific   Any        Any
@@IDENTITY       Any        Current    Any
SCOPE_IDENTITY   Any        Current    Current
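The classic way to see the difference is an insert that fires a trigger (hypothetical tables; the trigger's insert into T2 happens in a different scope than our batch):

```sql
CREATE TABLE T1 (id int IDENTITY(1,1),   v int)
CREATE TABLE T2 (id int IDENTITY(100,1), v int)
GO
CREATE TRIGGER trT1 ON T1 AFTER INSERT AS
    INSERT INTO T2 (v) SELECT v FROM inserted
GO
INSERT INTO T1 (v) VALUES (1)
SELECT SCOPE_IDENTITY()     -- 1: this scope's insert (into T1)
SELECT @@IDENTITY           -- 100: includes the trigger's insert (into T2)
SELECT IDENT_CURRENT('T2')  -- 100: specific table, any session and scope
```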

IDENT_INCR & IDENT_SEED function


IDENT_INCR function: Returns the increment value specified during the creation of an identity column in a table or view that has an identity column.

syntax: IDENT_INCR ( 'table_or_view' )


IDENT_SEED function: Returns the seed value that was specified when an identity column in a table or a view that has an identity column was created.

syntax: IDENT_SEED ( 'table_or_view' )


Scenario Questions: (What is the output of this query)

Query 1:

create table #t1(sno int identity(1,1), name char(1))

insert into #t1 (name) values ('a')

insert into #t1 (name) values ('a')

insert into #t1 (name) values ('a')

insert into #t1 (name) values ('a')

insert into #t1 (name) values ('a')

dbcc checkident (#t1, RESEED, 10)

insert into #t1 (name) values ('a')

select max(sno) from #t1

Ans: 11

Query 2:

create table #t1(sno int identity(1,1), name char(1))

dbcc checkident (#t1, RESEED, 10)

insert into #t1 (name) values ('a')

select max(sno) from #t1

Ans: 10

Query 3:

create table #t1(sno int identity(1,1), name char(1))

insert into #t1 (name) values ('a')

insert into #t1 (name) values ('a')

insert into #t1 (name) values ('a')

insert into #t1 (name) values ('a')

insert into #t1 (name) values ('a')

dbcc checkident (#t1, RESEED, 2)

insert into #t1 (name) values ('a')

insert into #t1 (name) values ('a')

select * from #t1

Ans: the values in 'sno' column are: 1,2,3,4,5,3,4. Note 3,4 are repeated.

Query 4:

create table #t1(sno int identity(1,1) PRIMARY KEY, name char(1))

insert into #t1 (name) values ('a')

insert into #t1 (name) values ('a')

insert into #t1 (name) values ('a')

insert into #t1 (name) values ('a')

insert into #t1 (name) values ('a')

dbcc checkident (#t1, RESEED, 2)

insert into #t1 (name) values ('a')

select * from #t1

Ans: Since a PRIMARY KEY (or UNIQUE) constraint exists on the identity column, error message 2627 will be generated on later insert operations into the table because the generated identity value will conflict with existing values.

Questions:

1) Will IDENTITY column on a table enforce uniqueness of that column?

Ans 1) NO.

Explanation: see Query 3 above.
