Dev's Guide to SQL Server: 2016

Thursday, December 29, 2016

Date Dimension aka Calendar Table

Before we get into discussing how to create it date dimension and how to use it, first let's talk about what it is and why do we need it. Depending on who you talk to, people can refer to this concept as "Calendar table" or "Date Dimension," which is usually found in Data Warehouse. No matter how it is called, at the end of the day, it is a table in SQL Server which is populated with different date/calendar related information to help speed up SQL queries which require specific parts of dates.

In my case, I have created it to be able to aggregate data by quarters, years and month. Depending on how large your requirements are it will add additional complexity to building it. Since I don't care about holidays (for now at least), I will not be creating holiday schedule which can be complicated to populate.

Now onto writing some SQL. Step 1: Create a table and add "covered" Nonclustered index that I need for my queries. Depending on your needs you will need to create additional indexes and/or modify the one I'm creating.

 
CREATE TABLE [dbo].[DateDimension]
    (
        [DateKey]             [INT]         NOT NULL
      , [Date]                [DATE]        NOT NULL
      , [Day]                 [TINYINT]     NOT NULL
      , [DaySuffix]           [CHAR](2)     NOT NULL
      , [Weekday]             [TINYINT]     NOT NULL
      , [WeekDayName]         [VARCHAR](10) NOT NULL
      , [IsWeekend]           [BIT]         NOT NULL
      , [DOWInMonth]          [TINYINT]     NOT NULL
      , [DayOfYear]           [SMALLINT]    NOT NULL
      , [WeekOfMonth]         [TINYINT]     NOT NULL
      , [WeekOfYear]          [TINYINT]     NOT NULL
      , [Month]               [TINYINT]     NOT NULL
      , [MonthName]           [VARCHAR](10) NOT NULL
      , [Quarter]             [TINYINT]     NOT NULL
      , [QuarterName]         [VARCHAR](6)  NOT NULL
      , [Year]                [INT]         NOT NULL
      , [MMYYYY]              [CHAR](6)     NOT NULL
      , [MonthYear]           [CHAR](7)     NOT NULL
      , [FirstDayOfMonth]     [DATE]        NOT NULL
      , [LastDayOfMonth]      [DATE]        NOT NULL
      , [FirstDayOfQuarter]   [DATE]        NOT NULL
      , [LastDayOfQuarter]    [DATE]        NOT NULL
      , [FirstDayOfYear]      [DATE]        NOT NULL
      , [LastDayOfYear]       [DATE]        NOT NULL
      , [FirstDayOfNextMonth] [DATE]        NOT NULL
      , [FirstDayOfNextYear]  [DATE]        NOT NULL ,
    )

GO

ALTER TABLE dbo.DateDimension
ADD CONSTRAINT PK_DateDimension
    PRIMARY KEY CLUSTERED (DateKey)

IF NOT EXISTS
    (
        SELECT  1
        FROM    sys.indexes AS i
        WHERE   i.name = 'IX_DateDimension_Date'
    )
    CREATE NONCLUSTERED INDEX [IX_DateDimension_Date]
        ON dbo.DateDimension ([Date])
        INCLUDE ([Month], [Quarter], [Year])
GO

The reason why I chose to include Month, Quarter and Year columns in my index is based on my requirements to aggregate data by those columns. Originally I did not have any columns included, which produced Key Lookup. Now to populate table I have taken some suggestions from Aaron's script, link at the bottom of the post. In my case, I'm truncating table and limiting it to 300 years which is more than plenty for my scenario.

 
TRUNCATE TABLE dbo.DateDimension

DECLARE @StartDate     DATE = '01/01/1900'
      , @NumberOfYears INT  = 300;

DECLARE @CutoffDate DATE = DATEADD( YEAR, @NumberOfYears, @StartDate );

-- prevent set or regional settings from interfering with 
-- interpretation of dates / literals
SET DATEFIRST 7;
SET DATEFORMAT MDY;
SET LANGUAGE US_ENGLISH;
-- Thanks to Aaron for this hint

WITH Pass0 AS
    (
        SELECT  1 AS C
        UNION ALL
        SELECT  1
    )
   , Pass1 AS
    (
        SELECT  1 AS C
        FROM    Pass0 AS A
              , Pass0 AS B
    )
   , Pass2 AS
    (
        SELECT  1 AS C
        FROM    Pass1 AS A
              , Pass1 AS B
    )
   , Pass3 AS
    (
        SELECT  1 AS C
        FROM    Pass2 AS A
              , Pass2 AS B
    )
   , Pass4 AS
    (
        SELECT  1 AS C
        FROM    Pass3 AS A
              , Pass3 AS B
    )
   , Pass5 AS
    (
        SELECT  1 AS C
        FROM    Pass4 AS A
              , Pass4 AS B
    )
   , Pass6 AS
    (
        SELECT TOP (DATEDIFF( DAY, @StartDate, @CutoffDate ))
            rn = ROW_NUMBER() OVER (ORDER BY Pass5.C)
        FROM    Pass5
    )
   , Pass7 AS
    (
        SELECT  a.Date
              , [day]          = DATEPART( DAY, a.Date )
              , [week]         = DATEPART( WEEK, a.Date )
              , [month]        = DATEPART( MONTH, a.Date )
              , [quarter]      = DATEPART( QUARTER, a.Date )
              , [year]         = DATEPART( YEAR, a.Date )
              , [DayOfWeek]    = DATEPART( WEEKDAY, a.Date )
              , [FirstOfMonth] = CONVERT( DATE, DATEADD( MONTH, DATEDIFF( MONTH, 0, a.Date ), 0 ))
              , FirstOfYear    = CONVERT( DATE, DATEADD( YEAR, DATEDIFF( YEAR, 0, a.Date ), 0 ))
              , [MonthName]    = DATENAME( MONTH, a.Date )
              , Style101       = CONVERT( CHAR(10), a.Date, 101 )
              , Style112       = CONVERT( CHAR(8), a.Date, 112 )
        FROM
            (
                SELECT  [Date] = DATEADD( DAY, Pass6.rn - 1, @StartDate )
                FROM    Pass6
            ) a
    )
INSERT  dbo.DateDimension WITH (TABLOCKX)
SELECT  DateKey             = CONVERT( INT, CONVERT( CHAR(8), pass7.Date, 112 ))
      , [Date]              = pass7.Date
      , [Day]               = CONVERT( TINYINT, pass7.day )
      , DaySuffix           = CONVERT(   CHAR(2)
                                       , CASE
                                             WHEN pass7.day / 10 = 1
                                                 THEN 'th'
                                         ELSE CASE RIGHT(pass7.day, 1)
                                                  WHEN '1'
                                                      THEN 'st'
                                                  WHEN '2'
                                                      THEN 'nd'
                                                  WHEN '3'
                                                      THEN 'rd'
                                              ELSE 'th'
                                              END
                                         END
                                     )
      , [Weekday]           = CONVERT( TINYINT, pass7.DayOfWeek )
      , [WeekDayName]       = CONVERT( VARCHAR(10), DATENAME( WEEKDAY, pass7.Date ))
      , [IsWeekend]         = CONVERT(   BIT
                                       , CASE
                                             WHEN pass7.DayOfWeek IN (
                                                                         1, 7
                                                                     )
                                                 THEN 1
                                         ELSE 0
                                         END
                                     )
      , [DOWInMonth]        = CONVERT(   TINYINT
                                       , ROW_NUMBER() OVER (PARTITION BY pass7.FirstOfMonth
                                                                       , pass7.DayOfWeek
                                                            ORDER BY pass7.Date
                                                           )
                                     )
      , [DayOfYear]         = CONVERT( SMALLINT, DATEPART( DAYOFYEAR, pass7.Date ))
      , WeekOfMonth         = CONVERT(   TINYINT
                                       , DENSE_RANK() OVER (PARTITION BY pass7.year
                                                                       , pass7.month
                                                            ORDER BY pass7.week
                                                           )
                                     )
      , WeekOfYear          = CONVERT( TINYINT, pass7.week )
      , [Month]             = CONVERT( TINYINT, pass7.month )
      , [MonthName]         = CONVERT( VARCHAR(10), pass7.MonthName )
      , [Quarter]           = CONVERT( TINYINT, pass7.quarter )
      , QuarterName         = CONVERT(   VARCHAR(6)
                                       , CASE pass7.quarter
                                             WHEN 1
                                                 THEN 'First'
                                             WHEN 2
                                                 THEN 'Second'
                                             WHEN 3
                                                 THEN 'Third'
                                             WHEN 4
                                                 THEN 'Fourth'
                                         END
                                     )
      , [Year]              = pass7.year
      , MMYYYY              = CONVERT( CHAR(6), LEFT(pass7.Style101, 2) + LEFT(pass7.Style112, 4))
      , MonthYear           = CONVERT( CHAR(7), LEFT(pass7.MonthName, 3) + LEFT(pass7.Style112, 4))
      , FirstDayOfMonth     = pass7.FirstOfMonth
      , LastDayOfMonth      = MAX( pass7.Date ) OVER (PARTITION BY pass7.year
                                                                 , pass7.month
                                                     )
      , FirstDayOfQuarter   = MIN( pass7.Date ) OVER (PARTITION BY pass7.year
                                                                 , pass7.quarter
                                                     )
      , LastDayOfQuarter    = MAX( pass7.Date ) OVER (PARTITION BY pass7.year
                                                                 , pass7.quarter
                                                     )
      , FirstDayOfYear      = pass7.FirstOfYear
      , LastDayOfYear       = MAX( pass7.Date ) OVER (PARTITION BY pass7.year)
      , FirstDayOfNextMonth = DATEADD( MONTH, 1, pass7.FirstOfMonth )
      , FirstDayOfNextYear  = DATEADD( YEAR, 1, pass7.FirstOfYear )
FROM    Pass7

Few things to consider in the script. Depending on requirements and local setting, part of the script would need to be adjusted to reflect correct data. For example, Sunday could the first day of the week in some countries and last day of the week in other coutnries. Same goes for date formats, in United States we have Month/Day/Year back home in Russia I always wrote Day/Month/Year which could create large issues when trying to compare data stored in one format to date dimension that is stored in a different format. That is why the following two commands are so important SET DATEFIRST 7 and SET DATEFORMAT MDY;

Now that table is populated you too can run aggregate queries without having to generate data at run time. Enjoy!

References:

Bertrand, Aaron. "Creating a Date Dimension or Calendar Table in SQL Server." Creating a Date Dimension or Calendar Table in SQL Server. N.p., 20 Oct. 2015. Web. 29 Dec. 2016. <https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/>.

Friday, October 7, 2016

Functions and Implicit Conversion

Has it ever happened to you that most simple piece of code produces results that make no sense what so ever? Today, I helped a coworker fix one of those little issues that I thought deserves a post. Even though all men are created equal, SQL Server functions are not. Depending on who wrote the function and at what point in the history of SQL Server it can behave differently from what is expected.

Today I want to look at ISNULL vs COALESCE vs OR . To start will I will create a simple table and populate it with two records.


IF OBJECT_ID('dbo.BlogPost') IS NOT NULL
 DROP TABLE dbo.BlogPost

CREATE TABLE dbo.BlogPost
( ID INT
,CODE VARCHAR(10)
)
GO
INSERT INTO dbo.BlogPost
(ID, CODE)
VALUES(1, 'BOB01')
,(2, 'VLADIMIR01')
GO
SELECT *
FROM dbo.BlogPost

Results are as follows.

ID	CODE
1	BOB01
2	VLADIMIR01

Exactly what we would expect. Now unto something more fun. Let us try to limit result by use of a local variable, this would work exactly the same as a parameter in stored procedure or a function.


DECLARE @CODE VARCHAR(6) = NULL

SELECT 'ISNULL- METHOD', *
FROM dbo.BlogPost
WHERE CODE = ISNULL(@CODE, CODE)

SELECT 'COALESCE - METHOD', *
FROM dbo.BlogPost
WHERE CODE = COALESCE(@CODE, CODE)

SELECT 'OR - METHOD', *
FROM dbo.BlogPost
WHERE CODE = @CODE OR @CODE IS NULL

After running the code we get the following results

Method	ID	CODE
ISNULL- METHOD	1	BOB01

Even though we would expect to see both records returned we only get 1 record. Huh? This is exactly what puzzled a coworker, ofcourse query was not as simple as this one but same issue caused him to hit a road block.

In the case of COALESCE and OR methods, results are identical.

Method	ID	CODE
COALESCE - METHOD	1	BOB01
COALESCE - METHOD	2	VLADIMIR01

Method	ID	CODE
OR- METHOD	1	BOB01
OR- METHOD	2	VLADIMIR01

So what caused results to be different for the first function but be the same for next two functions? I'm sure the title of the post already gave it away. Implicit Conversion is what happens behind the scenes and usually when you least expect it. Unless you look at execution plan you would not be able to tell that CONVERT_IMPLICIT has ever occured. I know I can't stop raving about Plan Explorer and how much I love it, if we run same query in PE and look at the base select statement it shows right on the plan CONVERT_IMPLICIT as seen in the screenshot below.

Moral of today's story is: Don't ever assume that all SQL Server functions will behave exactly the same. In some cases, they might throw you in for a spin and you will be scratching your head at results without knowing how to fix it or what is causing it. In the case of what inspired today's post, my coworker will now be more educated and will be more cautious when working with SQL server functions. "It depends!" phrase has truly become a common theme for SQL Server operations. It is always best to test multiple ways of coding to figure out what is the best and which way will produce consistent results.

note: I did not write original code

Tuesday, October 4, 2016

Update SQL Login Password Powershell

This is a quick little reference on how to update the password for SQL Login when you don't SSMS installed on the machine running SQL, but you are able to login to the machine with the user that has SA the priviledges on SQL Instance.

Step 1

Step 2

Start PowerShell

Step 3

Connect to SQL Server by running the following command and update password.

 
sqlcmd -s instance-name
alter login sqluser with password='StrongPassword'
go

In my case I did the following.

Step 4

Celebrate and go back to finishing preparing for a presentation or whatever else you were doing.

Thursday, September 29, 2016

SSRS Report Prints Extra Pages

Problem

Recently run into an issue that caused me spend more time trying figure out what to do that it did to fix it. I got a very simple ticket. Client reports that extra pages are being printed on SSRS report when it is being sent to a specific printer but other printers are fine, additionally printing to PDF is fine.

After some research, I found multiple articles online that talk about improper page and body setup that results in extra pages. Since I'm not used to working on SSRS report inside BIDS (Bussiness Intelligence Development Studio) which was a precursor to SSDT (SQL Server Data Tools), It took me for longer than I would expect to accomplish a simple task. Therefore I'm hoping the following screenshots will save someone (most likely me) time in fixing this issue.

Solution

Step 1:

Open report in Design mode which is causing issues.

Step 2:

Verify printer margins by right clicking on empty space and selecting Report Properties

Step 3:

Check all margins, for some older printers margins, should be at least .25in to be on safe side. Newer printers can go less but some of older printers can not do it.

Step 4:

Calculate maximum report body size by subtracting margins from width and height, in my case

Width - Left - Right (11 - .5 - .6) = 9.9in

Height - Top - Bottom (8.5 - .5 - .25) = 7.75in

Step 5:

Click on the body of the report to see height and width in the properties window. Now you can adjust the size to be less than calculated size from previous step.

Conlusion

Now you are all done, all that is left is to publish changes and deploy report to the client.

Wednesday, September 14, 2016

Workaround to upgrading SSRS project

This guide is more of work around for specific opportunity I faced at work. Hopefully, this can help someone else to create a workaround for a problem that should be rather easy to be fixed, but there are limitations that prevent you from taking the simple solution.

Scenario:

You have upgraded to newest Visual Studio 2015 and you are excited (not me) to make some reports. You get the path for the project and you create working directory using Visual Source Safe ( an old tool that got replaced by Team Foundation Server). Now that files are on your local machine, you fire up Visual Studio and try to open the solution only to get the following screen.

Now What? You talk to other people and they are still using Visual Studio 2010 and you are not allowed to upgrade and break it for everyone.

Solution:

Create new project and name it something different from original project to avoid confusion

Now that we have a project we need to copy all report definitions to the new location that was created for the project. To do that just find all .RDL files in old folder and copy them to new folder that was created for the project.
Now just add reports to the project by right-clicking on Reports Folder in Solution explorer and select all reports that have been copied over.
At this point reports are in new project and you are ready to develop (or are you)
You find the most amazing report that looks something like this and you click Preview just to get the following error.

This error even though it is cryptic, is actually missing Shared Data Source file that existed in the original project.
Now all we have to do is copy .RDS file found in the original directory to the new directory where we placed .RDL files and then add Existing Item under Shared Data Sources
At this point we can preview report but now arises the issue of deploying it the server. Since we created the new project we don't know where it needs to be deployed. Luckily this can be easily fixed by opening .rptproj file in Notepad++ or another editor and looking for Configurations element. It should look something like this.


    <Configuration>
      <Name>Debug</Name>
      <Platform>Win32</Platform>
      <Options>
        <OutputPath>bin\Debug</OutputPath>
        <TargetServerVersion>SSRS2008R2</TargetServerVersion>
        <TargetServerURL>http://MyServer/ReportServer_MyServer</TargetServerURL>
        <TargetFolder>/MyProjectTarget</TargetFolder>
        <TargetDataSourceFolder>/MyProjectTarget/Data Sources</TargetDataSourceFolder>
        <TargetDatasetFolder>Datasets</TargetDatasetFolder>
        <TargetReportPartFolder>Report Parts</TargetReportPartFolder>
        <StartItem>MyAwesomeReport.rdl</StartItem>
      </Options>
    </Configuration>

Once you have that information just place it project properties of your new project and you are done.

Notes:

Couple things to consider, but doing this we have removed the project from original source control solution and it will take additional effort to commit it back to source control. The easiest way to overcome would be to check out and check-in files from new location using Source Control utility. In my case, I can easily check out and check files back in using Visual SourceSafe Explorer.

As always questions and comments are welcomed and I if I can add any details to help out fellow developer I would be happy to do so.

Tuesday, August 30, 2016

SQL Joins - Basics Part 2

Having already covered the basics of SQL Join syntax, now it is time to jump into a little more advanced stuff. Early in my career, I remember being confused about the difference in results when the same clause is placed in JOIN instead of WHERE clause. This post is aimed to clarify those questions and few others around ON and WHERE clauses. For demonstration I will use table built and populated in SQL Joins - Basics Part 1

First, let us look at basic LEFT join from Part 1. All rows are returned from Teacher table and all matching rows from Student table are displayed.

 
SELECT  t.FullName AS TeacherFullName
       ,s.FullName AS StudentFullName
FROM    Teacher t
LEFT OUTER JOIN Student s
        ON s.TeacherID = t.ID

TeacherFullName	StudentFullName
Roy Chad	Jepson Balfour
Roy Chad	Milburn Brett
Dudley Goddard	Clinton Schuylerr
Dudley Goddard	Norbert Kemp
Raphael Philander	Meriwether Kennedy
Raphael Philander	Braith Cornelius
Sonnie Davin	NULL

Now we introduce simple WHERE clause. By adding condition to WHERE clause we are now restricting entire results set to specific condition, any rows that do not satisfy that condition are excluded from results. Therefore we only end up with one row seen below.

 
SELECT  t.FullName AS TeacherFullName
       ,s.FullName AS StudentFullName
FROM    Teacher t
LEFT OUTER JOIN Student s
        ON s.TeacherID = t.ID
WHERE s.FullName = 'Jepson Balfour'

TeacherFullName	StudentFullName
Roy Chad	Jepson Balfour

So what happens when you move same clause to the ON clause? Ok, lets test it!

 
SELECT  t.FullName AS TeacherFullName
       ,s.FullName AS StudentFullName
FROM    Teacher t
LEFT OUTER JOIN Student s
        ON s.TeacherID = t.ID
  and s.FullName = 'Jepson Balfour'

TeacherFullName	StudentFullName
Roy Chad	Jepson Balfour
Dudley Goddard	NULL
Raphael Philander	NULL
Sonnie Davin	NULL

What happened? Result set looks nothing like first or second example. When I first did that I was thoroughly confused. To explain the result lets run another query.

 
SELECT s.FullName
FROM dbo.Student as s
WHERE s.FullName = 'Jepson Balfour'

StudentFullName
Jepson Balfour

Even though last two queries look different in reality they are placing exactly same restriction on Student table. On clause on Student table became where clause that restricts results to only rows that specific that critirea. Since Teacher table is not joined with INNER join it is not restricted by what happens to Student therefore we see all Teachers displyaed but only one of them actually showing a student.

Conclusion:

The ON clause is a powerfull way to change your result set exactly to what you need it to be, but if used without understanding of what happens to the data it can produce unpredicted result set. Each statement placed in ON clause will be evaulated prior to WHERE clause. This goes back to understanding order of operations in SQL Server. Below are just few of the operations listed in correct order. By knowing and understanding the order of operations in SQL Server it helps to understand why queries behaved the way that they did above. Each one was evaluated by SQL Server in correct order which produced correct output based on that structure.

FROM

ON

WHERE

SELECT

As always feel free to leave comments, questions, etc. For next post I will answer common question of TOP clause.

Tuesday, August 23, 2016

NOT IN - NOT EXISTS - LEFT JOIN - EXCEPT

Today I stumbled upon a blog post that pointed out how inefficient NOT IN clause and comment was made to use LEFT OUTER JOIN because it is "faster". They showed execution plans of both and SSMS query cost from Actual Execution plan. The first thing that jumped out at me were the Table scans that were present in both plans. Since I did some reading previously on how these suppose to yield almost similar results I decided to give this test a try myself.

To begin with, I created tables in my test database and pulled over data from AdventureWorks2014 database. To avoid extra operations that I did not want to skew results, I took the liberty of changing NVARCHAR to VARCHAR and drop extra columns that served no purpose in my tests.

 
--CREATE Clustered Tables
CREATE TABLE dbo.Person_Clustered
    (
      [BusinessEntityID] INT NOT NULL
                             PRIMARY KEY
    , [PersonType] CHAR(2) NOT NULL
    , [FirstName] VARCHAR(50) NOT NULL
    , [LastName] VARCHAR(50) NOT NULL,
    )
CREATE TABLE dbo.MyPeople_Clustered
    (
      [MyPeople_ID] INT NOT NULL
                        PRIMARY KEY
    , [PersonType] CHAR(2) NOT NULL
    , [FirstName] VARCHAR(50) NOT NULL
    , [LastName] VARCHAR(50) NOT NULL,
    )

--CREATE NonClustered Tables
CREATE TABLE dbo.Person_NonClustered
    (
      [BusinessEntityID] INT NOT NULL
    , [PersonType] CHAR(2) NOT NULL
    , [FirstName] VARCHAR(50) NOT NULL
    , [LastName] VARCHAR(50) NOT NULL,
    )
CREATE TABLE dbo.MyPeople_NonClustered
    (
      [MyPeople_ID] INT NOT NULL
    , [PersonType] CHAR(2) NOT NULL
    , [FirstName] VARCHAR(50) NOT NULL
    , [LastName] VARCHAR(50) NOT NULL,
    )

--Write all data from base table
INSERT  INTO dbo.Person_Clustered
        ( BusinessEntityID
        , PersonType
        , FirstName
        , LastName
        )
        SELECT  BusinessEntityID
              , CONVERT(CHAR(2), PersonType)
              , CONVERT(VARCHAR(20), FirstName)
              , LastName
        FROM    AdventureWorks2014.Person.Person

--Write some data to it, 
INSERT  INTO dbo.MyPeople_Clustered
        SELECT  *
        FROM    dbo.Person_Clustered AS p
        WHERE   p.BusinessEntityID % 133 = 0

INSERT  INTO dbo.MyPeople_NonClustered
        SELECT  *
        FROM    dbo.MyPeople_Clustered AS p

INSERT  INTO dbo.Person_NonClustered
        SELECT  *
        FROM    dbo.Person_Clustered AS p

SELECT  COUNT(*) dbo.Person_Clustered
SELECT  COUNT(*) dbo.Person_NonClustered
SELECT  COUNT(*) dbo.MyPeople_Clustered
SELECT  COUNT(*) dbo.MyPeople_NonClustered

For the actual test I created few different scenarios including NOT EXISTS and EXCEPT clauses. First 4 test are based on clustered primary keys and last 4 are based on non-clustered tables.

 

--Clustered Tables  - NOT IN
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_Clustered AS p
WHERE   p.BusinessEntityID NOT IN ( SELECT  mp.MyPeople_ID
                                    FROM    dbo.MyPeople_Clustered mp )
--Clustered Tables  - NOT EXISTS
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_Clustered AS p
WHERE   NOT EXISTS ( SELECT 1
                     FROM   dbo.MyPeople_Clustered mp
                     WHERE  mp.MyPeople_ID = p.BusinessEntityID )


--Clustered Tables  - LEFT OUTER JOIN
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_Clustered AS p
LEFT OUTER JOIN dbo.MyPeople_Clustered mp
        ON mp.MyPeople_ID = p.BusinessEntityID
WHERE   mp.MyPeople_ID IS NULL

--Clustered Tables  - EXCEPT
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_Clustered AS p
EXCEPT
SELECT  mp.MyPeople_ID
      , mp.PersonType
      , mp.FirstName
      , mp.LastName
FROM    dbo.MyPeople_Clustered AS mp


--NonClustered Tables  - NOT IN
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_NonClustered AS p
WHERE   p.BusinessEntityID NOT IN ( SELECT  mp.MyPeople_ID
                                    FROM    dbo.MyPeople_NonClustered mp )

--NonClustered Tables  - NOT EXISTS
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_NonClustered AS p
WHERE   NOT EXISTS ( SELECT 1
                     FROM   dbo.MyPeople_NonClustered mp
                     WHERE  mp.MyPeople_ID = p.BusinessEntityID )

--NonClustered Tables  - LEFT OUTER JOIN
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_NonClustered AS p
LEFT OUTER JOIN dbo.MyPeople_NonClustered mp
        ON mp.MyPeople_ID = p.BusinessEntityID
WHERE   mp.MyPeople_ID IS NULL


--NonClustered Tables  - EXCEPT
SELECT  p.BusinessEntityID
      , p.PersonType
      , p.FirstName
      , p.LastName
FROM    dbo.Person_NonClustered AS p
EXCEPT
SELECT  mp.MyPeople_ID
      , mp.PersonType
      , mp.FirstName
      , mp.LastName
FROM    dbo.MyPeople_NonClustered AS mp

Since the point of this test was compare execution times and actual plans I captured actual execution plans using SSMS and compared run duration using SQL Sentry Plan Explorer.

Conclusion:

As it can be seen from my test, all 4 operations performed almost identical plans and almost within few milliseconds of each other. In case of Non-Clustered tables operations changed from clustered index scans to table scans but duration was still on point. In case of EXCEPT clause on Non-Clustered tables it introduced additional sort operations.

Looking at NOT IN and NOT EXISTS those operations produced identical plans and LEFT OUTER JOIN actually required one more operation to complete on top of those performed by first two.

This is exactly why I will continue to test code multiple ways and might use one way over another depending on situation.

Thursday, August 18, 2016

CURSOR Tests

During another day on twitter, I started friendly discussion on about which cursor options are faster versus another. Since I love a little challenge, I decided to recreate test scenario, but to be "fair" to the cursor since I have a soft spot for them since my Oracle days, I changed code just slightly by "fixing" few things that could hinder clean fight.

First thing I did was to place both cursors into stored procedures so I can easily call it, and next I modified code to write test result to a table instead of trying to calculate them by hand after each run. At the bottom of the page is the code to setup test.

I have done "cold" testing to remove extra factors that can contribute to inconclusive results. I ran each of the following query 20 times.

Query 1


DBCC FREESESSIONCACHE
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS 
GO
exec dbo.Cursor_KeySetTest

Query 2


DBCC FREESESSIONCACHE
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS 
GO
exec dbo.Cursor_FASTForward

Query 3


DBCC FREESESSIONCACHE
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS 
GO
SET NOCOUNT ON;
SELECT TOP 0 * INTO vSWA FROM Sales.vStoreWithAddresses

DECLARE @StartTime datetime2 = SYSDATETIME()

INSERT INTO vSWA
SELECT  BusinessEntityID
                , Name
                , AddressType
                , AddressLine1
                , AddressLine2
                , City
                , StateProvinceName
                , PostalCode
                , CountryRegionName 
  FROM Sales.vStoreWithAddresses;

DECLARE @EndTime datetime2 = SYSDATETIME()
DECLARE @Duration INT = DATEDIFF(millisecond,@StartTime,@EndTime)

INSERT INTO TestTimes (CallType, Duration, TestTime)
VALUES ('Set Based',@Duration, SYSDATETIME())

GO
DROP TABLE vSWA;
GO

Now for the results just a simple aggregate query from the table that I created.

Test Result Query


SELECT CallType
 ,MIN(duration) as MinDuration
 ,MAX(duration) as MaxDuration
 ,AVG(duration) as AvgDuration
 ,COUNT(*) NumberOfTries
FROM dbo.TestTimes as t
GROUP BY CallType

Results

Call Type	MinDuration	MaxDuration	AvgDuration	NumberOfTries
Cursor_FASTForward	122	390	191	20
Cursor_KeySetTest	162	398	232	20
Set Based	77	275	98	20

Conclusion:

In my test which was done on my desktop with Windows 10, SQL Server 2016 Developer Edition, SET BASED approach won with average duration of 98 milliseconds. Second place goes to LOCAL FAST_FORWARD with 191 millisecond average time which is twice as slow as set based approach. Third place goes to LOCAL KEYSET cursor with average duration of 232 millisecond.

In my test scenario SET BASED still faster and will likely be faster for most scenarios, so far I have only encountered 1 scenario in my work experiences where cursor was actually faster and it had to do with TSQL based ETL loading very large tables with geospatial data. Cursors have their purpose and should be used with understanding that it can create performance issues if used incorrectly. Until I see some other information I will likely stick to using LOCAL FAST_FORWARD options which seem to provide fastest CURSOR times.


CREATE PROC [dbo].[Cursor_FASTForward]
AS 
    BEGIN
  SET NOCOUNT ON;
        SELECT TOP 0
                *
        INTO    vSWA
        FROM    Sales.vStoreWithAddresses ;

 
        DECLARE curSWA CURSOR LOCAL FAST_FORWARD FOR
        SELECT  BusinessEntityID
                , Name
                , AddressType
                , AddressLine1
                , AddressLine2
                , City
                , StateProvinceName
                , PostalCode
                , CountryRegionName 
       FROM Sales.vStoreWithAddresses ;
 
        DECLARE @BusinessEntityID INT
          , @Name Name
          , @AddressType Name
          , @AddressLine1 NVARCHAR(60)
          , @AddressLine2 NVARCHAR(60)
          , @City NVARCHAR(30)
          , @StateProvinceName Name
          , @PostalCode NVARCHAR(15)
          , @CountryRegionName Name ;
 
        OPEN curSWA
 
        DECLARE @StartTime datetime2 = SYSDATETIME();
        FETCH NEXT FROM curSWA INTO @BusinessEntityID, @Name, @AddressType, @AddressLine1,
                    @AddressLine2, @City, @StateProvinceName, @PostalCode,
                    @CountryRegionName ;
        WHILE ( @@fetch_status <> -1 ) 
            BEGIN
                IF ( @@fetch_status <> -2 ) 
                    BEGIN
                        INSERT  INTO vSWA
                                ( BusinessEntityID
                                , Name
                                , AddressType
                                , AddressLine1
                                , AddressLine2
                                , City
                                , StateProvinceName
                                , PostalCode
                                , CountryRegionName
                                )
                        VALUES  ( @BusinessEntityID
                                , @Name
                                , @AddressType
                                , @AddressLine1
                                , @AddressLine2
                                , @City
                                , @StateProvinceName
                                , @PostalCode
                                , @CountryRegionName
                                )
                    END
                FETCH NEXT FROM curSWA INTO @BusinessEntityID, @Name, @AddressType, @AddressLine1,
                    @AddressLine2, @City, @StateProvinceName, @PostalCode,
                    @CountryRegionName ;
            END

        DECLARE @EndTime datetime2 = SYSDATETIME()
  DECLARE @Duration INT = DATEDIFF(millisecond,@StartTime,@EndTime)
  INSERT INTO dbo.TestTimes (CallType, Duration, TestTime)
  VALUES (OBJECT_NAME(@@PROCID),@Duration, SYSDATETIME())
 
        CLOSE curSWA ;
        DEALLOCATE curSWA ;

        DROP TABLE vSWA ;

    END
GO
CREATE PROC [dbo].[Cursor_KeySetTest]
AS 
    BEGIN
  SET NOCOUNT ON;

        SELECT TOP 0
                *
        INTO    vSWA
        FROM    Sales.vStoreWithAddresses ;
 
        DECLARE curSWA CURSOR LOCAL KEYSET FOR
        SELECT  BusinessEntityID
                , Name
                , AddressType
                , AddressLine1
                , AddressLine2
                , City
                , StateProvinceName
                , PostalCode
                , CountryRegionName FROM Sales.vStoreWithAddresses ;
 
        DECLARE @BusinessEntityID INT
          , @Name Name
          , @AddressType Name
          , @AddressLine1 NVARCHAR(60)
          , @AddressLine2 NVARCHAR(60)
          , @City NVARCHAR(30)
          , @StateProvinceName Name
          , @PostalCode NVARCHAR(15)
          , @CountryRegionName Name ;
 
        OPEN curSWA
 
        DECLARE @StartTime datetime2 = SYSDATETIME();

        FETCH NEXT FROM curSWA INTO @BusinessEntityID, @Name, @AddressType, @AddressLine1,
                    @AddressLine2, @City, @StateProvinceName, @PostalCode,
                    @CountryRegionName ;
        WHILE ( @@fetch_status <> -1 ) 
            BEGIN
                IF ( @@fetch_status <> -2 ) 
                    BEGIN
                        INSERT  INTO vSWA
                                ( BusinessEntityID
                                , Name
                                , AddressType
                                , AddressLine1
                                , AddressLine2
                                , City
                                , StateProvinceName
                                , PostalCode
                                , CountryRegionName
                                )
                        VALUES  ( @BusinessEntityID
                                , @Name
                                , @AddressType
                                , @AddressLine1
                                , @AddressLine2
                                , @City
                                , @StateProvinceName
                                , @PostalCode
                                , @CountryRegionName
                                )
                    END
                FETCH NEXT FROM curSWA INTO @BusinessEntityID, @Name, @AddressType, @AddressLine1,
                    @AddressLine2, @City, @StateProvinceName, @PostalCode,
                    @CountryRegionName ;
            END

        DECLARE @EndTime datetime2 = SYSDATETIME()
  DECLARE @Duration INT = DATEDIFF(millisecond,@StartTime,@EndTime)
  INSERT INTO dbo.TestTimes (CallType, Duration, TestTime)
  VALUES (OBJECT_NAME(@@PROCID),@Duration, SYSDATETIME())
 
        CLOSE curSWA ;
        DEALLOCATE curSWA ;

 
        DROP TABLE vSWA ;

    END
GO
CREATE TABLE dbo.TestTimes
(
 ID INT IDENTITY(1,1) PRIMARY KEY
 ,CallType VARCHAR(150)
 ,Duration INT
 ,TestTime DateTime2
)
GO

Tuesday, August 16, 2016

STRING_SPLIT() vs User Defined Table Types - Conclusion

I don't think I have tested for every scenario possible, but it is better than only a single test. With that being said, I'm a little surprised at the result. I thought that User Defined Table Type would win in every scenario but it is clear that in some cases STRING_SPLIT() function is actually faster. Just like most other things in SQL Server the answer is: It Depends!

In the case of STRING_SPLIT() function Engineers at Microsoft did amazing job optimizing it over other custom solutions that people had to resort to in the past. Of course there are limitations with it, that need to be acknowledged and taken into consideration when designing a solution. In my opinion this could be quick go to solution that is great for initial testing but needs to be heavily tested when it comes to final solution for production environment.

Just like I mentioned in previous post, estimated row count always shows 50 rows and value returned from the function is a string and needs to be converted to target type. This did not prove to be large issue in my testing but in complex execution plans it could throw query optimizer into wrong plan when getting data from other tables which could cause major slow downs.

In my test scenarios outlined below, User Defined Table Type Won 4 out of 6 times. I'm sure I could have setup other scenarios where it would lose every time but in this case it won. Leave comments if you want to see full source code or want me to run any other scenarios. Thanks for reading!

 
SELECT   a.*
        , CASE WHEN ( MAX(a.AvgDuration) OVER ( PARTITION BY RowCnt ) ) = AvgDuration 
  THEN 'LOOSER' ELSE 'WINNER' END AS RoundDecision
  , ROW_NUMBER() OVER ( ORDER BY RowCnt, SPName ) AS RowNumber
FROM     (
            SELECT    SPName
                    , RowCnt
                    , COUNT(*) AS NumberOfExecutions
                    , MIN(Duration) AS MinDuration
                    , AVG(Duration) AS AvgDuration
                    , MAX(duration) AS MaxDuration
            FROM      dbo.SPExecutionTime
            GROUP BY  SPName
                    , RowCnt
        ) a
ORDER BY RowNumber

Monday, August 15, 2016

STRING_SPLIT() vs User Defined Table Types - Round 1

During SQL Saturday Indianapolis I heard about new function introduced in SQL Server 2016 STRING_SPLIT(). Presenter showed few examples of how to use it and mentioned that it is faster than other implementations of what people had to previously done. This got me excited and I wanted to see how it compares to User Defined Table Types which has always been my go to replacement for string splitting.

This post is not about comparing new function to other function, Aaron Bertrand already did a great post comparing it to other solutions. IT can be found here -> Performance Surprises and Assumptions : STRING_SPLIT(). What I wanted to do is compare it to User Defined Table Types because it always been faster (at least this is what I think about it).

Now unto test scenario. First time I had to replace string splitting function with User Defined Table Typed Parameter was when I had to rewrite stored procedure that accepted a list of 5000 items inside varchar(max) parameter. With that in mind I decided to test new function the same way. Pass a huge string, split it inside stored procedure and return result set back to .NET application. For this test I created two stored procedures running against Wide World Importers Test Database.

 
 /*
 ==============================================================================================================================
 Author:  Vlady Oselsky
 Create date: 08/15/2016
 Description:   Testing String Split Function
==============================================================================================================================
*/
CREATE PROCEDURE [dbo].[Invoices_ByTableType]
(
 @List IdList READONLY
)
AS
BEGIN
 SET NOCOUNT ON;
 DECLARE @StartTime datetime2 = SYSDATETIME();
 DECLARE @RwCnt int;

 select iv.SalespersonPersonID
  ,iv.InvoiceDate
  ,iv.DeliveryInstructions
 from WideWorldImporters.Sales.Invoices as iv
 join @List as l
  on iv.InvoiceID = l.ID
 order by 1

 SET @RwCnt = @@ROWCOUNT
 DECLARE @EndTime Datetime2 = Sysdatetime();
 INSERT INTO SPExecutionTime (SPName,RowCnt, Duration)
 VALUES(OBJECT_NAME(@@PROCID),@rwCnt, DATEDIFF(MICROSECOND, @StartTime, @EndTime))
END
GO
 /*
 ==============================================================================================================================
 Author:  Vlady Oselsky
 Create date: 08/15/2016
 Description:   Testing String Split Function
==============================================================================================================================
*/
CREATE PROCEDURE [dbo].[Invoices_ByStringList]
(
 @List VARCHAR(MAX)
)
AS
BEGIN
 SET NOCOUNT ON;
 DECLARE @StartTime datetime2 = SYSDATETIME();
 DECLARE @RwCnt int;

 select iv.SalespersonPersonID
  ,iv.InvoiceDate
  ,iv.DeliveryInstructions
 from WideWorldImporters.Sales.Invoices as iv
 join string_split(@List, ',') as l
  on iv.InvoiceID = l.value
 order by 1
 
 SET @RwCnt = @@ROWCOUNT
 DECLARE @EndTime Datetime2 = Sysdatetime();
 INSERT INTO SPExecutionTime (SPName,RowCnt, Duration)
 VALUES(OBJECT_NAME(@@PROCID),@rwCnt, DATEDIFF(MICROSECOND, @StartTime, @EndTime))
END

Below is the script for user defined table type and table to store results on SQL Side. I done this in order to compare SQL duration to total duration on .NET side

 
CREATE TYPE [dbo].[IdList] AS TABLE( [ID] [int] NULL )
GO
CREATE TABLE dbo.SPExecutionTime
(
 ID INT IDENTITY(1,1) PRIMARY KEY
 ,SPName sysname
 ,RowCnt INT
 ,Duration INT
)
GO

Now onto .NET C# code. To be fair I wanted to pass entire value on .NET side instead of trying to build it just before calling stored procedure. Below is the code for calling TableType stored procedure, as you can see instead of having VarChar type, I have Structured type which accepts IEnumerable variable. In my case I'm passing DataTable.

 
stopwatch.Start(); //Time starts here
DataTable table = new DataTable();
using (var con = new SqlConnection(connectionString))
{
    con.Open();
    var cmd = new SqlCommand();
    cmd.Connection = con;
    cmd.CommandText = "Invoices_ByTableType";
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@List", SqlDbType.Structured, -1).Value = hugeListTable;

    using (var da = new SqlDataAdapter(cmd))
    {
        da.Fill(table);
    }

}
dataGridView2.DataSource = table;
stopwatch.Stop(); //Time ends Here

Next comes a call to stored procedure with VARCHAR(MAX).

 
stopwatch.Start();
DataTable table = new DataTable();
            
using (var con = new SqlConnection(connectionString))
{
    con.Open();
    var cmd = new SqlCommand();
    cmd.Connection = con;
    cmd.CommandText = "Invoices_ByStringList";
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@List", SqlDbType.VarChar, -1).Value = hugeListString;

    using (var da = new SqlDataAdapter(cmd))
    {
        da.Fill(table);
    }

}
dataGridView1.DataSource = table;
stopwatch.Stop();

On front end I setup simple interface with two buttons and a way to aggregate data together. After running each stored procedure 50 times. I came up with the following result.

Invoices_ByStringList - 112.82 millisecond
Invoices_ByTableType - 79.4 millisecond

Conclusion: Round 1 goes to User Defined Table Type. One thing I noticed when looking at execution plans. I saw that string_split() function always estimates 50 rows, whereas when calling with user defined table type estimated row count matches actual. Additionally there is implicit conversion because string returned from split function and I'm joining it to integer column. Next I want to make few other adjustment to stored procedures and front end to try to call it with different number of rows to see how it changes the outcome.

Update:

Thursday, August 11, 2016

Speaking at dev-data DAY

I have another opportunity to share my T-SQL knowledge. This time, it will be in Birmingham, AL at dev-data DAY on August 20. This will be a free conference with SQL, .NET, Powershell/Azure and Professional Development sessions. I'm happy to present "Crash Course on Better SQL Development" session which includes elements of T-SQL coding, error handling, a look at execution plans and finally exploring some capabilities of SSDT (SQL Server Data Tools).

This year it will by my 5th time speaking at local events. Each new event brings a new perspective into local user group communities and different knowledge that technology professionals bring to the market. For me, speaking became a way to share and at the same polish my own knowledge. Depending on experiences and knowledge of each group, there are different questions which lead a session on a slightly different path. Presenting same content takes different shape with every new group of people.

For me, one of the biggest challenges that I had to overcome in order to become a speaker was to realize that it is perfectly fine to admit that I don't know an answer to every question. Whenever a difficult question is asked, I do my best to answer based on my knowledge at that time and take a time to research it after the fact to make myself prepared for it next time. I'm not an expert in any given field and will never claim to be one. I like to be Jack of all trades. Being diverse in my knowledge helps me to approach problems from a different perspective and think of solutions from different angles.

More information about event DevDataDay

Thursday, July 14, 2016

Rename Database

Today was the proof that I have a lot to learn about DBA tasks. I took on a somewhat simple task of renaming a database that turned into a 30-minute ordeal. Hopefully, this post will save you some time by avoiding issues that I ran into. Below I outlined one of the possible ways of getting this task done. Please note that this can result in outage and should not be done in production (live) environment, unless you are confident in what you are doing and it is done during a maintenance window.

Step 1
Before we can rename DB we need to identify Logical and Physical file names. I recommend copying result to notepad or somewhere else as it will be easier to paste it into later queries. To get names we can query system table in master DB. In my scenario, my Database Name is "VladTesting"

 
USE [master]
GO
/* Important to get logical and physical file names */

SELECT  name AS [Logical Name]
       ,physical_name AS [DB File Path]
       ,type_desc AS [File Type]
       ,state_desc AS [State]
FROM    sys.master_files
WHERE   database_id = DB_ID(N'VladTesting')
GO

Logical Name	DB File Path	File Type	State
VladTesting	C:\SQL Server Data Files\MSSQL11\Data\VladTesting.mdf	ROWS	ONLINE
VladTesting_log	C:\SQL Server Data Files\MSSQL11\Log\VladTesting_log.ldf	LOG	ONLINE

Step 2
Now we are ready to start the process. Before logic names can be renamed we need to have exclusive rights to DB.

 
ALTER DATABASE VladTesting 
SET SINGLE_USER WITH ROLLBACK IMMEDIATE

Step 3
Now when we execute the following statement it will rename logical file names.

 
ALTER DATABASE [VladTesting] MODIFY FILE 
(NAME=N'VladTesting', NEWNAME=N'VladsAwesomeDB')
GO
ALTER DATABASE [VladTesting] MODIFY FILE 
(NAME=N'VladTesting_log', NEWNAME=N'VladsAwesomeDB')
GO

Step 4
At this point we have several options, the following is one of the easiest. We need to DETACH database, rename files and than attach it with new name later.
Another option is to take DB Offline and rename files and later rename DB.

 
EXEC master.dbo.sp_detach_db  @dbname = N'VladTesting'
GO

Step 5
Now to rename the files it can be done from SSMS with TSQL or directly on file system. NOTE: When I ran the following command it messed up file permissions, this should be used with caution. It is safer to ask System Admins or someone else with proper file system permissions to rename files directly on file system, then it is possible to shift the blame if something goes wrong (just kidding).
Most of the time XP_CMDSHELL is not enable as it is considered dangerous, to enable it, just run the following code.

 
EXEC sp_configure 'show advanced options' ,1;   
GO  
RECONFIGURE;  
GO  
EXEC sp_configure 'xp_cmdshell' ,1;  
GO  
RECONFIGURE;  
GO

Step 6
Now to actually rename the files, we simply provide old file name with path and new filename to the XP_CMDSHELL command.
In order to execute rename command we first need to enable XP_CMDSHELL if it is not already enabled.

 
EXEC xp_cmdshell 'RENAME "C:\SQL Server Data Files\MSSQL11\Data\VladTesting.mdf", "NewDatabaseName.mdf"'
GO
EXEC xp_cmdshell  'RENAME "C:\SQL Server Data Files\MSSQL11\Log\VladTesting_log.ldf", "NewDatabaseName_log.ldf"'
GO

Step 7
Now for the last step. We attach DB with new name and new physical file names that we just renamed.

 
CREATE DATABASE VladsAwesomeDB ON 
( FILENAME = N'C:\SQL Server Data Files\MSSQL11\Data\VladsAwesomeDB.mdf' ),
( FILENAME = N'C:\SQL Server Data Files\MSSQL11\Log\VladsAwesomeDB_log.ldf' ) FOR ATTACH
GO
ALTER DATABASE VladsAwesomeDB SET MULTI_USER 
GO

Step 8 - Housekeeping
Since xp_cmdshell considered dangerous it is best to disable before we forget that it had to be enabled. First command will disable xp_cmdshell and second command will hide advanced options.

 
EXEC sp_configure 'xp_cmdshell' ,0;  
GO  
RECONFIGURE;  
GO 
EXEC sp_configure 'show advanced options' ,0;   
GO  
RECONFIGURE;  
GO

So now you know how to rename database. Usually this should only be done when there is specific business reason to do so. If command sp_renamedb or ALTER DATABASE Modify Name is used, it does not rename physical or logical files which can cause issues later when someone tries to create new database with same name as last one or someone does restore of backup without changing options.

Thursday, July 7, 2016

SQL Joins - Basics Part 1

This topic takes me back to my first SQL Server database class I had in college. Back then it was extremely difficult subject for me. Partly because I have worked very little with databases and secondary, because of the way it was presented. Taking all that into account let's try to attack it in most basic manner. SQL JOIN in its basic form is just a way to connect two tables together. There are several types of JOINs that can be defined: LEFT, RIGHT, FULL, INNER, CROSS. There are time and place for each one, but most of the time I can get away with using only LEFT and INNER.

Its is great to talk about JOINs, but I'm sure that there are other people who like me don't learn but hearing, they learn by seeing and trying it them self. To do that, we need to create two basic tables and populate it with some data.

First, let us create two tables to store data for our tests. Since the concept of student and teacher is most easily understood, I'm creating Teacher table and Student table with a key to link back to Teacher Table.


CREATE TABLE [dbo].[Student]
(
 [ID] [int] NULL
 ,[FullName] [varchar](50) NULL
 ,[TeacherID] [int] NULL
)

GO

CREATE TABLE [dbo].[Teacher]
(
 [ID] [int] NULL
 ,[FullName] [varchar](50) NULL
)

Now to load some data, I generated insert script with random names from the following website.

 
INSERT INTO Teacher (ID, FullName)
VALUES  (1, 'Roy Chad')
  ,(3, 'Dudley Goddard')
  ,(7, 'Raphael Philander')
  ,(5, 'Sonnie Davin')

INSERT INTO Student (ID, FullName,TeacherID)
VALUES  (1,'Jepson Balfour',1)
  ,(2,'Milburn Brett',1)
  ,(4,'Clinton Schuyler',3)
  ,(5,'Norbert Kemp',3)
  ,(7,'Meriwether Kennedy',7)
  ,(8,'Braith Cornelius',7)
  ,(10,'Dion Hayden',9)

Now that we got some data loaded into tables let the JOINing commence. First will look at 'INNER' JOIN


SELECT  t.FullName AS TeacherFullName
       ,s.FullName AS StudentFullName
FROM    Teacher t
INNER JOIN Student s
        ON s.TeacherID = t.ID

INNER JOIN throws out all records that do not match. In our case Teacher "Sonnie Davin" and Student "Dion Hayden" do not appear in result because there was no way to connect those records.

TeacherFullName	StudentFullName
Roy Chad	Jepson Balfour
Roy Chad	Milburn Brett
Dudley Goddard	Clinton Schuylerr
Dudley Goddard	Norbert Kemp
Raphael Philander	Meriwether Kennedy
Raphael Philander	Braith Cornelius

LEFT OUTER JOIN, returns all records from main table and attempts to match records from secondary table.


SELECT  t.FullName AS TeacherFullName
       ,s.FullName AS StudentFullName
FROM    Teacher t
LEFT OUTER JOIN Student s
        ON s.TeacherID = t.ID

As seen in this result list, Teacher "Sonnie Davin" appears in the list but does not have any student records associated. LEFT JOIN is most usefull when ever you not sure if you all records from main table are matched in secondary table. Almost every time I write query I use LEFT JOIN the first I'm JOINing tables together to know for sure that I'm not excluding any records that I wanted to include.

TeacherFullName	StudentFullName
Roy Chad	Jepson Balfour
Roy Chad	Milburn Brett
Dudley Goddard	Clinton Schuylerr
Dudley Goddard	Norbert Kemp
Raphael Philander	Meriwether Kennedy
Raphael Philander	Braith Cornelius
Sonnie Davin	NULL

RIGHT OUTER JOIN returns all records from RIGHT table and matches records from left table. The following two queries can be written as RIGHT or LEFT join with same results.


SELECT  t.FullName AS TeacherFullName
       ,s.FullName AS StudentFullName
FROM    Teacher t
RIGHT OUTER JOIN Student s
        ON s.TeacherID = t.ID

SELECT  t.FullName AS TeacherFullName
       ,s.FullName AS StudentFullName
FROM    Student s
LEFT OUTER JOIN Teacher t
        ON s.TeacherID = t.ID

In this result set because we reversed the condition of LEFT JOIN above, we can see record from Student table "Dion Hayden" who is not matched with a teacher record.

TeacherFullName	StudentFullName
Roy Chad	Jepson Balfour
Roy Chad	Milburn Brett
Dudley Goddard	Clinton Schuylerr
Dudley Goddard	Norbert Kemp
Raphael Philander	Meriwether Kennedy
Raphael Philander	Braith Cornelius
NULL	Dion Hayden

FULL OUTER JOIN attempts to match records from both tables just like INNER JOIN and also returns all additional records that do not match from ether table.


SELECT  t.FullName AS TeacherFullName
       ,s.FullName AS StudentFullName
FROM    Teacher t
FULL OUTER JOIN Student s
        ON s.TeacherID = t.ID

In the this result set we have all records that we had from INNER JOIN, plus extra record from LEFT JOIN and one record from RIGHT JOIN. Teacher "Sonnie Davin" appears in result set without student and Student "Dion Hayden" also appears in result set without teach. This join is most usefull when need to get all records no matter if match exists or not.

TeacherFullName	StudentFullName
Roy Chad	Jepson Balfour
Roy Chad	Milburn Brett
Dudley Goddard	Clinton Schuylerr
Dudley Goddard	Norbert Kemp
Raphael Philander	Meriwether Kennedy
Raphael Philander	Braith Cornelius
Sonnie Davin	NULL
NULL	Dion Hayden

CROSS JOIN, returns all records from main table and matches them to every record from secondary table.


SELECT  t.FullName AS TeacherFullName
       ,s.FullName AS StudentFullName
FROM    Teacher t
CROSS JOIN Student s

What we get back is the following result set that has every record from both table. Since no condition has be specified every record appears in result set. Number of records in result set will always be equal to Number of records in TableA multiple by number of record in TableB. In our case we had 4 Teachers and 7 Students therefore we got 28 records in result. But what if we had 4 million teachers and 70 million students. This JOIN has worst performance and will require most memory to complete the operation.

TeacherFullName	StudentFullName
Roy Chad	Jepson Balfour
Roy Chad	Milburn Brett
Roy Chad	Clinton Schuylerr
Roy Chad	Norbert Kemp
Roy Chad	Meriwether Kennedy
Roy Chad	Braith Cornelius
Roy Chad	Dion Hayden
Dudley Goddard	Jepson Balfour
Dudley Goddard	Milburn Brett
Dudley Goddard	Clinton Schuylerr
Dudley Goddard	Norbert Kemp
Dudley Goddard	Meriwether Kennedy
Dudley Goddard	Braith Cornelius
Dudley Goddard	Dion Hayden
Raphael Philander	Jepson Balfour
Raphael Philander	Milburn Brett
Raphael Philander	Clinton Schuylerr
Raphael Philander	Norbert Kemp
Raphael Philander	Meriwether Kennedy
Raphael Philander	Braith Cornelius
Raphael Philander	Dion Hayden
Sonnie Davin	Jepson Balfour
Sonnie Davin	Milburn Brett
Sonnie Davin	Clinton Schuylerr
Sonnie Davin	Norbert Kemp
Sonnie Davin	Meriwether Kennedy
Sonnie Davin	Braith Cornelius
Sonnie Davin	Dion Hayden

This concludes the first look at logical JOIN operations. Just like I mentioned in beginning of the post, just about every time I write query it ends up being LEFT or INNER JOIN. Do I ever had to use others? Yes, but with specific business cases.

Please leave a comment with questions and/or feedback on post.

Subscribe to: Posts ( Atom )