Introduction

From SQL Server 2012, Microsoft released one of the most important features for querying a large dataset COLUMNSTORE INDEXES.

The columnstore index is the standard for storing and querying large data warehousing fact tables. It uses column-based data storage and query processing to achieve up to 10x query performance gains in your data warehouse over traditional row-oriented storage, and up to 10x data compression over the uncompressed data size.

In this article, I would like to demonstrate how this feature improve performance when I query a large data?

Environment

SQL Server 2016 Developer

Window 10

RAM: 8GB

CPU: Core i7, 1 socket, 2 cores and 4 logical cores

Download AdventureWorks2014 database

Step 1: I created a database named LargeDatabase in FULL recovery mode.

USE [master]
GO
CREATE DATABASE [LargeDatabase]
 CONTAINMENT = NONE
 ON PRIMARY 
( NAME = 'LargeDatabase ', FILENAME = N'D:\DataSQL2014\LargeDatabase .mdf' , SIZE = 1GB , MAXSIZE = UNLIMITED, FILEGROWTH = 100MB )
 LOG ON 
( NAME = N'LargeDatabase _log', FILENAME = N'E:\MSSQL\Log\LargeDatabase _log.ldf' , SIZE = 100MB , MAXSIZE = 2048GB , FILEGROWTH = 10MB)
GO

Step 2: inserted around 4 million rows that extracted Order data from AdventureWorks2014. I tried to use the script that implemented by Jonathan M. Kehayias at SQLskills.com. You can also download it here.

Testing

Case 1 : queried table [dbo].[SalesOrderDetailEnlarged] where ProductID IN (712, 870, 873). I also created a clustered index on [SalesOrderID],[SalesOrderDetailID] columns. Before running this query, I restarted SQL Server service on my laptop to ensure that SQL Buffer was refreshed.

DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
SELECT ProductID,SUM(LineTotal)
FROM [Sales].[SalesOrderDetailEnlarged]
GROUP BY ProductID
GO
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

(266 row(s) affected)

(266 row(s) affected)
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘Workfile’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘SalesOrderDetailEnlarged’. Scan count 5, logical reads 50861, physical reads 1, read-ahead reads 50681, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 5422 ms, elapsed time = 3011 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

Case 2: removed clustered index and then created a clustered columnstore index

IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[Sales].[SalesOrderDetailEnlarged]’) AND name = N’PK_SalesOrderDetailEnlarged_SalesOrderID_SalesOrderDetailID’)
ALTER TABLE [Sales].[SalesOrderDetailEnlarged] DROP CONSTRAINT [PK_SalesOrderDetailEnlarged_SalesOrderID_SalesOrderDetailID]
GO
CREATE CLUSTERED COLUMNSTORE INDEX [PK_SalesOrderDetailEnlarged_SalesOrderID_SalesOrderDetailID] ON [Sales].[SalesOrderDetailEnlarged]

Restarted the SQL Server service and re-executed the above query

DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
SELECT ProductID,SUM(LineTotal)
FROM [Sales].[SalesOrderDetailEnlarged]
GROUP BY ProductID
GO
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

(266 row(s) affected)
Table ‘SalesOrderDetailEnlarged’. Scan count 4, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 1071, lob physical reads 3, lob read-ahead reads 1798.
Table ‘SalesOrderDetailEnlarged’. Segment reads 7, segment skipped 0.
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 282 ms, elapsed time = 349 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

Comparison

Obviously, there were a big difference between CLUSTERED COLUMNSTORE INDEX and CLUSTERED INDEX in my examples.

COLUMNSTORE INDEX

COLUMNSTORE INDEX executed better than the traditional approach (ROWSTORE).

CONCLUSION

COLUMNSTORE INDEX is a great feature that helps improve performance a lot in data-warehouse environment. It’s worthy for us to investigate how it can improve our data warehouse operations in case we are encountering any performance issue.

Microsoft also released this feature on SQL Server 2016 Standard Edition. It is a good news if we have not cost enough for working on other editions.Beside, this feature can be applied on an OLTP workload… a big change from Microsoft in my view. Let’s start with it on your DEV environment to see how it works?

Advertisements

One thought on “How does COLUMNSTORE INDEXES improve performance of Data Warehouse?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s