A couple of weeks ago, a friend asked me about the best approach to move data out of a large table. I found the question interesting because there is no single right answer: the situation, the environment, and even the database structure all determine which approach fits.

There are many techniques you can apply to delete data quickly:

  1. Design the database structure effectively
  2. Use TRUNCATE instead of DELETE if you want to empty the whole table
  3. Narrow down the data you want to delete and create indexes on the columns used to filter it
  4. Keep transaction log growth under control with log backups
  5. Move the data you want to keep out to a temporary table, truncate the table, then insert the data back (a sketch follows this list)
  6. ….
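To make technique 5 a little more concrete, here is a minimal sketch. The table dbo.BigTable, its IsArchived column, and the #KeepRows temp table are placeholders for illustration only; they are not objects used in the test below.

-- Technique 5 sketch: copy out the rows to keep, truncate, reload.
-- Table and column names are placeholders only.
SELECT *
INTO #KeepRows
FROM dbo.BigTable
WHERE IsArchived = 0;            -- the rows you want to keep

-- TRUNCATE deallocates pages and is minimally logged,
-- but it requires that no foreign keys reference the table.
TRUNCATE TABLE dbo.BigTable;

-- Identity columns, constraints and triggers need extra care here.
INSERT INTO dbo.BigTable WITH (TABLOCK)
SELECT * FROM #KeepRows;

DROP TABLE #KeepRows;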

However, I would like to present one of the most common techniques for deleting data from a large table: deleting it in multiple batches instead of in a single run.

Environment

SQL Server 2014 Enterprise

Windows 10

RAM: 8GB

CPU: Core i7, 1 socket, 2 cores and 4 logical cores

Download the AdventureWorks2014 sample database

Step 1: I created a database named Delete_LargeTable in FULL recovery mode.

USE [master]
GO
CREATE DATABASE [Delete_LargeTable]
 CONTAINMENT = NONE
 ON PRIMARY 
( NAME = N'Delete_LargeTable', FILENAME = N'D:\DataSQL2014\Delete_LargeTable.mdf' , SIZE = 1GB , MAXSIZE = UNLIMITED, FILEGROWTH = 100MB )
 LOG ON 
( NAME = N'Delete_LargeTable_log', FILENAME = N'E:\MSSQL\Log\Delete_LargeTable_log.ldf' , SIZE = 100MB , MAXSIZE = 2048GB , FILEGROWTH = 10MB)
GO

Step 2: I inserted around 4 million rows of order data extracted from AdventureWorks2014, using the Sales.SalesOrderDetailEnlarged script written by Jonathan M. Kehayias at SQLskills.com. You can also download it here.
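I am not reproducing that script here. As a rough sketch of the general idea only (my own simplification, which assumes a Sales schema in Delete_LargeTable and that AdventureWorks2014 sits on the same instance), it copies the original detail rows and then re-inserts them a few times until the table holds roughly 4 million rows:

USE [Delete_LargeTable];
GO
-- Rough sketch only; the actual SQLskills script is more elaborate.
IF SCHEMA_ID(N'Sales') IS NULL EXEC (N'CREATE SCHEMA Sales');
GO
-- Copy the original rows; ISNULL() strips the IDENTITY property from the copy.
SELECT SalesOrderID,
       ISNULL(SalesOrderDetailID, 0) AS SalesOrderDetailID,
       CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID,
       UnitPrice, UnitPriceDiscount, rowguid, ModifiedDate
INTO   Sales.SalesOrderDetailEnlarged
FROM   AdventureWorks2014.Sales.SalesOrderDetail;
GO
-- Double the row count a few times (~121K rows * 2^5 is roughly 3.9 million).
INSERT INTO Sales.SalesOrderDetailEnlarged
SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty,
       ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount,
       NEWID(), ModifiedDate
FROM   Sales.SalesOrderDetailEnlarged;
GO 5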

 

Step 3: A quick query to see how the data is distributed by product. The three most frequent ProductID values (712, 870, and 873 in my case) are the ones the delete scripts below filter on.

SELECT TOP 3 ProductID, ProductCount = COUNT(*)
FROM Sales.SalesOrderDetailEnlarged
GROUP BY ProductID
ORDER BY ProductCount DESC

Step 4: After loading the data, I shrank the log file back down by detaching and re-attaching the database, and then checked the data and log file sizes.
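A minimal sketch of the detach/attach approach, as one way to do it (assuming exclusive access to the database and letting SQL Server rebuild a fresh, small log file; the exact commands may differ from what was actually run):

USE [master];
GO
-- Detach the database (requires that no other connections are active).
EXEC sp_detach_db @dbname = N'Delete_LargeTable';
GO
-- Re-attach from the data file only and let SQL Server rebuild a small log.
-- Move or delete the old .ldf first; note this also breaks the log backup chain.
CREATE DATABASE [Delete_LargeTable]
 ON ( FILENAME = N'D:\DataSQL2014\Delete_LargeTable.mdf' )
 FOR ATTACH_REBUILD_LOG;
GO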

SELECT name,
       size/128.0 AS FileSizeInMB,
       size/128.0 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int)/128.0 AS EmptySpaceInMB
FROM sys.database_files;

Name                    FileSizeInMB    EmptySpaceInMB
Delete_LargeTable       1224.000000     531.500000
Delete_LargeTable_log   0.492187        0.085937

Step 5: I backed up the database before deleting any data.

BACKUP DATABASE [Delete_LargeTable] TO DISK ='D:\MSSQL\Backup\Delete_LargeTable.bak' WITH COMPRESSION;

OK, we are ready to compare the performance of a single run versus multiple batches.

IMPLEMENTATION

I then implemented the delete scripts. As mentioned above, the common rule of thumb is to delete data in multiple batches instead of in a single run.

-- Delete a single run
 DELETE Sales.SalesOrderDetailEnlarged
 WHERE ProductID IN (712, 870, 873);

-- Delete in multiple batches
SET NOCOUNT ON;
GO
DECLARE @r INT;
SET @r = 1;

WHILE @r > 0
BEGIN
    BEGIN TRANSACTION;

    -- Batch size: changed to 500,000 / 250,000 / 100,000 for the tests below
    DELETE TOP (100000)
        Sales.SalesOrderDetailEnlarged
    WHERE ProductID IN (712, 870, 873);

    SET @r = @@ROWCOUNT;   -- loop ends when no more rows match

    COMMIT TRANSACTION;

    -- CHECKPOINT;         -- if SIMPLE recovery
    -- BACKUP LOG ...      -- if FULL recovery
END
GO
SET NOCOUNT OFF;
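The two commented-out lines at the end of the loop body are where the per-batch CHECKPOINT (SIMPLE recovery) or log backup (FULL recovery) goes. The actual BACKUP LOG command is elided in the script; purely as an illustration (the backup folder and file-name pattern below are mine), a per-batch log backup placed right after the COMMIT could look like this:

-- Hypothetical per-batch log backup for the FULL recovery tests;
-- backup folder and file-name pattern are illustrative only.
DECLARE @logBackup NVARCHAR(260) =
    N'D:\MSSQL\Backup\Delete_LargeTable_log_'
    + FORMAT(SYSDATETIME(), N'yyyyMMdd_HHmmss') + N'.trn';

BACKUP LOG [Delete_LargeTable] TO DISK = @logBackup;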

DATABASE RECOVERY MODE is SIMPLE
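The database was created in FULL recovery mode in Step 1, so the SIMPLE cases first need the database switched over, presumably with something like:

-- Switch the test database to the SIMPLE recovery model.
ALTER DATABASE [Delete_LargeTable] SET RECOVERY SIMPLE;
GO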

Case 1

Deleted data in a single run, RECOVERY MODE is SIMPLE

Number of deleted rows: 468,384

Duration: 10 seconds

Log file size: 214.5 MB

Case 2

Deleted data in chunks, RECOVERY MODE is SIMPLE, without a CHECKPOINT between batches

2.1 Deleted in batches of 500,000 rows

Duration: 9 seconds

Log file size: 214.5 MB

(With only 468,384 matching rows, a 500,000-row batch is effectively a single run, which is why these numbers match Case 1.)

2.2 Deleted in batches of 250,000 rows

Duration: 7 seconds

Log file size: 146.44 MB

2.3 Deleted in batches of 100,000 rows

Duration: 5 seconds

Log file size: 100 MB

Case 3

Deleted data in chunks, RECOVERY MODE is SIMPLE, with a CHECKPOINT after each batch (a log backup is not possible in SIMPLE recovery, so the commented CHECKPOINT line in the script is what applies here)

3.1 Deleted in batches of 500,000 rows

Duration: 15 seconds

Log file size: 214.5 MB

3.2 Deleted in batches of 250,000 rows

Duration: 24 seconds

Log file size: 110 MB

3.3 Deleted in batches of 100,000 rows

Duration: 15 seconds

Log file size: 100 MB

DATABASE RECOVERY MODE is FULL

ALTER DATABASE [Delete_LargeTable] SET RECOVERY FULL
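One thing to keep in mind: after switching from SIMPLE back to FULL, the log backup chain has to be restarted with a full (or differential) backup before BACKUP LOG will succeed. Something like this (reusing the backup folder from Step 5; the file name is only an example):

-- Restart the log backup chain after returning to FULL recovery.
BACKUP DATABASE [Delete_LargeTable]
 TO DISK = N'D:\MSSQL\Backup\Delete_LargeTable_full.bak'
 WITH COMPRESSION;
GO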

Case 4

Deleted data in a single run, RECOVERY MODE is FULL

Duration: 15 seconds

Log file size: 214.5 MB

Case 5

Deleted data in chunks, RECOVERY MODE is FULL, without log backups between batches

5.1 Deleted in batches of 500,000 rows

Duration: 9 seconds

Log file size: 214.5 MB

5.2 Deleted in batches of 250,000 rows

Duration: 6 seconds

Log file size: 133 MB

5.3 Deleted in batches of 100,000 rows

Duration: 15 seconds

Log file size: 214.5 MB

Case 6

Deleted data in chunks, RECOVERY MODE is FULL, with a log backup after each batch

6.1 Deleted in batches of 500,000 rows

Duration: 1 second

Log file size: 214.5 MB

6.2 Deleted in batches of 250,000 rows

Duration: 15 seconds

Log file size: 110 MB

6.3 Deleted in batches of 100,000 rows

Duration: 15 seconds

Log file size: 100 MB

 

 

Conclusion

In both SIMPLE and FULL recovery, the 100,000-row batch option gave the best results for both duration and log file size. A single-run delete still works fine on a small data set, but I believe you should avoid that approach when the table is large (millions of rows).

However, my test data was only a little over 4 million rows, and my database has just one data file and one log file. What happens when the data keeps growing, or is stored in a more complex structure? And how many rows per batch should you use? The only real answer is to test on your own system; that is how you will find the threshold that meets your requirements.

