Problem

My client sends data in an Excel file to me to load them into Staging database. The file contains many sheets that they have same structure. In this article, I would like to share the simple way to load data of multiple sheets by using SSIS.

Solution

Source Files

For demonstration purpose, I have an Excel file which is containing sales data monthly. Each sheet contains data per month.

Development environment

  • SQL Server 2014 Developer Edition ( or SQL Server 2014 Express)
  • SQL Server Data Tools for Visual Studio 2013
  • Excel 2010 (32 bits)

If you could not install any Excel instance of Microsoft, you can download Microsoft Access Database Engine from https://www.microsoft.com/en-us/download/details.aspx?id=13255 (depend on your environment 32bits/ 64 bits)

For SQL Server and SQL Server Data Tools, you can download them from https://msdn.microsoft.com/evalcenter

Develop SSIS package

Step 1 – Create a folder to store Excel file

Source_Folder.PNG

Step 2 – Create Excel example file

The Excel file contain sales data from October to December in 2015 and there are 3 sheets of three months. They have same structure.
Source_File.PNG

Step 3 – Create a staging table in SQL Database Server

USE Demo
GO
CREATE TABLE Stg_SalesDataInMutipleSheets
(
TransactionDate date,
ProductCategory nvarchar(50),
ProductSubCategory nvarchar(50),
ProductName nvarchar(100),
Price decimal,
Quantity int,
TotalAmount decimal
)

Select all Open in new window

Step 4 – Create SSIS package

4.1 Create a new SSIS project and then a new SSIS loader with name AA_LoadSalesData_Monthly
AA_LoadData_Project.PNG
4.2 Create variables for the package
Because we need to pass the folder path / file name dynamically, we need to create some variables:

OLEDBConnectionString: connection string to SQL Database Server
Value: Data Source=HSSSC1PCL01198\SQLSERVER2014; Initial Catalog=Demo;Provider=SQLNCLI11.1;Integrated Security=SSPI;
Data Source: database server name
Initial Catalog: database name
Provider: the driver to connect to SQL Database Server. You need to change the value to meet your environment. I use SQLNCLI11.1 because SQL Server Engine is 2014. If your version is lower than SQL 2012 version, it should be 10.0 / 10.1
Integrated Security: means that I use Window Authentication mode while connection to SQL Database Server.

SourcePath: path to folder which contains our Excel files
Value: D:\SSIS-Practice\Load_Multiple_Sheets\Source_Files

ExcelFile: Excel file name. By default, we assign path of example file.
Value: D:\SSIS-Practice\Load_Multiple_Sheets\Source_Files\AA_Sales_Data_2015.xlsx

SheetName: contains name of each sheet in Excel file. When SSIS package reads Excel file, it will scan every sheet and get sheet name, and then assign the value to this variable. Assign Sheet1$ by default.
Variables.PNG
4.3 Create Connection Configurations
SSIS package uses the connection configurations to make connection, which we create them in Connection Manager pane.
Create OLEDB Connection and assign OLEDBConnectionString variable to ConnectionString property.
OLEDBConnectionString.png
Create Excel Connection and assign ExcelFile variable to ServerName property
ExcelFile1.PNG
Excel_ServerName.PNG

4.3 Define control flow
4.3.1 Drag and drop Execute SQL Task to truncate table Stg_SalesDataInMutipleSheets
Truncate_Staging_Table.PNG
4.3.2 Drag and drop Foreach Loop Container component. Double click to open Foreach Loop editor.
On Collection

Enumerator: Foreach ADO.NET Schema Rowset Enumerator. This mode helps us loop through all sheets in Excel file.
Connection: we select new connection

Foreach_Connection.png
Configure ADO.NET Connection Manager window appears, then we click on New button. Connection Manager window appears, select Microsoft Office 12.0 Access Database Engine OLE DB Provider. Because the Excel file is 2007 (xlsx format), we need to select this option. If Excel is lower version such as 2003 Excel, we select Microsoft Jet 4.0 OLE DB Provider.
Foreach_Connection1.png
Enter path of Excel example file for Server or file name 

Foreach_Connection2.pngClick on All tab, then enter EXCEL 12.0 value for Extend Properties. And then click on Test Connection to make sure that SSIS can connect to Excel file. In case, you select Microsoft Jet 4.9 OLE DB Provider, please enter EXCEL 8.0 instead of EXCEL 12.0
Foreach_Connection3.png
Click OK to back Foreach Loop Editor window. Select Tables for Schema
Foreach_Connection4.png

On Variable Mapping, we have setup the variable mapping with SheetName variable to index 2Foreach_Connection5.png

On Connection Manage pane, a new connection is appeared. Change name it as Schema
ConnectionManager.PNG

4.3.3 Verify sheet name
Sometimes, Excel file includes some invalid sheet name so we need additional step to verify before loading data. Remember we name sheet as AA-<month name>
Drag and drop Script Task component into Foreach Loop Container. Double click to open Script Task Editor then select as below
Sheetname_Verify1.PNG
Click on Edit Script button and add the C# code to verify sheet name

public void Main()
{
             // TODO: Add your code here

           Dts.TaskResult = (int)ScriptResults.Success;
           String sheetname = Dts.Variables["User::SheetName"].Value.ToString();
           try
            {
                if (sheetname.Trim().Contains("AA-"))
                {
                    Dts.Variables["User::IsValidateSheetName"].Value = true;
                }
                else
                {
                    Dts.Variables["User::IsValidateSheetName"].Value = false;
                }
            }
            catch
            {
                Dts.Variables["User::IsValidateSheetName"].Value = false;
            }
}

Select all Open in new window


4.4 Define Data Flow
Drag and drop Data Flow Task component into Foreach Loop Container. Data Flow loads data from each sheet to staging table. Connect Task Script with Data Flow and double-click on line connector to open Precedence Constraint Editor. We only load valid sheets base on user define variableIsValidateSheetName
IsValidateSheetName.PNGDouble click on Data Flow Task.

4.4.1 Define Excel Source
Drag and drop Excel Source component. Double click to open Excel Source Editor.
On Connection Manger tab

Excel Connection Manager: ExcelFile
Data Access mode: Table name or view name variable
Variable name: User::SheetName

ExcelSource_Connection.PNG

On Column table: select columns we want to transfer data. In this example, we select all columns.

ExcelSource_Columns.PNG

4.4.2 Define OLE DB Destination
Drag and drop OLE DB Destination component and double-click to open OLE DB Destination Editor
On Connection Manager:

OLE DB connection manager: OLEDBConnectionString
Data access mode: Table or view – fast load
Name of the table or the view: Stg_SalesDataInMutipleSheets

OLEDB_1.PNG

On Mappings: map Input Columns and Destination Columns

OLEDB_Mappings.PNG

Finally, we have SSIS package as below

SSIS_Package.png

Run package

Press F5 to execute SSIS package
Run_SSIS.PNGCheck data to make sure that they are inserted into the table successfully. There are 30 rows in result.
Verify_Data.PNG

Conclusions

The above package is very simple. Obviously, we will have many steps when we implement it in any real project. The purpose is to give you the solution in case you have to load data of multiple sheets of an Excel file. One of critical issues is Excel connection, maybe you will face to it. Make sure that you have Excel installed or you must install Database Engine, which I mentioned earlier.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s