Introduction

A couple of weeks ago, my client requested me to implement a SSIS package that allows them to download their files from a FTP server and archives them. Microsoft SSIS is the powerful tool which allows us to proceed multiple files at same time even we can use SSIS to download files from FTP server.

Approach

To implement this requirement, I used two main SSSI tasks which are FTP Task and Foreach Loop Container
FTP Task
: is used to download files from FTP server
Foreach Loop Container: is used to load each downloaded file

Development environment

SQL Server 2014 Developer
SQL Server Data Tools for Visual Studio 2013

Data files

I created  3 files for Agency, Invoice and Sales data. Each file has 3 columns  (Col1, Col2 and Col3)
Agency
Col1|Col3|Col3
A|1|3
Invoice
Col1|Col3|Col3
A|1|3
Sales
Col1|Col3|Col3
A|1|3

FTP Server

For demonstration purpose, I used DriveHQ service that supports FTP service. I created AAA_ArchivedData folder and then uploaded Agency data,  Invoice data and Sales data.
I suggest you should create a free member account because you use them to configure FTP Task for next step

FTP_Server.PNG

Create staging tables

 

IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Agency]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[Agency](
	[Col1] [varchar](50) NULL,
	[Col2] [varchar](50) NULL,
	[Col3] [varchar](50) NULL
) ON [PRIMARY]

IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Invoice]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[Invoice](
	[Col1] [varchar](50) NULL,
	[Col2] [varchar](50) NULL,
	[Col3] [varchar](50) NULL
) ON [PRIMARY]

IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Sales]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[Sales](
	[Col1] [varchar](50) NULL,
	[Col2] [varchar](50) NULL,
	[Col3] [varchar](50) NULL
) ON [PRIMARY]

Implement SSIS package

 

  1. Create a new package named Load_Multiple_CSV_Files.dtsx
  2. Create variables
    • ArchivedFolder: F:\SSIS-Practice\Load_Multiple_CSV_Files\ArchivedFolder where to store archived files
    • FileConnection
    • FileType
    • OLEDBConnection: connection string to SQL Database Server
      • Value: Data Source=HSSSC1PCL01198\SQLSERVER2014; Initial Catalog=Demo;Provider=SQLNCLI11.1;Integrated Security=SSPI;
      • Data Source: database server name
        Initial Catalog: database name
        Provider: the driver to connect to SQL Database Server. You need to change the value to meet your environment. I use SQLNCLI11.1 because SQL Server Engine is 2014. If your version is lower than SQL 2012 version, it should be 10.0 / 10.1
        Integrated Securitymeans that I use Window Authentication mode while connection to SQL Database Server.
    • SourceFolder: F:\SSIS-Practice\Load_Multiple_CSV_Files\DataFile where to store downloaded files from FTP serverSSIS_Variables.PNG
    • Create connections on Connection Manager panel
      • Create OLEDBConnection and assign OLEDBConnection variable to ConnectionString property
      • Create 3 connections for CSV files ( Agency, Invoice and Sales) and assign FileConnection variable to ConnectionString property
      • Create ArchivedFolder connection and assign ArchivedFolder variable to ConnectionString property
        SSIS_Multiple_CSV_Connection.PNG
  3. Drag and drop FTP Task component
    1. Double click to open FTP Task Editor
      • Create new FTP connection to open FTP Connection Manager Editor
      • Server settings
        • Server name: ftp.drivehq.com
        • Server port: 21 by default. The FTP site listens to this port. If you us another port, you need to make sure that it is not blocked by firewall.
      • Credentials
        • User name: the account that you registered with DriveHQ
        • Password: the password that you registered with DriveHQ
      • Options
        • Check on Use Passive mode: it means that the client side initiates the data connection. If you un-check, it means you use Active mode. In Active mode, the server side initiates the data connection and it is DriveHQ in this case.
    2. Click on Test Connection to make sure you can connect to DriveHQ successfullyFTP_Task_Editor_1.PNG
    3. Click OK to come back FTP Task Editor
      1. Go to File Transfer tab
        • Local Parameters
          • IsLocalPathVariable: True
          • LocalVariable: User::SourceFolder. We specify where to download data files from DriveHQ
        • Operation
          • Operation: Receive files. There are a lot of actions such as Send files, Receive files, Create local directory,… In this case, we want to download the files so we use Receive files
          • IsTransferAscii: True accepts Ascii code
        • Remote Parameters
          • IsRemotePathVariable: False
          • RemotePath: /AAA_ArchivedData/*.txt. The option specifies where we download data files on DriveHQ. The AAA_ArchivedData is created above step and I also uploaded the testing files. *.txtspecifies that we download all files with txt extension
      2. Click OK
  4. Drag and drop Execute SQL  Task component
    • This task is to truncate the tables that created before
  5. Drag and drop Foreach Loop Container component
    1. Double click to open Foreach Loop Editor
      • Go to Collection tab
        • Enumerator: Foreach File Enumerator
        • Add expression Directory: @[User::SourceFolder]
        • Enumerator Configuration
          • Files: *.txt it means to load text files only
          • Check on Fully qualified
        • SSIS_Multiple_CSV_ForeachLoop_1.PNG
      • Go to Variable Mappings
        • User::FileConnection
        • Index = 0
      • Click OK
    2. Drag and drop Script Task component into Foreach Loop and name it as Parse File Name
      1. This task is used to detect what kind of file type (Agency, Invoice or Sales)
      2. Double click to open Script Task Editor
        • ReadOnlyVariables: User::FileConnection
        • ReadWriteVariables: User::FileType
      3. Click Edit script… and enter the following code
        public void Main() { // TODO: Add your code here string filepath = Dts.Variables[“FileConnection”].Value.ToString(); string filename = Path.GetFileName(filepath); if (filename.ToUpper().Contains(“INVOICE”)) { Dts.Variables[“FileType”].Value = “Invoice”; } else if ((filename.ToUpper().Contains(“AGENCY”))) { Dts.Variables[“FileType”].Value = “Agency”; } else if ((filename.ToUpper().Contains(“SALES”))) { Dts.Variables[“FileType”].Value = “Sales”; } else Dts.Variables[“FileType”].Value = “Unknown”; Dts.TaskResult = (int)ScriptResults.Success; }
      4. Click OK
    3. Drag and drop Data Flow Task component and name it as Load Invoice
      • Make a connection between Parse File Name script task and Load Invoice
      • Double click on the connection to open Precedence Constraint Editor
        • Evaluation operation: Expression
        • Expression: @[User::FileType]==”Invoice”
        • Click on Logical OR….SSIS_Multiple_CSV_PrecedenceConstrai.PNG
        • Click OK
      • Double click on Load Invoice data flow
        • Drag and drop Flat File Source component and name it as Invoice File
          • Flat file connection manager: Invoice
          • Columns: all columns
        • Drag and drop OLEDB Destination and name is as Invoice Destination
          • OLEDB connection manger: OLEDBConnection
          • Data access view: Table and view – fast load
          • Name of the table or the view: Invoice
          • Mapping with flat file sourceSSIS_Multiple_CSV_LoadInvoice.PNG
    4. Drag and drop Data Flow Task component and name it as Load Agency
      • Repeat all steps which we implement for Invoice but change Expression of Precedence Constraint Editor to @[User::FileType]==”Agency” SSIS_Multiple_CSV_PrecedenceConstrai.PNGSSIS_Multiple_CSV_LoadAgency.PNG
    5. SSIS_Multiple_CSV_LoadAgency.PNGDrag and drop Data Flow Task component and name it as Load Sales
      • Repeat all steps which we implement for Invoice but change Expression of Precedence Constraint Editor to @[User::FileType]==”Sales” SSIS_Multiple_CSV_PrecedenceConstrai.PNGSSIS_Multiple_CSV_LoadSales.PNG
    6. Drag and drop File System Task component
      • Double click to open File System Task Editor
        • IsDestinationPathVariable: False
        • DestinationConnection: ArchivedFolder. This is the folder connection we created before
        • OverwriteDestination: False
        • Operation: Move File. There are a lot of actions but we use Move File to archive data files after proceeding them successfully
        • IsSourcePathVariable: True
        • SourceConnection: User::FileConnection

Execute SSIS package

Finally, our package has been done and we need to execute it to see how it works?
SSIS_Multiple_CSV.PNG

  1. Check the folder F:\SSIS-Practice\Load_Multiple_CSV_Files\DataFile to make sure there are no data files
  2. Check the folder F:\SSIS-Practice\Load_Multiple_CSV_Files\ArchivedFolder to make sure there are no data files
  3. Query data from Agency, Invoice and Sales tables to make sure that they are empty

SSIS_Multiple_CSV_Before_Executing.PNG
Executing the package
SSIS_Multiple_CSV_Executing.PNGVerifying the result after executing the package
SSIS_Multiple_CSV_Results.PNGExcellent, data imported and data files were also archived to the folder as our expectation. In my example, the archived folder is local, you can still use FTP Task component to upload the files to anywhere.

Conclusions

In real life, there are a lot of scenarios that will require you to combine SSIS components together. Hopefully, you will have an overview how to use FTP Task and Foreach Loop Container to proceed multiple files at same time through my example.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s