Sunday, November 11, 2012

SSIS - Good Practice

This post talks about a good practice of implementing SSIS projects, to make a template that can be reuse for every SSIS project. We have frameworks for application design, such as MVC, Spring, that include the basic elements which tend to repeat on every single environment. We have that for SSIS as well. However, we are not as lucky to be able to generate the basic elements in a package by going through a wizard, so we will have to include our own. Some of the good elements to include in the SSIS project framework would be:
  • A master package
  • Logging
  • Configuration file
A master package is responsible for calling each sub package and execution order can easily be managed in the master package. Once deployed, one can just schedule the execution of the master package to save deployment effort. Logging is responsible for logging the execution time and outcome for each batch (which is simply the execution of master package), package, and task. Corresponding database tables and stored procedures need to be implemented in order to support the logging system. I started a database project in visual studio and simply do a schema compare to install the needed tables and stored procedures, it took me seconds now after having spent hours doing the initial work. Configuration file allows for the package to be configured, all the logging connection strings and the location of the sub packages are best stored in the configuration so that it can adjust to different project and environment. It took me 3 days to set that up from scratch, so if these elements are stored in a SSIS template and one just pulls it out for the next project, that is 3 days saved. In addition, the following elements are not always required but could be useful in the template, one can just disable those element should be deemed not required for the project:
  • A foreach loop container that loops through all the files in a folder (or all the tables in a database)
  • A sequence container
Since most ETL projects involves handling a bunch of source data, one can almost not avoid the first element. The universal data dumper mention in earlier post fits very nicely into this foreach loop. The sub package that is responsible for data transform could be inserted into the sequence container that is used to manage the execution sequence and organizes a rollback should one or more of the element fails. With these elements setup in my template, I could now get started on my ETL projects right away without having to spend days doing repetitive work.

No comments:

Post a Comment