Sunday, November 25, 2012

Data warehouse design: are you really that special?

Whenever I come across a project that requires a data warehouse design, I often hear the customer stress that their business is unlike anyone else's and that a custom-designed data warehouse is required. Well, according to Len Silverston, the author of the book series "The Data Model Resource Book", there exists a generic data model for every business. Most customization would merely be additional fields in the entities or, in most cases I can imagine, simply a matter of determining which subset of the generic data model to include.

Neither he nor I am suggesting that everyone can suddenly become a data warehouse architect and implement data warehouses on the fly by treating the books as the holy bible. But with the books as a starting point, plus an accurate understanding of the customer's business requirements, one can save a lot of time on the actual data warehouse design process. After all, understanding the business is the key to a successful data warehouse.

We are trying out his idea in my current project, where we are implementing the data warehouse based on the financial-sector data model suggested in volume 2. Over the course of the project we will track how well it satisfies the customer's needs and how many customizations along the way deviate from the data model (our goal is none).

With that in mind, let's take it one step further: for consultants who work in a specific field of business, how about coding the data model into a database project in Visual Studio as a template and making the customizations from there? Imagine how much time that would save.

Sunday, November 18, 2012

Different Views to an IT Project - Illustration

When I first came across this picture I couldn't stop laughing, mostly because of the truth in it, especially the documentation part. How many times have we come across an old solution that comes with no documentation at all?

As time goes on I have also come to realize that there is a difference between what the customer wants and how the customer explains it. Hence the importance of a prototype and frequent communication.

Sunday, November 11, 2012

SSIS - Good Practice

This post talks about a good practice for implementing SSIS projects: making a template that can be reused for every SSIS project. For application design we have frameworks, such as MVC and Spring, that include the basic elements which tend to repeat in every environment. We can have that for SSIS as well. However, we are not lucky enough to be able to generate the basic elements of a package by going through a wizard, so we have to build our own. Some good elements to include in the SSIS project framework are:
  • A master package
  • Logging
  • Configuration file
A master package is responsible for calling each sub package, so the execution order can easily be managed in one place. Once deployed, one only has to schedule the execution of the master package, which saves deployment effort.

Logging records the execution time and outcome for each batch (which is simply one execution of the master package), package, and task. Corresponding database tables and stored procedures need to be implemented to support the logging system. I keep these in a database project in Visual Studio and simply do a schema compare to install the needed tables and stored procedures; that now takes me seconds, after I spent hours on the initial work.

A configuration file allows the package to be configured. The logging connection strings and the locations of the sub packages are best stored in the configuration so that the package can adjust to different projects and environments.

It took me 3 days to set all of that up from scratch, so if these elements are stored in an SSIS template that one just pulls out for the next project, that is 3 days saved. In addition, the following elements are not always required but could be useful in the template; one can simply disable them should they be deemed unnecessary for the project:
  • A foreach loop container that loops through all the files in a folder (or all the tables in a database)
  • A sequence container
Since most ETL projects involve handling a bunch of source data, one can hardly avoid the first element. The universal data dumper mentioned in an earlier post fits very nicely into this foreach loop. The sub packages that are responsible for data transformation can be placed in the sequence container, which manages the execution sequence and organizes a rollback should one or more of the elements fail. With these elements set up in my template, I can now get started on my ETL projects right away without having to spend days doing repetitive work.
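To make the master package and logging idea more concrete outside of the SSIS designer, here is a minimal sketch in Python. It is not SSIS code and not the actual template described above: the table names (batch_log, package_log), the CONFIG dictionary, and the function names are all assumptions made up for illustration, and SQLite stands in for the logging database. It only shows the shape of the pattern: read the execution order from configuration, run each sub package in turn, log the duration and outcome per package and per batch, and stop the batch on the first failure.

```python
import sqlite3
import time
from typing import Callable, Dict, List

# Hypothetical configuration; in SSIS these values would live in the
# package configuration file rather than in code.
CONFIG = {
    "log_connection": ":memory:",
    "package_order": ["load_customers", "load_orders"],
}


def create_log_tables(conn: sqlite3.Connection) -> None:
    """Create the (assumed) logging tables: one row per batch and per package."""
    conn.executescript(
        """
        CREATE TABLE batch_log (
            batch_id INTEGER PRIMARY KEY AUTOINCREMENT,
            started  REAL,
            finished REAL,
            outcome  TEXT
        );
        CREATE TABLE package_log (
            batch_id INTEGER,
            package  TEXT,
            seconds  REAL,
            outcome  TEXT
        );
        """
    )


def run_master(
    conn: sqlite3.Connection,
    packages: Dict[str, Callable[[], None]],
    order: List[str],
) -> int:
    """Run the sub packages in order, logging each one; return the batch id."""
    cur = conn.execute(
        "INSERT INTO batch_log (started, outcome) VALUES (?, 'running')",
        (time.time(),),
    )
    batch_id = cur.lastrowid
    batch_outcome = "success"
    for name in order:
        start = time.perf_counter()
        try:
            packages[name]()
            outcome = "success"
        except Exception:
            outcome = "failure"
            batch_outcome = "failure"
        conn.execute(
            "INSERT INTO package_log VALUES (?, ?, ?, ?)",
            (batch_id, name, time.perf_counter() - start, outcome),
        )
        if outcome == "failure":
            break  # later sub packages are not executed
    conn.execute(
        "UPDATE batch_log SET finished = ?, outcome = ? WHERE batch_id = ?",
        (time.time(), batch_outcome, batch_id),
    )
    conn.commit()
    return batch_id


if __name__ == "__main__":
    conn = sqlite3.connect(CONFIG["log_connection"])
    create_log_tables(conn)
    # Placeholder sub packages that do nothing.
    subs = {name: (lambda: None) for name in CONFIG["package_order"]}
    batch = run_master(conn, subs, CONFIG["package_order"])
    print(conn.execute("SELECT * FROM package_log WHERE batch_id = ?", (batch,)).fetchall())
```

In a real SSIS template the same roles are played by Execute Package tasks in the master package and by the logging stored procedures; the sketch leaves out rollback of the data changes themselves.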

Sunday, November 4, 2012

SSRS - Good Practice

This post talks about a good practice for an SSRS developer: generating a mock report prior to report implementation.

A couple of years ago I was at a bank making SSRS reports as part of a multi-phase, large-scale project. The project leader was on his last project before retirement. He was extremely experienced and was also known to have bottomless pockets: even though he kept claiming the budget was tight, he was somehow always able to squeeze out the extra budget required to handle any unexpected scenario.

During phase 1 about 10 reports were made. I completed them according to the specifications, which were based solely on the data that needed to be shown and the parameters required. Most of them were to replace existing Excel reports, so I was given samples to use as a design guideline. The reports were completed and deployed before the deadline, so I was quite proud of myself. However, that's when the fun began. I started getting numerous change requests regarding the layout of the reports: the number format, the width of the columns, the order of the columns, and even the color of the header background. They were change requests, so technically it wasn't my fault, but all this back-and-forth pushed back the release date of the reports.

When the first phase was completed I had a meeting with my project leader to discuss my performance, and he challenged me to minimize the change requests by hitting the mark on what the client wants on the first try. He then suggested making a mock report that simulates the layout and submitting it to the client for approval prior to implementing the report. He was aware that it would prolong the implementation time, but the time saved afterward would be well worth it.
I took his advice and started making Excel mock reports for phase 2, and I was able to cut down the change requests by 90%. Most reports were accepted right away, while a few of them needed a couple of small changes that were not picked up with the mock report. Since I am pretty good with Excel, each mock report only took me about an hour, but they saved me days of follow-up work. Phases 2 and 3 went smoothly thanks to his advice, and I have since included this in my best practice list.