Sunday, December 2, 2012

*Un*expected traps in an ETL Project

I am writing this post as a supplement to this post by Bjørn Eilertsen. Once we get comfortable with SSIS, we tend to assume we can handle any ETL project with ease, especially when we have already developed quite a few reusable modules that save us a significant amount of time. As a result, we lower our estimates to win the contract and impress the client. It is true that solid SSIS experience shortens the development time of an ETL project, but some factors have nothing to do with SSIS and are the main reasons for delays and for development hours shooting off the chart. Besides listing these issues, I will also point out a few ways to minimize, if not avoid, the delays.

Data/Environment access: As consultants, when we come in to work on a client's project, their data (or sometimes their client's data) can be sensitive and may not be available to us until we have completed a long list of tasks to gain access to it. There is no guarantee of how long this may take, so start early! This is the first of many traps in an ETL project, so the earlier it is taken care of, the more time there is to deal with the other potential issues.

Data Schema: When sample data is not immediately available, the client may try to appease us by giving us the data schema and claiming that the sample data will strictly follow it. It would be nice if it did, but having a plan B for when the sample data deviates from the received schema avoids a lot of wasted time.

Data Quality: In a perfect world, all data is clean, just as in a perfect society everyone lives happily and there is no crime. We all know that is impossible. Be prepared to spend some quality time assuring the quality of the data. If the SSIS packages implement data control and spit out the rows that contain unexpected data, you are halfway done.
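The schema plan B and the "spit out the bad rows" idea can be sketched outside SSIS. The Python below is a minimal sketch, assuming CSV sample data with one record per line and a hypothetical three-column schema (`customer_id`, `name`, `signup_date`). It first checks the header against the agreed schema, then splits the rows into valid and rejected lists, much like redirecting rows to an error output. All names are illustrative, not part of any real project.

```python
import csv
import io
from datetime import date


def non_empty(value):
    """Reject empty or missing text values."""
    if not value:
        raise ValueError("empty value")
    return value


# Hypothetical schema received from the client: column name -> parser.
# Each parser raises ValueError (or TypeError) when a value does not fit.
EXPECTED_SCHEMA = {
    "customer_id": int,
    "name": non_empty,
    "signup_date": date.fromisoformat,  # expects YYYY-MM-DD
}


def split_rows(csv_text, schema=EXPECTED_SCHEMA):
    """Return (valid_rows, rejected_rows) for the given CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Plan B: fail fast with a clear message if the sample data
    # deviates from the schema we were promised.
    missing = set(schema) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"columns missing from source: {sorted(missing)}")

    valid, rejected = [], []
    # Line 1 is the header; assumes no quoted fields span several lines.
    for line_no, row in enumerate(reader, start=2):
        try:
            typed = {col: parse(row[col]) for col, parse in schema.items()}
        except (ValueError, TypeError) as exc:
            # Keep the raw row and the reason so it can be sent back to
            # the person responsible for the source data.
            rejected.append({"line": line_no, "row": row, "reason": str(exc)})
        else:
            valid.append(typed)
    return valid, rejected
```

Because each rejected row carries its line number and reason, the list can go more or less straight into the mail to whoever owns the source data.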
Resolving data quality issues may take anything from a few emails with the person responsible for the source data to a month of sitting down to figure out how the data entities are connected. Connections between data entities are tricky, especially when the entities come from different sources, and they should not be taken lightly.

Data Model: The destination data model may seem to be fully under the SSIS developer's control, and it can certainly be defined before the project starts. However, should it change, there is a risk of facing all the challenges listed above again. This risk is high because the data model usually serves as the base for data analysis/mining or for an application that uses the data, so a glitch in the specification process will likely cause a change in the data model.

Most of these potential traps are difficult to avoid, but there is no point in feeling discouraged. I believe it is these traps that make an SSIS developer worthwhile. With careful planning, accurate estimates, and a good plan B, the traps can be overcome with relatively little or no pain.

Sunday, November 25, 2012

Data warehouse design: are you really that special?

Whenever I come across a project that requires a data warehouse design, I often hear the customer stress that their business is unique and that a custom-designed data warehouse is required. Well, according to Len Silverston, author of the book series "The Data Model Resource Book", there exists a generic data model for all businesses. Most customization would merely be additional fields in the entities or, in most cases I can imagine, simply determining which subset of the generic data model to include.

Neither he nor I am suggesting that everyone can suddenly become a data warehouse architect and implement data warehouses on the fly by treating the books as the holy bible. But with them as a starting point, plus an accurate understanding of the customer's business requirements, one can save a lot of time on the actual data warehouse design process. After all, understanding the business is the key to a successful data warehouse.

We are trying out his idea in my current project, implementing the data warehouse based on the financial sector data model suggested in volume 2. Over the course of the project we will track how well it really satisfies the customer's needs and how many customizations along the way deviate from the data model (our goal is none).

With that in mind, let's take it one step further: for consultants who work in a specific field of business, how about coding the data model into a database project in Visual Studio as a template and making the customizations from it? Imagine how much time that would save.
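To make the template idea concrete, here is a minimal Python sketch of the two kinds of customization described above: choosing a subset of a generic model and adding fields to its entities, then rendering the result as SQL DDL. The model, entity names, and column types are hypothetical placeholders, not Silverston's actual model, and a real template would more likely live in a Visual Studio database project than in a Python dictionary.

```python
# Hypothetical, heavily simplified "generic" model. The entity and column
# names are illustrative and NOT copied from The Data Model Resource Book.
GENERIC_MODEL = {
    "party": {"party_id": "INT PRIMARY KEY", "party_type": "VARCHAR(20)"},
    "party_role": {
        "party_role_id": "INT PRIMARY KEY",
        "party_id": "INT REFERENCES party(party_id)",
        "role_type": "VARCHAR(50)",
    },
    "agreement": {"agreement_id": "INT PRIMARY KEY", "agreement_date": "DATE"},
    "financial_account": {
        "financial_account_id": "INT PRIMARY KEY",
        "account_type": "VARCHAR(50)",
    },
}


def customize(model, include, extra_fields=None):
    """Pick a subset of entities and add customer-specific columns.

    The generic template itself is never modified.
    """
    extra_fields = extra_fields or {}
    unknown = (set(include) | set(extra_fields)) - set(model)
    if unknown:
        raise KeyError(f"not in the generic model: {sorted(unknown)}")
    custom = {}
    for entity in include:
        columns = dict(model[entity])  # copy, so the template stays intact
        columns.update(extra_fields.get(entity, {}))
        custom[entity] = columns
    return custom


def to_ddl(model):
    """Render each entity as a CREATE TABLE statement."""
    statements = []
    for entity, columns in model.items():
        body = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
        statements.append(f"CREATE TABLE {entity} (\n    {body}\n);")
    return "\n\n".join(statements)


# Usage: a customer that only needs parties and accounts, plus an IBAN column.
print(to_ddl(customize(
    GENERIC_MODEL,
    include=["party", "financial_account"],
    extra_fields={"financial_account": {"iban": "VARCHAR(34)"}},
)))
```

Keeping customizations as a separate, small input rather than edits to the template also makes it easy to measure how far a project deviates from the generic model.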

Sunday, November 18, 2012

Different Views to an IT Project - Illustration

When I first came across this picture I couldn't stop laughing, mostly because of the truth in it, especially the documentation part. How many times have we come across an old solution that doesn't come with any documentation at all?

As time goes on, I have also realized that there is a difference between what the customer wants and how the customer explains it. Hence the importance of a prototype and frequent communication.

Sunday, November 11, 2012

SSIS - Good Practice

This post talks about a good practice for implementing SSIS projects: making a template that can be reused for every SSIS project. We have frameworks for application design, such as MVC and Spring, that include the basic elements which tend to repeat in every single environment. We can have that for SSIS as well. However, we are not lucky enough to be able to generate these basic elements in a package by going through a wizard, so we have to build our own. Some good elements to include in an SSIS project framework are:
  • A master package
  • Logging
  • Configuration file
A master package is responsible for calling each sub-package, and the execution order can easily be managed in the master package. Once deployed, one can simply schedule the execution of the master package, which saves deployment effort. Logging records the execution time and outcome of each batch (which is simply one execution of the master package), package, and task. Corresponding database tables and stored procedures need to be implemented to support the logging system. I started a database project in Visual Studio and simply run a schema compare to install the needed tables and stored procedures; it now takes me seconds, after having spent hours on the initial work. A configuration file allows the package to be configured: the logging connection strings and the locations of the sub-packages are best stored in the configuration so that it can adjust to different projects and environments. It took me 3 days to set all of this up from scratch, so if these elements are stored in an SSIS template that one just pulls out for the next project, that is 3 days saved. In addition, the following elements are not always required but could be useful in the template; one can simply disable them should they be deemed unnecessary for the project:
  • A foreach loop container that loops through all the files in a folder (or all the tables in a database)
  • A sequence container
Since most ETL projects involve handling a bunch of source data, one can hardly avoid the first element. The universal data dumper mentioned in an earlier post fits very nicely into this foreach loop. The sub-package responsible for data transformation could be placed in the sequence container, which manages the execution sequence and organizes a rollback should one or more of the elements fail. With these elements set up in my template, I can now get started on my ETL projects right away without spending days on repetitive work.
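The framework elements above can be sketched outside SSIS to show how they fit together. The Python below is a minimal sketch under stated assumptions: a JSON file stands in for the SSIS configuration file, SQLite stands in for the SQL Server logging tables and stored procedures, and plain Python callables stand in for sub-packages. `run_master` plays the master package, loops over the files in a folder like a foreach loop container, and wraps each file's sub-packages in one transaction, mirroring the rollback described for the sequence container. All table, column, and function names are hypothetical.

```python
import json
import sqlite3
import time
from pathlib import Path


def load_config(path):
    """Read settings such as the source folder from a JSON config file."""
    return json.loads(Path(path).read_text())


def create_log_tables(conn):
    # batch_log: one row per master package run;
    # package_log: one row per sequence (one source file) inside that run.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS batch_log (
            batch_id INTEGER PRIMARY KEY,
            started REAL, finished REAL, outcome TEXT);
        CREATE TABLE IF NOT EXISTS package_log (
            log_id INTEGER PRIMARY KEY,
            batch_id INTEGER REFERENCES batch_log(batch_id),
            source TEXT, started REAL, finished REAL, outcome TEXT);
    """)


def run_master(conn, sub_packages, config):
    """Run every sub-package in order for each file in the source folder."""
    batch_id = conn.execute(
        "INSERT INTO batch_log (started) VALUES (?)", (time.time(),)
    ).lastrowid
    conn.commit()
    batch_outcome = "Success"

    folder = Path(config["source_folder"])
    for path in sorted(folder.glob(config["file_pattern"])):  # foreach loop
        started = time.time()
        try:
            # "Sequence container": all sub-packages for this file commit
            # together, or the whole file's work is rolled back.
            with conn:
                for sub_package in sub_packages:
                    sub_package(conn, path)
            outcome = "Success"
        except Exception as exc:
            outcome = f"Failure: {exc}"
            batch_outcome = "Failure"
        # Logged after the transaction so a rollback does not erase the log.
        conn.execute(
            "INSERT INTO package_log (batch_id, source, started, finished, outcome)"
            " VALUES (?, ?, ?, ?, ?)",
            (batch_id, path.name, started, time.time(), outcome),
        )
        conn.commit()

    conn.execute(
        "UPDATE batch_log SET finished = ?, outcome = ? WHERE batch_id = ?",
        (time.time(), batch_outcome, batch_id),
    )
    conn.commit()
    return batch_id
```

Writing the log row after the transaction is a deliberate choice: if logging happened inside it, a rollback would erase the very record that explains the failure.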

Sunday, November 4, 2012

SSRS - Good Practice

This post talks about one of the good practices for an SSRS developer: generating a mock report prior to report implementation. A couple of years ago I was at a bank making SSRS reports as part of a multi-phase, large-scale project. The project leader was on his last project before retirement. He was extremely experienced and was also known to have bottomless pockets, because even though he kept claiming the budget was tight, he was somehow always able to squeeze out the extra budget required to handle any unexpected scenario. During phase 1 about 10 reports were made. I completed them according to the specifications, which were based solely on the data that needed to be shown and the parameters required. Most of them were to replace the existing Excel reports, so I was given samples to use as design guidelines. The reports were completed and deployed before the deadline, so I was quite proud of myself. However, that's when the fun began: I started getting numerous change requests regarding the layout of the reports, such as the number format, the width of the columns, the order of the columns, and even the color of the header background. Since they were change requests, technically it wasn't my fault, but the excessive communication pushed the release date of the reports back. When the first phase was completed, I had a meeting with my project leader to discuss my performance, and he challenged me to minimize change requests by hitting exactly what the client wants on the first try. He then suggested making a mock report that simulates the layout and submitting it to the client for approval prior to report implementation. He was aware that it would prolong the implementation time, but the time saved afterward would be well worth it.
I took his advice and started making Excel mock reports for phase 2, and I was able to cut down the change requests by 90%: most reports were accepted right away, while a few needed a couple of small changes that the mock report had not picked up. Since I am pretty good with Excel, it only took me about an hour per report, but it saved me days of follow-up work. Phases 2 and 3 went smoothly thanks to his advice, and I have since included this practice in my best practice list.

Sunday, October 28, 2012

SSRS - Permission Denied

So you are new to SSRS 2008 R2: you checked all the boxes at installation, started the services, and started the Reporting Services Configuration Manager. So far so good, but when you try to open Report Manager (http://localhost/Reports) to check out the interface, you get the following error:

You start to wonder: "How could this be? I am the administrator of the machine and the system admin of the SQL Server, so there is no way I can't access the report server." Well, you are right, except for one minor detail...

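By default, the report server only grants access to the BUILTIN\Administrators group, and with User Account Control enabled, your browser does not run with administrator rights unless you start it with "Run as administrator", so the page stays closed even to you. You can do one of the following:

  • Log in with the built-in Administrator account.
  • Run Internet Explorer as administrator, and then grant yourself permission in Report Manager.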


So a few minutes later I am off making reports and deploying them to the site, which I now have access to.

Saturday, June 16, 2012

NDC 2012

The 2012 Norwegian Developer Conference took place in Oslo from 6th to 8th June at Oslo Spectrum. Amende had a stand at the venue, and with it came a few passes that we employees could share. I signed up to go on Thursday because there were a couple of talks that really interested me. Another obvious reason for picking Thursday might be the party with music and alcohol that followed that day, but honestly I was too tired to join, as my brain was suffering from indigestion after all the information and ideas absorbed during the day. This was actually my first NDC, or any .NET developer conference for that matter; since I was a Java developer in my early days, JavaZone was a more natural destination for me, and I was there in 2008. The talks at NDC vary: some present a new idea, others a cool project the speakers work on, and there are even workshops presenting cool features of a common technology. The talks I went to were:
  • Clean Architecture by Robert Martin
  • Embrace the Uncertainty by Dan North
  • Git and GitHub for Developers on Windows by Phil Haack
  • You are in production, now what? by Tatham Oddie

Besides going to the presentations, I was also responsible for being at the Amende stand between sessions and even during a couple of sessions, mainly giving out souvenirs to guests and encouraging them to take our Amende survey for a chance to win a MacBook Air. It was nice to chat with the different people who dropped by, to learn more about what they do and what they think about us. I tried to sell the idea of EasyQuest to glasspaper, but apparently one of my colleagues beat me to it. Our USB memory stick was extremely popular because (as I later found out) it had the most capacity. Compared to the 4 GB and 2 GB sticks offered at other stands, ours were superior, props to us :) One thought I had regarding the stands' giveaways: I wish they could be more original. If I had won all the drawings, I would have ended up with 3 iPads! Overall, I think this event is definitely worth attending if one has the means and time. Not only did I learn about new technologies and get a peek inside the minds of experts in the field, but socializing and networking are also important reasons to show your face at such a venue. Hopefully I will get to go again next year, and I hope to see you there.