Life as a BI Consultant

Thursday, July 18, 2013

Useful SQL Queries to Retrieve Daterange Dates

Most of us who have worked with SQL knows of GetDate(), it returns the timestamp of the very moment it executed. Combine that with dateadd function and date calculations become much more powerful.

dateadd can take a date, and pick a part of a date (like day, month, year) and do arithmetics to it in order to get another desired date.

Example: dateadd(dd, -1, getDate()) returns yesterday, dateadd(yy, 1, getDate()) returns the same day of next year.

Last week I need to make a report that returns result of last week, last month and year to date. To make things complicated, last week doesn't mean 7 days ago, but actually Monday last week to Sunday last week. Same with last month (1st of Sept to 30th of Sept if the report is run today), to fetch the fromDate and the toDate for those case scenarios, we need to combine dateadd with datediff:

Datediff returns the integer difference for the 2 specified dates and the datepart specified.

Example: datediff(dd, getDate(), dateadd(dd, 1, getDate())) will return 1

Now, with both datediff and dateadd in action:

Dateadd(wk, Datediff(wk, 6, getDate())-1, 7) returns the first date of last week at 0 hour, 0 minute, 0.000 seconds

Dateadd(month, Datediff(month, 0, getDate())-1, 0) returns the first date of last month at 0 hour, 0 minute, 0.000 seconds

Dateadd(year, Datediff(year, 0, getDate())-1, 0) returns the first date of last year at 0 hour, 0 minute, and 0.000 seconds

From there we can be lazy about calculating the last date of last week:

Dateadd(wk, Datediff(wk, 6, getDate()), 7) actually returns the first date of this week at 0 hour, 0 minute, 0.000 seconds, if we run a query with '<' instead of '<=' then this will work just fine.

To complete the queries:

Dateadd(month, Datediff(month, 0, getDate()), 0) returns the first date of this month at 0 hour, 0 minute, 0.000 seconds

Dateadd(year, Datediff(year, 0, getDate()), 0) returns the first date of this year at 0 hour, 0 minute, and 0.000 seconds

Thursday, July 4, 2013

Handling multi value parameters in Reporting Services 2008 R2

In reporting services, one can set up a multiple value parameter as a dropdown list with check boxes. User can then select the desired values as search parameter for the report. Such a parameter will look like as follows:

However, for an SQL stored procedure that expects a string as parameter for provider, how would a multivalue selection be handled? Multivalue selection enters the stored procedure as comma separated string. However, if we try the following: Select * from fruits where name = 'apple, orange, banana' You are not going to get much results. So we need a way to split the comma separated strings. An example of a split function looks like this:

This function returns a table with each comma separated value as individual row. So the query of fruit selection becomes the following: select * from fruits where name in (select Token from dbo.split('apple, orange, banana'))

Thursday, June 20, 2013

Outsourcing

It has been going on for more than a decade, white collar jobs had been flooded to Asia from the US since the first tech crash. It started off with customer service, then moving on to software development.

Thousands of software development jobs moved across the ocean and those who couldn't deal with the change ended up losing their jobs, while software architects, especially those who has the background of the outsourcing destinations, suddenly took a leap in their career.

After years of experiments, the results were mixed. Some company claimed the quality of the product were too poor to meet the industrial standard, while some managed to maintain the quality while saving a big chunk of labor cost in the process.

So what was the difference? It can basically be summarized in 3 words: Design, Planning, Management.

Though software engineers across the ocean can do the development job for one fifth of the cost, but they are not meant to be software architects. Even if you come across one with software architect experience, it is unlikely they will have the same domain knowledge and cultural understanding to design for a software to meet the customer's demand. Thus the design should not be outsourced.

Once the design is done, success is still not guaranteed. The architect should have a plan on how development should be done. Preferably chopping down the assignment to smaller modules that is bite size for the oversea developers, then running unit tests to control the quality.

The last but not least is the day to day management to avoid any unexpected last minute crisis. The last thing a project manager wants is the development went totally off track and no one noticed the problem until it is too late. This goes with the planning as the what how when of management should be included in the planning. Management should of course account for the cultural and language difference. Thus this is considered the biggest obstacle to run a successful software project.

In conlusion outsourcing could work but success is in no way guaranteed. However by carefully executing the project and account for the cultural difference. Success is definitely achievable.

Thursday, June 6, 2013

Data mining algorithm: Association

This one is often in action, whenever we buy something online, we always sees advertisements in the form of: "People who bought A also bought ..." and a list of products.

It is most likely the system is using association to find out what product purchase were associated together. Association looks through an object set (products in a purchase for example), and see what object attribute values tends to be together in the same object set. In English, what products appear often together in the same purchase? However, this is only the first part of association. To really make sense of the association result, one must also look into the dependency of the items. We may know someone who buys a phone usually buys the phone cover along with it, but does a person who buys a phone cover usually buys a phone along with it as well? Probably not. Thus when defining the association rule one must define the implication as well.

For more information:

http://msdn.microsoft.com/en-us/library/ms174916

Thursday, May 23, 2013

Data mining algorithm: Clustering

Clustering works by plotting out each case as dots on multiple axis based on the dimension defined by the SSAS developer. Then the system proceed to group together points that are close to each other. In this model you are not trying to predict anything, rather just having an overview of what are some of the associations between dimension values.

In the images above the sample is divided into 7 clusters each with a breakdown on attribute value for each predefined attributes. Here we can see that there is probably an association between the ones who are single, 0 for class and 000 for industry from Cluster 1 which fits into the sentence (Those who are single belongs to class 0 and industry 000). It is a good way to do marketing research with Clustering, then we can maybe draw some conclusion on "Those who are between 40 and 50 and owns a house are also very likely to own a car.", then a company who does car related business would be able to identify potential targets for their marketing campaign.

For more information: http://technet.microsoft.com/en-us/library/ms174879.aspx

Thursday, May 9, 2013

Data mining algorithm: Decision Tree

This came a bit late as I have promised to write about different data mining algorithms during the summer. Well, consider the weather in Oslo during July, I believe that summer has just started. So it still counts.

To start off I am going to talk about the data mining algorithm that is consider the simplest to understand, Decision Tree.

As if it is not already very obvious, decision tree is used to analyze the factors (or in database terms, a set of attributes) that affects the decision. For example, what kind of people is more likely to subscribe to newspaper? Which people is more likely going to vote for Romney in November?

To build a deicison tree, you first have a decision defined, usually one of the attribute that is either a true false or something that has a 2 value definition. Decision tree then splits the pool of sample into a tree heirarchy, each level represents an attribute and the sample is split into different nodes based on the value (or grouping of value). Once the desired attributes are used to build the decision tree, each node will then have a subset of the sample. The subset of people who is true or false to the initial decision.

A sample decision tree looks like this:

From the above example, it is all about the decision of play or don't play. Then the tree is built based on the weather outlook, humidity/windy condition. At the end we realize that if it is raining, then the windy or not determines whether the game is on or not, on the other hand, if it is sunny, humidity dictates whether the game is on. While the game is always on in the case of an overcast.

For more information:
http://technet.microsoft.com/en-us/library/ms175312.aspx

Thursday, May 2, 2013

Data Mining with Microsoft SSAS

One of the popular reason for using Microsoft SSAS is because of its standard support of data mining features. Something that would cost a fortune with other softwares. Of course, I am not saying that MS SSAS is going to be as good as those softwares, but it definitely is suffice to do some basic straightforward data mining in most cases. Along with DMX (Data mining expressions) and tightly integrated CRM and Sharepoint, this whole architecture is definitely something most SMB wishes for in creating a "budget" solution.

Throughout the summer I will go through the following data mining algorithms that are supported by MS SSAS.

Association
Clustering
Decision Trees
Linear Regression
Logistic Regression
Naive Bayes
Neural Network
Sequence Clustering
Time Series

For each data mining algorithms I will give an introduction and follow up by some case scenarios where it will be useful.

Stay tuned.