Thursday, May 23, 2013

Data mining algorithm: Clustering

Clustering works by plotting each case as a point along multiple axes, one for each dimension defined by the SSAS developer. The system then groups together points that are close to each other. With this model you are not trying to predict anything; rather, you get an overview of the associations between dimension values.
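To make the "group points that are close to each other" idea concrete, here is a minimal sketch of plain k-means in Python. It is only an illustration of the general technique, not how SSAS implements its clustering internally, and the sample cases (age and income) and the function name `kmeans` are made up for the example.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group points into k clusters by assigning each point to its nearest centre."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k randomly chosen cases
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its members (keep it if the cluster is empty).
        centres = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centres[i]
            for i, members in enumerate(clusters)
        ]
    return centres, clusters

# Hypothetical cases: (age, yearly income in thousands)
cases = [(25, 30), (27, 32), (24, 28), (52, 90), (55, 95), (50, 88)]
centres, clusters = kmeans(cases, k=2)
for members in clusters:
    print(sorted(members))
```

With these made-up cases the younger, lower-income people end up in one cluster and the older, higher-income people in the other. No prediction is made; the clusters simply describe which cases sit near each other.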



In the images above the sample is divided into 7 clusters, each with a breakdown of attribute values for each predefined attribute. Here we can see that in Cluster 1 there is probably an association between being single, class 0 and industry 000, which can be read as "those who are single belong to class 0 and industry 000". Clustering is a good tool for marketing research: we might conclude, for example, that "those who are between 40 and 50 and own a house are also very likely to own a car", and a company in a car-related business could then identify potential targets for its marketing campaign.

For more information: http://technet.microsoft.com/en-us/library/ms174879.aspx

Thursday, May 9, 2013

Data mining algorithm: Decision Tree

This comes a bit late, as I promised to write about different data mining algorithms during the summer. Well, considering the weather in Oslo during July, I believe that summer has just started, so it still counts.

To start off, I am going to talk about the data mining algorithm that is considered the simplest to understand: the decision tree.

As the name makes fairly obvious, a decision tree is used to analyze the factors (or, in database terms, the set of attributes) that affect a decision. For example, what kind of people are more likely to subscribe to a newspaper? Which people are more likely to vote for Romney in November?

To build a decision tree, you first define a decision, usually an attribute that is either true/false or otherwise has two possible values. The decision tree then splits the pool of samples into a tree hierarchy: each level represents an attribute, and the sample is split into different nodes based on the value (or grouping of values) of that attribute. Once the desired attributes have been used to build the tree, each node holds a subset of the sample, and within each subset you can see how many cases are true or false for the initial decision.
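The splitting step described above can be sketched in a few lines of Python. This is only an illustration of one level of a tree, assuming a made-up newspaper-subscription sample; the attribute names and the helper `split_by` are hypothetical, and a real algorithm would also score candidate attributes to decide which one to split on.

```python
from collections import Counter, defaultdict

def split_by(cases, attribute, target):
    """Split cases into nodes by one attribute and count the target outcome in each node."""
    nodes = defaultdict(Counter)
    for case in cases:
        nodes[case[attribute]][case[target]] += 1
    return {value: dict(counts) for value, counts in nodes.items()}

# Hypothetical sample: does a person subscribe to the newspaper?
cases = [
    {"age_group": "40-50", "subscribes": True},
    {"age_group": "40-50", "subscribes": True},
    {"age_group": "40-50", "subscribes": False},
    {"age_group": "20-30", "subscribes": False},
    {"age_group": "20-30", "subscribes": False},
]

nodes = split_by(cases, "age_group", "subscribes")
print(nodes)
```

Each key in `nodes` is one node of the tree at this level, and the counts show how the true/false decision is distributed inside it. Repeating the same split inside each node with another attribute gives the next level.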

A sample decision tree looks like this:



The above example is all about the decision to play or not to play. The tree is built on the weather outlook and on the humidity and windy conditions. In the end we see that if it is raining, whether it is windy determines whether the game is on; if it is sunny, humidity dictates whether the game is on; and the game is always on when it is overcast.
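Read as code, the finished tree is just a set of nested conditions. The sketch below is my own rendering of the rules just described. The text only says which attribute matters on each branch, so the direction of each rule (high humidity or wind means no game) is an assumption, as are the function and parameter names.

```python
def game_is_on(outlook, humidity_high, windy):
    """Walk the play / don't play tree from the root (outlook) down to a leaf."""
    if outlook == "overcast":
        return True  # the game is always on when it is overcast
    if outlook == "sunny":
        return not humidity_high  # assumption: high humidity means no game
    if outlook == "rainy":
        return not windy  # assumption: wind means no game
    raise ValueError(f"unknown outlook: {outlook!r}")

print(game_is_on("sunny", humidity_high=False, windy=True))
print(game_is_on("rainy", humidity_high=False, windy=True))
```

Notice that each branch only looks at the attribute that matters for it: wind is ignored when it is sunny, and humidity is ignored when it is raining.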

For more information:
http://technet.microsoft.com/en-us/library/ms175312.aspx

Thursday, May 2, 2013

Data Mining with Microsoft SSAS

One of the popular reasons for using Microsoft SSAS is its built-in support for data mining, something that would cost a fortune with other software. Of course, I am not saying that MS SSAS is going to be as good as those other products, but it definitely suffices for basic, straightforward data mining in most cases. Along with DMX (Data Mining Extensions) and tight integration with CRM and SharePoint, this whole architecture is definitely something most SMBs wish for when creating a "budget" solution.

Throughout the summer I will go through the following data mining algorithms supported by MS SSAS:
  • Association
  • Clustering
  • Decision Trees
  • Linear Regression
  • Logistic Regression
  • Naive Bayes
  • Neural Network
  • Sequence Clustering
  • Time Series
For each data mining algorithm I will give an introduction, followed by some case scenarios where it is useful.

Stay tuned.