Thursday, May 9, 2013

Data mining algorithm: Decision Tree

This came a bit late, as I promised to write about different data mining algorithms during the summer. Well, considering the weather in Oslo during July, I believe summer has just started, so it still counts.

To start off, I am going to talk about the data mining algorithm that is considered the simplest to understand: the Decision Tree.

As the name already makes obvious, a decision tree is used to analyze the factors (or, in database terms, a set of attributes) that affect a decision. For example, what kind of people are more likely to subscribe to a newspaper? Which people are more likely to vote for Romney in November?

To build a decision tree, you first define the decision, usually an attribute that is true/false or otherwise takes one of two values. The decision tree then splits the pool of samples into a tree hierarchy: each level represents an attribute, and the samples are split into different nodes based on that attribute's value (or grouping of values). Once the desired attributes have been used to build the tree, each leaf node holds a subset of the samples, namely the subset of people who answered true or false to the initial decision.
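To get a concrete feel for this, here is a minimal sketch using scikit-learn rather than the Microsoft algorithm linked below, on a made-up weather sample (the attribute encodings and data are mine, purely for illustration):

```python
# Sketch: grow a decision tree on a tiny, made-up weather sample.
from sklearn.tree import DecisionTreeClassifier, export_text

# Attributes [outlook, humidity, windy] encoded as numbers:
# outlook: 0=sunny, 1=overcast, 2=rain; humidity: 0=normal, 1=high; windy: 0=no, 1=yes
X = [
    [0, 1, 0], [0, 1, 1], [1, 1, 0], [2, 1, 0], [2, 0, 0],
    [2, 0, 1], [1, 0, 1], [0, 1, 0], [0, 0, 0], [2, 0, 0],
]
# The "decision" attribute: 1 = play, 0 = don't play
y = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]

# Each split picks the attribute that best separates play from don't play
tree = DecisionTreeClassifier(criterion="entropy")
tree.fit(X, y)

# Print the learned splits, one attribute test per level of the tree
print(export_text(tree, feature_names=["outlook", "humidity", "windy"]))
```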

A sample decision tree looks like this:

[Decision tree figure: weather outlook at the root, then humidity or windy, with play / don't play at the leaves]
In the above example, the decision is whether to play or not to play. The tree is then built on the weather outlook and the humidity/windy conditions. At the end we see that if it is raining, whether or not it is windy determines whether the game is on; if it is sunny, humidity dictates whether the game is on; and the game is always on in the case of an overcast.
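Written out by hand, the rules that tree encodes look something like the sketch below (the function and value names are my own, not taken from the figure):

```python
# The play / don't play tree described above, as nested conditions.
def play_game(outlook: str, humidity: str, windy: bool) -> bool:
    if outlook == "overcast":
        return True                   # game is always on when overcast
    if outlook == "rain":
        return not windy              # when raining, wind decides
    if outlook == "sunny":
        return humidity == "normal"   # when sunny, humidity decides
    raise ValueError(f"unknown outlook: {outlook}")

print(play_game("rain", "high", windy=False))   # True
print(play_game("sunny", "high", windy=False))  # False
```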

For more information:
http://technet.microsoft.com/en-us/library/ms175312.aspx
