To address the problem of decision tree construction, an algorithm known as Recursive Partitioning was developed. While there are many different versions of Recursive Partitioning (RP) available, each with its own unique details, the overall methodology is consistently the same regardless of the exact implementation. This can be shown as File:Rp
Understanding The Problem
To understand the RP process, a few basic concepts must first be understood. The first of these is the concept of splitting (partitioning) the training set. During the process of decision tree induction, we have to consider what questions we will ask in order to direct the user down the appropriate path. For simplicity, let’s consider that each potential question can have a true or false answer, thus, any particular node will have at most two paths leading from it to the next node(s) in the path. Every possible value of every possible feature within the training set represents a potential split that could be done. The result is that we will be able to go down the right path or the left path based upon the data and we will effectively split the data at each node into two independent groups – this is partitioning. For example, let us consider a training set which has purely numerical data. The features will be called [math]X_n[/math] and the possible values for those features will be called [math]Y_m[/math]. As a result, every question that could be asked can take the form, "Is [math]X_n[/math] less than or equal to [math]Y_m[/math]?" The resulting answer will direct us down the appropriate path, e.g., if [math]X_n[/math] is less than or equal to [math]Y_m[/math], then go left, otherwise, go right. Once we have two new nodes (children nodes) linked to a previous node (the parent node), we can repeat the process for each child node independently using only the observations present in that child – the recursive step.
Finally, we must have some type of stopping criteria. If we were to allow the splitting process to continue until each leaf only had 1 observation, we would have a perfect tree! This situation, however, is very dangerous if we want to use our resulting decision tree for the purpose of prediction. Most likely a tree of this sort is over fit for the training data and will not perform well on new data. To avoid this process, we introduce a stopping criteria to halt the recursive partitioning process. Much like purity measures, stopping criteria come in many different forms, including:
A maximum number of nodes in the tree. Once this maximum is reached, the process is halted. A minimum number of observations in a particular node can be set such that if the number of observations in a node is less than or equal to a minimum value, partitioning of that node will not be attempted, and it becomes a leaf. A threshold for the purity measure can be imposed such that if a node has a purity value higher than the threshold, no partitioning will be attempted regardless of the number of observations. (A node which is pure enough based upon a predefined threshold does not require any further splitting regardless of how many observations are present.) Once we have determined that we will stop the procedure, we have effectively reached a leaf in our tree. Based upon the observations that have made it through the tree to that leaf, we can assign it a class value. One common way to do this is to determine which class is in higher numbers and use that class as the leaf’s class value. This is merely a majority rules voting method and, like all other aspects of tree induction, many different methods exist by which to determine a leaf’s class label.
While the above procedure for decision tree induction can be modified and fine-tuned for the task at hand, the general procedure remains the same regardless of the choice of purity measures or stopping criteria. Additionally, while this is a method that is used extensively throughout the field of pattern recognition, a potential flaw exists. At each node which we have decided needs to be split, we look at the immediate effects on the children. As mentioned above, we look through each and every possible way to partition this node’s observations and choose the one which gives us the best purity in the immediate children. It is conceivable, however, that if we choose a splitting criteria which is not optimal, we might have better choices much later in the tree which we cannot see in the immediate children. In other words, a sub-optimal split may lead to an increase in optimality of choices for later splits.
R allows users to apply Recursive partitioning through the packages CARET. The methods CARET use for the algorithm building are
- Recursive Partitioning
- Boosted Trees
- Random Forests