Conceptual

Draw an example (of your own invention) of a partition of two-dimensional feature space that could result from recursive binary splitting. Your example should contain at least six regions. Draw a decision tree corresponding to this partition. Be sure to label all aspects of your figures, including the regions \(R_1, R_2, \ldots\), the cutpoints \(t_1, t_2, \ldots\), and so forth.

Hint: Your result should look something like Figures 8.1 and 8.2.

It is mentioned in Section 8.2.3 that boosting using depth-one trees (or stumps) leads to an additive model: that is, a model of the form

\[f(X) = \sum_{j=1}^p f_j(X_j)\]

Explain why this is the case. You can begin with (8.12) in Algorithm 8.2.
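
One way to begin the argument is sketched below. It assumes the boosted model is the shrunken sum of \(B\) stumps with shrinkage parameter \(\lambda\), and it introduces the notation \(j(b)\) (not from the text) for the index of the single predictor on which stump \(b\) splits. Since each stump depends on only one predictor, the terms of the sum can be grouped by predictor:

\[\hat{f}(X) = \sum_{b=1}^{B} \lambda \hat{f}^b(X) = \sum_{j=1}^{p} \Bigl( \sum_{b \,:\, j(b) = j} \lambda \hat{f}^b(X_j) \Bigr) = \sum_{j=1}^{p} f_j(X_j)\]

Here \(f_j\) collects every stump that splits on \(X_j\); a predictor that is never chosen contributes \(f_j = 0\).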

Consider the Gini index, classification error, and entropy in a simple classification setting with two classes. Create a single plot that displays each of these quantities as a function of \(\hat{p}_{m1}\). The x-axis should display \(\hat{p}_{m1}\), ranging from 0 to 1, and the y-axis should display the value of the Gini index, classification error, and entropy.

Hint: In a setting with two classes, \(\hat{p}_{m1} = 1 - \hat{p}_{m2}\). You could make this plot by hand, but it will be much easier to make in R.

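
The three quantities can be tabulated before plotting. The sketch below is a minimal illustration in Python rather than R. The function names are chosen here for illustration, and the entropy is assumed to use the natural logarithm with \(0 \log 0\) treated as 0.

```python
import math

def gini(p):
    """Two-class Gini index: p(1 - p) + (1 - p)p = 2p(1 - p)."""
    return 2.0 * p * (1.0 - p)

def classification_error(p):
    """One minus the proportion of the most common class."""
    return 1.0 - max(p, 1.0 - p)

def entropy(p):
    """Two-class entropy (natural log assumed), with 0 * log(0) taken as 0."""
    total = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            total -= q * math.log(q)
    return total

# Evaluate each measure on a grid of class-1 proportions from 0 to 1.
grid = [i / 100 for i in range(101)]
curves = {
    "Gini index": [gini(p) for p in grid],
    "Classification error": [classification_error(p) for p in grid],
    "Entropy": [entropy(p) for p in grid],
}
```

Each list in `curves` can then be drawn against `grid` with any plotting tool, one line per measure on shared axes.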
This question relates to the plots in Figure 8.14.

(b) Create a diagram similar to the left-hand panel of Figure 8.14, using the tree illustrated in the right-hand panel of the same figure. You should divide up the predictor space into the correct regions, and indicate the mean for each region.

Suppose we produce ten bootstrapped samples from a data set containing red and green classes. We then apply a classification tree to each bootstrapped sample and, for a specific value of \(X\), produce 10 estimates of \(P(\text{Class is Red} \mid X)\):

0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, and 0.75.

There are two common ways to combine these results together into a single class prediction. One is the majority vote approach discussed in this chapter. The second approach is to classify based on the average probability. In this example, what is the final classification under each of these two approaches?
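
The two combination rules can be written out as a short sketch. The function names, the 0.5 threshold, and the convention that ties go to Green are assumptions made here for illustration and are not specified in the exercise.

```python
from statistics import mean

# The ten estimates of P(Class is Red | X) listed in the exercise.
estimates = [0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75]

def majority_vote(probs, threshold=0.5):
    """Classify with each tree separately, then take the most common class.

    Assumption: a tree votes Red when its estimate exceeds the threshold,
    and a tied vote is resolved as Green.
    """
    red_votes = sum(1 for p in probs if p > threshold)
    green_votes = len(probs) - red_votes
    return "Red" if red_votes > green_votes else "Green"

def average_probability(probs, threshold=0.5):
    """Average the estimates first, then classify the average once."""
    return "Red" if mean(probs) > threshold else "Green"

print("Majority vote:", majority_vote(estimates))
print("Average probability:", average_probability(estimates))
```

The two rules can disagree because the vote only counts which side of the threshold each estimate falls on, while the average also reflects how far each estimate lies from it.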

Provide a detailed explanation of the algorithm that is used to fit a regression tree.
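
As a starting point, the greedy step at the heart of recursive binary splitting can be sketched in Python. This is only one piece of the full procedure: it omits the recursion into the two resulting regions, the stopping rule, and pruning. The helper names `rss` and `best_split`, and the choice of midpoints between consecutive observed values as candidate cutpoints, are assumptions made for illustration.

```python
def rss(values):
    """Residual sum of squares of the values around their own mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(X, y):
    """Find the single split (predictor j, cutpoint s) minimizing total RSS.

    X is a list of rows (one list of predictor values per observation) and
    y is the list of responses. Returns (j, s, total_rss), or None when no
    predictor has two distinct values to split between.
    """
    best = None
    for j in range(len(X[0])):
        distinct = sorted(set(row[j] for row in X))
        # Assumed candidate cutpoints: midpoints between consecutive values.
        for lo, hi in zip(distinct, distinct[1:]):
            s = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] < s]
            right = [yi for row, yi in zip(X, y) if row[j] >= s]
            total = rss(left) + rss(right)
            if best is None or total < best[2]:
                best = (j, s, total)
    return best

# Tiny hypothetical data set: the response jumps when the first predictor
# passes 2.5, while the second predictor is uninformative.
X = [[1, 7], [2, 3], [3, 8], [4, 2]]
y = [1, 1, 10, 10]
print(best_split(X, y))
```

Applying `best_split` again to the observations on each side of the chosen cutpoint, until every region holds few observations, would grow the large tree that is later pruned.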
