8.3.5 Bayesian Additive Regression Trees

In this section we demonstrate a Python implementation of BART found in the ISLP.bart package. We fit a model to the Boston housing data set. The BART() estimator is designed for quantitative outcome variables, though other implementations of BART are available for fitting logistic and probit models to categorical outcomes.
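The fit below assumes the training and test arrays X_train, y_train, X_test, and y_test constructed earlier in the chapter. As a minimal sketch of how such a split can be built, here is a hedged example using sklearn's train_test_split; the DataFrame here is synthetic stand-in data, not the real Boston housing set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Boston design matrix and response;
# the real chapter builds these from the Boston housing data.
rng = np.random.default_rng(0)
D = pd.DataFrame(rng.standard_normal((100, 3)),
                 columns=['crim', 'rm', 'lstat'])
y = rng.standard_normal(100)

# 70/30 train/test split with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    D, y, test_size=0.3, random_state=0)
print(X_train.shape, X_test.shape)  # (70, 3) (30, 3)
```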

In [33]: bart_boston = BART(random_state=0, burnin=5, ndraw=15)
bart_boston.fit(X_train, y_train)
Out[33]: BART(burnin=5, ndraw=15, random_state=0)

On this data set, with this split into test and training, we see that the test error of BART is similar to that of random forest.

In [34]: yhat_test = bart_boston.predict(X_test.astype(np.float32))
np.mean((y_test - yhat_test)**2)
Out[34]: 20.92
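The test-error line above is ordinary mean squared error, which could equivalently be computed with sklearn's mean_squared_error. A small self-contained check, using made-up vectors in place of y_test and yhat_test:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up response and prediction vectors for illustration only
y_test = np.array([24.0, 21.6, 34.7])
yhat_test = np.array([25.0, 20.0, 30.0])

# The expression from In [34] and sklearn's helper agree
mse = np.mean((y_test - yhat_test)**2)
assert np.isclose(mse, mean_squared_error(y_test, yhat_test))
print(round(mse, 2))  # 8.55
```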

We can check how many times each variable appeared in the collection of trees. This gives a summary similar to the variable importance plot for boosting and random forests.


In [35]: var_inclusion = pd.Series(bart_boston.variable_inclusion_.mean(0),
index=D.columns)
var_inclusion
Out[35]: crim 25.333333
  zn 27.000000
  indus 21.266667
  chas 20.466667
  nox 25.400000
  rm 32.400000
  age 26.133333
  dis 25.666667
  rad 24.666667
  tax 23.933333
  ptratio 25.000000
  lstat 31.866667
  dtype: float64
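The .mean(0) call averages over posterior draws. A hedged sketch of what it computes, assuming variable_inclusion_ is an (ndraw, p) array whose entries count how often each feature appears across the trees in a given draw; the counts and feature names below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical per-draw inclusion counts: 3 draws, 3 features.
# Each row is one posterior draw; each column is one feature.
counts = np.array([[3, 1, 2],
                   [2, 2, 2],
                   [4, 0, 2]])
cols = ['crim', 'rm', 'lstat']  # hypothetical feature names

# Averaging over axis 0 (the draws) mirrors variable_inclusion_.mean(0)
var_inclusion = pd.Series(counts.mean(0), index=cols)
print(var_inclusion)
# crim     3.0
# rm       1.0
# lstat    2.0
```

Sorting such a Series in descending order then surfaces the variables used most often, analogous to ranking features by importance in a random forest.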