4.3.3 Making Predictions

Once the coefficients have been estimated, we can compute the probability of default for any given credit card balance. For example, using the coefficient estimates given in Table 4.1, we predict that the default probability

  1. Classification

142

4. Classifcation 4. Classifcation
   
  Coefcient
Std. error
z-statistic
p-value
Intercept
balance
−_10.6513
0.3612
29.5
<0.0001
0.0055
0.0002
24.9
<_0.0001

TABLE 4.1. For the Default data, estimated coefficients of the logistic regression model that predicts the probability of default using balance . A one-unit increase in balance is associated with an increase in the log odds of default by 0 . 0055 units.

ase in balance is
5 units.
associated wi th an increase in the log od ds of defau
  Coefcient Std. error z-statistic p-value
Intercept _−_3.5041 0.0707 _−_49.55 _<_0.0001
student[Yes] 0.4049 0.1150 3.52 0.0004

TABLE 4.2. For the Default data, estimated coefficients of the logistic regression model that predicts the probability of default using student status. Student status is encoded as a dummy variable, with a value of 1 for a student and a value of 0 for a non-student, and represented by the variable student[Yes] in the table.

for an individual with a balance of $1,000 is

\[\hat{p}(X) = rac{e^{-10.6513 + 0.0055 imes 1000}}{1 + e^{-10.6513 + 0.0055 imes 1000}} = 0.00576\]

which is below 1 %. In contrast, the predicted probability of default for an individual with a balance of $2 , 000 is much higher, and equals 0 . 586 or 58 . 6 %.

One can use qualitative predictors with the logistic regression model using the dummy variable approach from Section 3.3.1. As an example, the Default data set contains the qualitative variable student . To fit a model that uses student status as a predictor variable, we simply create a dummy variable that takes on a value of 1 for students and 0 for non-students. The logistic regression model that results from predicting probability of default from student status can be seen in Table 4.2. The coefficient associated with the dummy variable is positive, and the associated p -value is statistically significant. This indicates that students tend to have higher default probabilities than non-students:

\[egin{align*} \hat{ ext{Pr}}( ext{default} = ext{Yes} \mid ext{student} = ext{Yes}) &= rac{e^{-3.5041 + 0.4049 imes 1}}{1 + e^{-3.5041 + 0.4049 imes 1}} = 0.0431 \ \hat{ ext{Pr}}( ext{default} = ext{Yes} \mid ext{student} = ext{No}) &= rac{e^{-3.5041 + 0.4049 imes 0}}{1 + e^{-3.5041 + 0.4049 imes 0}} = 0.0292 \end{align*}\]
서브목차