4.3.3 Making Predictions
Once the coefficients have been estimated, we can compute the probability of default for any given credit card balance. For example, using the coefficient estimates given in Table 4.1, we predict that the default probability
| | Coefficient | Std. error | z-statistic | p-value |
|---|---|---|---|---|
| Intercept | −10.6513 | 0.3612 | −29.5 | <0.0001 |
| balance | 0.0055 | 0.0002 | 24.9 | <0.0001 |

TABLE 4.1. For the Default data, estimated coefficients of the logistic regression model that predicts the probability of default using balance. A one-unit increase in balance is associated with an increase in the log odds of default by 0.0055 units.

| | Coefficient | Std. error | z-statistic | p-value |
|---|---|---|---|---|
| Intercept | −3.5041 | 0.0707 | −49.55 | <0.0001 |
| student[Yes] | 0.4049 | 0.1150 | 3.52 | 0.0004 |

TABLE 4.2. For the Default data, estimated coefficients of the logistic regression model that predicts the probability of default using student status. Student status is encoded as a dummy variable, with a value of 1 for a student and a value of 0 for a non-student, and represented by the variable student[Yes] in the table.
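for an individual with a balance of $1,000 is

$$
\hat{p}(X) = \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 X}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 X}} = \frac{e^{-10.6513 + 0.0055 \times 1{,}000}}{1 + e^{-10.6513 + 0.0055 \times 1{,}000}} = 0.00576,
$$

which is below 1%. In contrast, the predicted probability of default for an individual with a balance of $2,000 is much higher, and equals 0.586 or 58.6%.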
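
The two predictions above can be checked numerically. The following is a minimal Python sketch, not code from the text: it hard-codes the coefficient estimates from Table 4.1, and the names `BETA_0`, `BETA_1`, and `default_probability` are illustrative assumptions introduced here.

```python
import math

# Coefficient estimates copied from Table 4.1 (Default data, predictor: balance).
# The constant and function names are illustrative, not taken from the text.
BETA_0 = -10.6513  # intercept
BETA_1 = 0.0055    # change in log odds of default per one-dollar increase in balance


def default_probability(balance: float) -> float:
    """Return the fitted probability of default for a balance given in dollars."""
    log_odds = BETA_0 + BETA_1 * balance
    # Logistic function written as 1 / (1 + e^(-z)), which equals e^z / (1 + e^z).
    return 1.0 / (1.0 + math.exp(-log_odds))


for balance in (1000, 2000):
    print(f"balance = ${balance:,}: p = {default_probability(balance):.5f}")
```

With the rounded coefficients from the table, the results should agree with the values quoted in the text (about 0.00576 and 0.586). A model refit on the data may differ slightly in later decimal places because the table reports rounded estimates.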
One can use qualitative predictors with the logistic regression model using the dummy variable approach from Section 3.3.1. As an example, the Default data set contains the qualitative variable student. To fit a model that uses student status as a predictor variable, we simply create a dummy variable that takes on a value of 1 for students and 0 for non-students. The logistic regression model that results from predicting probability of default from student status can be seen in Table 4.2. The coefficient associated with the dummy variable is positive, and the associated p-value is statistically significant. This indicates that students tend to have higher default probabilities than non-students: