11.6 Shrinkage for the Cox Model
In this section, we illustrate that the shrinkage methods of Section 6.2 can be applied to the survival data setting. In particular, motivated by the “loss+penalty” formulation of Section 6.2, we consider minimizing a penalized version of the negative log partial likelihood in (11.16),
\[-\sum_{i : \delta_i = 1} \left( \sum_{j=1}^p x_{ij} \beta_j - \log \left( \sum_{i' : y_{i'} \ge y_i} \exp \left( \sum_{j=1}^p x_{i'j} \beta_j \right) \right) \right) + \lambda P(\beta) \quad (11.17)\]| with respect to β = ( β 1 , . . . , βp ) [T] . We might take P ( β ) =[�] [p] j =1 [β] j[2][,][which] corresponds to a ridge penalty, or P ( β ) =[�] [p] j =1 _[ | ][β][j][ | ]_[, which corresponds][to] a lasso penalty. |
In (11.17), λ is a non-negative tuning parameter; typically we will minimize it over a range of values of λ . When λ = 0, then minimizing (11.17) is equivalent to simply maximizing the usual Cox partial likelihood (11.16). However, when λ > 0, then minimizing (11.17) yields a shrunken version of the coefficient estimates. When λ is large, then using a ridge penalty will give small coefficients that are not exactly equal to zero. By contrast, for a
11.6 Shrinkage for the Cox Model 485

FIGURE 11.6. For the Publication data, we display survival curves for time until publication, stratified by whether or not the study produced a positive result, after adjusting for all other covariates.
sufficiently large value of λ , using a lasso penalty will give some coefficients that are exactly equal to zero.
We now apply the lasso-penalized Cox model to the Publication data, described in Section 11.5.4. We first randomly split the 244 trials into equallysized training and test sets. The cross-validation results from the training set are shown in Figure 11.7. The “partial likelihood deviance”, shown on the y -axis, is twice the cross-validated negative log partial likelihood; it plays the role of the cross-validation error.[15] Note the “U-shape” of the partial likelihood deviance: just as we saw in previous chapters, the crossvalidation error is minimized for an intermediate level of model complexity. Specifically, this occurs when just two predictors, budget and impact , have non-zero estimated coefficients.
| Now, how do we apply this model to the test set? This brings up an important conceptual point: in essence, there is no simple way to compare predicted survival times and true survival times on the test set. The first problem is that some of the observations are censored, and so the true survival times for those observations are unobserved. The second issue arises from the fact that in the Cox model, rather than predicting a single survival time given a covariate vector x , we instead estimate an entire survival curve, S ( _t | x_ ), as a function of t . |
Therefore, to assess the model fit, we must take a different approach, which involves stratifying the observations using the coefficient estimates. In particular, for each test observation, we compute the “risk” score
\[\hat{\eta}_i = \hat{\beta}_{\text{budget}} \cdot x_{i, \text{budget}} + \hat{\beta}_{\text{impact}} \cdot x_{i, \text{impact}}\]where β[ˆ] budget and β[ˆ] impact are the coefficient estimates for these two features from the training set. We then use these risk scores to categorize the observations based on their “risk”. For instance, the high risk group consists of the observations for which budget i · β[ˆ] budget + impact i · β[ˆ] impact is largest; by
15Cross-validation for the Cox model is more involved than for linear or logistic regression, because the objective function is not a sum over the observations.
486 11. Survival Analysis and Censored Data

FIGURE 11.7. For the Publication data described in Section 11.5.4, cross-validation results for the lasso-penalized Cox model are shown. The y-axis displays the partial likelihood deviance, which plays the role of the cross-validation error. The x-axis displays the ℓ 1 norm (that is, the sum of the absolute values) of the coefficients of the lasso-penalized Cox model with tuning parameter λ, divided by the ℓ 1 norm of the coefficients of the unpenalized Cox model. The dashed line indicates the minimum cross-validation error.
(11.14), we see that these are the observations for which the instantaneous probability of being published at any moment in time is largest. In other words, the high risk group consists of the trials that are likely to be published sooner. On the Publication data, we stratify the observations into tertiles of low, medium, and high risk. The resulting survival curves for each of the three strata are displayed in Figure 11.8. We see that there is clear separation between the three strata, and that the strata are correctly ordered in terms of low, medium, and high risk of publication.