6.2.1 Ridge Regression
Recall from Chapter 3 that the least squares fitting procedure estimates $\beta_0, \beta_1, \ldots, \beta_p$ using the values that minimize
\[\sum_{i=1}^n \left( y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij} \right)^2 = \text{RSS}\]

Ridge regression is very similar to least squares, except that the coefficients are estimated by minimizing a slightly different quantity. In particular, the ridge regression coefficient estimates $\hat\beta^R$ are the values that minimize
\[\sum_{i=1}^n \left( y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^p \beta_j^2 = \text{RSS} + \lambda \sum_{j=1}^p \beta_j^2 \quad (6.5)\]

where $\lambda \ge 0$ is a *tuning parameter*, to be determined separately. Equation 6.5 trades off two different criteria. As with least squares, ridge regression seeks coefficient estimates that fit the data well, by making the RSS small. However, the second term, $\lambda \sum_{j} \beta_j^2$, called a *shrinkage penalty*, is small when $\beta_1, \ldots, \beta_p$ are close to zero, and so it has the effect of shrinking the estimates of $\beta_j$ towards zero. The tuning parameter $\lambda$ serves to control

FIGURE 6.4. The standardized ridge regression coefficients are displayed for the Credit data set, as a function of $\lambda$ and $\|\hat\beta_\lambda^R\|_2 / \|\hat\beta\|_2$.
the relative impact of these two terms on the regression coefficient estimates. When $\lambda = 0$, the penalty term has no effect, and ridge regression will produce the least squares estimates. However, as $\lambda \to \infty$, the impact of the shrinkage penalty grows, and the ridge regression coefficient estimates will approach zero. Unlike least squares, which generates only one set of coefficient estimates, ridge regression will produce a different set of coefficient estimates, $\hat\beta_\lambda^R$, for each value of $\lambda$. Selecting a good value for $\lambda$ is critical; we defer this discussion to Section 6.2.3, where we use cross-validation.
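These properties are easy to check numerically. Ridge regression has a well-known closed-form solution, $\hat\beta_\lambda^R = (X^\top X + \lambda I)^{-1} X^\top y$ (not derived in this section), which the sketch below applies to synthetic stand-in data, assuming the predictors and response have been centered so that no intercept is needed:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution (X^T X + lam*I)^{-1} X^T y.
    Assumes the columns of X and the response y are centered."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic data (a stand-in for a real data set).
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 3))
X -= X.mean(axis=0)                       # center the predictors
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(60)
y -= y.mean()                             # center the response

b_ls = ridge(X, y, 0.0)                   # lambda = 0: the least squares fit
# One distinct coefficient vector per lambda; their norms shrink toward zero.
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in (0.0, 1.0, 10.0, 100.0)]
```

With $\lambda = 0$ the estimates coincide with least squares, and the sequence of coefficient norms decreases as $\lambda$ grows, mirroring the behavior described above.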
Note that in (6.5), the shrinkage penalty is applied to $\beta_1, \ldots, \beta_p$, but not to the intercept $\beta_0$. We want to shrink the estimated association of each variable with the response; however, we do not want to shrink the intercept, which is simply a measure of the mean value of the response when $x_{i1} = x_{i2} = \cdots = x_{ip} = 0$. If we assume that the variables, that is, the columns of the data matrix $X$, have been centered to have mean zero before ridge regression is performed, then the estimated intercept will take the form $\hat\beta_0 = \bar{y} = \sum_{i=1}^n y_i / n$.
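This last fact can be verified directly. A minimal sketch, using hypothetical centered data: the penalty matrix below leaves the intercept unpenalized by zeroing its diagonal entry, and the resulting estimate of $\beta_0$ equals $\bar{y}$ exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 2))
X -= X.mean(axis=0)                          # center the predictors
y = 5.0 + X @ np.array([1.5, -0.7]) + 0.1 * rng.standard_normal(30)

lam = 7.0
Xa = np.column_stack([np.ones(len(y)), X])   # prepend an intercept column
P = lam * np.eye(3)
P[0, 0] = 0.0                                # do not penalize the intercept
beta_hat = np.linalg.solve(Xa.T @ Xa + P, Xa.T @ y)

# Because the columns of X are centered, the intercept estimate is y-bar.
assert np.isclose(beta_hat[0], y.mean())
```

The reason is that centering makes the intercept column orthogonal to the predictors, so the normal equations decouple and the intercept is fit by the response mean alone.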
Sub-Chapters
An Application to the Credit Data
When ridge regression is tuned on the bank credit data set, a coefficient plot shows how the variables' coefficient estimates decay smoothly as the tuning parameter $\lambda$ increases.
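The Credit data itself is not reproduced here, but curves like those in Figure 6.4 can be traced on any data set by standardizing each predictor, centering the response, and recomputing the ridge fit over a grid of $\lambda$ values. A sketch with synthetic stand-in data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 4
# Predictors on deliberately different scales, as in real credit data.
X = rng.standard_normal((n, p)) * np.array([1.0, 5.0, 0.5, 2.0])
y = X @ np.array([1.0, -0.2, 3.0, 0.0]) + rng.standard_normal(n)

# Standardize the predictors and center the response.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()

lambdas = np.logspace(-2, 4, 25)
path = np.array([np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)
                 for lam in lambdas])      # one row of coefficients per lambda
norms = np.linalg.norm(path, axis=1)       # shrinks steadily as lambda grows
```

Plotting each column of `path` against `lambdas` on a log scale reproduces the qualitative picture of Figure 6.4: every standardized coefficient decays smoothly toward zero as $\lambda$ increases.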