7.5.2 Choosing the Smoothing Parameter λ
We have seen that a smoothing spline is simply a natural cubic spline with knots at every unique value of xi. It might seem that a smoothing spline will have far too many degrees of freedom, since a knot at each data point allows a great deal of flexibility. But the tuning parameter λ controls the roughness of the smoothing spline, and hence its effective degrees of freedom. It is possible to show that as λ increases from 0 to ∞, the effective degrees of freedom, which we write dfλ, decrease from n to 2.
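As a reminder, the smoothing spline is the function g that minimizes the penalized residual sum of squares in (7.11),

\[\sum_{i=1}^n (y_i - g(x_i))^2 + \lambda \int g''(t)^2 \, dt.\]

When λ = 0 the penalty has no effect and g can interpolate the training data, while as λ → ∞ the penalty forces g'' toward zero everywhere, so g approaches the least squares line; this is the source of the two limits n and 2.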
In the context of smoothing splines, why do we discuss effective degrees of freedom instead of degrees of freedom? Usually degrees of freedom refer to the number of free parameters, such as the number of coefficients fit in a polynomial or cubic spline. Although a smoothing spline has n parameters and hence n nominal degrees of freedom, these n parameters are heavily constrained or shrunk down. Hence dfλ is a measure of the flexibility of the smoothing spline—the higher it is, the more flexible (and the lower-bias but higher-variance) the smoothing spline. The definition of effective degrees of freedom is somewhat technical. We can write
\[\hat{\mathbf{g}}_\lambda = \mathbf{S}_\lambda \mathbf{y} \quad (7.12)\]where ĝλ is the solution to (7.11) for a particular choice of λ—that is, it is an n-vector containing the fitted values of the smoothing spline at the training points x1, . . . , xn. Equation (7.12) indicates that the vector of fitted values when applying a smoothing spline to the data can be written as an n × n matrix Sλ (for which there is a formula) times the response vector y. Then the effective degrees of freedom is defined to be
\[df_\lambda = \text{tr}(\mathbf{S}_\lambda) = \sum_{i=1}^n \{\mathbf{S}_\lambda\}_{ii}, \quad (7.13)\]the sum of the diagonal elements of the matrix Sλ.
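To make the definition concrete, here is a minimal numerical sketch (not from the text): a discrete second-difference penalty matrix D serves as a stand-in for the smoothing-spline roughness penalty, giving the linear smoother Sλ = (I + λDᵀD)⁻¹. Its trace equals n at λ = 0 and approaches 2 as λ → ∞, mirroring the limits described above.

```python
import numpy as np

n = 20
# Second-difference penalty matrix D (rows of [1, -2, 1]); its null
# space is the set of discretized linear functions, which is
# two-dimensional
D = np.diff(np.eye(n), n=2, axis=0)

def eff_df(lam):
    # Linear smoother S_lambda = (I + lam * D'D)^{-1}; the effective
    # degrees of freedom is the trace of S_lambda
    S = np.linalg.inv(np.eye(n) + lam * D.T @ D)
    return np.trace(S)

print(eff_df(0.0))   # exactly n = 20: no penalty, interpolation
print(eff_df(1e8))   # approximately 2: heavy penalty, essentially a line
```

Because the two-dimensional null space of DᵀD (linear trends) is never penalized, two effective degrees of freedom always survive, no matter how large λ becomes.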
In fitting a smoothing spline, we do not need to select the number or location of the knots—there will be a knot at each training observation, x1, . . . , xn. Instead, we have another problem: we need to choose the value of λ. It should come as no surprise that one possible solution to this problem is cross-validation. In other words, we can find the value of λ that makes the cross-validated RSS as small as possible. It turns out that the leave-one-out cross-validation error (LOOCV) can be computed very efficiently for smoothing splines, with essentially the same cost as computing a single fit, using the following formula:
\[\text{RSS}_{cv}(\lambda) = \sum_{i=1}^n \left( y_i - \hat{g}_\lambda^{(-i)}(x_i) \right)^2 = \sum_{i=1}^n \left[ \frac{y_i - \hat{g}_\lambda(x_i)}{1 - \{\mathbf{S}_\lambda\}_{ii}} \right]^2 \quad (7.14)\]The notation ĝλ^(−i)(xi) indicates the fitted value for this smoothing spline evaluated at xi, where the fit uses all of the training observations except for the ith observation (xi, yi). In contrast, ĝλ(xi) indicates the smoothing spline function fit to all of the training observations and evaluated at xi. This remarkable formula says that we can compute each of these leave-one-out fits using only ĝλ, the original fit to all of the data!5 We have a very similar formula (5.2) on page 205 in Chapter 5 for least squares linear regression. Using (5.2), we can very quickly perform LOOCV for the regression splines discussed earlier in this chapter, as well as for least squares regression using arbitrary basis functions.
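The identity behind this shortcut holds for any fit of the form ĝ = Sy. The following sketch (illustrative, not from the text) checks it for an ordinary least squares polynomial fit, where S is the usual hat matrix: the n leave-one-out residuals obtained by brute-force refitting agree with those computed from the single full fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = np.sort(rng.uniform(0, 10, n))
y = np.sin(x) + rng.normal(0, 0.3, n)

# Cubic-polynomial design matrix: a simple linear smoother whose
# fitted values are y_hat = S y with S = X (X'X)^{-1} X'
X = np.vander(x, 4)
S = X @ np.linalg.solve(X.T @ X, X.T)

# Shortcut: all n leave-one-out residuals from the single full fit
loo_shortcut = (y - S @ y) / (1 - np.diag(S))

# Brute force: refit n times, dropping one observation each time
loo_direct = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    beta = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    loo_direct[i] = y[i] - X[i] @ beta

print(np.allclose(loo_shortcut, loo_direct))  # True
```

The brute-force loop costs n refits, whereas the shortcut needs only the diagonal of S from the single full fit.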
Figure 7.8 shows the results from fitting a smoothing spline to the Wage data. The red curve indicates the fit obtained from pre-specifying that we would like a smoothing spline with 16 effective degrees of freedom. The blue curve is the smoothing spline obtained when λ is chosen using LOOCV; in this case, the value of λ chosen results in 6.8 effective degrees of freedom (computed using (7.13)). For this data, there is little discernible difference between the two smoothing splines, beyond the fact that the one with 16 degrees of freedom seems slightly wigglier. Since there is little difference between the two fits, the smoothing spline fit with 6.8 degrees of freedom is preferable, since in general simpler models are better unless the data provides evidence in support of a more complex model.
5 The exact formulas for computing ĝ(xi) and Sλ are very technical; however, efficient algorithms are available for computing these quantities.