7.6 Local Regression
Local regression is a different approach for fitting flexible non-linear functions, which involves computing the fit at a target point x0 using only the nearby training observations. Figure 7.9 illustrates the idea on some simulated data, with one target point near 0.4, and another near the boundary at 0.05. In this figure the blue line represents the function f(x) from which the data were generated, and the light orange line corresponds to the local regression estimate fˆ(x). Local regression is described in Algorithm 7.1. Note that in Step 3 of Algorithm 7.1, the weights Ki0 will differ for each value of x0. In other words, in order to obtain the local regression fit at a new point, we need to fit a new weighted least squares regression model by minimizing (7.14) for a new set of weights. Local regression is sometimes referred to as a memory-based procedure because, like nearest-neighbors, we need all the training data each time we wish to compute a prediction. We will avoid getting into the technical details of local regression here; entire books have been written on the topic.
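The steps of Algorithm 7.1 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the book's own code: it uses the tricube weight function (one common choice for K) and fits a weighted linear regression, i.e. it minimizes (7.14) at the target point.

```python
import numpy as np

def local_regression(x0, x, y, s=0.5):
    """Compute the local regression fit at x0 (a sketch of Algorithm 7.1).

    s is the span: the fraction of training points used as neighbors of x0.
    """
    n = len(x)
    k = max(3, int(np.ceil(s * n)))           # Step 1: number of neighbors
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]                   # the k points closest to x0
    h = d[idx].max()                          # neighborhood radius
    w = (1 - (d[idx] / h) ** 3) ** 3          # Step 2: tricube weights K_i0
    X = np.column_stack([np.ones(k), x[idx]])
    W = np.diag(w)
    # Step 3: weighted least squares fit of beta0 + beta1 * x, as in (7.14)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
    # Step 4: the fitted value at x0 is the estimate of f(x0)
    return beta[0] + beta[1] * x0
```

Because the weights Ki0 change with x0, a fresh weighted fit is required for every target point; evaluating the whole curve on a grid means calling this routine once per grid value, which is what makes the procedure memory-based.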
FIGURE 7.9. Local regression illustrated on some simulated data, where the blue curve represents f(x) from which the data were generated, and the light orange curve corresponds to the local regression estimate fˆ(x). The orange colored points are local to the target point x0, represented by the orange vertical line. The yellow bell-shape superimposed on the plot indicates the weights assigned to each point, decreasing to zero with distance from the target point. The fit fˆ(x0) at x0 is obtained by fitting a weighted linear regression (orange line segment), and using the fitted value at x0 (orange solid dot) as the estimate fˆ(x0).

In order to perform local regression, there are a number of choices to be made, such as how to define the weighting function K, and whether to fit a linear, constant, or quadratic regression in Step 3. (Equation 7.14 corresponds to a linear regression.) While all of these choices make some difference, the most important choice is the span s, which is the proportion of points used to compute the local regression at x0, as defined in Step 1 above. The span plays a role like that of the tuning parameter λ in smoothing splines: it controls the flexibility of the non-linear fit. The smaller the value of s, the more local and wiggly our fit will be; alternatively, a very large value of s will lead to a global fit to the data using all of the training observations. We can again use cross-validation to choose s, or we can specify it directly. Figure 7.10 displays local linear regression fits on the Wage data, using two values of s: 0.7 and 0.2. As expected, the fit obtained using s = 0.7 is smoother than that obtained using s = 0.2.
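Since the span behaves like a tuning parameter, choosing it by cross-validation is straightforward to sketch. The code below is a hypothetical illustration on simulated data (the helper `loess_at` is a tricube-weighted linear fit, not library code): it scores each candidate span by leave-one-out cross-validation and keeps the one with the smallest error.

```python
import numpy as np

def loess_at(x0, x, y, s):
    # tricube-weighted linear least squares fit at x0 (Step 3 of Algorithm 7.1)
    k = max(3, int(np.ceil(s * len(x))))
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]
    w = (1 - (d[idx] / d[idx].max()) ** 3) ** 3
    X = np.column_stack([np.ones(k), x[idx]])
    beta = np.linalg.solve(X.T * w @ X, X.T * w @ y[idx])
    return beta[0] + beta[1] * x0

def loocv_error(x, y, s):
    # predict each observation from the other n - 1, average the squared errors
    n = len(x)
    errs = [(y[i] - loess_at(x[i], np.delete(x, i), np.delete(y, i), s)) ** 2
            for i in range(n)]
    return float(np.mean(errs))

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 80))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 80)
best_s = min([0.2, 0.5, 0.7], key=lambda s: loocv_error(x, y, s))
```

A small span will track the wiggles of the true curve at the cost of variance, while a large span smooths them away; the cross-validated error makes that trade-off explicit.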
The idea of local regression can be generalized in many different ways. In a setting with multiple features X1, X2, . . . , Xp, one very useful generalization involves fitting a multiple linear regression model that is global in some variables, but local in another, such as time. Such varying coefficient models are a useful way of adapting a model to the most recently gathered data. Local regression also generalizes very naturally when we want to fit models that are local in a pair of variables X1 and X2, rather than one. We can simply use two-dimensional neighborhoods, and fit bivariate linear regression models using the observations that are near each target point in two-dimensional space. Theoretically the same approach can be implemented in higher dimensions, using linear regressions fit to p-dimensional neighborhoods. However, local regression can perform poorly if p is much larger than about 3 or 4 because there will generally be very few training observations close to x0. Nearest-neighbors regression, discussed in Chapter 3, suffers from a similar problem in high dimensions.
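To make the two-variable case concrete, here is a hypothetical NumPy sketch (not from the book) that forms a two-dimensional neighborhood using Euclidean distance and fits a tricube-weighted plane; with noise-free planar data it recovers the plane exactly.

```python
import numpy as np

def local_regression_2d(z0, Z, y, s=0.3):
    """Local linear fit at a 2-D target point z0, using a fraction s of the data."""
    k = max(4, int(np.ceil(s * len(Z))))
    d = np.linalg.norm(Z - z0, axis=1)            # Euclidean distance in (X1, X2)
    idx = np.argsort(d)[:k]                       # the 2-D neighborhood of z0
    w = (1 - (d[idx] / d[idx].max()) ** 3) ** 3   # tricube weights
    X = np.column_stack([np.ones(k), Z[idx]])     # columns: intercept, X1, X2
    beta = np.linalg.solve(X.T * w @ X, X.T * w @ y[idx])
    return float(beta @ np.append(1.0, z0))       # fitted plane evaluated at z0
```

The same pattern extends mechanically to p-dimensional neighborhoods, but as the text notes, with p much beyond 3 or 4 the neighborhoods become so sparse that the local fits are unreliable.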
Sub-Chapters
- Algorithm 7.1: Local Regression at X = x0 (an algorithm for setting the local neighborhood radius and the weights K)
- Local linear regression in practice: gather a fraction of the observations nearest the target point, assign each a distance-based weight (for example, a bell-shaped kernel), fit a weighted least squares regression within the neighborhood, and tune the fit to reduce residual error.