7.2 Step Functions

Using polynomial functions of the features as predictors in a linear model imposes a global structure on the non-linear function of \(X\). We can instead use step functions in order to avoid imposing such a global structure. Here we break the range of \(X\) into bins, and fit a different constant in each bin. This amounts to converting a continuous variable into an ordered categorical variable.

In greater detail, we create cutpoints \(c_1, c_2, \dots, c_K\) in the range of \(X\), and then construct \(K + 1\) new variables

\[\begin{align*} C_0(X) &= I(X < c_1), \\ C_1(X) &= I(c_1 \le X < c_2), \\ C_2(X) &= I(c_2 \le X < c_3), \\ &\vdots \\ C_K(X) &= I(c_K \le X). \end{align*}\]

where \(I(\cdot)\) is an indicator function that returns a 1 if the condition is true, and returns a 0 otherwise. For example, \(I(c_K \le X)\) equals 1 if \(c_K \le X\), and equals 0 otherwise. These are sometimes called dummy variables. Notice that for any value of \(X\), \(C_0(X) + C_1(X) + \dots + C_K(X) = 1\), since \(X\) must be in exactly one of the \(K + 1\) intervals. We then use least squares to fit a linear model using \(C_1(X), C_2(X), \dots, C_K(X)\) as predictors[2]:

\[y_i = \beta_0 + \beta_1 C_1(x_i) + \beta_2 C_2(x_i) + \dots + \beta_K C_K(x_i) + \epsilon_i \quad (7.5)\]
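As a minimal sketch of how the indicators above and the fit in (7.5) might be computed, the following uses NumPy on synthetic data. The helper name `step_basis`, the cutpoints, and the simulated data are illustrative assumptions and do not come from the text.

```python
import numpy as np

def step_basis(x, cutpoints):
    """Return the n x (K+1) matrix whose columns are C_0(x), ..., C_K(x).

    `step_basis` is a hypothetical helper written for this illustration.
    """
    x = np.asarray(x, dtype=float)
    c = np.asarray(cutpoints, dtype=float)
    # With right=False, np.digitize returns k such that c[k-1] <= x < c[k]:
    # 0 when x < c_1 and K when x >= c_K, matching the intervals above.
    bin_index = np.digitize(x, c, right=False)
    # Row k of the identity matrix is the one-hot encoding of bin k.
    return np.eye(len(c) + 1)[bin_index]

# Synthetic data (an assumption made only for this example).
rng = np.random.default_rng(0)
x = rng.uniform(18, 80, size=200)
y = 50 + 10 * (x > 35) - 5 * (x > 65) + rng.normal(0, 1, size=200)

cutpoints = [35.0, 50.0, 65.0]  # K = 3, chosen arbitrarily
C = step_basis(x, cutpoints)

# Each observation falls in exactly one bin, so every row sums to 1.
row_sums = C.sum(axis=1)

# Design matrix for (7.5): an intercept column plus C_1, ..., C_K (C_0 dropped).
design = np.column_stack([np.ones_like(x), C[:, 1:]])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
```

Here `beta[0]` plays the role of \(\beta_0\) and `beta[1:]` holds \(\beta_1, \dots, \beta_K\). Building all \(K + 1\) columns first and then dropping \(C_0\) keeps the correspondence with the definitions explicit.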

For a given value of \(X\), at most one of \(C_1, C_2, \dots, C_K\) can be non-zero. Note that when \(X < c_1\), all of the predictors in (7.5) are zero, so \(\beta_0\) can

[2] We exclude \(C_0(X)\) as a predictor in (7.5) because it is redundant with the intercept. This is similar to the fact that we need only two dummy variables to code a qualitative variable with three levels, provided that the model will contain an intercept. The decision to exclude \(C_0(X)\) instead of some other \(C_k(X)\) in (7.5) is arbitrary. Alternatively, we could include \(C_0(X), C_1(X), \dots, C_K(X)\), and exclude the intercept.
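The redundancy described in the footnote can be checked numerically. The sketch below, which assumes synthetic data and arbitrarily chosen cutpoints, compares the two codings (intercept plus \(C_1, \dots, C_K\), versus all of \(C_0, \dots, C_K\) with no intercept) and confirms that adding the intercept to the full set of indicators does not increase the rank of the design matrix.

```python
import numpy as np

# Synthetic data and cutpoints (assumptions made only for this example).
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=150)
y = np.sin(x) + rng.normal(0, 0.1, size=150)
cutpoints = np.array([2.5, 5.0, 7.5])  # K = 3

# One-hot matrix with columns C_0, ..., C_K.
C = np.eye(len(cutpoints) + 1)[np.digitize(x, cutpoints)]
ones = np.ones_like(x)

# Coding A: intercept plus C_1..C_K. Coding B: C_0..C_K, no intercept.
design_a = np.column_stack([ones, C[:, 1:]])
design_b = C

beta, *_ = np.linalg.lstsq(design_a, y, rcond=None)
gamma, *_ = np.linalg.lstsq(design_b, y, rcond=None)

# Both codings span the same column space, so the fitted values agree.
fitted_a = design_a @ beta
fitted_b = design_b @ gamma

# Intercept plus all K+1 indicators: K+2 columns but only rank K+1,
# because the indicator columns already sum to the intercept column.
rank_full = np.linalg.matrix_rank(np.column_stack([ones, C]))
```

Under coding B each coefficient is the fitted level in its own bin, whereas under coding A the coefficients \(\beta_1, \dots, \beta_K\) are differences from the level in the first bin, so `gamma[1:]` matches `beta[0] + beta[1:]`.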


Sub-Chapters

Piecewise Constant

Within each bin, the model predicts Y with a single constant, the mean level of the response in that bin, and imposes no further structure. The aim is to understand and interpret this mechanism numerically.
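A small numerical sketch of this piecewise-constant mechanism follows. It computes the mean of the response in each bin directly, which for non-empty bins coincides with the least squares fit on the bin indicators. The function names `fit_piecewise_constant` and `predict_piecewise_constant`, the toy data, and the cutpoints are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_piecewise_constant(x, y, cutpoints):
    """Return the mean of y in each of the K+1 bins defined by the cutpoints."""
    c = np.asarray(cutpoints, dtype=float)
    idx = np.digitize(x, c)  # bin k means c_k <= x < c_{k+1}
    sums = np.bincount(idx, weights=y, minlength=len(c) + 1)
    counts = np.bincount(idx, minlength=len(c) + 1)
    # Empty bins get NaN, since no constant can be estimated there.
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

def predict_piecewise_constant(x_new, cutpoints, bin_means):
    """Predict by looking up the fitted constant of the bin containing x_new."""
    c = np.asarray(cutpoints, dtype=float)
    return bin_means[np.digitize(x_new, c)]

# Toy data (an assumption made only for this example).
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 11.0])
y = np.array([1.0, 3.0, 2.0, 10.0, 12.0, 20.0])
cutpoints = [5.0, 10.0]

means = fit_piecewise_constant(x, y, cutpoints)  # [2.0, 11.0, 20.0]
pred = predict_piecewise_constant(
    np.array([0.0, 5.0, 9.9, 10.0]), cutpoints, means
)  # [2.0, 11.0, 11.0, 20.0]
```

The prediction changes only when a new point crosses a cutpoint: 5.0 and 9.9 fall in the same bin and receive the same value, while 10.0 lands in the last bin.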
