7.1 Polynomial Regression

Historically, the standard way to extend linear regression to settings in which the relationship between the predictors and the response is nonlinear has been to replace the standard linear model

\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i\]

with a polynomial function

\[y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \dots + \beta_d x_i^d + \epsilon_i \quad (7.1)\]

where \(\epsilon_i\) is the error term. This approach is known as polynomial regression, and in fact we saw an example of this method in Section 3.3.2. For large enough degree \(d\), a polynomial regression allows us to produce an extremely non-linear curve. Notice that the coefficients in (7.1) can be easily estimated using least squares linear regression because this is just a standard linear model with predictors \(x_i, x_i^2, x_i^3, \dots, x_i^d\). Generally speaking, it is unusual to use \(d\) greater than 3 or 4 because for large values of \(d\), the polynomial curve can become overly flexible and can take on some very strange shapes. This is especially true near the boundary of the \(X\) variable.
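Because (7.1) is linear in the coefficients, it can be fit with ordinary least squares. The sketch below does so on synthetic data (the Wage data set itself is not loaded here; variable names and the data-generating process are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
age = rng.uniform(18, 80, size=300)          # predictor, e.g. age
# Synthetic response standing in for wage
wage = 50 + 2.0 * age - 0.02 * age**2 + rng.normal(0, 5, size=300)

d = 4
# Design matrix with columns 1, x, x^2, ..., x^d: still linear in the
# coefficients, so ordinary least squares applies directly
X = np.vander(age, N=d + 1, increasing=True)
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)

# Fitted curve over a grid of 63 ages from 18 to 80
grid = np.linspace(18, 80, 63)
fit = np.vander(grid, N=d + 1, increasing=True) @ beta_hat
```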

The left-hand panel in Figure 7.1 is a plot of wage against age for the Wage data set, which contains income and demographic information for males who reside in the central Atlantic region of the United States. We see the results of fitting a degree-4 polynomial using least squares (solid blue curve). Even though this is a linear regression model like any other, the individual coefficients are not of particular interest. Instead, we look at the entire fitted function across a grid of 63 values for age from 18 to 80 in order to understand the relationship between age and wage .



FIGURE 7.1. The Wage data. Left: The solid blue curve is a degree-4 polynomial of wage (in thousands of dollars) as a function of age, fit by least squares. The dashed curves indicate an estimated 95% confidence interval. Right: We model the binary event wage>250 using logistic regression, again with a degree-4 polynomial. The fitted posterior probability of wage exceeding $250,000 is shown in blue, along with an estimated 95% confidence interval.

In Figure 7.1, a pair of dashed curves accompanies the fit; these are (2×) standard-error curves. Let's see how these arise. Suppose we have computed the fit at a particular value of age, \(x_0\):

\[\hat{f}(x_0) = \hat{\beta}_0 + \hat{\beta}_1 x_0 + \hat{\beta}_2 x_0^2 + \hat{\beta}_3 x_0^3 + \hat{\beta}_4 x_0^4 \quad (7.2)\]

What is the variance of the fit, i.e. \(\mathrm{Var}\,\hat{f}(x_0)\)? Least squares returns variance estimates for each of the fitted coefficients \(\hat{\beta}_j\), as well as the covariances between pairs of coefficient estimates. We can use these to compute the estimated variance of \(\hat{f}(x_0)\).¹ The estimated pointwise standard error of \(\hat{f}(x_0)\) is the square root of this variance. This computation is repeated at each reference point \(x_0\), and we plot the fitted curve, as well as twice the standard error on either side of the fitted curve. We plot twice the standard error because, for normally distributed error terms, this quantity corresponds to an approximate 95% confidence interval.
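The standard-error computation can be sketched as follows, again on synthetic data with illustrative names; the predictor is standardized so that the polynomial design matrix stays well conditioned:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 4
age = rng.uniform(18, 80, size=n)
wage = 100 + 1.5 * age - 0.01 * age**2 + rng.normal(0, 10, size=n)

# Standardize the predictor so powers up to z^4 are well scaled
z = (age - age.mean()) / age.std()
X = np.vander(z, N=d + 1, increasing=True)
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)

# Estimate sigma^2 from the residuals; C_hat = sigma^2 (X^T X)^{-1}
resid = wage - X @ beta_hat
sigma2_hat = resid @ resid / (n - d - 1)
C_hat = sigma2_hat * np.linalg.inv(X.T @ X)

# Var[f_hat(x0)] = l0^T C_hat l0, with l0 = (1, z0, z0^2, z0^3, z0^4)
z0 = (50.0 - age.mean()) / age.std()        # reference age x0 = 50
l0 = z0 ** np.arange(d + 1)
f0 = l0 @ beta_hat                          # fitted value at x0
se0 = np.sqrt(l0 @ C_hat @ l0)              # pointwise standard error
lower, upper = f0 - 2 * se0, f0 + 2 * se0   # approximate 95% interval
```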

¹If \(\hat{C}\) is the \(5 \times 5\) covariance matrix of the \(\hat{\beta}_j\), and if \(\ell_0^T = (1, x_0, x_0^2, x_0^3, x_0^4)\), then \(\mathrm{Var}[\hat{f}(x_0)] = \ell_0^T \hat{C} \ell_0\).

It seems like the wages in Figure 7.1 are from two distinct populations: there appears to be a high earners group earning more than $250,000 per annum, as well as a low earners group. We can treat wage as a binary variable by splitting it into these two groups. Logistic regression can then be used to predict this binary response, using polynomial functions of age as predictors. In other words, we fit the model

\[\Pr(y_i > 250 \mid x_i) = \frac{\exp(\beta_0 + \beta_1 x_i + \dots + \beta_d x_i^d)}{1 + \exp(\beta_0 + \beta_1 x_i + \dots + \beta_d x_i^d)} \quad (7.3)\]

The result is shown in the right-hand panel of Figure 7.1. The gray marks on the top and bottom of the panel indicate the ages of the high earners and the low earners. The solid blue curve indicates the fitted probability of being a high earner, as a function of age. The estimated 95% confidence interval is shown as well. We see that here the confidence intervals are fairly wide, especially on the right-hand side. Although the sample size for this data set is substantial (n = 3,000), there are only 79 high earners, which results in a high variance in the estimated coefficients and consequently wide confidence intervals.
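The fit in (7.3) can be sketched with NumPy alone, using Newton's method (iteratively reweighted least squares) to maximize the logistic likelihood; a synthetic indicator stands in for wage>250 and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 1000, 4
age = rng.uniform(18, 80, size=n)
z = (age - age.mean()) / age.std()          # standardize for stability
# Synthetic "high earner" indicator standing in for wage>250
p_true = 1 / (1 + np.exp(-(-3.0 + z - 0.5 * z**2)))
y = rng.binomial(1, p_true)

# Degree-4 polynomial design, as in (7.3)
X = np.vander(z, N=d + 1, increasing=True)
beta = np.zeros(d + 1)
for _ in range(25):                         # Newton / IRLS iterations
    p = 1 / (1 + np.exp(-X @ beta))         # current fitted probabilities
    W = p * (1 - p)                         # logistic weights
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))

# Fitted probabilities over a grid of 63 ages from 18 to 80
grid = (np.linspace(18, 80, 63) - age.mean()) / age.std()
p_hat = 1 / (1 + np.exp(-np.vander(grid, N=d + 1, increasing=True) @ beta))
```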
