Piecewise Constant

Figure 7.2

FIGURE 7.2. The Wage data. Left: The solid curve displays the fitted value from a least squares regression of wage (in thousands of dollars) using step functions of age . The dashed curves indicate an estimated 95 % confidence interval. Right: We model the binary event wage>250 using logistic regression, again using step functions of age . The fitted posterior probability of wage exceeding $250 , 000 is shown, along with an estimated 95 % confidence interval.

be interpreted as the mean value of Y for X < c 1. By comparison, (7.5) predicts a response of β 0+ βj for cj ≤ X < cj +1, so βj represents the average increase in the response for X in cj ≤ X < cj +1 relative to X < c 1.

An example of fitting step functions to the Wage data from Figure 7.1 is shown in the left-hand panel of Figure 7.2. We also fit the logistic regression model

\[\Pr(y_i > 250 \mid x_i) = \frac{\exp(\beta_0 + \beta_1 C_1(x_i) + \dots + \beta_K C_K(x_i))}{1 + \exp(\beta_0 + \beta_1 C_1(x_i) + \dots + \beta_K C_K(x_i))} \quad (7.6)\]

in order to predict the probability that an individual is a high earner on the basis of age . The right-hand panel of Figure 7.2 displays the fitted posterior probabilities obtained using this approach.

Unfortunately, unless there are natural breakpoints in the predictors, piecewise-constant functions can miss the action. For example, in the lefthand panel of Figure 7.2, the first bin clearly misses the increasing trend of wage with age . Nevertheless, step function approaches are very popular in biostatistics and epidemiology, among other disciplines. For example, 5-year age groups are often used to define the bins.

서브목차