Conceptual

  1. For each example, state whether or not the censoring mechanism is independent. Justify your answer.

    • (a) In a study of disease relapse, due to a careless research scientist, all patients whose phone numbers begin with the number “2” are lost to follow up.

    • (b) In a study of longevity, a formatting error causes all patient ages that exceed 99 years to be lost (i.e. we know that those patients are more than 99 years old, but we do not know their exact ages).

    • (c) Hospital A conducts a study of longevity. However, very sick patients tend to be transferred to Hospital B, and are lost to follow up.

    • (d) In a study of unemployment duration, the people who find work earlier are less motivated to stay in touch with study investigators, and therefore are more likely to be lost to follow up.

    • (e) In a study of pregnancy duration, women who deliver their babies pre-term are more likely to do so away from their usual hospital, and thus are more likely to be censored, relative to women who deliver full-term babies.

11.9 Exercises 499

  • (f) A researcher wishes to model the number of years of education of the residents of a small town. Residents who enroll in college out of town are more likely to be lost to follow up, and are also more likely to attend graduate school, relative to those who attend college in town.

  • (g) Researchers conduct a study of disease-free survival (i.e. time until disease relapse following treatment). Patients who have not relapsed within five years are considered to be cured, and thus their survival time is censored at five years.

  • (h) We wish to model the failure time for some electrical component. This component can be manufactured in Iowa or in Pittsburgh, with no difference in quality. The Iowa factory opened five years ago, and so components manufactured in Iowa are censored at five years. The Pittsburgh factory opened two years ago, so those components are censored at two years.

  • (i) We wish to model the failure time of an electrical component made in two different factories, one of which opened before the other. We have reason to believe that the components manufactured in the factory that opened earlier are of higher quality.

  2. We conduct a study with \(n = 4\) participants who have just purchased cell phones, in order to model the time until phone replacement. The first participant replaces her phone after 1.2 years. The second participant still has not replaced her phone at the end of the two-year study period. The third participant changes her phone number and is lost to follow up (but has not yet replaced her phone) 1.5 years into the study. The fourth participant replaces her phone after 0.2 years.

For each of the four participants (\(i = 1, \ldots, 4\)), answer the following questions using the notation introduced in Section 11.1:

  • (a) Is the participant’s cell phone replacement time censored?

  • (b) Is the value of \(c_i\) known, and if so, then what is it?

  • (c) Is the value of \(t_i\) known, and if so, then what is it?

  • (d) Is the value of \(y_i\) known, and if so, then what is it?

  • (e) Is the value of \(\delta_i\) known, and if so, then what is it?
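
The notation in these questions can be illustrated with a minimal sketch. It assumes the usual definitions, \(y_i = \min(t_i, c_i)\) and \(\delta_i = 1\) when \(t_i \le c_i\). The helper name `observe` and the three subjects are hypothetical and are not the participants of this exercise. In a real study only one of \(t_i\) and \(c_i\) is seen for each subject; both are supplied here purely to show how \(y_i\) and \(\delta_i\) arise.

```python
def observe(t, c):
    """Return (y, delta) for a true event time t and a censoring time c."""
    y = min(t, c)                  # the time that is actually recorded
    delta = 1 if t <= c else 0     # 1: event observed, 0: censored
    return y, delta


# Hypothetical subjects: (event time, censoring time), both in years.
subjects = [(3.0, 5.0), (7.0, 5.0), (2.5, 4.0)]
observed = [observe(t, c) for t, c in subjects]

for (t, c), (y, delta) in zip(subjects, observed):
    print(f"t={t}, c={c} -> y={y}, delta={delta}")
```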

  3. For the example in Exercise 2, report the values of \(K\), \(d_1, \ldots, d_K\), \(r_1, \ldots, r_K\), and \(q_1, \ldots, q_K\), where this notation was defined in Section 11.3.
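
As a rough sketch of how these quantities can be tabulated, the code below takes \(d_1 < \cdots < d_K\) to be the distinct times at which an event is observed, \(r_k\) to be the number of subjects still at risk just before \(d_k\), and \(q_k\) to be the number of events at \(d_k\). The function name `risk_table` and the data are hypothetical and are not the data of Exercise 2.

```python
def risk_table(y, delta):
    """Return a list of (d_k, r_k, q_k) rows, one per distinct event time."""
    event_times = sorted({yi for yi, di in zip(y, delta) if di == 1})
    rows = []
    for d in event_times:
        r = sum(1 for yi in y if yi >= d)  # at risk just before d
        q = sum(1 for yi, di in zip(y, delta) if di == 1 and yi == d)  # events at d
        rows.append((d, r, q))
    return rows


# Hypothetical observations: times y and censoring indicators delta.
y = [4.0, 2.0, 6.0, 3.0, 4.0]
delta = [1, 1, 0, 0, 1]

rows = risk_table(y, delta)
print("K =", len(rows))
for d, r, q in rows:
    print(f"d={d}, r={r}, q={q}")
```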

  4. This problem makes use of the Kaplan-Meier survival curve displayed in Figure 11.9. The raw data that went into plotting this survival curve is given in Table 11.4. The covariate column of that table is not needed for this problem.

    • (a) What is the estimated probability of survival past 50 days?
11. Survival Analysis and Censored Data

Observation (Y)   Censoring Indicator (δ)   Covariate (X)
     26.5                    1                   0.1
     37.2                    1                  11
     57.3                    1                  -0.3
     90.8                    0                   2.8
     20.2                    0                   1.8
     89.8                    0                   0.4

TABLE 11.4. Data used in Exercise 4.

  • (b) Write out an analytical expression for the estimated survival function. For instance, your answer might be something along the lines of
\[S(t) = \begin{cases} 1 & \text{if } t < 2 \\ 0.5 & \text{if } t \ge 2 \end{cases}\]

(The previous equation is for illustration only: it is not the correct answer!)
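
One way to check a hand-computed answer to this exercise is to code the Kaplan-Meier product directly. The sketch below assumes the product-limit form, in which the estimate is multiplied by \((r_k - q_k)/r_k\) at each distinct event time \(d_k\), and treats the curve as a right-continuous step function. The function names `kaplan_meier` and `survival_at` are hypothetical; the data are the first two columns of Table 11.4.

```python
def kaplan_meier(y, delta):
    """Return [(d_k, S_hat(d_k))] for each distinct observed event time d_k."""
    steps = []
    s = 1.0
    for d in sorted({yi for yi, di in zip(y, delta) if di == 1}):
        r = sum(1 for yi in y if yi >= d)  # at risk just before d
        q = sum(1 for yi, di in zip(y, delta) if di == 1 and yi == d)  # events at d
        s *= (r - q) / r
        steps.append((d, s))
    return steps


def survival_at(steps, t):
    """Evaluate the step function: the value set at the last d_k <= t, else 1."""
    s = 1.0
    for d, s_d in steps:
        if t >= d:
            s = s_d
    return s


# Observation times and censoring indicators from Table 11.4.
y = [26.5, 37.2, 57.3, 90.8, 20.2, 89.8]
delta = [1, 1, 1, 0, 0, 0]

steps = kaplan_meier(y, delta)
for d, s in steps:
    print(f"from t = {d}: S_hat = {s:.3f}")
print(f"S_hat(50) = {survival_at(steps, 50):.3f}")
```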

  5. Sketch the survival function given by the equation
\[S(t) = \begin{cases} 1 & t < 1 \\ 0.8 & 1 \le t < 2 \\ 0.5 & 2 \le t < 3 \\ 0.3 & t \ge 3 \end{cases}\]

Your answer should look something like Figure 11.9.
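
Before sketching by hand, it can help to tabulate the step function at a few points. The sketch below encodes the piecewise definition above with a sorted list of break points; the names `breaks`, `values`, and `S` are illustrative choices.

```python
from bisect import bisect_right

breaks = [1.0, 2.0, 3.0]        # times at which the curve drops
values = [1.0, 0.8, 0.5, 0.3]   # value before the first break, then after each break


def S(t):
    """Evaluate the right-continuous step function at time t."""
    return values[bisect_right(breaks, t)]


# A few points to guide the sketch, including the break points themselves.
for t in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0]:
    print(f"S({t}) = {S(t)}")
```

Using `bisect_right` makes each drop take effect at the break point itself, matching the inequalities \(1 \le t < 2\), \(2 \le t < 3\), and \(t \ge 3\) in the definition.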


FIGURE 11.9. A Kaplan-Meier survival curve used in Exercise 4.

  6. This problem makes use of the data displayed in Figure 11.1. In completing this problem, you can refer to the observation times as \(y_1, \ldots, y_4\). The ordering of these observation times can be seen from Figure 11.1; their exact values are not required.

    • (a) Report the values of \(\delta_1, \ldots, \delta_4\), \(K\), \(d_1, \ldots, d_K\), \(r_1, \ldots, r_K\), and \(q_1, \ldots, q_K\). The relevant notation is defined in Sections 11.1 and 11.3.


  • (b) Sketch the Kaplan-Meier survival curve corresponding to this data set. (You do not need to use any software to do this — you can sketch it by hand using the results obtained in (a).)

  • (c) Based on the survival curve estimated in (b), what is the probability that the event occurs within 200 days? What is the probability that the event does not occur within 310 days?

  • (d) Write out an expression for the estimated survival curve from (b).

  7. In this problem, we will derive (11.5) and (11.6), which are needed for the construction of the log-rank test statistic (11.8). Recall the notation in Table 11.1.

    • (a) Assume that there is no difference between the survival functions of the two groups. Then we can think of \(q_{1k}\) as the number of failures if we draw \(r_{1k}\) observations, without replacement, from a risk set of \(r_k\) observations that contains a total of \(q_k\) failures. Argue that \(q_{1k}\) follows a hypergeometric distribution. Write the parameters of this distribution in terms of \(r_{1k}\), \(r_k\), and \(q_k\).

    • (b) Given your previous answer, and the properties of the hypergeometric distribution, what are the mean and variance of \(q_{1k}\)? Compare your answer to (11.5) and (11.6).
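
A derived mean and variance can be checked numerically against a brute-force enumeration of the hypergeometric distribution. The sketch below uses exact fractions and compares the enumerated moments with the standard closed forms for sampling without replacement. The function names and the values of `r`, `q`, and `r1` are hypothetical; they stand in for \(r_k\), \(q_k\), and \(r_{1k}\).

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, r, q, r1):
    """P(X = x): x failures among r1 draws, without replacement,
    from r items of which q are failures."""
    return Fraction(comb(q, x) * comb(r - q, r1 - x), comb(r, r1))


def enumerated_moments(r, q, r1):
    """Mean and variance computed by summing over the support."""
    lo, hi = max(0, r1 - (r - q)), min(q, r1)
    support = range(lo, hi + 1)
    mean = sum(x * hypergeom_pmf(x, r, q, r1) for x in support)
    second = sum(x * x * hypergeom_pmf(x, r, q, r1) for x in support)
    return mean, second - mean ** 2


# Hypothetical risk set: 10 at risk, 3 failures, 4 of the 10 in group 1.
r, q, r1 = 10, 3, 4
mean, var = enumerated_moments(r, q, r1)

# Standard closed forms for the hypergeometric mean and variance.
closed_mean = Fraction(r1 * q, r)
closed_var = Fraction(q * r1 * (r - r1) * (r - q), r ** 2 * (r - 1))

print("mean:", mean, "closed form:", closed_mean)
print("variance:", var, "closed form:", closed_var)
```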

  8. Recall that the survival function \(S(t)\), the hazard function \(h(t)\), and the density function \(f(t)\) are defined in (11.2), (11.9), and (11.11), respectively. Furthermore, define \(F(t) = 1 - S(t)\). Show that the following relationships hold:

\[f(t) = \frac{dF(t)}{dt} \quad \text{and} \quad S(t) = \exp \left( - \int_0^t h(u) du \right)\]
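
A numerical sanity check is no substitute for the proof, but it can confirm that the two relationships behave as stated. The sketch below assumes a hypothetical Weibull-type example with \(S(t) = \exp(-t^2)\), for which \(f(t) = 2t\exp(-t^2)\), and uses the identity \(h(t) = f(t)/S(t)\). The helper `integrate` is a plain midpoint rule written for this illustration.

```python
import math

SHAPE = 2.0  # hypothetical shape parameter, chosen only for illustration


def S(t):
    return math.exp(-t ** SHAPE)


def f(t):
    return SHAPE * t ** (SHAPE - 1) * math.exp(-t ** SHAPE)


def h(t):
    return f(t) / S(t)


def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    width = (b - a) / n
    return width * sum(g(a + (i + 0.5) * width) for i in range(n))


t, eps = 1.3, 1e-5

# f(t) against a central-difference derivative of F(t) = 1 - S(t).
dF_dt = ((1 - S(t + eps)) - (1 - S(t - eps))) / (2 * eps)
print("f(t) =", f(t), " dF/dt =", dF_dt)

# S(t) against exp(-integral of the hazard from 0 to t).
S_from_hazard = math.exp(-integrate(h, 0.0, t))
print("S(t) =", S(t), " exp(-int h) =", S_from_hazard)
```
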
  9. In this exercise, we will explore the consequences of assuming that the survival times follow an exponential distribution.

    • (a) Suppose that a survival time follows an \(\text{Exp}(\lambda)\) distribution, so that its density function is \(f(t) = \lambda \exp(-\lambda t)\). Using the relationships provided in Exercise 8, show that \(S(t) = \exp(-\lambda t)\).

    • (b) Now suppose that each of \(n\) independent survival times follows an \(\text{Exp}(\lambda)\) distribution. Write out an expression for the likelihood function (11.13).

    • (c) Show that the maximum likelihood estimator for λ is

\[\hat{\lambda} = \frac{\sum_{i=1}^n \delta_i}{\sum_{i=1}^n y_i}\]
  • (d) Use your answer to (c) to derive an estimator of the mean survival time.

Hint: For (d), recall that the mean of an \(\text{Exp}(\lambda)\) random variable is \(1/\lambda\).
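
The estimator in (c) can be checked numerically on made-up data. The sketch below assumes that the log-likelihood for censored exponential data takes the form \(\sum_i \delta_i \log \lambda - \lambda \sum_i y_i\), which is what the likelihood in (b) should reduce to, and confirms that the closed-form estimate scores higher than nearby values of \(\lambda\). The function names and the data are hypothetical.

```python
import math


def log_likelihood(lam, y, delta):
    """Log-likelihood of Exp(lam) survival times with right censoring."""
    return sum(d * math.log(lam) - lam * yi for yi, d in zip(y, delta))


def lambda_hat(y, delta):
    """Closed-form estimate: number of observed events over total observed time."""
    return sum(delta) / sum(y)


# Hypothetical observations: times y and censoring indicators delta.
y = [2.0, 5.0, 1.5, 4.0]
delta = [1, 0, 1, 1]

lam = lambda_hat(y, delta)
mean_survival = 1.0 / lam  # plug-in estimate of the mean, using the hint

print("lambda_hat =", lam)
print("estimated mean survival time =", mean_survival)
for candidate in (0.9 * lam, lam, 1.1 * lam):
    print(f"lambda = {candidate:.4f}: log-likelihood = {log_likelihood(candidate, y, delta):.4f}")
```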

