11.8.2 Publication Data

The Publication data presented in Section 11.5.4 can be found in the ISLP package. We first reproduce Figure 11.5 by plotting the Kaplan-Meier curves stratified on the posres variable, which records whether the study had a positive or negative result.

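In [18]: fig, ax = subplots(figsize=(8,8))
         Publication = load_data('Publication')
         by_result = {}
         # km is assumed to be the Kaplan-Meier fitter created earlier in the lab
         for result, df in Publication.groupby('posres'):
             by_result[result] = df
             # fit and plot one survival curve per value of posres
             km_result = km.fit(df['time'], df['status'])
             km_result.plot(label='Result=%d' % result, ax=ax)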
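
To make concrete what each fitted curve represents, the following is a minimal sketch of the Kaplan-Meier product-limit calculation in plain Python. It is not the implementation used by the fitter above; the function name `kaplan_meier` and the toy data are hypothetical and are not drawn from the Publication data.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if censored
    """
    # number of observed events at each distinct event time
    n_events = Counter(t for t, e in zip(times, events) if e == 1)
    surv, curve = 1.0, []
    for t in sorted(n_events):
        # subjects still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1 - n_events[t] / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical toy data: five subjects, two of them censored
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
```

On this toy input the estimated survival curve steps down to 0.8, 0.6 and 0.3 at times 2, 3 and 5. The censored observations at times 3 and 8 never trigger a drop; they only change the size of the risk set.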

As discussed previously, the p-values from fitting Cox’s proportional hazards model to the posres variable are quite large, providing no evidence of a difference in time-to-publication between studies with positive versus negative results.

In [19]: posres_df = MS(['posres',
                         'time',
                         'status'],
                        intercept=False).fit_transform(Publication)
         posres_fit = coxph().fit(posres_df,
                                  'time',
                                  'status')
         posres_fit.summary[['coef', 'se(coef)', 'p']]
Out[19]:
               coef  se(coef)         p
covariate
posres     0.148076  0.161625  0.359578
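
As a rough check on how the three reported columns relate, the sketch below recomputes the p-value from the coefficient and its standard error. It assumes the usual two-sided Wald test based on a normal approximation, and it also converts the coefficient to a hazard ratio. The numbers are copied from the output above; the variable names are ours.

```python
import math

# Values copied from Out[19]
coef, se = 0.148076, 0.161625

# Wald statistic: estimated coefficient divided by its standard error
z = coef / se

# Two-sided p-value under a standard normal reference distribution
# (assumption: this is the approximation behind the reported p column)
p_value = math.erfc(abs(z) / math.sqrt(2))

# exp(coef) is the estimated hazard ratio for posres = 1 versus posres = 0
hazard_ratio = math.exp(coef)

print(round(z, 3), round(p_value, 3), round(hazard_ratio, 3))
```

The recomputed p-value agrees with the reported 0.36 to the precision shown. The hazard ratio of roughly 1.16 suggests a slightly higher publication rate for positive studies, but the difference is well within sampling error.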

However, the results change dramatically when we include other predictors in the model. Here we exclude the funding mechanism variable.

In [20]: model = MS(Publication.columns.drop('mech'),
                    intercept=False)
         coxph().fit(model.fit_transform(Publication),
                     'time',
                     'status').summary[['coef', 'se(coef)', 'p']]
Out[20]:
                coef  se(coef)             p
covariate
posres      0.570774  0.175960  1.179606e-03
multi      -0.040863  0.251194  8.707727e-01
clinend     0.546180  0.262001  3.710099e-02
sampsize    0.000005  0.000015  7.506978e-01
budget      0.004386  0.002464  7.511276e-02
impact      0.058318  0.006676  2.426779e-18

We see that there are a number of statistically significant variables, including whether the trial focused on a clinical endpoint, the impact of the study, and whether the study had positive or negative results.
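
The selection described above can be reproduced directly from the reported table. The sketch below is illustrative only: it hard-codes the coefficients and p-values from the output, and it assumes the conventional 0.05 significance threshold, which the text does not state explicitly.

```python
import math

# (coef, p) pairs copied from Out[20]
reported = {
    'posres':   (0.570774, 1.179606e-03),
    'multi':    (-0.040863, 8.707727e-01),
    'clinend':  (0.546180, 3.710099e-02),
    'sampsize': (0.000005, 7.506978e-01),
    'budget':   (0.004386, 7.511276e-02),
    'impact':   (0.058318, 2.426779e-18),
}

alpha = 0.05  # assumed threshold, not taken from the text

# Predictors whose p-value falls below the threshold
significant = [name for name, (_, p) in reported.items() if p < alpha]

# Hazard ratios exp(coef) for the significant predictors
hazard_ratios = {name: math.exp(reported[name][0]) for name in significant}

print(significant)
```

Under this threshold the selected predictors are posres, clinend and impact, matching the discussion above. After adjusting for the other predictors, the estimated hazard ratio for posres is about 1.77, compared with roughly 1.16 in the single-predictor fit.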
