Survival analysis using the truncated endpoint file – secondary endpoint data
Last updated
In addition to the regular endpoint files of DF9 (and subsequent data freezes), the register team will release a separate file for survival analyses: the so-called “truncated” endpoint file, in which the follow-up of all registers ends on a common date, 31.12.2019 for DF9 (Figure 2). This is the latest date up to which follow-up is available in every register included in the detailed longitudinal and endpoint data. Because follow-up ends on the same date in all registers, the disease status of individuals is complete even in the most recent follow-up years.
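To illustrate what truncating follow-up at a common end date means for a single endpoint, here is a minimal sketch. It assumes hypothetical per-individual inputs `EVENT` (1/0) and `EVENT_DATE` (date of the first recorded event, NA if none); the helper `truncate_events` is also hypothetical and is not the register team's actual procedure. An event recorded after the common end date is treated as not observed, so that individual counts as a control in the truncated file.

```r
# Common follow-up end date of the truncated DF9 endpoint file
FU_END_DATE <- as.Date("2019-12-31")

# Hypothetical helper: keep only events observed on or before the common end date.
# EVENT: 1/0 indicator; EVENT_DATE: Date of the first recorded event (NA if none)
truncate_events <- function(EVENT, EVENT_DATE, fu_end = FU_END_DATE) {
  observed <- EVENT == 1 & !is.na(EVENT_DATE) & EVENT_DATE <= fu_end
  as.integer(observed)
}

# Toy example with made-up values: the second event falls after 31.12.2019
ev      <- c(1, 1, 0)
ev_date <- as.Date(c("2015-06-01", "2020-03-15", NA))
truncated <- truncate_events(ev, ev_date)
print(truncated)  # returns c(1L, 0L, 0L)
```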
Each endpoint in the endpoint data has a corresponding pre-calculated variable “ENDPOINT_AGE”, which contains the individual's age at:
Cases: the first recorded event (EVENT_AGE)
Controls:
- FU_END_DATE (DF9: 31.12.2019 in the truncated endpoint file)
OR
- death (if deceased, even if the individual moved abroad at some point)
OR
- emigration (if moved abroad and not deceased).
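The rules above can be sketched as a small R function. The function `endpoint_age` and the input columns `DEATH_AGE` (as used in the examples further down), `EMIGRATION_AGE` and `FU_END_AGE` (age at FU_END_DATE) are assumptions for illustration; the released file already contains the pre-calculated ENDPOINT_AGE. Ages are assumed to be NA when the corresponding event did not occur during follow-up.

```r
# Hypothetical helper mirroring the ENDPOINT_AGE rules described above.
# All inputs are per-individual vectors of the same length.
endpoint_age <- function(ENDPOINT, EVENT_AGE, DEATH_AGE, EMIGRATION_AGE, FU_END_AGE) {
  # Controls: death takes precedence (even if the person emigrated at some point),
  # then emigration, otherwise the age at the end of follow-up
  control_age <- ifelse(!is.na(DEATH_AGE), DEATH_AGE,
                        ifelse(!is.na(EMIGRATION_AGE), EMIGRATION_AGE, FU_END_AGE))
  # Cases: age at the first recorded event
  ifelse(ENDPOINT == 1, EVENT_AGE, control_age)
}

# Made-up values: one case, a deceased emigrant, an emigrant, and a control alive at FU end
ages <- endpoint_age(
  ENDPOINT       = c(1, 0, 0, 0),
  EVENT_AGE      = c(52.3, NA, NA, NA),
  DEATH_AGE      = c(NA, 70.1, NA, NA),
  EMIGRATION_AGE = c(NA, 65.0, 40.2, NA),
  FU_END_AGE     = c(60.0, 75.0, 48.0, 55.5)
)
print(ages)  # returns c(52.3, 70.1, 40.2, 55.5)
```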
Survival analysis can be run using the variables BL_AGE (the age at which the individual entered the study, i.e. donated a DNA sample), ENDPOINT_AGE and the 1/0 indicator of the endpoint.
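A minimal sketch of how these three variables combine into a left-truncated survival object on the age time scale, using the `survival` package. The data frame `dat` and its values are made up, and the column `ENDPOINT` stands in for the endpoint's 1/0 indicator. Individuals whose ENDPOINT_AGE is not after BL_AGE (for example prevalent cases diagnosed before study entry) cannot form a valid (entry, exit] interval; excluding them here is an illustrative analysis choice, not a prescribed rule.

```r
library(survival)

# Toy data using the variable names from the text; values are made up
dat <- data.frame(
  BL_AGE       = c(45.0, 50.2, 61.7, 38.4),
  ENDPOINT_AGE = c(52.3, 44.9, 70.0, 60.1),
  ENDPOINT     = c(1, 1, 0, 0)
)

# Drop individuals whose endpoint age is not after study entry (row 2 here)
at_risk <- subset(dat, ENDPOINT_AGE > BL_AGE)

# Counting-process form: enter the risk set at BL_AGE, exit at ENDPOINT_AGE
s <- with(at_risk, Surv(BL_AGE, ENDPOINT_AGE, ENDPOINT))
print(s)
```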
Most individuals entered the study (BL_AGE) after the follow-up of every register had already begun (Figure 2). The exceptions are the primary care register Avohilmo (follow-up beginning in 2011), the specialist outpatient Hilmo register (1998) and the Kela drug purchase register (1995), whose follow-up may have begun only after the individual joined the study. For these registers a small bias has to be accepted: some events go undetected (false negatives), or their first recorded EVENT_AGE is later than the true onset (for example, for type 1 diabetes).
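To see which individuals are potentially affected, one could compare each person's calendar year of study entry with the follow-up start years listed above. This is a rough sketch: the input `BL_YEAR` and the helper `late_coverage` are hypothetical, and comparing whole years is a simplification of the actual coverage dates.

```r
# Follow-up start years stated above for the registers that may begin after study entry
register_start <- c(Avohilmo = 2011, Hilmo_outpatient = 1998, Kela_drugs = 1995)

# Hypothetical helper: TRUE where a register's follow-up began after the entry year.
# Returns an individuals x registers logical matrix.
late_coverage <- function(BL_YEAR) {
  m <- outer(BL_YEAR, register_start, "<")
  colnames(m) <- names(register_start)
  m
}

# Made-up entry years
flags <- late_coverage(c(1993, 2000, 2015))
print(flags)
```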
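Survival analysis can be run using the truncated endpoint file, as in the examples below, in which death (indicator DEATH, age DEATH_AGE) is the endpoint and foo is the analysis data frame: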
1. With age as the time scale
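library(survival)
# Age as the time scale: each individual enters the risk set at BL_AGE (left truncation) and exits at DEATH_AGE; DEATH is the 1/0 event indicator
cox <- coxph(Surv(BL_AGE, DEATH_AGE, DEATH) ~ strata(GENDER) + CANC + INV_HDL + SMOKING + PREVAL_DIAB + factor(BMI_factor), data = foo)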
2. With follow-up time as the time scale
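# Follow-up time scale: time from study entry to death or censoring, stored in foo so that data = foo finds it
foo$DEATH_AGEDIFF <- foo$DEATH_AGE - foo$BL_AGE
# BL_AGE is adjusted for as a covariate because age is no longer the time scale
cox <- coxph(Surv(DEATH_AGEDIFF, DEATH) ~ strata(GENDER) + BL_AGE + CANC + INV_HDL + SMOKING + PREVAL_DIAB + factor(BMI_factor), data = foo)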
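To try both model specifications end to end, here is a self-contained sketch on synthetic data. Everything in it is an assumption for illustration: the data are simulated (not register data), the covariates are reduced to GENDER and SMOKING, and the 20-year administrative censoring is arbitrary. It only shows that the two `coxph` calls run on data shaped like foo; it makes no claim about results.

```r
library(survival)

# Synthetic data for illustration only; covariates reduced to GENDER and SMOKING
set.seed(1)
n <- 500
foo <- data.frame(
  GENDER  = sample(c("female", "male"), n, replace = TRUE),
  SMOKING = rbinom(n, 1, 0.3),
  BL_AGE  = runif(n, 30, 70)
)
# Exponential time from baseline to death, with a higher hazard for smokers
time_to_death <- rexp(n, rate = 0.02 * exp(0.5 * foo$SMOKING))
# Hypothetical administrative censoring after at most 20 years of follow-up
foo$DEATH     <- as.integer(time_to_death <= 20)
foo$DEATH_AGE <- foo$BL_AGE + pmin(time_to_death, 20)

# 1. Age as the time scale (entry at BL_AGE)
cox_age <- coxph(Surv(BL_AGE, DEATH_AGE, DEATH) ~ strata(GENDER) + SMOKING, data = foo)

# 2. Follow-up time scale, adjusting for baseline age
foo$DEATH_AGEDIFF <- foo$DEATH_AGE - foo$BL_AGE
cox_fu <- coxph(Surv(DEATH_AGEDIFF, DEATH) ~ strata(GENDER) + BL_AGE + SMOKING, data = foo)

print(cox_age)
print(cox_fu)
```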