How to run survival analyses
Here we describe how to perform survival analyses (Cox models, Kaplan-Meier curves, etc.) using ready-made time-to-event phenotypes, and show an example of how to create your own time-to-event phenotype using BigQuery.
To run a Cox model, you need two phenotype columns: an EVENT column, indicating the event status (0/1), and an EVENT_AGE column, indicating the survival time (must be > 0 for all samples).
NOTE: EVENT_AGE is needed for non-events as well! For them, it can be the survival time until the end of follow-up.
The input file for the Cox model should have one row per sample, with the event status and event age columns alongside the sample ID.
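As an illustration of that layout, the sketch below writes a tiny space-separated file and prints it. Everything in it is made up for the example: the file name example_cox_input.txt, the ID strings and all values are hypothetical, and EVENT and EVENT_AGE stand in for a real endpoint column and its age column.

```shell
# Write a small, entirely made-up example of a Cox model input file.
# The file name, IDs and values are hypothetical and for illustration only.
cat > example_cox_input.txt <<'EOF'
FINNGENID EVENT EVENT_AGE
ID0001 1 54.2
ID0002 0 71.9
ID0003 0 33.5
ID0004 1 62.0
EOF

# Print the file: one header line, then one row per sample.
cat example_cox_input.txt
```

Note that the two non-event rows (EVENT equal to 0) still carry an EVENT_AGE value, as required above.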
In the endpoint file, core endpoints are already in the format needed to run a Cox model from birth to the first event of the endpoint. In this example, we fit a Cox model for survival from birth until the endpoint I9_CORATHER.
Since the endpoint file is very large, you can first reduce it to the columns you need. Here is an example that uses awk to keep the columns FINNGENID, I9_CORATHER, I9_CORATHER_AGE and SEX (for the gender-stratified analysis):
zcat /finngen/library-red/finngen_R11/phenotype_1.0/data/finngen_R11_endpoint_1.0.txt.gz | awk -v col1=FINNGENID -v col2=I9_CORATHER -v col3=I9_CORATHER_AGE -v col4=SEX 'NR==1{for(i=1;i<=NF;i++){if($i==col1)c1=i;if ($i==col2)c2=i;if ($i==col3)c3=i;if ($i==col4)c4=i;}} NR>=1{print $c1 " " $c2 " " $c3 " " $c4}' >I9_CORATHER_ages.txt
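Before moving on to R, it can be worth checking that the filtered file meets the requirements stated earlier: four fields on every row, an event status of 0 or 1, and an age greater than 0 for every sample. The following is a minimal sketch of such a check with awk. So that it runs on its own, it uses a small made-up file (toy_ages.txt, with hypothetical IDs and values) in place of I9_CORATHER_ages.txt.

```shell
# Toy stand-in for I9_CORATHER_ages.txt; all IDs and values are made up.
cat > toy_ages.txt <<'EOF'
FINNGENID I9_CORATHER I9_CORATHER_AGE SEX
ID0001 1 54.2 male
ID0002 0 71.9 female
ID0003 0 0 female
EOF

# Skip the header (NR>1) and count rows that break each requirement.
summary=$(awk 'NR>1 {
    if (NF != 4) bad_nf++                 # row does not have exactly 4 fields
    if ($2 != 0 && $2 != 1) bad_event++   # event status is not 0/1
    if ($3 + 0 <= 0) bad_age++            # age is missing, non-numeric or not > 0
  }
  END { print "rows:", NR-1, "wrong_field_count:", bad_nf+0, "bad_event:", bad_event+0, "age_not_positive:", bad_age+0 }' toy_ages.txt)
echo "$summary"
```

On the toy file this reports one row with a non-positive age (the third sample). To check the real output, replace toy_ages.txt with I9_CORATHER_ages.txt.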
Then, in RStudio, read the file into a data frame (called d here) and fit the Cox model with the following command (the R package survival is required: library(survival)):
fit<-survfit(coxph(Surv(I9_CORATHER_AGE, I9_CORATHER)~1, data = d))
You can plot the corresponding Kaplan-Meier-type survival curve with:
plot(fit)
A gender-stratified model can be fitted with:
fit_gender_str<-survfit(coxph(Surv(I9_CORATHER_AGE, I9_CORATHER)~strata(SEX), data = d))
And the corresponding Kaplan-Meier-type plot, with one curve per stratum:
plot(fit_gender_str)