How to run survival analyses

Here we describe how to perform survival analyses (Cox models, Kaplan-Meier curves, etc.) using ready-made time-to-event phenotypes, and also show an example of how to create your own time-to-event phenotype using BigQuery.

To run a Cox model, you need two phenotype columns:

  1. EVENT column, indicating the event status (0/1), and

  2. EVENT_AGE column, indicating the survival time (must be >0 for all samples). NOTE: EVENT_AGE is needed for non-events as well (for them, it can be the survival time until the end of follow-up)!

The file for the Cox model should be in a format like this:
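As a minimal sketch of that layout, assuming a whitespace-delimited file with a header row and the hypothetical column names FINNGENID, EVENT and EVENT_AGE (the IDs and values below are invented for illustration), the following creates a toy file and checks both requirements: EVENT is 0 or 1, and EVENT_AGE is greater than 0 on every row.

```shell
# Toy phenotype file; the file name, IDs and values are made up for illustration.
cat > toy_pheno.txt <<'EOF'
FINNGENID EVENT EVENT_AGE
ID0001 1 54.2
ID0002 0 71.8
ID0003 0 63.0
EOF

# Hypothetical helper: report rows where EVENT is not 0/1 or EVENT_AGE is not > 0.
# Columns are located by header name, so their order in the file does not matter.
check_pheno() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "EVENT") e = i
        if ($i == "EVENT_AGE") a = i
      }
      next
    }
    ($e != "0" && $e != "1") || ($a + 0 <= 0) {
      print "bad row " NR ": " $0
      bad++
    }
    END {
      print bad + 0, "invalid rows"
      exit (bad > 0)
    }
  ' "$1"
}

check_pheno toy_pheno.txt
```

The function exits with a non-zero status when any row fails the check, so it can be used as a guard before fitting the model.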

Run a Cox model for ready time-to-event phenotypes

In the endpoint file, core endpoints are already in the format needed to run a Cox model from birth to the first event of the endpoint. In this example, we fit a Cox model for survival from birth until endpoint I9_CORATHER.

Since the endpoint file is very large, you can first reduce it to the columns you need using awk. The following example keeps the columns FINNGENID, I9_CORATHER, I9_CORATHER_AGE and SEX (for gender-stratified analysis), selecting them by header name:

zcat /finngen/library-red/finngen_R11/phenotype_1.0/data/finngen_R11_endpoint_1.0.txt.gz |
  awk -v col1=FINNGENID -v col2=I9_CORATHER -v col3=I9_CORATHER_AGE -v col4=SEX '
    # On the header line, find the position of each wanted column by name
    NR==1 { for (i=1; i<=NF; i++) { if ($i==col1) c1=i; if ($i==col2) c2=i; if ($i==col3) c3=i; if ($i==col4) c4=i } }
    # Print the four columns for every line, header included
    { print $c1 " " $c2 " " $c3 " " $c4 }
  ' > I9_CORATHER_ages.txt
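Before moving to R, it can be worth checking the filtered file. The sketch below is one possible sanity check, not part of the original workflow: it counts events and non-events and flags rows that do not have exactly four fields. It runs on a toy stand-in file (`toy_ages.txt`, with invented IDs, ages and SEX values) so that it is self-contained. The helper name `summarise_events` is hypothetical, and it assumes the column order produced by the command above.

```shell
# Toy stand-in for I9_CORATHER_ages.txt; all IDs and values are made up.
cat > toy_ages.txt <<'EOF'
FINNGENID I9_CORATHER I9_CORATHER_AGE SEX
ID0001 1 54.2 male
ID0002 0 71.8 female
ID0003 0 63.0 female
EOF

# Hypothetical helper: count rows per event status (column 2) and
# rows that do not have exactly four whitespace-separated fields.
summarise_events() {
  awk '
    NR > 1 { n[$2]++; if (NF != 4) bad++ }
    END    { printf "events=%d non_events=%d malformed=%d\n", n[1], n[0], bad }
  ' "$1"
}

summarise_events toy_ages.txt
```

To check the real output, call `summarise_events I9_CORATHER_ages.txt` instead.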

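Then, in RStudio, once you have read the file into a data frame (called d here, for example with read.table("I9_CORATHER_ages.txt", header = TRUE)), you can fit the Cox model with the following command (this requires the R package survival, loaded with library(survival)):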

fit <- survfit(coxph(Surv(I9_CORATHER_AGE, I9_CORATHER) ~ 1, data = d))

You can plot the corresponding Kaplan-Meier-type survival curve with:

plot(fit)

A gender-stratified model can be fitted with:

fit_gender_str <- survfit(coxph(Surv(I9_CORATHER_AGE, I9_CORATHER) ~ strata(SEX), data = d))

And the corresponding Kaplan-Meier-type plot, with one curve per stratum:

plot(fit_gender_str)
