How to run GWAS using GATE (survival models)
What?
Pipeline for running survival GWAS using GATE.
Introduction
GATE performs single-variant association tests (GWAS) for time-to-event endpoints (i.e. survival). GATE uses the saddlepoint approximation (SPA) (Imhof, J. P., 1961; Kuonen, D., 1999; Dey, R. et al., 2017) to account for heavy censoring rates. In this respect it is similar to SAIGE, except that it tests the association of the variant with the time to the event rather than with case-control status.
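For background, below is one standard form of the saddlepoint approximation, as used in the SPA test of Dey et al. (2017). T is a score test statistic, K is its cumulant generating function, Φ is the standard normal CDF, and the saddlepoint ζ̂ solves K'(ζ̂) = q. This guide does not describe how GATE defines T and K for survival data, so treat this as a general illustration rather than GATE's exact formula.

```latex
\Pr(T < q) \approx \Phi\!\left( w + \frac{1}{w}\,\log\frac{v}{w} \right),
\qquad
w = \operatorname{sign}(\hat\zeta)\,\sqrt{2\bigl(\hat\zeta q - K(\hat\zeta)\bigr)},
\qquad
v = \hat\zeta\,\sqrt{K''(\hat\zeta)},
\qquad
K'(\hat\zeta) = q.
```

A normal approximation uses only the mean and variance of T. This formula uses the whole cumulant generating function, so it stays accurate in the tails when the statistic's distribution is strongly skewed, for example when events are rare because of heavy censoring.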
In addition to a binary variable (0/1) giving the case-control status, your phenotype-covariate file needs a column giving the survival time. NOTE: this information is required for both cases and controls. For cases it is the survival time until the event; for controls you can set it to the time until death or end of follow-up (both age at death and end-of-follow-up information can be found in each release's phenotype-covariate file).
Note: You need to name the two columns (case/control status and survival time) [pheno] and [pheno]_survTime for the pipeline to recognize them as such (for example: I9_CHD and I9_CHD_survTime).
Note: Survival time must be >0 for all individuals
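Before submitting, it can help to check your phenotype-covariate file against the notes above. The Python sketch below uses a hypothetical helper that is not part of the GATE pipeline. It assumes a tab-separated file with a header row and a FINNGENID column. It also assumes that rows with a missing (NA or empty) status are meant to be left out of the analysis, so it skips them.

```python
import csv
import io

MISSING = {"", "NA"}  # assumption: how missing phenotype values are written


def find_invalid_rows(handle, pheno, id_col="FINNGENID"):
    """Return (id, reason) pairs for rows that break the GATE column rules.

    Checks that the [pheno] and [pheno]_survTime columns exist, that the
    status is 0 or 1, and that the survival time is a number > 0.
    """
    time_col = f"{pheno}_survTime"
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    for col in (id_col, pheno, time_col):
        if col not in header:
            raise ValueError(f"missing column: {col}")

    problems = []
    for row in reader:
        status = row[pheno]
        if status in MISSING:
            continue  # assumed to be excluded from the analysis
        if status not in ("0", "1"):
            problems.append((row[id_col], "status is not 0/1"))
        try:
            surv_time = float(row[time_col])
        except (TypeError, ValueError):  # None (short row) or text such as "NA"
            problems.append((row[id_col], "survival time is not a number"))
            continue
        if not surv_time > 0:  # also catches NaN
            problems.append((row[id_col], "survival time must be > 0"))
    return problems


if __name__ == "__main__":
    example = (
        "FINNGENID\tI9_CHD\tI9_CHD_survTime\n"
        "FG1\t1\t55.2\n"
        "FG2\t0\t0\n"
        "FG3\tNA\t\n"
    )
    print(find_invalid_rows(io.StringIO(example), "I9_CHD"))
```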
Note: Age at end of follow-up or death (AGE_AT_DEATH_OR_END_OF_FOLLOWUP) should NOT be used as a covariate in GATE (or in survival models in general). For controls, the age at end of follow-up is already inherently modelled in survival analysis. If you are interested in modelling survival after a certain event, say retinopathy after a Type 2 diabetes (T2D) diagnosis, you could add age at T2D diagnosis as a covariate to control for differences in onset age between individuals.
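In the interval case above (e.g. T2D diagnosis to retinopathy), survival time is measured from the index event rather than from birth. The minimal sketch below assumes you already have each person's ages at the index event, at the outcome, and at death or end of follow-up. The function name is hypothetical. Because GATE requires survival time > 0, the sketch drops anyone whose outcome occurs at or before the index event. That is one possible design choice, not a rule from the pipeline.

```python
from typing import Optional, Tuple


def interval_survival(
    age_index: Optional[float],
    age_outcome: Optional[float],
    age_end: float,
) -> Optional[Tuple[int, float, float]]:
    """Return (status, survTime, age_at_index) for one person, or None to exclude.

    age_index:   age at the index event (e.g. T2D diagnosis), None if it never happened
    age_outcome: age at the outcome (e.g. retinopathy), None if it never happened
    age_end:     age at death or end of follow-up
    """
    if age_index is None:
        return None  # never had the index event, so not at risk
    if age_outcome is not None:
        status, surv_time = 1, age_outcome - age_index  # case: time to outcome
    else:
        status, surv_time = 0, age_end - age_index  # control: censored at end
    if surv_time <= 0:
        return None  # outcome at/before index, or no follow-up: time must be > 0
    # age_at_index can be used as a covariate (AGE_AT_DEATH_OR_END_OF_FOLLOWUP must not)
    return status, surv_time, age_index
```

The returned status and time would then go into a [pheno] / [pheno]_survTime column pair, for example T2D_RETINOPATHY and T2D_RETINOPATHY_survTime (hypothetical names).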
Example files for the GATE pipeline
Example files for running the GATE pipeline in the Sandbox can be found at /finngen/library-green/scripts/gate/. The files you need from there are:
.wdl file: gate.wdl
sub-.wdl file: gate_sub.wdl.zip
one of the example input (.json) files: gate_R12.json, gate_R11.json, gate_R10.json or gate_R6_example.json
These example .json files are for running GATE on survival time from a Type 2 diabetes (T2D) diagnosis to a peripheral artery disease (I9_PAD) diagnosis in DF6 (gate_R6_example.json), and on survival from birth to the first event of I9_MI and I9_CORATHER in DF11 (gate_R11.json) and DF10 (gate_R10.json).
The phenotype-covariate file, phenotype list file and genotype file list needed to run the GATE example (DF6 example) can be found at /finngen/shared/gate_example_files/20220823_132723/files/sruotsal/gate/gate_example_files/:
r6_T2D_I9_PAD_phenointerval_0.pheno.txt: example phenotype-covariate file
gate_example_phenolist.txt: example phenotype list file
r6_bgen_chunk_list.txt: example genotype list file
To add covariates to your own phenotype file, see the instructions here. Note that the phenotype-covariate file requires a FINNGENID column, while some other GWAS analyses use FID and IID.
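If your covariate table comes from one of those FID/IID-based analyses, the minimal Python sketch below converts it. It assumes a tab-separated file in which IID already holds the FinnGen ID, and it simply drops FID. Check that assumption against your own file. The function name and SEX_IMPUTED in the usage example are illustrative.

```python
import csv
import io


def fid_iid_to_finngenid(src, dst):
    """Copy a tab-separated table, replacing FID/IID with a single FINNGENID column.

    Assumes IID holds the FinnGen ID; FID is dropped. All other columns are kept
    in their original order.
    """
    reader = csv.DictReader(src, delimiter="\t")
    header = reader.fieldnames or []
    if "IID" not in header:
        raise ValueError("input has no IID column")
    others = [c for c in header if c not in ("FID", "IID")]
    writer = csv.DictWriter(
        dst, fieldnames=["FINNGENID"] + others, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        out = {"FINNGENID": row["IID"]}
        out.update({c: row[c] for c in others})
        writer.writerow(out)


if __name__ == "__main__":
    src = io.StringIO("FID\tIID\tSEX_IMPUTED\nFG1\tFG1\t0\n")
    dst = io.StringIO()
    fid_iid_to_finngenid(src, dst)
    print(dst.getvalue(), end="")
```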
Prepare your files for GATE
Once you have downloaded the example files, you need to edit the gate.json input file. The parts you (may) need to edit in it are:
gate.phenolistfile: the path to a .txt file containing the list of phenotypes to run, one per row. For example, for only one phenotype: LIBRARY_SHARED/gate_example_files/20220823_132723/files/sruotsal/gate/gate_example_files/gate_example_phenolist.txt
gate.null.eventTimeBinSize: the size of the survival-time bin used to group individuals in the analysis. The default is 1 (year). Depending on the distribution of the survival time, you may want to change this, e.g. to 0.083 (years), which is roughly 1 month.
gate.null.phenofile: the path to a phenotype-covariate file in .txt format. Example: LIBRARY_SHARED/gate_example_files/20220823_132723/files/sruotsal/gate/gate_example_files/r6_T2D_I9_PAD_phenointerval_0.pheno.txt
Remember, you need to name the two columns (case/control status and survival time) [pheno] and [pheno]_survTime for the pipeline to recognize them. In this example, they should read T2D_I9_PAD and T2D_I9_PAD_survTime.
gate.null.covariates: a list of covariates to be used in the model, separated by commas (,).
gate.null.bedfile: the path to the .bed file of the GRM. This needs to be edited according to the release you plan to use. In this example, R6 is used. For your own analyses, we strongly recommend using the most recent (and updated) data release.
gate.test_combine.bgenlistfile: the path to a .txt file with a list of .bgen (8-bit) files (the genotype files). This needs to be edited according to your release (see above). In this example R6 is used: LIBRARY_SHARED/gate_example_files/20220823_132723/files/sruotsal/gate/gate_example_files/r6_bgen_chunk_list.txt
gate.test_combine.combine.prefix: a string used as the prefix of the combined (over genotype files) summary statistics file.
gate.test_combine.test.samplefile: the path to a .txt file containing a list of samples (which need to match the samples in the given .bgen files), one per row. Do not include a header or any extra columns! For example:
See how the Sandbox paths and pipelines are mapped here.
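These edits can also be scripted. The Python sketch below uses hypothetical helpers that are not part of the pipeline. It assumes the example .json is a flat JSON object keyed by the input names listed above, so compare it with the file you downloaded. It only lets you change those keys. It also formats the one-entry-per-row phenotype list and sample files. The covariate names in the usage example are placeholders.

```python
import json

# The input keys described in this guide; other keys in the template are left untouched.
EDITABLE_KEYS = {
    "gate.phenolistfile",
    "gate.null.eventTimeBinSize",
    "gate.null.phenofile",
    "gate.null.covariates",
    "gate.null.bedfile",
    "gate.test_combine.bgenlistfile",
    "gate.test_combine.combine.prefix",
    "gate.test_combine.test.samplefile",
}


def edit_gate_inputs(template_text: str, changes: dict) -> str:
    """Apply `changes` to the example input JSON and return the new JSON text."""
    inputs = json.loads(template_text)
    for key, value in changes.items():
        if key not in EDITABLE_KEYS:
            raise KeyError(f"not one of the documented keys: {key}")
        if key not in inputs:
            raise KeyError(f"key missing from the template: {key}")
        inputs[key] = value
    return json.dumps(inputs, indent=2)


def one_per_row(items) -> str:
    """Format a phenotype list or sample file: one entry per row, no header."""
    lines = []
    for item in items:
        if not item or any(ch.isspace() for ch in item):
            raise ValueError(f"entry must be a single non-empty column: {item!r}")
        lines.append(item)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    template = json.dumps({"gate.null.eventTimeBinSize": 1, "gate.null.covariates": ""})
    print(edit_gate_inputs(template, {
        "gate.null.eventTimeBinSize": round(1 / 12, 3),  # about 1 month, in years
        "gate.null.covariates": "SEX_IMPUTED,PC1,PC2",    # hypothetical covariate names
    }))
```

Refusing unknown keys is deliberate: a typo in a key name would otherwise add a new entry silently instead of changing the intended input.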
Submit your GATE job
Using Pipelines
See How to use the Pipelines area for how to submit your job. Note especially that the pipeline input files are run from the /finngen/red/ folder.
If you need further information on the pipeline/job system, see the section Pipelines is based on Cromwell and WDL.
Once your .json file is ready, you can submit your GATE run via the command:
After submitting your job successfully, go to Applications -> Sandbox -> Pipelines to track your job, and remember to save your job's workflow ID so you can track it and check the results when your run has finished.
Output
Once your job displays the Succeeded state, you can see the results in /finngen/pipeline/cromwell/workflows/gate/[WORKFLOW_ID]/.
You can, for example, find the:
summary statistics (.gz and .saige.gz) in /finngen/pipeline/cromwell/workflows/gate/[WORKFLOW_ID]/call-test_combine*/shard*/sub.test_combine/*/call-combine/ (if you have multiple phenotypes, the results for each phenotype go into their own sub-folders [shard-#]).
Manhattan and QQ plots in the /finngen/pipeline/cromwell/workflows/gate/[WORKFLOW_ID]/call-test_combine*/shard*/sub.test_combine/*/call-combine/glob*- directory.
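With many phenotypes, it can be convenient to collect the summary statistics files programmatically. The minimal Python sketch below uses the path pattern above. Replace [WORKFLOW_ID] with your own workflow ID; the function name is hypothetical. The *.gz pattern picks up both the .gz and .saige.gz files. Results are grouped by shard, one shard per phenotype.

```python
import glob
import os

ROOT = "/finngen/pipeline/cromwell/workflows/gate"


def find_sumstats(workflow_id, root=ROOT):
    """Map each shard (one per phenotype) to its .gz summary statistics files."""
    base = os.path.join(root, workflow_id)
    pattern = os.path.join(
        base, "call-test_combine*", "shard*", "sub.test_combine", "*", "call-combine", "*.gz"
    )
    by_shard = {}
    for path in sorted(glob.glob(pattern)):
        # path relative to the workflow folder: call-test_combine*/shard-N/...
        shard = os.path.relpath(path, base).split(os.sep)[1]
        by_shard.setdefault(shard, []).append(path)
    return by_shard
```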
Link to Wei's GATE presentation at the FinnGen User's Meeting on 2nd November 2021.
Link to the demo on How to run updated GATE pipeline in Sandbox from the FinnGen User's Meeting on 30th August 2022.