How to run GWAS using SAIGE
What?
Pipeline for running GWAS (for binary or quantitative phenotypes) using SAIGE. NB! The examples for the SAIGE pipeline have not been updated since DF10.
Introduction
In FinnGen data releases 1-6, GWAS was performed using SAIGE, which performs single-variant association tests for binary and quantitative traits. For binary traits, SAIGE uses the saddlepoint approximation (SPA) (Imhof, J. P., 1961; Kuonen, D., 1999; Dey, R. et al., 2017) to account for case-control imbalance. Releases 7+ use REGENIE, which we recommend for newer analyses.
!! NB !! Please be cautious with how many GWAS you launch and the number of phenotypes you include. Submitting more than ten GWAS jobs simultaneously, or a GWAS with more than 15 phenotypes, may jam the process and make your organization's pipeline unusable for others. If you are going to launch more than 5 GWAS, or a GWAS with tens of phenotypes, please contact humgen-servicedesk@helsinki.fi and we can temporarily increase the resources of your organization's Sandbox and scale them down afterward.
Example files for the SAIGE pipeline
You can find the example files for running SAIGE in Sandbox in /finngen/library-green/scripts/saige/:
.json files (Note: these files must be edited before running!):
saige_R6.json and saige_R10.json
.wdl file:
saige.wdl
sub-wdl file:
saige_sub.zip
These are examples for running SAIGE on the endpoint E4_DIABETES in R6 (saige_R6.json) and R10 (saige_R10.json).
Covariate + phenotype file
You may use some or all of the default covariates, or add new covariates. If you would like to add a covariate to the SAIGE run, please follow the instructions on how to make a covariate + phenotype file for the GWAS pipeline.
Prepare your files for SAIGE
Before you can submit your job, you need to copy the example files and edit the .json file. The easiest way is to copy the files from library-red into your /finngen/red/ folder and modify them as needed. You need to modify the .json file (see the parameters listed below) and create a covariate + phenotype file and a phenotype list file.
The parts you (may) need to edit in the json file are:
saige.test_combine.bgenlistfile:
Path to a .txt file with a list of bgen files to run the test for, each on its own row. An example file for R6: /finngen/shared/r6_saige/20201006_141639/files/saige_pipeline_R6/input_files/R6_bgen_filelist.txt
saige.null.phenofile:
Path to a phenotype-covariate file (a text file including the phenotype codes and covariates used). An example file for R6: /finngen/library-red/finngen_R6/phenotype_2.0/data/finngen_R6_cov_pheno_1.0.txt.gz
Hint: if you would like to use a subset of individuals in your analysis, mark the samples that you don't want to use as NA in your phenotype and add this new phenotype as a new column in the existing phenofile. Edit the phenotype list so that it contains the name of your new phenotype column.
saige.phenolistfile:
Path to a .txt file containing the list of phenotypes to run, in a single column with one phenotype per row. These codes must match the column names in your phenotype file exactly. For example, a list containing only the line E4_DIABETES runs the analysis for that single endpoint.
saige.traitType:
Set to binary or quantitative. Specifies whether your phenotype(s) are binary or quantitative, and thus whether to run a logistic or linear model.
saige.null.bedfile:
Path to the .bed file for your genetic relatedness matrix (GRM). An example file for R6: /finngen/library-red/finngen_R6/grm_1.0/data/finngen_R6_grm_v1_ld_0.1.bed
saige.null.covariates:
List of covariates used in the analysis, separated by commas, for example "age,gender".
saige.test_combine.test.samplefile:
Path to a .txt file listing the sample IDs used in the analysis, one sample ID per row. An example file for R6: /finngen/shared/r6_saige/20201006_141639/files/saige_pipeline_R6/input_files/finngen_r6_sample_list.txt
Note: the number of samples specified in the sample file must match the number of samples in the .bgen file(s).
Note that the FinnGen IDs in the saige.test_combine.test.samplefile you specify in the .json file should match, and be in the same order as, the FinnGen IDs in the saige.null.bedfile. For example, in the .json file for R8, "saige.test.samplefile": "path/to/finngen_r8_bgen_sample_list.txt" is compatible with "saige.null.bedfile": "/path/to/R8_GRM_V0_LD_0.1.bed".
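Editing the .json by hand makes it easy to mistype a key, so the parameters described above can also be changed programmatically. The following is a minimal sketch, not part of the official pipeline: the helper name apply_overrides, the small template dictionary, and the /finngen/red/my_phenolist.txt path are all hypothetical, and the key names are simply copied from the list above (the example .json files may name them differently).

```python
import json


def apply_overrides(template, overrides):
    """Return a copy of `template` with `overrides` applied.

    Only keys that already exist in the template may be overridden, so a
    misspelled key raises an error instead of being silently added.
    """
    unknown = sorted(set(overrides) - set(template))
    if unknown:
        raise KeyError(f"keys not present in the template: {unknown}")
    updated = dict(template)
    updated.update(overrides)
    # The trait type accepts exactly two values according to the text above.
    trait = updated.get("saige.traitType")
    if trait is not None and trait not in ("binary", "quantitative"):
        raise ValueError("saige.traitType must be 'binary' or 'quantitative'")
    return updated


# Stand-in for the contents of an example .json file (hypothetical, shortened).
template = {
    "saige.phenolistfile": "PATH/TO/PHENOLIST",
    "saige.traitType": "binary",
    "saige.null.covariates": "age,gender",
}

edited = apply_overrides(
    template,
    {"saige.phenolistfile": "/finngen/red/my_phenolist.txt"},  # hypothetical path
)
print(json.dumps(edited, indent=2))
```

In practice the template would be loaded with json.load from your copy of the example file and the result written back with json.dump.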
Once your files are ready, open the pipeline tool and copy-paste your .wdl and .json files; you will then be able to run your GWAS.
See how the Sandbox paths and pipelines are mapped here.
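Before submitting, the requirement that the sample file and the GRM list the same FinnGen IDs in the same order can be checked with a few lines of code. This is a rough sketch under two assumptions that are not stated in this guide: that a PLINK .fam file sits next to the GRM .bed file, and that the FinnGen ID is in its second column (IID). The function names are hypothetical.

```python
from pathlib import Path


def read_sample_list(path):
    """Read a sample list with one ID per row, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def read_fam_ids(fam_path):
    """Read the second column (IID) of a whitespace-delimited PLINK .fam file."""
    ids = []
    for line in Path(fam_path).read_text().splitlines():
        fields = line.split()
        if fields:
            ids.append(fields[1])
    return ids


def first_mismatch(sample_ids, fam_ids):
    """Return the index of the first difference, or None if both lists agree."""
    for index, (sample_id, fam_id) in enumerate(zip(sample_ids, fam_ids)):
        if sample_id != fam_id:
            return index
    if len(sample_ids) != len(fam_ids):
        # One list is a strict prefix of the other.
        return min(len(sample_ids), len(fam_ids))
    return None


# In-memory demonstration with made-up IDs.
print(first_mismatch(["ID1", "ID2", "ID3"], ["ID1", "ID3", "ID2"]))
```

A result of None means the two lists agree in both content and order; any integer is the zero-based row where they first diverge.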
Logistic or linear?
In SAIGE, you choose between a logistic and a linear model by setting saige_pipeline_ps.traitType in the .json file: binary selects a logistic model (for binary traits) and quantitative selects a linear model (for continuous traits).
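If you are unsure which setting applies to a phenotype column, a quick look at its values can help. The sketch below is only a heuristic and assumes that binary phenotypes are coded as 0/1 with NA for missing values, which this guide does not state explicitly; the function name guess_trait_type is hypothetical.

```python
def guess_trait_type(values):
    """Guess 'binary' or 'quantitative' from raw phenotype strings.

    Missing values ("NA" or empty) are ignored. A column whose remaining
    values are all 0 or 1 is treated as binary; anything else as quantitative.
    Non-numeric values raise ValueError from float().
    """
    observed = {v.strip() for v in values if v.strip() not in ("", "NA")}
    if not observed:
        raise ValueError("column contains no non-missing values")
    numeric = {float(v) for v in observed}
    return "binary" if numeric <= {0.0, 1.0} else "quantitative"


# Made-up example columns.
case_control = ["0", "1", "NA", "1", "0"]
measurement = ["23.5", "NA", "30.1", "27.8"]
print(guess_trait_type(case_control), guess_trait_type(measurement))
```

The guess should always be confirmed against the phenotype definition, since a quantitative trait that happens to take only the values 0 and 1 would be misclassified.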
Submit your SAIGE job
Using Pipelines
You can submit your SAIGE job to the pipeline system via the command line with the following command:
Remember to save your job's ID to keep track of your job and view the output. See also Tips on how to find a pipeline job ID.
Output
Once your job has finished running, you can find the output of your SAIGE run in: /finngen/pipeline/cromwell/workflows/saige/[WORKFLOW_ID]/call-test_combine/shard-#/sub.test_combine/[SUB_WORKFLOW_ID]/call-combine/ (each of your phenotypes has its own 'shard' folder, numbered from 0; if you have only one phenotype, there will be only one folder: shard-0).
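With many phenotypes, walking the shard folders by hand gets tedious. The following sketch collects the call-combine directories under a workflow folder, following the path layout described above. The function name combine_dirs is hypothetical, and the demonstration runs on a throwaway directory tree with made-up IDs rather than on real pipeline output.

```python
import re
import tempfile
from pathlib import Path


def combine_dirs(workflow_root):
    """Map each shard number to the call-combine directories found under it."""
    root = Path(workflow_root)
    pattern = "call-test_combine/shard-*/sub.test_combine/*/call-combine"
    found = {}
    for path in root.glob(pattern):
        shard_name = path.relative_to(root).parts[1]  # e.g. "shard-0"
        match = re.fullmatch(r"shard-(\d+)", shard_name)
        if match and path.is_dir():
            found.setdefault(int(match.group(1)), []).append(path)
    return dict(sorted(found.items()))


# Demonstration on a temporary tree that mimics the layout (IDs are made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "workflow-id"
    for shard in (1, 0):
        out = root / "call-test_combine" / f"shard-{shard}" / "sub.test_combine" / "sub-id" / "call-combine"
        out.mkdir(parents=True)
    shards = combine_dirs(root)
    shard_numbers = list(shards)
    dirs_per_shard = [len(paths) for paths in shards.values()]

print(shard_numbers)
```

On the Sandbox, the argument would be the /finngen/pipeline/cromwell/workflows/saige/[WORKFLOW_ID] folder of your own run.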
In the output folder(s) you will find:
2 summary statistics files:
{prefix}_{pheno}.gz:
Summary statistics file with columns cleaned for use with pheweb's format
{prefix}_{pheno}.saige.gz:
Full summary statistics file
Plots (under the glob-* subfolder):
{prefix}_{pheno}.gz_pheweb_pval_manhattan.png:
Manhattan plot
{prefix}_{pheno}.gz_pheweb_pval_manhattan_loglog.png:
Log-adjusted Manhattan plot
{prefix}_{pheno}.gz_pval_qqplot.png:
Quantile-quantile (QQ) plot
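To confirm that a run produced everything listed above, the expected file names can be generated from the prefix and phenotype and compared with what is actually present. This is a small sketch using the naming patterns from the list; the function names and the prefix value "myprefix" are hypothetical, and the check works on file names only, so it does not matter that the plots sit in a glob-* subfolder.

```python
def expected_outputs(prefix, pheno):
    """Build the output file names described above for one phenotype."""
    base = f"{prefix}_{pheno}"
    return {
        "pheweb_sumstats": f"{base}.gz",
        "full_sumstats": f"{base}.saige.gz",
        "manhattan": f"{base}.gz_pheweb_pval_manhattan.png",
        "manhattan_loglog": f"{base}.gz_pheweb_pval_manhattan_loglog.png",
        "qqplot": f"{base}.gz_pval_qqplot.png",
    }


def missing_outputs(present_names, prefix, pheno):
    """Return the expected file names that are absent from `present_names`."""
    present = set(present_names)
    expected = expected_outputs(prefix, pheno).values()
    return [name for name in expected if name not in present]


# Made-up listing in which the QQ plot is absent.
listing = [
    "myprefix_E4_DIABETES.gz",
    "myprefix_E4_DIABETES.saige.gz",
    "myprefix_E4_DIABETES.gz_pheweb_pval_manhattan.png",
    "myprefix_E4_DIABETES.gz_pheweb_pval_manhattan_loglog.png",
]
missing = missing_outputs(listing, "myprefix", "E4_DIABETES")
print(missing)
```

The listing would normally come from collecting file names recursively under the call-combine folder of the relevant shard.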