How to run GWAS using SAIGE

What?

Pipeline for running GWAS (for binary or quantitative phenotypes) using SAIGE. NB! The examples for the SAIGE pipeline have not been updated after DF10.

Introduction

In FinnGen data releases 1-6, the GWAS was performed using SAIGE, which performs single-variant association tests for binary and quantitative traits. For binary traits, SAIGE uses the saddlepoint approximation (SPA) (Imhof, J. P., 1961; Kuonen, D., 1999; Dey, R. et al., 2017) to account for case-control imbalance. Releases 7+ use REGENIE, which we recommend for newer analyses.

!! NB !! Please be cautious with how many GWAS you create and the number of phenotypes you include. Submitting more than ten GWAS jobs simultaneously, or a GWAS with more than 15 phenotypes, may jam the process and make your organization's pipeline unusable for others. If you are going to launch more than 5 GWASs or a GWAS with tens of phenotypes, please contact humgen-servicedesk@helsinki.fi and we can temporarily increase the resources of your organization's Sandbox and downscale afterward.

Example files for the SAIGE pipeline

You can find the example files for running SAIGE in Sandbox from /finngen/library-green/scripts/saige/:

.json files (Note: these files must be edited before running!): saige_R6.json and saige_R10.json
.wdl file: saige.wdl
sub-wdl file: saige_sub.zip

These are examples for running SAIGE on the endpoint E4_DIABETES in R6 (saige_R6.json) and R10 (saige_R10.json).

Covariate + phenotype file

You may use some or all of the default covariates or add new covariates. If you would like to add a covariate to the SAIGE run, please follow the instructions on how to make a covariate + phenotype file for the GWAS pipeline.
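As one illustration of editing a covariate + phenotype file, the sketch below adds a new phenotype column in which every sample outside a chosen subset is set to NA, so that those samples are left out of the analysis. It assumes a gzipped, tab-separated file with a header row and an ID column named FINNGENID; the function name, the ID column default and the column names in the usage comment are hypothetical and should be checked against your own file.

```python
import csv
import gzip


def add_subset_phenotype(src, dst, base_pheno, new_pheno, keep_ids, id_col="FINNGENID"):
    """Copy src to dst, adding a column new_pheno.

    Kept samples get the value of base_pheno; all other samples get "NA".
    Assumes a gzipped, tab-separated file with a header row (an assumption,
    not a documented property of the FinnGen phenotype files).
    """
    with gzip.open(src, "rt", newline="") as fin, gzip.open(dst, "wt", newline="") as fout:
        reader = csv.DictReader(fin, delimiter="\t")
        fields = list(reader.fieldnames) + [new_pheno]
        writer = csv.DictWriter(fout, fieldnames=fields, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        for row in reader:
            row[new_pheno] = row[base_pheno] if row[id_col] in keep_ids else "NA"
            writer.writerow(row)


# Example (hypothetical file names, column names and IDs):
# add_subset_phenotype("cov_pheno.txt.gz", "cov_pheno_subset.txt.gz",
#                      "I9_CHD", "I9_CHD_SUBSET", keep_ids={"FG0001", "FG0002"})
```

The name of the new column (I9_CHD_SUBSET in the comment above) is then what you would list in your phenotype list file.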

Prepare your files for SAIGE

Before you can submit your job, you need to download the example files and edit the .json file. The easiest way is to copy the files into your /finngen/red/ folder from library-red and modify them if needed. You need to modify the .json file (see points 1-6 here) and create a covariate + phenotype file and a phenotype list file.

The parts you (may) need to edit in the json file are:

saige.test_combine.bgenlistfile: path to a .txt file with a list of bgen files to run the test for, each on its own row. An example file for R6 can be found at /finngen/shared/r6_saige/20201006_141639/files/saige_pipeline_R6/input_files/R6_bgen_filelist.txt .
saige.null.phenofile: path to a phenotype-covariate file (a text file including the phenotype codes and covariates used). An example file for R6 can be found at /finngen/library-red/finngen_R6/phenotype_2.0/data/finngen_R6_cov_pheno_1.0.txt.gz.
- Hint: if you would like to use a subset of individuals in your analysis, mark the samples that you don't want to use as NA in your phenotype and add this new phenotype as a new column in the existing phenofile. Edit the phenotype list so that it contains the name of your new phenotype column.
saige.phenolistfile: path to a .txt file containing the phenotypes to run in a single column, each on its own row. These codes must correspond exactly to the column names in your phenotype file for running SAIGE. For example:

I9_CHD
J10_ASTHMA

saige.traitType: Set to binary or quantitative. Specifies whether your phenotype(s) is binary or quantitative, and thus whether to run a logistic or linear model.
saige.null.bedfile: Path to .bed file for your genetic relatedness matrix (GRM). An example file for R6 can be found at /finngen/library-red/finngen_R6/grm_1.0/data/finngen_R6_grm_v1_ld_0.1.bed
saige.null.covariates: List of covariates used in the analysis, separated by commas, for example "age,gender".
saige.test_combine.test.samplefile: Path to a .txt file listing sample IDs used in the analysis, one sample ID per row. An example file for R6 can be found at: /finngen/shared/r6_saige/20201006_141639/files/saige_pipeline_R6/input_files/finngen_r6_sample_list.txt
Note: the number of samples specified in the sample file must match the number of samples in the .bgen file(s).
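If you prefer to script the edits to the .json file, they could look roughly like the sketch below. It loads an inputs file, overrides selected keys and writes the result to a new file. The function name is hypothetical, and the sketch assumes that the keys listed above exist at the top level of the example .json; it refuses keys that are missing from the template so that a misspelled key name is caught early.

```python
import json


def update_inputs(template_path, output_path, overrides):
    """Load an inputs .json, apply overrides, and write the result to output_path.

    Raises KeyError if an override key is not already present in the template,
    which guards against typos in the (long) key names.
    """
    with open(template_path) as fh:
        inputs = json.load(fh)
    unknown = [key for key in overrides if key not in inputs]
    if unknown:
        raise KeyError(f"keys not present in template: {unknown}")
    inputs.update(overrides)
    with open(output_path, "w") as fh:
        json.dump(inputs, fh, indent=4)
    return inputs


# Example (hypothetical paths; the keys are the ones described above):
# update_inputs(
#     "saige_R10.json",
#     "my_saige.json",
#     {
#         "saige.phenolistfile": "/path/to/my_phenolist.txt",
#         "saige.traitType": "binary",
#         "saige.null.covariates": "age,gender",
#     },
# )
```

Writing to a new file instead of overwriting the template keeps the original example available for comparison.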

Note that the FinnGen IDs in the saige.test_combine.test.samplefile you specify in the .json file should match, and be in the same order as, the FinnGen IDs in the saige.null.bedfile. For example, in the .json file for R8, "saige.test.samplefile": "path/to/finngen_r8_bgen_sample_list.txt" is compatible with "saige.null.bedfile": "/path/to/R8_GRM_V0_LD_0.1.bed".
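A quick way to check the ordering requirement is to compare the sample list against the IDs that accompany the .bed file. The sketch below assumes that the .bed file has a companion PLINK .fam file with the same prefix and that the FinnGen IDs are stored in its second column (the within-family ID); both are assumptions about your data, and the function names are hypothetical.

```python
def read_fam_ids(fam_path):
    """Return the IDs in column 2 of a whitespace-delimited PLINK .fam file, in file order."""
    with open(fam_path) as fh:
        return [line.split()[1] for line in fh if line.strip()]


def read_sample_list(sample_path):
    """Return the sample IDs from a one-ID-per-row text file, in file order."""
    with open(sample_path) as fh:
        return [line.strip() for line in fh if line.strip()]


def first_mismatch(sample_ids, fam_ids):
    """Return the index of the first position where the two lists differ.

    Returns None when the lists are identical in content, order and length.
    """
    for i, (a, b) in enumerate(zip(sample_ids, fam_ids)):
        if a != b:
            return i
    if len(sample_ids) != len(fam_ids):
        # One list is a strict prefix of the other.
        return min(len(sample_ids), len(fam_ids))
    return None


# Example (hypothetical paths):
# idx = first_mismatch(read_sample_list("/path/to/sample_list.txt"),
#                      read_fam_ids("/path/to/grm.fam"))
# print("files agree" if idx is None else f"first difference at row {idx + 1}")
```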

Once your files are ready, open the pipeline tool and copy-paste your .wdl and .json files; you will then be able to run your GWAS.

See how the Sandbox paths and pipelines are mapped here.

Logistic or linear?

In SAIGE, you choose between a logistic and a linear model by setting saige.traitType in the .json file: binary runs a logistic model (for binary traits) and quantitative runs a linear model (for continuous traits).
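If you are unsure which value applies to a phenotype column, a rough heuristic is to look at its observed values. The sketch below assumes that binary phenotypes are coded as 0/1 and that missing values appear as NA or as an empty string; the function name is hypothetical. It is only a sanity check: a continuous trait that happens to contain only the values 0 and 1 would be labelled binary.

```python
def infer_trait_type(values, missing=("NA", "")):
    """Guess the trait type from the string values of one phenotype column.

    Returns "binary" if every non-missing value is "0" or "1",
    otherwise "quantitative".
    """
    observed = {v for v in values if v not in missing}
    if not observed:
        raise ValueError("column has no non-missing values")
    return "binary" if observed <= {"0", "1"} else "quantitative"


# Example:
# infer_trait_type(["0", "1", "NA", "1"])    # binary
# infer_trait_type(["23.5", "31.2", "NA"])   # quantitative
```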

Submit your SAIGE job

Using Pipelines

You can submit your SAIGE job to the pipeline system via the command line with the following command:

finngen-cli rw -w /path/to/saige.wdl \
                -i /path/to/saige.json \
                -d /path/to/saige_sub.zip

Remember to save your job's ID to keep track of your job and view the output. See also Tips on how to find a pipeline job ID.
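If you submit jobs from a script, the same command can be wrapped as in the sketch below. It checks that the three files exist before calling finngen-cli with the flags shown above, and it returns the command's standard output unparsed, because the format in which finngen-cli reports the job ID is not described here. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_submit_command(wdl, inputs, deps):
    """Return the finngen-cli argument list after checking that each file exists."""
    paths = [Path(p) for p in (wdl, inputs, deps)]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing input files: {missing}")
    wdl_path, inputs_path, deps_path = (str(p) for p in paths)
    return ["finngen-cli", "rw", "-w", wdl_path, "-i", inputs_path, "-d", deps_path]


def submit(wdl, inputs, deps):
    """Run the submission and return whatever finngen-cli prints to stdout."""
    cmd = build_submit_command(wdl, inputs, deps)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Example (hypothetical paths):
# print(submit("/path/to/saige.wdl", "/path/to/saige.json", "/path/to/saige_sub.zip"))
```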

Output

Once your job has finished running, you can find the output of your SAIGE run in: /finngen/pipeline/cromwell/workflows/saige/[WORKFLOW_ID]/call-test_combine/shard-#/sub.test_combine/[SUB_WORKFLOW_ID]/call-combine/ (each of your phenotypes has its own 'shard' folder, numbered from 0. If you only have one phenotype, there will be only one folder: shard-0).

In the output folder(s) you will find:

2 summary statistics files:

{prefix}_{pheno}.gz: Summary statistics file with columns cleaned for use with pheweb's format
{prefix}_{pheno}.saige.gz: Full summary statistics file

Plots (under glob-* subfolder):

{prefix}_{pheno}.gz_pheweb_pval_manhattan.png: Manhattan plot
{prefix}_{pheno}.gz_pheweb_pval_manhattan_loglog.png: Log-adjusted Manhattan plot
{prefix}_{pheno}.gz_pval_qqplot.png: Quantile-quantile (QQ) plot
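To collect the summary statistics for all phenotypes of a run, you could walk the output tree as in the sketch below. It follows the directory pattern given above and searches each call-combine folder recursively for files ending in .gz, which skips the .png plots. The function name is hypothetical, and whether the files sit directly in call-combine or in a subfolder is not assumed.

```python
from pathlib import Path


def find_summary_stats(workflow_id, root="/finngen/pipeline/cromwell/workflows/saige"):
    """Map each shard folder name to the sorted .gz files found under its call-combine folder."""
    base = Path(root) / workflow_id / "call-test_combine"
    results = {}
    for combine_dir in sorted(base.glob("shard-*/sub.test_combine/*/call-combine")):
        # combine_dir is .../shard-N/sub.test_combine/<SUB_WORKFLOW_ID>/call-combine,
        # so the shard folder is three levels up.
        shard = combine_dir.parents[2].name
        files = sorted(str(p) for p in combine_dir.rglob("*.gz"))
        results.setdefault(shard, []).extend(files)
    return results


# Example (WORKFLOW_ID is a placeholder for your own job ID):
# for shard, files in find_summary_stats("WORKFLOW_ID").items():
#     print(shard, files)
```

Files whose names end in .saige.gz are the full summary statistics; the remaining .gz files are the pheweb-formatted versions.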

Related:

If your pipeline job fails
