How to run genome-wide association studies (GWAS)
There are many ways and programs to run a GWAS on your phenotype of interest in Sandbox, depending on your needs. Currently (from DF7 and DF9 onwards), the core and custom GWAS analyses are performed using REGENIE, and unless you specifically need to use another tool, REGENIE is generally recommended.
Here is a flowchart to help you choose which software to use in your case:
Choosing the correct model (logistic or linear) is crucial for your results. The choice depends on your phenotype: for binary phenotypes, use a logistic model; for continuous phenotypes, use a linear model.
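As a minimal sketch of this rule, the Python helper below inspects a phenotype column and suggests a model type. The function name `choose_model` is hypothetical, and the sketch assumes that binary phenotypes are coded 0 (control) and 1 (case) with missing values given as `None` or NaN. A trait with only a few distinct numeric levels is not automatically binary, so treat the result as a sanity check rather than a decision.

```python
import math


def choose_model(values):
    """Suggest "logistic" for a 0/1-coded phenotype, "linear" otherwise.

    Assumption: binary phenotypes are coded 0 (control) / 1 (case).
    """
    observed = set()
    for v in values:
        # Skip missing values (None or NaN).
        if v is None or (isinstance(v, float) and math.isnan(v)):
            continue
        observed.add(float(v))

    if not observed:
        raise ValueError("phenotype has no non-missing values")

    if observed <= {0.0, 1.0}:
        # A binary trait needs both cases and controls to be analysable.
        if len(observed) < 2:
            raise ValueError("binary phenotype needs both cases and controls")
        return "logistic"
    return "linear"
```

For example, `choose_model([0, 1, 1, None, 0])` suggests a logistic model, while a column of measured values such as `[21.4, 30.2, 25.9]` suggests a linear one.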
All software in Sandbox that is designed to perform GWAS (REGENIE, SAIGE and plink2) can fit both logistic and linear models; the exception is GATE, which performs survival modelling. However, the way you define the model or phenotype type differs between programs. Please see the detailed instructions for REGENIE, SAIGE or plink2, depending on which one you plan to use.
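To illustrate how the trait type is declared differently in each program, the sketch below maps a tool and model type to the command-line option that, to the best of our knowledge of the upstream documentation, selects it: `--bt` / `--qt` in REGENIE, `--traitType` in SAIGE, and plain `--glm` in plink2, which infers the regression from the phenotype itself. The table and the helper `trait_flags` are illustrative assumptions, not part of the Sandbox tooling. Check the tool-specific instructions and the installed versions before use.

```python
# Options as documented upstream (assumption: unchanged in the Sandbox versions).
TRAIT_FLAGS = {
    "regenie": {"logistic": ["--bt"], "linear": ["--qt"]},
    "saige": {
        "logistic": ["--traitType=binary"],
        "linear": ["--traitType=quantitative"],
    },
    # plink2 --glm has no explicit trait switch: it runs logistic regression
    # for case/control phenotypes and linear regression for quantitative ones.
    "plink2": {"logistic": ["--glm"], "linear": ["--glm"]},
}


def trait_flags(tool, model):
    """Return the option(s) that select `model` in `tool` (hypothetical helper)."""
    tool = tool.lower()
    if tool not in TRAIT_FLAGS:
        raise ValueError(f"unknown tool: {tool!r}")
    if model not in TRAIT_FLAGS[tool]:
        raise ValueError(f"unknown model: {model!r}")
    # Return a copy so callers cannot modify the lookup table.
    return list(TRAIT_FLAGS[tool][model])
```

These fragments are only the trait-type part of a full command; input files, covariates and output paths still have to be supplied as described in each tool's instructions.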
Note! The easiest way to conduct a GWAS is to use the custom GWAS tools. From Sandbox v10.2 onwards, the Custom GWAS CLI is available for both binary and quantitative phenotypes, using the REGENIE pipeline. In addition to the additive model, recessive and dominant analyses are also available in the Custom GWAS CLI.
REGENIE and SAIGE both fit logistic and linear mixed models, meaning that related individuals can be included in the analysis. For binary traits, both use the saddlepoint approximation (SPA) to keep test statistics calibrated when case-control ratios are unbalanced. REGENIE can therefore be regarded as an improved SAIGE, with two major advantages:
1) it is faster than SAIGE, and
2) when working with binary traits, the Firth correction used in REGENIE provides much more reasonable effect-size estimates and standard errors than SAIGE when the minor allele count is low.
For FinnGen releases 1-6, the core GWAS were performed using SAIGE. Therefore, if you want to run GWAS comparable to those releases for your phenotypes, please use SAIGE. Otherwise, REGENIE is recommended.
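The recommendations in this section can be summarized as a small decision function. This is only a sketch of the rules stated in the text, not a reproduction of the flowchart. The function and parameter names are hypothetical, and checking for survival modelling first is an assumption.

```python
def recommend_tool(survival_analysis=False, replicate_release_1_to_6=False):
    """Return the tool suggested by the rules of thumb in this section."""
    if survival_analysis:
        # GATE is the tool mentioned for survival modelling.
        return "GATE"
    if replicate_release_1_to_6:
        # Core GWAS for FinnGen releases 1-6 were run with SAIGE.
        return "SAIGE"
    # Default recommendation for everything else.
    return "REGENIE"
```

For a standard binary or quantitative phenotype with no special requirements, `recommend_tool()` returns `"REGENIE"`.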