How to run GWAS using REGENIE

As of DF7, all FinnGen core endpoint analyses have been run with REGENIE. REGENIE improves on SAIGE with added computational efficiencies and better effect size estimates, and is therefore recommended unless you specifically want to run analyses similar to those in FinnGen releases 1-6 (in which case, see How to run GWAS using SAIGE).

Note: The REGENIE pipeline can also be run using the custom GWAS tools. From Sandbox v10.2 onwards, the Custom GWAS CLI is available for both binary and quantitative phenotypes and uses the REGENIE pipeline. In addition to the additive model, the Custom GWAS CLI also supports recessive and dominant analyses.

!! NB !! Please be cautious about how many GWAS you launch and how many phenotypes you include. If you are going to launch more than 5 GWAS, or a GWAS with tens of phenotypes, please contact humgen-servicedesk@helsinki.fi so that we can temporarily increase the resources of your organization's Sandbox and scale them down afterward. Once the resources have been increased, we recommend submitting a single GWAS job every 30 minutes (in a bash script, you can use 'sleep 30m' in your loop), i.e. two phenotypes per hour, which allows you to run ~40 jobs in 24 hours. This keeps the pipeline from getting jammed and lets other users in your organization use your organization's pipeline.
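The staggered submission recommended above could be scripted roughly as follows. This is a minimal sketch, not an official tool: `submit_staggered` and the `WDL`, `SUB_ZIP`, `INTERVAL` and `DRY_RUN` variables are hypothetical names, and it assumes you have prepared one edited .json file per job. The `finngen-cli rw` call is the same one shown under Submit your REGENIE job.

```shell
#!/usr/bin/env bash
# Sketch: submit one REGENIE workflow per .json file, pausing between
# submissions. WDL, SUB_ZIP, INTERVAL and DRY_RUN are hypothetical variables;
# replace the placeholder paths with your own copies of the files.
WDL="${WDL:-/path/to/regenie.wdl}"
SUB_ZIP="${SUB_ZIP:-/path/to/regenie_sub_wdl.zip}"
INTERVAL="${INTERVAL:-30m}"   # value passed to sleep between submissions
DRY_RUN="${DRY_RUN:-0}"       # 1 = print the commands instead of running them

submit_staggered() {
  total=$#
  count=0
  for json in "$@"; do
    count=$((count + 1))
    if [ "$DRY_RUN" = "1" ]; then
      echo "finngen-cli rw -w $WDL -i $json -d $SUB_ZIP"
    else
      finngen-cli rw -w "$WDL" -i "$json" -d "$SUB_ZIP"
    fi
    # Wait before the next submission, but not after the last one
    if [ "$count" -lt "$total" ]; then
      sleep "$INTERVAL"
    fi
  done
}
```

For example, `DRY_RUN=1 INTERVAL=0 submit_staggered pheno1.json pheno2.json` only prints the commands, which is a cheap way to check the job list before the real run. Only use the real run after your organization's resources have been increased.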

Prepare your files for REGENIE

Before you can submit your job, you need to copy the example files and edit the .json file, which looks like this:

The parts of the .json file you should edit are highlighted in the figure:

  • regenie.phenolist: the path to a phenotype list file. A phenotype list file is a text file with each row representing a phenotypic trait (similar to SAIGE), for example:

I9_CHD
T1D_WIDE

(Note: Multiple correlated phenotypes with less than 5% missing values can be grouped on a single row, separated by tabs. However, we still recommend running each phenotype separately.)

  • regenie.cov_pheno: the path to a phenotype-covariate file. The pheno-covariate file is a tab-separated (optionally gzipped) .txt file containing all phenotype and covariate columns. The first two columns of the file must be FID and IID. Please provide the same sample ID in both columns:

FID    IID
FGID1    FGID1
FGID2    FGID2
FGID3    FGID3

NOTE: Make sure that the pheno-covariate file contains no spaces; columns must be separated by tabs!

  • regenie.covariates: a list of covariate column names, separated by commas; for example: "age,gender". NOTE: The example .json file (regenie_example_R9.json) already defines the covariates used in the R9 core GWAS: age, sex, genotyping batch and PC1-10.

  • regenie.is_binary: true if your phenotype is binary (e.g. case-control), false if it is quantitative (e.g. BMI). This defines whether a logistic or a linear model is run. For another example, see Running quantitative GWAS with REGENIE.
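Before submitting, it can save a failed run to check these inputs against each other. The sketch below assumes the formats stated above (a tab-separated, optionally gzipped pheno-covariate file with FID and IID first; one phenotype, or one tab-separated group, per row of the phenotype list; comma-separated covariate names) and reports spaces, a wrong header, rows where FID and IID differ, and listed phenotypes or covariates missing from the header. `check_regenie_inputs` is a hypothetical helper, not part of the REGENIE pipeline.

```shell
#!/usr/bin/env bash
# Sketch of a pre-submission check (hypothetical helper).
#   $1 = pheno-covariate file, $2 = phenotype list file,
#   $3 = comma-separated covariate names (same string as regenie.covariates)
check_regenie_inputs() {
  pheno_cov=$1
  phenolist=$2
  covariates=$3
  # Required columns: every phenotype in the list plus every covariate
  required=$( { tr '\t' '\n' < "$phenolist"; printf '%s\n' "$covariates" | tr ',' '\n'; } | tr '\n' ',')
  # gzip -cdf decompresses gzipped input and passes plain text through as-is
  gzip -cdf "$pheno_cov" | awk -v required="$required" '
    BEGIN { FS = "\t"; n = split(required, req, ","); bad = 0 }
    / / { spaces++ }                       # any space character on the line
    NR == 1 {
      if ($1 != "FID" || $2 != "IID") { print "ERROR: first two columns must be FID and IID"; bad = 1 }
      for (i = 1; i <= NF; i++) have[$i] = 1
      for (j = 1; j <= n; j++)
        if (req[j] != "" && !(req[j] in have)) { print "ERROR: column " req[j] " not found"; bad = 1 }
      next
    }
    $1 != $2 { mismatch++ }                # FID and IID should be identical
    END {
      if (spaces)   { print "ERROR: " spaces " line(s) contain spaces"; bad = 1 }
      if (mismatch) { print "ERROR: " mismatch " row(s) have FID != IID"; bad = 1 }
      exit bad
    }'
}
```

For example, `check_regenie_inputs pheno_cov.txt.gz phenolist.txt "age,sex"` prints nothing and returns 0 when no problems are found.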

You may also need to edit:

  • regenie.sub_step2.step2.test: defines the association model type (additive, recessive or dominant) used in the GWAS. In the example .json file it is additive (a "normal" GWAS). Unless you are specifically running a recessive or dominant model, there is no need to change this setting.
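To double-check your edits, you can print the values of the settings listed above from your .json file. This rough sketch uses grep instead of a JSON parser, so it assumes each key and its value sit on a single line, as in a typically formatted file; `check_regenie_json` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Rough sketch: print the editable REGENIE settings from a .json file and
# flag any that are missing. grep-based, so one key/value pair per line is assumed.
check_regenie_json() {
  json=$1
  missing=0
  for key in regenie.phenolist regenie.cov_pheno regenie.covariates \
             regenie.is_binary regenie.sub_step2.step2.test; do
    # Escape the dots so grep matches them literally
    pattern="\"$(printf '%s' "$key" | sed 's/\./\\./g')\"[[:space:]]*:"
    line=$(grep -E "$pattern" "$json" || true)
    if [ -n "$line" ]; then
      printf '%s\n' "$line" | sed 's/^[[:space:]]*//'
    else
      echo "MISSING: $key"
      missing=1
    fi
  done
  return $missing
}
```

For example, `check_regenie_json my_regenie.json` lists the five settings, so you can confirm the paths and the value of regenie.is_binary before submitting.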

Logistic or linear?

In REGENIE, you choose between a logistic and a linear model with regenie.is_binary in the .json file: set it to true for a logistic model (binary traits) or false for a linear model (continuous traits). If you are running REGENIE for a quantitative trait, you can also use this example.

Some things to consider:

  • Make sure that your edited phenotype-covariate file is in /finngen/red/. To copy files to /finngen/red/, you need to use gsutil with the bucket path gs://fg-production-sandbox-<NO>-red/, which corresponds to /finngen/red/.

  • Bucket paths specified as inputs in your modified .json file need to follow the form given in buckets.txt.

  • Make sure that you are using the latest version of REGENIE in regenie.sub_step1.step1.docker and regenie.sub_step2.docker.

See how the Sandbox paths and pipelines are mapped here.
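Following the path mapping described above, a small helper can build the gs:// destination for `gsutil cp` from a /finngen/red/ path. This sketch assumes the mapping is a plain prefix swap from /finngen/red/ to gs://fg-production-sandbox-<NO>-red/; `red_to_gs` and `SANDBOX_NO` are hypothetical names, and buckets.txt remains the reference for the exact form.

```shell
#!/usr/bin/env bash
# Sketch: translate /finngen/red/<path> into gs://fg-production-sandbox-<NO>-red/<path>.
# SANDBOX_NO is a hypothetical variable holding your Sandbox number (<NO>).
red_to_gs() {
  if [ -z "${SANDBOX_NO:-}" ]; then
    echo "set SANDBOX_NO to your Sandbox number first" >&2
    return 1
  fi
  case "$1" in
    /finngen/red/*)
      # Strip the mount prefix and prepend the bucket name
      printf 'gs://fg-production-sandbox-%s-red/%s\n' "$SANDBOX_NO" "${1#/finngen/red/}"
      ;;
    *)
      echo "not a /finngen/red/ path: $1" >&2
      return 1
      ;;
  esac
}
```

After setting SANDBOX_NO, a copy could look like `gsutil cp pheno_cov.txt.gz "$(red_to_gs /finngen/red/my_folder/pheno_cov.txt.gz)"`, where my_folder is a placeholder.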

Example files for the REGENIE pipeline

You can find the example files for running REGENIE in the Sandbox in:

/finngen/library-green/scripts/regenie/:

  • .json files (these need to be edited!):

    • regenie_example_R9.json

    • regenie_example_R10.json

    • regenie_example_R11.json

  • .wdl file: regenie.wdl*

  • sub-.wdl files as one zipped file: regenie_sub_wdl.zip*

*NOTE: All the files (.wdl and .json files) were updated in April 2023 (see the recording of the April 2023 User meeting). Please update all your copies of these files.

These examples, which use the endpoint J10_ASTHMA_EXMORE in DF9 (regenie_example_R9.json), DF10 (regenie_example_R10.json) and DF11 (regenie_example_R11.json), are meant to help you understand how to run REGENIE.
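Since the example files must be edited, it is convenient to copy them to your own working directory first. This is a minimal sketch; `copy_regenie_examples` and `EXAMPLE_DIR` are hypothetical names, and the release argument (R9, R10 or R11) simply selects the matching regenie_example_<release>.json from the list above.

```shell
#!/usr/bin/env bash
# Sketch: copy the example .json, regenie.wdl and regenie_sub_wdl.zip into a
# working directory. EXAMPLE_DIR defaults to the library-green location above.
EXAMPLE_DIR="${EXAMPLE_DIR:-/finngen/library-green/scripts/regenie}"

copy_regenie_examples() {
  release=$1   # e.g. R11
  dest=$2      # your working directory
  mkdir -p "$dest"
  # The zip is given to finngen-cli with -d as it is, so it is not unzipped here
  cp "$EXAMPLE_DIR/regenie_example_${release}.json" \
     "$EXAMPLE_DIR/regenie.wdl" \
     "$EXAMPLE_DIR/regenie_sub_wdl.zip" \
     "$dest/"
}
```

For example, `copy_regenie_examples R11 ~/regenie_run` gives you editable copies of the DF11 example files.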

Covariate + phenotype file

You may use some or all of the default covariates, or add new ones. If you would like to add a covariate to the REGENIE run, please follow the instructions on how to make a covariate + phenotype file for the GWAS pipeline.

Submit your REGENIE job

If you're running REGENIE using Sandbox Pipelines, it's a good idea to first read the sections Pipelines is based on Cromwell and WDL, How to use the Pipelines tool and How to submit a pipeline from the command line.

Using command line

Once your files are in order, you can submit your run by typing the following command in the FinnGen terminal:

finngen-cli rw -w /path/to/regenie.wdl \
                -i /path/to/your.json \
                -d /path/to/regenie_sub_wdl.zip

REMEMBER to save your job ID [WORKFLOW_ID] to keep track of your job and to be able to view the output! See also the tips on how to find a pipeline job ID. You can monitor your job and its [WORKFLOW_ID] in Pipelines:

Output

Once your job has finished successfully, you can find your output files in: /finngen/pipeline/cromwell/workflows/regenie/[WORKFLOW_ID]/call-sub_step2/shard-#/sub.regenie_step2/[SUBWORKFLOW_ID]/call-gather/shard-#/
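Because this path contains one shard directory per step-2 call and a subworkflow ID you may not know in advance, `find` can collect the files for you. A minimal sketch; `list_regenie_outputs` is a hypothetical helper, and the base directory defaults to the path given above (pass a second argument to override it).

```shell
#!/usr/bin/env bash
# Sketch: list all files under every call-gather/shard-* directory of a
# finished REGENIE workflow, following the output path pattern above.
list_regenie_outputs() {
  workflow_id=$1
  base=${2:-/finngen/pipeline/cromwell/workflows/regenie}
  # In find -path patterns, * also matches "/", so shard-* spans the
  # sub.regenie_step2/[SUBWORKFLOW_ID] levels in between
  find "$base/$workflow_id" -type f \
    -path '*/call-sub_step2/shard-*/call-gather/shard-*/*' | sort
}
```

For example, `list_regenie_outputs <your WORKFLOW_ID>` prints one path per output file; pipe it into grep to narrow the list down.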

Running REGENIE (R7) in the Sandbox was presented in the User Meeting on 24 August 2021.
