Running quantitative GWAS with REGENIE

A quantitative GWAS is a GWAS performed on a continuous phenotype. In this example, we use the quantitative phenotype body mass index (BMI) with R8 data. For an example using binary (case-control) data with R7 data, see the instructions here. An overview of REGENIE is available here.

To run a quantitative GWAS pipeline in the FinnGen Sandbox, you only need to create the phenolist, bgenlist and JSON files yourself. All other files you will need are already present in the FinnGen red library.

Note: From Sandbox v10.2 onwards, you can also run a quantitative GWAS with the Custom GWAS CLI in quantitative mode for the REGENIE pipeline.

!! NB !! Please be cautious about how many GWAS jobs you create and how many phenotypes you include. Submitting more than ten GWAS jobs simultaneously, or a GWAS with more than 15 phenotypes, may jam the process and can make your organization's pipeline unusable for others. If you are going to launch more than five GWAS jobs, or a GWAS with tens of phenotypes, please contact humgen-servicedesk@helsinki.fi so that we can temporarily increase the resources of your organization's Sandbox and scale them back down afterward.

Phenolist file

The phenolist file is a text file containing all the names of your analysis phenotype(s). If you have only one phenotype, then this file contains only one line. If you have multiple phenotypes, type one phenotype per line. For example:

BMI
PHENO1
PHENO2

The phenolist file should contain only the name(s) of your phenotype(s). Check that there are no extra blank lines, trailing spaces, or special characters in the file. You can display all characters in your phenolist file in the Terminal using:

cat -A myphenolistfile.txt
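The checks that "cat -A" lets you do by eye can also be scripted. The following is a minimal sketch, assuming bash and standard grep; "check_phenolist" is a hypothetical helper, not part of the FinnGen tooling. It flags Windows line endings, blank lines, trailing whitespace and an empty file, and warns when more than 15 phenotypes are listed (the limit from the NB note above).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the FinnGen tooling):
#   check_phenolist FILE
# Prints problems to stderr; returns 0 if FILE looks clean, 1 otherwise.
check_phenolist() {
  local file="$1" status=0 n

  # Windows line endings appear as ^M$ in `cat -A` output
  if grep -q $'\r' "$file"; then
    echo "ERROR: $file has Windows (CRLF) line endings" >&2
    status=1
  fi

  # Empty or whitespace-only lines
  if grep -qE '^[[:space:]]*$' "$file"; then
    echo "ERROR: $file contains blank lines" >&2
    status=1
  fi

  # Spaces or tabs after a phenotype name
  if grep -qE '[[:blank:]]$' "$file"; then
    echo "ERROR: $file has trailing spaces or tabs" >&2
    status=1
  fi

  # Count non-empty lines (one phenotype per line);
  # `|| true` keeps the "0" output when grep finds no match
  n=$(grep -c . "$file" || true)
  if [ "$n" -eq 0 ]; then
    echo "ERROR: $file lists no phenotypes" >&2
    status=1
  elif [ "$n" -gt 15 ]; then
    # Limit taken from the NB note; contact the service desk first
    echo "WARNING: $n phenotypes listed (more than 15)" >&2
  fi

  return "$status"
}
```

For example, "printf 'BMI\n' > phenolist_BMI.txt" creates a one-phenotype file, and "check_phenolist phenolist_BMI.txt && echo OK" verifies it before you reference it in your JSON.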

An example phenolist file is available in FinnGen Sandbox's shared folder: LIBRARY_SHARED/regenie_input_file_quantitative_analysis/20220127_101913/files/vishal/phenolist_regenie_BMI.txt.

Bgenlist file

The bgenlist file is a text file containing the paths to the SiSU .bgen files, one path per line. An example showing how to list the input path(s) is also available in the Sandbox shared folder: LIBRARY_SHARED/regenie_input_file_quantitative_analysis/20220127_102131/files/vishal/r8_22_bgenlist.txt.
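If your .bgen files sit in one directory, the list can be generated rather than typed. This is a minimal sketch, assuming bash with GNU find and sort; "make_bgenlist" is a hypothetical helper, and the directory you pass it is a placeholder, so compare the result against the example bgenlist file above. Only files ending in .bgen are listed, so .bgen.bgi index files are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make_bgenlist BGEN_DIR OUT_FILE
# Writes the absolute path of every *.bgen file directly inside BGEN_DIR
# to OUT_FILE, one path per line, in version order (chr2 before chr10).
make_bgenlist() {
  local dir="$1" out="$2"
  # Resolve BGEN_DIR to an absolute path; -maxdepth 1 skips subdirectories;
  # '*.bgen' does not match index files such as *.bgen.bgi
  find "$(cd "$dir" && pwd)" -maxdepth 1 -type f -name '*.bgen' \
    | sort -V > "$out"
  # Return non-zero if no .bgen files were found (empty OUT_FILE)
  [ -s "$out" ]
}
```

Usage: "make_bgenlist /path/to/your/bgen/dir my_bgenlist.txt", then inspect the file with "cat -A my_bgenlist.txt" as for the phenolist.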

JSON input file

Create the JSON input file for Cromwell. The JSON file for a quantitative GWAS resembles the JSON file here, but for a quantitative GWAS you set regenie.is_binary to false. An example quantitative GWAS JSON input file for R8 is available here: /finngen/library-green/scripts/regenie/regenie_example_R8_quanti_BMI.json
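If you start from an existing binary-trait JSON, the one required switch can be made with sed. This is a minimal sketch, assuming bash and that the "regenie.is_binary" key and its value sit on the same line (the usual JSON formatting); "make_quantitative_json" is a hypothetical helper. For arbitrarily formatted JSON, a JSON-aware tool such as jq is safer. The helper fails if the output does not end up with the key set to false.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make_quantitative_json IN_JSON OUT_JSON
# Copies a REGENIE input JSON, changing "regenie.is_binary": true -> false.
# Assumes the key and its value are on the same line.
make_quantitative_json() {
  local in="$1" out="$2"
  # \1 keeps the key, colon and original spacing; only the value changes
  sed -E 's/("regenie\.is_binary"[[:space:]]*:[[:space:]]*)true/\1false/' \
    "$in" > "$out"
  # Fail loudly if the key is missing or was not set to false
  grep -qE '"regenie\.is_binary"[[:space:]]*:[[:space:]]*false' "$out"
}
```

Usage: "make_quantitative_json my_binary_regenie.json my_quanti_regenie.json". The remaining keys (phenolist, bgenlist, covariate + phenotype file) still have to point at your own files.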

Covariate + phenotype file

You may use some or all of the default covariates, or add new ones. If you would like to add a covariate to the REGENIE run, please follow the instructions on how to make a covariate + phenotype file for the GWAS pipeline.

Running REGENIE

You can run REGENIE using the Pipelines tool or by submitting a pipeline from the command line, as shown below.

In Sandbox's Terminal Emulator, type:

finngen-cli rw -w /path/to/your/regenie.wdl \
    -i /path/to/your/regenie.json \
    -d /path/to/your/regenie_sub_wdl.zip

Example:

finngen-cli rw -w /finngen/library-green/scripts/regenie/regenie.wdl \
    -i /finngen/library-green/scripts/regenie/regenie_example_R8_quanti_BMI.json \
    -d /finngen/library-green/scripts/regenie/regenie_sub_wdl.zip

To monitor the progress of your run in the Pipelines tool, it's a good idea to save your job's ID and name. The job ID is a long string of letters and numbers that appears on the Recent page of the Pipelines tool. The job name is the workflow name declared in your WDL file (after the keyword "workflow"); the default job name is "regenie". See also Tips on how to find a pipeline job ID.
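To look up the job name without opening the WDL, you can read it from the "workflow" declaration. This is a minimal sketch, assuming bash and sed, and that the declaration starts its own line as in "workflow regenie {"; "wdl_workflow_name" is a hypothetical helper. If a WDL contained more than one such line, only the first is printed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wdl_workflow_name WDL_FILE
# Prints the name from the first line of the form `workflow <name> ...`.
wdl_workflow_name() {
  # Leading whitespace allowed; the keyword must be followed by whitespace,
  # so identifiers like `workflow_name` are not matched
  sed -nE 's/^[[:space:]]*workflow[[:space:]]+([A-Za-z_][A-Za-z0-9_]*).*/\1/p' "$1" \
    | head -n 1
}
```

Usage: "wdl_workflow_name /finngen/library-green/scripts/regenie/regenie.wdl"; for the example WDL this should print the default job name described above.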

When your job changes to the 'Succeeded' state, you can view your results at /finngen/pipeline/cromwell/workflows/regenie/ID, where ID is your job ID.
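Once the job has succeeded, the output files can be listed from the Terminal. This is a minimal sketch, assuming bash and the results location given above; "list_regenie_results" is a hypothetical helper, and the optional second argument lets you point it at a different root directory (for example, if your workflow has a different name).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list_regenie_results JOB_ID [ROOT]
# Lists all files under ROOT/JOB_ID, sorted. ROOT defaults to the
# results location for the "regenie" workflow given in the text.
list_regenie_results() {
  local id="$1"
  local root="${2:-/finngen/pipeline/cromwell/workflows/regenie}"
  local dir="$root/$id"
  if [ ! -d "$dir" ]; then
    echo "No results directory at $dir (has the job succeeded?)" >&2
    return 1
  fi
  find "$dir" -type f | sort
}
```

Usage: "list_regenie_results YOUR_JOB_ID | less", replacing YOUR_JOB_ID with the ID saved from the Pipelines tool.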

These example results for BMI are copied to LIBRARY_SHARED/regenie_input_file_quantitative_analysis/20220127_102622/files/vishal/bmi for easy viewing.
