Custom GWAS CLI Binary mode

From Sandbox v10.2 onwards, the Custom GWAS CLI is available in both binary mode and quantitative mode for the REGENIE pipeline. Binary mode can be run using an additive, recessive, or dominant analysis. The case cohort and the control cohort must each contain at least 20 individuals. Custom GWAS runs with fewer than 20 cases or fewer than 20 controls will fail.

Note: Please be mindful of how many runs you generate with this method. If you are going to launch more than 5 GWASs at the same time, please contact finngen-servicedesk@helsinki.fi so that we can temporarily increase the resources of your organization's Sandbox and scale them down afterward. Submitting too many GWASs with default settings can make your organization's pipeline unusable for others.

For instructions on how to use the Custom GWAS CLI, enter the following command in a terminal:

finngen-cli request-gwas --help

Tip: Custom GWAS in Binary mode with an additive model can also be launched using the custom GWAS module in the Cohort Operations tool.

Running Custom GWAS in Binary mode

There are three ways to run a GWAS from the command line in binary mode:

Option 1:

You can use Atlas identifiers (IDs) for the case and control cohorts. See below how to check your case and control cohort IDs.

Note! The case and control cohorts must be generated in Atlas for the same data release (R7, R8, R9, R10, etc.) that is used in the Custom GWAS CLI tool. If the cohorts have not been generated before the GWAS run is requested, the custom GWAS pipeline job will fail.

Example command when using cohorts created in Atlas as input for the Custom GWAS CLI tool:

finngen-cli request-gwas \
--phenotype-name testPhenotype \
--analysis-description test_run \
--analysistype recessive \
--case-cohort-id 123 \
--control-cohort-id 124 \
--num-cases 16938 \
--num-controls 254085 \
--notification-email my.email@email.com \
--release finngen_R7

Set the analysis type (--analysistype) to additive, recessive, or dominant depending on the model you would like to use (the example above uses the recessive model). If you do not set an analysis type, the Custom GWAS CLI runs under the additive model (default).

Option 2:

You can make your own text files for cases and controls using R, a text editor, or any other tool you like. There should be one file for cases and another for controls. Each file must contain only FinnGen IDs, one per line, with no column headings. (A "1" or "0" status column is not needed either, because you already specify which file holds the cases and which the controls.) Anything other than FG IDs in a file will cause read errors.

If you have a file with multiple columns (say from the genotype browser) you can create a file with just one column by using a command such as this one:

cat myfile | awk '{print $1}' > mynewfile

You will also need to remove the column headers if there are any.
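The two steps above (keeping only the ID column and dropping the header row) can be combined into one command. This is a minimal sketch, assuming the IDs are in the first whitespace-separated column and the file has exactly one header line; the function name extract_ids is only illustrative and is not part of finngen-cli.

```shell
# Hypothetical helper: print the first column of a file, skipping one header line.
# Assumes the FinnGen IDs are in column 1 and that line 1 is a header.
# Blank lines are dropped so that they do not end up in the output.
extract_ids() {
  awk 'NR > 1 && NF > 0 { print $1 }' "$1"
}

# Usage: extract_ids myfile > mynewfile
```

If your file has no header line, use the plain awk command shown earlier instead, otherwise the first ID would be lost.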

The correct format for the case and control files is as follows:

FG00000001
FG00000002
FG00000003
FG00000004
FG00000005
FG00000006
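Because a malformed file or a cohort of fewer than 20 individuals makes the run fail, it can be worth checking each file before submitting. The following is a rough sketch of such a check, not an official tool: the function name check_cohort_file is hypothetical, and the test that every ID starts with "FG" is an assumption based on the example IDs above.

```shell
# Hypothetical pre-flight check for a case or control file.
# Assumptions: one ID per line, and IDs start with "FG" as in the example above.
check_cohort_file() {
  file="$1"
  # Count lines that are empty, have more than one column, or do not start with FG.
  bad=$(awk 'NF != 1 || $1 !~ /^FG/ { n++ } END { print n + 0 }' "$file")
  # Count all lines (awk also counts a last line without a trailing newline).
  total=$(awk 'END { print NR }' "$file")
  if [ "$bad" -gt 0 ]; then
    echo "$file: $bad line(s) are not a single FG ID" >&2
    return 1
  fi
  if [ "$total" -lt 20 ]; then
    echo "$file: only $total ID(s), at least 20 are required" >&2
    return 1
  fi
  echo "$file: OK ($total IDs)"
}

# Usage: check_cohort_file /path/to/cases.txt && check_cohort_file /path/to/controls.txt
```

Note that the final number of cases and controls used in the GWAS may be smaller than the number of IDs in the file, so a file that passes this check is not guaranteed to meet the minimum.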

Example command when using case and control text files as input for the Custom GWAS CLI tool:

finngen-cli request-gwas \
--phenotype-name testPhenotype \
--analysis-description test_run \
--binary true \
--analysistype dominant \
--casefile /path/to/cases.txt \
--controlfile /path/to/controls.txt \
--notification-email my.email@email.com \
--release finngen_R10

Set the analysis type (--analysistype) to additive, recessive, or dominant depending on the model you would like to use (the example above uses the dominant model). If you do not set an analysis type, the Custom GWAS CLI runs under the additive model (default).

Option 3:

Alternatively, you can provide cases and controls in a single text file, a 'phenofile', such as that used by plink or SAIGE. You can create it using R, a text editor, or any other tool you like.

If you have a file with multiple columns (say from the genotype browser) you can create a file with just one column by using a command such as this one:

cat myfile | awk '{print $1}' > mynewfile

A phenofile should have two tab-separated columns. The first column contains FinnGen IDs and the second column contains 1 for cases and 0 for controls. Column headings are expected.

An example phenofile is as follows:

FID	CASECONTROL
FG00000001	1
FG00000002	0
FG00000003	0
FG00000004	1
FG00000005	1
FG00000006	0
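If you already have separate case and control ID files, a phenofile in this layout can be assembled from them. The sketch below assumes each input file has one ID per line and no header. The function name make_phenofile is hypothetical, and the FID header is taken from the example above.

```shell
# Hypothetical helper: build a two-column, tab-separated phenofile on stdout.
#   $1 = file with case IDs, $2 = file with control IDs, $3 = phenotype column name
# Assumes one ID per line and no header in either input file.
make_phenofile() {
  cases="$1"
  controls="$2"
  name="$3"
  # Header line: ID column followed by the phenotype column.
  printf 'FID\t%s\n' "$name"
  # Cases are coded 1, controls 0. Blank lines are skipped.
  awk -v OFS='\t' 'NF > 0 { print $1, 1 }' "$cases"
  awk -v OFS='\t' 'NF > 0 { print $1, 0 }' "$controls"
}

# Usage: make_phenofile cases.txt controls.txt CASECONTROL > phenofile.tsv
```

The output lists all cases first and then all controls. The example above mixes them, and this sketch assumes the row order does not matter.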

Example command when using a phenofile as input for the Custom GWAS CLI tool:

finngen-cli request-gwas \
--phenotype-name CASECONTROL \
--analysis-description test_run \
--analysistype additive \
--phenofile /path/to/phenofile.tsv \
--notification-email my.email@email.com \
--release finngen_R9

Set the analysis type (--analysistype) to additive, recessive, or dominant depending on the model you would like to use (the example above uses the additive model). If you do not set an analysis type, the Custom GWAS CLI runs under the additive model (default).

Note: when using a phenofile, the --phenotype-name value must match the header of the phenotype column (in the example above, "CASECONTROL" appears both in the column header of the file and in the command used to launch the GWAS).
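A quick way to confirm the match before submitting is to compare the second header field of the phenofile with the name you intend to pass. This is only a sketch: the function name phenofile_has_phenotype is hypothetical, and it assumes a tab-separated file with the phenotype in the second column, as described above.

```shell
# Hypothetical check: does the second header column of the phenofile ($1)
# equal the phenotype name ($2)? Assumes a tab-separated file with a header line.
phenofile_has_phenotype() {
  header=$(head -n 1 "$1" | cut -f 2)
  [ "$header" = "$2" ]
}

# Usage:
#   phenofile_has_phenotype /path/to/phenofile.tsv CASECONTROL \
#     || echo "phenotype name does not match the column header" >&2
```

The comparison is case-sensitive, so "casecontrol" would not match a "CASECONTROL" header.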

When GWAS-CLI is successfully submitted you should see the following text

You can check the status of your run from the custom GWAS tool RECENTS menu

Custom GWAS result tables

Custom GWAS result tables can be viewed in a web browser outside the Sandbox. For user results generated since the release of DF11 (May 2023), go to https://userresults.finngen.fi. For older user results, go to https://userresults-old.finngen.fi/.

Summary files and plots for custom GWAS runs are also available in the green library at https://console.cloud.google.com/storage/browser/finngen-production-library-green/finngen_R<no>/sandbox_custom_gwas/<phenotype_name>, where <no> is the data freeze number and <phenotype_name> is the name the user gave for the phenotype with --phenotype-name (e.g. CASECONTROL in the phenofile example above). From Sandbox v11.0 onwards, the metadata file is exported to the green library together with the summary data. The final number of cases and controls in the GWAS run can be checked from the metadata file.
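To avoid typing the green library address by hand, the URL pattern given above can be filled in from the data freeze number and the phenotype name. This is a small sketch. The function name green_library_url is hypothetical, and the pattern is copied from the address above.

```shell
# Hypothetical helper: print the green library URL for a custom GWAS run.
#   $1 = data freeze number (e.g. 9), $2 = phenotype name given with --phenotype-name
green_library_url() {
  printf 'https://console.cloud.google.com/storage/browser/finngen-production-library-green/finngen_R%s/sandbox_custom_gwas/%s\n' "$1" "$2"
}

# Usage: green_library_url 9 CASECONTROL
```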

All custom GWAS result tables are also saved in the Cromwell jobs directory in the Sandbox, at /finngen/pipeline/cromwell/workflows/[workflow_name]/[workflow_ID].

Each run also appears as a job in the pipelines tool. Save the pipeline ID of your run from the front page of the pipelines tool. This is useful if you later wish to run, for example, a finemapping pipeline on those specific custom GWAS results.

See also:

Tips on how to find a pipeline job ID.
