Custom GWAS CLI Binary mode
From Sandbox v10.2 onwards, the Custom GWAS CLI is available in binary mode and in quantitative mode for the REGENIE pipeline. Binary mode can be run using an additive, recessive, or dominant analysis. The case cohort and the control cohort must each contain at least 20 individuals. Custom GWAS runs with fewer than 20 cases or fewer than 20 controls will fail.
Note: Please be mindful of how many runs you generate while using this method. If you are going to launch more than 5 GWAS runs at the same time, please contact finngen-servicedesk@helsinki.fi and we can temporarily increase the resources of your organization's Sandbox and scale them back down afterward. Submitting too many GWAS runs with default settings can make your organization's pipeline unusable for others.
For instructions on how to use Custom GWAS CLI enter the following command in a terminal:
finngen-cli request-gwas --help
Tip: Custom GWAS in Binary mode with an additive model can also be launched using the custom GWAS module in the Cohort Operations tool.
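We offer the following ways to run a GWAS from the command line in binary mode: using Atlas cohort IDs, using separate text files for cases and controls, or using a phenofile.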
You can use Atlas identifiers (IDs) for the case and control cohorts. See below how to check your case and control cohort IDs.
Note! The case and control cohorts must be generated in Atlas for the data release (R7, R8, R9, R10, etc.) used in the Custom GWAS CLI tool. If they have not been generated before the GWAS run is requested, the custom GWAS pipeline job will fail.
Example code when using cohorts created in Atlas as input for the Custom GWAS CLI tool:
Set the analysis type (--analysistype) to additive, recessive, or dominant depending on the model you would like to use (in the example above, the recessive model is used). If you don't select anything, the Custom GWAS CLI will run under the additive model (default).
You can make your own text files for cases and controls using R, a text editor, or any other tool you like. There should be one file for cases and another for controls, each containing one FinnGen ID per line. No column headings are allowed. (There is also no "1" or "0" column, as this is redundant when you specify which file holds cases and which holds controls. Having anything in the file other than FG IDs will cause read errors.)
If you have a file with multiple columns (say from the genotype browser), you can create a file with just one column by using a command such as this one:
cat myfile | awk '{print $1}' > mynewfile
You will also need to remove the column headers if there are any.
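The two steps above (keeping only the first column and dropping a header row) can be combined in one awk call. This is a minimal sketch, assuming a tab-separated input with exactly one header row. The file names, column names, and sample rows are made up for illustration.

```shell
# Hypothetical multi-column input with one header row (illustration only).
workdir=$(mktemp -d)
printf 'FINNGENID\tCOL2\tCOL3\n' > "$workdir/myfile"
printf 'FG00000001\ta\tb\n' >> "$workdir/myfile"
printf 'FG00000002\tc\td\n' >> "$workdir/myfile"
printf 'FG00000003\te\tf\n' >> "$workdir/myfile"

# NR > 1 skips the header row; $1 is the first (ID) column.
awk -F'\t' 'NR > 1 {print $1}' "$workdir/myfile" > "$workdir/mynewfile"

cat "$workdir/mynewfile"
```

If your input has no header row, drop the NR > 1 condition so that the first ID is not lost.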
The correct format for case and control files is as follows (the use of FINNGENID as a header is also allowed):
FG00000001
FG00000002
FG00000003
FG00000004
FG00000005
FG00000006
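Before submitting, it can help to check a cohort file against the rules described on this page: one ID per line and at least 20 IDs. The sketch below is not part of the Custom GWAS CLI. The file name and the generated IDs are hypothetical, and the threshold of 20 is taken from the minimum cohort size stated at the top of this page.

```shell
workdir=$(mktemp -d)
# Hypothetical cases file with 25 made-up IDs in the format shown above.
seq 1 25 | awk '{printf "FG%08d\n", $1}' > "$workdir/cases.txt"

min_count=20  # minimum cohort size stated for Custom GWAS runs

# Count non-empty lines, and lines that hold more than one field.
n_ids=$(awk 'NF > 0 {n++} END {print n + 0}' "$workdir/cases.txt")
n_bad=$(awk 'NF > 1 {n++} END {print n + 0}' "$workdir/cases.txt")

if [ "$n_ids" -lt "$min_count" ]; then
  echo "too few IDs: $n_ids (need at least $min_count)"
elif [ "$n_bad" -gt 0 ]; then
  echo "$n_bad line(s) contain more than one column"
else
  echo "ok: $n_ids IDs, one per line"
fi
```

The same check can be repeated for the controls file by changing the file name.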
Example code when using cases and controls in text files as input for the Custom GWAS CLI tool:
Set the analysis type (--analysistype) to additive, recessive, or dominant depending on the model you would like to use (in the example above, the dominant model is used). If you don't select anything, the Custom GWAS CLI will run under the additive model (default).
You can also make a single text file, a 'phenofile', such as that used by plink or SAIGE, using R, a text editor, or any other tool you like.
If you have a file with multiple columns (say from the genotype browser) you can create a file with just one column by using a command such as this one:
cat myfile | awk '{print $1}' > mynewfile
A phenofile should have two tab-separated columns. The first column contains FinnGen IDs and the second column contains 1 for cases and 0 for controls. Column headings are expected.
An example phenofile is shown below (FID and FINNGENID are both acceptable as the ID column header; lowercase letters in the phenotype name are converted to uppercase by the client):
FID | CASECONTROL |
---|---|
FG00000001 | 1 |
FG00000002 | 0 |
FG00000003 | 0 |
FG00000004 | 1 |
FG00000005 | 1 |
FG00000006 | 0 |
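If you already have separate case and control ID lists, a phenofile in the layout above can be assembled from them. This is a minimal sketch using standard shell tools, not an official FinnGen utility. The file names are hypothetical, and the six sample IDs only mirror the table above (a real run needs at least 20 cases and 20 controls).

```shell
workdir=$(mktemp -d)
# Made-up ID lists matching the example table (illustration only).
printf 'FG00000001\nFG00000004\nFG00000005\n' > "$workdir/cases.txt"
printf 'FG00000002\nFG00000003\nFG00000006\n' > "$workdir/controls.txt"

# Header row, then cases labelled 1 and controls labelled 0, tab-separated.
{
  printf 'FID\tCASECONTROL\n'
  awk -v OFS='\t' '{print $1, 1}' "$workdir/cases.txt"
  awk -v OFS='\t' '{print $1, 0}' "$workdir/controls.txt"
} > "$workdir/phenofile.tsv"

# Tally cases (1) and controls (0), skipping the header row.
tally=$(awk -F'\t' 'NR > 1 {n[$2]++} END {printf "cases=%d controls=%d", n[1], n[0]}' "$workdir/phenofile.tsv")
echo "$tally"
```

The tally at the end is a quick way to confirm that both groups reach the minimum count before the run is requested.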
Example code when using a phenofile as input for the Custom GWAS CLI tool:
Set the analysis type (--analysistype) to additive, recessive, or dominant depending on the model you would like to use (in the example above, the additive model is used). If you don't select anything, the Custom GWAS CLI will run under the additive model (default).
Note: when using a phenofile, the --phenotype-name value must match the phenotype column header (as above, see "casecontrol" in the column header of the file and in the command used to launch the GWAS).
When the Custom GWAS CLI request is successfully submitted, you should see the following text:
You can check the status of your run from the RECENTS menu of the custom GWAS tool.
In a web browser outside the Sandbox, user results generated since the release of DF11 (May 2023) are available at https://userresults.finngen.fi. Older user results are available at https://userresults-old.finngen.fi/.
Summary files and plots for custom GWAS runs are also available in the green library at https://console.cloud.google.com/storage/browser/finngen-production-library-green/finngen_R<no>/sandbox_custom_gwas/<phenotype_name>, where <no> is the data freeze number and <phenotype_name> is the name the user gives the phenotype with --phenotype-name (e.g. CASECONTROL in the examples above). From Sandbox v11.0 onwards, the metadata file is exported to the green library together with the summary data. The final number of cases and controls in the GWAS run can be checked from the metadata file.
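To make the placeholder substitution concrete, the green library address can be assembled from the two values. This is only an illustration: the data freeze number 10 and the phenotype name CASECONTROL are example values, and the variable names are hypothetical.

```shell
release_no=10               # example data freeze number (replaces <no>)
phenotype_name=CASECONTROL  # example value given with --phenotype-name

base="https://console.cloud.google.com/storage/browser/finngen-production-library-green"
url="$base/finngen_R${release_no}/sandbox_custom_gwas/${phenotype_name}"

echo "$url"
```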
All custom GWAS result tables are also saved in the Cromwell jobs directory in the Sandbox: /finngen/pipeline/cromwell/workflows/[workflow_name]/[workflow_ID].
The run will also appear as a job in the pipelines tool. Save the pipeline ID of your run from the pipelines tool front page. This is useful if you later wish to run, for example, a finemapping pipeline on those specific custom GWAS results.