FinnGen exome query tool
Introduction
The FinnGen exome query tool is a command line interface that lets users query carriers of variants in FinnGen exome data (link).
Two types of queries are available:
search for carriers of a single variant
search for coding variants in a gene
To run a query, open a terminal and execute: /finngen/shared_nfs/finngen/exome_query/run_query
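You can create an alias in your /home/ivm/.bashrc.
Open the file with a text editor and add an alias named exome that points to the script above, for example: alias exome=/finngen/shared_nfs/finngen/exome_query/run_query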
Type source /home/ivm/.bashrc to activate the alias. You can now invoke the tool with just the command exome.
See the command line instructions with exome -h, and query-type-specific help for a single variant query (exome var -h) or a gene query (exome gene -h).
The most important concepts are explained below.
Query types
Search for a single variant
The command var is used to query a single variant. For help on this query mode, type exome var -h.
Basic command: exome var `variant_id`
`variant_id` needs to be in the format chr:pos:ref:alt in the GRCh38 reference build, for example 2:1503826:G:T. The tool prints a summary of the variant's carriers and their genotype quality control to the screen.
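Checking the identifier before running a query can catch formatting mistakes early. The following is a minimal sketch, not part of the tool: the function name parse_variant_id is hypothetical, and the accepted chromosome names (1-22, X, Y, MT) and allele alphabet (A, C, G, T) are assumptions that may differ from what the tool accepts.

```python
import re

# Pattern for chr:pos:ref:alt. The chromosome names (1-22, X, Y, MT) and the
# A/C/G/T allele alphabet are assumptions, not taken from the tool itself.
VARIANT_ID = re.compile(
    r"(?:[1-9]|1[0-9]|2[0-2]|X|Y|MT):[1-9][0-9]*:[ACGT]+:[ACGT]+"
)


def parse_variant_id(variant_id):
    """Split a chr:pos:ref:alt string into its parts, or raise ValueError."""
    if not VARIANT_ID.fullmatch(variant_id):
        raise ValueError(f"not in chr:pos:ref:alt format: {variant_id!r}")
    chrom, pos, ref, alt = variant_id.split(":")
    return chrom, int(pos), ref, alt


# The example identifier from the text splits into four parts.
print(parse_variant_id("2:1503826:G:T"))
```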
Search for variants in a gene
The command gene is used to query variants within a specific gene. For help on this query mode, type exome gene -h.
Basic command: exome gene `gene_name`
`gene_name` is the name of the gene in HGNC format, for example BRCA1. The tool outputs a list of the variants within the specified gene along with their details.
You can request variants with certain functional consequences by giving a comma-separated list of consequences, e.g. --consequences missense_variant,stop_gained.
You can use shorthands for selecting all coding variants (--coding_variants or -coding) or protein truncating variants (--PTV or -P). Coding and PTV consequences used are:
Common options
Genotype filtering
Genotypes can be filtered with a statement in Python-compatible syntax:
--gt_filt 'python statement resolving to boolean'
Genotypes that do not pass the filter are set to missing. The statement can use all fields that are declared in the VCF header (GT, DP, GQ) or computed on the fly (AB, pAB).
Example: require every genotype to have more than 10 reads supporting it (DP>10). If the genotype is heterozygous, also require GQ>20 and a p-value above 0.05 for the allelic balance deviating from 50/50. If the genotype is homozygous reference, also require a genotype quality above 30:
--gt_filt 'DP>10 and ( (GT=="0/1" and GQ>20 and pAB>0.05) or (GT=="0/0" and GQ>30) )'
AB and pAB are fields computed on the fly: the allelic balance (AB) and the probability of deviating from a 50/50 balance (pAB).
For information on sequencing genotype quality metrics, see e.g. the "Interpreting genotype and other sample-level information" section of https://gatk.broadinstitute.org/hc/en-us/articles/360035531692-VCF-Variant-Call-Format
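To make the filter semantics concrete, the sketch below evaluates a filter statement against one genotype. It is an illustration only and not the tool's implementation. The assumptions: AB is taken as the fraction of reads supporting the alternate allele, pAB as a two-sided exact binomial p-value against a 50/50 split, and the per-allele read counts ref_reads and alt_reads are hypothetical inputs. All function names are invented for the example.

```python
from math import comb


def allelic_balance(ref_reads, alt_reads):
    """AB, assumed here to be the fraction of reads supporting the alt allele."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else 0.0


def p_allelic_balance(ref_reads, alt_reads):
    """pAB, assumed here to be a two-sided exact binomial p-value vs 50/50."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    # With p = 0.5 each outcome k has probability C(n, k) / 2**n, so we sum
    # every outcome that is no more likely than the observed one.
    observed = comb(n, alt_reads)
    extreme = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return extreme / 2 ** n


def passes_filter(statement, GT, DP, GQ, ref_reads, alt_reads):
    """Evaluate a filter statement for one genotype; False means 'set to missing'."""
    fields = {
        "GT": GT,
        "DP": DP,
        "GQ": GQ,
        "AB": allelic_balance(ref_reads, alt_reads),
        "pAB": p_allelic_balance(ref_reads, alt_reads),
    }
    # eval is used only for illustration; never evaluate untrusted input this way.
    return bool(eval(statement, {"__builtins__": {}}, fields))


statement = 'DP>10 and ( (GT=="0/1" and GQ>20 and pAB>0.05) or (GT=="0/0" and GQ>30) )'
print(passes_filter(statement, "0/1", 30, 99, 14, 16))  # balanced heterozygote
print(passes_filter(statement, "0/1", 30, 99, 27, 3))   # skewed heterozygote
print(passes_filter(statement, "0/0", 8, 99, 8, 0))     # too few reads
```

In this sketch the balanced heterozygote passes, while the skewed heterozygote and the low-depth genotype fail and would be treated as missing.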
Output options
--export_carriers /out/carriers.tsv
Exports carrier IDs with genotype information to the specified file. Note that the path must start with /out/; the output is written to your home directory in /home/ivm/.
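The /out/ rule can be sketched as a small path mapping. This is an assumption about how the redirection behaves (the part after /out/ is taken to be appended to /home/ivm/), and the helper name home_path_for is hypothetical.

```python
from pathlib import PurePosixPath


def home_path_for(out_path, home="/home/ivm"):
    """Map an /out/... argument to where the file is assumed to appear."""
    path = PurePosixPath(out_path)
    # The argument must be /out/ followed by at least a file name.
    if path.parts[:2] != ("/", "out") or len(path.parts) < 3:
        raise ValueError(f"export path must start with /out/: {out_path!r}")
    # Assumption: everything after /out/ is placed under the home directory.
    return str(PurePosixPath(home).joinpath(*path.parts[2:]))


print(home_path_for("/out/carriers.tsv"))
```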
--export_case_control /out/case_control.tsv
Exports a cohort of carriers and non-carriers of the non-reference alleles of the variants. The column "COHORT" contains either CARRIERS or NON_CARRIERS for easy import into other Sandbox tools (e.g. Cohort Operations).
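Before importing the file into another tool, it can be useful to check how many individuals fall into each group. The sketch below counts rows per COHORT value. It assumes the file is tab-separated with a header row. Only the COHORT column is documented above, so the "ID" column and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_cohorts(handle):
    """Count rows per COHORT value in a tab-separated case/control export."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["COHORT"] for row in reader)


# Made-up rows; "ID" is a hypothetical column name, only COHORT is documented.
example = io.StringIO(
    "ID\tCOHORT\n"
    "sample_1\tCARRIERS\n"
    "sample_2\tNON_CARRIERS\n"
    "sample_3\tNON_CARRIERS\n"
)
print(count_cohorts(example))
```

For a real export, the same function can be given an open file handle, for example one opened with open(path, newline="") on the exported file in your home directory.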
Phenotypic consequences of variant carriers
The query tool does not currently contain association analyses. Instead, you are encouraged to import the generated carrier files into Sandbox tools such as Cohort Operations or LifeTrack.