FinnGen exome query tool

Introduction

The FinnGen exome query tool is a command-line interface for querying carriers of variants in FinnGen exome data (link).

Two types of queries are available:

  • search for carriers of a single variant

  • search for coding variants in a gene

To run a query, open a terminal and execute /finngen/shared_nfs/finngen/exome_query/run_query. To avoid typing the full path each time, you can create an alias in /home/ivm/.bashrc. Open the file with a text editor and add the following:

exome () {
    /finngen/shared_nfs/finngen/exome_query/run_query "$@"
}

Run source /home/ivm/.bashrc to activate the alias. You can then invoke the tool with just the command exome.

See the command-line instructions with exome -h, and the help specific to each query type with exome var -h (single variant query) or exome gene -h (gene query). The most important concepts are explained below.

Query types

Search for a single variant

The var command queries a single variant. For help on all options of this query mode, type exome var -h.

Basic command: exome var `variant_id`

`variant_id` must be in the format chr:pos:ref:alt on the GRCh38 reference build, for example 2:1503826:G:T. The tool prints a summary of the variant's carriers and their genotype quality control metrics to the screen.
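
As a rough illustration of the ID format, the following Python sketch splits and checks an ID before it is passed to the tool. The function name parse_variant_id and the validation rules (chromosomes 1-22, X, Y or MT; alleles consisting of A, C, G, T) are assumptions made for this example and may differ from the checks the tool itself performs.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# Assumed rules (not taken from the tool): chromosome 1-22, X, Y or MT,
# a positive integer position, and alleles made of A/C/G/T.
_VARIANT_RE = re.compile(
    r"^([1-9]|1[0-9]|2[0-2]|X|Y|MT):([1-9][0-9]*):([ACGT]+):([ACGT]+)$"
)


def parse_variant_id(variant_id: str) -> Variant:
    """Split a chr:pos:ref:alt string into its four parts (hypothetical helper)."""
    match = _VARIANT_RE.match(variant_id.strip())
    if match is None:
        raise ValueError(f"not in chr:pos:ref:alt format: {variant_id!r}")
    chrom, pos, ref, alt = match.groups()
    return Variant(chrom, int(pos), ref, alt)


print(parse_variant_id("2:1503826:G:T"))
```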

Search for variants in a gene

The gene command queries variants within a specific gene. For help on all options of this query mode, type exome gene -h.

Basic command: exome gene `gene_name`

`gene_name` is the name of the gene in HGNC format, for example BRCA1. The tool outputs a list of the variants in the specified gene along with their details.

You can request variants with certain functional consequences by giving a comma-separated list of consequences, e.g. --consequences missense_variant,stop_gained. Shorthands are available for selecting all coding variants (--coding_variants or -coding) or protein-truncating variants (--PTV or -P). The consequences included in each shorthand are:

PTV_CONSEQUENCES = [
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
    "stop_lost",
]

CODING_CONSEQUENCES = [
    "missense_variant",
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
    "stop_lost",
    "inframe_insertion",
    "inframe_deletion",
]
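
To make the relationship between the two lists concrete, the sketch below repeats them and checks which consequences the coding shorthand adds on top of the PTV shorthand. The helper parse_consequences is hypothetical and only illustrates how a comma-separated --consequences value maps to individual terms; it is not the tool's own parser.

```python
PTV_CONSEQUENCES = [
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
    "stop_lost",
]

CODING_CONSEQUENCES = [
    "missense_variant",
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
    "stop_lost",
    "inframe_insertion",
    "inframe_deletion",
]


def parse_consequences(arg):
    """Split a comma-separated consequence string into a list (hypothetical helper)."""
    return [term.strip() for term in arg.split(",") if term.strip()]


# Every PTV consequence is also in the coding list.
assert set(PTV_CONSEQUENCES) <= set(CODING_CONSEQUENCES)

# Consequences selected by the coding shorthand but not by the PTV shorthand.
coding_only = [c for c in CODING_CONSEQUENCES if c not in PTV_CONSEQUENCES]
print(coding_only)  # ['missense_variant', 'inframe_insertion', 'inframe_deletion']

print(parse_consequences("missense_variant,stop_gained"))
```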

Common options

Genotype filtering

Genotypes can be filtered with an expression in Python-compatible syntax.

  • --gt_filt 'Python expression resolving to a boolean' -> genotypes that do not pass are set to missing

  • The expression can use all fields declared in the VCF header (GT, DP, GQ) or computed on the fly (AB, pAB)

  • Example: require every genotype to be supported by more than 10 reads (DP>10). If the genotype is heterozygous, also require GQ>20 and a p-value above 0.05 for allelic balance deviating from 50/50. If the genotype is homozygous reference, also require a genotype quality above 30: --gt_filt 'DP>10 and ( (GT=="0/1" and GQ>20 and pAB>0.05) or (GT=="0/0" and GQ>30) )'

  • AB and pAB are fields computed on the fly: allelic balance (AB) and the p-value for deviation from a 50/50 balance (pAB)

  • For information on sequencing genotype quality metrics, see e.g. the "Interpreting genotype and other sample-level information" section of https://gatk.broadinstitute.org/hc/en-us/articles/360035531692-VCF-Variant-Call-Format
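
The following Python sketch shows how a filter expression of this kind behaves on individual genotypes. It is only an illustration under stated assumptions: the tool's actual implementation is not documented here, AB is assumed to be the fraction of alternate-allele reads, and pAB is assumed to be a two-sided exact binomial p-value against a 50/50 expectation. The function names and the read counts are made up for the example.

```python
from math import comb


def binom_two_sided_p(k, n):
    """Two-sided exact binomial p-value for k of n reads at p = 0.5 (assumed pAB)."""
    if n == 0:
        return 1.0
    observed = comb(n, k)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    total = sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= observed)
    return min(1.0, total / 2 ** n)


def passes_filter(expr, gt, dp, gq, ref_reads, alt_reads):
    """Evaluate a filter expression against one genotype (hypothetical helper)."""
    reads = ref_reads + alt_reads
    fields = {
        "GT": gt,
        "DP": dp,
        "GQ": gq,
        # Assumption: AB is the alternate-allele read fraction.
        "AB": alt_reads / reads if reads else 0.0,
        "pAB": binom_two_sided_p(alt_reads, reads),
    }
    # eval() keeps the sketch short; builtins are disabled and the
    # expression is assumed to come from a trusted user.
    return bool(eval(expr, {"__builtins__": {}}, fields))


FILTER = 'DP>10 and ( (GT=="0/1" and GQ>20 and pAB>0.05) or (GT=="0/0" and GQ>30) )'

print(passes_filter(FILTER, "0/1", 30, 99, 14, 16))  # balanced heterozygote: True
print(passes_filter(FILTER, "0/1", 30, 99, 27, 3))   # skewed heterozygote: False
print(passes_filter(FILTER, "0/0", 8, 40, 8, 0))     # depth too low: False
```

A genotype for which the expression evaluates to False corresponds to one that the tool would set to missing.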

Output options

--export_carriers /out/carriers.tsv

Exports carrier IDs with genotype information to the specified file. Note that the path must start with /out/; the output is written to your home directory, /home/ivm/.

--export_case_control /out/case_control.tsv

Exports a cohort of carriers and non-carriers of the non-reference alleles of the queried variants. The column "COHORT" contains either CARRIERS or NON_CARRIERS for easy import into other Sandbox tools (e.g. Cohort Operations).
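
Before importing the file elsewhere, it can be useful to check the cohort sizes. The sketch below counts the values of the COHORT column, assuming the export is a tab-separated file with a header row (suggested by the .tsv extension but not confirmed here). The ID column name and the three rows are made-up example data, not real output.

```python
import csv
import io
from collections import Counter


def cohort_counts(handle):
    """Count rows per COHORT value in a tab-separated export (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["COHORT"] for row in reader)


# Made-up stand-in for an exported file; only the COHORT column name
# and its two values come from the description above.
example = "ID\tCOHORT\nS1\tCARRIERS\nS2\tNON_CARRIERS\nS3\tNON_CARRIERS\n"

counts = cohort_counts(io.StringIO(example))
print(counts["CARRIERS"], counts["NON_CARRIERS"])  # 1 2
```

With a real export, the same function could be given an open file handle instead, e.g. for /home/ivm/case_control.tsv if the file was exported to /out/case_control.tsv.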

Phenotypic consequences of variant carriers

The query tool does not currently include association analyses. Instead, you are encouraged to import the generated carrier files into Sandbox tools such as Cohort Operations or LifeTrack.
