Colocalizations in FinnGen

Our colocalization approach uses the probabilistic model for integrating GWAS and eQTL data presented in eCAVIAR (Hormozdiari et al. 2016). Unlike eCAVIAR, we fine-map our inputs with SuSiE (Wang et al. 2019), and we provide an additional colocalization metric, CLPA.

Our goal is to extract a list of genomic regions that show colocalization between two phenotypes p1 and p2. We assume that the summary statistics of p1 and p2 have already been fine-mapped. The fine-mapping output for each phenotype contains three columns: the variant identifier (VAR), the posterior inclusion probability (PIP), and the credible set identifier (CS).
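A minimal sketch of turning such a fine-mapping table into one PIP mapping per credible set. It assumes a tab-separated file with a header row naming the VAR, PIP and CS columns; the file layout, the variant identifiers and the function name `load_credible_sets` are hypothetical and serve only as illustration.

```python
import csv
import io
from collections import defaultdict


def load_credible_sets(handle):
    """Group fine-mapping rows into {CS id: {variant: PIP}}.

    Assumes a tab-separated table with a header row containing the
    columns VAR, PIP and CS (hypothetical layout for illustration).
    """
    credible_sets = defaultdict(dict)
    for row in csv.DictReader(handle, delimiter="\t"):
        credible_sets[row["CS"]][row["VAR"]] = float(row["PIP"])
    return dict(credible_sets)


# Made-up input with two credible sets.
example = io.StringIO(
    "VAR\tPIP\tCS\n"
    "v1\t0.70\tcs1\n"
    "v2\t0.25\tcs1\n"
    "v3\t0.95\tcs2\n"
)
print(load_credible_sets(example))
# {'cs1': {'v1': 0.7, 'v2': 0.25}, 'cs2': {'v3': 0.95}}
```

Representing each credible set as a variant-to-PIP mapping makes it easy to find the variants that two credible sets share.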

CLPP

The Causal Posterior Probability (CLPP) is computed between two credible sets cs1 and cs2, where cs1 comes from phenotype p1 and cs2 from phenotype p2. For vectors x and y containing the PIPs of the variants in cs1 and cs2, respectively, CLPP is calculated by

$$\mathrm{CLPP} = \sum_{i \in \mathrm{cs1} \cap \mathrm{cs2}} x_i \, y_i$$

This CLPP calculation is similar to equation 8 in Hormozdiari et al. 2016.

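CLPP depends on credible set size. Because the PIPs within a credible set sum to at most 1, any credible set with more than one variant yields CLPP < 1, even when both phenotypes share exactly the same credible set.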
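A minimal sketch of the CLPP computation, assuming CLPP is the sum, over variants present in both credible sets, of the product of their PIPs, and that each credible set is a mapping from variant identifier to PIP. The function name `clpp` is hypothetical. The second call illustrates the size dependence: two phenotypes that share the same two-variant credible set, with PIPs of 0.5 each, still score only 0.5.

```python
def clpp(cs1: dict[str, float], cs2: dict[str, float]) -> float:
    """Sum of PIP products over variants present in both credible sets."""
    # Variants found in only one credible set contribute nothing.
    shared = cs1.keys() & cs2.keys()
    return sum(cs1[v] * cs2[v] for v in shared)


# Identical single-variant credible sets: perfect colocalization.
print(clpp({"v1": 1.0}, {"v1": 1.0}))  # 1.0

# Identical two-variant credible sets: CLPP drops to 0.5.
print(clpp({"v1": 0.5, "v2": 0.5}, {"v1": 0.5, "v2": 0.5}))  # 0.5
```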

CLPA

We derived another colocalization metric, the causal posterior agreement (CLPA), which is independent of credible set size.

The picture below shows how colocalizations are defined.

Example Comparison

This rough example shows why we mostly use CLPA: unlike CLPP, it is independent of credible set size.
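As a rough illustration with made-up credible sets, the sketch below compares the two metrics as the credible set grows. It assumes CLPP is the sum of PIP products over shared variants and, as an assumption made only for this illustration, that CLPA is the sum over shared variants of the smaller of the two PIPs. Under that assumed form, CLPA reaches 1 whenever both phenotypes assign identical PIPs, however many variants the credible set contains. The function names are hypothetical.

```python
def clpp(cs1: dict[str, float], cs2: dict[str, float]) -> float:
    """CLPP: sum of PIP products over shared variants."""
    return sum(cs1[v] * cs2[v] for v in cs1.keys() & cs2.keys())


def clpa(cs1: dict[str, float], cs2: dict[str, float]) -> float:
    """Assumed CLPA form: sum of the smaller PIP over shared variants."""
    return sum(min(cs1[v], cs2[v]) for v in cs1.keys() & cs2.keys())


# Made-up credible sets: identical PIPs spread evenly over `size` variants.
for size in (1, 2, 4, 10):
    pips = {f"v{i}": 1.0 / size for i in range(size)}
    print(size, round(clpp(pips, pips), 3), round(clpa(pips, pips), 3))
```

For perfectly matching credible sets, CLPP falls as 1/size (1.0, 0.5, 0.25, 0.1), while the assumed CLPA stays at 1.0.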

Data

Colocalization is performed between FinnGen endpoints, as well as between FinnGen endpoints and various QTL resources, as shown in the image below.

These resources are listed below:

FinnGen resources

The FinnGen data are the SuSiE fine-mapping results of the corresponding FinnGen release.

Expression QTL datasets

- GTEx v8: SuSiE fine-mapping, 49 tissues (only tissues with a sample size of n >= 50), donors of mixed ancestry, Aguet et al. (2019, bioRxiv). Fine-mapping was performed by Hilary Finucane, Jacob Ulirsch and Masahiro Kanai from the Finucane Lab. Effect size interpretation: change in normalized gene expression (sd units) per alternate allele. Normalization: inverse normal transformation.

- EMBL-EBI (European Bioinformatics Institute) eQTL Catalogue datasets: eQTL data from 24 tissues/cell types, 16 RNAseq sources and 6 microarray sources, SuSiE fine-mapping, donors of 88% European ancestry, Kerimov et al. (2020, bioRxiv). RNAseq data are quantified with four methods: gene expression, exon expression, transcript usage and txrevise event usage. Fine-mapping was performed by Kaur Alasoo and Nurlan Kerimov. Effect size interpretation: change in normalized gene expression (sd units) per alternate allele. Normalization: inverse normal transformation.

- FUSION study (RNAseq), muscle and adipose tissue.

- Kolberg: mega-analysis of immune cells from the microarray datasets.
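The GTEx and eQTL Catalogue effect sizes refer to expression after an inverse normal transformation. A minimal sketch of a rank-based inverse normal transform follows. The exact rank offset and tie handling used by these resources are not stated here, so the `offset` parameter (0.5 by default, 3/8 for Blom's variant), the average-rank treatment of ties and the function name `inverse_normal_transform` are assumptions.

```python
from statistics import NormalDist


def inverse_normal_transform(values: list[float], offset: float = 0.5) -> list[float]:
    """Rank-based inverse normal transform (assumed variant, see lead-in).

    Ranks are 1-based and tied values receive their average rank. Each rank r
    is mapped to the standard normal quantile of (r - offset) / (n - 2*offset + 1):
    offset=0.5 gives (r - 0.5) / n, offset=3/8 gives Blom's (r - 3/8) / (n + 1/4).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


print(inverse_normal_transform([3.2, 0.1, 5.0]))
```

After this transform each phenotype has a roughly standard normal distribution, so an effect size of 0.2 corresponds to a shift of about 0.2 standard deviations per alternate allele.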

Metabolome QTL datasets

- GeneRISK: 186 lipid species QTLs, SuSiE fine-mapping of Widen et al. (2020), 7632 Finnish samples. Effect size interpretation: change in standard deviation of the lipid species per alternate allele.

Biomarkers

- UK Biobank: 36 continuous endpoints, 57 biomarkers from UKBB prepared by the Finucane lab, 361,194 White British samples, SuSiE fine-mapping. Effect size interpretation for quantitative traits: change in standard deviation of the normalized outcome per alternate allele. Effect size interpretation for binary traits: increase in log(odds ratio) per alternate allele.

Post-colocalization QC

Only unique source1-source2-pheno1-pheno2-tissue2-quant2-locus_id1-locus_id2 combinations were included in the results. FinnGen endpoints with a _COMORB definition were excluded from the results.
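A minimal sketch of these two QC steps on rows represented as dictionaries. It assumes that each row carries the eight key fields under the names listed above (the actual column names are documented in data_dictionary.txt), that "unique" means keeping the first occurrence of each key combination, and that _COMORB endpoints can be recognized by the substring `_COMORB` in pheno1 or pheno2. The function name `post_coloc_qc` is hypothetical.

```python
KEY_COLUMNS = (
    "source1", "source2", "pheno1", "pheno2",
    "tissue2", "quant2", "locus_id1", "locus_id2",
)


def post_coloc_qc(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Drop _COMORB endpoints and keep the first row per key combination."""
    seen: set[tuple[str, ...]] = set()
    kept = []
    for row in rows:
        # Assumption: _COMORB endpoints are identified by their name.
        if any("_COMORB" in row[col] for col in ("pheno1", "pheno2")):
            continue
        key = tuple(row[col] for col in KEY_COLUMNS)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept
```

A set of key tuples keeps the check linear in the number of rows while preserving the original row order.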

Release outputs

The following files are published with each release (the release number in the file names changes between releases):

File                              Description
fg_r6_fg_r6.txt.gz                Colocalization between FinnGen endpoints
fg_r6_gtex.txt.gz                 Colocalization between FinnGen endpoints and GTEx eQTL
fg_r6_generisk.txt.gz             Colocalization between FinnGen endpoints and lipid QTLs from the GeneRISK study
fg_r6_ukbb.txt.gz                 Colocalization between FinnGen endpoints and selected UKBB biomarkers
fg_r6_ebi_eqtl_catalogue.txt.gz   Colocalization between FinnGen endpoints and the EMBL-EBI eQTL Catalogue

Each {source1}_{source2}.txt.gz file contains the colocalizations between the endpoints/QTL phenotypes in the two resources. Each line in the file represents a colocalization of phenotype 1 in source 1 with phenotype 2 in source 2 in a specific genomic region.
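A minimal sketch for loading one of these files into memory. It assumes the files are gzip-compressed, tab-separated tables with a header row; the actual columns are described in data_dictionary.txt, and the function name `read_coloc_file` is hypothetical.

```python
import csv
import gzip


def read_coloc_file(path) -> list[dict[str, str]]:
    """Read a {source1}_{source2}.txt.gz file into a list of row dicts.

    `path` may be a file name or a binary file object. Assumes a
    gzip-compressed, tab-separated table with a header row.
    """
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical usage (the file name depends on the release):
# rows = read_coloc_file("fg_r6_gtex.txt.gz")
# print(len(rows), list(rows[0].keys()))
```

Each returned dictionary corresponds to one colocalization, that is, one line of the file.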

In addition to the datafiles, the following documentation is also included in each release:

Documentation        Description
methods.pdf          Description of the colocalization method and data
data_dictionary.txt  Column descriptions for all data files
readme.md            Colocalization release notes