Colocalizations in FinnGen
Our colocalization approach uses the probabilistic model for integrating GWAS and eQTL data presented in eCAVIAR (Hormozdiari et al. 2016). Unlike eCAVIAR, we use SuSiE (Wang et al. 2019) to fine-map our inputs, and we provide an additional colocalization metric (CLPA).
Our goal is to extract a list of genomic regions that show colocalization between two phenotypes p1 and p2. We assume that the summary statistics of p1 and p2 have already been fine-mapped. The fine-mapping output for each phenotype contains three columns: the variant identifier (VAR), the posterior inclusion probability (PIP), and the credible set identifier (CS).
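To make the assumed input concrete, here is a minimal sketch of how fine-mapping output with the three columns VAR, PIP and CS could be grouped into credible sets. The tab delimiter, the variant identifiers, the PIP values and the function name `credible_sets` are assumptions made for illustration; they are not taken from the FinnGen pipeline.

```python
import csv
import io
from collections import defaultdict


def credible_sets(rows):
    """Group fine-mapped variants by credible set: {CS: {VAR: PIP}}."""
    sets = defaultdict(dict)
    for row in rows:
        sets[row["CS"]][row["VAR"]] = float(row["PIP"])
    return dict(sets)


# Hypothetical fine-mapping output for one phenotype (made-up values).
example = io.StringIO(
    "VAR\tPIP\tCS\n"
    "chr1_1000_A_G\t0.90\t1\n"
    "chr1_1050_C_T\t0.08\t1\n"
    "chr1_9000_G_A\t0.99\t2\n"
)
p1_sets = credible_sets(csv.DictReader(example, delimiter="\t"))
print(sorted(p1_sets))  # credible set identifiers found in the input
```

Each credible set is then a mapping from variant to PIP, which is the shape the colocalization metrics operate on.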
The colocalization posterior probability (CLPP) is computed between two credible sets cs1 and cs2, with cs1 coming from phenotype p1 and cs2 coming from phenotype p2. For vectors x and y containing the PIPs of the variants in cs1 and cs2, respectively, CLPP is calculated as

$$\mathrm{CLPP} = \sum_{i} x_i \, y_i$$

where i indexes the variants.
This CLPP calculation is similar to equation 8 in Hormozdiari et al. 2016.
CLPP depends on the credible set size: by definition, any credible set containing more than one variant yields a CLPP < 1.
We derived another colocalization metric called causal posterior agreement (CLPA) that is independent of credible set size.
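The two metrics can be sketched in a few lines. This sketch assumes that CLPP is the sum over variants of the product of the two PIPs, and that CLPA is the sum over variants of the smaller of the two PIPs; both definitions, and the function names, are assumptions for illustration rather than the released implementation. A variant missing from one credible set contributes zero under either definition, so only shared variants are summed.

```python
def clpp(x, y):
    """Sum of PIP products over variants present in both credible sets."""
    shared = x.keys() & y.keys()
    return sum(x[v] * y[v] for v in shared)


def clpa(x, y):
    """Sum of the smaller PIP over variants present in both credible sets."""
    shared = x.keys() & y.keys()
    return sum(min(x[v], y[v]) for v in shared)


# Hypothetical credible sets ({variant: PIP}) that agree perfectly.
cs1 = {"v1": 0.5, "v2": 0.5}
cs2 = {"v1": 0.5, "v2": 0.5}
print(clpp(cs1, cs2))  # 0.5
print(clpa(cs1, cs2))  # 1.0
```

Under these assumed definitions, two identical two-variant credible sets score 0.5 on CLPP but 1.0 on CLPA.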
The picture below shows how colocalizations are defined.
Example Comparison
This rough example shows why we mostly use CLPA: it is independent of credible set size.
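As an illustrative calculation (a sketch, assuming that CLPP sums the products of the PIPs and CLPA sums the smaller of the two PIPs over shared variants), take two credible sets that contain the same n variants, each with PIP 1/n in both phenotypes:

```latex
\mathrm{CLPP} = \sum_{i=1}^{n} \frac{1}{n} \cdot \frac{1}{n} = \frac{1}{n},
\qquad
\mathrm{CLPA} = \sum_{i=1}^{n} \min\!\left(\frac{1}{n}, \frac{1}{n}\right) = 1 .
```

With n = 1 both metrics equal 1. With n = 10, CLPP falls to 0.1 even though the two credible sets agree perfectly, while CLPA remains 1.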
The colocalization is performed between FinnGen endpoints as well as between FinnGen endpoints and various QTL resources, as shown in the image below.
These resources are listed below:
FinnGen resources
The SuSiE finemapping results for the release were used as the FinnGen data.
Expression QTL datasets
- GTEx v8: SuSiE fine-mapping, 49 tissues, donors of mixed ancestry, Aguet et al. (2019, bioRxiv). The 49 tissues are those with a sample size of n >= 50. Fine-mapping was performed by Hilary Finucane, Jacob Ulirsch, and Masahiro Kanai from the Finucane Lab. Effect size interpretation: change in normalised gene expression (sd units) per alternate allele. Normalization: inverse normal transformation.
- EMBL-EBI (European Bioinformatics Institute) eQTL catalogue datasets: eQTL data from 24 tissues/cell types, 16 RNAseq sources, 6 microarray, SuSiE fine-mapping, donors of 88% European ancestry, Kerimov et al. (2020, bioRxiv). For RNAseq data, four quantification methods are available (gene expression, exon expression, transcript usage, txrevise event usage). Fine-mapping was performed by Kaur Alasoo and Nurlan Kerimov. Effect size interpretation: change in normalised gene expression (sd units) per alternate allele. Normalization: inverse normal transformation.
- FUSION study (RNAseq), muscle and adipose tissue.
- Kolberg: mega-analysis of immune cells from the microarray datasets.
Metabolome QTL datasets
- GeneRISK: 186 lipid species QTLs, SuSiE fine-mapping of Widen et al. (2020), 7632 Finnish samples. Effect size interpretation: change in standard deviation of the lipid species per alternate allele.
Biomarkers
- UK Biobank: 36 continuous endpoints, 57 biomarkers from UKBB prepared by the Finucane lab, 361,194 White British samples, SuSiE fine-mapping. Effect size interpretation for quantitative traits: change in standard deviation of the normalized outcome per alternate allele. Effect size interpretation for binary traits: increase in log(odds ratio) per alternate allele.
Only unique source1-source2-pheno1-pheno2-tissue2-quant2-locus_id1-locus_id2 combinations were included in the results. FinnGen endpoints with _COMORB-definition were left out of the results.
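The filtering described above can be sketched as follows. The key columns are the ones named in the text; treating `_COMORB` as a substring of the `pheno1` or `pheno2` value, keeping the first occurrence of each key, and the example endpoint names are all assumptions made for illustration.

```python
KEY = ("source1", "source2", "pheno1", "pheno2",
       "tissue2", "quant2", "locus_id1", "locus_id2")


def filter_results(rows):
    """Drop _COMORB endpoints and keep the first row for each unique key."""
    seen = set()
    kept = []
    for row in rows:
        # Assumption: the _COMORB definition is visible in the endpoint name.
        if "_COMORB" in row["pheno1"] or "_COMORB" in row["pheno2"]:
            continue
        key = tuple(row[k] for k in KEY)
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


def make_row(pheno1, pheno2):
    """Build a hypothetical result row; all values are made up."""
    base = dict.fromkeys(KEY, "x")
    base.update(pheno1=pheno1, pheno2=pheno2)
    return base


rows = [
    make_row("ENDPOINT_A", "GENE_X"),
    make_row("ENDPOINT_A", "GENE_X"),         # duplicate key, dropped
    make_row("ENDPOINT_B_COMORB", "GENE_X"),  # _COMORB endpoint, dropped
    make_row("ENDPOINT_C", "GENE_Y"),
]
print(len(filter_results(rows)))  # 2
```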
The following files are included in each release (the release number in the file names changes between releases):
| File | Description |
| --- | --- |
| fg_r6_fg_r6.txt.gz | Colocalization between FinnGen endpoints |
| fg_r6_gtex.txt.gz | Colocalization between FinnGen endpoints and GTEx eQTL |
| fg_r6_generisk.txt.gz | Colocalization between FinnGen endpoints and lipid QTLs from GeneRISK study |
| fg_r6_ukbb.txt.gz | Colocalization between FinnGen endpoints and selected UKBB Biomarkers |
| fg_r6_ebi_eqtl_catalogue.txt.gz | Colocalization between FinnGen endpoints and EMBL-EBI eQTL catalogue |
Each {source1}_{source2}.txt.gz file contains the colocalizations between the endpoints/QTL/phenotypes in the two resources. Each line in that file represents a colocalization of phenotype 1 in source 1 with phenotype 2 in source 2 in a specific genomic region.
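A minimal sketch of reading such a file one colocalization at a time is shown below. It assumes the files are gzip-compressed, tab-delimited text with a header row; the delimiter, the column names in the demo file and the function name `read_colocalizations` are assumptions, and data_dictionary.txt remains the authoritative column description.

```python
import csv
import gzip
import os
import tempfile


def read_colocalizations(path):
    """Yield one dict per line, i.e. one colocalization per dict."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


# Demo on a tiny stand-in file; the contents are made up for illustration.
demo_path = os.path.join(tempfile.mkdtemp(), "fg_r6_gtex.txt.gz")
with gzip.open(demo_path, "wt", newline="") as handle:
    handle.write("source1\tsource2\tpheno1\tpheno2\n")
    handle.write("FG\tGTEx\tENDPOINT_A\tGENE_X\n")

rows = list(read_colocalizations(demo_path))
print(len(rows))  # 1
```

Reading lazily with a generator keeps memory use low, which may matter for the larger QTL files.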
In addition to the data files, the following documentation is included in each release:
| Documentation | Description |
| --- | --- |
| methods.pdf | Description of colocalization method and data |
| data_dictionary.txt | Column description of all data files |
| readme.md | Colocalization release notes |
We thank the following people for helping us assemble the QTL resources:
- Kaur Alasoo and Nurlan Kerimov provided us with the fine-mapped EMBL-EBI eQTL catalogue datasets.
- Hilary Finucane, Jacob Ulirsch, and Masahiro Kanai gave us access to their fine-mapped GTEx data.
Read more about colocalization and file format of the FinnGen colocalization results.