How to run colocalization pipeline
Describes how to run the new colocalization pipeline with the coloc SuSiE package
Last updated
This pipeline takes the outputs of our finemapping pipeline and performs colocalization against the 571 resources we have gathered, including all GWAS endpoints from FinnGen, selected UKB endpoints, the eQTL Catalogue, the GeneRISK project, and proteomics studies from INTERVAL, UKB and FinnGen.
Data Source | Data type | Description |
---|---|---|
FinnGen-R12 | GWAS | all endpoints from FinnGen R12 |
GeneRisk | GWAS | GeneRISK Study is an ongoing prospective observational study focusing on genetic risk factors of cardiovascular diseases and on utilizing genetic information in preventing diseases. |
UKB-finucane | GWAS | Some endpoints from UKB, shared by Masahiro. https://www.medrxiv.org/content/10.1101/2021.09.03.21262975v1 |
Alasoo_2018--macrophage_naive--ge | eQTL_Catalogue | Expression QTL from the eQTL Catalogue (release 6), gathered from macrophages and based on gene expression; see the eQTL Catalogue website for more information |
... (other ~560 more items) | eQTL_Catalogue | Other resources from the eQTL Catalogue, as indicated by the data source name. The eQTL Catalogue assembles multiple data sources, e.g., tissue expression from GTEx. |
INTERVAL | Plasma-Proteomics | Proteomics QTL from INTERVAL |
UKB-PPP | Plasma-Proteomics | Proteomics QTL from UKBiobank (Olink) |
FIN-R12-Olink | Plasma-Proteomics | Proteomics QTL from FinnGen R12 (Olink) |
FIN-R12-Somascan | Plasma-Proteomics | Proteomics QTL from FinnGen R12 (Somascan) |
Download the metadata from the finemapping pipeline.
Menu (Applications) -> Sandbox -> pipelines, find your successful finemapping run, then click "download metadata" (the file is assumed to be located at Downloads/XXXX_metadata.json)
Submit the colocalization job from a local terminal in the sandbox
Check for errors, if there are any.
If no error occurs, pressing the Enter key in the terminal opens a browser where you can check the jobs. Refresh the page and look for your submitted job. The job is named "ColocSusieDirectMulti" together with your user name; it may take some time to appear because of the response time of the backends in the sandbox.
Download results
The outputs are labeled "ColocSusieDirectMulti.colocQC" in the outputs section of the pipeline's job details. Only results with H4.PP > 0.5 and a valid credible set in both datasets are kept (the threshold can be controlled in the input). Further filtering of this output should be performed according to your purpose, e.g., H4.PP > 0.8 and the size of the overlapping region. We cannot provide a gold standard for this, as it depends on the study design and the aim of the colocalization.
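The further filtering described above can be sketched as follows. This is a minimal illustration rather than part of the pipeline: it assumes the colocQC output is a tab-separated table with a header line, that `low_purity1`/`low_purity2` are coded numerically as 0/1, and that `topInOverlap` is stored as text such as `1,1`. The column names follow the column table on this page; the helper names `read_rows` and `filter_coloc`, the default threshold, and the toy rows are hypothetical.

```python
import csv
import io

# Toy rows, made up purely for illustration (not real pipeline output).
HEADER = ["trait1", "trait2", "PP.H4.abf", "low_purity1", "low_purity2", "topInOverlap"]
TOY_ROWS = [
    ["MY_TRAIT", "GENE_A", "0.93", "0", "0", "1,1"],
    ["MY_TRAIT", "GENE_B", "0.62", "0", "0", "1,1"],
    ["MY_TRAIT", "GENE_C", "0.88", "1", "0", "1,1"],
    ["MY_TRAIT", "GENE_D", "0.95", "0", "0", "1,0"],
]
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + TOY_ROWS)


def read_rows(handle):
    """Read a tab-separated table with a header line into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_coloc(rows, min_h4=0.8, require_top_in_overlap=True):
    """Keep rows with PP.H4.abf > min_h4 and high-purity credible sets.

    Optionally also require that both top variants lie in the overlap (1,1).
    """
    kept = []
    for row in rows:
        if float(row["PP.H4.abf"]) <= min_h4:
            continue
        # Assumption: 0 means high purity, any other value means low purity.
        if float(row["low_purity1"]) != 0 or float(row["low_purity2"]) != 0:
            continue
        tops = row["topInOverlap"].replace(" ", "")
        if require_top_in_overlap and tops != "1,1":
            continue
        kept.append(row)
    return kept


if __name__ == "__main__":
    # Replace io.StringIO(SAMPLE) with open("your_colocQC_file") for real data.
    rows = read_rows(io.StringIO(SAMPLE))
    for row in filter_coloc(rows):
        print(row["trait1"], row["trait2"], row["PP.H4.abf"])
```

Which criteria to apply, and how strict to make them, remains a study-specific choice; the `topInOverlap` requirement here is only one possible proxy for a sensible overlap.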
The raw results, without any filtering or merging, are listed in "ColocSusieDirectMulti.coloc".
"ColocSusieDirectMulti.hit": all the information for the top signals in the full colocalization results.
"ColocSusieDirectMulti.pairs": the overlapping regions that were run in the workflow.
Column | Description |
---|---|
dataset1 | generated from your trait_name and data_type |
dataset2 | Study--DataType in our resources |
trait1 | the trait name in your data |
trait2 | trait name / molecular phenotype name from our resources |
region1 | region in your data |
region2 | overlapped region in our resources |
cs1 | credible set in your data |
cs2 | credible set in our resources |
nsnps | total variants overlapped |
hit1 | top signal in your data |
hit2 | top signal in our resources |
PP.H4.abf | probability of colocalization between your data and our resources |
low_purity1 | whether the credible set in your data has low purity (1: low purity; 0: high purity) |
low_purity2 | whether the credible set in our resources has low purity (1: low purity; 0: high purity) |
nsnps1 | number of variants in region from your data |
nsnps2 | number of variants in region from our resources |
cs1_log10bf | log10 bayes factor for the credible set in your data |
cs2_log10bf | log10 bayes factor for the credible set in our resources |
clpp | colocalization based on CLPP |
clpa | colocalization based on CLPA (min of PIP) |
cs1_size | size of the raw credible set in your data |
cs2_size | size of the raw credible set in our resources |
cs_overlap | size of the overlapped credible set |
topInOverlap | Indicator of whether the top variant (highest PIP) in each dataset lies in the overlap of the finemapped regions of the two datasets. 1,1: both original top signals are located in the overlapping region (expected for a reasonable colocalization); 1,0 / 0,1: only one top signal is in the overlapping region; 0,0: neither top signal is in the overlapping region. |
hit1_info | information of top signal in your data (beta, p-value) |
hit2_info | information of top signal in our resources (beta, p-value) |
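The `clpp` and `clpa` columns can be illustrated with a small sketch. It assumes the commonly used definitions: CLPP as the sum, over shared variants, of the product of the two PIPs, and CLPA as the sum of the minimum of the two PIPs (matching the "min of PIP" note in the table). Whether the pipeline computes them over exactly these variant sets is not stated here, so treat the function names, the variant IDs and the PIP values below as hypothetical.

```python
def clpp(pip1, pip2):
    """Sum of PIP products over variants present in both credible sets."""
    shared = pip1.keys() & pip2.keys()
    return sum(pip1[v] * pip2[v] for v in shared)


def clpa(pip1, pip2):
    """Sum of the smaller of the two PIPs over variants present in both sets."""
    shared = pip1.keys() & pip2.keys()
    return sum(min(pip1[v], pip2[v]) for v in shared)


# Toy PIPs keyed by variant ID, made up for illustration only.
PIP1 = {"v1": 0.5, "v2": 0.25, "v3": 0.25}
PIP2 = {"v1": 0.5, "v2": 0.5}

if __name__ == "__main__":
    print("clpp:", clpp(PIP1, PIP2))  # 0.5*0.5 + 0.25*0.5 = 0.375
    print("clpa:", clpa(PIP1, PIP2))  # min(0.5, 0.5) + min(0.25, 0.5) = 0.75
```

Under these definitions CLPA is always at least as large as CLPP for the same pair of credible sets, because min(a, b) >= a*b whenever both values lie between 0 and 1.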
The code is available on GitHub: https://github.com/FINNGEN/coloc.susie.direct