How to run colocalization pipeline
Describes how to run the new colocalization pipeline with the coloc SuSiE package
This pipeline takes the outputs of our fine-mapping pipeline and performs colocalization against the 571 resources we have gathered, including all GWAS endpoints from FinnGen, GWAS endpoints from UKB, the eQTL Catalogue, the GeneRISK project, and proteomics studies from INTERVAL, UKB, and FinnGen.
| Resource | Type | Description |
| --- | --- | --- |
| FinnGen-R12 | GWAS | All endpoints from FinnGen R12 |
| GeneRisk | GWAS | The GeneRISK Study is an ongoing prospective observational study focusing on genetic risk factors of cardiovascular diseases and on utilizing genetic information in preventing diseases. |
| UKB-finucane | GWAS | Some endpoints from UKB, shared by Masahiro. https://www.medrxiv.org/content/10.1101/2021.09.03.21262975v1 |
| Alasoo_2018--macrophage_naive--ge | eQTL_Catalogue | Expression QTLs from the eQTL Catalogue (release 6), gathered from macrophages and based on gene expression; see the eQTL Catalogue website for more information. |
| ... (other ~560 more items) | eQTL_Catalogue | Other resources from the eQTL Catalogue, indicated by the data source. The eQTL Catalogue assembles multiple data sources, e.g., tissue expression from GTEx. |
| INTERVAL | Plasma-Proteomics | Proteomics QTLs from INTERVAL |
| UKB-PPP | Plasma-Proteomics | Proteomics QTLs from UK Biobank (Olink) |
| FIN-R12-Olink | Plasma-Proteomics | Proteomics QTLs from FinnGen R12 (Olink) |
| FIN-R12-Somascan | Plasma-Proteomics | Proteomics QTLs from FinnGen R12 (Somascan) |
Download the metadata from the fine-mapping pipeline.
Menu (Applications) -> Sandbox -> pipelines, find your successful fine-mapping run, and click download metadata (assumed to be saved as Downloads/XXXX_metadata.json).
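Before submitting, it can help to confirm that the download completed and that the file parses as JSON. The following is a minimal sketch, assuming only that the file name ends in `_metadata.json` as described above. The helper names `find_latest_metadata` and `load_metadata` are hypothetical and not part of the pipeline, and nothing is assumed about the keys inside the file.

```python
import json
from pathlib import Path
from typing import Any, Optional


def find_latest_metadata(download_dir: Path) -> Optional[Path]:
    """Return the most recently modified *_metadata.json in download_dir, or None."""
    candidates = sorted(
        download_dir.glob("*_metadata.json"),
        key=lambda p: p.stat().st_mtime,  # oldest first, newest last
    )
    return candidates[-1] if candidates else None


def load_metadata(path: Path) -> Any:
    """Parse the metadata file; json.JSONDecodeError signals a broken download."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `find_latest_metadata(Path.home() / "Downloads")` returns the newest matching file, which can then be passed to `load_metadata` as a quick sanity check.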
Submit the colocalization job from a local terminal in the sandbox.
Check for errors, if there are any.
If no error occurs, pressing the Enter key in the terminal opens a browser where you can check the jobs. Refresh and look for your submitted job. The job is named "ColocSusieDirectMulti" with your user name; it can take some time to appear because of the response time of the backends in the sandbox.
Download results
The outputs are labeled "ColocSusieDirectMulti.colocQC" in the outputs section of the pipeline's job details. We keep only results with H4.PP > 0.5 and a valid credible set in both datasets (the threshold can be controlled in the input). Further filtering of this output should be performed according to your purpose, e.g., H4.PP > 0.8 and overlapped region size. We cannot provide a gold standard for this, as it depends on the study design and the aim of the colocalization.
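As an illustration of such downstream filtering, here is a minimal sketch using only the Python standard library. It assumes the downloaded colocQC file is tab-separated with a header row, that the column names match those documented on this page (`PP.H4.abf`, `nsnps`, `low_purity1`, `low_purity2`, `topInOverlap`), and that `topInOverlap` is stored as a string such as `1,1`. The function name `filter_coloc_rows` and its default thresholds are hypothetical; `nsnps` stands in for the overlapped region size. Adjust all of these to your actual file and study design.

```python
import csv
from typing import Dict, Iterator


def filter_coloc_rows(
    path: str,
    min_h4: float = 0.8,
    min_nsnps: int = 1,
    require_high_purity: bool = True,
    require_both_tops_in_overlap: bool = False,
) -> Iterator[Dict[str, str]]:
    """Yield rows of a colocQC table that pass stricter, user-chosen filters."""
    with open(path, newline="", encoding="utf-8") as handle:
        # Assumption: the file is tab-separated with a header row.
        for row in csv.DictReader(handle, delimiter="\t"):
            if float(row["PP.H4.abf"]) <= min_h4:
                continue  # posterior probability of colocalization too low
            if int(row["nsnps"]) < min_nsnps:
                continue  # too few overlapping variants
            if require_high_purity and (
                row["low_purity1"].strip() != "0"
                or row["low_purity2"].strip() != "0"
            ):
                continue  # at least one credible set is flagged as low purity
            if require_both_tops_in_overlap and (
                row["topInOverlap"].replace(" ", "") != "1,1"
            ):
                continue  # a top variant lies outside the overlapped region
            yield row
```

Calling `list(filter_coloc_rows("my_results.colocQC.tsv", min_h4=0.8, min_nsnps=50))`, with a hypothetical file name, would keep only rows with PP.H4.abf above 0.8, at least 50 overlapping variants, and high-purity credible sets in both datasets.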
The raw results, without any filtering or merging, are listed in "ColocSusieDirectMulti.coloc".
"ColocSusieDirectMulti.hit": all the information for the top signals in the full colocalization results.
"ColocSusieDirectMulti.pairs": the overlapping regions that were run in the workflow.
| Column | Description |
| --- | --- |
| dataset1 | Generated from your trait_name and data_type |
| dataset2 | Study--DataType in our resources |
| trait1 | Trait name in your data |
| trait2 | Trait name / molecular phenotype name from our resources |
| region1 | Region in your data |
| region2 | Overlapped region in our resources |
| cs1 | Credible set in your data |
| cs2 | Credible set in our resources |
| nsnps | Total number of overlapping variants |
| hit1 | Top signal in your data |
| hit2 | Top signal in our resources |
| PP.H4.abf | Probability of colocalization between your data and our resources |
| low_purity1 | Whether the credible set in your data is low purity (1 = low purity, 0 = high purity) |
| low_purity2 | Low-purity indicator for the credible set in our resources |
| nsnps1 | Number of variants in the region from your data |
| nsnps2 | Number of variants in the region from our resources |
| cs1_log10bf | log10 Bayes factor for the credible set in your data |
| cs2_log10bf | log10 Bayes factor for the credible set in our resources |
| clpp | Colocalization based on CLPP |
| clpa | Colocalization based on CLPA (min of PIP) |
| cs1_size | Size of the raw credible set in your data |
| cs2_size | Size of the raw credible set in our resources |
| cs_overlap | Size of the overlap between the two credible sets |
| topInOverlap | Indicator of whether the top variant (highest PIP) in each dataset is in the overlapped region of the fine-mapped regions of the two datasets. 1,1: both original top signals are located in the overlapped region (expected for a reasonable colocalization); 1,0 / 0,1: only one top signal is in the overlapped region; 0,0: neither top signal is in the overlapped region. |
| hit1_info | Information on the top signal in your data (beta, p-value) |
| hit2_info | Information on the top signal in our resources (beta, p-value) |
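To make the `clpp` and `clpa` columns more concrete, the sketch below computes both scores from per-variant posterior inclusion probabilities (PIPs). It assumes the commonly used definitions: CLPP as the sum over shared variants of the product of the two PIPs, and CLPA as the sum over shared variants of the smaller of the two PIPs (the "min of PIP" noted in the table). The exact computation used by the pipeline should be checked against its source code; the function name `clpp_clpa` and the variant identifiers are hypothetical.

```python
from typing import Dict, Tuple


def clpp_clpa(pip1: Dict[str, float], pip2: Dict[str, float]) -> Tuple[float, float]:
    """Return (CLPP, CLPA) for two credible sets given as {variant_id: PIP}."""
    shared = pip1.keys() & pip2.keys()  # variants present in both credible sets
    clpp = sum(pip1[v] * pip2[v] for v in shared)       # sum of PIP products
    clpa = sum(min(pip1[v], pip2[v]) for v in shared)   # sum of the smaller PIP
    return clpp, clpa


# Hypothetical credible sets: two variants are shared, one is unique to cs1.
cs1 = {"variant_a": 0.6, "variant_b": 0.3, "variant_c": 0.1}
cs2 = {"variant_a": 0.5, "variant_b": 0.5}
clpp, clpa = clpp_clpa(cs1, cs2)
```

In this toy example CLPP is about 0.45 (0.6 × 0.5 + 0.3 × 0.5) and CLPA is about 0.8 (0.5 + 0.3). Both scores are 0 when the credible sets share no variants.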
The code is available on GitHub: https://github.com/FINNGEN/coloc.susie.direct