How to run the finemapping pipeline

By finemapping, we mean identifying the variants with the highest probability of being causal from your GWAS summary statistics. The Sandbox provides a pipeline for this. The pipeline first computes LD matrices with LDstore2, and then uses the LD matrices and the summary statistics to finemap each genome-wide significant locus (p < 5e-8) in the given summary statistics with both FINEMAP and SUSIE. A detailed description of the pipeline can be found on GitHub.

Alternatively, users can provide their own custom regions to finemap instead of relying on the automatic selection of regions with genome-wide significant variants.
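Custom regions are supplied as BED files, and the input settings described in this guide ask for integer chromosome names 1-23. The following is a minimal sketch of writing such a file, not part of the pipeline itself. The function names are hypothetical, and two things are assumptions: that X is the chromosome numbered 23, and that the pipeline accepts a plain three-column, tab-separated BED file without a header.

```python
import csv


def chrom_to_int(chrom):
    """Map names such as 'chr1', '1', 'X' or 'chrX' to an integer in 1-23.

    Assumption: chromosome X is encoded as 23.
    """
    name = str(chrom).strip()
    if name.lower().startswith("chr"):
        name = name[3:]
    if name.upper() == "X":
        return 23
    number = int(name)  # raises ValueError for any other non-numeric name
    if not 1 <= number <= 23:
        raise ValueError(f"chromosome outside 1-23: {chrom!r}")
    return number


def write_bed(regions, path):
    """Write (chrom, start, end) tuples as tab-separated BED lines."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for chrom, start, end in regions:
            if start >= end:
                raise ValueError(f"empty or reversed region: {chrom}:{start}-{end}")
            writer.writerow([chrom_to_int(chrom), start, end])
```

For example, write_bed([("chr1", 100, 200), ("X", 5, 10)], "my_regions.bed") would write two lines with the chromosomes 1 and 23. The coordinates and the file name here are made up for illustration.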

Example files for running the finemapping pipeline can be found in the Sandbox at /finngen/library-green/scripts/finemap/:

  • .json files: finemap_inputs.json, finemap_DF11.json and finemap_DF11_custom_bed.json

  • .wdl file: finemap.wdl

  • sub-workflow .wdl files: finemap_sub.wdl.zip

These example scripts run the finemapping pipeline for two endpoints, I9_CORATHER and I9_MI_STRICT, in FinnGen DF7 (finemap_inputs.json) and in DF11 (finemap_DF11.json).

Once you have copied the example scripts, you need to edit finemap_inputs.json according to your needs. The parts you (may) need to edit are:

  • "finemap.sumstats_pattern": Path to the summary statistics file(s). Replace the phenotype name in the file names with {PHENO}. Example for the DF7 summary statistics: "gs://finngen-production-library-green/finngen_R7/finngen_R7_analysis_data/summary_stats/release/finngen_R7_{PHENO}.gz"

  • "finemap.phenolistfile": Path to a .txt file listing the phenotypes to run, one phenotype per row.

  • "finemap.bed_regions_file" (optional, only for custom BED regions; omit it to use the default region selection): Path to a .txt file listing the BED files that define the regions to finemap. List the BED files in the same order as the phenotypes in the phenolistfile. Use integers 1-23 as chromosome names. An example for DF11 can be found in /finngen/library-green/scripts/finemap/finemap_DF11_custom_bed.json

  • "finemap.phenotypes": Path to a phenotype-covariate file. Example for DF7 can be found at: /finngen/library-red/finngen_R7/phenotype_4.0/data/finngen_R7_cov_pheno_1.0.txt.gz

  • "finemap.ldstore_finemap.ldstore.sample": Path to the sample file corresponding to your bgen file(s).

  • "finemap.ldstore_finemap.ldstore.bgen_pattern": Path to the bgen files. Replace the chromosome in the file names with {CHR}. An example for the full DF7: /finngen/library-red/finngen_R7/bgen_2.0/data/finngen_R7_{CHR}.bgen

  • "finemap.ldstore_finemap.filter_and_summarize.snp_annot_file": Path to a variant annotation file. For R7, it can be found at: /finngen/library-green/finngen_R7/finngen_R7_analysis_data/annotations/R7_annotated_variants_v1.gz

  • "finemap.ldstore_finemap.filter_and_summarize.snp_annot_file_tbi": Path to the index file of the variant annotation file. For R7, it can be found at: /finngen/library-green/finngen_R7/finngen_R7_analysis_data/annotations/R7_annotated_variants_v1.gz.tbi
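The {PHENO} and {CHR} placeholders in the settings above are filled in per phenotype and per chromosome. To preview which files a pattern would point to before submitting, a small sketch like the following may help. It only does plain string substitution; expand_pattern is a hypothetical helper, not a function of the pipeline, and how the pipeline itself expands the placeholders is not shown here.

```python
def expand_pattern(pattern, **values):
    """Replace {NAME} placeholders (e.g. {PHENO}, {CHR}) with concrete values."""
    result = pattern
    for name, value in values.items():
        result = result.replace("{" + name + "}", str(value))
    return result


# Pattern taken from the DF7 example above.
sumstats_pattern = (
    "gs://finngen-production-library-green/finngen_R7/finngen_R7_analysis_data/"
    "summary_stats/release/finngen_R7_{PHENO}.gz"
)

# The two endpoints used in the example scripts.
for pheno in ["I9_CORATHER", "I9_MI_STRICT"]:
    print(expand_pattern(sumstats_pattern, PHENO=pheno))
```

The same helper works for the bgen pattern, for example expand_pattern(bgen_pattern, CHR=1), but the exact chromosome labels used in the bgen file names are an assumption you should check against the files themselves.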

For the following settings, make sure they correspond to the columns of your summary statistics file. These examples are for the released DF7 summary statistics:

  • "finemap.preprocess.rsid_col": "",

  • "finemap.preprocess.chromosome_col": "#chrom",

  • "finemap.preprocess.position_col": "pos",

  • "finemap.preprocess.allele1_col": "ref",

  • "finemap.preprocess.allele2_col": "alt",

  • "finemap.preprocess.freq_col": "af_alt",

  • "finemap.preprocess.beta_col": "beta",

  • "finemap.preprocess.se_col": "sebeta",

  • "finemap.preprocess.p_col": "pval",

  • "finemap.preprocess.delimiter": "TAB",
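Because a mismatch between these column settings and the actual file header is easy to miss, it can be worth checking them before submitting. The sketch below compares the configured names with the first line of a local copy of a summary statistics file. The helper names are hypothetical, and treating "TAB" as the tab character (and any other value as a literal delimiter) is an assumption about how the pipeline reads the setting.

```python
import gzip

PREFIX = "finemap.preprocess."
COLUMN_KEYS = [
    "rsid_col", "chromosome_col", "position_col", "allele1_col", "allele2_col",
    "freq_col", "beta_col", "se_col", "p_col",
]


def read_header(path):
    """Return the first line of a local, optionally gzip-compressed, file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline()


def missing_columns(inputs, header_line):
    """List configured column names that do not occur in the header line."""
    delimiter = inputs.get(PREFIX + "delimiter", "TAB")
    if delimiter == "TAB":  # the example inputs spell the tab character as "TAB"
        delimiter = "\t"
    header = header_line.rstrip("\r\n").split(delimiter)
    missing = []
    for key in COLUMN_KEYS:
        column = inputs.get(PREFIX + key, "")
        # Empty settings, such as rsid_col in the example, are skipped.
        if column and column not in header:
            missing.append(column)
    return missing
```

With the inputs loaded through json.load from your edited finemap_inputs.json, missing_columns(inputs, read_header("my_sumstats.gz")) should return an empty list when every configured column is present. The file name my_sumstats.gz is a placeholder for your own local file.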

You can submit your job to Pipelines from the command line:

finngen-cli request-workflow --wdl /path/to/finemap.wdl \
    --input /path/to/finemap_inputs.json \
    --dependencies /path/to/finemap_sub.wdl.zip

Note: Remember to save the [WORKFLOW_ID] of your job for later monitoring and checking for the results! See also tips on how to find a pipeline job ID.

When your job has completed successfully, you can find your FINEMAP results in: /finngen/pipelines/cromwell/workflows/finemap/[WORKFLOW_ID]/call-ldstore_finemap/shard-0/sub.ldstore_finemap/[sub_workflow_id]/call-finemap/shard-#/ (a sub-folder is created for each GWS locus)

There you can find, for example:

  • the .snp file, which contains the results for each variant in the region, such as the probability of being causal (prob)

  • the .log_sss file, which shows the posterior probabilities for the credible sets in the region

  • the .cred* files, in the glob-* subfolder. These give your credible sets, as well as some additional information on each credible set, such as the posterior probability and LD statistics among the variants in the set.

Results from SUSIE can be found at: /finngen/pipelines/cromwell/workflows/finemap/[WORKFLOW_ID]/call-ldstore_finemap/shard-0/sub.ldstore_finemap/[sub_workflow_id]/call-susie/shard-#/

There, the .snp file gives the probability of being causal (prob) for each variant, as well as the credible set the variant belongs to (cs). A cs value of -1 means that the variant is not included in any credible set.
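To go from the SUSIE .snp file to a list of variants per credible set, something like the following sketch could be used. It relies only on the prob and cs columns described above. The tab delimiter and the function name are assumptions, so check the header of your own file first; a compressed file would need to be opened with gzip.open in text mode instead of open.

```python
import csv
from collections import defaultdict


def credible_sets(snp_lines, delimiter="\t"):
    """Group the rows of a SUSIE .snp table by credible set.

    Rows with cs == -1 (not in any credible set) are dropped, and each
    set is sorted by descending probability of being causal (prob).
    Assumption: the table has a header line and is tab-separated.
    """
    reader = csv.DictReader(snp_lines, delimiter=delimiter)
    sets = defaultdict(list)
    for row in reader:
        cs = int(row["cs"])
        if cs == -1:
            continue
        sets[cs].append(row)
    for rows in sets.values():
        rows.sort(key=lambda r: float(r["prob"]), reverse=True)
    return dict(sets)
```

Usage would look like opening the .snp file with open(path) and passing the file handle to credible_sets. The result maps each cs value to its rows, with all original columns kept as strings.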

Finemapping GWAS results from custom GWAS

There are a couple of things to consider if you want to finemap your GWAS results that have been run using custom GWAS tools.

  • First, the summary statistics to enter as "finemap.sumstats_pattern" in the .json file can be found at: /finngen/library-green/finngen_R9/sandbox_custom_gwas/{PHENO}/{PHENO}.gz if you used R9 (for the earlier releases 6-8 the path is the same, with R9 replaced by the corresponding release).

  • Second, you need to define the pheno-cov file in the .json as "finemap.phenotypes". If you created your phenotype in Atlas and then ran the GWAS with the custom GWAS tool, you will find your pheno-cov file among your GWAS run results. If you used an earlier release, and thus SAIGE, in the custom GWAS, the file is in: /finngen/saige/[WORKFLOW_ID]/call-prepare/combined_file.txt.gz. If you used R8/R9, and thus regenie, it is in: /finngen/pipeline/cromwell/workflows/regain/[WORKFLOW_ID]/call-prepare/combined_file.txt.gz. This is why it is important to save your WORKFLOW_ID. If you do not have it, please have a look at this section.

See also the recording of the User's Meeting on Feb 8th 2022, in which Bridget Riley-Gillis showed how to run finemapping.
