Unmodifiable Finemapping pipeline

How to run finemapping with SuSiE and FINEMAP with the unmodifiable pipeline

Unmodifiable pipelines are predefined workflows that cannot be modified by the user. The advantage of running an unmodifiable pipeline over a modifiable one is that the results go directly to the green library and the User results PheWeb browser. No download requests are needed, because the results of unmodifiable pipelines have been verified not to contain any individual-level data. Running the unmodifiable finemapping pipeline is very similar to running finemapping in the modifiable pipelines, with some small restrictions. The unmodifiable finemapping pipeline can be accessed in the sandbox from the Pipelines app -> unmodifiable workflow -> Unmodifiable Finemap DF12. For more information about the Pipelines tool, check the Pipelines tool documentation.

Finemapping

For more information about finemapping in general, see Finemapping or information about our finemapping pipeline outputs.

Inputs to change

The workflow has three compulsory inputs in its input JSON; the rest are optional.

The inputs are:

  • "finemap.sumstats_pattern": A pattern for the location of your summary statistics. For example, if the summary statistics for your endpoint are located in RED/user/endpoint.gz, the pattern would be RED/user/{PHENO}.gz

  • "finemap.phenolistfile": A plaintext file containing the endpoint names, one per line.

  • "finemap.phenotypes": A phenotype definition file containing endpoints to analyse. The file should be in the same format as the phenotype definition file used for the original GWAS scan.

If you want to select the regions yourself instead of using automatic region selection, the following input must also be provided:

  • "finemap.bed_regions_file": A plaintext file containing paths to the region definitions for each of the endpoints. For region file format information, see Finemapping with custom regions in DF12.
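As a minimal sketch, the inputs above could be assembled into an input JSON as follows. All file paths here are hypothetical placeholders, and the helper `expand_pattern` is only an illustration of how the `{PHENO}` placeholder is assumed to resolve to one file per endpoint; it is not part of the pipeline.

```python
import json

# Hypothetical location of the summary statistics; replace with your own path.
SUMSTATS_PATTERN = "RED/user/{PHENO}.gz"


def expand_pattern(pattern: str, endpoint: str) -> str:
    """Illustrative helper: the file the pattern would point to for one endpoint."""
    return pattern.replace("{PHENO}", endpoint)


inputs = {
    "finemap.sumstats_pattern": SUMSTATS_PATTERN,
    "finemap.phenolistfile": "RED/user/phenolist.txt",  # hypothetical path
    "finemap.phenotypes": "RED/user/phenotypes.txt",    # hypothetical path
    # Optional, only when you supply your own regions (hypothetical path):
    # "finemap.bed_regions_file": "RED/user/bed_regions.txt",
}

inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)

# Check that the pattern resolves to the expected file for one endpoint name.
print(expand_pattern(SUMSTATS_PATTERN, "endpoint"))
```

Checking the expanded path for one or two endpoint names from the phenolist file before submitting is a cheap way to catch a mistyped pattern.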

The rest of the inputs are parameters of the analysis. In most cases, there is no need to adjust them. Most of the parameters below control automatic region selection.

  • "finemap.preprocess.scale_se_by_pval": Whether to scale standard error by p-value. This will affect the finemapping results.

  • "finemap.preprocess.x_chromosome": Whether to include the X chromosome. True by default.

  • "finemap.preprocess.window": The default finemapping window that is extended around genome-wide significant variants. The window is extended in both directions, so by default the area finemapped around a significant variant is 3 Mb (1,500,000 base pairs x 2). Overlapping regions are merged. Please note that larger values increase computational resource usage and can result in regions that are too large to finemap.

  • "finemap.preprocess.max_region_width": Maximum region size in automatic region selection. If the region selection produces larger regions by merging multiple regions, the region selection is tried again with a smaller window extended around each significant variant. Please note that larger values might result in regions that are too large to finemap.

  • "finemap.preprocess.window_shrink_ratio": The factor by which the window size is scaled when regions are too large. If automatic region selection produces a region larger than finemap.preprocess.max_region_width, new windows are extended from each significant variant in that region, with each window scaled by the window shrink ratio.

  • "finemap.set_variant_id_map_chr": Map chrX (or other chromosomes) to non-numeric chromosomes for the benefit of the pipeline.

  • "finemap.preprocess.p_threshold": Threshold for genome-wide significance. Making this larger will increase the number of regions to finemap.

  • "finemap.ldstore_finemap.n_causal_snps": Maximum number of causal variants in a region. Finemapping will be able to identify N or fewer separate causal variants in a region. Note that increasing this value increases the resource usage of the finemapping algorithms.

  • "finemap.ldstore_finemap.susie.min_cs_corr": A "purity" threshold for the credible sets. Any credible set that contains a pair of variables with correlation less than this threshold will be filtered out and not reported.

  • "finemap.ldstore_finemap.filter_and_summarize.good_cred_r2": A "purity" threshold for the credible sets. Any credible set with minimum r2 correlation between the variants under this threshold will be considered a low-quality credible set.

  • "finemap.ldstore_finemap.ldstore.enable_fuse": Enable GCS fuse. If fuse is supported by Cromwell, it reduces the amount of data that needs to be localized, which reduces the time spent in finemapping tasks. The current configuration of Cromwell and the backend (BATCH) does not support GCS fuse.
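To make the interplay of window, max_region_width and window_shrink_ratio more concrete, here is a rough sketch of the region selection logic as described above. It is an illustration only, not the pipeline's actual implementation: the function names, the `min_window` safeguard, and the example values for maximum width and shrink ratio are assumptions. Only the 1,500,000 bp window comes from the description above.

```python
from typing import List, Tuple

Region = Tuple[int, int]


def merge_windows(positions: List[int], window: int) -> List[Region]:
    """Extend `window` bp to both sides of each position and merge overlaps."""
    regions: List[Region] = []
    for pos in sorted(positions):
        start, end = max(0, pos - window), pos + window
        if regions and start <= regions[-1][1]:
            # Overlaps the previous region: grow it instead of adding a new one.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


def select_regions(
    positions: List[int],
    window: int,
    max_region_width: int,
    shrink_ratio: float,
    min_window: int = 1,  # assumed safeguard so the retry loop always ends
) -> List[Region]:
    """Retry too-wide merged regions with a window scaled by `shrink_ratio`."""
    if not 0 < shrink_ratio < 1:
        raise ValueError("shrink_ratio must be between 0 and 1")
    result: List[Region] = []
    for start, end in merge_windows(positions, window):
        if end - start <= max_region_width or window <= min_window:
            result.append((start, end))
            continue
        # Region too wide: redo selection for its variants with a smaller window.
        inside = [p for p in positions if start <= p <= end]
        smaller = max(min_window, int(window * shrink_ratio))
        result.extend(
            select_regions(inside, smaller, max_region_width, shrink_ratio, min_window)
        )
    return result


# Significant variant positions on one chromosome (made-up values).
hits = [2_000_000, 4_500_000, 7_000_000, 20_000_000]
regions = select_regions(hits, window=1_500_000,
                         max_region_width=6_000_000, shrink_ratio=0.75)
for region in regions:
    print(region)
```

In this example the first three variants initially merge into one 8 Mb region, which exceeds the assumed 6 Mb limit, so they are retried with a 1,125,000 bp window and end up as three separate regions.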

NOTE: Your summary statistics file must have the following column names:

  • chromosome column: "#chrom"

  • position column: "pos"

  • reference allele column: "ref"

  • alternate allele column: "alt"

  • alternate allele frequency column: "af_alt"

  • effect size column: "beta"

  • standard error of effect size column: "sebeta"

  • p-value column: "pval"

By default, the regenie and GATE pipeline outputs have these column names.
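If your summary statistics come from another source, a quick header check before submitting can save a failed run. The sketch below assumes a gzip-compressed, tab-delimited file; the function names are illustrative and not part of the pipeline.

```python
import csv
import gzip
from typing import List

# Column names required by the pipeline, as listed above.
REQUIRED_COLUMNS = ["#chrom", "pos", "ref", "alt", "af_alt", "beta", "sebeta", "pval"]


def missing_columns(header: List[str]) -> List[str]:
    """Return the required column names that are absent from the header."""
    return [name for name in REQUIRED_COLUMNS if name not in header]


def read_header(path: str, delimiter: str = "\t") -> List[str]:
    """Read the first line of a gzipped, delimited summary statistics file."""
    with gzip.open(path, "rt", newline="") as handle:
        return next(csv.reader(handle, delimiter=delimiter))


# Example with an in-memory header that uses a different p-value column name.
example_header = ["#chrom", "pos", "ref", "alt", "af_alt", "beta", "sebeta", "p"]
print(missing_columns(example_header))
```

For a real file, `missing_columns(read_header(path))` should return an empty list before the file is used as pipeline input.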

Outputs

The results are automatically copied to the green library bucket specific to each data release: /finngen/library-green/finngen_R[RELEASE]/unmodifiable_pipelines/UnmodifiableFinemapDF[RELEASE]/workflow_id

For example, for R12 the results are copied to /finngen/library-green/finngen_R12/unmodifiable_pipelines/UnmodifiableFinemapDF12/workflow_id. The workflow_id is the pipelines app ID of your job. You can also access the results outside the sandbox; see Accessing green data.
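The path template can be filled in programmatically, for example when collecting results from several jobs. This is a small sketch; the function name and the workflow ID used below are made up for illustration.

```python
def results_dir(release: int, workflow_id: str) -> str:
    """Build the green library results directory for a release and workflow ID."""
    return (
        f"/finngen/library-green/finngen_R{release}"
        f"/unmodifiable_pipelines/UnmodifiableFinemapDF{release}/{workflow_id}"
    )


# "my-workflow-id" is a placeholder; use the pipelines app ID of your own job.
print(results_dir(12, "my-workflow-id"))
```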
