Data description

File naming pattern and file structure

Summary association statistics

GWAS summary statistics (tab-delimited, bgzipped, genome build 38, tabix index files included) are named as {endpoint}.gz. For example, endpoint I9_CHD has I9_CHD.gz and I9_CHD.gz.tbi.

To learn more about the methods used, see section GWAS.

The {endpoint}.gz files have the following structure:

| Column name | Description |
|---|---|
| #chrom | chromosome on build GRCh38 (1-23) |
| pos | position in base pairs on build GRCh38 |
| ref | reference allele |
| alt | alternative allele (effect allele) |
| rsids | variant identifier |
| nearest_genes | nearest gene(s) (comma separated) from variant |
| pval | p-value from regenie |
| mlogp | -log10(p-value) |
| beta | effect size (log(OR) scale) estimated with regenie for the alternative allele |
| sebeta | standard error of effect size estimated with regenie |
| af_alt | alternative (effect) allele frequency |
| af_alt_cases | alternative (effect) allele frequency among cases |
| af_alt_controls | alternative (effect) allele frequency among controls |
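
As a minimal sketch of how these columns might be consumed, the Python below parses rows with the standard library, converts beta from the log(OR) scale to an odds ratio, and filters on mlogp. The sample rows are made up for illustration, and the function names and the 5e-8 significance threshold are assumptions, not part of the release. Filtering uses mlogp rather than pval because very small p-values can round to 0.0 when parsed as floats.

```python
import csv
import gzip
import io
import math

# Made-up two-variant file contents, for illustration only.
SAMPLE = (
    "#chrom\tpos\tref\talt\trsids\tnearest_genes\tpval\tmlogp\tbeta\tsebeta\t"
    "af_alt\taf_alt_cases\taf_alt_controls\n"
    "1\t1000\tA\tG\trs0000001\tGENE1\t1e-10\t10.0\t0.2\t0.03\t0.10\t0.12\t0.09\n"
    "23\t2000\tC\tT\trs0000002\tGENE2,GENE3\t0.5\t0.30103\t-0.01\t0.02\t0.40\t0.40\t0.40\n"
)


def read_sumstats(handle):
    """Yield one dict per variant from an open text handle."""
    for row in csv.DictReader(handle, delimiter="\t"):
        beta = float(row["beta"])
        yield {
            "chrom": row["#chrom"],
            "pos": int(row["pos"]),
            "ref": row["ref"],
            "alt": row["alt"],
            "nearest_genes": row["nearest_genes"].split(","),
            "mlogp": float(row["mlogp"]),
            "beta": beta,
            # beta is on the log(OR) scale, so exp(beta) is the odds ratio.
            "odds_ratio": math.exp(beta),
        }


def significant(rows, p_threshold=5e-8):
    """Keep rows whose mlogp exceeds -log10(p_threshold) (assumed threshold)."""
    min_mlogp = -math.log10(p_threshold)
    return [row for row in rows if row["mlogp"] > min_mlogp]


# bgzip output is gzip-compatible, so gzip.open(path, "rt") also reads a real
# {endpoint}.gz file; here an in-memory buffer stands in for the file.
buffer = io.BytesIO(gzip.compress(SAMPLE.encode()))
with gzip.open(buffer, "rt", newline="") as handle:
    rows = list(read_sumstats(handle))

hits = significant(rows)
for hit in hits:
    print(hit["chrom"], hit["pos"], hit["alt"], round(hit["odds_ratio"], 3))
```

For region queries on a real file, the accompanying .tbi index can be used with tabix-aware tools instead of scanning the whole file.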

Fine-mapping results

Two fine-mapping methods were used: SuSiE and FINEMAP.

Fine-mapping results are tab-delimited and bgzipped.

SuSiE results have the following filename pattern:

  • {endpoint}.SUSIE.cred.bgz

  • {endpoint}.SUSIE.cred_99.bgz

  • {endpoint}.SUSIE.snp.bgz

FINEMAP results have the following filename pattern:

  • {endpoint}.FINEMAP.config.bgz

  • {endpoint}.FINEMAP.region.bgz

  • {endpoint}.FINEMAP.snp.bgz

To learn more about the methods used, see section Fine-mapping.
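
The naming patterns above can be expanded for a given endpoint with a small helper. This is only a sketch: the function name `endpoint_files` and the dictionary keys are hypothetical, and only the filename patterns themselves come from this page.

```python
def endpoint_files(endpoint):
    """Return the expected file names for one endpoint, grouped by result type."""
    return {
        "sumstats": f"{endpoint}.gz",
        "sumstats_index": f"{endpoint}.gz.tbi",
        "susie": [
            f"{endpoint}.SUSIE.{kind}.bgz" for kind in ("cred", "cred_99", "snp")
        ],
        "finemap": [
            f"{endpoint}.FINEMAP.{kind}.bgz" for kind in ("config", "region", "snp")
        ],
    }


files = endpoint_files("I9_CHD")
for group, names in files.items():
    print(group, names)
```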

{endpoint}.SUSIE.cred.bgz contains credible set summaries from SuSiE fine-mapping for all genome-wide significant regions. {endpoint}.SUSIE.cred_99.bgz contains the 99% credible set summaries, whereas the default is 95%. Both files have the following structure:

| Column name | Description |
|---|---|
| trait | phenotype |
| region | region for which the fine-mapping was run |
| cs | running number for independent credible sets in a region |
| cs_log10bf | Log10 bayes factor of comparing the solution of this model (cs independent credible sets) to cs -1 credible sets |
| cs_avg_r2 | Average correlation R2 between variants in the credible set |
| cs_min_r2 | minimum r2 between variants in the credible set |
| low_purity | |
| cs_size | how many snps does this credible set contain |
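
To illustrate how the credible set summary columns fit together, the sketch below counts credible sets per region and flags those whose cs_min_r2 meets a cutoff. The rows are made up, the function name is hypothetical, and the 0.5 cutoff is an arbitrary illustrative choice rather than a value used by the pipeline. The low_purity column is carried as a placeholder because its value format is not described here.

```python
import csv
import io

# Made-up rows for illustration; low_purity holds a placeholder value.
SAMPLE = (
    "trait\tregion\tcs\tcs_log10bf\tcs_avg_r2\tcs_min_r2\tlow_purity\tcs_size\n"
    "I9_CHD\tchr1:100-200\t1\t12.5\t0.95\t0.90\tNA\t4\n"
    "I9_CHD\tchr1:100-200\t2\t3.1\t0.40\t0.10\tNA\t25\n"
    "I9_CHD\tchr2:500-900\t1\t8.0\t1.00\t1.00\tNA\t1\n"
)


def summarise_credible_sets(handle, min_r2_cutoff=0.5):
    """Per region: number of credible sets, total variants, sets above cutoff."""
    summary = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        entry = summary.setdefault(
            row["region"], {"n_cs": 0, "n_variants": 0, "n_above_cutoff": 0}
        )
        entry["n_cs"] += 1
        entry["n_variants"] += int(row["cs_size"])
        if float(row["cs_min_r2"]) >= min_r2_cutoff:
            entry["n_above_cutoff"] += 1
    return summary


summary = summarise_credible_sets(io.StringIO(SAMPLE))
for region, entry in summary.items():
    print(region, entry)
```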

{endpoint}.SUSIE.snp.bgz contains variant summaries with credible set information and has the following structure:

| Column name | Description |
|---|---|
| trait | endpoint name |
| region | chr:start-end |
| v | variant identifier |
| rsid | rs variant identifier |
| chromosome | chromosome on build GRCh38 (1-22, X) |
| position | position in base pairs on build GRCh38 |
| allele1 | reference allele |
| allele2 | alternative allele (effect allele) |
| maf | minor allele frequency |
| beta | effect size GWAS |
| se | standard error GWAS |
| p | p-value GWAS |
| mean | posterior expectation of true effect size |
| sd | posterior standard deviation of true effect size |
| prob | posterior probability of association |
| cs | identifier of 95% credible set (-1 = variant is not part of credible set) |
| lead_r2 | r2 value to a lead variant (the one with maximum PIP) in a credible set |
| alphax | posterior inclusion probability for the x-th single effect (x := 1..L where L is the number of single effects (causal variants) specified; default: L = 10) |
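
A common use of this file is to pull out credible set members and their lead variants. The sketch below skips rows with cs equal to -1 and keeps, for each (region, cs) pair, the variant with the highest prob. It assumes that prob is the PIP referred to in the lead_r2 description. The rows are made up, only the columns needed here are included, and the function name is hypothetical.

```python
import csv
import io

# Made-up rows with a subset of the documented columns, for illustration only.
SAMPLE = (
    "trait\tregion\tv\tprob\tcs\n"
    "I9_CHD\tchr1:100-200\t1:150:A:G\t0.70\t1\n"
    "I9_CHD\tchr1:100-200\t1:160:C:T\t0.28\t1\n"
    "I9_CHD\tchr1:100-200\t1:170:G:A\t0.01\t-1\n"
    "I9_CHD\tchr1:100-200\t1:180:T:C\t0.99\t2\n"
)


def lead_variants(handle):
    """Map (region, cs) to the member row with the highest prob."""
    leads = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        cs = int(row["cs"])
        if cs == -1:
            # -1 marks variants that are not part of any 95% credible set.
            continue
        key = (row["region"], cs)
        if key not in leads or float(row["prob"]) > float(leads[key]["prob"]):
            leads[key] = row
    return leads


leads = lead_variants(io.StringIO(SAMPLE))
for (region, cs), row in sorted(leads.items()):
    print(region, cs, row["v"], row["prob"])
```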

{endpoint}.FINEMAP.config.bgz contains a summary of the fine-mapping variant configurations from the FINEMAP method and has the following structure:

| Column name | Description |
|---|---|
| trait | phenotype |
| region | region for which the fine-mapping was run |
| rank | rank of this configuration within a region |
| config | causal variants in this configuration |
| prob | probability across all n independent signal configurations |
| log10bf | log10 bayes factor for this configuration |
| odds | odds of this configuration |
| k | how many independent signals in this configuration |
| prob_norm_k | probability of this configuration within k independent signals solution |
| h2 | snp heritability of this solution |
| h2_0.95CI | 95% confidence interval limits of snp heritability of this solution |
| mean | marginalized shrinkage estimates of the posterior effect size mean |
| sd | marginalized shrinkage estimates of the posterior effect standard deviation |
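
One way to read the configuration file is to ask which configurations together carry most of the probability in a region. The sketch below sorts configurations by prob and keeps the smallest leading set whose probabilities sum to a chosen coverage. This is an illustrative post-processing step, not something the pipeline is stated to do. The rows, the config string format, the 0.95 coverage, and the function name are all assumptions.

```python
import csv
import io

# Made-up rows with a subset of the documented columns, for illustration only.
SAMPLE = (
    "trait\tregion\trank\tconfig\tprob\n"
    "I9_CHD\tchr1:100-200\t1\t1:150:A:G\t0.60\n"
    "I9_CHD\tchr1:100-200\t2\t1:150:A:G,1:180:T:C\t0.30\n"
    "I9_CHD\tchr1:100-200\t3\t1:160:C:T\t0.08\n"
    "I9_CHD\tchr1:100-200\t4\t1:170:G:A\t0.02\n"
)


def top_configurations(handle, coverage=0.95):
    """Per region, keep the most probable configs until prob sums to coverage."""
    by_region = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        by_region.setdefault(row["region"], []).append(
            (float(row["prob"]), row["config"])
        )

    selected = {}
    for region, configs in by_region.items():
        configs.sort(reverse=True)  # highest probability first
        total, kept = 0.0, []
        for prob, config in configs:
            kept.append(config)
            total += prob
            if total >= coverage:
                break
        selected[region] = kept
    return selected


selected = top_configurations(io.StringIO(SAMPLE))
for region, configs in selected.items():
    print(region, configs)
```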

{endpoint}.FINEMAP.region.bgz contains summary statistics on the number of independent signals in each region and has the following structure:

| Column name | Description |
|---|---|
| trait | phenotype |
| region | region for which the fine-mapping was run |
| h2g | heritability of this region |
| h2g_sd | standard deviation of snp heritability of this region |
| h2g_lower95 | lower limit of 95% CI for snp heritability |
| h2g_upper95 | upper limit of 95% CI for snp heritability |
| log10bf | log bayes factor compared against null (no signals in the region) |
| prob_xSNP | columns for probabilities of different number of independent signals |
| expectedvalue | expectation (average) of the number of signals |
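
The expectedvalue column can be related to the prob_xSNP columns as a probability-weighted average, the sum over x of x times prob_xSNP. The sketch below recomputes it for one row. It assumes that the actual column names replace x with an integer (for example prob_1SNP); the row values and the function name are made up for illustration.

```python
import re

# Assumed naming: "prob_<x>SNP", where <x> is the number of independent signals.
PROB_COLUMN = re.compile(r"^prob_(\d+)SNP$")


def expected_signals(row):
    """Sum x * prob_xSNP over all prob_xSNP columns present in the row."""
    total = 0.0
    for name, value in row.items():
        match = PROB_COLUMN.match(name)
        if match:
            total += int(match.group(1)) * float(value)
    return total


# Made-up row, for illustration only.
row = {
    "trait": "I9_CHD",
    "region": "chr1:100-200",
    "prob_1SNP": "0.7",
    "prob_2SNP": "0.2",
    "prob_3SNP": "0.1",
    "expectedvalue": "1.4",
}

computed = expected_signals(row)
print(computed, float(row["expectedvalue"]))
```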

{endpoint}.FINEMAP.snp.bgz contains summary statistics of variants and the credible sets they may belong to, and has the following structure:

| Column name | Description |
|---|---|
| trait | phenotype |
| region | region for which the fine-mapping was run |
| v | variant |
| index | running index |
| rsid | rs variant identifier |
| chromosome | chromosome |
| position | position |
| allele1 | reference allele |
| allele2 | alternative allele |
| maf | alternative allele frequency |
| beta | original marginal effect size |
| se | original standard error |
| z | original z-score |
| prob | posterior inclusion probability |
| log10bf | log10 bayes factor |
| mean | marginalized shrinkage estimates of the posterior effect size mean |
| sd | marginalized shrinkage estimates of the posterior effect standard deviation |
| mean_incl | conditional estimates of the posterior effect size mean |
| sd_incl | conditional estimates of the posterior effect size standard deviation |
| p | original p-value |
| csx | credible set index for given number of causal variants x |
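
The beta, se, z and p columns are related: a z-score is conventionally beta divided by se, and under a normal approximation the two-sided p-value is erfc(|z| / sqrt(2)). The sketch below shows this as a consistency check. It assumes that z in the file follows this convention. The reported p may differ from the normal-approximation value if the association test applied corrections, so treat any mismatch as informative rather than as an error. The input values and function names are made up.

```python
import math


def z_score(beta, se):
    """Wald z-score: effect size divided by its standard error."""
    return beta / se


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Made-up values, for illustration only.
z = z_score(0.2, 0.04)
p = two_sided_p(z)
print(z, p, -math.log10(p))
```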

Variant annotation

The variant annotation has measures (HWE, INFO, ...) listed per batch.

Gene-based burden test results of LoF variants

Loss-of-function (LoF) variants were identified from VCF files with VEP (https://github.com/Ensembl/ensembl-vep). LoF variants are defined as those with a consequence in the list [frameshift_variant,splice_donor_variant,stop_gained,splice_acceptor_variant]. Two further filters are applied: a maximum MAF (max_maf) of 0.01 and a minimum INFO score of 0.8. The per-chromosome VCF files are then filtered and merged into a single bgen file, which allows the whole analysis to be run on one data set. Finally, the bgen file is passed to step 2 of regenie in burden mode, which uses the null models from the standard GWAS runs.
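
The variant selection described above can be sketched as a single predicate. The consequence list and the 0.01 and 0.8 thresholds come from the text; the function name, the argument layout, and the choice of inclusive comparisons at the thresholds are assumptions made for illustration.

```python
LOF_CONSEQUENCES = frozenset(
    {
        "frameshift_variant",
        "splice_donor_variant",
        "stop_gained",
        "splice_acceptor_variant",
    }
)


def is_lof_candidate(consequences, maf, info, max_maf=0.01, min_info=0.8):
    """True if any consequence is LoF and the MAF and INFO filters pass.

    Inclusive bounds (<= and >=) are an assumption; the text does not say
    how values exactly at the thresholds are treated.
    """
    has_lof = any(consequence in LOF_CONSEQUENCES for consequence in consequences)
    return has_lof and maf <= max_maf and info >= min_info


# Made-up variants, for illustration only.
examples = [
    (["stop_gained"], 0.002, 0.95),
    (["missense_variant"], 0.002, 0.95),
    (["frameshift_variant"], 0.05, 0.95),
    (["splice_donor_variant", "intron_variant"], 0.001, 0.60),
]
results = [is_lof_candidate(*example) for example in examples]
print(results)
```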

## File structure

### Data

| File | Description |
|---|---|
| finngen_R8_lof_txt.gz | Merged results, sorted by mlogp. |
| finngen_R8_lof_variants.txt | A tsv file with variant/geno/lof data used in the run. |
| finngen_R8_lof_sig_hits.txt | A summary of the results including only hits with mlogp > 3, sorted by the difference between the hit's mlogp and the max(mlogp) of its variants. |

### Documentation

| File | Description |
|---|---|
| finngen_R8_lof.log | Merged logs of all runs. |
