Recessive GWAS results format

R12 recessive results file format

This page was last updated for R12.

General overview

We provide recessive GWAS results for 2,469 endpoints in release 12 (R12). The analysis was performed with REGENIE v3.3. For variants with an alternate allele frequency greater than 0.5, the reference allele was used as the effect allele instead, to keep the effect allele's frequency below 0.5. For endpoints that included female samples, only female samples were used in the chromosome X analysis. After the REGENIE analysis, the recessive results were joined with the earlier additive model results.

Data locations

Recessive REGENIE files

Tabix-indexed REGENIE results are available for R12 at the bucket location: gs://finngen-production-library-green/finngen_R12/finngen_R12_analysis_data/summary_stats_recessive/results

with 2 files per endpoint (PHENO):

| Column | Description |
| --- | --- |
| #chrom | Chromosome |
| pos | Chromosomal position (GRCh38) |
| ref | Reference allele |
| alt | Alternative allele |
| pval_add | Additive association p-value |
| mlogp_add | Additive association -log10(p-value) |
| beta_add | Per-allele additive model effect size estimate on endpoint for alt allele |
| sebeta_add | Standard error of additive model effect size estimate |
| af_alt | Allele frequency of alt allele |
| af_alt_cases | Allele frequency of alt allele in cases only |
| af_alt_controls | Allele frequency of alt allele in controls |
| pval | Recessive association p-value |
| mlogp | Recessive association -log10(p-value) |
| beta | Effect size of recessive genotype for effect allele |
| sebeta | Standard error of recessive model effect size estimate |
| effect_allele | Effect allele used in recessive association analysis |
| flipped | 'True' if the alt allele frequency was >0.5 and the reference allele was therefore used as the effect allele, otherwise 'False' |
| mlogp_diff | Difference in -log10(p-value) between the additive and recessive models |
| INFO | Variant INFO score |
| AC_Het | Heterozygote allele count |
| AC_Hom | Homozygote allele count |
| most_severe | Most severe consequence of the alternate allele |
| gene_most_severe | Gene in which the most severe consequence occurs |
| rsid | Variant rsid(s) |
| b37_coord | Variant coordinate (chrom:pos:ref:alt) in build GRCh37 (hg19) |
| EXOME_enrichment_nfsee | Variant enrichment in Finnish population vs non-Finnish European (NFE) population. Based on gnomAD 2.1 genome data. |
| EXOME_enrichment_nfe | Variant enrichment in Finnish population vs non-Finnish European (NFE) population. Based on gnomAD 2.1 genome data. |
| GENOME_enrichment_nfee | Variant enrichment in Finnish population vs non-Finnish European (NFE) population. Based on gnomAD 2.1 genome data. |
| GENOME_enrichment_nfe | Variant enrichment in Finnish population vs non-Finnish European (NFE) population. Based on gnomAD 2.1 genome data. |
| index | Index of variant in variant annotation file |
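As a rough illustration of how the effect_allele and flipped columns relate to ref and alt, the sketch below parses a small tab-separated excerpt and checks that the two columns agree. The sample rows are made up and contain only a subset of the columns, and the helper names `read_rows` and `expected_effect_allele` are hypothetical.

```python
import csv
import io

def expected_effect_allele(row):
    """Return the allele that the 'flipped' column implies was the effect allele."""
    # Per the column description: flipped == 'True' means the reference
    # allele was used as the effect allele; otherwise the alt allele was.
    return row["ref"] if row["flipped"] == "True" else row["alt"]

def read_rows(handle):
    """Read tab-separated lines into dicts keyed by the header names."""
    return csv.DictReader(handle, delimiter="\t")

# Made-up two-variant excerpt (values are illustrative only).
SAMPLE = (
    "#chrom\tpos\tref\talt\taf_alt\tmlogp\teffect_allele\tflipped\n"
    "1\t1000\tA\tG\t0.12\t3.2\tG\tFalse\n"
    "1\t2000\tC\tT\t0.83\t8.1\tC\tTrue\n"
)

rows = list(read_rows(io.StringIO(SAMPLE)))
for row in rows:
    # The two columns should tell the same story for every variant.
    assert row["effect_allele"] == expected_effect_allele(row)
    print(row["#chrom"], row["pos"], row["effect_allele"], float(row["mlogp"]))
```

Because bgzip output is gzip-compatible, a handle from `gzip.open(path, "rt")` on a local copy of a results file should work with `read_rows` as well.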

Methods overview

Genotype data split into AF<0.5 and AF>0.5 datasets

We identified the variants with alternate allele frequency larger than 0.5 and separated those variants into their own dataset.
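The split can be pictured with a minimal sketch like the one below. It is not the pipeline's actual code: the function name `split_by_alt_af` and the variant IDs are hypothetical, and keeping variants with a frequency of exactly 0.5 in the low-frequency set is an assumption based on the "larger than 0.5" wording.

```python
def split_by_alt_af(variants, threshold=0.5):
    """Partition (variant_id, alt_af) pairs into (low_af, high_af) ID lists."""
    low_af, high_af = [], []
    for variant_id, alt_af in variants:
        # Only frequencies strictly above the threshold go to the high-AF set;
        # treating exactly 0.5 as "low" is an assumption of this sketch.
        (high_af if alt_af > threshold else low_af).append(variant_id)
    return low_af, high_af

# Made-up variants for illustration.
example = [("1:1000:A:G", 0.12), ("1:2000:C:T", 0.83), ("2:500:G:A", 0.5)]
low, high = split_by_alt_af(example)
print("AF <= 0.5:", low)
print("AF > 0.5:", high)
```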

Recessive REGENIE GWAS analysis

We performed standard GWA analyses using REGENIE (Mbatchou, J. et al., 2021) v3.3 for each endpoint separately. In the model-building step (step 1), the standard set of FinnGen covariates (age, 10 genetic principal components, genotyping chip and legacy batch number) was used. The exceptions were endpoints E4_PORPHYNAS and Q17_MARFAN, for which legacy batch number was not included as a covariate.

A subset of the imputed variants with INFO score > 0.95 in all batches, <3% missingness and MAF > 1% was LD-pruned with a 1.5Mb window and an r2 threshold of 0.2, leaving 188,153 well-imputed independent variants to build the model. A block size of 1,000 variants was used in step 1.
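The per-variant thresholds applied before LD pruning could be expressed roughly as follows. This is only a sketch of the three stated criteria: the function name `passes_step1_qc` and its argument layout are hypothetical, and the LD pruning itself (normally done with a dedicated tool) is not shown.

```python
def passes_step1_qc(info_by_batch, missingness, maf):
    """Check the stated thresholds for a candidate step 1 variant.

    info_by_batch: INFO scores, one per genotyping batch
    missingness:   fraction of samples with missing genotype (0..1)
    maf:           minor allele frequency (0..1)
    """
    return (
        bool(info_by_batch)                             # need at least one batch
        and all(info > 0.95 for info in info_by_batch)  # INFO > 0.95 in all batches
        and missingness < 0.03                          # < 3% missingness
        and maf > 0.01                                  # MAF > 1%
    )

print(passes_step1_qc([0.99, 0.97], missingness=0.01, maf=0.05))
print(passes_step1_qc([0.99, 0.90], missingness=0.01, maf=0.05))
```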

For all 2,469 included endpoints, recessive model association testing (in step 2 of REGENIE) was performed on variants with a minimum allele count (MAC) of 5 in the set of samples with non-missing data. As with the standard FinnGen R12 REGENIE GWAS runs, the approximate Firth test was applied to variants that had an initial P<0.01, with the standard error estimated from the effect size and the likelihood ratio test p-value (i.e. using flags --firth --approx --pThresh 0.01 --firth-se). In step 2, we used a block size of 400. For the genotype dataset with alternate allele frequency greater than 0.5, we omitted the regenie option --ref-first, forcing regenie to consider the reference allele as the effect allele for those variants.
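To make the step 2 settings concrete, the sketch below assembles a REGENIE argument list from the options described above. It is an approximation, not the actual FinnGen invocation: the file names are placeholders, and the use of BGEN input (`--bgen`) and binary-trait mode (`--bt`) are assumptions. The only difference between the two datasets is whether `--ref-first` is passed.

```python
def regenie_step2_args(bgen, pheno_file, covar_file, pred_list, out_prefix,
                       high_af_dataset=False):
    """Build an argument list for a recessive REGENIE step 2 run."""
    args = [
        "regenie",
        "--step", "2",
        "--bt",                     # binary endpoints (assumption)
        "--bgen", bgen,             # genotype format is an assumption
        "--phenoFile", pheno_file,
        "--covarFile", covar_file,
        "--pred", pred_list,        # predictions produced by step 1
        "--test", "recessive",      # recessive model association test
        "--minMAC", "5",            # minimum allele count of 5
        "--bsize", "400",           # step 2 block size
        "--firth", "--approx", "--pThresh", "0.01", "--firth-se",
        "--out", out_prefix,
    ]
    if not high_af_dataset:
        # Omitted for the AF > 0.5 dataset so that the reference allele
        # is treated as the effect allele for those variants.
        args.append("--ref-first")
    return args

# Placeholder file names, for illustration only.
low_af = regenie_step2_args("low_af.bgen", "pheno.tsv", "covar.tsv",
                            "step1_pred.list", "out/low_af")
high_af = regenie_step2_args("high_af.bgen", "pheno.tsv", "covar.tsv",
                             "step1_pred.list", "out/high_af",
                             high_af_dataset=True)
print(" ".join(low_af))
print(" ".join(high_af))
```

Returning a list rather than a single string keeps the arguments ready for `subprocess.run` without shell quoting concerns.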

The REGENIE results files were then filtered to remove variants with -log10(P) = "NA" (where the Firth approximation had failed).
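A filter of this kind might look like the sketch below, which drops rows whose -log10(P) field is the string "NA". The function name `drop_failed_firth` is hypothetical, the rows are made up, and the column name `LOG10P` is assumed to match the raw REGENIE output header.

```python
def drop_failed_firth(rows, column="LOG10P"):
    """Keep only rows whose -log10(P) value is not the string 'NA'."""
    return [row for row in rows if row[column] != "NA"]

# Made-up REGENIE-style rows for illustration.
raw = [
    {"ID": "1:1000:A:G", "BETA": "0.21", "LOG10P": "3.2"},
    {"ID": "1:2000:C:T", "BETA": "NA", "LOG10P": "NA"},  # Firth approximation failed
    {"ID": "2:500:G:A", "BETA": "-0.05", "LOG10P": "0.4"},
]
kept = drop_failed_firth(raw)
print([row["ID"] for row in kept])
```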
