Recessive GWAS results format
R12 recessive results file format
This page was last updated for R12.
General overview
We provide recessive GWA results for 2,469 of release 12's (R12) endpoints. The analysis was performed with REGENIE v3.3. For variants with alternate allele frequency larger than 0.5, the effect allele was switched to be the reference allele, in order to keep the effect allele's frequency below 0.5. For endpoints that included female samples, only female samples were used in chromosome X analysis. After the REGENIE analysis, recessive results were joined with earlier additive model results.
Data locations
Recessive REGENIE files
Tabix-indexed REGENIE results are available for R12 at the bucket location: gs://finngen-production-library-green/finngen_R12/finngen_R12_analysis_data/summary_stats_recessive/results
with 2 files per endpoint (PHENO):
#chrom
Chromosome
pos
Chromosomal position (GRCh38)
ref
Reference allele
alt
Alternative allele
pval_add
Additive association p-value
mlogp_add
Additive association -log10(p-value)
beta_add
Per-allele additive model effect size estimate on endpoint for alt allele
sebeta_add
Standard error of additive model effect size estimate
af_alt
Allele frequency of alt allele
af_alt_cases
Allele frequency of alt allele in cases only
af_alt_controls
Allele frequency of alt allele in controls only
pval
Recessive association p-value
mlogp
Recessive association -log10(p-value)
beta
Effect size of recessive genotype for effect allele
sebeta
Standard error of recessive model effect size estimate
effect_allele
Effect allele used in recessive association analysis
flipped
If the allele frequency was >0.5, the reference allele was used as the effect allele. In that case, this value is 'True', otherwise 'False'
mlogp_diff
Difference of -log10(p-value) between additive and recessive models
INFO
Variant INFO score
AC_Het
Heterozygote allele count
AC_Hom
Homozygote allele count
most_severe
Most severe consequence of the alternate allele
gene_most_severe
Gene in which the most severe consequence occurs
rsid
Variant rsid(s)
b37_coord
Variant coordinate (chrom:pos:ref:alt) in build GRCh37 (hg19)
EXOME_enrichment_nfsee
Variant enrichment in Finnish population vs non-Finnish European (NFE) population. Based on gnomAD 2.1 genome data.
EXOME_enrichment_nfe
Variant enrichment in Finnish population vs non-Finnish European (NFE) population. Based on gnomAD 2.1 genome data.
GENOME_enrichment_nfee
Variant enrichment in Finnish population vs non-Finnish European (NFE) population. Based on gnomAD 2.1 genome data.
GENOME_enrichment_nfe
Variant enrichment in Finnish population vs non-Finnish European (NFE) population. Based on gnomAD 2.1 genome data.
index
Index of variant in variant annotation file
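As a rough illustration of how the columns above relate to each other, the following Python sketch parses a small results excerpt and checks two relationships stated in the column descriptions: effect_allele is the reference allele when flipped is 'True', and mlogp is -log10 of pval. The two data rows are made up for illustration and contain only a subset of the columns. The tab delimiter is an assumption, and the helper names (read_rows, expected_effect_allele) are hypothetical.

```python
import csv
import io
import math

# Made-up excerpt with a subset of the documented columns.
# Assumption: the results files are tab-separated.
SAMPLE = (
    "#chrom\tpos\tref\talt\tpval_add\tmlogp_add\taf_alt\tpval\tmlogp\teffect_allele\tflipped\n"
    "1\t1000\tA\tG\t0.001\t3.0\t0.20\t0.00001\t5.0\tG\tFalse\n"
    "1\t2000\tC\tT\t0.01\t2.0\t0.80\t0.1\t1.0\tC\tTrue\n"
)


def read_rows(handle):
    """Parse a tab-separated results stream into dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def expected_effect_allele(row):
    """Return ref when the row is flagged as flipped, otherwise alt."""
    return row["ref"] if row["flipped"] == "True" else row["alt"]


rows = read_rows(io.StringIO(SAMPLE))
for row in rows:
    # The effect_allele column should agree with the flipped flag.
    assert row["effect_allele"] == expected_effect_allele(row)
    # mlogp is -log10 of the recessive p-value.
    assert math.isclose(float(row["mlogp"]), -math.log10(float(row["pval"])))
```

For a real file, the same read_rows helper could be given a text handle opened with gzip.open(path, "rt"), assuming the tabix-indexed files are bgzip-compressed text.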
Methods overview
Genotype data split into AF<0.5 and AF>0.5 datasets
We identified the variants with alternate allele frequency larger than 0.5 and separated them into their own dataset.
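The split can be sketched in a few lines of Python. This is only an illustration of the partitioning rule, not the pipeline code: the function name and the (variant_id, af_alt) input format are hypothetical, and placing variants with a frequency of exactly 0.5 in the lower set is an assumption.

```python
def split_by_alt_af(variants, threshold=0.5):
    """Partition (variant_id, af_alt) pairs by alternate allele frequency.

    Returns two lists of variant ids: those with af_alt <= threshold and
    those with af_alt > threshold. Assumption: ties stay in the first list.
    """
    low, high = [], []
    for variant_id, af_alt in variants:
        if af_alt > threshold:
            high.append(variant_id)
        else:
            low.append(variant_id)
    return low, high


# Made-up variant ids and frequencies, for illustration only.
low_af, high_af = split_by_alt_af([("v1", 0.12), ("v2", 0.83), ("v3", 0.50)])
```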
Recessive REGENIE GWAS analysis
We performed standard GWA analyses using REGENIE (Mbatchou, J. et al., 2021) v3.3 for each endpoint separately. In the model-building step (step 1), the standard set of FinnGen covariates (age, 10 genetic principal components, genotyping chip and legacy batch number) was used. The exceptions were endpoints E4_PORPHYNAS and Q17_MARFAN, where legacy batch number was not included as a covariate.
A subset of the imputed variants with INFO score > 0.95 in all batches, <3% missingness and MAF > 1% were LD pruned with a 1.5Mb window and an r2 threshold of 0.2 to leave 188,153 well-imputed independent variants to build the model. A block size of 1,000 variants was used in step 1.
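The per-variant thresholds in the paragraph above can be written as a simple predicate. This is a minimal sketch of the three quality filters only; it does not cover the LD pruning. The function name and argument layout are hypothetical.

```python
def passes_step1_qc(info_by_batch, missingness, maf):
    """Check the step 1 variant filters described in the text.

    info_by_batch: INFO scores of the variant, one per genotyping batch
    missingness:   fraction of missing genotypes (0 to 1)
    maf:           minor allele frequency (0 to 0.5)
    """
    return (
        all(info > 0.95 for info in info_by_batch)  # INFO > 0.95 in all batches
        and missingness < 0.03                      # < 3% missingness
        and maf > 0.01                              # MAF > 1%
    )
```

For example, passes_step1_qc([0.99, 0.97], 0.01, 0.05) is True, while a single batch with INFO of 0.90 would exclude the variant.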
For all 2,469 included endpoints, recessive model association testing (in step 2 of REGENIE) was performed on variants with a minimum allele count (MAC) of 5 in the set of samples with non-missing data. As with the standard FinnGen R12 REGENIE GWAS runs, the approximate Firth test was applied to variants that had an initial P<0.01, with the standard error estimated from the effect size and the likelihood ratio test p-value (i.e. using flags --firth --approx --pThresh 0.01 --firth-se). In step 2, we used a block size of 400. For the genotype dataset with alternate allele frequency greater than 0.5, we omitted the REGENIE option --ref-first, forcing REGENIE to consider the reference allele as the effect allele for those variants.
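To make the phrase "standard error estimated from the effect size and the likelihood ratio test p-value" concrete, the sketch below shows one common way to back out a standard error from those two quantities: convert the two-sided p-value to the absolute z-score (the square root of a 1-degree-of-freedom chi-square statistic) and divide the absolute effect size by it. Whether REGENIE computes the value in exactly this way is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist


def se_from_beta_and_pvalue(beta, pval):
    """Back out a standard error from an effect size and a two-sided p-value."""
    if not 0.0 < pval < 1.0:
        raise ValueError("pval must be strictly between 0 and 1")
    # |z| for a two-sided p-value; this equals the square root of the
    # corresponding 1-df chi-square statistic.
    z = -NormalDist().inv_cdf(pval / 2.0)
    return abs(beta) / z


# With beta = 1.0 and p = 0.05, |z| is about 1.96, so SE is about 0.51.
se = se_from_beta_and_pvalue(1.0, 0.05)
```

Using inv_cdf(pval / 2) rather than inv_cdf(1 - pval / 2) avoids losing precision for very small p-values.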
The REGENIE results files were then filtered to remove variants with a -log10(P) value of "NA" (where the Firth approximation had failed).
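The filtering step amounts to dropping rows whose -log10(P) field is the literal string "NA". A minimal sketch, assuming rows have been parsed into dictionaries and that the field is named LOG10P as in raw REGENIE output (the function name is hypothetical, and the example rows are made up):

```python
def drop_failed_tests(rows, column="LOG10P"):
    """Keep only rows whose -log10(P) value is not the string 'NA'."""
    return [row for row in rows if row[column] != "NA"]


# Made-up rows for illustration.
example_rows = [
    {"ID": "v1", "LOG10P": "4.2"},
    {"ID": "v2", "LOG10P": "NA"},
]
kept = drop_failed_tests(example_rows)
```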