Recessive GWAS results format
R12 recessive results file format
This page was last updated for R12.
General overview
We provide recessive GWAS results for 2,469 endpoints of release 12 (R12). The analysis was performed with REGENIE v3.3. For variants with an alternate allele frequency above 0.5, the reference allele was used as the effect allele, so that the effect allele's frequency stays below 0.5. For endpoints that included female samples, only female samples were used in chromosome X analysis. After the REGENIE analysis, recessive results were joined with the earlier additive model results.
Data locations
Recessive REGENIE files
Tabix-indexed REGENIE results are available for R12 at the bucket location: gs://finngen-production-library-green/finngen_R12/finngen_R12_analysis_data/summary_stats_recessive/results
with 2 files per endpoint (`PHENO`):
Column | Description |
---|---|
| Chromosome |
| Chromosomal position (GRCh38) |
| Reference allele |
| Alternative allele |
| Additive association p-value |
| Additive association -log10(p-value) |
| Per-allele additive model effect size estimate on endpoint for alt allele |
| Standard error of additive model effect size estimate |
| Allele frequency of alt allele |
| Allele frequency of alt allele in cases only |
| Allele frequency of alt allele in controls |
| Recessive association p-value |
| Recessive association -log10(p-value) |
| Effect size of recessive genotype for effect allele |
| Standard error of recessive model effect size estimate |
| Effect allele used in recessive association analysis |
| 'True' if the alternate allele frequency was >0.5 and the reference allele was therefore used as the effect allele; otherwise 'False' |
| Difference in -log10(p-value) between the additive and recessive models |
| Variant INFO score |
| Heterozygote allele count |
| Homozygote allele count |
| Most severe consequence of the alternate allele |
| Gene in which the most severe consequence occurs |
| variant rsid(s) |
| variant coordinate (chrom:pos:ref:alt) in build GRCh37 (hg19) |
| variant enrichment in Finnish population vs non-Finnish European (NFE) population. Based on gnomAD 2.1 genome data. |
| variant enrichment in Finnish population vs non-Finnish European (NFE) population. Based on gnomAD 2.1 genome data. |
| variant enrichment in Finnish population vs non-Finnish European (NFE) population. Based on gnomAD 2.1 genome data. |
| variant enrichment in Finnish population vs non-Finnish European (NFE) population. Based on gnomAD 2.1 genome data. |
| Index of variant in variant annotation file |
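As a rough illustration of how the -log10(p-value) columns and their difference relate to the p-value columns, the sketch below recomputes them from two p-values. The function names are hypothetical, and the sign convention (recessive minus additive) is an assumption, since the table does not state which model is subtracted from which.

```python
import math


def mlog10(p: float) -> float:
    """Return -log10(p) for a p-value in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError(f"p-value out of range: {p}")
    return -math.log10(p)


def mlog10p_difference(p_additive: float, p_recessive: float) -> float:
    """Difference of -log10(p) between the two models.

    ASSUMPTION: computed as recessive minus additive, so a positive value
    means the recessive model gives the stronger signal. The actual sign
    convention of the results file is not documented here.
    """
    return mlog10(p_recessive) - mlog10(p_additive)


# Illustrative p-values, not real results.
diff = mlog10p_difference(p_additive=1e-4, p_recessive=1e-9)
print(round(diff, 6))
```

Under this assumed convention, a large positive difference would flag a variant whose association is much better captured by the recessive model than by the additive one.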
Methods overview
Genotype data split into AF<0.5 and AF>0.5 datasets
We identified the variants with an alternate allele frequency larger than 0.5 and separated those variants into their own dataset.
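The split can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the `Variant` type, the function names and the example variants are all hypothetical, and the variants are assumed to be biallelic so that the reference allele frequency is one minus the alternate allele frequency.

```python
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record for this illustration."""
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_af: float  # alternate allele frequency


def split_by_alt_af(
    variants: List[Variant], threshold: float = 0.5
) -> Tuple[List[Variant], List[Variant]]:
    """Split variants into (alt AF <= threshold, alt AF > threshold)."""
    low, high = [], []
    for v in variants:
        (high if v.alt_af > threshold else low).append(v)
    return low, high


def effect_allele(v: Variant, threshold: float = 0.5) -> Tuple[str, float, bool]:
    """Return (effect allele, its frequency, whether it was switched to ref)."""
    if v.alt_af > threshold:
        # Biallelic assumption: ref AF = 1 - alt AF.
        return v.ref, 1.0 - v.alt_af, True
    return v.alt, v.alt_af, False


# Made-up example variants.
variants = [
    Variant("1", 1000, "A", "G", 0.12),
    Variant("1", 2000, "C", "T", 0.80),
]
low, high = split_by_alt_af(variants)
for v in variants:
    print(v.pos, effect_allele(v))
```

The boolean returned by `effect_allele` plays the same role as the 'True'/'False' switch column in the results file.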
Recessive REGENIE GWAS analysis
We performed standard GWA analyses using REGENIE (Mbatchou, J. et al., 2021) v3.3 for each endpoint separately. In the model-building step (step 1), the standard set of FinnGen covariates (age, 10 genetic principal components, genotyping chip and legacy batch number) was used. The exceptions were endpoints `E4_PORPHYNAS` and `Q17_MARFAN`, where legacy batch number was not included as a covariate.
A subset of the imputed variants with INFO score > 0.95 in all batches, <3% missingness and MAF > 1% were LD pruned with a 1.5Mb window and an r2 threshold of 0.2 to leave 188,153 well-imputed independent variants to build the model. A block size of 1,000 variants was used in step 1.
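The per-variant thresholds applied before LD pruning can be expressed as a simple predicate. This is a sketch only: the function and parameter names are hypothetical, the LD pruning itself is not shown, and the input values are made up.

```python
from typing import Sequence


def passes_step1_filters(
    info_by_batch: Sequence[float],
    missingness: float,
    alt_af: float,
    min_info: float = 0.95,
    max_missing: float = 0.03,
    min_maf: float = 0.01,
) -> bool:
    """Check the thresholds quoted above for a single variant.

    INFO must exceed min_info in every batch, missingness must be below
    max_missing, and the minor allele frequency must exceed min_maf.
    """
    maf = min(alt_af, 1.0 - alt_af)
    return (
        all(info > min_info for info in info_by_batch)
        and missingness < max_missing
        and maf > min_maf
    )


# Made-up examples: the second variant fails because one batch has low INFO.
print(passes_step1_filters([0.99, 0.97], missingness=0.01, alt_af=0.20))
print(passes_step1_filters([0.99, 0.90], missingness=0.01, alt_af=0.20))
```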
For all 2,469 included endpoints, recessive model association testing (in step 2 of REGENIE) was performed on variants with a minimum allele count (MAC) of 5 in the set of samples with non-missing data. As with the standard FinnGen R12 REGENIE GWAS runs, the approximate Firth test was applied to variants that had an initial P<0.01, with the standard error estimated from the effect size and the likelihood ratio test p-value (i.e. using flags `--firth --approx --pThresh 0.01 --firth-se`). In step 2, we used a block size of 400. For the genotype dataset with alternate allele frequency greater than 0.5, we omitted the regenie option `--ref-first`, forcing regenie to consider the reference allele as the effect allele for those variants.
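To make the step 2 settings concrete, the sketch below assembles a regenie argument list from the values given above. It is not the actual FinnGen command: the function name and all file paths are hypothetical, `--bt` is included on the assumption that the endpoints are binary traits, `--ref-first` is assumed to have been passed for the AF<0.5 dataset (which the text implies but does not state), and the phenotype and covariate file options a real run needs are left out.

```python
from typing import List


def regenie_step2_args(
    bgen_path: str,
    pred_list: str,
    out_prefix: str,
    alt_af_above_half: bool,
) -> List[str]:
    """Build an illustrative regenie step 2 argument list (recessive test)."""
    args = [
        "regenie",
        "--step", "2",
        "--bt",                    # ASSUMPTION: binary endpoints
        "--test", "recessive",
        "--bgen", bgen_path,       # hypothetical input path
        "--pred", pred_list,       # step 1 predictions (hypothetical path)
        "--bsize", "400",
        "--minMAC", "5",
        "--firth", "--approx", "--pThresh", "0.01", "--firth-se",
        "--out", out_prefix,
    ]
    if not alt_af_above_half:
        # Omitted for the AF>0.5 dataset so that the reference allele
        # is treated as the effect allele there.
        args.append("--ref-first")
    return args


low_af_cmd = regenie_step2_args("af_low.bgen", "step1_pred.list", "out_low", False)
high_af_cmd = regenie_step2_args("af_high.bgen", "step1_pred.list", "out_high", True)
print(" ".join(high_af_cmd))
```

Building the list in one place keeps the two dataset runs identical except for the single flag that controls which allele is tested.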
The REGENIE results files were then filtered to remove variants with a -log10(P) value of "NA" (where the Firth approximation had failed).
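A minimal sketch of this filtering step is shown below. It assumes space-delimited REGENIE output with a `LOG10P` column; the sample rows are invented for illustration and show only a subset of the columns a real output file contains.

```python
import csv
import io
from typing import Dict, List, TextIO

# Invented example rows; real REGENIE output has more columns.
SAMPLE = """CHROM GENPOS ID BETA SE LOG10P
1 1000 1:1000:A:G 0.12 0.05 1.8
1 2000 1:2000:C:T NA NA NA
"""


def drop_failed_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Keep only rows whose LOG10P value is not the string 'NA'."""
    reader = csv.DictReader(handle, delimiter=" ")
    return [row for row in reader if row["LOG10P"] != "NA"]


kept = drop_failed_rows(io.StringIO(SAMPLE))
for row in kept:
    print(row["ID"], row["LOG10P"])
```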