Interpreting rare-variant analysis results

The results table generated by Variant PheWas is usually sorted by p-value, with the smallest (most significant) values at the top. Across all endpoints and individual codes, thousands of tests are run, although many endpoints and codes are highly correlated with each other and so do not produce independent outcomes.

When a variant is tested against thousands of outcomes, interpreting the results table requires judging which results may be true positives and which are likely to be chance occurrences.

First, it is important to note that the table contains exploratory results generated by Fisher’s exact test, not formal results, which would adjust for ancestry principal components and other covariates such as age and sex. While the test's statistical properties are sound, this is not on its own a publication-ready analysis. If we roughly estimate that on the order of 5000 independent tests are run across codes, medications, and endpoints, then the average table generated for a variant with no true biological effect might contain several results in the p-value range .001-.0001. Such results need represent nothing more than chance (that is, the expected most extreme results under the null hypothesis of no association).
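The "several results" estimate can be checked with a short calculation. The sketch below assumes the rough figure of 5000 independent tests from the text and treats null p-values as uniform on [0, 1]; the function name and constant are hypothetical, chosen for illustration. Because Fisher’s exact test is discrete and tends to be conservative, real tables may contain somewhat fewer such results than this approximation suggests.

```python
# Assumption: roughly 5000 independent tests, as estimated in the text.
N_TESTS = 5000


def expected_null_hits(p_low, p_high, n_tests=N_TESTS):
    """Expected number of tests with p_low < p <= p_high under the null.

    If there is no association, p-values are approximately uniform on [0, 1],
    so the chance that one test lands in an interval equals the interval width.
    """
    return n_tests * (p_high - p_low)


# Results expected by chance alone in the range .001-.0001 (about 4.5).
in_range = expected_null_hits(0.0001, 0.001)

# Results expected by chance alone below .0001 (about 0.5).
below = expected_null_hits(0.0, 0.0001)

print(f"Expected null results with .0001 < p <= .001: {in_range:.1f}")
print(f"Expected null results with p <= .0001: {below:.1f}")
```

Under these assumptions, a handful of results between .001 and .0001 is what a variant with no effect would typically produce, while a result below .0001 is expected in only about half of such tables.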

Results with p<.0001, and particularly those with p<.00001 (calculated as .05/5000, a Bonferroni-corrected significance threshold for 5000 tests), are unlikely to arise by chance at a variant with no biological consequence. Those variants may therefore be worth further investigation. The table flags these rough thresholds as a guide for users.
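As an illustration of how these two rough thresholds relate, the following minimal sketch derives the Bonferroni threshold and labels a p-value accordingly. The names `flag`, `SUGGESTIVE`, and the label strings are hypothetical and are not taken from the tool itself; the count of 5000 tests is the rough estimate used in the text.

```python
ALPHA = 0.05      # conventional family-wise error rate
N_TESTS = 5000    # assumption: rough number of independent tests from the text

# Bonferroni correction: divide the error rate by the number of tests.
BONFERRONI = ALPHA / N_TESTS   # approximately .00001
SUGGESTIVE = 0.0001            # the looser rough threshold from the text


def flag(p_value):
    """Return a hypothetical label for a p-value based on the rough thresholds."""
    if p_value < BONFERRONI:
        return "bonferroni"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "none"


for p in (0.01, 0.00005, 0.000005):
    print(p, flag(p))
```

Because many endpoints and codes are correlated, the true number of independent tests is uncertain, so both cut-offs are best read as guides rather than exact significance boundaries.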

These significance thresholds differ from those suggested for common variants. If your variant is more common in the population, use the threshold recommended in the p values section.
