What are QQ and Manhattan plots?
Most (if not all) GWAS scans produce two distinct plots, the QQ plot and the Manhattan plot, which we use to check both the quality of the scan and the results themselves.
In this section, we suggest a few steps for assessing the quality of your results. The following text was taken from the COVID-19 Host Genetics Analysis Plan v1.1 (pages 9-11).
Quantile-Quantile (QQ) plot
A QQ plot compares the observed association p-values (usually plotted as -log10(p)) against the values expected under the null hypothesis of no association. Up to some point, the observed values are linearly related to the expected values and fall along the diagonal. A deviation from this line indicates loci in your dataset whose test statistics are either higher than expected under the null distribution (an inflated QQ plot, which may reflect significant associations) or lower than expected (a deflated QQ plot). Using the figure graciously adapted from the Analytic and Translational Genetics Unit Workshop 2020, we describe below how to interpret a QQ plot.
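To make the comparison concrete, here is a minimal Python sketch (standard library only) of how the points of a QQ plot are commonly computed. It assumes the plotting positions (i - 0.5)/n for the expected quantiles; other conventions, such as i/(n + 1), are also used. The function name `qq_points` is hypothetical, and drawing the plot itself is left to your plotting library of choice.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs for a QQ plot.

    Expected values use the plotting positions (i - 0.5) / n, i.e. the
    quantiles of the uniform distribution that p-values follow under the
    null hypothesis of no association.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    points = []
    # Sort ascending so the most significant p-value is paired with the
    # smallest expected quantile (the top-right end of the plot).
    for i, p in enumerate(sorted(pvalues), start=1):
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        expected = (i - 0.5) / n
        points.append((-math.log10(expected), -math.log10(p)))
    return points


points = qq_points([0.5, 0.01, 1e-9, 0.2])
# Points far above the diagonal (observed >> expected) are candidate signals.
for expected, observed in points:
    print(f"expected={expected:.2f} observed={observed:.2f}")
```

Plotting `observed` against `expected` together with the line y = x gives the QQ plot; an inflated plot lifts away from that line early, while a deflated plot sags below it.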
When your QQ plot is highly inflated (the unadjusted model in the figure), the model may need further adjustment, e.g. for population stratification.
Adjusting for the first 10 principal components (PCs) as covariates should therefore remove most of the stratification from your model. Inflation can also arise from the polygenic architecture of the trait. The LD score regression intercept helps to tell these two causes apart: because the LD score regression slope captures the polygenic signal, the intercept should be approximately 1.0 (not significantly different from 1.0), and this holds even in the presence of polygenicity. Substantial deviations from 1.0 suggest uncontrolled population stratification.
In the figure, the unadjusted model is highly inflated, with a genomic inflation factor (λ) of 3.2; after adjustment, λ drops to 1.2. You can calculate the genomic inflation factor in R as λ = median(qchisq(1 - p, 1)) / qchisq(0.5, 1), where p is the vector of association p-values.
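The genomic inflation factor is the median observed 1-df chi-square statistic divided by the median of the chi-square(1) distribution (about 0.455). A minimal Python sketch, assuming every p-value comes from a 1-df test: `genomic_inflation` is a hypothetical helper name, and `statistics.NormalDist` (Python 3.8+) stands in for a chi-square quantile function, since a 1-df chi-square quantile is the square of a standard normal quantile.

```python
from statistics import NormalDist, median


def genomic_inflation(pvalues):
    """Genomic inflation factor (lambda) from 1-df association p-values."""
    std_normal = NormalDist()
    chi2 = []
    for p in pvalues:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # For 1 df, the chi-square quantile of (1 - p) equals z**2 with
        # z = Phi^-1(p / 2); using p / 2 avoids 1 - p/2 rounding to 1.0
        # for extremely small p-values.
        z = std_normal.inv_cdf(p / 2)
        chi2.append(z * z)
    # Median of the chi-square(1) distribution, about 0.4549.
    expected_median = std_normal.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Evenly spaced p-values behave like a perfect null scan.
null_p = [(i - 0.5) / 1000 for i in range(1, 1001)]
print(round(genomic_inflation(null_p), 3))
```

Evenly spaced null p-values give λ close to 1.0, whereas squaring them (which makes every test look more significant) pushes λ well above 1.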
You would generally want λ to be above 1.0 but below about 1.5; a value in this range suggests that any inflation may come from loci genuinely associated with your trait of interest rather than from widespread confounding.
When your QQ plot is deflated (see the example at the end of this section), your data may contain many rare variants, or the variance of the test statistics may be too small for the model to work well. In the latter case, permutations may be needed to improve the quality of your tested model.
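Permutation testing replaces the assumed null distribution with an empirical one obtained by shuffling phenotype values. The sketch below is a deliberately minimal illustration for a single variant, under stated assumptions: a quantitative trait, additive genotype dosages (0/1/2), and the absolute Pearson correlation as the test statistic, with no covariates. The function names are hypothetical; GWAS software that offers permutations implements them differently.

```python
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation, so no evidence of association
    return abs(cov) / (var_x * var_y) ** 0.5


def permutation_pvalue(genotypes, phenotypes, n_perm=1000, seed=1):
    """Empirical p-value: share of shuffles at least as extreme as observed."""
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have equal length")
    observed = abs_correlation(genotypes, phenotypes)
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        if abs_correlation(genotypes, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed data as one permutation,
    # so the empirical p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


genotypes = [0, 1, 2] * 20
phenotypes = [g + 0.01 * (i % 5) for i, g in enumerate(genotypes)]
print(permutation_pvalue(genotypes, phenotypes, n_perm=999))
```

The smallest attainable p-value is 1/(n_perm + 1), so the number of permutations limits how significant a result can appear.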
Manhattan plot
A Manhattan plot that is considered “good” has clear LD peaks (columns of neighbouring variants in linkage disequilibrium that rise together) with few sporadic points, i.e. it looks like the Manhattan skyline.
Besides showing the significant loci from your association tests, a Manhattan plot can also reveal whether your statistical model needs further adjustment or stricter quality control. This can be judged from the “clarity” of the peaks.
If the significant points are sporadic, isolated loci, as in the plot below, stricter quality control might be needed. With small sample sizes, however, these sporadic points might simply be low-frequency variants, so inspect the allele frequencies of those variants: if they are all rare (< 1%), the pattern is not a cause for concern.
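The check described above can be sketched in Python. This is a simplified illustration under explicit assumptions: variants are `(chrom, pos, pvalue, maf)` tuples, "significant" means the conventional genome-wide threshold of 5e-8, and a hit counts as isolated when no other significant variant lies within a hypothetical 500 kb window. Real peaks are also supported by sub-threshold neighbours, so treat this only as a first-pass filter. The helper names are hypothetical.

```python
GENOME_WIDE = 5e-8   # conventional genome-wide significance threshold
WINDOW_BP = 500_000  # assumed window for "nearby" support; adjust as needed
RARE_MAF = 0.01      # "rare" cut-off used in the text (< 1%)


def isolated_hits(variants, p_threshold=GENOME_WIDE, window=WINDOW_BP):
    """Return significant variants with no other significant variant nearby.

    Each variant is a (chrom, pos, pvalue, maf) tuple.
    """
    significant = [v for v in variants if v[2] < p_threshold]
    lonely = []
    for chrom, pos, p, maf in significant:
        has_neighbour = any(
            other_chrom == chrom
            and other_pos != pos
            and abs(other_pos - pos) <= window
            for other_chrom, other_pos, _, _ in significant
        )
        if not has_neighbour:
            lonely.append((chrom, pos, p, maf))
    return lonely


def all_rare(variants, rare_maf=RARE_MAF):
    """True if every variant has a minor allele frequency below rare_maf."""
    return all(maf < rare_maf for _, _, _, maf in variants)


variants = [
    ("1", 1_000_000, 1e-10, 0.20),  # part of a peak ...
    ("1", 1_010_000, 1e-9, 0.25),   # ... supported by a nearby hit
    ("2", 5_000_000, 1e-9, 0.004),  # lonely and rare
    ("3", 100, 0.5, 0.30),          # not significant
]
lonely = isolated_hits(variants)
print(lonely, all_rare(lonely))
```

If the isolated hits are all rare, the sporadic points are consistent with low-frequency variants in a small sample; common isolated hits are the ones that call for stricter quality control.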
To show the relationship between QQ and Manhattan plots, we elaborate with further examples adapted from the Institute of Behavioral Science, University of Colorado Boulder.
Figure: sporadic and poor associations throughout the genome.
Figure: everything appears to be associated with the trait (but is not).
We welcome you to continue using the PheWeb interface knowing that the QQ plots and Manhattan plots have been carefully checked for quality prior to publishing these traits on the PheWeb browser.
The easiest way to visualize the results of your analysis is a Manhattan plot (named after the skyline of Manhattan, New York). An example is the plot on the right, where the x-axis shows chromosomal positions and the y-axis shows the -log10(p-value) from your association analysis. The strongest associations have the smallest p-values and therefore reach the greatest height in the plot.
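The coordinates behind a Manhattan plot can be sketched in a few lines of Python. This assumes variants are `(chrom, pos, pvalue)` tuples with integer chromosome labels, and it approximates each chromosome's length by its largest observed position when laying chromosomes end to end; the helper name is hypothetical. Plotting the returned points as a scatter, coloured by chromosome, gives the familiar skyline.

```python
import math


def manhattan_coordinates(variants):
    """Map (chrom, pos, pvalue) tuples to (chrom, x, y) plotting points.

    x: position on a concatenated genome (chromosomes laid end to end,
       each one's length approximated by its largest observed position);
    y: -log10(p), so the smallest p-values are plotted highest.
    """
    max_pos = {}
    for chrom, pos, _ in variants:
        max_pos[chrom] = max(max_pos.get(chrom, 0), pos)
    # Offset of each chromosome = total length of the chromosomes before it.
    offsets = {}
    running = 0
    for chrom in sorted(max_pos):
        offsets[chrom] = running
        running += max_pos[chrom]
    points = []
    for chrom, pos, p in variants:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        points.append((chrom, offsets[chrom] + pos, -math.log10(p)))
    return points


pts = manhattan_coordinates([(1, 100, 0.1), (1, 200, 1e-8), (2, 50, 0.5)])
for chrom, x, y in pts:
    print(chrom, x, round(y, 2))
```

A horizontal line at y = -log10(5e-8), about 7.3, is commonly drawn to mark genome-wide significance.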