Using Polygenic Risk Scores
Last updated
Large-scale genome-wide association studies (GWAS) comparing disease cases with controls have identified thousands of genetic loci associated with various diseases, and similar studies have been done for traits such as height, lipid levels, and educational attainment. Individually, the detected loci typically modify disease risk only minimally, but their cumulative impact across the genome can be considerable.
Polygenic risk scores (PRSs) measure this cumulative genetic burden, and their value for risk stratification and risk prediction has been shown for many diseases and traits.
Several approaches exist for developing a PRS. The simplest methods linearly sum the contributions of the lead variant at each associated region (peak), considering only variants with low p-values. Newer methods take a more sophisticated approach: they account for the correlation structure of the genome (linkage disequilibrium) and assign a weight to each variant, increasing the weight of the strongest contributions and reducing the weights of irrelevant and statistically correlated signals to (or close to) 0 (Figure 1). Once such weights have been calculated, each individual's score in the target population is obtained by summing, over all variants in the genome, the variant's weight multiplied by the number of effect alleles the individual carries. Details on the statistical modeling underlying PRS generation can be found in Matti Pirinen’s GWAS course in topic 9 (‘Meta-analysis and summary statistics’).
Figure 1: General principle of newer methods used for building PRS
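A minimal sketch of the final scoring step, assuming per-variant weights are already available: an individual's score is PRS_i = Σ_j w_j · g_ij, where g_ij is the number (0, 1 or 2) of effect alleles individual i carries at variant j and w_j is that variant's weight. The function names, toy numbers, and the p < 5 × 10⁻⁸ cut-off are illustrative assumptions. The `threshold_weights` helper mimics only the p-value filtering of the simplest methods; it does not perform LD-based pruning or the shrinkage used by newer methods.

```python
import numpy as np


def polygenic_score(dosages: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of effect-allele counts for each individual.

    dosages: (n_individuals, n_variants) effect-allele counts in [0, 2]
    weights: (n_variants,) per-variant weights (e.g. GWAS betas or shrunk effects)
    """
    if dosages.shape[1] != weights.shape[0]:
        raise ValueError("number of variants in dosages and weights must match")
    # Matrix-vector product: sum_j weights[j] * dosages[i, j] for every individual i
    return dosages @ weights


def threshold_weights(betas: np.ndarray, pvalues: np.ndarray,
                      p_threshold: float) -> np.ndarray:
    """Hypothetical helper: keep betas with p < p_threshold, set all others to 0.

    No LD pruning is done here; a real pipeline would also remove correlated variants.
    """
    return np.where(pvalues < p_threshold, betas, 0.0)


# Toy data (made up for illustration): 3 individuals x 4 variants
dosages = np.array([
    [0, 1, 2, 1],
    [2, 2, 0, 0],
    [1, 0, 1, 2],
], dtype=float)
betas = np.array([0.10, -0.05, 0.20, 0.02])
pvalues = np.array([1e-9, 0.3, 1e-8, 0.04])

weights = threshold_weights(betas, pvalues, p_threshold=5e-8)
scores = polygenic_score(dosages, weights)
print("weights:", weights)
print("raw scores:", scores)
```

With these toy inputs only the first and third variants pass the threshold, so the second and fourth columns contribute nothing to the scores.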
Ultimately, every PRS algorithm produces, for each individual, a score that is meaningless by itself but allows us to rank individuals by their risk relative to each other. Common ways to present PRS effects include:
scaling the PRS to a mean of zero and a standard deviation (SD) of one, which allows effect sizes to be reported per one-SD increase in the PRS. This also lets individual PRS values be interpreted much like the growth charts familiar to clinicians; for instance, we can say that “an individual has a PRS of +2.0 SD”.
categorizing individuals into groups based on their PRS level. No widely accepted categories exist, but commonly used ones include quintiles, or a comparison of individuals above the 90th percentile with the rest of the distribution.
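The two presentation formats above can be sketched as follows, using simulated scores for illustration only (not real PRS values). The helper names are hypothetical. Standardization here uses the sample's own mean and SD (in practice, a reference population may be used instead), and quintiles and the top decile are defined by the empirical percentiles of the same sample.

```python
import numpy as np


def standardize(prs: np.ndarray) -> np.ndarray:
    """Scale raw scores to mean 0 and SD 1 within the given sample."""
    return (prs - prs.mean()) / prs.std()


def quintile(z: np.ndarray) -> np.ndarray:
    """Assign each individual to a quintile (1 = lowest 20%, 5 = highest 20%)."""
    cuts = np.percentile(z, [20, 40, 60, 80])
    # Count how many cut points each score exceeds or equals, then shift to 1..5
    return np.searchsorted(cuts, z, side="right") + 1


def top_decile(z: np.ndarray) -> np.ndarray:
    """True for individuals above the 90th percentile of the sample."""
    return z > np.percentile(z, 90)


# Simulated raw scores, for illustration only
rng = np.random.default_rng(seed=1)
raw_prs = rng.normal(loc=1.3, scale=0.4, size=10_000)

z = standardize(raw_prs)
groups = quintile(z)
high = top_decile(z)

print(f"mean={z.mean():.2f}, sd={z.std():.2f}")
print("individuals per quintile:", np.bincount(groups)[1:])
print("individuals above the 90th percentile:", int(high.sum()))
print("share with PRS above +2.0 SD:", (z > 2.0).mean())
```

Because both groupings are defined relative to the sample distribution, the same individual can fall into a different category in another cohort, which reflects the point above that a PRS only ranks individuals relative to each other.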
A reporting framework for PRS studies is described in: Wand, H., Lambert, S.A., Tamburro, C. et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature 591, 211–219 (2021). https://doi.org/10.1038/s41586-021-03243-6
By summarizing common genetic effects across the genome, the PRS captures germline genetic susceptibility to disease in a single measure. PRSs can be used for several types of analyses, from understanding the biological processes underlying diseases to estimating their potential role as clinical tools for risk stratification and for targeting individuals for risk mitigation. Examples of PRS use cases can be found in PRS studies using FinnGen data.
As GWAS become larger and better powered, PRSs will continue to improve, and the methodologies used for generating them are also constantly improving. An important limitation of PRSs is that the majority of the research has been performed in individuals of European ancestry (Martin, A.R., Kanai, M., Kamatani, Y. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet 51, 584–591 (2019)), and an important goal for the field is to improve the diversity of PRS studies, including the development of methods that allow PRS modeling in individuals of admixed ancestry.
In FinnGen, we provide a large number of precomputed PRSs that are ready for use by the community. Custom PRSs for diseases and traits of interest can also be generated with the FinnGen PRS pipeline in the Sandbox environment. The method currently used by FinnGen for generating PRSs is PRS-CS: Ge, T., Chen, C.Y., Ni, Y., Feng, Y.C.A. & Smoller, J.W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun 10, 1776 (2019).
Timpson, N., Greenwood, C., Soranzo, N. et al. Genetic architecture: the shape of the genetic contribution to human traits and disease. Nat Rev Genet 19, 110–124 (2018). [A review describing the concept of genetic architecture, which is relevant for understanding and applying PRSs.]
Chatterjee, N., Shi, J. & García-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat Rev Genet 17, 392–406 (2016). [A summary of the methodologies used for building, evaluating and applying risk prediction models that include information from genetic testing and environmental risk factors. New methods for building PRSs have been developed since this article was published, but the review is a great summary of the general methodology and terminology used in PRS studies.]
Torkamani, A., Wineinger, N.E. & Topol, E. The personal and clinical utility of polygenic risk scores. Nat Rev Genet 19, 581–590 (2018). [This review lays out the principles of PRS for clinical risk stratification in common diseases.]
Read more about how you can run PRS using FinnGen data.