GWAS Analysis
Genome-wide association studies, or GWAS, are one of the most common ways to test genetic variants for statistical association with a trait or disease. In a case-control GWAS, each variant is tested for whether it occurs more frequently in cases than in controls. A FinnGen GWAS tests millions of variants across the whole genome.
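As a rough illustration of the case-versus-control comparison described above, the sketch below applies a textbook allelic chi-square test to a single variant. It assumes you already have counts of variant and reference alleles among cases and among controls. The function name `allelic_chi2_test` and the example counts are hypothetical, and this is not presented as FinnGen's own analysis method.

```python
import math


def allelic_chi2_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table
    of allele counts: rows are cases/controls, columns are variant/reference.

    Assumes every row and column total is positive.
    Returns (chi2_statistic, p_value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele frequency were the same in cases and controls
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For a chi-square variable with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, `allelic_chi2_test(300, 1700, 200, 1800)` compares a variant allele frequency of 15% in cases with 10% in controls and gives a chi-square statistic of 160/7 (about 22.86), while identical frequencies in both groups give a statistic of 0 and a P-value of 1.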
These studies typically estimate the effect of each variant across the genome on a phenotype of interest, together with the statistical significance of that effect. The steps for conducting a GWAS are shown diagrammatically in an excellent paper that we recommend reading: Uffelmann et al. (2021), Nature Reviews Methods Primers.
One of the simplest ways to model GWAS data is with a linear model:
y = µ + xβ + ε, where:
● y is the phenotype
● x is the genotype, coded as 0, 1, or 2 according to the number of copies of the variant allele:
0 meaning the individual has no copies of the variant allele (homozygous reference),
1 meaning they carry one copy (heterozygous), and
2 meaning they carry two copies (homozygous variant)
● µ is the mean phenotype value of individuals with no copies of the variant (x = 0)
● β is the effect each copy of the variant has on the mean phenotype
● ε is a normally distributed error term with mean zero (a reasonable approximation for many quantitative traits).
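The genotype coding and the model above can be sketched in Python, assuming each individual's genotype is given as a pair of allele labels. The helper names `genotype_dosage` and `simulate_phenotypes` are hypothetical and only illustrate the definitions; `sigma` is an assumed standard deviation for ε.

```python
import random


def genotype_dosage(allele1, allele2, variant_allele):
    """Code a genotype as 0, 1 or 2: the number of copies of the variant allele."""
    # Each comparison is a bool (0 or 1), so the sum counts variant copies
    return (allele1 == variant_allele) + (allele2 == variant_allele)


def simulate_phenotypes(genotypes, mu, beta, sigma, seed=0):
    """Draw y = mu + x*beta + eps for each genotype x, with eps ~ Normal(0, sigma^2)."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    return [mu + x * beta + rng.gauss(0.0, sigma) for x in genotypes]
```

For example, `[genotype_dosage(a, b, "G") for a, b in [("A", "A"), ("A", "G"), ("G", "G")]]` returns `[0, 1, 2]`, and with `sigma=0.0` each simulated phenotype is exactly µ + xβ.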
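If you have all of these, running a basic GWAS test for a single variant and obtaining its P-value in R, a commonly used statistical programming language, is as simple as fitting this linear model with lm() and printing the result with summary(). The summary reports the estimated effect β together with its standard error, test statistic, and P-value. For more information about R, see the Getting Started with R section.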
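To show what such a test computes, here is a minimal Python sketch of the same single-variant linear regression using only the standard library. It estimates µ and β by ordinary least squares and, as a simplifying assumption, turns β̂ divided by its standard error into a two-sided P-value with a normal approximation. R's summary() uses the t-distribution instead, so the two agree closely only when the number of individuals is large. The function name `single_variant_test` is hypothetical.

```python
import math


def single_variant_test(x, y):
    """Fit y = mu + x*beta + eps by ordinary least squares for one variant.

    x: genotypes coded 0/1/2; y: phenotype values in the same order.
    Returns (mu_hat, beta_hat, se_beta, p_value). The two-sided P-value uses a
    normal approximation to the t-distribution, so it is only accurate for large n.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired genotype/phenotype values")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("genotype does not vary, so beta cannot be estimated")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx
    mu_hat = y_mean - beta_hat * x_mean

    # Residual variance with n - 2 degrees of freedom (mu and beta were fitted)
    rss = sum((yi - mu_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    if se_beta == 0:
        # Perfect fit: no residual variation, so the test is degenerate
        return mu_hat, beta_hat, se_beta, (1.0 if beta_hat == 0 else 0.0)

    z = beta_hat / se_beta
    # Two-sided tail probability of a standard normal: P(|Z| > |z|)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return mu_hat, beta_hat, se_beta, p_value
```

Calling it once per variant with the same phenotype values y gives the basic per-variant loop of a simple GWAS.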
Note: GWAS results for all available FinnGen endpoints/phenotypes are available in FinnGen PheWeb.
GWAS Primer, the National Human Genome Research Institute, USA.
Matti Pirinen’s course Genome-wide Association Studies (course code: LSI34002) at the University of Helsinki provides an excellent introduction for those interested in a more in-depth look at running their own GWAS. Much of the background here was sourced from his course notes, which are free to use.
Click here to read more about how you can run a GWAS in Sandbox using FinnGen data.