GWAS Analysis

Genome-wide association studies (GWAS) are one of the most common ways to test for statistical associations between genetic variants and a phenotype.

These studies typically test variants across the genome for association with a phenotype of interest, estimating each variant's effect size and its statistical significance. The steps involved in conducting a GWAS are shown diagrammatically in Uffelmann et al. (2021), Nature Reviews Methods Primers, an excellent paper that we recommend reading.

One of the simplest ways to model GWAS data is with a linear model:

y = µ + xβ + ε, where:

● y is the phenotype

● x is the genotype, coded as 0, 1, or 2:

  • 0 meaning the individual has no copies of the variant allele (homozygous reference),

  • 1 meaning they are heterozygous, and

  • 2 meaning they are homozygous variant

● µ is the mean phenotype value of individuals without the variant

● β is the effect each copy of the variant has on the mean phenotype

● ε is a normally distributed error term (a reasonable approximation for most biological data).
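To make the model concrete, the following minimal sketch simulates data for a single variant under y = µ + xβ + ε. All numbers here (sample size, allele frequency, µ, β, and the error standard deviation) are illustrative assumptions, not values taken from any real dataset.

```r
# Simulate one variant under y = mu + x * beta + eps.
# Every parameter value below is a made-up assumption for illustration.
set.seed(42)                 # arbitrary seed so the simulation is reproducible

n    <- 500                  # hypothetical number of individuals
maf  <- 0.25                 # hypothetical frequency of the variant allele
mu   <- 170                  # mean phenotype of individuals without the variant
beta <- 2                    # effect of each copy of the variant

# Genotype: number of variant copies (0, 1, or 2), two independent draws per person
x <- rbinom(n, size = 2, prob = maf)

# Normally distributed error term
eps <- rnorm(n, mean = 0, sd = 5)

# Phenotype generated by the linear model
y <- mu + x * beta + eps

table(x)             # how many individuals carry 0, 1, or 2 copies
tapply(y, x, mean)   # mean phenotype within each genotype group
```

In data simulated this way, the group means should rise by roughly β for each additional copy of the variant, which is the pattern the linear model is designed to detect.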

If you have the phenotype y and the genotype x for each individual, testing a single variant and getting a P-value in R, a commonly used statistical programming tool, is as simple as running:

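fit <- lm(y ~ x)   # regress the phenotype y on the genotype x (0, 1, or 2)
summary(fit)       # the row for x reports the estimate of β and its P-value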

This prints a summary of the fitted model, including the estimated effect of the variant (β), its standard error, and its P-value. For more information about R, see the Getting Started with R section.
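If you need the estimates as numbers rather than printed output, they can be read from the coefficient table of the model summary. The sketch below is self-contained and uses simulated data with made-up parameter values; the variable names (mu_hat, beta_hat, se_beta, p_value) are chosen here for illustration only.

```r
# Self-contained example: simulate data, fit the model, extract the estimates.
# The simulation parameters are illustrative assumptions.
set.seed(42)
n <- 500
x <- rbinom(n, size = 2, prob = 0.25)      # genotype coded 0, 1, or 2
y <- 170 + 2 * x + rnorm(n, sd = 5)        # phenotype under the linear model

fit   <- lm(y ~ x)
coefs <- summary(fit)$coefficients         # matrix with rows "(Intercept)" and "x"

mu_hat   <- coefs["(Intercept)", "Estimate"]   # estimate of µ
beta_hat <- coefs["x", "Estimate"]             # estimate of β
se_beta  <- coefs["x", "Std. Error"]           # standard error of the β estimate
p_value  <- coefs["x", "Pr(>|t|)"]             # P-value for the test of β = 0

cat("mu:", mu_hat, " beta:", beta_hat, " SE:", se_beta, " P:", p_value, "\n")
```

A full GWAS repeats this test for every variant, so extracting the estimate and P-value programmatically is usually more practical than reading each printed summary.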

Note: GWAS results for all available endpoints/phenotypes from FinnGen are available in FinnGen PheWeb.

Additional reading

Click here to read more about how you can run GWAS in Sandbox using FinnGen data
