The FinnGen research project is a public-private partnership combining genotype data from Finnish biobanks and digital health record data from Finnish health registries. FinnGen provides a unique opportunity to study genetic variation in relation to disease trajectories in an isolated population.
FinnGen is a growing project, aiming to reach 500,000 individuals by the end of 2023.
FinnGen results are subject to a one-year embargo and are then made available to the wider scientific community via the PheWeb browser or through data download.
To download FinnGen summary statistics, fill in the online form at this link. You will then receive an email containing detailed instructions for downloading the data.
Release 10 contains
When using these results in publications, please remember to:
1) Acknowledge the FinnGen study. You can use the following text:
“We want to acknowledge the participants and investigators of the FinnGen study”
2) Cite our latest publication:
Kurki M.I., et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 2023 Jan;613(7944):508-518. doi: 10.1038/s41586-022-05473-8. Epub 2023 Jan 18.
Furthermore, if possible, include "FinnGen" as a keyword for your publication.
If you want to cite this website, use the following citation:
The manifest file with the link to all the downloadable summary stats is available at:
File naming pattern and file structure
GWAS summary statistics (tab-delimited, bgzipped, genome build 38, index files included) are named as {endpoint}.gz. For example, endpoint I9_CHD has I9_CHD.gz and I9_CHD.gz.tbi.
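Because bgzip output is also valid gzip, the summary statistics files can be streamed with a standard gzip reader. The following is a minimal sketch, assuming only that the first line of the file is a tab-delimited header; the function name `iter_sumstats` is hypothetical and not part of any FinnGen tooling.

```python
import csv
import gzip
from typing import Dict, Iterator


def iter_sumstats(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row of a bgzipped, tab-delimited file.

    Keys are taken from the header line; all values are kept as strings.
    """
    # bgzip writes a series of gzip members, which gzip.open reads transparently.
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            yield row
```

For example, `next(iter_sumstats("I9_CHD.gz"))` would return the first variant row of a downloaded file. Random access by genomic region needs the accompanying .tbi index and a tabix-aware reader, which this sketch does not cover.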
To learn more about the methods used, see section .
The {endpoint}.gz files have the following structure:
Two fine-mapping methods were used: SuSiE and FINEMAP.
Fine-mapping results are tab-delimited and bgzipped.
SuSiE results have the following filename pattern:
{endpoint}.SUSIE.cred.bgz
{endpoint}.SUSIE.cred_99.bgz
{endpoint}.SUSIE.snp.bgz
FINEMAP results have the following filename pattern:
{endpoint}.FINEMAP.config.bgz
{endpoint}.FINEMAP.region.bgz
{endpoint}.FINEMAP.snp.bgz
{endpoint}.SUSIE.cred.bgz files contain credible set summaries from SuSiE fine-mapping for all genome-wide significant regions. {endpoint}.SUSIE.cred_99.bgz files contain the 99% credible set summaries, whereas the default is 95%. They have the following structure:
{endpoint}.SUSIE.snp.bgz files contain variant summaries with credible set information and have the following structure:
{endpoint}.FINEMAP.config.bgz files contain summaries of the fine-mapping variant configurations from the FINEMAP method and have the following structure:
{endpoint}.FINEMAP.region.bgz files contain summary statistics on the number of independent signals in each region and have the following structure:
{endpoint}.FINEMAP.snp.bgz files contain summary statistics of variants and the credible sets they may belong to. Columns:
The {probeName}.gz files have the following structure:
ldstore --bcor FG_LD_chr7.bcor --incl-range 20000000-50000000 --table output_file_name.table
The variant annotation has measures (HWE, INFO, ...) listed per batch.
FinnGen individuals were genotyped with Illumina and Affymetrix chip arrays (Illumina Inc., San Diego, and Thermo Fisher Scientific, Santa Clara, CA, USA).
Chip genotype data were imputed using the population-specific SISu v4.2 imputation reference panel of 8,554 whole genomes.
Merged imputed genotype data is composed of 116 data sets that include samples from multiple cohorts.
Total number of individuals: 430,897
Total number of variants (merged set): 21,311,942
Reference assembly: GRCh38/hg38
Please use the following description when referring to our project:
The FinnGen study is a large-scale genomics initiative that has analyzed over 500,000 Finnish biobank samples and correlated genetic variation with health data to understand disease mechanisms and predispositions. The project is a collaboration between research organisations and biobanks within Finland and international industry partners.
pQTL summary statistics (tab-delimited, bgzipped, genome build 38, index files included) are named as {probeName}.gz. For example, probe seq.9928.125 has seq.9928.125.gz and seq.9928.125.gz.tbi.
To learn more about the methods used, see section
Linkage disequilibrium (LD) was estimated from the Finnish reference panel for each chromosome. Use the LDstore tool to work with the bcor files.
To learn more about the methods used, see section .
Columns of {endpoint}.SUSIE.cred.bgz and {endpoint}.SUSIE.cred_99.bgz:
Column name | Description |
---|---|
trait | phenotype |
region | region for which the fine-mapping was run |
cs | running number for independent credible sets in a region |
cs_log10bf | Log10 bayes factor of comparing the solution of this model (cs independent credible sets) to cs -1 credible sets |
cs_avg_r2 | Average correlation R2 between variants in the credible set |
cs_min_r2 | minimum r2 between variants in the credible set |
low_purity |
cs_size | how many snps does this credible set contain |
Columns of {endpoint}.SUSIE.snp.bgz:
Column name | Description |
---|---|
trait | endpoint name |
region | chr:start-end |
v | variant identifier |
rsid | rs variant identifier |
chromosome | chromosome on build GRCh38 ( |
position | position in base pairs on build GRCh38 |
allele1 | reference allele |
allele2 | alternative allele (effect allele) |
maf | minor allele frequency |
beta | effect size GWAS |
se | standard error GWAS |
p | p-value GWAS |
mean | posterior expectation of true effect size |
sd | posterior standard deviation of true effect size |
prob | posterior probability of association |
cs | identifier of 95% credible set (-1 = variant is not part of credible set) |
lead_r2 | r2 value to a lead variant (the one with maximum PIP) in a credible set |
alphax | posterior inclusion probability for the x-th single effect (x := 1..L where L is the number of single effects (causal variants) specified; default: L = 10) |
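In the SuSiE model, the per-effect inclusion probabilities (the alphax columns) are commonly combined into one overall posterior inclusion probability per variant as one minus the probability that none of the single effects includes the variant. The sketch below illustrates that formula. Whether the `prob` column in these files was produced with exactly this calculation (for example, without dropping any single effects first) is an assumption.

```python
from typing import Sequence


def pip_from_alphas(alphas: Sequence[float]) -> float:
    """Combine per-effect inclusion probabilities into one PIP.

    PIP = 1 - prod_x (1 - alpha_x), where alpha_x is the inclusion
    probability of the variant for the x-th single effect.
    """
    not_included = 1.0
    for alpha in alphas:
        not_included *= 1.0 - alpha
    return 1.0 - not_included


# A variant with probability 0.5 in each of two single effects.
print(pip_from_alphas([0.5, 0.5]))  # 0.75
```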
Columns of {endpoint}.FINEMAP.config.bgz:
Column name | Description |
---|---|
trait | phenotype |
region | region for which the fine-mapping was run |
rank | rank of this configuration within a region |
config | causal variants in this configuration |
prob | probability across all n independent signal configurations |
log10bf | log10 bayes factor for this configuration |
odds | odds of this configuration |
k | how many independent signals in this configuration |
prob_norm_k | probability of this configuration within k independent signals solution |
h2 | snp heritability of this solution |
h2_0.95CI | 95% confidence interval limits of snp heritability of this solution |
mean | marginalized shrinkage estimates of the posterior effect size mean |
sd | marginalized shrinkage estimates of the posterior effect standard deviation |
Columns of {endpoint}.FINEMAP.region.bgz:
Column name | Description |
---|---|
trait | phenotype |
region | region for which the fine-mapping was run |
h2g | heritability of this region |
h2g_sd | standard deviation of snp heritability of this region |
h2g_lower95 | lower limit of 95% CI for snp heritability |
h2g_upper95 | upper limit of 95% CI for snp heritability |
log10bf | log10 Bayes factor compared against null (no signals in the region) |
prob_xSNP | columns for probabilities of different number of independent signals |
expectedvalue | expectation (average) of the number of signals |
Columns of {endpoint}.FINEMAP.snp.bgz:
Column name | Description |
---|---|
trait | phenotype |
region | region for which the fine-mapping was run |
v | variant |
index | running index |
rsid | rs variant identifier |
chromosome | chromosome |
position | position |
allele1 | reference allele |
allele2 | alternative allele |
maf | alternative allele frequency |
beta | original marginal effect size |
se | original standard error |
z | original zscore |
prob | posterior inclusion probability |
log10bf | log10 bayes factor |
mean | marginalized shrinkage estimates of the posterior effect size mean |
sd | marginalized shrinkage estimates of the posterior effect standard deviation |
mean_incl | conditional estimates of the posterior effect size mean |
sd_incl | conditional estimates of the posterior effect size standard deviation |
p | original p-value |
csx | credible set index for given number of causal variants x |
Columns of {probeName}.gz:
Field | Description |
---|---|
CHR | chromosome for variants |
POS | BP of the variants |
ID | SNP name (CHR_POS_REF_ALT) |
REF | reference allele provided in FINNGEN imputed data |
ALT | alternative allele, this is the effect allele (aka. A1, effect allele, A0 in some software) |
ALT_FREQ | allele frequency of the alternative allele |
BETA | effect size in additive model |
SE | standard error of the effect size |
T_STAT | t statistics from PLINK2 |
P | p-value in association test |
log10_P | -log10(P); keeps extra precision when P < 10^-308 |
N | per-SNP sample size for the SNP |
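The log10_P column exists because p-values below roughly 10^-308 underflow to zero in double-precision arithmetic. The sketch below shows one way to turn a -log10(P) value back into a printable p-value without forming the p-value as a float. The function name and formatting choices are illustrative only.

```python
import math


def format_p_from_mlog10(mlog10_p: float, digits: int = 2) -> str:
    """Render a p-value as 'm.mme-XX' from its -log10 value, avoiding underflow."""
    exponent = math.floor(-mlog10_p)          # power of ten of the p-value
    mantissa = 10 ** (-mlog10_p - exponent)   # leading factor in [1, 10)
    if round(mantissa, digits) >= 10:         # guard against printing '10.00e-N'
        mantissa /= 10
        exponent += 1
    return f"{mantissa:.{digits}f}e{exponent}"


print(format_p_from_mlog10(350.3))  # 5.01e-351
```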
Hail v0.2
Cromwell-42
Wdltool-0.14
Plink 1.9 and 2.0
BCFtools 1.7 and 1.9
Eagle 2.3.5
Beagle 4.1 (version 27Jan18.7e1)
R 3.4.1 (packages: data.table 1.10.4, sm 2.2-5.4)
We included 2,408 endpoints in the analysis, which consisted of 2,405 binary endpoints and 3 quantitative endpoints (HEIGHT_IRN, WEIGHT_IRN, BMI_IRN). Endpoints with fewer than 50 cases among the 412,181 samples were excluded, as well as endpoints labeled with an OMIT tag in the endpoint definition file.
The quantitative endpoints HEIGHT and WEIGHT were acquired from minimum phenotype data. After that, phenotype BMI was formed from them, and all of them were inverse normal transformed.
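The derivation of BMI and the inverse normal transformation described above can be sketched as follows. This is an illustration under stated assumptions: height in centimetres and weight in kilograms, a rank-based transform with an offset of 0.5, and average ranks for ties. The source does not state which units, offset, or tie handling were used.

```python
from statistics import NormalDist
from typing import List, Sequence


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 (units are an assumption of this sketch)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def inverse_normal_transform(values: Sequence[float], offset: float = 0.5) -> List[float]:
    """Rank-based inverse normal transform; ties receive their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    # Map each rank to a quantile strictly inside (0, 1), then to a z-score.
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]
```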
For the regenie step 1 LOCO prediction computation for each endpoint, we used age, sex, 10 PCs, FinnGen chip version 1 or 2, and legacy genotyping batch as covariates. For sex-specific phenotypes, sample sex was left out of the covariates. We excluded covariates that had fewer than 10 cases.
For calculating genetic relatedness in regenie step 1, we included variants 1) imputed with an INFO score > 0.95 in all batches and 2) > 97 % non-missing genotypes and 3) MAF > 1 %. The remaining variants were LD pruned with a 1.5Mb window and r2 threshold of 0.2. This resulted in a set of 228,119 well-imputed, non-rare variants for relatedness calculation.
We used a genotype block size of 1,000 in regenie step 1.
We ran association tests with regenie for each of the 2,408 endpoints for each variant with a minimum allele count of 5 among each phenotype’s cases and controls. We used the approximate Firth test for variants with an initial p-value of less than 0.01 and computed the standard error based on effect size and likelihood ratio test p-value (regenie options --firth --approx --pThresh 0.01 --firth-se).
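A standard error can be recovered from an effect size and a likelihood ratio test p-value by noting that a chi-square statistic with one degree of freedom is a squared standard normal. The sketch below illustrates that relationship. It is not taken from the regenie source code, and treating it as what --firth-se does internally is an assumption.

```python
from statistics import NormalDist


def se_from_beta_and_p(beta: float, p_value: float) -> float:
    """Standard error implied by an effect size and a two-sided 1-df test p-value.

    Requires 0 < p_value < 1; a p-value of exactly 1 implies z = 0 and no finite SE.
    """
    # Use the lower tail so very small p-values are not lost to rounding near 1.
    z = -NormalDist().inv_cdf(p_value / 2.0)
    return abs(beta) / z


print(round(se_from_beta_and_p(0.5, 0.05), 4))  # 0.2551
```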
Column name | Description |
| chromosome on build GRCh38 ( |
| position in base pairs on build GRCh38 |
| reference allele |
| alternative allele (effect allele) |
| variant identifier |
| nearest gene(s) (comma separated) from variant |
|
| -log10(p-value) |
|
|
| alternative (effect) allele frequency |
| alternative (effect) allele frequency among cases |
| alternative (effect) allele frequency among controls |
The files were created using LDstore from the Finnish panel v4.2.
The panel has been divided per chromosome. For example, to use the LD information in the first chromosome, FG_LD_chr1.bcor would be the file to use.
number of samples: 3775
window size: 1500 kb
accuracy: low
number of threads: 96
LD threshold to include correlations: 0.05
can be downloaded via:
And an example to extract variant range 20 Mb - 50 Mb from chromosome 7 is as follows:
We do not recommend using these LD estimate files for tasks such as fine-mapping, since many fine-mapping methods (e.g. SuSiE) require in-sample LD information for good results.
Chip genotype data processing and QC
Samples were genotyped with Illumina (Illumina Inc., San Diego, CA, USA) and Affymetrix arrays (Thermo Fisher Scientific, Santa Clara, CA, USA).
Genotype calls were made with GenCall and zCall algorithms for Illumina and AxiomGT1 algorithm for Affymetrix data.
Chip genotyping data produced with previous chip platforms and reference genome builds were lifted over to build version 38 (GRCh38/hg38) following the protocol described here:
In sample-wise quality control steps, individuals with ambiguous gender, high genotype missingness (>5%), excess heterozygosity (±4 SD) and non-Finnish ancestry were excluded. In variant-wise quality control steps, variants with high missingness (>2%), low HWE P-value (<1e-6) and low minor allele count (MAC<3) were excluded.
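The exclusion thresholds above can be written as two small predicates. This is a sketch only: the record types and field names are hypothetical, missingness is assumed to be stored as a fraction, and heterozygosity is assumed to be expressed in standard deviations from the mean.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    sex_ambiguous: bool
    missingness: float        # fraction of missing genotypes, 0..1
    heterozygosity_z: float   # heterozygosity in SD units from the mean
    finnish_ancestry: bool


@dataclass
class VariantQC:
    missingness: float        # fraction of missing genotypes, 0..1
    hwe_p: float              # Hardy-Weinberg equilibrium test p-value
    minor_allele_count: int


def sample_passes(s: SampleQC) -> bool:
    """True if the sample survives all sample-wise exclusion criteria."""
    return (not s.sex_ambiguous
            and s.missingness <= 0.05
            and abs(s.heterozygosity_z) <= 4.0
            and s.finnish_ancestry)


def variant_passes(v: VariantQC) -> bool:
    """True if the variant survives all variant-wise exclusion criteria."""
    return v.missingness <= 0.02 and v.hwe_p >= 1e-6 and v.minor_allele_count >= 3
```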
Before imputation, chip-genotyped samples were pre-phased with Eagle 2.3.5 using the default parameters, except the number of conditioning haplotypes, which was set to 20,000.
The disease endpoints were defined using nationwide registries:
We harmonized over the International Classification of Diseases (ICD) revisions 8, 9 and 10, cancer-specific ICD-O-3, (NOMESCO) procedure codes, Finnish-specific Social Insurance Institute (KELA) drug reimbursement codes and ATC-codes.
These registries spanning decades were electronically linked to the cohort baseline data using the unique national personal identification numbers assigned to all Finnish citizens and residents.
A full list of FinnGen endpoints is available for release 10.
Endpoints with fewer than 50 cases and developmental “helper” endpoints were excluded from the final PheWAS (“OMIT” tag in the endpoint definition file).
SISu v4.2 consists of 8,554 whole-genome sequences (WGS) of Finnish individuals from 5 research cohorts:
METSIM (PIs Markku Laakso and Mike Boehnke)
FINRISK (PI Pekka Jousilahti)
Corogene (PI Juha Sinisalo)
Biobank of Eastern Finland (PI Arto Mannermaa)
Finnish EUFAM Dyslipidemia Study (PIs Marja-Riitta Taskinen and Samuli Ripatti)
High-coverage (25x) WGS data used to develop the SISu v4.2 reference panel were generated at the McDonnell Genome Institute at Washington University (PIs Ira Hall and Nathan Stitziel).
We used regenie for release 10. Regenie's main advantages are fast leave-one-chromosome-out relatedness calculation which avoids proximal contamination, and use of an approximate Firth test which gives more reliable effect size estimates for rare variants.
We used regenie version 2.2.4.
Links:
We analyzed:
2,408 endpoints
2,405 binary endpoints
3 quantitative endpoints (HEIGHT_IRN, WEIGHT_IRN, BMI_IRN)
412,181 samples
230,310 females
181,871 males
21,311,942 variants
We included the following covariates in the model: sex, age, 10 PCs, FinnGen chip version 1 or 2, and legacy genotyping batch.
This is a description of the quality control procedures applied before running the GWAS.
The PCA for population structure has been run in the following way:
The sisu version 4.2 imputation panel is pruned iteratively, until a target number of SNPs is reached:
9,641,808 starting variants: only variants with a minimum info score of 0.9 in all batches are kept.
The script starts with [500.0, 50.0, 0.9] as the plink parameters (window, step, r2). It then lowers r2 by 0.05 at each iteration, re-pruning the imputation panel until the SNP count falls under the 200,000 target. At that point, whichever of the last two prunings is closest to the target is returned.
If the pruning at the higher r2 is closer, 200,000 SNPs are randomly selected from it; otherwise the last pruned set of SNPs is returned.
Plink flags used: --snps-only --chr 1-22 --max-alleles 2 --maf 0.01 .
For this run 180,037 snps are returned.
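The selection logic of the iterative pruning can be sketched as follows. The actual pruning is done by plink; here it is represented by a caller-supplied `prune` function, which is a hypothetical stand-in, as are the `seed` parameter and the handling of edge cases (a first pruning already under the target, or r2 running out before the target is crossed).

```python
import random
from typing import Callable, List


def prune_to_target(prune: Callable[[float], List[str]],
                    target: int = 200_000,
                    start_r2: float = 0.9,
                    step: float = 0.05,
                    seed: int = 0) -> List[str]:
    """Lower r2 until the pruned set drops under target, then pick the closest set."""
    r2 = start_r2
    current = prune(r2)
    previous = current
    while len(current) >= target and r2 - step > 0:
        previous = current
        r2 = round(r2 - step, 2)
        current = prune(r2)
    # Edge cases (assumed behaviour): nothing to compare against, so return as is.
    if len(current) >= target or previous is current:
        return current
    excess = len(previous) - target    # how far the higher-r2 set is above target
    deficit = target - len(current)    # how far the last set is below target
    if excess < deficit:
        # The higher-r2 set is closer: draw exactly `target` SNPs from it at random.
        return random.Random(seed).sample(previous, target)
    return current
```

A caller would wrap a plink invocation in `prune`, returning the list of retained variant identifiers for the given r2 threshold.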
Then, FinnGen data was merged with the 1000 Genomes Project (1kgp) data, using the variants mentioned above. A round of PCA was performed and a Bayesian algorithm was used to detect outliers. This process removed 14,547 FinnGen samples. The figure below shows the scatter plots for the first 3 PCs. Outliers, in green, are separated from the FinnGen red cluster.
While the method automatically flagged the 1kgp samples with non-European and Southern European ancestries as outliers, it did not exclude some samples of Western European origin. The signal from these samples was too small for a second round to be performed without detecting substructure within the Finnish population, so another approach was used. The FinnGen samples that survived the first round were used to compute another PCA, and the EUR and FIN 1kgp samples were projected onto the space spanned by its first 3 PCs. The centroid of each cluster was then calculated and used to compute the squared Mahalanobis distance of each FinnGen sample to each centroid. Because the Mahalanobis distance rescales each component to unit variance, the squared distance can be treated as a sum of 3 independent squared variables, which allowed us to map it to a probability (chi-squared with 3 degrees of freedom). In this way, a probability of belonging to each cluster was computed for every sample. A threshold of 0.95 was then used to exclude FinnGen samples whose relative chance of being part of the Finnish cluster was below that level. This method produced another 43 outliers. The figure below shows the first three principal components.
FIN 1kgp samples are in purple, while EUR 1kgp samples are in blue. Samples in green are FinnGen samples flagged as non-Finnish, while red ones are considered Finnish.
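The mapping from a squared distance to a cluster-membership probability can be sketched as follows. This illustration makes two simplifying assumptions that are not stated in the source: the principal components are treated as already whitened (identity covariance, so the squared Mahalanobis distance reduces to a sum of squares), and the "relative chance" is taken to be the Finnish probability divided by the sum of the Finnish and European probabilities.

```python
import math
from typing import Sequence


def chi2_sf_3df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


def squared_distance(point: Sequence[float], centroid: Sequence[float]) -> float:
    """Squared distance to a centroid, assuming whitened (unit-variance) PCs."""
    return sum((p - c) ** 2 for p, c in zip(point, centroid))


def relative_finnish_probability(point: Sequence[float],
                                 fin_centroid: Sequence[float],
                                 eur_centroid: Sequence[float]) -> float:
    """Probability of the FIN cluster relative to FIN and EUR combined."""
    p_fin = chi2_sf_3df(squared_distance(point, fin_centroid))
    p_eur = chi2_sf_3df(squared_distance(point, eur_centroid))
    total = p_fin + p_eur
    return p_fin / total if total > 0 else 0.0
```

Under these assumptions, a sample would be kept as Finnish when `relative_finnish_probability(...)` is at least 0.95.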
Then all pairs of FinnGen samples up to second degree were returned. The figure below shows the distribution of kinship values.
Then, the previously defined “non Finnish” samples were excluded and 2 algorithms were used to return a unique subset of unrelated samples:
one called greedy, which repeatedly removes the highest-degree node from the network of relations until no links are left in the network.
one called native, based on a native implementation in Python’s networkx package, applied to each subgraph of the network.
The larger of the independent sets returned by the two algorithms was kept, and the remaining samples were flagged as “outliers” for the final PCA.
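The greedy strategy described above can be sketched in plain Python. The function and argument names are hypothetical, and the tie-breaking rule (by sample name, for reproducibility) is an assumption; the source does not say how ties between equally connected samples were resolved.

```python
from typing import Dict, Iterable, Set, Tuple


def greedy_unrelated(samples: Iterable[str],
                     related_pairs: Iterable[Tuple[str, str]]) -> Set[str]:
    """Drop the most-connected sample repeatedly until no related pairs remain."""
    neighbours: Dict[str, Set[str]] = {s: set() for s in samples}
    for a, b in related_pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    while neighbours:
        # Sample with the most remaining relatives; ties broken by name.
        worst = max(neighbours, key=lambda s: (len(neighbours[s]), s))
        if not neighbours[worst]:
            break  # no links left in the network
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)
    return set(neighbours)


# A star-shaped family: removing the hub A leaves three unrelated samples.
print(sorted(greedy_unrelated("ABCD", [("A", "B"), ("A", "C"), ("A", "D")])))
```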
Then, the subset of outliers who also belong to the set of duplicates/twins was identified.
For the final step, the FinnGen samples were separated into three groups:
259,801 inliers: unrelated samples with Finnish ancestry.
153,927 outliers: non duplicate samples with Finnish ancestries, but who are also related to the inliers.
17,169 rejected samples: either of non Finnish ancestry or are twins/duplicates with relations to other samples.
Finally, the PCA for the inliers was calculated, and then outliers were projected on the same PC space, allowing to calculate covariates for a total of 413,728 samples.
Of the 413,728 non-duplicate population inlier samples from PCA, we excluded 1,543 samples from analysis because of missing minimum phenotype data, and 5 samples because of failing sex check with F thresholds of 0.4 and 0.7. A total of 412,181 samples were used for core analysis. There are 230,310 females and 181,871 males among these samples.
p-value from regenie
effect size (log(OR) scale) estimated with regenie for the alternative allele
standard error of effect size estimated with regenie
Risteys (Finnish for "intersection") allows browsing of the FinnGen data at the phenotype level, including endpoint definitions, statistics about number of individuals, gender distribution, and longitudinal relationships. Please also note the R10 specific page
Documentation from the original developers of the algorithm can be found here: .
The HLA data was imputed from R10 genotype data, using HIBAG models created by Jarmo Ritari from the Finnish Blood Bank. More information can be found in the repository:
https://github.com/FRCBS/HLA-imputation
as well as in the publication:
Ritari J, Hyvärinen K, Clancy J, FinnGen, Partanen J, Koskela S. Increasing accuracy of HLA imputation by a population-specific reference panel in a Finngen biobank cohort. NAR Genomics and Bioinformatics, Volume 2, Issue 2, June 2020, lqaa030, https://doi.org/10.1093/nargab/lqaa030
Genotype data was constructed from the dosage data using PLINK 2.
A snp-stats report was generated with qctool
Association testing was performed using Regenie 2.2.4. Same settings were used as in the core GWAS analysis. See the Association tests page for more information.
A summary was created from the regenie summary statistic outputs. This summary contains the most significant variant (by p-value) for each phenotype. Pheweb links to phenotype and gene pages have been added as additional columns.
For matters related to this documentation, send us an email to finngen-info@helsinki.fi.
For the latest updates on the project, as well as additional background information, please visit the study website https://www.finngen.fi/en or follow FinnGen on Twitter @FinnGen_FI.
If you want to host FinnGen summary statistics on your website, please get in contact with us at: humgen-servicedesk@helsinki.fi.
Gene-based burden test results of loss of function variants (LoFs).
Loss of function (LoF) variants were generated from vcf files with VEP (https://github.com/Ensembl/ensembl-vep). LoF variants are defined as having consequences in the list [frameshift_variant,splice_donor_variant,stop_gained,splice_acceptor_variant]. Also, a max_maf (0.01) and minimum info score (0.8) filters are applied. This left us with 579 genes with 2+ LoF variants that can be used for the association tests.
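The variant and gene selection described above can be sketched as a filter over annotated variants. The dictionary keys (`id`, `gene`, `consequence`, `maf`, `info`) are hypothetical names for this illustration, and whether the MAF and INFO thresholds are inclusive is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

LOF_CONSEQUENCES = {
    "frameshift_variant",
    "splice_donor_variant",
    "stop_gained",
    "splice_acceptor_variant",
}


def genes_for_burden(variants: Iterable[Mapping],
                     max_maf: float = 0.01,
                     min_info: float = 0.8,
                     min_variants: int = 2) -> Dict[str, List[str]]:
    """Group qualifying LoF variants by gene and keep genes with enough of them."""
    per_gene: Dict[str, List[str]] = defaultdict(list)
    for v in variants:
        if (v["consequence"] in LOF_CONSEQUENCES
                and v["maf"] <= max_maf
                and v["info"] >= min_info):
            per_gene[v["gene"]].append(v["id"])
    return {gene: ids for gene, ids in per_gene.items() if len(ids) >= min_variants}
```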
We used 2,405 binary phenotypes in the analyses.
We used as inputs the nulls already calculated for GWAS
Tests are performed with regenie --step2 in burden mode using a max mask (i.e. using the maximum number of ALT alleles across sites)
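A max mask collapses the LoF sites of a gene into one burden genotype per sample by taking the maximum number of ALT alleles observed across the sites. The sketch below shows that operation on hard-call allele counts. The input layout (one list per site, one entry per sample) is an assumption for illustration, and how regenie handles dosages or missing genotypes is not covered.

```python
from typing import List, Sequence


def max_mask(alt_counts_per_site: Sequence[Sequence[int]]) -> List[int]:
    """Per-sample maximum ALT allele count across all sites of a gene.

    alt_counts_per_site[i][j] is the ALT allele count (0, 1 or 2)
    of sample j at site i.
    """
    return [max(sample_counts) for sample_counts in zip(*alt_counts_per_site)]


# Two sites, three samples.
print(max_mask([[0, 1, 0], [2, 0, 0]]))  # [2, 1, 0]
```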
Genotype imputation was done with the population-specific SISu v4.2 reference panel.
The reference panel variant call set was produced with the GATK HaplotypeCaller algorithm by following GATK best practices for variant calling.
Genotype-, sample- and variant-wise QC was carried out iteratively by using the Hail framework v0.2 and the resulting high-quality WGS data for 8,554 individuals were phased with Eagle 2.3.5 as described in the previous section.
Genotype imputation was carried out by using the population-specific SISu v4.2 imputation reference panel with Beagle 4.1 (version 27Jan18.7e1) as described in the following protocol: dx.doi.org/10.17504/protocols.io.xbgfijw.
Post-imputation quality control involved checking the expected conformity of the imputation INFO-value distribution, MAF differences between the target dataset and the imputation reference panel and checking chromosomal continuity of the imputed genotype calls.
We used two state-of-the-art methods, FINEMAP (Benner, C. et al., 2016; Benner, C. et al., 2018) and SuSiE (Wang, G. et al., 2020) to fine-map genome-wide significant loci in FinnGen endpoints.
Briefly, there are three main steps:
1) For each genome-wide significant locus (default configuration: P < 5e-8), we define a fine-mapping region by taking a 3 Mb window around a lead variant (and merge regions if they overlap). If a merged window exceeds 10 Mb, we iteratively shrink the window by 10% until the merged window fits into 10 Mb or is split into merged windows that each fit into 10 Mb. We then split the input GWAS summary statistics into separate files per region for the following steps.
2) We compute in-sample dosage LD using LDstore2 for each fine-mapping region.
3) With the summary statistics and in-sample LD from steps 1-2 as inputs, we conduct fine-mapping using FINEMAP and SuSiE with the maximum number of causal variants in a locus L = 10.
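The region definition in the first step can be sketched as follows, for lead variants on a single chromosome. The sketch assumes that a 3 Mb window means 1.5 Mb on each side of the lead variant and that shrinking is applied to all windows at once; neither detail is stated in the text, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def merge_windows(leads: Sequence[int], window: int) -> List[Tuple[int, int]]:
    """Centre a window on each lead position and merge overlapping windows."""
    half = window // 2
    regions: List[List[int]] = []
    for pos in sorted(leads):
        start, end = max(0, pos - half), pos + half
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [(start, end) for start, end in regions]


def finemap_regions(leads: Sequence[int],
                    window: int = 3_000_000,
                    max_size: int = 10_000_000,
                    shrink: float = 0.9) -> List[Tuple[int, int]]:
    """Shrink the window by 10% until every merged region fits into max_size."""
    regions = merge_windows(leads, window)
    while any(end - start > max_size for start, end in regions):
        window = int(window * shrink)
        regions = merge_windows(leads, window)
    return regions


print(finemap_regions([5_000_000, 7_000_000]))  # [(3500000, 8500000)]
```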
Due to too many variants and too large LD regions the following endpoints were run with a refined region selection algorithm that limits the region's size to a set maximum length:
G6_MUSDYST
G6_MYONEU
Q17_CONGEN_MALFO_GALLB_BILE_DUCTS_LIVER
Q17_OTHER_CONGEN_MALFO_DIGES_SYSTEM1
Q17_OTHER_CONGEN_MALFO_SKIN
RX_STATIN
BMI_IRN
HEIGHT_IRN
The "Credible Sets"-table on a phenotype page in the R10 PheWeb browser shows the SuSiE-fine-mapped credible sets of that phenotype. The variant shown per credible set is the maximum PIP (posterior inclusion probability) variant of that credible set. In addition to the causal variants, variants that were in sufficient LD (Pearson r^2 > 0.05), had a small enough p-value (pval < 0.01), and were close enough to the lead variant (distance to lead variant < 1.5 megabases) were clumped together with the credible set. Variants have been compared against GWAS Catalog and annotated. The LD grouping, annotation and GWAS Catalog comparison were done using the autoreporting pipeline.
The columns of the table are explained below:
Compiled summary stats for proteomics QTL in imputed SNPs on FINNGEN. Two platforms are included, Olink (619 samples across 2925 proteins) and SomaScan (828 samples across 7596 proteins).
pQTL: The association results are from PLINK2 and use unrelated samples only (relatedness cutoff based on genotype correlation r > 0.0625 for HapMap3 SNPs). Because the sample sizes differ, the SNP sets are not the same between the two datasets.
Finemap: see Fine-mapping page, we input the summary data from pQTL
autoreporting: annotation of independent hits lead variants with e.g. gwas catalog, putative causal coding variants in credible set, gnomad allele frequencies
See the "Software" section for the software or pipeline we used.
/Somascan: from Somascan V4.1 assay, 828 unrelated samples from R12 imputed genotype
/Olink: from Olink Explore 3072 library, 619 unrelated samples from R12 imputed genotype
File: all_gene_info.txt
Olink: Other annotations from Olink as indicated by the column name.
Somascan: Other annotations from SomaScan as indicated by the column name. Note: NA means no matching, but keep the probe in the data to be complete.
Association: PLINK2 v2.00a3.3LM AVX2 Intel (3 Jun 2022)
Finemap: FINNGEN/finemapping-pipeline, revised on R10, hash e0792ea
Coloc: FINNGEN/colocalization, revised on R10, hash 2e0632c
Autoreporting: FINNGEN/autoreporting, revised on hash 9dbea66
Timeline for releases:
Release | Date release to partners | Date release to public | Total sample size [1] |
---|---|---|---|
R2 | Q4 2018 (Nov) | Q1 2020 | 96,499 |
R3 | Q2 2019 (May) | Q2 2020 | 135,638 |
R4 | Q4 2019 (Oct) | Q4 2020 | 176,899 |
R5 | Q2 2020 (March) | Q2 2021 | 218,792 |
R6 | Q3 2020 | Q1 2022 | 260,405 |
R7 | Q2 2021 | Q2 2022 | 309,154 |
R8 | Q3 2021 | Q4 2022 | 342,499 |
R9 | Q1 2022 | Q2 2023 | 377,277 |
R10 | Q3 2022 | Q4 2023 | 412,181 |
R11 | Q1 2023 | ~Q1 2024 | ~445,000 |
R12 | Q3 2023 | ~Q3 2024 | ~500,000 |
[1] samples used for PheWAS.
Column name | Explanation |
---|---|
top PIP variant | Variant with the largest PIP in the credible set. Click the arrow to the left of the variant to show the credible set variants. |
CS quality | Shows whether the credible set is well-formed. A 'true' value means that the credible set is likely trustworthy, and a 'false' value means that it is likely not trustworthy. |
chromosome | The chromosome in which the credible set lies. |
position | The position of the lead variant. |
p-value | p-value of the top PIP variant. |
-log10(p) | -log10(p-value) |
effect size (beta) | Effect size of the top PIP variant. |
Finnish Enrichment | Finnish enrichment of the top PIP variant. |
Alternate allele frequency | Alternate allele frequency of the top PIP variant. |
Lead Variant Gene | A probable gene of the top PIP variant. |
# coding in cs | Number of coding variants in the credible set. Hover over the number to see the variant, the consequence, and the correlation (Pearson r squared) to the lead variant. |
# credible variants | Number of variants in the credible set. |
Credible set bayes factor (log10) | The Bayes factor related to the credible set. |
CS matching Traits | Number of matches found in GWAS Catalog for the credible set variants. Hover over the number to see the trait, as well as the associated variant's LD (Pearson r squared) to the lead variant. |
LD Partner Traits | Number of matches found in GWAS Catalog to the group of credible variants and variants in LD with the top PIP variant. Hover over the number to see the trait, as well as the associated variant's LD (Pearson r squared) to the lead variant. |
Field | Description |
---|---|
Probe | probe name in the data/results gene |
geneName | mapping to the gene name (GeneCode V43) |
CHR | gene CHR |
start | gene start position |
end | gene end position |
strand | gene strand |