FinnGen Data Freezes and Releases
During the active sample collection phase in FinnGen 1 and 2, a Data Freeze (DF) took place twice a year, in February and August. At each freeze, FinnGen produced a Data Release (R) of updated genotype and phenotype data files to the Sandbox.
During FinnGen 3 we no longer increase the number of samples, but we plan to update the health register data three times: in February 2025 (R13), February 2026 (R14), and February 2027 (R15).
About 3 months after each release, the FinnGen Core Analysis team releases a set of FinnGen core analysis results to the Google Cloud Storage green bucket (gs://finngen-production-library-green). These include, but are not limited to, GWAS summary statistics, fine-mapping, colocalization, and autoreporting results.
Note: you will not be able to run GWAS immediately after the first genotype and phenotype files are released, because the analysis team needs to generate the covariate files for the new release. This usually happens about midway between releases; when the covariate files are ready, this is announced at the Users' meeting and on the FinnGen Community Slack.
Timeline of data releases: sample size and endpoints
Data Release | Date released to partners | Date released to public | Total sample size | Total endpoints [1] | Core endpoints [2] |
---|---|---|---|---|---|
R2 | Q4 2018 | Q1 2020 | 96,499 | 1,485 | 1,485 |
R3 | Q2 2019 | Q2 2020 | 135,638 | 2,737 | 1,801 |
R4 | Q4 2019 | Q4 2020 | 176,899 | 3,452 | 2,444 |
R5 | Q2 2020 | Q2 2021 | 218,792 | 3,858 | 2,803 |
R6 | Q3 2020 | Q3 2021 | 260,405 | 3,995 | 2,861 |
R7 | Q1 2021 | Q1 2022 | 309,154 | 4,149 | 3,095 |
R8 | Q3 2021 | Q3 2022 | 342,499 | 4,431 | 2,202 |
R9 | Q1 2022 | Q2 2023 | 377,277 | 4,526 | 2,272 |
R10 | Q3 2022 | Q4 2023 | 412,181 | 4,519 | 2,408 |
R11 | Q1 2023 | ~Q2 2024 | ~445,000 | 4,415 | 2,444 |
R12 | Q3 2023 | ~Q4 2024 | ~480,000 | 4,421 | 2,469 |
R13 | Q1 2024 | ~Q1 2025 | ~500,000 | NA |
[1] Total endpoint definitions. [2] Endpoints used for core GWAS and PheWAS.
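The gap between the partner release and the public release can be read off the table by counting quarters. The following is a minimal sketch of that arithmetic, not an official FinnGen tool: the names `RELEASES`, `quarter_index`, and `lag_in_quarters` are illustrative, and the dates are copied from the table for R2 to R10 only, since the public dates for R11 to R13 are marked as approximate.

```python
# Partner and public release quarters copied from the table above (R2 to R10).
# R11 to R13 are left out because their public dates are approximate ("~").
RELEASES = {
    "R2": ("Q4 2018", "Q1 2020"),
    "R3": ("Q2 2019", "Q2 2020"),
    "R4": ("Q4 2019", "Q4 2020"),
    "R5": ("Q2 2020", "Q2 2021"),
    "R6": ("Q3 2020", "Q3 2021"),
    "R7": ("Q1 2021", "Q1 2022"),
    "R8": ("Q3 2021", "Q3 2022"),
    "R9": ("Q1 2022", "Q2 2023"),
    "R10": ("Q3 2022", "Q4 2023"),
}


def quarter_index(label: str) -> int:
    """Convert a label such as 'Q4 2018' into a running quarter count."""
    quarter, year = label.split()
    return int(year) * 4 + int(quarter[1:]) - 1


def lag_in_quarters(partner: str, public: str) -> int:
    """Number of quarters between the partner and public release."""
    return quarter_index(public) - quarter_index(partner)


# Lag per release, keyed by release name.
LAGS = {name: lag_in_quarters(*dates) for name, dates in RELEASES.items()}

if __name__ == "__main__":
    for name, lag in LAGS.items():
        print(f"{name}: public release {lag} quarters after partner release")
```

With the dates as listed, the lag is four quarters for R3 through R8 and five quarters for R2, R9, and R10.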
Timeline of data releases: N in imputation, endpoints, and registry data
Release | Individuals with imputed genotypes | Individuals with minimum phenotypes | Individuals with endpoints | Individuals with detailed longitudinal data | Latest version (endpoint and detailed longitudinal data) |
---|---|---|---|---|---|
R1 | 52,295 | 53,866 | 53,866 | - | v4.0 |
R2 | 102,739 | 103,695 | 103,695 | - | v2.0 |
R3 | 146,630 | 152,796 | 152,796 | - | v1.0 |
R4 | 183,694 | 196,849 | 198,328 | 211,154 | v2.0 |
R5 | 224,737 | 224,650 | 224,566 | 224,438 | v3.0 |
R6 | 271,341 | 271,343 | 269,718 | 271,123 | v2.1 |
R7 | 321,464 | 321,464 | 321,302 | 320,953 | v4.0 and v2.0 |
R8 | 356,213 | 356,138 | 356,077 | 356,082 | v4.0 and v3.0 |
R9 | 392,649 | 392,649 | 392,423 | 329,539 | v1.0 |
R10 | 430,897 | 430,885 | 429,209 | 429,861 | v1.0 |
R11 | 473,681 | 473,681 | 473,681 | 473,580 | v1.0 |
R12 | 520,210 | 520,210 | 520,210 | 520,105 | v1.0 |
Number of individuals with genotypes and phenotypes in FinnGen Data Releases. Starting from R5, the phenotype data has been filtered to genotyped individuals. Endpoint data includes individuals who have baseline data. Detailed longitudinal data includes only individuals who have register data available. Some individuals in the phenotype data have been removed in QC steps. Data files can be found in the Sandbox at /finngen/library-red/
The following data files are released to the Sandbox with each Data Release
Other registry data files (periodically updated and released to the Sandbox)
Core Analysis Results files (released to FinnGen Production Library Green per each Data Release)
Here is the expected schedule for the next data freeze file releases.