FinnGen Data Freezes and Releases

During the active sample collection phase (FinnGen 1 and 2), a Data Freeze (DF) took place twice a year, in February and August. At each freeze, FinnGen produced a Data Release (R) of updated genotype and phenotype data files to the Sandbox.

During FinnGen 3 the number of samples no longer increases, but we plan to update the health register data three times: in Feb 2025 (R13), Feb 2026 (R14), and Feb 2027 (R15).

About 3 months after each release, the FinnGen Core Analysis team releases a set of FinnGen core analysis results to the Google Cloud Storage green bucket (gs://finngen-production-library-green). These include, but are not limited to, GWAS summary statistics, fine-mapping, colocalization, and autoreporting results.
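As a rough illustration of the timing, the sketch below adds the "about 3 months" lag to the planned FinnGen 3 release months to estimate when core analysis results might appear. The dictionary, the helper names, and the fixed 3-month lag are assumptions made for this example; actual dates are announced by the analysis team.

```python
from datetime import date

# Planned FinnGen 3 register-data updates, as stated in the text.
PLANNED_RELEASES = {
    "R13": date(2025, 2, 1),
    "R14": date(2026, 2, 1),
    "R15": date(2027, 2, 1),
}

# "About 3 months" per the text; an approximation, not a guarantee.
CORE_RESULTS_LAG_MONTHS = 3


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` after d's month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def estimated_core_results_month(release: str) -> date:
    """Estimate the month in which core results for `release` may appear."""
    return add_months(PLANNED_RELEASES[release], CORE_RESULTS_LAG_MONTHS)


for name in PLANNED_RELEASES:
    # Print the estimate as YYYY-MM.
    print(name, estimated_core_results_month(name).isoformat()[:7])
```

For R13 this gives 2025-05, which should be read as an estimate only.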

Note: you will not be able to run GWAS immediately after the genotype and phenotype files are first released, because the analysis team needs to generate the covariate files for the new release. This usually happens about midway between releases. When the covariate files are ready, it is announced at the Users' meeting and on the FinnGen Community Slack.

Timeline of data releases: sample size and endpoints

| Data Release | Date of release to partners | Date of release to public | Total sample size [1] | Total endpoints | Core endpoints [2] |
|---|---|---|---|---|---|
| R2 | Q4 2018 | Q1 2020 | 96,499 | 1,485 | 1,485 |
| R3 | Q2 2019 | Q2 2020 | 135,638 | 2,737 | 1,801 |
| R4 | Q4 2019 | Q4 2020 | 176,899 | 3,452 | 2,444 |
| R5 | Q2 2020 | Q2 2021 | 218,792 | 3,858 | 2,803 |
| R6 | Q3 2020 | Q3 2021 | 260,405 | 3,995 | 2,861 |
| R7 | Q1 2021 | Q1 2022 | 309,154 | 4,149 | 3,095 |
| R8 | Q3 2021 | Q3 2022 | 342,499 | 4,431 | 2,202 |
| R9 | Q1 2022 | Q2 2023 | 377,277 | 4,526 | 2,272 |
| R10 | Q3 2022 | Q4 2023 | 412,181 | 4,519 | 2,408 |
| R11 | Q1 2023 | ~Q2 2024 | ~445,000 | 4,415 | 2,444 |
| R12 | Q3 2023 | ~Q4 2024 | ~480,000 | 4,421 | 2,469 |
| R13 | Q1 2024 | ~Q1 2025 | ~500,000 | NA | |

[1] total endpoint definitions [2] endpoints used for core GWAS and PheWAS.
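To make the growth between releases easier to see, the sketch below computes how many samples each release added over the previous one. It uses only the exact sample sizes from the table (R2 to R10) and leaves out the approximate figures for R11 to R13; the variable and function names are illustrative.

```python
# Total sample sizes copied from the table above (exact figures only;
# the approximate "~" values for R11-R13 are deliberately left out).
SAMPLE_SIZES = {
    "R2": 96_499,
    "R3": 135_638,
    "R4": 176_899,
    "R5": 218_792,
    "R6": 260_405,
    "R7": 309_154,
    "R8": 342_499,
    "R9": 377_277,
    "R10": 412_181,
}


def growth_between_releases(sizes: dict[str, int]) -> dict[str, int]:
    """Return the number of samples added between consecutive releases."""
    names = list(sizes)  # dicts preserve insertion order
    return {
        f"{prev}->{curr}": sizes[curr] - sizes[prev]
        for prev, curr in zip(names, names[1:])
    }


growth = growth_between_releases(SAMPLE_SIZES)
for step, added in growth.items():
    print(f"{step}: +{added:,}")
```

For example, R3 added 39,139 samples over R2.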

Timeline of data releases: N in imputation, endpoints, and registry data

| Release | Individuals with imputed genotypes | Individuals with minimum phenotypes | Individuals with endpoints | Individuals with detailed longitudinal data | Latest version (endpoint and detailed longitudinal data) |
|---|---|---|---|---|---|
| R1 | 52,295 | 53,866 | 53,866 | - | v4.0 |
| R2 | 102,739 | 103,695 | 103,695 | - | v2.0 |
| R3 | 146,630 | 152,796 | 152,796 | - | v1.0 |
| R4 | 183,694 | 196,849 | 198,328 | 211,154 | v2.0 |
| R5 | 224,737 | 224,650 | 224,566 | 224,438 | v3.0 |
| R6 | 271,341 | 271,343 | 269,718 | 271,123 | v2.1 |
| R7 | 321,464 | 321,464 | 321,302 | 320,953 | v4.0 and v2.0 |
| R8 | 356,213 | 356,138 | 356,077 | 356,082 | v4.0 and v3.0 |
| R9 | 392,649 | 392,649 | 392,423 | 329,539 | v1.0 |
| R10 | 430,897 | 430,885 | 429,209 | 429,861 | v1.0 |
| R11 | 473,681 | 473,681 | 473,681 | 473,580 | v1.0 |
| R12 | 520,210 | 520,210 | 520,210 | 520,105 | v1.0 |


Number of individuals with genotypes and phenotypes in FinnGen Data Releases. Starting from R5, the phenotype data has been restricted to genotyped individuals. Endpoint data includes individuals who have baseline data. Detailed longitudinal data includes only those individuals who have register data available. Some individuals in the phenotype data have been removed in QC steps. Data files can be found in the Sandbox under /finngen/library-red/

The following data files are released to the Sandbox for each Data Release

Here is the expected schedule for the next data freeze file releases.
