Data Masking/Blurring of Visit Dates

All FinnGen individual-level data is pseudo-anonymised: Personal Identity codes (PICs) are replaced by FinnGen IDs, and only pseudo-anonymised individual-level data can be found in Sandbox.

< 5 cases rules

In DF1-DF7, all register codes with fewer than five cases in the detailed longitudinal data, and all endpoints with fewer than five cases in the endpoint and longitudinal endpoint data, have been removed from the data.
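
As an illustration only, the DF1-DF7 rule above could be sketched in Python as follows. This is not the FinnGen pipeline: the function name drop_rare_codes, the (FinnGen ID, code) row layout, and the example IDs and codes are hypothetical, and the sketch assumes that a "case" means a distinct individual carrying the code.

```python
from collections import defaultdict

MIN_CASES = 5  # threshold taken from the "< 5 cases" rule described above


def drop_rare_codes(rows, min_cases=MIN_CASES):
    """Keep only rows whose code is carried by at least `min_cases` individuals.

    `rows` is a list of (finngen_id, code) tuples. Counting distinct
    individuals per code is an assumption made for this sketch.
    """
    individuals_per_code = defaultdict(set)
    for finngen_id, code in rows:
        individuals_per_code[code].add(finngen_id)
    return [
        (finngen_id, code)
        for finngen_id, code in rows
        if len(individuals_per_code[code]) >= min_cases
    ]


# Hypothetical example: one code with six individuals, one code with two.
rows = [(f"FG{i:03d}", "CODE_A") for i in range(6)]
rows += [("FG001", "CODE_B"), ("FG002", "CODE_B")]
kept = drop_rare_codes(rows)
```

In this toy input, only the rows for CODE_A remain, because CODE_B is carried by two individuals.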

From DF8v3 onwards, all register codes in the detailed longitudinal data and all endpoints in the endpoint and longitudinal endpoint data, including those with fewer than five cases, are included in the data released to the Sandbox.

Randomized event days

To protect individual-level data, exact event days cannot be released with phenotype data. Instead, each exact event day is randomized to an approximated event day (APPROX_EVENT_DAY) by adding an offset of +/- 1-15 days to the exact event day.

The offset is individual-specific: the same number of days is added to all events of a given individual.

  • Until DF10, the offset is not consistent across registers. The approximated event day is usually calculated separately in each register (e.g. reproductive history data vs. service sector data). However, in the detailed longitudinal data the same individual-specific offset is used for a given individual in all registers included in the data.

  • From DF11 onwards, the offset is consistent across registers. The same per-person offset (consistent for all events of one person) is used for all FinnGen register files.
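
The following Python sketch illustrates the idea of an individual-specific offset applied across registers, as described for DF11 onwards. It is not the actual FinnGen procedure: the function names, the register labels, the example IDs and the way the offset is drawn (uniformly, with zero excluded) are assumptions made for illustration.

```python
import random
from datetime import date, timedelta


def draw_offset(rng):
    """Draw an offset of +/- 1-15 days (uniform draw is an assumption)."""
    magnitude = rng.randint(1, 15)  # inclusive on both ends
    sign = rng.choice([-1, 1])
    return sign * magnitude


def approximate_event_days(events, rng):
    """Shift every event of a person by that person's single offset.

    `events` is a list of (finngen_id, register, exact_day) tuples.
    Returns the shifted events and the per-person offsets that were used.
    """
    offsets = {}
    shifted = []
    for finngen_id, register, exact_day in events:
        if finngen_id not in offsets:
            # One draw per person, reused for all of their events.
            offsets[finngen_id] = draw_offset(rng)
        approx_day = exact_day + timedelta(days=offsets[finngen_id])
        shifted.append((finngen_id, register, approx_day))
    return shifted, offsets


# Hypothetical IDs and register labels, used only for this example.
rng = random.Random(2024)  # fixed seed only to make the sketch repeatable
events = [
    ("FG0000001", "REGISTER_A", date(2010, 3, 14)),
    ("FG0000001", "REGISTER_B", date(2010, 4, 2)),
    ("FG0000002", "REGISTER_A", date(2012, 7, 30)),
]
approx, offsets = approximate_event_days(events, rng)
```

Because one person's events are all shifted by the same number of days, the interval between any two events of that person is the same in the approximated data as in the exact data. When offsets are drawn separately per register, as described for releases up to DF10, this holds only within a register.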
