Detailed longitudinal data

This page was last updated for R11.

Service sector data contains detailed longitudinal data with additional columns.

Sandbox directory

Detailed longitudinal data is available in the following Sandbox directory:

/finngen/library-red/finngen_R[RELEASE]/phenotype_1.0/

Note that this data is also available in Atlas and in the OMOP common data model.

Data files

The data is available in the following file:

data/finngen_R[RELEASE]_detailed_longitudinal_1.0.txt.gz

The data file has eleven columns:

| Column | Description |
| --- | --- |
| FINNGENID | Sample ID |
| SOURCE | Register source |
| EVENT_AGE | Individual's age at the event, to two decimals |
| APPROX_EVENT_DAY | A randomized event date: the confidential exact event date shifted by 1 to 15 days in either direction |
| CODE1 - CODE4 | Register source specific codes and other information |
| ICDVER | ICD code version: ICD8/9/10, ICD-O-3 |
| CATEGORY | Register code sets (vocabularies) |
| INDEX | Register index number. The same INDEX value within a register means that the codes were given in the same hospital visit or come, for example, from the same drug purchase event. |
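The role of the INDEX column can be sketched with a small grouping example. This is a minimal illustration, not an official FinnGen tool: the rows are invented mock data, the column name FINNGENID is assumed, and including the individual's ID in the grouping key is a cautious assumption in case INDEX values are not unique across individuals.

```python
from collections import defaultdict

# Hypothetical mock rows; IDs and codes are invented for illustration only.
rows = [
    {"FINNGENID": "FG0000001", "SOURCE": "INPAT", "INDEX": "1", "CODE1": "I10"},
    {"FINNGENID": "FG0000001", "SOURCE": "INPAT", "INDEX": "1", "CODE1": "E11"},
    {"FINNGENID": "FG0000001", "SOURCE": "PURCH", "INDEX": "1", "CODE1": "C09AA05"},
    {"FINNGENID": "FG0000001", "SOURCE": "INPAT", "INDEX": "2", "CODE1": "J45"},
]


def group_by_event(rows):
    """Collect CODE1 values that share the same individual, register and INDEX."""
    events = defaultdict(list)
    for row in rows:
        # INDEX is only meaningful within a register, so SOURCE is part of the key.
        key = (row["FINNGENID"], row["SOURCE"], row["INDEX"])
        events[key].append(row["CODE1"])
    return dict(events)


events = group_by_event(rows)
for key, codes in sorted(events.items()):
    print(key, codes)
```

Note that the PURCH row with INDEX 1 is kept separate from the INPAT rows with INDEX 1, because the index is register-specific.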

Detailed information about the columns is available in the following file:

finngen_R[RELEASE]_detailed_longitudinal_readme_1.0.txt

This is the main register data in FinnGen and contains health register codes from different sources. The data is called longitudinal because it contains several events/entries of codes for the same individual recorded at different times.
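The longitudinal structure can be illustrated by reading the file and building a per-individual timeline. The sketch below assumes the file is tab-delimited with a header row and that the ID column is named FINNGENID; both should be checked against the readme file. The mock content is invented and only stands in for the real data.

```python
import csv
import gzip
import io
from collections import defaultdict


def read_events(handle):
    """Yield one dict per row from an open text handle of the data file."""
    # Assumption: the file is tab-delimited with a header row.
    yield from csv.DictReader(handle, delimiter="\t")


def timelines(rows):
    """Group rows per individual and sort each group by EVENT_AGE."""
    per_person = defaultdict(list)
    for row in rows:
        per_person[row["FINNGENID"]].append(row)
    for person_rows in per_person.values():
        person_rows.sort(key=lambda r: float(r["EVENT_AGE"]))
    return dict(per_person)


# Hypothetical mock content standing in for the real gzipped file.
mock = (
    "FINNGENID\tSOURCE\tEVENT_AGE\tCODE1\n"
    "FG0000001\tOUTPAT\t52.30\tI10\n"
    "FG0000002\tPURCH\t40.10\tC09AA05\n"
    "FG0000001\tINPAT\t47.85\tJ45\n"
)
compressed = gzip.compress(mock.encode("utf-8"))

with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8", newline="") as handle:
    result = timelines(read_events(handle))

for person, person_rows in result.items():
    print(person, [(r["EVENT_AGE"], r["SOURCE"]) for r in person_rows])
```

With the real file, the path to the .txt.gz file would be passed to gzip.open instead of the in-memory buffer. The real file is large, so it is usually better to filter rows while streaming than to hold every individual's timeline in memory.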

Detailed longitudinal data was presented in the FinnGen data users meeting on 12th January 2021 and can be explored using Atlas in the Sandbox.

The data is used as input for the Endpointter to determine which individuals are included in each of FinnGen's phenotypic endpoints.

A mock example of the detailed longitudinal data file is shown below:

Sources

The detailed longitudinal data file contains codes from the following registries:

| Source | Description | Types of codes |
| --- | --- | --- |
| PURCH | Kela drug purchase register | Medication codes |
| REIMB | Kela drug reimbursement register | Medication codes |
| INPAT | Inpatient Hilmo register | Diagnosis codes |
| OPER_IN | Inpatient Hilmo register - operations | Operation codes |
| OUTPAT | Specialist outpatient Hilmo register | Diagnosis codes |
| OPER_OUT | Specialist outpatient Hilmo register - operations | Operation codes |
| PRIM_OUT | Primary health care outpatient visits | Diagnosis and operation codes |
| CANC | Cancer register | Cancer codes |
| DEATH | Cause of death register | Cause of death codes |
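The source table above can be expressed as a simple lookup, which is handy for summarising or filtering events by the type of code a register provides. This is a minimal sketch: the short labels such as "medication" are shorthand chosen for this example, not official vocabulary names.

```python
from collections import Counter

# SOURCE values mapped to the type of codes they carry, as listed in the table above.
SOURCE_CODE_TYPES = {
    "PURCH": "medication",
    "REIMB": "medication",
    "INPAT": "diagnosis",
    "OPER_IN": "operation",
    "OUTPAT": "diagnosis",
    "OPER_OUT": "operation",
    "PRIM_OUT": "diagnosis and operation",
    "CANC": "cancer",
    "DEATH": "cause of death",
}


def count_by_code_type(sources):
    """Count events per code type, given an iterable of SOURCE values."""
    return Counter(SOURCE_CODE_TYPES.get(source, "unknown") for source in sources)


# Mock SOURCE values for illustration only.
counts = count_by_code_type(["PURCH", "REIMB", "INPAT", "OPER_IN", "PURCH"])
print(counts)
```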

The image below shows how variables in the national health registries map to the columns in the detailed longitudinal data.

Codes

The information stored in the CODE1 - CODE4 columns depends on the register source:

| Source | Description | Codes |
| --- | --- | --- |
| PURCH | Kela drug purchase register | CODE1: ATC code; CODE2: Kela reimbursement code; CODE3: Product number; CODE4: Number of packages |
| REIMB | Kela drug reimbursement register | CODE1: Kela reimbursement code; CODE2: ICD code |
| INPAT, OUTPAT | Inpatient Hilmo register, Specialist outpatient Hilmo register | CODE1: symptom code; CODE2: cause code (e.g. CODE1 could be dementia associated with Alzheimer's disease, with CODE2 as Alzheimer's disease); CODE3: ATC code for drug's adverse effect; CODE4: duration of stay |
| OPER_IN, OPER_OUT | Inpatient Hilmo register - operations, Specialist outpatient Hilmo register - operations | CODE1: operation code |
| PRIM_OUT | Primary health care outpatient visits | CODE1: diagnosis or operation code; CODE2: symptom code; CODE3: ATC code for drug's adverse effect |
| CANC | Cancer register | CODE1: ICD-O-3 topography; CODE2: ICD-O-3 morphology; CODE3: ICD-O-3 behaviour |
| DEATH | Cause of death register | CODE1: cause of death |
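Because the meaning of CODE1 - CODE4 changes with the register, it can help to attach the meaning to each value before analysis. The sketch below encodes the table above as a lookup. Treating empty strings and "NA" as missing values is an assumption about how the file marks missing codes, and the mock row is invented.

```python
# Meaning of CODE1..CODE4 per SOURCE, taken from the table above.
HILMO_CODES = (
    "symptom code",
    "cause code",
    "ATC code for drug's adverse effect",
    "duration of stay",
)
CODE_MEANINGS = {
    "PURCH": ("ATC code", "Kela reimbursement code", "Product number", "Number of packages"),
    "REIMB": ("Kela reimbursement code", "ICD code"),
    "INPAT": HILMO_CODES,
    "OUTPAT": HILMO_CODES,
    "OPER_IN": ("operation code",),
    "OPER_OUT": ("operation code",),
    "PRIM_OUT": (
        "diagnosis or operation code",
        "symptom code",
        "ATC code for drug's adverse effect",
    ),
    "CANC": ("ICD-O-3 topography", "ICD-O-3 morphology", "ICD-O-3 behaviour"),
    "DEATH": ("cause of death",),
}

# Assumption: missing codes appear as an empty string or as "NA".
MISSING = {"", "NA"}


def label_codes(row):
    """Return {meaning: value} for the non-missing codes of one row."""
    labelled = {}
    for position, meaning in enumerate(CODE_MEANINGS.get(row["SOURCE"], ()), start=1):
        value = row.get(f"CODE{position}", "")
        if value not in MISSING:
            labelled[meaning] = value
    return labelled


# Hypothetical mock row for illustration only.
purchase = {"SOURCE": "PURCH", "CODE1": "C09AA05", "CODE2": "NA", "CODE3": "123456", "CODE4": "2"}
print(label_codes(purchase))
```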

Categories

The register code sets (vocabularies) are stored in the CATEGORY column:

Detailed information about register code sets is available from:

For diagnosis codes:

  • 0 at the end of the CATEGORY variable means the main diagnosis code (e.g. 0 in ICD0 and NOM0)

  • 1:N at the end of the CATEGORY variable refers to side diagnoses (e.g. 1:N in ICD1:N, NOM1:N)
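The suffix rule for diagnosis codes can be sketched as a small helper. It assumes that the number at the end of a CATEGORY value is the only thing distinguishing main from side diagnoses and that categories without a trailing number are not diagnosis categories; check the readme file before relying on this for every register.

```python
import re

# Matches the run of digits at the very end of a CATEGORY value, if any.
_TRAILING_NUMBER = re.compile(r"(\d+)$")


def diagnosis_rank(category):
    """Return 'main', 'side', or None when CATEGORY has no trailing number."""
    match = _TRAILING_NUMBER.search(category)
    if match is None:
        return None
    return "main" if int(match.group(1)) == 0 else "side"


# "XYZ" is a made-up value used only to show the no-number case.
for category in ("ICD0", "ICD1", "NOM0", "NOM3", "XYZ"):
    print(category, diagnosis_rank(category))
```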

Register data availability dates

Data is available from each register's start date until the end of its register-specific follow-up. The follow-up dates are available here and in the following file:

finngen_R[RELEASE]_detailed_longitudinal_readme_1.0.txt

The start and follow-up dates differ between registries and between FinnGen data releases, so take this into account in your analyses. Register start and follow-up dates are shown below for Data Freeze 6:
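One way to account for register-specific availability is to keep only events that fall inside each register's window. The sketch below is illustrative only: the dates in it are placeholders, not the real register dates, which must be taken from the readme file for the release in use. It also assumes APPROX_EVENT_DAY is formatted as YYYY-MM-DD.

```python
from datetime import date


def within_follow_up(row, windows):
    """Return True if the row's event day falls inside its register's window."""
    window = windows.get(row["SOURCE"])
    if window is None:
        return False  # no window known for this register
    start, end = window
    # Assumption: APPROX_EVENT_DAY is formatted as YYYY-MM-DD.
    return start <= date.fromisoformat(row["APPROX_EVENT_DAY"]) <= end


# Placeholder dates for illustration only; these are NOT the real register dates.
windows = {"INPAT": (date(2000, 1, 1), date(2010, 12, 31))}

# Hypothetical mock rows.
rows = [
    {"SOURCE": "INPAT", "APPROX_EVENT_DAY": "2005-06-15"},
    {"SOURCE": "INPAT", "APPROX_EVENT_DAY": "2012-03-01"},
]
kept = [row for row in rows if within_follow_up(row, windows)]
print(len(kept))
```

Because APPROX_EVENT_DAY is randomized by up to 15 days, events close to a window boundary may fall on the wrong side of it.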

Further information
