Minimum longitudinal data

This page was last updated for R11.

Sandbox directory

The minimum longitudinal data file is available as a separate file in the following Sandbox directory:

/finngen/library-red/finngen_R[RELEASE]/phenotype_1.0/

Data files

This data is available in the following file:

data/finngen_R[RELEASE]_minimum_longitudinal_1.0.txt.gz
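As a minimal sketch of how the file might be located and opened, the Python snippet below fills in the [RELEASE] placeholder and reads the header line. It rests on assumptions that this page does not confirm: that the data/ path is relative to the Sandbox directory above, that the release is written as a plain number (11 for R11), and that the file is tab-delimited. The names longitudinal_path and read_header are hypothetical helpers introduced only for illustration.

```python
import csv
import gzip
from pathlib import PurePosixPath

# Directory and file name patterns as given on this page; [RELEASE] is a placeholder.
SANDBOX_DIR = "/finngen/library-red/finngen_R[RELEASE]/phenotype_1.0/"
DATA_FILE = "data/finngen_R[RELEASE]_minimum_longitudinal_1.0.txt.gz"


def longitudinal_path(release: int) -> str:
    """Fill the [RELEASE] placeholder (e.g. 11 for R11) and join directory and file.

    Assumption: the data/ path is relative to the Sandbox directory.
    """
    full = str(PurePosixPath(SANDBOX_DIR) / DATA_FILE)
    return full.replace("[RELEASE]", str(release))


def read_header(path: str, delimiter: str = "\t") -> list:
    """Return the column names from the first line of the gzipped text file.

    The tab delimiter is an assumption; change it if the file uses another separator.
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        return next(csv.reader(handle, delimiter=delimiter))
```

Calling read_header(longitudinal_path(11)) inside the Sandbox would then list the columns described below, if the assumptions above hold.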

Arctic, Eastern Finland and THL biobanks have provided this data. The file contains the following columns:

| Column | Description |
| --- | --- |
| SAMPLE_AGE_1 - SAMPLE_AGE_2 | Age at DNA sample collection (years), 2 observations |
| SEX | Gender (male/female/NA) |
| HEIGHT_1 - HEIGHT_7 | Height (cm), 7 observations |
| HEIGHT_1_AGE - HEIGHT_7_AGE | Age at height measurement (years), 7 observations |
| WEIGHT_1 - WEIGHT_18 | Weight (kg), 18 observations |
| WEIGHT_1_AGE - WEIGHT_18_AGE | Age at weight measurement (years), 18 observations |
| SMOKE2_1 - SMOKE2_2 | Smoking status, 2 categories (yes/no), 2 observations |
| SMOKE3_1 - SMOKE3_2 | Smoking status, 3 categories (current/former/never), 2 observations |
| SMOKE5_1 - SMOKE5_2 | Smoking status, 5 categories (current/occasional/quitter/former/never), 2 observations |
| SMOKE_1_AGE - SMOKE_2_AGE | Age at the time of the smoking survey (years), 2 observations |

The data contains multiple observations of the same variables as in the minimum extended phenotype data file (age, sex, height, weight, smoking, and date variables).
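Because each repeated measurement sits in its own numbered column, analyses often need the values in long form. The sketch below shows one possible way to pair each HEIGHT_n or WEIGHT_n value with its matching _AGE column for a single row. It assumes the row has already been read into a dict of strings and that missing values appear as empty strings or "NA"; the missing-value coding and the helper name to_long are assumptions, not something this page specifies.

```python
import re

# Matches value columns such as HEIGHT_3 or WEIGHT_12, but not HEIGHT_3_AGE.
VALUE_COLUMN = re.compile(r"^(HEIGHT|WEIGHT)_(\d+)$")

# Assumed codings for a missing value.
MISSING = ("", "NA", None)


def to_long(row: dict) -> list:
    """Turn one wide row into (measure, index, value, age) tuples, skipping empty slots."""
    records = []
    for name, value in row.items():
        match = VALUE_COLUMN.match(name)
        if match is None or value in MISSING:
            continue
        measure, index = match.group(1), int(match.group(2))
        age = row.get(f"{measure}_{index}_AGE")
        records.append(
            (measure, index, float(value), None if age in MISSING else float(age))
        )
    # Sort by measure name, then by observation index.
    return sorted(records)


example = {
    "HEIGHT_1": "170", "HEIGHT_1_AGE": "30.5",
    "HEIGHT_2": "NA", "HEIGHT_2_AGE": "NA",
    "WEIGHT_1": "65.2", "WEIGHT_1_AGE": "30.5",
    "WEIGHT_2": "66", "WEIGHT_2_AGE": "41",
}
long_rows = to_long(example)
```

For the example row, the unfilled HEIGHT_2 slot is dropped and three records remain.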

DNA samples and smoking information have each been collected on one or two different dates. Every individual has at least one DNA sample date. A few individuals may have more than 30 height and weight observations, but the vast majority have fewer than seven height observations and fewer than 18 weight observations, so those numbers were chosen as the limits for height and weight observations in the final data file. If a biobank does not know the exact DNA sample date, it has been instructed to estimate the date from the age variable. If the day is missing from the DNA sample date, the date is estimated as 15.mm.yyyy. If the month is also missing, the date is estimated as 30.06.yyyy.
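The estimation rule for incomplete DNA sample dates can be illustrated with a short Python sketch. The function name estimate_sample_date and its signature are hypothetical; only the 15.mm.yyyy and 30.06.yyyy rules come from the text above. This is an illustration of the rule, not the code used by the biobanks.

```python
from datetime import date
from typing import Optional


def estimate_sample_date(
    year: int, month: Optional[int] = None, day: Optional[int] = None
) -> date:
    """Apply the estimation rule to a partially known DNA sample date."""
    if month is None:
        # Month unknown (any day value is ignored): estimate as 30.06.yyyy.
        return date(year, 6, 30)
    if day is None:
        # Only the day is unknown: estimate as 15.mm.yyyy.
        return date(year, month, 15)
    # Fully known date: use it unchanged.
    return date(year, month, day)
```

For example, a sample known only to be from March 2010 would be dated 15.03.2010, and one known only to be from 2010 would be dated 30.06.2010.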

The sample age is calculated from the DNA sample date and the date of birth. Biobanks have reported a height date, weight date and smoking date for each observation. The time between observations varies from days to years. The age and sex sent by the biobanks have been compared with the values derived from the DNA sample date and the personal identity code. If these values differ, clarification has been requested from the biobank. Other QC checks have also been done: for example, the BMI computed from height and weight must be between 10 and 80 kg/m^2, and dates cannot be in the future. Biobanks have been asked to provide all available information about smoking.
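The age calculation and the two QC checks mentioned above can be sketched in Python as follows. Several details are assumptions made for illustration: the function names are hypothetical, age is approximated by dividing days by 365.25, the BMI bounds are treated as inclusive, and the height and weight passed in are taken to belong to the same measurement occasion. The page does not describe how the actual QC pairs height and weight observations.

```python
from datetime import date


def age_in_years(birth: date, event: date) -> float:
    """Age at an event in years; dividing by 365.25 is an approximation chosen here."""
    return (event - birth).days / 365.25


def bmi(height_cm: float, weight_kg: float) -> float:
    """Body mass index in kg/m^2 from height in cm and weight in kg."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m * height_m)


def passes_qc(height_cm: float, weight_kg: float, measured_on: date, today: date) -> bool:
    """True if BMI is within 10-80 kg/m^2 (inclusive) and the date is not in the future."""
    bmi_ok = 10.0 <= bmi(height_cm, weight_kg) <= 80.0
    date_ok = measured_on <= today
    return bmi_ok and date_ok
```

A height of 170 cm with a weight of 300 kg, for instance, gives a BMI above 80 kg/m^2 and would be flagged by this check.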
