Minimum longitudinal data
This page was last updated for R11.
Sandbox directory
The minimum longitudinal data file is available as a separate file in the following Sandbox directory:
/finngen/library-red/finngen_R[RELEASE]/phenotype_1.0/
Data files
This data is available in the following file:
data/finngen_R[RELEASE]_minimum_longitudinal_1.0.txt.gz
Arctic, Eastern Finland and THL biobanks have provided this data. The file contains the following columns:
Column | Description |
SAMPLE_AGE_1 - SAMPLE_AGE_2 | Age at DNA sample collection (years), 2 observations |
SEX | Gender (male/female/NA) |
HEIGHT_1 - HEIGHT_7 | Height (cm), 7 observations |
HEIGHT_1_AGE - HEIGHT_7_AGE | Age at height measurement (years), 7 observations |
WEIGHT_1 - WEIGHT_18 | Weight (kg), 18 observations |
WEIGHT_1_AGE - WEIGHT_18_AGE | Age at weight measurement (years), 18 observations |
SMOKE2_1 - SMOKE2_2 | Smoking status, 2 categories (yes/no), 2 observations |
SMOKE3_1 - SMOKE3_2 | Smoking status, 3 categories (current/former/never), 2 observations |
SMOKE5_1 - SMOKE5_2 | Smoking status, 5 categories (current/occasional/quitter/former/never), 2 observations |
SMOKE_1_AGE - SMOKE_2_AGE | Age at the time of the smoking survey (years), 2 observations |
The data contains multiple observations of the same variables as in the minimum extended phenotype data file (age, sex, height, weight, smoking, and date variables).
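Because each variable is spread across numbered columns (for example HEIGHT_1 to HEIGHT_7 with matching HEIGHT_1_AGE to HEIGHT_7_AGE), it is often convenient to reshape the data to one record per observation. The following is a minimal sketch using only the Python standard library. It assumes that the file is tab-delimited, that missing values appear as empty strings or `NA`, and that the identifier column is called `FINNGENID`; none of these details are stated on this page, so check them against the actual file. The helper names `read_rows`, `read_gz` and `to_long` are hypothetical.

```python
import csv
import gzip
import io

ID_COLUMN = "FINNGENID"  # assumption: the ID column name is not given on this page
MISSING = {"", "NA"}     # assumption: how missing values are encoded


def read_rows(handle):
    """Parse an open text handle as a tab-delimited table (delimiter is an assumption)."""
    return list(csv.DictReader(handle, delimiter="\t"))


def read_gz(path):
    """Read a gzip-compressed text file such as the minimum longitudinal data file."""
    with gzip.open(path, "rt", newline="") as handle:
        return read_rows(handle)


def to_long(rows, variable, max_obs):
    """Turn VARIABLE_1..VARIABLE_n and VARIABLE_i_AGE columns into one record per observation."""
    records = []
    for row in rows:
        for i in range(1, max_obs + 1):
            value = row.get(f"{variable}_{i}")
            if value is None or value in MISSING:
                continue  # this observation slot is unused for this individual
            age = row.get(f"{variable}_{i}_AGE")
            records.append({
                "id": row.get(ID_COLUMN),
                "obs": i,
                "value": float(value),
                "age": None if age is None or age in MISSING else float(age),
            })
    return records


# Toy data standing in for the real file; in the Sandbox, read_gz(path) would be used instead.
demo = io.StringIO(
    "FINNGENID\tHEIGHT_1\tHEIGHT_1_AGE\tHEIGHT_2\tHEIGHT_2_AGE\n"
    "FG0001\t170.0\t34.2\t171.0\t40.1\n"
    "FG0002\t182.5\t51.0\tNA\tNA\n"
)
height_long = to_long(read_rows(demo), "HEIGHT", max_obs=7)
```

The same call with `"WEIGHT"` and `max_obs=18` would cover the weight columns. The smoking columns follow a different naming pattern (SMOKE_1_AGE rather than SMOKE2_1_AGE) and hold categories rather than numbers, so they would need separate handling.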
DNA samples, like smoking information, have been collected on one or two different dates. Every individual has at least one DNA sample date. A few individuals may have more than 30 height and weight observations, but the vast majority have fewer than seven height observations and fewer than 18 weight observations, so those limits were chosen for the final data file. If a biobank does not know the exact DNA sample date, it has been instructed to estimate the date from the age variable. If the day is missing from the DNA sample date, the date is estimated as 15.mm.yyyy. If the month is also missing, the date is estimated as 30.06.yyyy.
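The rule for incomplete DNA sample dates can be illustrated with a short Python sketch. The function name `impute_sample_date` is hypothetical, and treating a date with a known day but missing month as "month missing" is an assumption, since that case is not described here.

```python
from datetime import date


def impute_sample_date(year, month=None, day=None):
    """Fill in a partial DNA sample date using the mid-month / mid-year rule."""
    if month is None:
        # Month (and therefore day) unknown: use 30.06.yyyy
        return date(year, 6, 30)
    if day is None:
        # Only the day is unknown: use 15.mm.yyyy
        return date(year, month, 15)
    return date(year, month, day)


day_missing = impute_sample_date(2015, 3)
month_missing = impute_sample_date(2015)
complete = impute_sample_date(2015, 3, 2)

# Format in the dd.mm.yyyy style used on this page
formatted = day_missing.strftime("%d.%m.%Y")
```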
The sample age is calculated from the DNA sample date and the date of birth. Biobanks have reported a height date, weight date and smoking date for each observation. The intervals between observations range from days to years. The age and sex sent by biobanks have been compared with the values derived from the DNA sample date and the personal identity code. If these values differ, clarification has been requested from the biobank. Other QC checks have also been done. For example, when checking height and weight, BMI must be between 10 and 80, and dates cannot be in the future. Biobanks have been asked to provide all available information about smoking.
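The age calculation and the BMI and date checks described above could look roughly like the following Python sketch. It is not the actual QC pipeline. The function names are hypothetical, age is computed here in decimal years using 365.25 days per year, and the BMI bounds are treated as inclusive; all of these are assumptions. How a height observation is paired with a weight observation for the BMI check is also not specified on this page.

```python
from datetime import date


def age_at(birth_date, event_date):
    """Age in decimal years at event_date (365.25-day year is an assumption)."""
    return (event_date - birth_date).days / 365.25


def bmi(height_cm, weight_kg):
    """Body mass index in kg/m^2 from height in cm and weight in kg."""
    return weight_kg / (height_cm / 100) ** 2


def passes_qc(height_cm, weight_kg, measured_on, today=None):
    """True if BMI is within 10-80 (inclusive, assumed) and the date is not in the future."""
    today = today or date.today()
    return 10 <= bmi(height_cm, weight_kg) <= 80 and measured_on <= today


reference_day = date(2024, 1, 1)  # fixed "today" so the example is reproducible
ok = passes_qc(170, 70, date(2020, 5, 1), today=reference_day)
implausible_bmi = passes_qc(170, 7, date(2020, 5, 1), today=reference_day)
future_date = passes_qc(170, 70, date(2030, 1, 1), today=reference_day)
sample_age = age_at(date(1980, 1, 1), date(2020, 1, 1))
```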