Minimum extended phenotype data
This page was last updated for R12.
The minimum extended phenotype data file was introduced in Data Release 11. It contains data previously released in three separate files: the minimum phenotype data, cohort data and baseline data files.
Sandbox directory
The minimum extended phenotype data file is available as a separate file in the following Sandbox directory:
/finngen/library-red/finngen_R[RELEASE]/phenotype_1.0/
Data files
This data is available in the following file:
data/finngen_R[RELEASE]_minimum_extended_1.0.txt.gz
The samples in the file are in the same order as in the genotype data files. The file contains the following columns:
APPROX_BIRTH_DATE was first released in FinnGen data release 11. BMI, CURRENT_SMOKER and EVER_SMOKER were first released in FinnGen data release 12.
Column | Description
--- | ---
FINNGENID | Sample ID
BL_YEAR | Year of DNA sample collection
BL_AGE | Age at DNA sample collection (years)
SEX | Sex (male/female/NA)
HEIGHT | Height (cm)
HEIGHT_AGE | Age at height measurement (years)
WEIGHT | Weight (kg)
WEIGHT_AGE | Age at weight measurement (years)
BMI | Body mass index (kg/m²)
SMOKE2 | Smoking status, 2 categories (yes/no)
SMOKE3 | Smoking status, 3 categories (current/former/never)
SMOKE5 | Smoking status, 5 categories (current/occasional/quitter/former/never)
SMOKE_AGE | Age at the time of the smoking survey (years)
CURRENT_SMOKER | Cases: SMOKE3 category "current"; controls: SMOKE3 category "never"
EVER_SMOKER | Cases: SMOKE3 category "current" or "former"; controls: SMOKE3 category "never"
regionofbirth | Regional council number for the region of birth according to the Finnish Ministry of the Interior (21 categories)
regionofbirthname | Name of the region of birth (21 categories): 1-Uusimaa, 2-Varsinais-Suomi, 4-Satakunta, 5-Kanta Häme, 6-Pirkanmaa, 7-Päijät Häme, 8-Kymenlaakso, 9-South Karelia, 10-Etelä Savo, 11-Pohjois Savo, 12-North Karelia, 13-Central Finland, 14-South Ostrobothnia, 15-Ostrobothnia, 16-Central Ostrobothnia, 17-North Ostrobothnia, 18-Kainuu, 19-Lapland, 20-Åland, 200-Abroad, 9999-Region ceded to Soviet
moveabroad | Whether the person has moved abroad, 3 categories (yes/no/NA)
NUMBER_OF_OFFSPRING | Number of biological children
COHORT | Biobank collection name
FU_END_AGE | Age at the end of follow-up: age when register follow-up ends, age at death if the individual has died, or age at emigration if the person has moved abroad
DEATH | Death status: 1 = died by the end of the death registry, 0 = alive at the end of the death registry
DEATH_AGE | Age at death: age at death if the individual has died, age at emigration if the person has moved abroad, or age when register follow-up ends
DEATH_YEAR | Year of death
APPROX_BIRTH_DATE | Randomized birth date (within +/- 1-15 days)
*In the DF12 minimum extended data: AGE_AT_DEATH_OR_END_OF_FOLLOW_UP and DEATH_FU_AGE columns replaced column FU_END_AGE containing the same information.
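As a minimal sketch of loading the file, the following Python snippet builds the path from the directory and file name patterns given above and reads the rows with the standard library. The tab delimiter, the presence of a header row, the release number used in the usage note, and the helper names `minimum_extended_path` and `read_minimum_extended` are assumptions for illustration, not something this page specifies.

```python
import csv
import gzip

# Directory and file name patterns from this page; {release} stands in for [RELEASE].
SANDBOX_DIR = "/finngen/library-red/finngen_R{release}/phenotype_1.0/"
FILE_NAME = "data/finngen_R{release}_minimum_extended_1.0.txt.gz"


def minimum_extended_path(release):
    """Build the full path for a given data release number (hypothetical helper)."""
    return SANDBOX_DIR.format(release=release) + FILE_NAME.format(release=release)


def read_minimum_extended(path, delimiter="\t"):
    """Read the gzipped file into a list of dicts keyed by column name.

    Assumes a header row and a tab-delimited layout; pass another `delimiter`
    if the file is laid out differently. All values are returned as strings.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))
```

For example, `read_minimum_extended(minimum_extended_path(12))` would return the rows in file order. Keeping that order matters, because the samples are in the same order as in the genotype data files.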
If biobanks do not know the exact DNA sample collection date, they have been instructed to estimate it using age and the birth date extracted from the Finnish personal identity code. If the sample collection day is missing, the date is estimated to be 15.mm.yyyy. If the month is missing, the sample collection date is estimated to be 30.06.yyyy. In some cases, the sample collection date is not available and is impossible to estimate reliably.
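The date estimation rule above can be sketched as a small Python function. The function name `estimate_collection_date` and the choice to return `None` when the year is unknown are assumptions made for this illustration. It is not the code used by the biobanks.

```python
from datetime import date


def estimate_collection_date(year, month=None, day=None):
    """Fill in a partially known sample collection date (hypothetical helper).

    Returns None when the year is unknown, which is treated here (as an
    assumption) as a date that cannot be estimated.
    """
    if year is None:
        return None
    if month is None:
        # Month missing: estimate as 30.06.yyyy (any reported day is ignored).
        return date(year, 6, 30)
    if day is None:
        # Day missing: estimate as 15.mm.yyyy.
        return date(year, month, 15)
    return date(year, month, day)
```

With this sketch, a sample known only to have been collected in March 2010 would be assigned 15.03.2010, and one known only to be from 2010 would be assigned 30.06.2010.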
Biobanks are instructed to report dates for height, weight, and smoking status. These dates are compared against values calculated from the DNA sample collection date and the birth date extracted from the Finnish personal identity code. If the values differ, the biobank is asked for clarification.
Biobanks have been asked to provide all available information about smoking. Some biobanks report only whether an individual is a current smoker, while others provide more detailed smoking information.
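The CURRENT_SMOKER and EVER_SMOKER definitions in the column table can be sketched in Python as follows. The function name `derive_smoking_flags` and the 1/0/`None` encoding of case, control and "neither" are assumptions for this illustration, so the actual encoding in the file may differ.

```python
def derive_smoking_flags(smoke3):
    """Return (CURRENT_SMOKER, EVER_SMOKER) for one SMOKE3 value.

    Assumed encoding: 1 = case, 0 = control, None = neither case nor control.
    """
    if smoke3 == "never":
        # Never smokers are controls for both variables.
        return 0, 0
    if smoke3 == "current":
        # Current smokers are cases for both variables.
        return 1, 1
    if smoke3 == "former":
        # Former smokers are cases for EVER_SMOKER only; for CURRENT_SMOKER
        # they are neither cases nor controls.
        return None, 1
    # Missing or unrecognized SMOKE3 value: no flag can be derived.
    return None, None
```

Note that under these definitions former smokers are excluded from CURRENT_SMOKER rather than counted as controls.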
Many other quality checks are performed as well. For example, the BMI calculated from height and weight must be between 10 and 80 kg/m², and dates cannot be in the future. The sex reported by biobanks is compared against the personal identity code. If the values differ, the biobank is asked for clarification.
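Two of the checks mentioned above, the BMI range and the rule that dates cannot be in the future, can be sketched in Python as follows. The function names, the inclusive bounds, and the `today` parameter are assumptions for this illustration and do not describe the actual quality control pipeline.

```python
from datetime import date


def bmi(height_cm, weight_kg):
    """Body mass index in kg/m², from height in cm and weight in kg."""
    return weight_kg / (height_cm / 100) ** 2


def bmi_is_plausible(height_cm, weight_kg, low=10.0, high=80.0):
    """Check that BMI falls within 10-80 kg/m² (bounds assumed inclusive)."""
    return low <= bmi(height_cm, weight_kg) <= high


def date_is_not_in_future(value, today=None):
    """Check that a reported date is not later than today."""
    if today is None:
        today = date.today()
    return value <= today
```

For instance, a height of 180 cm and a weight of 81 kg give a BMI of about 25 kg/m², which passes the range check, while 180 cm and 300 kg would be flagged.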