Minimum extended phenotype data
This page was last updated for R12.
The minimum extended phenotype data file was introduced in Data Release 11. It contains data previously released in three separate files: minimum phenotype data, cohort data, and baseline data.
Sandbox directory
The minimum extended phenotype data file is available as a separate file in the following Sandbox directory:
/finngen/library-red/finngen_R[RELEASE]/phenotype_1.0/
Data files
This data is available in the following file:
data/finngen_R[RELEASE]_minimum_extended_1.0.txt.gz
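As a minimal sketch of how the file might be opened from Python, the snippet below joins the Sandbox directory and file name given above and streams the rows with the standard library. The helper names `minimum_extended_path` and `read_rows` are hypothetical. The sketch also assumes that the release placeholder is replaced by the plain release number, and that the file is tab-delimited and UTF-8 encoded; none of these assumptions is stated on this page.

```python
import csv
import gzip


def minimum_extended_path(release: int) -> str:
    """Build the Sandbox path of the file for one data release (hypothetical helper)."""
    base = f"/finngen/library-red/finngen_R{release}/phenotype_1.0"
    return f"{base}/data/finngen_R{release}_minimum_extended_1.0.txt.gz"


def read_rows(path, delimiter="\t"):
    """Yield each sample as a dict keyed by column name.

    Assumption: tab-delimited, UTF-8 text; adjust if the file differs.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle, delimiter=delimiter)
```

For example, `next(read_rows(minimum_extended_path(12)))` would return the first sample as a dictionary, provided the assumptions above hold.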
The samples in the file are in the same order as in the genotype data files. APPROX_BIRTH_DATE was first released in FinnGen data release 11; BMI, CURRENT_SMOKER and EVER_SMOKER were first released in FinnGen data release 12.
The file contains the following columns:
Column | Description |
FINNGENID | Sample ID |
BL_YEAR | Year of DNA sample collection |
BL_AGE | Age at DNA sample collection (years) |
SEX | Gender (male/female/NA) |
HEIGHT | Height (cm) |
HEIGHT_AGE | Age at height measurement (years) |
WEIGHT | Weight (kg) |
WEIGHT_AGE | Age at weight measurement (years) |
BMI | Body mass index |
SMOKE2 | Smoking status 2-categories (yes/no) |
SMOKE3 | Smoking status 3-categories (current/former/never) |
SMOKE5 | Smoking status 5-categories (current/occasional/quitter/former/never) |
SMOKE_AGE | Age at the moment of the smoking survey (years) |
CURRENT_SMOKER | cases: SMOKE3 variable category="current", controls: SMOKE3 variable category="never" |
EVER_SMOKER | cases: SMOKE3 variable category="current"/"former", controls: SMOKE3 variable category="never" |
regionofbirth | Regional council number of the region of birth according to the Finnish Ministry of the Interior (21 categories) |
regionofbirthname | Name of the region of birth (21 categories): 1-Uusimaa, 2-Varsinais-Suomi, 4-Satakunta, 5-Kanta Häme, 6-Pirkanmaa, 7-Päijät Häme, 8-Kymenlaakso, 9-South Karelia, 10-Etelä Savo, 11-Pohjois Savo, 12-North Karelia, 13-Central Finland, 14-South Ostrobothnia, 15-Ostrobothnia, 16-Central Ostrobothnia, 17-North Ostrobothnia, 18-Kainuu, 19-Lapland, 20-Åland, 200-Abroad, 9999-Region ceded to the Soviet Union |
moveabroad | Whether the person has moved abroad, 3 categories (yes/no/NA) |
NUMBER_OF_OFFSPRING | Number of biological children |
COHORT | Biobank collection name |
FU_END_AGE | Age at the end of follow-up: age at death if the individual has died, age at emigration if the person has moved abroad, or otherwise age when register follow-up ends. |
DEATH | Death status: 1 = died by the end of death registry data, 0 = alive at the end of death registry data. |
DEATH_AGE | Age at death if the individual has died; otherwise age at emigration if the person has moved abroad, or age when register follow-up ends. |
DEATH_YEAR | Year of death |
APPROX_BIRTH_DATE | Randomized birth day (within +/- 1-15 days) |
*In the DF12 minimum extended data: AGE_AT_DEATH_OR_END_OF_FOLLOW_UP and DEATH_FU_AGE columns replaced column FU_END_AGE containing the same information.
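The CURRENT_SMOKER and EVER_SMOKER definitions in the table can be expressed as a small function of the SMOKE3 category. This is only a sketch: the function name `smoker_phenotypes` is hypothetical, and the 1/0/`None` encoding for case, control and "neither" is an assumption made for illustration, not the documented coding in the file.

```python
def smoker_phenotypes(smoke3):
    """Return (CURRENT_SMOKER, EVER_SMOKER) for one SMOKE3 value.

    Assumed encoding: 1 = case, 0 = control, None = neither case nor control.
    """
    is_never = smoke3 == "never"
    # CURRENT_SMOKER: cases are "current", controls are "never".
    current = 1 if smoke3 == "current" else 0 if is_never else None
    # EVER_SMOKER: cases are "current" or "former", controls are "never".
    ever = 1 if smoke3 in ("current", "former") else 0 if is_never else None
    return current, ever
```

Under this reading, a former smoker is a case for EVER_SMOKER but is excluded from CURRENT_SMOKER, because that phenotype compares current smokers against never-smokers only.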
If biobanks do not know the exact DNA sample collection date, they have been instructed to estimate it using age and the birth date extracted from the Finnish personal identity code. If the sample collection day is missing, the date is estimated to be 15.mm.yyyy. If the month is missing, the sample collection date is estimated to be 30.06.yyyy. In some cases, the sample collection date is not available and is impossible to estimate reliably.
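The estimation rules for incomplete sample collection dates can be sketched as follows. The function name `estimate_collection_date` and its signature are hypothetical. Returning `None` when the year is unknown is an assumption standing in for the cases where no reliable estimate is possible.

```python
from datetime import date


def estimate_collection_date(year, month=None, day=None):
    """Estimate a sample collection date from possibly incomplete parts."""
    if year is None:
        # Assumption: without a year, no reliable estimate can be made.
        return None
    if month is None:
        # Month missing: use 30.06.yyyy (any reported day is ignored here).
        return date(year, 6, 30)
    if day is None:
        # Day missing: use 15.mm.yyyy.
        return date(year, month, 15)
    return date(year, month, day)
```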
Biobanks are instructed to report dates for height, weight, and smoking status. These dates are compared against values calculated from the DNA sample collection date and the birth date extracted from the Finnish personal identity code. If the values differ, clarification is requested from the biobank.
Biobanks have been asked to provide all available information about smoking. Some biobanks only report whether an individual is a current smoker, while others provide more detailed smoking information.
Many other quality checks are performed as well. For example, the BMI calculated from height and weight must be between 10 and 80 kg/m², and dates cannot be in the future. The sex reported by biobanks is compared against the personal identity code. If the values differ, clarification is requested from the biobank.
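Two of the checks mentioned above, the BMI range and the rule against future dates, might look like the sketch below. The function names are hypothetical, and treating the 10 and 80 kg/m² limits as inclusive is an assumption. The actual quality-control pipeline is not described on this page.

```python
from datetime import date


def bmi(height_cm, weight_kg):
    """Body mass index in kg/m2 from height in cm and weight in kg."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def passes_basic_checks(height_cm, weight_kg, measured_on, today=None):
    """Check that BMI is within 10-80 kg/m2 and the date is not in the future."""
    today = today or date.today()
    # Assumption: both limits of the BMI range are inclusive.
    bmi_ok = 10 <= bmi(height_cm, weight_kg) <= 80
    date_ok = measured_on <= today
    return bmi_ok and date_ok
```

A record that fails either check would then be a candidate for clarification with the biobank rather than being corrected automatically.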