Minimum extended phenotype data

This page was last updated for R12.

The minimum extended phenotype data file was introduced in Data Release 11. It combines data previously released in three separate files: the minimum phenotype data, cohort data, and baseline data files.

Sandbox directory

The minimum extended phenotype data file is available as a separate file in the following Sandbox directory:

/finngen/library-red/finngen_R[RELEASE]/phenotype_1.0/

Data files

This data is available in the following file:

data/finngen_R[RELEASE]_minimum_extended_1.0.txt.gz
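As a minimal sketch, the file could be opened with the Python standard library as shown below. The helper names `minimum_extended_path` and `read_minimum_extended` are hypothetical. Two things are assumptions and not stated on this page: that the `[RELEASE]` placeholder is replaced by the bare release number (for example 12), and that the file is tab-delimited with a header row.

```python
import csv
import gzip

# Assumed layout, assembled from the directory and file name given above.
BASE_DIR = "/finngen/library-red/finngen_R{release}/phenotype_1.0"
FILE_NAME = "data/finngen_R{release}_minimum_extended_1.0.txt.gz"


def minimum_extended_path(release):
    """Build the full file path for a release number such as 12 (assumed format)."""
    return (BASE_DIR + "/" + FILE_NAME).format(release=release)


def read_minimum_extended(path, delimiter="\t"):
    """Yield one dict per sample, keyed by column name, in file order.

    The tab delimiter is an assumption; pass another one if the file differs.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle, delimiter=delimiter)
```

Because the rows are yielded in file order, the sample order is preserved when iterating.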

The samples in the file are in the same order as in the genotype data files. The file contains the following columns:

APPROX_BIRTH_DATE was first released in FinnGen data release 11. BMI, CURRENT_SMOKER and EVER_SMOKER were first released in FinnGen data release 12.

| Column | Description |
| --- | --- |
| FINNGENID | Sample ID |
| BL_YEAR | Year of DNA sample collection |
| BL_AGE | Age at DNA sample collection (years) |
| SEX | Sex (male/female/NA) |
| HEIGHT | Height (cm) |
| HEIGHT_AGE | Age at height measurement (years) |
| WEIGHT | Weight (kg) |
| WEIGHT_AGE | Age at weight measurement (years) |
| BMI | Body mass index (kg/m2) |
| SMOKE2 | Smoking status, 2 categories (yes/no) |
| SMOKE3 | Smoking status, 3 categories (current/former/never) |
| SMOKE5 | Smoking status, 5 categories (current/occasional/quitter/former/never) |
| SMOKE_AGE | Age at the time of the smoking survey (years) |
| CURRENT_SMOKER | Cases: SMOKE3 category "current"; controls: SMOKE3 category "never" |
| EVER_SMOKER | Cases: SMOKE3 category "current" or "former"; controls: SMOKE3 category "never" |
| regionofbirth | Regional council number for the region of birth according to the Finnish Ministry of the Interior (21 categories). From Digital and Population Data Services Agency (DVV). |
| regionofbirthname | Name of the region of birth (21 categories): 1-Uusimaa, 2-Varsinais-Suomi, 4-Satakunta, 5-Kanta Häme, 6-Pirkanmaa, 7-Päijät Häme, 8-Kymenlaakso, 9-South Karelia, 10-Etelä Savo, 11-Pohjois Savo, 12-North Karelia, 13-Central Finland, 14-South Ostrobothnia, 15-Ostrobothnia, 16-Central Ostrobothnia, 17-North Ostrobothnia, 18-Kainuu, 19-Lapland, 20-Åland, 200-Abroad, 9999-Region ceded to Soviet Union. From Digital and Population Data Services Agency (DVV). |
| moveabroad | Whether the person has moved abroad, 3 categories (yes/no/NA). From Digital and Population Data Services Agency (DVV). |
| NUMBER_OF_OFFSPRING | Number of biological children. From Digital and Population Data Services Agency (DVV). |
| COHORT | Biobank collection name |
| FU_END_AGE | Age at the end of follow-up: age at death if the individual has died, age at emigration if the person has moved abroad, otherwise age when register follow-up ends. |
| DEATH | Death status: 1 = died by the end of death registry follow-up, 0 = alive at the end of death registry follow-up. |
| DEATH_AGE | Age at death: age at death if the individual has died, age at emigration if the person has moved abroad, or age when register follow-up ends. |
| DEATH_YEAR | Year of death |
| APPROX_BIRTH_DATE | Randomized birth date (within +/- 1-15 days) |

*In the DF12 minimum extended data: AGE_AT_DEATH_OR_END_OF_FOLLOW_UP and DEATH_FU_AGE columns replaced column FU_END_AGE containing the same information.
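The CURRENT_SMOKER and EVER_SMOKER definitions in the column list can be sketched as a mapping from SMOKE3. This is only an illustration of the stated case/control rule: the function names are hypothetical, and the 1/0/None coding of cases, controls, and excluded samples is an assumption, since this page does not say how the columns are encoded in the file.

```python
def current_smoker(smoke3):
    """Case (1) if SMOKE3 is "current", control (0) if "never", else None.

    The 1/0/None coding is an assumption made for this sketch.
    """
    if smoke3 == "current":
        return 1
    if smoke3 == "never":
        return 0
    return None  # "former" or missing: neither case nor control


def ever_smoker(smoke3):
    """Case (1) if SMOKE3 is "current" or "former", control (0) if "never"."""
    if smoke3 in ("current", "former"):
        return 1
    if smoke3 == "never":
        return 0
    return None  # missing smoking status
```

Note that under these definitions a former smoker is excluded from CURRENT_SMOKER but counts as a case for EVER_SMOKER.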

If biobanks do not know the exact DNA sample collection date, they have been instructed to estimate it using age and the birth date extracted from the Finnish personal identity code. If the sample collection day is missing, the date is estimated to be 15.mm.yyyy. If the month is missing, the sample collection date is estimated to be 30.06.yyyy. In some cases, the sample collection date is not available and is impossible to estimate reliably.
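The fallback rules for a partially known sample collection date can be sketched as follows. The function name `estimate_collection_date` and the use of `None` for missing parts are assumptions made for illustration; the actual estimation is done by the biobanks.

```python
from datetime import date


def estimate_collection_date(year, month=None, day=None):
    """Apply the fallback rules described above to a partial date.

    Returns None when even the year is unknown, i.e. the date
    cannot be estimated reliably.
    """
    if year is None:
        return None
    if month is None:
        # Month missing: estimate as 30.06.yyyy
        return date(year, 6, 30)
    if day is None:
        # Day missing: estimate as 15.mm.yyyy
        return date(year, month, 15)
    return date(year, month, day)
```

For example, a sample known only to have been collected in March 2010 would be assigned 15.03.2010, and one known only by year would be assigned 30.06.2010.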

Biobanks are instructed to report dates for height, weight, and smoking status. These dates are compared against values calculated from the DNA sample collection date and the birth date extracted from the Finnish personal identity code. If the values differ, the biobank is asked for clarification.

Biobanks have been asked to provide all available information about smoking. Some biobanks report only whether an individual is a current smoker, while others provide more detailed smoking information.

Many other quality checks are performed as well. For example, the BMI calculated from height and weight must be between 10 and 80 kg/m2, and dates cannot be in the future. The sex reported by biobanks is compared against the personal identity code. If the values differ, the biobank is asked for clarification.
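Two of the checks mentioned above, the BMI range and the future-date check, could look roughly like this. This is a sketch, not the actual quality control pipeline: the function names are hypothetical, and treating the 10 and 80 kg/m2 bounds as inclusive is an assumption.

```python
from datetime import date

# Plausible BMI range in kg/m2, as stated above (inclusive bounds assumed).
BMI_MIN, BMI_MAX = 10.0, 80.0


def bmi(height_cm, weight_kg):
    """Body mass index: weight in kg divided by squared height in metres."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def bmi_is_plausible(height_cm, weight_kg):
    """True if the BMI implied by height and weight is within the range."""
    return BMI_MIN <= bmi(height_cm, weight_kg) <= BMI_MAX


def date_is_not_in_future(value, today=None):
    """True if the date is today or earlier; `today` can be fixed for testing."""
    if today is None:
        today = date.today()
    return value <= today
```

A height of 180 cm and a weight of 81 kg give a BMI of about 25 kg/m2, which passes the range check.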

Further information

Extracting minimum phenotype data by biobank

DNA isolation protocols by biobank
