Endpoint and endpoint longitudinal data

This page was last updated for R12.

Sandbox directory

The endpoint and endpoint longitudinal data files are available in the following Sandbox directory:

/finngen/library-red/finngen_R[RELEASE]/phenotype_1.0/

Data files

The endpoint data is available in the following file:

data/finngen_R[RELEASE]_endpoint_1.0.txt.gz

The endpoint longitudinal data is available in the following file:

data/finngen_R[RELEASE]_endpoint_longitudinal_1.0.txt.gz
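As a minimal sketch, the two file paths can be assembled from the release number. It assumes that `[RELEASE]` stands for the release number (for example 12) and that the `data/` directory sits directly under the Sandbox directory given above; the function name `endpoint_paths` is hypothetical.

```python
from pathlib import PurePosixPath


def endpoint_paths(release: int) -> dict:
    """Build the endpoint and endpoint longitudinal file paths for a release.

    Assumption: [RELEASE] in the documented paths is the plain release number.
    """
    base = PurePosixPath(f"/finngen/library-red/finngen_R{release}/phenotype_1.0")
    data_dir = base / "data"
    return {
        "endpoint": data_dir / f"finngen_R{release}_endpoint_1.0.txt.gz",
        "longitudinal": data_dir / f"finngen_R{release}_endpoint_longitudinal_1.0.txt.gz",
    }


paths = endpoint_paths(12)
print(paths["endpoint"])
print(paths["longitudinal"])
```

Both files are gzip-compressed text, so in Python they can be opened with `gzip.open(path, "rt")`.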

Endpoints

FinnGen endpoints have been created to comprehensively cover the spectrum of medical diagnoses from all medical areas. They follow the treelike subtyping system of the ICD-10 classification system, where classification starts from the anatomy-based upper-level chapters and then divides step-by-step into more detailed subtypes (here is an example tree for ICD10).

In FinnGen, this treelike subtyping was followed to a level of detail that was seen as reasonable in the context of GWAS and PheWAS.

The endpoints are named starting with the first character of the ICD-10 diagnosis code, followed by the number of the ICD-10 chapter and ending with an endpoint name that is:

- as short as possible
- explanatory enough to make clear what the endpoint is

For example, Parkinson's disease:

- 6th ICD-10 chapter: Diseases of the nervous system
- ICD-10 code: G20
- endpoint name: G6_PARKINSON
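The naming convention can be illustrated with a small parser that splits an endpoint name into the ICD-10 letter, the chapter number and the short name. This is only a sketch: the regular expression is an assumption derived from the G6_PARKINSON example, and endpoints created by specific request need not follow the pattern, so the function returns `None` for them. `SOME_CUSTOM_ENDPOINT` is a made-up name used for illustration.

```python
import re
from typing import NamedTuple, Optional


class EndpointName(NamedTuple):
    icd_letter: str   # first character of the ICD-10 diagnosis code
    chapter: int      # number of the ICD-10 chapter
    short_name: str   # short descriptive name


# Assumed pattern: one capital letter, the chapter number, an underscore, the name.
_PATTERN = re.compile(r"^([A-Z])(\d+)_(.+)$")


def parse_endpoint_name(name: str) -> Optional[EndpointName]:
    """Split an ICD-derived endpoint name; return None if it does not match."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    return EndpointName(match.group(1), int(match.group(2)), match.group(3))


parsed = parse_endpoint_name("G6_PARKINSON")
print(parsed)
print(parse_endpoint_name("SOME_CUSTOM_ENDPOINT"))  # hypothetical non-ICD name
```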

In addition to endpoints with names derived from the ICD classification as explained above, we also created endpoints according to specific requests: for example, definitions specific to the age of onset, or combinatory endpoints covering entities with subtypes from different chapters of the ICD system, such as autoimmune diseases or alcohol-related disorders. The final set of medical speciality-specific endpoint definitions was approved by the FinnGen clinical expert groups, which consist of leading experts in their respective medical fields. These clinical groups are formed of Finnish physicians and in some cases also include subject experts from pharma.

The ICD-10 codes were then matched to the Finnish versions of ICD-9 and ICD-8 when possible. When ICD-9 and ICD-8 did not reach the level of detail of ICD-10, but the level of these older systems was detailed enough, the subtyping of the endpoints was left at that level. When the more detailed ICD-10 level was justified even though it could not be matched, the endpoint was created according to ICD-10 and left unharmonized with ICD-9 and/or ICD-8.

In most cases, the same ICD-codes were used for the hospital and cause-of-death registries, but when differences were well reasoned, differentiating classifications were used for these registries.

In addition to ICD codes, ATC codes, KELA drug reimbursement codes and operation codes were used in the endpoint definitions. These were used only when the codes were specific to the endpoint: medications or operations were not used to define an endpoint if they also have other indications.

For cancers, to maximize specificity, we used the cancer registry-specific ICD-O-3 system, and ICD-10 codes were only used in the cause-of-death registry.

In general, we prefer specificity over sensitivity in the endpoint definition parameters, because in GWAS with highly unbalanced case-control ratios, where controls outnumber cases, we do not want to dilute the control groups with lower-probability cases.

If needed, age- or sex-specific “pre-condition” parameters were used in the endpoint definition. “Condition” parameters, which exclude participants with or without other specific endpoints, were also used when appropriate. For example, all type 1 diabetes cases were excluded from the type 2 diabetes cases and vice versa, and likewise for ulcerative colitis and Crohn's disease.
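The two kinds of parameters can be sketched with plain sets and a small filter. This is an illustration only and not the FinnGen implementation: it reads "vice versa" as removing individuals who qualify for both endpoints from both case groups, and the function names, parameter names and IDs are all hypothetical.

```python
from typing import Optional, Set, Tuple


def passes_precondition(sex: str, event_age: float,
                        required_sex: Optional[str] = None,
                        min_age: Optional[float] = None,
                        max_age: Optional[float] = None) -> bool:
    """Check a hypothetical age- or sex-specific pre-condition."""
    if required_sex is not None and sex != required_sex:
        return False
    if min_age is not None and event_age < min_age:
        return False
    if max_age is not None and event_age > max_age:
        return False
    return True


def exclude_each_other(a: Set[str], b: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Remove individuals who qualify for both endpoints from both case sets."""
    overlap = a & b
    return a - overlap, b - overlap


# Made-up candidate cases before the condition is applied.
t1d_candidates = {"FG001", "FG002"}
t2d_candidates = {"FG002", "FG003"}
t1d_cases, t2d_cases = exclude_each_other(t1d_candidates, t2d_candidates)
print(sorted(t1d_cases), sorted(t2d_cases))
```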

Finally, the treelike form of the endpoint catalog was created by listing all the subtypes of an upper-level endpoint in its INCLUDE column, so that cases of the included lower-level subtype endpoints are also cases of the upper-level endpoint.
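The effect of the INCLUDE column can be sketched as a roll-up over the endpoint tree: the cases of an upper-level endpoint are its own cases plus the cases of every included endpoint, followed recursively. The endpoint names, IDs and the dictionary layout below are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Set


def rollup_cases(endpoint: str,
                 own_cases: Dict[str, Set[str]],
                 include: Dict[str, List[str]]) -> Set[str]:
    """Collect cases of an endpoint and of all endpoints it includes."""
    cases: Set[str] = set()
    seen: Set[str] = set()
    stack = [endpoint]
    while stack:
        current = stack.pop()
        if current in seen:          # guard against repeated or cyclic entries
            continue
        seen.add(current)
        cases |= own_cases.get(current, set())
        stack.extend(include.get(current, []))
    return cases


# Hypothetical catalog: a parent endpoint that includes two subtypes.
include = {"X1_PARENT": ["X1_CHILD_A", "X1_CHILD_B"], "X1_CHILD_B": ["X1_GRANDCHILD"]}
own_cases = {
    "X1_PARENT": {"FG001"},
    "X1_CHILD_A": {"FG002"},
    "X1_CHILD_B": set(),
    "X1_GRANDCHILD": {"FG003"},
}
parent_cases = rollup_cases("X1_PARENT", own_cases, include)
print(sorted(parent_cases))
```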

Creating the endpoint catalog is an ongoing process, and new endpoints can be added to or removed from the catalog.

Endpoint data file

ENDPOINT_X_AGE (starting from DF12: ENDPOINT_X_FU_AGE)
- cases: the individual's age at the first event for ENDPOINT X
- controls: age at the end of the follow-up, which is one of:
  - age at the time when register follow-up ends in the registers
  - age at death if the individual has died
  - age at emigration if the person has moved abroad
ENDPOINT_X_NEVT is the number of events for the individual for ENDPOINT X.
ENDPOINT_X_EXALLC and ENDPOINT_X_EXMORE. Some endpoints have a stricter delimitation of the controls, where controls are removed based on certain criteria. For example, in the _EXALLC endpoints, all cancers have been removed from the controls, and in the _EXMORE endpoints, a stricter control cut-off has been used. Excluded individuals will be “NA” in the data.
FU_END_AGE / AGE_AT_DEATH_OR_END_OF_FOLLOWUP / DEATH_AGE is the individual's age at the end of the follow-up. It is the age at death if the individual has died, the age at emigration if the person has moved abroad, or the age at the time when register follow-up ends in most of the registers.
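A short sketch of how these columns might be read for one endpoint follows. Several details are assumptions that the text above does not confirm: that the file is tab-separated, that cases are coded 1 and controls 0, and that the age column uses the `_AGE` suffix (pass `age_suffix="_FU_AGE"` for DF12 and later). The sample rows are made up, and with the real file the in-memory text would be replaced by `gzip.open(path, "rt")`.

```python
import csv
import io


def summarize_endpoint(rows, endpoint: str, age_suffix: str = "_AGE"):
    """Count cases, controls and excluded individuals, and average case age."""
    counts = {"cases": 0, "controls": 0, "excluded": 0}
    case_ages = []
    for row in rows:
        status = row[endpoint]
        if status == "1":            # assumed coding for cases
            counts["cases"] += 1
            case_ages.append(float(row[endpoint + age_suffix]))
        elif status == "0":          # assumed coding for controls
            counts["controls"] += 1
        else:                        # "NA": excluded from this endpoint
            counts["excluded"] += 1
    mean_case_age = sum(case_ages) / len(case_ages) if case_ages else None
    return counts, mean_case_age


# Made-up rows in the assumed tab-separated layout.
sample = (
    "FINNGENID\tG6_PARKINSON\tG6_PARKINSON_AGE\tG6_PARKINSON_NEVT\n"
    "FG001\t1\t61.5\t3\n"
    "FG002\t0\t70.2\t0\n"
    "FG003\tNA\tNA\tNA\n"
    "FG004\t1\t58.5\t1\n"
)
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
counts, mean_case_age = summarize_endpoint(reader, "G6_PARKINSON")
print(counts, mean_case_age)
```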

The endpoint data file contains the following columns:

| Column | Description |
| --- | --- |
| FINNGENID | FinnGen ID |
| BL_YEAR | Year of DNA sample collection |
| BL_AGE | Age at DNA sample collection |
| FU_END_AGE* | Age at the end of the follow-up: age at the time when register follow-up ends in the registers; age at death if the individual has died; age at emigration if the person has moved abroad |
| SEX | Gender (male/female/NA) |
| ENDPOINT_X | Endpoint name |
| ENDPOINT_X_AGE/ENDPOINT_X_FU_AGE | Cases: the individual's age at the first event for the endpoint. Controls: age at the end of the follow-up (age at the time when register follow-up ends in the registers; age at death if the individual has died; age at emigration if the person has moved abroad) |
| ENDPOINT_X_YEAR | Year of onset |
| ENDPOINT_X_NEVT | Number of events |
| ENDPOINT_X_EXMORE | Endpoint-specific control definition, only for selected endpoints: more stringent control definition |
| ENDPOINT_X_EXALLC | Endpoint-specific control definition, only for selected endpoints: all cancer cases have been excluded from controls |

*In the DF12 endpoint data, the columns AGE_AT_DEATH_OR_END_OF_FOLLOW_UP and DEATH_FU_AGE replaced the FU_END_AGE column and contain the same information.

Endpoint longitudinal data file

The endpoint data file gives only the first recorded event for each endpoint. The event age is therefore the age at the first event, for example the first ICD-10 code matching the endpoint definition.

In the endpoint longitudinal data, all events of the endpoint are given instead, meaning that all event ages and source registries of the individual codes are recorded.

All columns in the endpoint longitudinal file are described below.

| Field | Description |
| --- | --- |
| FINNGENID | FinnGen ID |
| EVENT_TYPE | Source register |
| EVENT_AGE | Age at the event |
| EVENT_YEAR | Year at the event |
| ICDVER | ICD version |
| ENDPOINT_X | Endpoint name |
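The relationship between the two files can be sketched by deriving each individual's first event from longitudinal rows, which is the information the endpoint data file summarizes. The sketch assumes a tab-separated layout and that the endpoint name is stored in a column called `ENDPOINT`; both are assumptions, so the column name is a parameter. The rows and the register labels `REG_A` and `REG_B` are placeholders, not real abbreviations.

```python
import csv
import io


def first_events(rows, endpoint: str, endpoint_col: str = "ENDPOINT") -> dict:
    """Return {FINNGENID: (age at first event, source register)} for an endpoint."""
    first = {}
    for row in rows:
        if row[endpoint_col] != endpoint:
            continue
        fid = row["FINNGENID"]
        age = float(row["EVENT_AGE"])
        # Keep the earliest event seen so far for this individual.
        if fid not in first or age < first[fid][0]:
            first[fid] = (age, row["EVENT_TYPE"])
    return first


# Made-up longitudinal rows in the assumed layout.
sample = (
    "FINNGENID\tEVENT_TYPE\tEVENT_AGE\tEVENT_YEAR\tICDVER\tENDPOINT\n"
    "FG001\tREG_A\t60.2\t2010\t10\tG6_PARKINSON\n"
    "FG001\tREG_B\t58.9\t2009\t10\tG6_PARKINSON\n"
    "FG002\tREG_A\t71.0\t2015\t10\tG6_PARKINSON\n"
    "FG001\tREG_A\t40.0\t1990\t9\tOTHER_ENDPOINT\n"
)
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
firsts = first_events(reader, "G6_PARKINSON")
print(firsts)
```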

In the endpoint longitudinal data, you may see the following register abbreviations (the equivalent abbreviation in the detailed longitudinal data file is shown in parentheses):