Interpretation of Endpoint Definition file

This document explains how to attribute an endpoint to events in the detailed longitudinal data using the rules from the endpoint definition file (latest version at FinnGen: Clinical Endpoints). Have a look at the list of gotchas at the end of this document for some specificities that are easy to miss at first.

Each endpoint is defined by a set of rules, given as one line in the endpoint definition file. The detailed longitudinal file contains health events (rows in that file) that are looked up against these rules. Each rule adds events to, or removes events from, the list of candidate events. Once all rules have been applied, the remaining candidate events are attributed to the endpoint.

When explaining the rules, the following terms are used:

  • Endpoint: occurrence of a health event defined by rules that match on the health register data.

  • Candidate events: list of events that could be attributed to the endpoint. This list grows and shrinks as the endpoint rules are applied.

  • Consider: add event to the list of candidate events.

  • Discard: remove event from the list of candidate events.
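The vocabulary above can be illustrated with a minimal sketch. The event records, the codes and the two toy rules are assumptions made up for this example; only the consider and discard operations come from the definitions above.

```python
# Hypothetical event records; the keys mirror column names used in this document.
events = [
    {"FINNGENID": "FG1", "SOURCE": "INPAT", "ICDVER": "10", "CODE1": "I210"},
    {"FINNGENID": "FG1", "SOURCE": "INPAT", "ICDVER": "10", "CODE1": "I219"},
    {"FINNGENID": "FG2", "SOURCE": "DEATH", "ICDVER": "10", "CODE1": "I210"},
]

candidates = []


def consider(event):
    """Add an event to the list of candidate events (no duplicates)."""
    if event not in candidates:
        candidates.append(event)


def discard(event):
    """Remove an event from the list of candidate events, if present."""
    if event in candidates:
        candidates.remove(event)


# A toy inclusion rule considers events, then a toy exclusion rule discards some.
for ev in events:
    if ev["CODE1"].startswith("I21"):
        consider(ev)
for ev in events:
    if ev["CODE1"] == "I219":
        discard(ev)

# Whatever is left in `candidates` would be attributed to the endpoint.
print([(ev["FINNGENID"], ev["CODE1"]) for ev in candidates])
```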

Overview of the Endpoint Definition File

The endpoint definition file version 1.3 has the following metadata columns:

Column name | Explanation
--- | ---
NAME | naming: Reference name in the FinnGen endpoint data
LONGNAME | naming: Descriptive name
Latin | naming: Latin name
TAGS | categorisation: List of categories the endpoint belongs to
LEVEL | categorisation: Level in the ICD-10 hierarchy
OMIT | categorisation: Is a core GWAS? (NA: yes, 1 or 2: no)
PARENT | categorisation: Parent in the ICD-10 hierarchy
version | changelog: introduced in data freeze
Modification_date | changelog: date of last modification
Modified_by | changelog: author of last modification
Modification_reason | changelog: purpose of modification
Special | free text notes

The rules are defined by the following columns in the endpoint definition file. Each column is detailed under Event Rules, and the special rule syntaxes are detailed under Extra rules, after the table.

Column name | Purpose | Coding system | Registry (SOURCE)
--- | --- | --- | ---
SEX | Filter at the FINNGENID level | |
INCLUDE | Use other endpoints to find events | |
PRE_CONDITIONS | Filter at the event level | |
CONDITIONS | Filter at the FINNGENID level | |
OUTPAT_ICD | Inclusion lookup | ICD-10 | PRIM_OUT
OUTPAT_OPER | Inclusion lookup | NOMESCO | PRIM_OUT
HD_MAINONLY | Diagnosis selection hint | | INPAT, OUTPAT
HD_ICD_10_ATC | Inclusion lookup | ATC | INPAT, OUTPAT
HD_ICD_10 | Inclusion lookup | ICD-10 | INPAT, OUTPAT
HD_ICD_9 | Inclusion lookup | ICD-9 | INPAT, OUTPAT
HD_ICD_8 | Inclusion lookup | ICD-8 | INPAT, OUTPAT
HD_ICD_10_EXCL | Exclusion lookup | ICD-10 | INPAT, OUTPAT
HD_ICD_9_EXCL | Exclusion lookup | ICD-9 | INPAT, OUTPAT
HD_ICD_8_EXCL | Exclusion lookup | ICD-8 | INPAT, OUTPAT
COD_MAINONLY | Diagnosis selection hint | | DEATH
COD_ICD_10 | Inclusion lookup | ICD-10 | DEATH
COD_ICD_9 | Inclusion lookup | ICD-9 | DEATH
COD_ICD_8 | Inclusion lookup | ICD-8 | DEATH
COD_ICD_10_EXCL | Exclusion lookup | ICD-10 | DEATH
COD_ICD_9_EXCL | Exclusion lookup | ICD-9 | DEATH
COD_ICD_8_EXCL | Exclusion lookup | ICD-8 | DEATH
OPER_NOM | Inclusion lookup | NOMESCO | OPER_IN, OPER_OUT
OPER_HL | Inclusion lookup | Finnish hospital league | OPER_IN, OPER_OUT
OPER_HP1 | Inclusion lookup | Demanding heart patient, old codes | OPER_IN, OPER_OUT
OPER_HP2 | Inclusion lookup | Demanding heart patient, new codes | OPER_IN, OPER_OUT
KELA_REIMB | Inclusion lookup | KELA reimbursement code | REIMB
KELA_REIMB_ICD | Inclusion lookup | ICD-10, ICD-9 | REIMB
KELA_ATC_NEEDOTHER | Additional requirement hint | | PURCH
KELA_ATC | Inclusion lookup | ATC | PURCH
KELA_VNRO_NEEDOTHER | Additional requirement hint | | PURCH
KELA_VNRO | Inclusion lookup | VNRO | PURCH
CANC_TOPO | Inclusion lookup | ICD-O-3 topography | CANC
CANC_TOPO_EXCL | Exclusion lookup | ICD-O-3 topography | CANC
CANC_MORPH | Inclusion lookup | ICD-O-3 morphology | CANC
CANC_MORPH_EXCL | Exclusion lookup | ICD-O-3 morphology | CANC
CANC_BEHAV | Inclusion lookup | ICD-O-3 behavior | CANC

Event Rules

Event Rules

OUTPAT_ICD

Consider events where:

  • SOURCE: is PRIM_OUT

  • and CATEGORY: contains ICD

  • and CODE1: matches the OUTPAT_ICD regex
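As a rough illustration, the three OUTPAT_ICD conditions can be written as a single predicate. This is a sketch only: the function name, the dictionary representation of an event, the CATEGORY values and the rule value J45 are hypothetical. Using re.match, which anchors the regex at the start of the code in the spirit of match-prefix, is also an assumption for this column.

```python
import re


def matches_outpat_icd(event, rule):
    """Return True if the event satisfies the three OUTPAT_ICD conditions."""
    return (
        event["SOURCE"] == "PRIM_OUT"
        and "ICD" in event["CATEGORY"]
        # re.match only succeeds if the regex matches from the first character.
        and re.match(rule, event["CODE1"]) is not None
    )


# Hypothetical events, for illustration only.
visit = {"SOURCE": "PRIM_OUT", "CATEGORY": "ICD0", "CODE1": "J450"}
procedure = {"SOURCE": "PRIM_OUT", "CATEGORY": "OP1", "CODE1": "J450"}

print(matches_outpat_icd(visit, "J45"), matches_outpat_icd(procedure, "J45"))
```

The OUTPAT_OPER rule follows the same shape, with the CATEGORY test replaced by a "starts with OP" check.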

OUTPAT_OPER

Consider events where:

  • SOURCE: is PRIM_OUT

  • and CATEGORY: starts with OP

  • and the OUTPAT_OPER regex matches CODE1

HD_MAINONLY

Values

  • YES: only look at events with CATEGORY: 0 for the rules of HD_ICD_10, HD_ICD_9, HD_ICD_8, HD_ICD_10_EXCL, HD_ICD_9_EXCL and HD_ICD_8_EXCL

  • NA: (nothing to filter)

This rule states to look only into the main diagnosis for hospital discharge events (as opposed to side diagnoses, where CATEGORY is not 0).
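One possible way to apply HD_MAINONLY is as a pre-filter on the events handed to the HD_ICD_* rules. The sketch below assumes events are dictionaries, that CATEGORY is stored as a string, and that an NA value in the definition file is represented as None; none of this is prescribed by the definition file itself.

```python
def hd_events_to_check(events, hd_mainonly):
    """Select the hospital discharge events the HD_ICD_* rules should look at."""
    if hd_mainonly == "YES":
        # Main diagnosis only: side diagnoses have a CATEGORY other than 0.
        return [ev for ev in events if ev["CATEGORY"] == "0"]
    # NA: nothing to filter.
    return list(events)


# Hypothetical events, for illustration only.
hd_events = [
    {"SOURCE": "INPAT", "CATEGORY": "0", "CODE1": "I210"},
    {"SOURCE": "INPAT", "CATEGORY": "1", "CODE1": "E119"},
]

print(len(hd_events_to_check(hd_events, "YES")), len(hd_events_to_check(hd_events, None)))
```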

HD_ICD_10_ATC

Consider events where:

  • SOURCE: is INPAT or OUTPAT

  • and the HD_ICD_10_ATC regex matches CODE3

This rule must be applied by looking for events that match both this rule and the HD_ICD_10 rule at the same time.

For example, an endpoint definition with HD_ICD_10 = E610 and HD_ICD_10_ATC = ANY will match an event that has:

  • SOURCE: INPAT or OUTPAT

  • and ICDVER: 10

  • and HD_ICD_10 regex matches CODE1 or CODE2

  • and any code in CODE3 (but there must be a code there, it cannot be empty)
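The combined check in this example could be sketched as follows. The helper names, the dictionary representation of an event, the empty string standing for a missing code, and the ATC code used in the sample event are all assumptions for illustration.

```python
import re


def code_matches(rule, code):
    """Apply one rule value to one code field; ANY accepts any non-empty code."""
    if not code:
        return False  # an empty field never matches, not even ANY
    if rule == "ANY":
        return True
    return re.match(rule, code) is not None


def matches_hd_icd_10_with_atc(event, icd_rule, atc_rule):
    """Both HD_ICD_10 and HD_ICD_10_ATC must hold for the same event."""
    return (
        event["SOURCE"] in ("INPAT", "OUTPAT")
        and event["ICDVER"] == "10"
        and (
            code_matches(icd_rule, event["CODE1"])
            or code_matches(icd_rule, event["CODE2"])
        )
        and code_matches(atc_rule, event["CODE3"])
    )


# Hypothetical events: same diagnosis, with and without a code in CODE3.
with_atc = {"SOURCE": "INPAT", "ICDVER": "10",
            "CODE1": "E610", "CODE2": "", "CODE3": "A10BA02"}
without_atc = dict(with_atc, CODE3="")

print(matches_hd_icd_10_with_atc(with_atc, "E610", "ANY"))
print(matches_hd_icd_10_with_atc(without_atc, "E610", "ANY"))
```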

HD_ICD_10

Consider events where:

  • SOURCE: is INPAT or OUTPAT

  • and the HD_ICD_10 regex matches CODE1 or CODE2

  • and ICDVER: is 10

HD_ICD_9

Consider events where:

  • SOURCE: is INPAT or OUTPAT

  • and the HD_ICD_9 regex matches CODE1 or CODE2

  • and ICDVER: is 9

HD_ICD_8

Consider events where:

  • SOURCE: is INPAT or OUTPAT

  • and the HD_ICD_8 regex matches CODE1 or CODE2

  • and ICDVER: is 8

HD_ICD_10_EXCL

Discard events where:

  • SOURCE: is INPAT or OUTPAT

  • and the HD_ICD_10_EXCL regex matches CODE1 or CODE2

  • and ICDVER: is 10
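To show how an exclusion rule interacts with its inclusion counterpart, here is a small sketch that first considers events with HD_ICD_10 and then discards those matching HD_ICD_10_EXCL. The function names, the event dictionaries and the rule values I2 and I21 are hypothetical, and the sketch assumes both rule values are defined (not NA).

```python
import re


def hd_match(event, rule, icdver):
    """True if a hospital discharge event matches the rule for one ICD version."""
    in_hd = event["SOURCE"] in ("INPAT", "OUTPAT") and event["ICDVER"] == icdver
    codes = (event["CODE1"], event["CODE2"])
    return in_hd and any(bool(c) and re.match(rule, c) is not None for c in codes)


def apply_hd_icd_10(events, incl_rule, excl_rule):
    """Consider events matching HD_ICD_10, then discard those matching HD_ICD_10_EXCL."""
    candidates = [ev for ev in events if hd_match(ev, incl_rule, "10")]
    return [ev for ev in candidates if not hd_match(ev, excl_rule, "10")]


# Hypothetical events, for illustration only.
events = [
    {"SOURCE": "INPAT", "ICDVER": "10", "CODE1": "I210", "CODE2": ""},
    {"SOURCE": "OUTPAT", "ICDVER": "10", "CODE1": "I250", "CODE2": ""},
    {"SOURCE": "INPAT", "ICDVER": "9", "CODE1": "4100", "CODE2": ""},
]

kept = apply_hd_icd_10(events, incl_rule="I2", excl_rule="I21")
print([ev["CODE1"] for ev in kept])
```

The ICD-9 and ICD-8 variants differ only in the ICDVER value passed to the helper.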

HD_ICD_9_EXCL

Discard events where:

  • SOURCE: is INPAT or OUTPAT

  • and the HD_ICD_9_EXCL regex matches CODE1 or CODE2

  • and ICDVER: is 9

HD_ICD_8_EXCL

Discard events where:

  • SOURCE: is INPAT or OUTPAT

  • and the HD_ICD_8_EXCL regex matches CODE1 or CODE2

  • and ICDVER: is 8

COD_MAINONLY

Values

  • YES: only look at events with CATEGORY: U or I for the rules of COD_ICD_10, COD_ICD_9, COD_ICD_8, COD_ICD_10_EXCL, COD_ICD_9_EXCL, and COD_ICD_8_EXCL

  • NA: (nothing to filter)

This rule states to look only into the main diagnosis for cause of death events (CATEGORY: U for underlying and I for immediate cause of death, as opposed to contributing cause of death CATEGORY: starts with c).

COD_ICD_10

Consider events where:

  • SOURCE: is DEATH

  • and the COD_ICD_10 regex matches CODE1 or CODE2

  • and the ICDVER: is 10

COD_ICD_9

Consider events where:

  • SOURCE: is DEATH

  • and the COD_ICD_9 regex matches CODE1 or CODE2

  • and the ICDVER: is 9

COD_ICD_8

Consider events where:

  • SOURCE: is DEATH

  • and the COD_ICD_8 regex matches CODE1 or CODE2

  • and the ICDVER: is 8

COD_ICD_10_EXCL

Discard events where:

  • SOURCE: is DEATH

  • and the COD_ICD_10_EXCL regex matches CODE1 or CODE2

  • and ICDVER: is 10

COD_ICD_9_EXCL

Discard events where:

  • SOURCE: is DEATH

  • and the COD_ICD_9_EXCL regex matches CODE1 or CODE2

  • and ICDVER: is 9

COD_ICD_8_EXCL

Discard events where:

  • SOURCE: is DEATH

  • and the COD_ICD_8_EXCL regex matches CODE1 or CODE2

  • and ICDVER: is 8

OPER_NOM

Consider events where:

  • SOURCE: is OPER_IN or OPER_OUT

  • and the OPER_NOM regex matches CODE1

  • and CATEGORY: contains NOM

OPER_HL

Consider events where:

  • SOURCE: is OPER_IN or OPER_OUT

  • and the OPER_HL regex matches CODE1

  • and CATEGORY: contains FHL

OPER_HP1

Consider events where:

  • SOURCE: is OPER_IN or OPER_OUT

  • and the OPER_HP1 regex matches CODE1

  • and CATEGORY: contains HPO

OPER_HP2

Consider events where:

  • SOURCE: is OPER_IN or OPER_OUT

  • and the OPER_HP2 regex matches CODE1

  • and CATEGORY: contains HPN

KELA_REIMB

Consider events where:

  • SOURCE: is REIMB

  • and KELA_REIMB regex matches CODE1

KELA_REIMB_ICD

Consider events where:

  • SOURCE: is REIMB

  • and KELA_REIMB_ICD regex matches CODE2

This rule must be applied by looking for events that match both this rule and the KELA_REIMB rule at the same time.

KELA_ATC_NEEDOTHER

Values

  • NA: 3 events or more of the KELA_ATC rule are needed to attribute the endpoint

  • SINGLE_OK: 1 event or more of the KELA_ATC rule is needed to attribute the endpoint

  • YES: the KELA_ATC rule is not sufficient by itself, another rule must be matching to attribute the endpoint

This rule sets additional requirements on the KELA_ATC rule.
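The three values can be read as a small decision function. This is one possible interpretation, not a reference implementation: the function name and arguments are hypothetical, NA is represented as None, and for YES the sketch assumes that a single KELA_ATC event is enough once another rule has matched (the text above does not state a count for that case).

```python
def kela_atc_counts(n_kela_atc_events, needother, other_rule_matched):
    """Decide whether KELA_ATC events may contribute to attributing the endpoint."""
    if needother == "YES":
        # Not sufficient by itself: another rule must match as well.
        # Assumption: one KELA_ATC event is then enough.
        return n_kela_atc_events >= 1 and other_rule_matched
    if needother == "SINGLE_OK":
        return n_kela_atc_events >= 1
    # NA: 3 events or more are needed.
    return n_kela_atc_events >= 3


print(kela_atc_counts(2, None, False), kela_atc_counts(1, "SINGLE_OK", False))
```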

KELA_ATC

Consider events where:

  • SOURCE: is PURCH

  • and KELA_ATC regex matches CODE1

KELA_VNRO

This rule is not used.

KELA_VNRO_NEEDOTHER

This rule is not used.

CANC_TOPO

Consider events where:

  • SOURCE: is CANC

  • and the CANC_TOPO regex matches CODE1

CANC_TOPO_EXCL

Discard events where:

  • SOURCE: is CANC

  • and the CANC_TOPO_EXCL regex matches CODE1

CANC_MORPH

Consider events where:

  • SOURCE: is CANC

  • and the CANC_MORPH regex matches CODE2

CANC_MORPH_EXCL

Discard events where:

  • SOURCE: is CANC

  • and the CANC_MORPH_EXCL regex matches CODE2

CANC_BEHAV

Consider events where:

  • SOURCE: is CANC

  • and the CANC_BEHAV regex matches CODE3

INCLUDE

Value

  • other endpoint names, separated by |

Attribute the current endpoint to an individual if they have at least one of the endpoints listed in INCLUDE.
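A minimal sketch of the INCLUDE check, assuming the endpoints already attributed to an individual are available as a set of names. The function name and the endpoint names ENDPOINT_A, ENDPOINT_B and ENDPOINT_C are made up for the example, and an empty INCLUDE is assumed to be None or the string NA.

```python
def attributed_via_include(include_value, endpoints_of_individual):
    """True if the individual has at least one of the endpoints listed in INCLUDE."""
    if include_value in (None, "", "NA"):
        return False  # nothing to include
    included = include_value.split("|")
    return any(name in endpoints_of_individual for name in included)


print(attributed_via_include("ENDPOINT_A|ENDPOINT_B", {"ENDPOINT_B", "ENDPOINT_C"}))
```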

PRE_CONDITIONS

Value

  • condition on EVENT_AGE or EVENT_YEAR

  • EMERG: (unused, nothing to do)

  • NA: (nothing to do)

Discard events not matching PRE_CONDITIONS from the list of candidate events.

This rule usually applies a filter on age or year at the event. It filters out some events from the existing list of candidate events.
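The filtering step itself can be sketched as below. The sketch deliberately does not parse the PRE_CONDITIONS syntax from the definition file; instead the condition is passed in as a Python function. The function names, the age threshold and the sample events are hypothetical.

```python
def apply_pre_conditions(candidates, keep_event):
    """Discard candidate events for which the condition does not hold."""
    return [ev for ev in candidates if keep_event(ev)]


def younger_than_18(event):
    # Hypothetical condition on EVENT_AGE, for illustration only.
    return event["EVENT_AGE"] < 18


candidates = [
    {"CODE1": "J450", "EVENT_AGE": 12.4},
    {"CODE1": "J450", "EVENT_AGE": 43.0},
]
remaining = apply_pre_conditions(candidates, younger_than_18)
print(remaining)
```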

CONDITIONS

An individual must fit the CONDITIONS rule to be attributed the endpoint.

SEX

Values

  • 1: only keep males

  • 2: only keep females

  • NA: (nothing to filter, the endpoint is not sex-specific)

This filter should be applied as the last filter.

Extra rules

any-code

When the rule is written as ANY, then the event must have a code for the given rule, but the actual code has no importance.

This rule is useful when matching an event against multiple rules, for example:

  • HD_ICD_10: K250

  • and HD_ICD_10_ATC: ANY

This example requires that an event has the ICD-10 code K250 and, at the same time, any ATC code. The endpoint will match drug-induced events: an ATC code must be present, but its actual value doesn't matter.

match-prefix

The rule must match from the beginning of the code. In regex terms, the rule value is prefixed with a ^ and the modified value is then used as a regex.

For example, a match-prefix rule with a value of I21 matches I2100 but doesn't match AEI21.
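The prefix behaviour can be checked with a few lines of Python. The helper names are hypothetical. Wrapping the rule value in a non-capturing group before adding the ^ is an assumption made here so that the anchor applies to every alternative of a value such as I21|I22, not only to the first one.

```python
import re


def to_prefix_regex(rule_value):
    """Anchor a rule value at the start of the code."""
    # Assumption: group the value so that ^ covers all alternatives.
    return re.compile("^(?:" + rule_value + ")")


def match_prefix(rule_value, code):
    return to_prefix_regex(rule_value).search(code) is not None


print(match_prefix("I21", "I2100"), match_prefix("I21", "AEI21"))
```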

cause-symptom

An ampersand & between two codes indicates a cause-symptom pair (specific to Finnish ICD-10). In that case, both the cause code and the symptom code must be found in the same event.

For example, HD_ICD_10 = M07&L405 will match an event that has both M07 (in CODE1 or CODE2) and L405 (in CODE1 or CODE2).
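A sketch of the cause-symptom check for a single event follows. The function names and the sample events are hypothetical, an empty string stands for a missing code, and the sketch assumes the rule value contains exactly one ampersand.

```python
import re


def matches_cause_symptom(rule_value, event):
    """Both sides of a cause&symptom rule must be found in CODE1 or CODE2."""
    cause_rule, symptom_rule = rule_value.split("&")
    codes = [c for c in (event["CODE1"], event["CODE2"]) if c]

    def found(rule):
        return any(re.match(rule, c) is not None for c in codes)

    return found(cause_rule) and found(symptom_rule)


# Hypothetical events, for illustration only.
pair = {"CODE1": "M073", "CODE2": "L405"}
single = {"CODE1": "M073", "CODE2": ""}

print(matches_cause_symptom("M07&L405", pair), matches_cause_symptom("M07&L405", single))
```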

mode

A rule value starting with a percent sign % indicates a mode rule. The event will be considered only if the code is the most common amongst its sibling ICD codes for an individual.

For example %J450 would match events of an individual only if J450 is the most common code among the codes starting with J45.
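The mode rule can be sketched with a counter over an individual's codes. Several points are assumptions of this sketch and not stated by the rule: siblings are taken to be the codes sharing all but the last character of the rule's code, a tie for the most common code counts as a match, and the function name and input list are hypothetical.

```python
from collections import Counter


def mode_rule_applies(rule_value, codes_of_individual):
    """True if the rule's code is the most common among its sibling codes."""
    code = rule_value.lstrip("%")
    parent = code[:-1]  # assumption: siblings share all but the last character
    siblings = [c for c in codes_of_individual if c.startswith(parent)]
    if not siblings:
        return False
    counts = Counter(siblings)
    # Assumption: a tie for first place still counts as "most common".
    return counts[code] == max(counts.values())


print(mode_rule_applies("%J450", ["J450", "J450", "J459"]))
print(mode_rule_applies("%J450", ["J450", "J459", "J459"]))
```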

canc-all

When an endpoint has multiple cancer rules (from CANC_TOPO, CANC_TOPO_EXCL, CANC_MORPH, CANC_MORPH_EXCL, CANC_BEHAV) then it is not enough to match only one of them: all cancer rules that are defined must be satisfied by the event.
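One way to picture this: every cancer rule defined for the endpoint is tested against the same event, and a single failure rejects the event. In the sketch below, the function name, the dictionary of defined rules and the sample codes are hypothetical, and an exclusion rule is taken to be "satisfied" when its regex does not match, which is an interpretation.

```python
import re

# Which event field each cancer rule looks at, as described under Event Rules.
CANCER_RULE_FIELDS = {
    "CANC_TOPO": "CODE1",
    "CANC_TOPO_EXCL": "CODE1",
    "CANC_MORPH": "CODE2",
    "CANC_MORPH_EXCL": "CODE2",
    "CANC_BEHAV": "CODE3",
}


def matches_cancer_rules(event, defined_rules):
    """True only if the event satisfies every cancer rule that is defined."""
    if event["SOURCE"] != "CANC" or not defined_rules:
        return False
    for name, value in defined_rules.items():
        code = event[CANCER_RULE_FIELDS[name]] or ""
        hit = re.match(value, code) is not None
        wanted = not name.endswith("_EXCL")  # exclusion rules must not match
        if hit != wanted:
            return False
    return True


# Hypothetical cancer event and rule values, for illustration only.
tumour = {"SOURCE": "CANC", "CODE1": "C509", "CODE2": "8500", "CODE3": "3"}
rules = {"CANC_TOPO": "C50", "CANC_BEHAV": "3"}

print(matches_cancer_rules(tumour, rules))
```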

mark-no-code

The mark $!$ is used to state that someone has checked and there is no suitable code for this endpoint in a given registry.

For example, if an endpoint has HD_ICD_9 with a value of $!$, then someone has gone through the whole Finnish ICD-9 and reported that it contains no suitable code for that endpoint.

Gotchas

  • One single event can span multiple rows in the detailed longitudinal data files: events are unique by (FINNGENID, SOURCE, INDEX), but not by row. Rows with the same values for FINNGENID, SOURCE, INDEX must be looked at as one single event when performing look-ups.

  • The ICD-10, ICD-9 and ICD-8 used by FinnGen are specific Finnish versions which differ slightly from the international ones. This means for example that the ICD-10 found in FinnGen data are a bit different from the WHO ICD-10 or the US ICD-10-CM.

  • In the FinnGen data, the ICD-O-3 is used for cancer codes.

  • The dot . and the comma , are not present in the codes in the FinnGen files, e.g. J45.1 would be J451 in the endpoint definition file and the detailed longitudinal file.

  • For rules that are regexes: a dot . means "any character" and not an actual dot.

  • Endpoints with specific control rules are not documented here (yet!)
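The first gotcha above, rows versus events, can be handled by grouping rows on their (FINNGENID, SOURCE, INDEX) key before any look-up. A minimal sketch, assuming rows are dictionaries keyed by column name; the function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_rows_into_events(rows):
    """Group rows sharing (FINNGENID, SOURCE, INDEX) into one event each."""
    events = defaultdict(list)
    for row in rows:
        key = (row["FINNGENID"], row["SOURCE"], row["INDEX"])
        events[key].append(row)
    return dict(events)


# Hypothetical rows: the first two belong to the same event.
rows = [
    {"FINNGENID": "FG1", "SOURCE": "INPAT", "INDEX": "1", "CATEGORY": "0", "CODE1": "I210"},
    {"FINNGENID": "FG1", "SOURCE": "INPAT", "INDEX": "1", "CATEGORY": "1", "CODE1": "E119"},
    {"FINNGENID": "FG1", "SOURCE": "INPAT", "INDEX": "2", "CATEGORY": "0", "CODE1": "J450"},
]

events = group_rows_into_events(rows)
print(len(rows), "rows,", len(events), "events")
```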

Appendix: list of registries

Name in FinnGen data (SOURCE) | Registry description
--- | ---
CANC | Cancer
DEATH | Cause of death
INPAT | HILMO inpatient
OPER_IN | HILMO inpatient (operations)
OUTPAT | HILMO specialist outpatient
OPER_OUT | HILMO specialist outpatient (operations)
PRIM_OUT | AvoHILMO: primary care outpatient
PURCH | Kela drug purchase
REIMB | Kela drug reimbursement

Appendix: coding systems and translations

Glossary

  • Kela: the Social Insurance Institution of Finland

  • HILMO: Finnish care registers for health care
