Interpretation of Endpoint Definition file
This document explains how to attribute an endpoint to events in the detailed longitudinal data using the rules from the endpoint definition file (latest version at FinnGen: Clinical Endpoints). Have a look at the list of gotchas at the end of this document for some specificities that are easy to miss at first.
Each endpoint is defined by a set of rules, given as one line in the endpoint definition file. The detailed longitudinal file contains health events (rows in that file) that are looked up against these rules. Each rule adds events to, or removes events from, the list of candidate events. Once all rules have been applied, the remaining candidate events are attributed to the endpoint.
When explaining the rules, the following terms are used:
Endpoint: occurrence of a health event defined by rules that match on the health register data.
Candidate events: list of events that could be attributed to the endpoint. This list grows and shrinks as the endpoint rules are applied.
Consider: add event to the list of candidate events.
Discard: remove event from the list of candidate events.
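The consider/discard vocabulary can be pictured as operations on a set of candidate events. The following is a minimal sketch in Python, not the FinnGen implementation: the event tuples and the two rule functions are hypothetical stand-ins for the real lookups described later.

```python
# Hypothetical events: (FINNGENID, SOURCE, INDEX, CODE1)
events = [
    ("FG001", "INPAT", 1, "I210"),
    ("FG001", "INPAT", 2, "I219"),
    ("FG002", "OUTPAT", 1, "J450"),
]


def inclusion_rule(event):
    # Stand-in for an inclusion lookup: code starts with I21
    return event[3].startswith("I21")


def exclusion_rule(event):
    # Stand-in for an exclusion lookup: code is exactly I219
    return event[3] == "I219"


candidates = set()
for event in events:
    if inclusion_rule(event):
        candidates.add(event)  # consider

for event in list(candidates):
    if exclusion_rule(event):
        candidates.discard(event)  # discard

# Events still in the list once all rules have run are attributed to the endpoint
attributed = sorted(candidates)
```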
Overview of the Endpoint Definition File
The endpoint definition file version 1.3 has the following metadata columns:
| Column name | Explanation |
|---|---|
| | naming: Reference name in the FinnGen endpoint data |
| | naming: Descriptive name |
| | naming: Latin name |
| | categorisation: List of categories the endpoint belongs to |
| | categorisation: Level in the ICD-10 hierarchy |
| | categorisation: Is a core GWAS? (NA: yes, 1 or 2: no) |
| | categorisation: Parent in the ICD-10 hierarchy |
| | changelog: introduced in data freeze |
| | changelog: date of last modification |
| | changelog: author of last modification |
| | changelog: purpose of modification |
| | free text notes |
The rules are defined by the following columns in the endpoint definition file. Further details on each column and each extra rule follow the table.
| Column name | Purpose | Code system | Extra rules |
|---|---|---|---|
| `SEX` | Filter at the FINNGENID level | – | – |
| `INCLUDE` | Use other endpoints to find events | – | – |
| `PRE_CONDITIONS` | Filter at the event level | – | – |
| `CONDITIONS` | Filter at the FINNGENID level | – | – |
| `OUTPAT_ICD` | Inclusion lookup | ICD-10 | |
| `OUTPAT_OPER` | Inclusion lookup | NOMESCO | |
| `HD_MAINONLY` | Diagnosis selection hint | – | – |
| `HD_ICD_10_ATC` | Inclusion lookup | ATC | |
| `HD_ICD_10` | Inclusion lookup | ICD-10 | |
| `HD_ICD_9` | Inclusion lookup | ICD-9 | |
| `HD_ICD_8` | Inclusion lookup | ICD-8 | |
| `HD_ICD_10_EXCL` | Exclusion lookup | ICD-10 | |
| `HD_ICD_9_EXCL` | Exclusion lookup | ICD-9 | |
| `HD_ICD_8_EXCL` | Exclusion lookup | ICD-8 | |
| `COD_MAINONLY` | Diagnosis selection hint | – | – |
| `COD_ICD_10` | Inclusion lookup | ICD-10 | |
| `COD_ICD_9` | Inclusion lookup | ICD-9 | |
| `COD_ICD_8` | Inclusion lookup | ICD-8 | |
| `COD_ICD_10_EXCL` | Exclusion lookup | ICD-10 | |
| `COD_ICD_9_EXCL` | Exclusion lookup | ICD-9 | |
| `COD_ICD_8_EXCL` | Exclusion lookup | ICD-8 | |
| `OPER_NOM` | Inclusion lookup | NOMESCO | |
| `OPER_HL` | Inclusion lookup | Finnish hospital league | |
| `OPER_HP1` | Inclusion lookup | Demanding heart patient, old codes | |
| `OPER_HP2` | Inclusion lookup | Demanding heart patient, new codes | |
| `KELA_REIMB` | Inclusion lookup | KELA reimbursement code | |
| `KELA_REIMB_ICD` | Inclusion lookup | ICD-10, ICD-9 | |
| `KELA_ATC_NEEDOTHER` | Additional requirement hint | – | – |
| `KELA_ATC` | Inclusion lookup | ATC | |
| `KELA_VNRO_NEEDOTHER` | Additional requirement hint | – | – |
| `KELA_VNRO` | Inclusion lookup | VNRO | – |
| `CANC_TOPO` | Inclusion lookup | ICD-O-3 topography | |
| `CANC_TOPO_EXCL` | Exclusion lookup | ICD-O-3 topography | |
| `CANC_MORPH` | Inclusion lookup | ICD-O-3 morphology | |
| `CANC_MORPH_EXCL` | Exclusion lookup | ICD-O-3 morphology | |
| `CANC_BEHAV` | Inclusion lookup | ICD-O-3 behavior | |
Event Rules
OUTPAT_ICD
Consider events where:
- `SOURCE`: is `PRIM_OUT`
- and `CATEGORY`: contains `ICD`
- and `CODE1`: matches the `OUTPAT_ICD` regex
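As an illustration, the three conditions could be checked as follows. This is a sketch under assumptions: the event is represented as a dictionary keyed by column name, the regex is applied from the start of the code (`re.match`), and the `CATEGORY` value `ICD0` in the example is hypothetical.

```python
import re


def matches_outpat_icd(event, outpat_icd):
    """Return True if the event satisfies the OUTPAT_ICD rule."""
    return (
        event["SOURCE"] == "PRIM_OUT"
        and "ICD" in event["CATEGORY"]
        and re.match(outpat_icd, event["CODE1"]) is not None
    )


# Hypothetical event and rule value
event = {"SOURCE": "PRIM_OUT", "CATEGORY": "ICD0", "CODE1": "J459"}
is_candidate = matches_outpat_icd(event, "J45")
```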
OUTPAT_OPER
Consider events where:
- `SOURCE`: is `PRIM_OUT`
- and `CATEGORY`: starts with `OP`
- and the `OUTPAT_OPER` regex matches `CODE1`
HD_MAINONLY
Values
- `YES`: only look at events with `CATEGORY`: `0` for the rules of `HD_ICD_10`, `HD_ICD_9`, `HD_ICD_8`, `HD_ICD_10_EXCL`, `HD_ICD_9_EXCL` and `HD_ICD_8_EXCL`
- `NA`: (nothing to filter)

This rule restricts the lookup to the main diagnosis of hospital discharge events (as opposed to side diagnoses, where `CATEGORY` is not `0`).
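A possible way to apply this hint is to narrow down the events before running the `HD_ICD_*` lookups. The sketch below assumes events are dictionaries and compares `CATEGORY` as text; the function name is hypothetical.

```python
def hd_events_to_check(events, hd_mainonly):
    """Select the events that the HD_ICD_* rules are allowed to look at."""
    if hd_mainonly == "YES":
        # Keep main diagnoses only (CATEGORY is 0)
        return [e for e in events if str(e["CATEGORY"]) == "0"]
    # NA: nothing to filter
    return list(events)


# Hypothetical events: one main diagnosis, one side diagnosis
events = [
    {"SOURCE": "INPAT", "CATEGORY": "0", "CODE1": "I210"},
    {"SOURCE": "INPAT", "CATEGORY": "1", "CODE1": "E119"},
]
main_only = hd_events_to_check(events, "YES")
unfiltered = hd_events_to_check(events, "NA")
```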
HD_ICD_10_ATC
Consider events where:
- `SOURCE`: is `INPAT` or `OUTPAT`
- and the `HD_ICD_10_ATC` regex matches `CODE3`

This rule must be applied by looking for events that match both this rule and the `HD_ICD_10` rule at the same time.
For example, an endpoint definition with `HD_ICD_10` = `E610` and `HD_ICD_10_ATC` = `ANY` will match an event that has:
- `SOURCE`: `INPAT` or `OUTPAT`
- and `ICDVER`: 10
- and `HD_ICD_10` regex matches `CODE1` or `CODE2`
- and any code in `CODE3` (but there must be a code there, it cannot be empty)
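The combined check can be sketched in Python as below. This is an illustration only: events are assumed to be dictionaries, missing codes are assumed to be `None` or an empty string, regexes are applied from the start of the code, and the ATC code in the example is hypothetical.

```python
import re


def matches_hd_icd_10_with_atc(event, hd_icd_10, hd_icd_10_atc):
    """Check HD_ICD_10 and HD_ICD_10_ATC against the same event."""
    if event["SOURCE"] not in ("INPAT", "OUTPAT"):
        return False
    if str(event["ICDVER"]) != "10":
        return False
    icd_ok = any(
        code and re.match(hd_icd_10, code)
        for code in (event["CODE1"], event["CODE2"])
    )
    atc = event["CODE3"]
    if hd_icd_10_atc == "ANY":
        atc_ok = bool(atc)  # any code will do, but it cannot be empty
    else:
        atc_ok = bool(atc) and re.match(hd_icd_10_atc, atc) is not None
    return icd_ok and atc_ok


with_atc = {"SOURCE": "INPAT", "ICDVER": "10",
            "CODE1": "E610", "CODE2": None, "CODE3": "A12CX"}
without_atc = dict(with_atc, CODE3=None)
```

With `HD_ICD_10` = `E610` and `HD_ICD_10_ATC` = `ANY`, the first event matches and the second does not, because its `CODE3` is empty.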
HD_ICD_10
Consider events where:
- `SOURCE`: is `INPAT` or `OUTPAT`
- and the `HD_ICD_10` regex matches `CODE1` or `CODE2`
- and `ICDVER`: is 10
HD_ICD_9
Consider events where:
- `SOURCE`: is `INPAT` or `OUTPAT`
- and the `HD_ICD_9` regex matches `CODE1` or `CODE2`
- and `ICDVER`: is 9
HD_ICD_8
Consider events where:
- `SOURCE`: is `INPAT` or `OUTPAT`
- and the `HD_ICD_8` regex matches `CODE1` or `CODE2`
- and `ICDVER`: is 8
HD_ICD_10_EXCL
Discard events where:
- `SOURCE`: is `INPAT` or `OUTPAT`
- and the `HD_ICD_10_EXCL` regex matches `CODE1` or `CODE2`
- and `ICDVER`: is 10
HD_ICD_9_EXCL
Discard events where:
- `SOURCE`: is `INPAT` or `OUTPAT`
- and the `HD_ICD_9_EXCL` regex matches `CODE1` or `CODE2`
- and `ICDVER`: is 9
HD_ICD_8_EXCL
Discard events where:
- `SOURCE`: is `INPAT` or `OUTPAT`
- and the `HD_ICD_8_EXCL` regex matches `CODE1` or `CODE2`
- and `ICDVER`: is 8
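To show how an inclusion rule and its exclusion counterpart interact, here is a minimal sketch for the ICD-10 pair. It assumes dictionary events and prefix matching; the rule values `I2[0-5]` and `I25` are made up for the example and are not taken from a real endpoint.

```python
import re


def hd_icd_10_candidates(events, incl, excl=None):
    """Consider events matching `incl`, then discard those matching `excl`."""

    def hits(pattern, event):
        return (
            event["SOURCE"] in ("INPAT", "OUTPAT")
            and str(event["ICDVER"]) == "10"
            and any(
                code and re.match(pattern, code)
                for code in (event["CODE1"], event["CODE2"])
            )
        )

    candidates = [e for e in events if hits(incl, e)]  # consider
    if excl:
        candidates = [e for e in candidates if not hits(excl, e)]  # discard
    return candidates


events = [
    {"SOURCE": "INPAT", "ICDVER": "10", "CODE1": "I210", "CODE2": None},
    {"SOURCE": "OUTPAT", "ICDVER": "10", "CODE1": "I251", "CODE2": None},
    {"SOURCE": "INPAT", "ICDVER": "9", "CODE1": "4109A", "CODE2": None},
]
kept = hd_icd_10_candidates(events, incl="I2[0-5]", excl="I25")
```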
COD_MAINONLY
Values
- `YES`: only look at events with `CATEGORY`: `U` or `I` for the rules of `COD_ICD_10`, `COD_ICD_9`, `COD_ICD_8`, `COD_ICD_10_EXCL`, `COD_ICD_9_EXCL` and `COD_ICD_8_EXCL`
- `NA`: (nothing to filter)

This rule restricts the lookup to the main diagnosis of cause of death events (`CATEGORY`: `U` for underlying and `I` for immediate cause of death, as opposed to contributing cause of death, where `CATEGORY` starts with `c`).
COD_ICD_10
Consider events where:
- `SOURCE`: is `DEATH`
- and the `COD_ICD_10` regex matches `CODE1` or `CODE2`
- and `ICDVER`: is 10
COD_ICD_9
Consider events where:
- `SOURCE`: is `DEATH`
- and the `COD_ICD_9` regex matches `CODE1` or `CODE2`
- and `ICDVER`: is 9
COD_ICD_8
Consider events where:
- `SOURCE`: is `DEATH`
- and the `COD_ICD_8` regex matches `CODE1` or `CODE2`
- and `ICDVER`: is 8
COD_ICD_10_EXCL
Discard events where:
- `SOURCE`: is `DEATH`
- and the `COD_ICD_10_EXCL` regex matches `CODE1` or `CODE2`
- and `ICDVER`: is 10
COD_ICD_9_EXCL
Discard events where:
- `SOURCE`: is `DEATH`
- and the `COD_ICD_9_EXCL` regex matches `CODE1` or `CODE2`
- and `ICDVER`: is 9
COD_ICD_8_EXCL
Discard events where:
- `SOURCE`: is `DEATH`
- and the `COD_ICD_8_EXCL` regex matches `CODE1` or `CODE2`
- and `ICDVER`: is 8
OPER_NOM
Consider events where:
- `SOURCE`: is `OPER_IN` or `OPER_OUT`
- and the `OPER_NOM` regex matches `CODE1`
- and `CATEGORY`: contains `NOM`
OPER_HL
Consider events where:
- `SOURCE`: is `OPER_IN` or `OPER_OUT`
- and the `OPER_HL` regex matches `CODE1`
- and `CATEGORY`: contains `FHL`
OPER_HP1
Consider events where:
- `SOURCE`: is `OPER_IN` or `OPER_OUT`
- and the `OPER_HP1` regex matches `CODE1`
- and `CATEGORY`: contains `HPO`
OPER_HP2
Consider events where:
- `SOURCE`: is `OPER_IN` or `OPER_OUT`
- and the `OPER_HP2` regex matches `CODE1`
- and `CATEGORY`: contains `HPN`
KELA_REIMB
Consider events where:
- `SOURCE`: is `REIMB`
- and the `KELA_REIMB` regex matches `CODE1`
KELA_REIMB_ICD
Consider events where:
- `SOURCE`: is `REIMB`
- and the `KELA_REIMB_ICD` regex matches `CODE2`

This rule must be applied by looking for events that match both this rule and the `KELA_REIMB` rule at the same time.
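Checking both reimbursement rules against the same event could look like the sketch below. It assumes dictionary events, prefix matching, and that an endpoint without a `KELA_REIMB_ICD` value passes `None`; the reimbursement and ICD codes in the example are hypothetical.

```python
import re


def matches_kela_reimb(event, kela_reimb, kela_reimb_icd=None):
    """Match KELA_REIMB on CODE1 and, when defined, KELA_REIMB_ICD on CODE2."""
    if event["SOURCE"] != "REIMB":
        return False
    if re.match(kela_reimb, event["CODE1"] or "") is None:
        return False
    if kela_reimb_icd is None:
        return True  # no ICD requirement on this endpoint
    return re.match(kela_reimb_icd, event["CODE2"] or "") is not None


# Hypothetical reimbursement event
event = {"SOURCE": "REIMB", "CODE1": "103", "CODE2": "E10"}
both_match = matches_kela_reimb(event, "103", "E10")
icd_mismatch = matches_kela_reimb(event, "103", "E11")
```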
KELA_ATC_NEEDOTHER
Values
- `NA`: 3 events or more of the `KELA_ATC` rule are needed to attribute the endpoint
- `SINGLE_OK`: 1 event or more of the `KELA_ATC` rule are needed to attribute the endpoint
- `YES`: the `KELA_ATC` rule is not sufficient by itself, another rule must be matching to attribute the endpoint

This rule sets additional requirements on the `KELA_ATC` rule.
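The three values can be read as a threshold on the number of `KELA_ATC` events. The following sketch encodes that reading; the function name is hypothetical, and it only answers whether the `KELA_ATC` events alone are enough, leaving the handling of other rules to the caller.

```python
def kela_atc_alone_is_enough(n_atc_events, needother):
    """Tell whether KELA_ATC events by themselves can attribute the endpoint."""
    if needother in (None, "NA"):
        return n_atc_events >= 3
    if needother == "SINGLE_OK":
        return n_atc_events >= 1
    if needother == "YES":
        return False  # another rule must also match
    raise ValueError(f"unexpected KELA_ATC_NEEDOTHER value: {needother!r}")
```

For instance, two purchase events are not enough under `NA` but are enough under `SINGLE_OK`.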
KELA_ATC
Consider events where:
- `SOURCE`: is `PURCH`
- and the `KELA_ATC` regex matches `CODE1`
KELA_VNRO
This rule is not used.
KELA_VNRO_NEEDOTHER
This rule is not used.
CANC_TOPO
Consider events where:
- `SOURCE`: is `CANC`
- and the `CANC_TOPO` regex matches `CODE1`
CANC_TOPO_EXCL
Discard events where:
- `SOURCE`: is `CANC`
- and the `CANC_TOPO_EXCL` regex matches `CODE1`
CANC_MORPH
Consider events where:
- `SOURCE`: is `CANC`
- and the `CANC_MORPH` regex matches `CODE2`
CANC_MORPH_EXCL
Discard events where:
- `SOURCE`: is `CANC`
- and the `CANC_MORPH_EXCL` regex matches `CODE2`
CANC_BEHAV
Consider events where:
- `SOURCE`: is `CANC`
- and the `CANC_BEHAV` regex matches `CODE3`
INCLUDE
Value
- other endpoint names, separated by `|`

Attribute the current endpoint to an individual if it has at least one of the endpoints in `INCLUDE`.
PRE_CONDITIONS
Values
- condition on `EVENT_AGE` or `EVENT_YEAR`
- `EMERG`: (unused, nothing to do)
- `NA`: (nothing to do)

Discard events not matching `PRE_CONDITIONS` from the list of candidate events.
This rule usually applies a filter on age or year at the event. It filters out some events from the existing list of candidate events.
CONDITIONS
An individual must fit the `CONDITIONS` rule to be attributed the endpoint.
SEX
Values
- `1`: only keep males
- `2`: only keep females
- `NA`: (nothing to filter, the endpoint is not sex-specific)

This filter should be applied as the last filter.
Extra rules
any-code
When the rule is written as `ANY`, the event must have a code for the given rule, but the actual code has no importance.
This rule is useful when matching an event against multiple rules, for example:
- `HD_ICD_10`: `K250`
- and `HD_ICD_10_ATC`: `ANY`

This example requires that an event has any ATC code and at the same time has the ICD-10 code `K250`. The endpoint will match drug-induced events since it requires that there is an ATC code, but the actual ATC code doesn't matter.
match-prefix
The rule must match from the beginning of the code: in regex terms, the rule value has to be prepended with a `^`. This modified rule is then used as a regex.
For example, a match-prefix rule with a value of `I21` matches `I2100` but doesn't match `AEI21`.
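In Python, the prefix behaviour can be sketched as follows. The helper name is hypothetical, and wrapping the rule value in a non-capturing group is an assumption made here so that the `^` applies to every alternative when the value contains `|`.

```python
import re


def match_prefix(rule_value, code):
    """Apply a match-prefix rule: the regex must match from the start of the code."""
    pattern = "^(?:" + rule_value + ")"
    return re.search(pattern, code) is not None


starts_with_rule = match_prefix("I21", "I2100")
rule_in_middle = match_prefix("I21", "AEI21")
```

Here `starts_with_rule` is `True` and `rule_in_middle` is `False`, mirroring the example above.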
cause-symptom
An ampersand `&` between two codes indicates a cause-symptom pair (specific to Finnish ICD-10). In that case, both the cause code and the symptom code must be found in the same event.
For example, `HD_ICD_10` = `M07&L405` will match an event that has both `M07` (in `CODE1` or `CODE2`) and `L405` (in `CODE1` or `CODE2`).
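A cause-symptom rule can be checked by splitting the rule value on `&` and looking for each half in the event codes. This is a sketch under assumptions: the function name is hypothetical, each half is applied as a prefix regex, and a missing code is `None` or empty.

```python
import re


def matches_cause_symptom(rule_value, code1, code2):
    """Require both halves of a cause&symptom rule within one event."""
    cause, symptom = rule_value.split("&")
    codes = [c for c in (code1, code2) if c]
    has_cause = any(re.match(cause, c) for c in codes)
    has_symptom = any(re.match(symptom, c) for c in codes)
    return has_cause and has_symptom


pair_present = matches_cause_symptom("M07&L405", "M073", "L405")
symptom_missing = matches_cause_symptom("M07&L405", "M073", None)
```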
mode
A rule value starting with a percent sign `%` indicates a mode rule. The event will be considered only if the code is the most common amongst its sibling ICD codes for an individual.
For example, `%J450` would match events of an individual only if `J450` is the most common code among the codes starting with `J45`.
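One way to evaluate a mode rule is to count an individual's codes that share the parent prefix. The sketch below makes two assumptions that the text does not settle: siblings are the codes sharing all but the last character of the rule code, and a tie for the most common code counts as a match.

```python
from collections import Counter


def mode_rule_matches(rule_value, individual_codes):
    """Check whether the rule code is the most common among its siblings."""
    code = rule_value.lstrip("%")
    parent = code[:-1]  # e.g. J45 for J450 (assumption)
    siblings = Counter(c for c in individual_codes if c.startswith(parent))
    if not siblings:
        return False
    top_count = max(siblings.values())
    return siblings[code] == top_count  # ties count as a match (assumption)


# Hypothetical codes recorded for one individual
codes = ["J450", "J450", "J451", "I10"]
j450_is_mode = mode_rule_matches("%J450", codes)
j451_is_mode = mode_rule_matches("%J451", codes)
```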
canc-all
When an endpoint has multiple cancer rules (from `CANC_TOPO`, `CANC_TOPO_EXCL`, `CANC_MORPH`, `CANC_MORPH_EXCL`, `CANC_BEHAV`), it is not enough to match only one of them: all cancer rules that are defined must be satisfied by the event.
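The "all defined rules must hold" requirement can be sketched as a loop over the cancer rules. Assumptions: events and rules are dictionaries, undefined rules are absent or `None`, regexes are applied as prefixes, `CANC_BEHAV` is read as matching `CODE3` against its own regex, and the codes in the example are hypothetical.

```python
import re

# (rule name, event column, whether the regex must match)
CANCER_CHECKS = [
    ("CANC_TOPO", "CODE1", True),
    ("CANC_TOPO_EXCL", "CODE1", False),
    ("CANC_MORPH", "CODE2", True),
    ("CANC_MORPH_EXCL", "CODE2", False),
    ("CANC_BEHAV", "CODE3", True),
]


def matches_cancer_rules(event, rules):
    """Return True only if every defined cancer rule is satisfied."""
    if event["SOURCE"] != "CANC":
        return False
    for rule_name, column, must_match in CANCER_CHECKS:
        pattern = rules.get(rule_name)
        if not pattern:
            continue  # rule not defined for this endpoint
        hit = re.match(pattern, event[column] or "") is not None
        if hit != must_match:
            return False
    return True


event = {"SOURCE": "CANC", "CODE1": "C509", "CODE2": "8500", "CODE3": "3"}
rules = {"CANC_TOPO": "C50", "CANC_MORPH_EXCL": "8520", "CANC_BEHAV": "3"}
all_satisfied = matches_cancer_rules(event, rules)
```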
mark-no-code
The mark `$!$` is used to state that someone has checked and there is no suitable code for this endpoint in a given registry.
For example, if an endpoint has `HD_ICD_9` with a value of `$!$`, it means someone has gone through the whole Finnish ICD-9 and reported that it contains no code that can be used for this endpoint.
Gotchas
- One single event can span multiple rows in the detailed longitudinal data files: events are unique by (`FINNGENID`, `SOURCE`, `INDEX`), but not by row. Rows with the same values for `FINNGENID`, `SOURCE`, `INDEX` must be looked at as one single event when performing look-ups.
- The ICD-10, ICD-9 and ICD-8 used by FinnGen are specific Finnish versions which differ slightly from the international ones. This means, for example, that the ICD-10 codes found in FinnGen data are a bit different from the WHO ICD-10 or the US ICD-10-CM.
- In the FinnGen data, the ICD-O-3 is used for cancer codes.
- The dot `.` and the comma `,` are not present in the codes in the FinnGen files, e.g. `J45.1` would be `J451` in the endpoint definition file and the detailed longitudinal file.
- For rules that are regexes: a dot `.` means "any character" and not an actual dot.
- Endpoints with specific control rules are not documented here (yet!)
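Two of these gotchas lend themselves to a short illustration: grouping rows into events, and writing codes without punctuation. The sketch below assumes rows are dictionaries keyed by column name; both helper names are hypothetical.

```python
from collections import defaultdict


def group_rows_into_events(rows):
    """Group rows sharing (FINNGENID, SOURCE, INDEX) into one event."""
    events = defaultdict(list)
    for row in rows:
        key = (row["FINNGENID"], row["SOURCE"], row["INDEX"])
        events[key].append(row)
    return dict(events)


def strip_punctuation(code):
    """Write a code the way it appears in the FinnGen files (no dot, no comma)."""
    return code.replace(".", "").replace(",", "")


# Hypothetical rows: the first two belong to the same event
rows = [
    {"FINNGENID": "FG001", "SOURCE": "INPAT", "INDEX": 7, "CODE1": "I210"},
    {"FINNGENID": "FG001", "SOURCE": "INPAT", "INDEX": 7, "CODE1": "E119"},
    {"FINNGENID": "FG001", "SOURCE": "INPAT", "INDEX": 8, "CODE1": "J451"},
]
events = group_rows_into_events(rows)
plain_code = strip_punctuation("J45.1")
```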
Appendix: list of registries
| Name in FinnGen data (`SOURCE`) | Registry description |
|---|---|
| `CANC` | Cancer |
| `DEATH` | Cause of death |
| `INPAT` | HILMO inpatient |
| `OPER_IN` | HILMO inpatient (operations) |
| `OUTPAT` | HILMO specialist outpatient |
| `OPER_OUT` | HILMO specialist outpatient (operations) |
| `PRIM_OUT` | AvoHILMO: primary care outpatient |
| `PURCH` | Kela drug purchase |
| `REIMB` | Kela drug reimbursement |
Appendix: coding systems and translations
Where to find the translation file for phenotype data, documentation from the FinnGen Handbook
Glossary
Kela: the Social Insurance Institution of Finland
HILMO: Finnish care registers for health care