VNR code mapping to RxNorm
Details of the VNR codes in FinnGen R6 mapped to standard RxNorm
The VNR codes are Nordic country-specific codes known as the Nordic Article Number. The VNR codes are 6-digit codes ranging from 000001-199999 and 370000-599999. They are assigned to all human medicines, veterinary medicines, herbal medicines, and traditional herbal medicines. Numbers outside this range are called National Article Numbers which are used differently depending on the country.
RxNorm, on the other hand, is a US-specific terminology that provides normalized names for medications and allows linking to many drug vocabularies commonly used in the US market.
We have mapped the Nordic country-specific VNR codes to RxNorm in FinnGen R6. Although the initial mapping was performed in R6 and is located in library-green in the finngen_R6 folder, the mapping can be used with any Data Freeze/Release.
The mapping and readme are located at:
/finngen/library-green/finngen_R6/finngen_R6_medical_codes/fgVNR.tsv
/finngen/library-green/finngen_R6/finngen_R6_medical_codes/fgVNR_readme.txt
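As a minimal sketch of reading the mapping file, the snippet below uses only the Python standard library. It assumes that fgVNR.tsv is tab-separated, UTF-8 encoded, and has a header row with the column names; none of this is confirmed here, so check the readme first. `read_fgvnr` is a hypothetical helper name, and the in-memory sample stands in for the real file.

```python
import csv
import io

# Path as given above; only reachable inside the FinnGen environment.
FGVNR_PATH = "/finngen/library-green/finngen_R6/finngen_R6_medical_codes/fgVNR.tsv"


def read_fgvnr(handle):
    """Yield one dict per row of a tab-separated fgVNR table (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Zero-pad non-missing codes to the six-digit form (518 -> "000518").
        if row["VNR"]:
            row["VNR"] = row["VNR"].zfill(6)
        yield row


# Tiny in-memory stand-in for the real file, using documented example values.
sample = io.StringIO("VNR\tATC\tMedicineName\n518\tN05AH04\tSEROQUEL\n")
rows = list(read_fgvnr(sample))
print(rows[0]["VNR"], rows[0]["ATC"], rows[0]["MedicineName"])

# Against the real file:
# with open(FGVNR_PATH, newline="", encoding="utf-8") as fh:
#     rows = list(read_fgvnr(fh))
```

Zero-padding is a convenience for display and joins on six-digit strings; skip it if the VNR column should be kept as an integer.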
How the VNR code to RxNorm mapping was done:
1. VNR codes originating from different sources within FinnGen were combined into a single table called 'OriginalVNR'.
2. Additional information on VNR codes with missing drug name, strength, or ingredient information was requested from the Pharmaceutical Information Centre (Lääketietokeskus). This information is stored in the table 'ltklVNR'.
3. Missing ingredient information for VNR codes was filled in using ATC codes.
4. Administration routes, dosage forms and units were created for codes in both the 'OriginalVNR' and 'ltklVNR' tables.
5. We processed the source text of Package, Substance, Substance strength, Administration Route and Dosage Form for both the 'OriginalVNR' and 'ltklVNR' tables.
6. We used the OHDSI DrugMapping tool to map the parsed VNR code information to standard RxNorm.
More details on the steps up to step 5 can be found in the GitHub repository.
The columns of fgVNR.tsv are described in the table below.

| Column | Type | Description | Example |
|---|---|---|---|
| VNR | INT64 | Six-digit VNR code. | 518 |
| ATC | STRING | ATC group code. | N05AH04 |
| MedicineName | STRING | Commercial name. | SEROQUEL |
| AdministrationRouteSourceTextFI | STRING | Administration route in text format as in source. | Suun kautta |
| AdministrationRoute | STRING | Valid value for administration route. | Oral use |
| DosageFormSourceTextFI | STRING | Dosage form in text format as in source. | tabletti, kalvopäällysteinen |
| DosageForm | STRING | Valid value for dosage form. | film-coated tablet |
| PackageSourceTextFI | STRING | Package info in text format as in source. | 10 FOL |
| PackageSize | FLOAT64 | Size of package in float format. | 10 |
| PackageFactor | INT64 | Factor of package in float format. | 1 |
| PackageUnit | STRING | A valid unit value. | fol |
| SubstanceSourceTextFI | STRING | List of substances as in source. | quetiapine |
| Substance | STRING | Substance name, one row per substance. | quetiapine |
| SubstanceStrengthTextFI | STRING | Substance's strength in text format as in source. | 25+100+200 mg |
| Strength | STRING | Mapped, fixed, or split substance strength; otherwise the source strength is used. | 100 mg |
| SubstanceStrengthNumenatorValue | FLOAT64 | Substance's strength value in the numerator in float format. | 100 |
| SubstanceStrengthNumenatorUnit | STRING | A valid unit value. | mg |
| SubstanceStrengthDeominatorValue | FLOAT64 | Substance's strength value in the denominator in float format. | 1 |
| SubstanceStrengthDeominatorUnit | STRING | A valid unit value. | 1 |
| ValidRange | BOOL | True if the VNR is in the valid range (less than 200000 or between 370000 and 599999). | TRUE |
| Source | STRING | The table from which the code was taken. | "ltklVNR" or "originalVNR" |
| Status | STRING | How well the medicine has been processed. | incomplete_dosageForm |
| VNRnew | STRING | A temporary VNR code created for drugs with a single substance and multiple strength values. The temporary VNR code has the letter a, b, or c attached to the end. | 000518a |
| calculateTotalStrength_message | STRING | How well the strength has been processed. | "correct" or "missmatch" |
| TotalStrength | FLOAT64 | Total strength of the drug, which is PackageSize * PackageFactor * Dosage. | 10 * 1 * 100 = 1000 |
| TotalStrengthUnit | STRING | Valid unit of the total strength. | mg |
| n_codes | INT64 | Frequency of the VNR code. | 260 |
| Dosage | FLOAT64 | SubstanceStrengthNumenatorValue / SubstanceStrengthDeominatorValue. | 100/1 = 100 |
| DosageUnit | STRING | A valid unit value. | mg |
| MedicineNameFull | STRING | Commercial name, dosage form and substance strength. | SEROQUEL 25+100+200 mg |
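The derived columns described above (ValidRange, Dosage and TotalStrength) can be sketched as follows. This only illustrates the formulas given in the column descriptions; it is not the code that produced fgVNR.tsv. The function names are hypothetical, and the lower bound of 1 in the range check is taken from the 000001-199999 range stated earlier.

```python
def in_valid_range(vnr: int) -> bool:
    """ValidRange: VNR lies in 000001-199999 or 370000-599999."""
    return 1 <= vnr <= 199999 or 370000 <= vnr <= 599999


def dosage(numerator_value: float, denominator_value: float) -> float:
    """Dosage: strength numerator value divided by strength denominator value."""
    return numerator_value / denominator_value


def total_strength(package_size: float, package_factor: int, dosage_value: float) -> float:
    """TotalStrength: PackageSize * PackageFactor * Dosage."""
    return package_size * package_factor * dosage_value


# Example values from the table: a pack of 10 tablets, factor 1, 100 mg per tablet.
example_dosage = dosage(100, 1)
example_total = total_strength(10, 1, example_dosage)
print(in_valid_range(518), example_dosage, example_total)
```

With the example row, the result matches the documented arithmetic of 10 * 1 * 100 = 1000 mg.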
The fgVNR.tsv file was used as the input for OHDSI drugmapping tool. The tool requires VNR code with substance information followed by dosage form and drug strength. If no substance information is present then there will be no mapping.
The DrugMapping tool requires a Common Data Model (CDM) database with vocabulary data. To create the CDM database, we:
Extracted the CDM database schema from the OHDSI Common Data Model, version 5.3.2.
Created the CDM database schema on a PostgreSQL server, version 14.2.
Adjusted the CDM v5.3.2 SQL files generated from the OHDSI Common Data Model, because the PostgreSQL server version is newer than 9.
Downloaded the vocabulary data from Athena: the default vocabulary list plus additional vocabularies for "Dosage Form".
Uploaded the vocabulary data from Athena to the PostgreSQL CDM v5.3.2 database.
Once the CDM database was up and running on PostgreSQL, we started setting up the DrugMapping tool.
The table below shows the total number of input drugs and how many of them carry the information needed for a possible Clinical Drug mapping.

| Input | Value |
|---|---|
| Total Drugs | 15,928 |
| Drugs with non-missing VNR codes | 15,902 |
| Unique VNR codes | 14,655 |
| Drugs with non-missing VNR codes + Ingredient Codes | 13,876 |
| Drugs with non-missing VNR codes + Ingredient Codes + dosage form | 12,897 |
| Drugs with non-missing VNR codes + Ingredient Codes + dosage form + dosage value | 12,644 |
| Drugs with non-missing VNR codes + Ingredient Codes + dosage form + dosage + dosage unit | 12,638 |
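Counts of this kind can be reproduced with a simple cumulative filter over the input rows. The sketch below is an illustration on toy data, not the script used for the numbers above. In particular, the choice of columns (`VNR`, `Substance`, `DosageForm`, `Dosage`, `DosageUnit`) is an assumption about which fgVNR.tsv fields correspond to "Ingredient Codes", "dosage form", "dosage value" and "dosage unit".

```python
def funnel_counts(drugs, required):
    """Count rows that have all of the first k required columns filled, for each k."""
    counts = {"Total Drugs": len(drugs)}
    remaining = drugs
    for i, column in enumerate(required):
        # Keep only rows where this column is neither missing nor empty.
        remaining = [d for d in remaining if d.get(column) not in (None, "")]
        counts[" + ".join(required[: i + 1])] = len(remaining)
    return counts


# Assumed column order, from least to most demanding.
REQUIRED = ["VNR", "Substance", "DosageForm", "Dosage", "DosageUnit"]

# Toy rows: one complete, one without unit, one without substance, one without VNR.
toy_drugs = [
    {"VNR": "000518", "Substance": "quetiapine", "DosageForm": "film-coated tablet",
     "Dosage": 100.0, "DosageUnit": "mg"},
    {"VNR": "000519", "Substance": "quetiapine", "DosageForm": "film-coated tablet",
     "Dosage": 25.0, "DosageUnit": ""},
    {"VNR": "000520", "Substance": "", "DosageForm": "capsule",
     "Dosage": 10.0, "DosageUnit": "mg"},
    {"VNR": "", "Substance": "quetiapine", "DosageForm": "capsule",
     "Dosage": 10.0, "DosageUnit": "mg"},
]

counts = funnel_counts(toy_drugs, REQUIRED)
for label, n in counts.items():
    print(f"{label}: {n}")
```

Because each filter is applied on top of the previous one, the counts can only stay equal or decrease from one row to the next, as in the table above.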
After the input file was fixed, the DrugMapping tool created three intermediary files:
Ingredient Name Translation File
Unit Mapping File
Dose Form Mapping File
All three intermediary files need to be filled in carefully:
Ingredient Name Translation File - the simplest to fill in, using the input file
Unit Mapping File
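Source units had to be mapped to standard units such as 'mg' and 'mL'. Example:

| SourceUnit | DrugCount | RecordCount | Factor | TargetUnit | Comment |
|---|---|---|---|---|---|
| % | 303 | 1109084 | 0.01 | mg/mg | |
| IU | 309 | 400416 | 1 | [U] | |
| U | 25 | 360 | 1 | [U] | |
| g | 160 | 1614186 | 1000 | mg | |
| g/l | 8 | 1112 | 1 | mg/mL | |
| mg | 20905 | 68649900 | 1 | mg | |
| mg/days | 111 | 54182 | 0,289 | mg/h | |
| mg/h | 5 | 455 | 1 | mg/h | |
| milli.IU | 73 | 478234 | 1000000 | [U] | |
| ml | 5 | 4885 | 1 | mL | |
| ug/puffs | 6 | 212536 | 0.001 | mg/{actuat} | |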
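To illustrate how a row of the unit mapping file can be read, the sketch below treats Factor as a multiplier that converts a value in the source unit into the target unit. This reading is an assumption, although it is consistent with rows such as g to mg (1000) and ug/puffs to mg/{actuat} (0.001). The dictionary holds a few rows copied from the example above, and `to_target_unit` is a hypothetical helper, not part of the DrugMapping tool.

```python
# SourceUnit -> (Factor, TargetUnit), copied from a few rows of the example table.
UNIT_MAP = {
    "%": (0.01, "mg/mg"),
    "g": (1000.0, "mg"),
    "g/l": (1.0, "mg/mL"),
    "mg": (1.0, "mg"),
    "ml": (1.0, "mL"),
    "ug/puffs": (0.001, "mg/{actuat}"),
}


def to_target_unit(value: float, source_unit: str):
    """Convert a value to its mapped target unit; raises KeyError for unmapped units."""
    factor, target_unit = UNIT_MAP[source_unit]
    return value * factor, target_unit


grams = to_target_unit(0.5, "g")       # half a gram expressed in mg
puffs = to_target_unit(50, "ug/puffs")  # 50 micrograms per puff in mg per actuation
print(grams, puffs)
```

Raising on unmapped units is a deliberate choice in this sketch, so that a missing row in the mapping file is noticed rather than silently passed through.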
Dose Form Mapping File
First, extract all dose forms in the domain "Drug" with concept_class "Dose Form" from all vocabularies in the CDM database.
Second, extract the relationships with a "relationship_id" of "Source - RxNorm eq" from the CONCEPT_RELATIONSHIP table for all non-standard "Dose Form" concepts from the additional vocabularies.
Match the cells in "DoseForm" to the extracted standard dose forms. This matched only 49 of the 147 dose forms.
We manually filled in 85 more dose forms, leaving only 13 low-frequency dose forms unmapped. An example of a filled dose form file is shown below.

| DoseForm | DrugCount | Priority | concept_id | concept_name | Comments |
|---|---|---|---|---|---|
| BASIC CREAM | 11 | | 19082224 | Topical Cream | |
| BATH ADDITIVE | 1 | | 19082228 | Topical Solution | |
| BODY LOTION | 1 | | | | |
| CAPSULE | 279 | 0 | 19082168 | Oral Capsule | Standard |
| CAPSULE | 279 | 1 | 19021887 | Capsule | Non-Standard |
| CAPSULE, HARD | 664 | | 19082168 | Oral Capsule | |
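The two extraction steps and the name matching can be sketched with an in-memory SQLite database that mimics the relevant columns of the CDM CONCEPT and CONCEPT_RELATIONSHIP tables. This is an illustration on toy rows, not the queries we ran against PostgreSQL. The concept with id 900001 and the vocabulary name "ExampleVocab" are hypothetical, and the vocabulary_id values assigned to the real concept ids are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY, concept_name TEXT, domain_id TEXT,
    vocabulary_id TEXT, concept_class_id TEXT, standard_concept TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT
);
""")
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?, ?)",
    [
        # vocabulary_id values below are illustrative assumptions.
        (19082168, "Oral Capsule", "Drug", "RxNorm", "Dose Form", "S"),
        (19082224, "Topical Cream", "Drug", "RxNorm", "Dose Form", "S"),
        # Hypothetical non-standard dose form from an additional vocabulary.
        (900001, "Capsule, hard", "Drug", "ExampleVocab", "Dose Form", None),
    ],
)
conn.execute(
    "INSERT INTO concept_relationship VALUES (?, ?, ?)",
    (900001, 19082168, "Source - RxNorm eq"),
)

# Step 1: standard dose forms in domain "Drug" with concept class "Dose Form".
standard = dict(conn.execute("""
    SELECT UPPER(concept_name), concept_id FROM concept
    WHERE domain_id = 'Drug' AND concept_class_id = 'Dose Form'
      AND standard_concept = 'S'
"""))

# Step 2: non-standard dose forms linked to a standard one via "Source - RxNorm eq".
equivalents = dict(conn.execute("""
    SELECT UPPER(src.concept_name), tgt.concept_id
    FROM concept_relationship cr
    JOIN concept src ON src.concept_id = cr.concept_id_1
    JOIN concept tgt ON tgt.concept_id = cr.concept_id_2
    WHERE cr.relationship_id = 'Source - RxNorm eq'
      AND src.concept_class_id = 'Dose Form'
      AND src.standard_concept IS NULL
"""))

# Step 3: match source dose form text by name; standard names take precedence.
lookup = {**equivalents, **standard}


def match_dose_form(dose_form: str):
    """Return a standard concept_id for a source dose form, or None if unmatched."""
    return lookup.get(dose_form.upper())


for name in ["CAPSULE, HARD", "ORAL CAPSULE", "BASIC CREAM"]:
    print(name, match_dose_form(name))
```

Dose forms that return None here, such as BASIC CREAM, are the ones that need to be filled in manually in the Dose Form Mapping File.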
The results of the DrugMapping tool after carefully filling in all three files are shown below.
Of the 12,638 drugs for which a Clinical Drug mapping was possible, 12,089 (95.7%) were mapped.

| Result | Source drugs |
|---|---|
| Source drugs mapped to Clinical Drug | 12089 of 14692 (82.283%) |
| Source drugs mapped to Clinical Drug Form | 562 of 14692 (3.825%) |
| Source drugs mapped to Clinical Drug Comp | 354 of 14692 (2.409%) |
| Source drugs mapped to Ingredient | 588 of 14692 (4.002%) |
| Source drugs mapped Splitted | 74 of 14692 (0.504%) |
| Source drugs mapped Splitted Incomplete | 3 of 14692 (0.02%) |
| Source drugs mapped Total | 13670 of 14692 (93.044%) |
| Source drugs mapped to None | 1022 of 14692 (6.956%) |
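As a quick consistency check, the tallies reported above can be recomputed from the per-level counts. The sketch below only re-derives the totals and percentages from the numbers in this section; the variable names are ours.

```python
TOTAL_SOURCE_DRUGS = 14692

# Per-level counts as reported by the DrugMapping tool.
mapped = {
    "Clinical Drug": 12089,
    "Clinical Drug Form": 562,
    "Clinical Drug Comp": 354,
    "Ingredient": 588,
    "Splitted": 74,
    "Splitted Incomplete": 3,
}
unmapped = 1022

mapped_total = sum(mapped.values())

# The mapped and unmapped drugs should account for every source drug.
assert mapped_total + unmapped == TOTAL_SOURCE_DRUGS

for level, n in mapped.items():
    print(f"{level}: {n} of {TOTAL_SOURCE_DRUGS} ({100 * n / TOTAL_SOURCE_DRUGS:.3f}%)")
print(f"Total: {mapped_total} of {TOTAL_SOURCE_DRUGS} "
      f"({100 * mapped_total / TOTAL_SOURCE_DRUGS:.3f}%)")
```

The per-level counts sum to the reported mapped total, and together with the unmapped drugs they cover all 14692 source drugs.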