VNR code mapping to RxNorm
Details of VNR codes in FinnGen R6 mapped to standard RxNorm
The VNR codes are Nordic country-specific codes known as Nordic Article Numbers. They are 6-digit codes in the ranges 000001-199999 and 370000-599999, and are assigned to all human medicines, veterinary medicines, herbal medicines, and traditional herbal medicines. Numbers outside these ranges are called National Article Numbers, which are used differently depending on the country.
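The range rule above can be written as a small check. This is a minimal sketch rather than the code used to build the mapping; the function name `is_valid_vnr` is hypothetical, and it assumes a code may arrive either as an integer or as a zero-padded string.

```python
def is_valid_vnr(code) -> bool:
    """Return True if `code` lies in one of the two Nordic Article Number ranges."""
    n = int(code)  # accepts 518, "518" or "000518"
    return 1 <= n <= 199_999 or 370_000 <= n <= 599_999


print(is_valid_vnr("000518"))  # inside the first range
print(is_valid_vnr(250000))    # between the two ranges
```

Codes for which this returns False would fall under the National Article Numbers described above.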
RxNorm, on the other hand, is a US-specific terminology that provides normalized names for medications and allows linking to many drug vocabularies commonly used in the US market.
We have mapped the Nordic country-specific VNR codes to RxNorm in FinnGen R6. Although the initial mapping was performed in R6 and is located in library-green in the finngen_R6 folder, the mapping can be used with any Data Freeze/Release.
The mapping and readme are located at:
/finngen/library-green/finngen_R6/finngen_R6_medical_codes/fgVNR.tsv
/finngen/library-green/finngen_R6/finngen_R6_medical_codes/fgVNR_readme.txt
How the VNR code to RxNorm mapping was done:
1. VNR codes originating from different sources within FinnGen were combined into a single table called 'OriginalVNR'.
2. Additional information on VNR codes with missing drug name, strength, or ingredient information was requested from the Pharmaceutical Information Centre (Lääketietokeskus). This information is stored in the table 'ltklVNR'.
3. Missing ingredient information for VNR codes was filled in using ATC codes.
4. Administration routes, dosage forms and units were created for codes in both the 'OriginalVNR' and 'ltklVNR' tables.
5. We processed the source text of Package, Substance, Substance strength, Administration Route and Dosage Form for both the 'OriginalVNR' and 'ltklVNR' tables.
6. We used the OHDSI DrugMapping tool to map the parsed VNR code information to standard RxNorm.

More details on the steps up to step 5 can be found in the GitHub repository.
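As an illustration of the kind of source-text processing described in step 5, the sketch below splits a strength string such as `25+100+200 mg` into one value per strength. It is not the parser from the repository: the function name `split_strength`, the regular expression, and the handling of decimal commas are all assumptions made for this example.

```python
import re

# One or more numbers joined by "+", followed by a unit that does not start
# with a digit or separator. Decimal commas are accepted (an assumption).
_STRENGTH_RE = re.compile(
    r"^\s*(\d+(?:[.,]\d+)?(?:\s*\+\s*\d+(?:[.,]\d+)?)*)\s*([^\d\s.,+].*?)\s*$"
)


def split_strength(text):
    """Return a list of (value, unit) pairs, or [] if the text does not parse."""
    match = _STRENGTH_RE.match(text)
    if match is None:
        return []
    values, unit = match.groups()
    return [(float(v.strip().replace(",", ".")), unit) for v in values.split("+")]


print(split_strength("25+100+200 mg"))
# [(25.0, 'mg'), (100.0, 'mg'), (200.0, 'mg')]
```

Each pair would correspond to one row of a multi-strength drug; how the real pipeline assigns the temporary codes to those rows is not shown here.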
The columns of fgVNR.tsv are described in the table below.
Column Name | Column Type | Description | Example |
---|---|---|---|
VNR | INT64 | Six-digit VNR code (stored as an integer, so leading zeros are not shown). | 518 |
ATC | STRING | ATC group code | N05AH04 |
MedicineName | STRING | Commercial name. | SEROQUEL |
AdministrationRouteSourceTextFI | STRING | Administration route in text format as in source. | Suun kautta |
AdministrationRoute | STRING | Valid value for administration route. | Oral use |
DosageFormSourceTextFI | STRING | Dosage form in text format as in source. | tabletti, kalvopäällysteinen |
DosageForm | STRING | Valid value for dosage form. | film-coated tablet |
PackageSourceTextFI | STRING | Package info in text format as in source. | 10 FOL |
PackageSize | FLOAT64 | Size of package in float format. | 10 |
PackageFactor | INT64 | Factor of package in integer format. | 1 |
PackageUnit | STRING | A valid unit value. | fol |
SubstanceSourceTextFI | STRING | List of substances as in source. | quetiapine |
Substance | STRING | Substance name; one row per substance. | quetiapine |
SubstanceStrengthTextFI | STRING | Substance's strength in text format as in source. | 25+100+200 mg |
Strength | STRING | Mapped, fixed, or split substance strength; otherwise the source strength is used. | 100 mg |
SubstanceStrengthNumenatorValue | FLOAT64 | Substance's strength value in numerator in float format. | 100 |
SubstanceStrengthNumenatorUnit | STRING | A valid unit value. | mg |
SubstanceStrengthDeominatorValue | FLOAT64 | Substance's strength value in denominator in float format. | 1 |
SubstanceStrengthDeominatorUnit | STRING | A valid unit value. | 1 |
ValidRange | BOOL | True if the VNR is in the valid range (less than 200000, or between 370000 and 599999). | TRUE |
Source | STRING | From which table the code was taken. | ltklVNR or originalVNR |
Status | STRING | How well the medicine has been processed. | incomplete_dosageForm |
VNRnew | STRING | A temporary VNR code created for drugs with a single substance and multiple strength values. The temporary VNR code has the letter a, b, or c appended. | 000518a |
calculateTotalStrength_message | STRING | How well the strength has been processed. | correct or missmatch |
TotalStrength | FLOAT64 | Total strength of the drug, calculated as PackageSize * PackageFactor * Dosage. | 10 * 1 * 100 = 1000 |
TotalStrengthUnit | STRING | Total Strength valid unit | mg |
n_codes | INT64 | Frequency of the VNR code | 260 |
Dosage | FLOAT64 | SubstanceStrengthNumenatorValue/SubstanceStrengthDeominatorValue | 100/1 = 100 |
DosageUnit | STRING | A valid unit value | mg |
MedicineNameFull | STRING | Commercial name, dosage form and substance strength. | SEROQUEL 25+100+200 mg |
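
The Dosage and TotalStrength formulas in the table can be recomputed from the other columns, for example to sanity-check a row. The sketch below reads a one-row, in-memory TSV built from the example values in the table; the helper name `add_strength_columns` and the sample data are assumptions for illustration, and the real file contains many more columns.

```python
import csv
import io

# Toy one-row TSV using the example values from the column table.
SAMPLE_TSV = (
    "VNR\tPackageSize\tPackageFactor\t"
    "SubstanceStrengthNumenatorValue\tSubstanceStrengthDeominatorValue\n"
    "518\t10\t1\t100\t1\n"
)


def add_strength_columns(row):
    """Recompute Dosage and TotalStrength as defined in the column table."""
    dosage = float(row["SubstanceStrengthNumenatorValue"]) / float(
        row["SubstanceStrengthDeominatorValue"]
    )
    total = float(row["PackageSize"]) * float(row["PackageFactor"]) * dosage
    return {**row, "Dosage": dosage, "TotalStrength": total}


reader = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
rows = [add_strength_columns(r) for r in reader]
print(rows[0]["Dosage"], rows[0]["TotalStrength"])  # 100.0 1000.0
```

To run the same check on the real file, the `io.StringIO` object would be replaced by an open handle to fgVNR.tsv.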
The fgVNR.tsv file was used as the input for the OHDSI DrugMapping tool. The tool requires a VNR code with substance information, followed by dosage form and drug strength. If no substance information is present, no mapping is produced.
The DrugMapping tool requires a Common Data Model (CDM) database with vocabulary data. To create the CDM, we:
1. Extracted the CDM database schema from the OHDSI common data model for version 5.3.2 of the CDM.
2. Created the CDM database schema in a PostgreSQL server, version 14.2.
3. Modified the CDM v5.3.2 SQL files generated from the OHDSI common data model, because the PostgreSQL server version is newer than 9.
4. Downloaded the vocabulary data from Athena: the default vocabulary list plus additional vocabularies for "Dosage Form".
5. Uploaded the vocabulary data from Athena to the PostgreSQL CDM v5.3.2 database.

Once the CDM database was up and running in PostgreSQL, we set up the DrugMapping tool.
The table below shows the total number of input drugs and how many of them carry the information needed for a possible Clinical Drug mapping.

Input | Value |
---|---|
Total Drugs | 15,928 |
Drugs with non-missing VNR Codes | 15,902 |
Unique VNR codes | 14,655 |
Drugs with non-missing VNR codes + Ingredient Codes | 13,876 |
Drugs with non-missing VNR codes + Ingredient Codes + dosage form | 12,897 |
Drugs with non-missing VNR codes + Ingredient Codes + dosage form + dosage value | 12,644 |
Drugs with non-missing VNR codes + Ingredient Codes + dosage form + dosage + dosage unit | 12,638 |
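
Counts like these can be produced by filtering the input rows cumulatively, one required field at a time. The sketch below is only an illustration of that idea: the function name `completeness_funnel` is hypothetical, it assumes the ingredient corresponds to the `Substance` column of fgVNR.tsv, and the three toy rows are made up.

```python
# Columns checked in order; each stage keeps only rows that passed all earlier ones.
STAGES = ["VNR", "Substance", "DosageForm", "Dosage", "DosageUnit"]


def completeness_funnel(rows):
    """Return [(column, rows still complete after requiring that column), ...]."""
    counts = []
    remaining = list(rows)
    for column in STAGES:
        remaining = [r for r in remaining if r.get(column) not in (None, "")]
        counts.append((column, len(remaining)))
    return counts


# Made-up rows: one complete, one without a dosage form, one without a substance.
toy_rows = [
    {"VNR": "518", "Substance": "quetiapine", "DosageForm": "film-coated tablet",
     "Dosage": "100", "DosageUnit": "mg"},
    {"VNR": "519", "Substance": "quetiapine", "DosageForm": "",
     "Dosage": "25", "DosageUnit": "mg"},
    {"VNR": "520", "Substance": "", "DosageForm": "capsule",
     "Dosage": "", "DosageUnit": ""},
]
for column, n in completeness_funnel(toy_rows):
    print(column, n)
```

Only rows that survive the last stage carry everything the tool needs for a Clinical Drug mapping.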
After we fixed the input file, the DrugMapping tool created three intermediary files:
- Ingredient Name Translation File
- Unit Mapping File
- Dose Form Mapping File

All three intermediary files need to be filled in carefully.
Ingredient Name Translation File
This is the simplest file to fill in, using the input file.
Unit Mapping File
Source units had to be mapped to standard units such as 'mg' and 'mL'. Example:
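SourceUnit | DrugCount | RecordCount | Factor | TargetUnit | Comment |
---|---|---|---|---|---|
% | 303 | 1109084 | 0.01 | mg/mg | |
IU | 309 | 400416 | 1 | [U] | |
U | 25 | 360 | 1 | [U] | |
g | 160 | 1614186 | 1000 | mg | |
g/l | 8 | 1112 | 1 | mg/mL | |
mg | 20905 | 68649900 | 1 | mg | |
mg/days | 111 | 54182 | 0,289 | mg/h | |
mg/h | 5 | 455 | 1 | mg/h | |
milli.IU | 73 | 478234 | 1000000 | [U] | |
ml | 5 | 4885 | 1 | mL | |
ug/puffs | 6 | 212536 | 0.001 | mg/{actuat} | |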
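
To make the role of the Factor column concrete, the sketch below applies a few of the rows above to a source value. It assumes that a source value is multiplied by Factor to obtain the value in TargetUnit, which is consistent with rows such as g to mg (1000) but is not confirmed here against the tool's documentation; the names `UNIT_MAP` and `to_target_unit` are hypothetical.

```python
# A few rows from the unit mapping example: source unit -> (factor, target unit).
UNIT_MAP = {
    "%": (0.01, "mg/mg"),
    "g": (1000, "mg"),
    "ml": (1, "mL"),
    "ug/puffs": (0.001, "mg/{actuat}"),
}


def to_target_unit(value, source_unit):
    """Convert `value` from `source_unit` to its mapped target unit.

    Assumes target value = source value * factor. Raises KeyError for
    units that have not been mapped yet.
    """
    factor, target_unit = UNIT_MAP[source_unit]
    return value * factor, target_unit


print(to_target_unit(0.5, "g"))  # (500.0, 'mg')
```

Raising on an unmapped unit, rather than passing the value through, makes gaps in the mapping file visible early.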
Dose Form Mapping File
1. Extract all dose forms in the domain "Drug" with concept class "Dose Form" from all the vocabularies in the CDM database.
2. Extract the relationships with relationship_id "Source - RxNorm eq" from the CONCEPT_RELATIONSHIP table for all non-standard "Dose Form" concepts from the additional vocabularies.
3. Match the cells in "DoseForm" to the extracted standard dose forms. Only 49 of the 147 dose forms could be matched this way.
4. Fill in the remaining dose forms manually. We filled in 85 dose forms by hand, leaving only 13 low-frequency dose forms unmapped. An example of the filled dose form file is shown below.
DoseForm | DrugCount | Priority | concept_id | concept_name | Comments |
---|---|---|---|---|---|
BASIC CREAM | 11 | | 19082224 | Topical Cream | |
BATH ADDITIVE | 1 | | 19082228 | Topical Solution | |
BODY LOTION | 1 | | | | |
CAPSULE | 279 | 0 | 19082168 | Oral Capsule | Standard |
CAPSULE | 279 | 1 | 19021887 | Capsule | Non-Standard |
CAPSULE, HARD | 664 | | 19082168 | Oral Capsule | |
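
The two extraction steps for the dose form file can be sketched as queries against the CDM vocabulary tables. The example below runs them on a toy in-memory SQLite database so that it is self-contained; the real work was done against PostgreSQL. The toy concepts, their IDs, the vocabulary name `ToyVocab`, and the helper names are made up, and the query assumes that "Source - RxNorm eq" points from the source concept (concept_id_1) to the RxNorm concept (concept_id_2).

```python
import sqlite3


def build_toy_cdm():
    """Create an in-memory database with a cut-down CONCEPT and CONCEPT_RELATIONSHIP."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE concept (
            concept_id INTEGER PRIMARY KEY, concept_name TEXT, domain_id TEXT,
            vocabulary_id TEXT, concept_class_id TEXT, standard_concept TEXT);
        CREATE TABLE concept_relationship (
            concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT);
    """)
    con.executemany("INSERT INTO concept VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "Toy standard dose form", "Drug", "RxNorm", "Dose Form", "S"),
        (2, "Toy source dose form", "Drug", "ToyVocab", "Dose Form", None),
        (3, "Toy ingredient", "Drug", "RxNorm", "Ingredient", "S"),
    ])
    con.execute("INSERT INTO concept_relationship VALUES (2, 1, 'Source - RxNorm eq')")
    return con


# Step 1: every dose form concept in the Drug domain, across all vocabularies.
DOSE_FORMS_SQL = """
    SELECT concept_id, concept_name, vocabulary_id, standard_concept
    FROM concept
    WHERE domain_id = 'Drug' AND concept_class_id = 'Dose Form'
    ORDER BY concept_id
"""

# Step 2: RxNorm equivalents of the non-standard dose forms.
EQUIVALENTS_SQL = """
    SELECT src.concept_id, src.concept_name, rx.concept_id, rx.concept_name
    FROM concept AS src
    JOIN concept_relationship AS cr ON cr.concept_id_1 = src.concept_id
    JOIN concept AS rx ON rx.concept_id = cr.concept_id_2
    WHERE src.concept_class_id = 'Dose Form'
      AND src.standard_concept IS NULL
      AND cr.relationship_id = 'Source - RxNorm eq'
    ORDER BY src.concept_id
"""


def dose_forms(con):
    return con.execute(DOSE_FORMS_SQL).fetchall()


def rxnorm_equivalents(con):
    return con.execute(EQUIVALENTS_SQL).fetchall()


con = build_toy_cdm()
print(dose_forms(con))
print(rxnorm_equivalents(con))
```

The result of the second query gives candidate standard concepts for filling in the concept_id and concept_name columns of the dose form file.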
The result of the DrugMapping tool, after all three files were carefully filled in, is shown below.
Of the 12,638 drugs for which a mapping was possible, 12,089 (95.6%) were mapped to a Clinical Drug.
Result | Value |
---|---|
Source drugs mapped to Clinical Drug | 12089 of 14692 (82.283%) |
Source drugs mapped to Clinical Drug Form | 562 of 14692 (3.825%) |
Source drugs mapped to Clinical Drug Comp | 354 of 14692 (2.409%) |
Source drugs mapped to Ingredient | 588 of 14692 (4.002%) |
Source drugs mapped Splitted | 74 of 14692 (0.504%) |
Source drugs mapped Splitted Incomplete | 3 of 14692 (0.02%) |
Source drugs mapped Total | 13670 of 14692 (93.044%) |
Source drugs mapped to None | 1022 of 14692 (6.956%) |
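
The totals and percentages above follow directly from the per-level counts, and the short sketch below reproduces the arithmetic. The counts are taken from the results; the variable names are chosen only for this example.

```python
# Per-level counts reported by the DrugMapping tool.
mapped_counts = {
    "Clinical Drug": 12089,
    "Clinical Drug Form": 562,
    "Clinical Drug Comp": 354,
    "Ingredient": 588,
    "Splitted": 74,
    "Splitted Incomplete": 3,
}
total_source_drugs = 14692

mapped_total = sum(mapped_counts.values())
unmapped = total_source_drugs - mapped_total

for level, count in mapped_counts.items():
    print(f"{level}: {count} of {total_source_drugs} ({100 * count / total_source_drugs:.3f}%)")
print(f"Total: {mapped_total}, None: {unmapped}")
```

The six levels add up to the reported total of 13670 mapped source drugs, leaving 1022 unmapped.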