How-to guides
How to use Athena to find a test by name?
Go to OHDSI Athena using this link: Athena with lab test filters.
Make sure to have lab test filters enabled:
Domain: Measurement
Concept: Standard
Class: Lab test
Vocabulary: LOINC
Use the search box at the top the page to look up a test name, for example: "testosterone"
Find the most relevant OMOP Concept IDs from the search results. The OMOP Concept IDs are in the left-most column
ID
You can now lookup these OMOP Concept IDs in the FinnGen Kanta lab values data.
How to load the data file using R?
Start your Sandbox IVM with the machine type that has 128 GB of memory
Copy the Parquet release file to your home folder.
In the terminal, load R with lmod:
module load tidyverse/4.3.1
Start
R
and then use the pre-installed arrow package to load the data file:library(arrow)
read_parquet("finngen_R12_kanta_lab_1.0.parquet")
How to load the full dataset using Python pandas?
Start your Sandbox IVM with the machine type that has 128 GB of memory.
Copy the Parquet release file to your home folder.
In the terminal, load Python with lmod:
module load fg-python/3.11.7
Start
python
and then use pandas to load the data file:import pandas as pd df = pd.read_parquet("finngen_R12_kanta_lab_1.0.parquet")
How to load the dataset for only one OMOP Concept ID using Python pandas?
The Parquet file format allows for efficient data filtering, and Python pandas can take advantage of this:
Start your Sandbox IVM with a machine type that has at least 16GB of memory.
Copy the Parquet release file to your home folder.
In the terminal, load Python with lmod:
module load fg-python/3.11.7
Start
python
and then use pandas to load the data file:import pandas as pd
df_omop_id = pd.read_parquet("finngen_R12_kanta_lab_1.0.parquet", filters=[("OMOP_CONCEPT_ID", "==", "<your-omop-concept-id>")])
How to access the data using BigQuery SQL?
The Kanta lab values data is available in BigQuery in the table finngen-production-library.sandbox_tools_r12.kanta_r12_v1
.
Here is how to access it using Python, though it also possible using R.
Start your Sandbox IVM.
In the terminal, load Python with lmod:
module load fg-python/3.11.7
Start
python
and use the following code to print the first 10 lines of the data:
You can change the above query to select different columns and add different conditions to extract the data.
How to look up what data is available for a test without using the Sandbox?
FinnGen provides a pulic tool to easily look up a test, without requiring Sandbox access. Results with a low individual count are not shown, for this you need to query the data directly in the Sandbox.
At the top of the page, select the tab "Mapping status per test+unit combination".
Use the filters to look up your test:
If you have the short code name for the test, use the the text field for the "TestCode [unit]" column.
If you have a descriptive name, use the text field for the "Omop Name" column.
Each row in the search results correspond to the mapping of a TestCode and a unit to an OMOP ConceptId. Results are provided with statistics such as: number of events, test outcome percentages, measured value percentiles, percentage of missing values.
How to look up a test in the data from the Sandbox?
As an example, let's take the test ID 23133 from HUSLAB, wich stands for the B -PGx-D
test.
We will use BigQuery with Python in this guide, but it's also possible using BigQuery with R, or using the Parquet file, or using the TSV-gzipped file.
Looking up the test by TEST_ID
TEST_ID
Start your Sandbox IVM.
In the terminal, load Python with lmod:
module load fg-python/3.11.7
Start
python
and setup the BigQuery connection:from google.cloud import bigquery
client = bigquery.Client()
Then make this query to see how many rows and people there is in the data, split on the different
TEST_NAME
s associated with ourTEST_ID
of interest: 23133
This prints a table where we see that there are ~1.2k rows for ~1.2k people with the test name b-pgx-d
and the test ID 23133.
Extending the look up by using the TEST_NAME
from the result
TEST_NAME
from the resultStart your Sandbox IVM.
In the terminal, load Python with lmod:
module load fg-python/3.11.7
Start
python
and setup the BigQuery connection:from google.cloud import bigquery
client = bigquery.Client()
Then make this query to see how many rows and people there are in the data for the
TEST_NAME
s containingb-pgx-d
:
This prints a table showing that they are ~1.2k rows with TEST_NAME = 'b-pgx-d'
and TEST_ID = '23133'
, and ~200 rows with TEST_NAME = 'b-pgx-d'
and TEST_ID = '54976'
.
Last updated