You can copy paste from below explanation or take the code directly from the file itself.
Patient timeline in FinnGen from starting event recorded to the last/latest event.
The query is very simple as to extract whole information of a patient from the detailed longitudinal table. However, the detailed longitudinal table contains ICD, drug, reimbursement and other codes but no information what that code means. To get the proper mapping of what the overall theme of a each event means, we developed a code mapper SQL script with details here. The same SQL script is also exposed as JAVASCRIPT function codeProcess within Sandbox. Also, the selection of a FINNGENID is randomized just for this case study purpose.
#
import os, sys
from google.cloud import bigquery
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
# Connect to client
client = bigquery.Client()
# Query
query = """ WITH temp AS (
SELECT ssdl.*,
# The function sandbox.codeProcess returns processed CODE1, CODE2 and CODE3 along with vocabulary_id
sandbox.codeProcess(TO_JSON_STRING(ssdl), 'CODE1_CODE2', 'MORPO_BEH_TOPO', 'REIMB', 'ATC', 5, 5, 5, 7, 5).*
FROM `finngen-production-library.sandbox_tools_r10.finngen_r10_service_sector_detailed_longitudinal_v1` AS ssdl
# Here we are randomly selecting a FinnGen patient FINNGENID. You can replace this with a particular FINNGENID
WHERE FINNGENID = (
SELECT DISTINCT FINNGENID
FROM `finngen-production-library.sandbox_tools_r10.finngen_r10_service_sector_detailed_longitudinal_v1`
WHERE RAND() < 10/(SELECT COUNT(*) FROM `finngen-production-library.sandbox_tools_r10.finngen_r10_service_sector_detailed_longitudinal_v1`)
LIMIT 1
)
# The results of patient are ordered by the date
ORDER BY ssdl.APPROX_EVENT_DAY, ssdl.SOURCE
)
# The processed codes are mapped to English names
SELECT t.*,fgc.concept_class_id,fgc.name_en,fgc.name_fi
FROM temp AS t
LEFT JOIN `finngen-production-library.medical_codes.fg_codes_info_v1` as fgc
ON t.vocabulary_id = fgc.vocabulary_id AND
t.FG_CODE1 IS NOT DISTINCT FROM fgc.FG_CODE1 AND
t.FG_CODE2 IS NOT DISTINCT FROM fgc.FG_CODE2 AND
t.FG_CODE3 IS NOT DISTINCT FROM fgc.FG_CODE3
"""
# Job configuration
job_config = bigquery.QueryJobConfig()
# Run the queries
query_result = client.query(query,job_config=job_config)
Save the query result to dataframe
columnNames = list(list(query_result)[0].keys())
temporaryList = []
for row in query_result:
temporaryList.append(list(row))
# Convert the temporaryList to dataframe using pandas package
query_result_dataframe = pd. DataFrame(temporaryList, columns = columnNames)
# Change the data type of date column to datetime format
query_result_dataframe['APPROX_EVENT_DAY'] = pd.to_datetime(query_result_dataframe['APPROX_EVENT_DAY'])
Use the patient information stored in dataframe to plot a patient timeline