BigQuery Python - Case Study - Sex different - Tornado plot

In this we detail a scenario for how you plot sex differences distributed across different age bins within a FinnGen endpoint.

Location of the script

/finngen/library-green/scripts/code_snippets/codeSnippet_sexDifference_endpoint.py

You can copy-paste from the below explanation or take the code directly from the file itself.

From the samples in the F5_ALZHDEMENT endpoint, it would be interesting to see what the age range of male and female samples looks like. The first thing is to extract the sex and age information and then proceed to plot them.

You can get the sex information of the FINNGENIDs in the column SEX from the table r10_cov_v1_fid in the dataset sandbox_tools_r10 in the project finngen-production-library. You can extract the information using the below query

# Import packages
import os, sys
from google.cloud import bigquery
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Connect to client
client = bigquery.Client()

# Run the query
query = """ WITH endpointSubSet AS(
                  SELECT FINNGENID
                  FROM `finngen-production-library.sandbox_tools_r10.endpoint_cohorts_r10_v1`
                  WHERE ENDPOINT = 'F5_ALZHDEMENT'
            )
            SELECT ESS.FINNGENID AS FINNGENID,
                   DL.EVENT_AGE AS EVENT_AGE,
                   COV.sex AS SEX
            FROM endpointSubSet AS ESS
            JOIN `finngen-production-library.sandbox_tools_r10.finngen_r10_service_sector_detailed_longitudinal_v1` as DL
            ON ESS.FINNGENID = DL.FINNGENID
            JOIN `finngen-production-library.sandbox_tools_r10.covariates_r10_v1` as COV
            ON ESS.FINNGENID = COV.fid
            WHERE DL.CODE1 LIKE '%F00%' # ICD code of Alzheimer's Dementia endpoint
        """
# Job configuration
job_config = bigquery.QueryJobConfig()
# Run the query
query_result = client.query(query,job_config=job_config)

Save the results to a data frame

columnNames = list(list(query_result)[0].keys())
temporaryList = []
for row in query_result:
      temporaryList.append(list(row))
# Convert the temporaryList to dataframe using pandas package
query_result_dataframe = pd.DataFrame(temporaryList, columns = columnNames)

Before plotting, we need to create age bins for male and female

# Create Age bins and add the age bins to respective EVENT_AGE
ageBins = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
query_result_dataframe['AgeBin'] = pd.cut(query_result_dataframe['EVENT_AGE'],bins=ageBins)

# We need to count male and female population in the respective age bins
ageBinSexCount = query_result_dataframe.pivot_table(['EVENT_AGE'],index=['AgeBin'], columns='SEX',aggfunc=len, fill_value=0)
# Change the column names to more readable format
ageBinSexCount.columns = ['_'.join(str(s).strip() for s in col if s) for col in ageBinSexCount.columns]
ageBinSexCount.reset_index(inplace=True)
ageBinSexCount = ageBinSexCount.rename(columns={'EVENT_AGE_female':'Female','EVENT_AGE_male':'Male'})

Plot the sex difference in different age bins

# The Male counts will be on the negative side of x-axis so have to change the values to negative
ageBinSexCount['Male'] =  ageBinSexCount['Male'] / -1
# Plot parameters you can change
large = 22; med = 16; small = 12
params = {'axes.titlesize': large,
       'legend.fontsize': med,
       'figure.figsize': (16, 10),
       'axes.labelsize': med,
       'axes.titlesize': med,
       'xtick.labelsize': med,
       'ytick.labelsize': med,
       'figure.titlesize': large}
plt.rcParams.update(params)
# Plot background style. You can use different styles
plt.style.use('seaborn-whitegrid')
sns.set_style("white")
# This is just for Python Jupyter notebook where you want the plot to be inline. You can ignore this for now
%matplotlib inline
# The age bins will be changed to decreasing order as old age will be on top and young age will be bottom
ages = list(reversed(ageBinSexCount['AgeBin'].tolist()))
# Initiate the plot
plt.rcParams["figure.figsize"] = (15, 8)
ax1 = sns.barplot(x='Male', y='AgeBin', data=ageBinSexCount, order=ages, palette="Blues")
ax2 = sns.barplot(x='Female', y='AgeBin', data=ageBinSexCount, order=ages, palette="Greens")
plt.title("Sex distribution in age bins of ALZHDEMENT endpoint")
plt.xticks(ticks=[-10000, -5000, -2500, -1000, 0, 1000, 2500, 5000, 10000],
    labels = ['10k', '5k', '2.5k', '1k', '0', '1k', '2.5k', '5k', '10k']
   )
plt.grid()
plt.xlabel("Male/Female")
# You can also save the plot
plt.savefig('/home/ivm/sexdistribution_pyramid.png')

Last updated