BigQuery Connection - Python

This coding example contains instructions on how to connect to BigQuery with the Python programming language in the Sandbox. It also gives examples of:

different ways of displaying the results from the query execution
how to configure a job to run as a query
how to list datasets and tables in a project

Location of the script and readme in the Sandbox

/finngen/library-green/scripts/code_snippets/

Run the whole and script and prints out results of multiple queries

python3 /finngen/library-green/scripts/code_snippets/bigquery_python.py

You can copy paste from below explanation or take the code directly from the file itself.

Connecting to BigQuery using Python package

# Import system packages
import os, sys
# Import bigquery
from google.cloud import bigquery
# Import pandas package
import pandas as pd

# Connection to BigQuery
client = bigquery.Client()

Once the connection is established you can list the datasets

# As example, list datasets in "finngen-production-library" project
client = bigquery.Client(project = 'finngen-production-library')

# list_datasets() function will return all datasets in the project
datasets = list(client.list_datasets())
# Print the datasets in the project
if datasets: # Proceed if there are any datasets
	print('Datasets in the project {}'.format(client.project))
	for dataset in datasets:
		print('{}'.format(dataset.dataset_id))
else:
	print('No datasets found in the project {}'.format(client.project))

Similarly, you can list tables in a particular dataset or all datasets.

# list_tables() function will return all tables in the dataset
# As example, list tables in dataset "sandbox_tools_r10" in "finngen-production-library" project
dataset = 'finngen-production-library.sandbox_tools_r10'
tables = list(client.list_tables(dataset))
# Print the tables in the dataset
if tables: # Proceed if there are any tables
	print('Tables in the dataset {}'.format(dataset))
	for table in tables:
		print('{}'.format(table.table_id))
else:
	print('No tables found in the dataset {}'.format(dataset))

Running a simple query to select FINNGENID, SOURCE, APPROX_EVENT_DAY from the table finngen_r10_service_sector_detailed_longitudinal in the dataset sandbox_tools_r10 within the project finngen-production-library. Only select 10 rows from the table.

# Connection to BigQuery
client = bigquery.Client()

query = """ SELECT FINNGENID, SOURCE, APPROX_EVENT_DAY
            FROM `finngen-production-library.sandbox_tools_r10.finngen_r10_service_sector_detailed_longitudinal_v1`
	    LIMIT 10;
     	"""
# To execute the query, you need job configuration where you can pass job details
job_config = bigquery.QueryJobConfig()
# Execute the query
query_result = client.query(query,job_config=job_config)

You can print the results of the query in list format.

# Print the results in list format
#  Print he column names
print('{}'.format(list(list(query_result)[0].keys())))
#  Print the 10 rows 
for row in query_result:
	print(list(row))

You can also print the results in dataframe format.

# Save the query result into a dataframe and print the dataframe.
# We need pandas package to make this work and we have loaded the package
columnNames = list(list(query_result)[0].keys())
temporaryList = []
for row in query_result:
	temporaryList.append(list(row))
# Convert the temporaryList to dataframe using pandas package
query_result_dataframe = pd. DataFrame(temporaryList, columns = columnNames)
# Print the dataframe
query_result_dataframe

More complex queries can be found in the script file. Also, you can use anaconda environment to connect to BigQuery in the Sandbox. Details on how to use anaconda python in Sandbox are here.

PreviousCase study – defining simple cohorts using medical codes for running case-control GWAS NextBigQuery Python - Downstream analysis - Active Ingredient - Bar plot

Last updated 6 months ago