BigQuery Connection – R

This is a short tutorial on how to use BigQuery SQL to access the FinnGen register data in the sandbox using R. Relational databases have many benefits over flat text files. For the end-user, the main benefit is the quick and optimized execution of complex and large data wrangling tasks using a simple language, the SQL. We can process large amounts of data without need for large amounts of memory or computing power as the server-side backend will do all the heavy-lifting. In FinnGen, reading the full detailed longitudinal data from a text file takes about half an hour, and requires the virtual machine with the largest memory capacity. In contrast, if you use BigQuery, you can filter and re-structure your data with SQL, and your R program receives swiftly just the data it needs, and the memory requirement depends on your data subset.

Initalizing BigQuery connection

Suppose you'd like to know how many rows there are in the longitudinal data table. The following simple R program will output a single row, single column data frame containing the number of rows.

library(bigrquery)

# projectid is your sandbox name (check the URL in your web browser)
# please see gargle and bigrquery documentation for the next two settings
projectid <- "fg-production-sandbox-4"
options(gargle_oauth_cache = FALSE)
bq_auth(scopes = "https://www.googleapis.com/auth/bigquery")

# SQL query as a multiline character string
sql <- paste0(
    "SELECT COUNT(*) ",
    "FROM finngen-production-library.",
    "sandbox_tools_r10.service_sector_detailed_longitudinal_v2"
)

# execute the query, download the result
tb <- bq_project_query(projectid, sql)
df <- bq_table_download(tb)

print(df)

To see which tables are available in BigQuery use the following.

sql<-"
SELECT table_schema, column_name, data_type
FROM finngen-production-library.sandbox_tools_r10.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'finngen_r10_minimum_v1'
LIMIT 50
"

tb <- bq_project_query(projectid, sql)
df <- bq_table_download(tb)

df 

More examples and detailed tutorial script's location in the Sandbox

Below is a path to the R Markdown file in Sandbox. This R Markdown file can be converted to an html page using "knit" command. The R Markdown is a plain text file, so you can copy code snippets that you need.

/finngen/library-green/scripts/code_snippets/BigQuery_Templates_release.Rmd

Last updated