BigQuery Connection – R
This is a short tutorial on how to use BigQuery SQL to access the FinnGen register data in the sandbox using R. Relational databases have many benefits over flat text files. For the end-user, the main benefit is the quick and optimized execution of complex and large data wrangling tasks using a simple language, the SQL. We can process large amounts of data without need for large amounts of memory or computing power as the server-side backend will do all the heavy-lifting. In FinnGen, reading the full detailed longitudinal data from a text file takes about half an hour, and requires the virtual machine with the largest memory capacity. In contrast, if you use BigQuery, you can filter and re-structure your data with SQL, and your R program receives swiftly just the data it needs, and the memory requirement depends on your data subset.
Initalizing BigQuery connection
Suppose you'd like to know how many rows there are in the longitudinal data table. The following simple R program will output a single row, single column data frame containing the number of rows.
To see which tables are available in BigQuery use the following.
More examples and detailed tutorial script's location in the Sandbox
Below is a path to the R Markdown file in Sandbox. This R Markdown file can be converted to an html page using "knit" command. The R Markdown is a plain text file, so you can copy code snippets that you need.
/finngen/library-green/scripts/code_snippets/BigQuery_Templates_release.Rmd
Last updated