Case study – All register data for a person

There are two variations of this example. The first example gets the data for a list of FINNGENIDs and adds English descriptions of codes to data by using FinnGenUtilsR package, and the second example reads the list of FINNGENIDs from a text file having a column titled FINNGENID (as one of the columns).

Get all the data for a list of FINNGENID's

In this version, the full data is fetched for a given list of FINNGENID's (which may be a single id as well).

# helper function
# returns a data frame with all the data for given FINNGENID's

get_all_data <- function(...){  
  finngenids <- list(...)  
  sql<-paste0(  
    "SELECT * ",  
    "FROM finngen-production-library.",
    ".sandbox_tools_r10.", 
    "finngen_r10_service_sector_detailed_longitudinal_v2 ",  
    "WHERE FINNGENID IN (", paste0("'", finngenids, "'", collapse = ","), ")"   
  )
  tb <- bq_project_query(projectid, sql)  
}  

# give a list of FINNGENIDs here (these are fake ID's)
tb <- get_all_data('FG00001', 'FG00002', 'FG00003')

# extend with code translation by using FinnGenUtilsR package
df <- FinnGenUtilsR::fg_bq_append_code_info_to_longitudinal_data(
  projectid, tb,
  "finngen-production-library.medical_codes.fg_codes_info_v1"
)

df_with_codes <- bq_table_download(df) %>% 
  arrange(EVENT_AGE) %>% 
  mutate(AGE = round(EVENT_AGE)) %>% 
  select(FINNGENID, AGE, SOURCE, name_en, CODE1, CODE2, CODE3)

# # for a small data set we can use an interactive table
# DT::datatable(df_with_codes)

# this works even with large data sets
df_with_codes

Get all the data for a file of FINNGENID's

The second variation reads the FINNGENID's from a file which is more convenient if you have many of them. FINNGENIDs are given as a tab-separated text file (.tsv) where there must be a column named "FINNGENID". This can be a file generated by other FinnGen tools, e.g. Cohort Operations.

# create an example data file with fake FINNGENID's
finngenids <- tribble(
  ~FINNGENID, ~OTHER_COLUMN, ~YET_ANOTHER_COLUMN,
  'FG00002', 1, 2,
  'FG00003', 3, 4,
  'FG00004', 5, 6
)
write_tsv(finngenids, "FINNGENIDS.TSV")

get_all_data <- function(...){  
  finngenids <- read_tsv(filename, show_col_types = FALSE) %>% pull(FINNGENID)  
  sql<-paste0(  
    "SELECT * ",  
    "FROM finngen-production-library.",
    ".sandbox_tools_r10.", 
    "finngen_r10_service_sector_detailed_longitudinal_v2 ",  
    "WHERE FINNGENID IN (", paste0("'", finngenids, "'", collapse = ","), ")"   
  )
  tb <- bq_project_query(projectid, sql)  
} 
# give FINNGENIDs here
tb <- get_all_data("FINNGENIDS.TSV")

df <- FinnGenUtilsR::fg_bq_append_code_info_to_longitudinal_data(
  projectid, tb,
  "finngen-production-library.medical_codes.fg_codes_info_v1"
)

df_with_codes <- bq_table_download(df) %>% 
  arrange(EVENT_AGE) %>% 
  mutate(AGE = round(EVENT_AGE)) %>% 
  select(FINNGENID, AGE, SOURCE, name_en, CODE1, CODE2, CODE3)

# # for a small data set we can use an interactive table
# DT::datatable(df_with_codes)

# this works with larger data sets
df_with_codes

Location of the script's in the Sandbox

Below is a path to the R Markdown file in Sandbox. This R Markdown file can be converted to an html page using "knit" command. The R Markdown is a plain text file, so you can copy code snippets that you need.

/finngen/library-green/scripts/code_snippets/BigQuery_Templates_release.Rmd

Last updated