Can I select only the columns needed for my analysis to import into RStudio?

FinnGen data is available not only as flat files but also through the BigQuery database, in two formats:

  • the same format as the longitudinal data file

  • the OHDSI Common Data Model format; this format also has the advantage that other side data (births, height, weight, smoking, kidney dialysis) is mapped into it

Accessing the data via BigQuery will be much faster and use less memory than reading the flat file. If you want to continue with the flat file, the following tips may help.

FinnGen files take up a lot of memory, so understanding what you are importing beforehand will save you time and hassle. You can select the columns you need for your analysis (e.g. in the Sandbox Terminal Emulator) and save them in your working directory using a command like:

zcat /path/yourFileName.txt.gz | cut -f 1,3,10-50,80 | gzip > selectedColumns.txt.gz

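This command selects columns 1, 3, 10 through 50, and 80 from "yourFileName.txt.gz" and writes them to a new gzipped file, "selectedColumns.txt.gz", in the current directory. Note that cut splits on tabs by default, so this works as written only for tab-delimited files.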
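To choose the column numbers to pass to cut, it can help to list the header names with their positions first. The following is a minimal sketch, assuming a tab-delimited file whose first line is a header. The demo file, its column names (ID, AGE, CODE), and the temporary directory are made up for illustration and are not real FinnGen data; in practice you would point the commands at your own file.

```shell
# Build a tiny gzipped, tab-delimited demo file (hypothetical columns, not FinnGen data)
workdir=$(mktemp -d)
printf 'ID\tAGE\tCODE\nP1\t34.5\tI10\nP2\t61.2\tE11\n' | gzip > "$workdir/demo.txt.gz"

# Read only the header line, put each column name on its own line,
# and number the lines so each name appears next to its column position
header_listing=$(zcat "$workdir/demo.txt.gz" | head -n 1 | tr '\t' '\n' | cat -n)
echo "$header_listing"

# Use the positions found above with cut (here columns 1 and 3)
zcat "$workdir/demo.txt.gz" | cut -f 1,3 | gzip > "$workdir/selectedColumns.txt.gz"

# Check the result before importing it into RStudio
selected=$(zcat "$workdir/selectedColumns.txt.gz")
echo "$selected"
```

Because only the first line is read for the header listing, this step stays quick even on large files.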
