Managing memory in Sandbox and data filtering tips
Last updated
Choosing the right machine size for your task saves costs, because Sandbox billing also depends on the size of the machine used.
The 'Basic Machine' (1 vCPU, 3.75 GB) is sufficient for standard use such as navigating the Sandbox, building cohorts in Atlas, and starting pipelines. Loading phenotype data in R needs a lot of memory and requires the 'Rather Big Machine' (16 vCPUs, 104 GB).
Saving data to your home disk /home/ivm/ in Sandbox consumes space on the home disk, whose size does not depend on the IVM size. See Checking the space in home disk and Resizing the home disk.
It is possible to consume more memory than your IVM has. When memory runs out, the IVM becomes very slow or gets stuck. If your IVM is unresponsive, you can force it to shut down, after which you can continue working normally by creating a new IVM (from the ‘Start machine’ button, see figure above).
To force the IVM to shut down, check whether the start button in the left sidebar is available and click it. If the start button is not available, contact humgen-servicedesk@helsinki.fi. An admin at the service desk can force your IVM to shut down. After the IVM is terminated, you can continue working normally by creating a new IVM. Note! Forcing the IVM to shut down will cause loss of all unsaved data. In the worst case, it may corrupt your persistent disk, causing loss of all data in your /home/ivm folder.
To plan memory usage, you can check how much memory there is in your IVM. Open Terminal Emulator and type
free -m
To check memory and CPU usage per process, type in Terminal
top
Press q to exit.
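As a quick check before loading a large file, the output of free -m can be condensed into a single line. The following is only a sketch: the helper name mem_summary is made up for illustration and is not part of the Sandbox tooling, and it assumes the layout printed by recent versions of free, where "available" is the seventh field of the Mem: row.

```shell
#!/usr/bin/env bash
# mem_summary: hypothetical helper that reads `free -m` style output on stdin
# and prints total, available, and percent available memory.
# Assumption: "available" is the 7th field of the "Mem:" row (procps-ng layout).
mem_summary() {
  awk '/^Mem:/ {
    printf "total=%d MiB available=%d MiB (%.0f%% available)\n", $2, $7, 100 * $7 / $2
  }'
}

# Usage on the live system, if `free` is installed:
command -v free >/dev/null 2>&1 && free -m | mem_summary
```

If the available share is small compared with the size of the file you plan to load, consider prefiltering the data first or starting a larger machine.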
Reading big files such as phenotype, genotype, or longitudinal data into RStudio consumes a lot of memory and requires the ‘Rather Big Machine’ (16 vCPUs, 104 GB). You can check how much memory the RStudio session is currently using, and how much is left, from the memory usage widget in the RStudio Environment pane. Here, for example, the RStudio session is currently using 232 MiB. For a detailed report of memory usage, click the small triangle to open a drop-down menu and select "Memory Usage Report". Here the current session is using 43% of the memory while 57% of the memory is free.
Filtering data with Unix commands consumes considerably less memory than filtering data with R. For example, filtering in RStudio requires loading the data (e.g. the detailed longitudinal data) into RStudio, and consequently the ‘Rather Big Machine’. In contrast, the same filtering can be done on the ‘Basic Machine’ using Terminal. After the data is prefiltered in Terminal, it can be loaded into R/RStudio for further analyses, possibly on the Basic Machine.
For example, to filter for J45 (the ICD10 code for asthma) with a Linux command in Terminal
zcat path/to/finngen_R8_detailed_longitudinal.txt.gz | grep J45 > my_result_file.txt
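The filtered file, containing all rows with the text “J45”, is written to the directory where you ran the command (/home/ivm if you ran it there). Note that grep keeps only matching rows, so the header row with the column names is not included in the result file. You can then load the result file into R/RStudio and continue the analysis there. To load the pre-filtered table in R/RStudio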
library(data.table)  # provides fread()
library(R.utils)     # lets fread() read compressed (.gz) files
my_result_file = fread("/home/ivm/my_result_file.txt", data.table = FALSE)
NB! If you filter at the command line, check the code set afterwards in R. For example, F29 means psychosis in ICD10 but eye discomfort in ICPC2, so a simple text filter like the one above returns rows from both code sets, and you need to verify in R that each row comes from the intended one.
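The code set can also be restricted at the command line by matching on specific columns instead of the whole row. The sketch below runs on a toy file created for the purpose: the column names ID, SOURCE and CODE and the helper name filter_code are placeholders, not the real column names of the longitudinal data, so check the header of your own file before adapting it. The helper also keeps the header row, which a plain grep would drop.

```shell
#!/usr/bin/env bash
# Toy data standing in for a longitudinal file. The column names ID, SOURCE
# and CODE are placeholders, not the real FinnGen column names.
tmpdir=$(mktemp -d)
printf 'ID\tSOURCE\tCODE\n' >  "$tmpdir/toy.txt"
printf 'A1\tICD10\tF29\n'   >> "$tmpdir/toy.txt"
printf 'A2\tICPC2\tF29\n'   >> "$tmpdir/toy.txt"
printf 'A3\tICD10\tJ45\n'   >> "$tmpdir/toy.txt"
gzip "$tmpdir/toy.txt"

# filter_code FILE.gz CODE_PREFIX SOURCE (hypothetical helper)
# Keeps the header row plus rows whose CODE starts with CODE_PREFIX and whose
# SOURCE equals the requested code set. Column positions are looked up by
# name from the header, so they are not hard-coded.
filter_code() {
  gzip -dc "$1" | awk -F'\t' -v code="$2" -v src="$3" '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i   # map column name -> position
      print; next
    }
    index($(col["CODE"]), code) == 1 && $(col["SOURCE"]) == src
  '
}

# gzip -dc is equivalent to zcat; only the ICD10 row for F29 is kept.
filter_code "$tmpdir/toy.txt.gz" F29 ICD10 > "$tmpdir/f29_icd10.txt"
cat "$tmpdir/f29_icd10.txt"
```

Because awk reads the stream row by row, this uses as little memory as the grep version.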
We may not need all the columns in the file for our analyses. Keeping 5 of 10 columns cuts the file size roughly in half, assuming the columns are of similar width.
To view the column names (header)
To select columns
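A minimal sketch of viewing the header and selecting columns, assuming a tab-separated, gzip-compressed file. The toy file and its column names are made up for illustration; replace the path and the column numbers with those of your own data.

```shell
#!/usr/bin/env bash
# Toy tab-separated file; the column names are placeholders.
tmpdir=$(mktemp -d)
printf 'ID\tAGE\tCODE\tEXTRA1\tEXTRA2\n' >  "$tmpdir/toy.txt"
printf 'A1\t34\tJ45\tx\ty\n'             >> "$tmpdir/toy.txt"
printf 'A2\t51\tF29\tx\ty\n'             >> "$tmpdir/toy.txt"
gzip "$tmpdir/toy.txt"

# List the column names with their positions: take the header row,
# put each tab-separated name on its own line, and number the lines.
gzip -dc "$tmpdir/toy.txt.gz" | head -n 1 | tr '\t' '\n' | cat -n

# Keep only columns 1 to 3 (cut splits on tabs by default) and write a
# smaller compressed file.
gzip -dc "$tmpdir/toy.txt.gz" | cut -f 1-3 | gzip > "$tmpdir/toy_subset.txt.gz"
gzip -dc "$tmpdir/toy_subset.txt.gz"
```

Non-adjacent columns can be selected with a comma-separated list, for example cut -f 1,3.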
The home disk is the user's private disk (the /home/ivm/ folder in Sandbox) where users can save their own files. No one besides the account owner has access to the private home disk. By default, the size of the home disk is 10 GB. The amount of space on the home disk does not depend on the IVM size (Basic, Advanced, or Rather Big Machine).
To check the size and amount of space in your home disk type in Terminal
df -h /home
The output gives the size of the home disk, the used space, the available space, the percentage of space used, and the folder where the disk is mounted.
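df reports only the totals. To see which files and folders account for the used space, du can be combined with sort. This is a sketch under stated assumptions: the helper name largest_items is hypothetical, and the default path /home/ivm is taken from the text above.

```shell
#!/usr/bin/env bash
# largest_items DIR [N]: hypothetical helper listing the N largest files and
# folders directly under DIR (default: /home/ivm, 10 entries), biggest first.
largest_items() {
  local dir="${1:-/home/ivm}" n="${2:-10}"
  # -s: one total per argument, -k: sizes in KiB so that sort -n works.
  # The second glob also includes hidden entries such as .cache or .local.
  du -sk "$dir"/* "$dir"/.[!.]* 2>/dev/null | sort -rn | head -n "$n"
}

# Show the five largest items in the home disk.
largest_items /home/ivm 5
```

Each output line shows the size in KiB followed by the path.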
If the space on the home disk is running out, free space by removing unneeded files and folders, e.g. with the rm command in Terminal. Note that rm is irreversible: removed files and folders cannot be restored. Using the -i flag makes rm prompt before each removal.
To remove a file
rm -i my_file.txt
To remove a folder and all of its content
rm -ri my_folder
It is also possible to resize the home disk to enable more space for the user's files and folders.
The trash bin may hold a lot of files consuming home disk space. Make sure to clear the trash bin from time to time.
Docker images and containers can take a lot of space on the user's home disk. They most commonly accumulate when running Cohort Operation and other container-based applications in Sandbox. The original container images are stored in a shared cloud repository and are pulled from there to the IVM automatically when an application such as Cohort Operation is run, so they can be removed from the home disk relatively safely to free disk space. Below are a few relevant code examples for managing Docker containers and images. More information can be found on the Docker web pages and on this handbook page. To manage Docker resources in the IVM, the "docker <resource> prune" commands are very helpful (see details here). The following command cleans up unused Docker resources (stopped containers, unused networks, and dangling images) from the IVM.
docker system prune
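Before pruning, it can be useful to see what Docker is actually storing. The sketch below wraps three standard Docker CLI inspection commands in a helper whose name, docker_report, is made up for illustration. The narrower prune variants are listed as comments so that nothing is deleted by accident; run them by hand if needed.

```shell
#!/usr/bin/env bash
# docker_report: hypothetical helper that shows what Docker is storing.
# It only inspects and does not delete anything.
docker_report() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH"
    return 0
  fi
  docker system df        # disk used by images, containers, volumes, build cache
  docker image ls         # images with their sizes
  docker container ls -a  # all containers, including stopped ones
}

docker_report

# Narrower alternatives to `docker system prune` (each asks for confirmation):
#   docker container prune   # remove stopped containers only
#   docker image prune       # remove dangling (untagged) images only
#   docker image prune -a    # remove all images not used by any container
```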
Note that the change is permanent! Once you have upgraded your home disk size, you can’t reduce it.
By default, the size of the home disk is 10 GB. Open Terminal and type
df -h | grep home
Then close the IVM, resize the home disk up to 20 GB, and start the smallest IVM again. Note that once done, you can’t revert this action. The home disk will permanently be 20 GB instead of 10 GB, and it will be billed accordingly.
After the home disk is resized, repeating the command
df -h | grep home
shows that the home disk now has 20 G of space in total, of which 95 M is used and 19 G is free.
To see the number of CPUs, type
lscpu | grep 'CPU(s):'
Note that the number of CPUs has increased from 1 to 2.
Before you resize your home disk, please consider that the change can’t be reverted and that it will affect your IVM costs.
You can resize your home disk from the front page of the Virtual Machine from