Running analyses in your IVM vs. Pipelines
Sandbox Pipelines are used for large-scale analysis and enable the use of parallelization and custom-sized virtual machines.
Overview
Using Pipelines requires some knowledge of WDL (Workflow Description Language).
With the scatter function, a workflow can run a call (the call section of the WDL) on as many VMs as are available on Google Cloud.
The VM size can be customized in the runtime settings of the WDL.
It is possible to submit multiple pipeline runs simultaneously.
Workflows are written in WDL.
Calls can be run in parallel or in series.
The tasks used by each call are defined in the task section of the WDL.
Pipeline jobs are always batch jobs, so the entire pipeline must be coded in a single workflow.
Pipelines cannot be used interactively.
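The points above can be illustrated with a minimal WDL sketch. It is not an official FinnGen pipeline: the workflow and task names, the Docker image, and the resource values are all hypothetical, and the image in particular would need to be one that is available inside the Sandbox. The sketch scatters one call over a list of files (parallel), then runs a second call that depends on the scattered results (serial), and sets the VM size in each runtime section.

```wdl
version 1.0

# Hypothetical workflow: counts the lines in each input file, one VM per file,
# then sums the counts on a single VM.
workflow count_lines_per_file {
  input {
    Array[File] input_files
  }

  # scatter runs the call once per array element; the iterations are
  # independent of each other, so they can run in parallel on separate VMs.
  scatter (f in input_files) {
    call count_lines { input: in_file = f }
  }

  # This call consumes the gathered scatter outputs, so it starts only
  # after every count_lines iteration has finished (serial dependency).
  call sum_counts { input: counts = count_lines.n_lines }

  output {
    Int total = sum_counts.total
  }
}

task count_lines {
  input {
    File in_file
  }
  command <<<
    wc -l < ~{in_file}
  >>>
  output {
    Int n_lines = read_int(stdout())
  }
  # VM size for this task; values are placeholders, not recommendations.
  runtime {
    docker: "ubuntu:22.04"   # assumption: replace with an image available in the Sandbox
    cpu: 1
    memory: "2 GB"
    disks: "local-disk 10 HDD"
  }
}

task sum_counts {
  input {
    Array[Int] counts
  }
  command <<<
    # The array is expanded into a bash arithmetic expression, e.g. 3 + 5 + 8
    echo $(( ~{sep=" + " counts} ))
  >>>
  output {
    Int total = read_int(stdout())
  }
  runtime {
    docker: "ubuntu:22.04"   # assumption: replace with an image available in the Sandbox
    cpu: 1
    memory: "1 GB"
    disks: "local-disk 10 HDD"
  }
}
```

Because the whole pipeline runs as a batch job, both steps live in the same workflow; there is no point at which the user can inspect the intermediate counts and decide interactively what to run next.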
Differences
Although pipeline and IVM usage differ in many ways, the underlying commands used for analysis are essentially the same.
In a pipeline, the commands are simply written in the workflow language and then executed as bash or another scripting language.
Copying inputs to the VM launched by the WDL (localization) and copying outputs back from it (delocalization) are handled through the special variable type "File".
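As a rough illustration of how the "File" type drives localization and delocalization, consider the following sketch. The workflow, task, and file names are hypothetical, as is the Docker image; the command itself is the same gzip call one would type in a terminal on the IVM.

```wdl
version 1.0

# Hypothetical workflow: compresses a single file on a pipeline VM.
workflow compress_file {
  input {
    File input_file
  }

  call gzip_file { input: in_file = input_file }

  output {
    File compressed = gzip_file.out_gz
  }
}

task gzip_file {
  input {
    # Declared as File: the file is copied onto the VM's disk before the
    # command runs (localization), and ~{in_file} expands to the local path.
    File in_file
  }
  command <<<
    gzip -c ~{in_file} > output.gz
  >>>
  output {
    # Declared as File: after the command finishes, output.gz is copied
    # from the VM back to storage (delocalization).
    File out_gz = "output.gz"
  }
  runtime {
    docker: "ubuntu:22.04"   # assumption: replace with an image available in the Sandbox
    cpu: 1
    memory: "1 GB"
    disks: "local-disk 10 HDD"
  }
}
```

If in_file were declared as String instead, the task would receive only the path text and the file itself would not be copied to the VM, which is a common source of "file not found" errors when moving commands from the IVM into a pipeline.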
Details about the WDL language are available at:
Terra support / WDL Documentation
You can launch Pipelines from the Sandbox menu (Applications > FinnGen > Pipelines) or from the command line.