Running analyses in your IVM vs. Pipelines

Sandbox Pipelines are used for large-scale analysis and support parallelization and custom-sized virtual machines (VMs).

Overview

  • Using Pipelines requires some knowledge of WDL

  • With the scatter function, a workflow can run a call (defined in the call section of the WDL) on as many VMs in parallel as are available on Google Cloud

  • The VM size can be customized in the runtime settings of the WDL

  • It is possible to submit multiple pipeline runs simultaneously

  • Workflows are written in the Workflow Description Language (WDL)

  • Calls can run in parallel or in series

  • The task run by each call is defined in a task section of the WDL

  • Pipeline jobs are always batch jobs, so the entire pipeline must be coded in a single workflow

  • Pipelines cannot be used interactively
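
The points above about scatter, call, task and runtime can be sketched in one small workflow. This is a minimal illustration under stated assumptions, not a FinnGen-provided pipeline: the workflow, task and input names are hypothetical, the resource sizes are arbitrary, and the docker image is left as an input because the images available depend on your environment.

```wdl
version 1.0

# Hypothetical workflow: all names, sizes and inputs are illustrative assumptions.
workflow count_lines_per_chunk {
  input {
    Array[File] chunks   # one input file per parallel shard
    String docker        # container image available in your environment
  }

  # scatter starts one call (and so one VM) per element, in parallel
  scatter (chunk in chunks) {
    call count_lines {
      input: chunk = chunk, docker = docker
    }
  }

  # This call consumes the scattered outputs, so it starts only
  # after every shard has finished (a serial step)
  call sum_counts {
    input: counts = count_lines.n, docker = docker
  }

  output {
    Int total = sum_counts.total
  }
}

task count_lines {
  input {
    File chunk
    String docker
  }
  command <<<
    wc -l < ~{chunk}
  >>>
  output {
    Int n = read_int(stdout())
  }
  # runtime sets the size of the VM that runs this task
  runtime {
    docker: docker
    cpu: 1
    memory: "2 GB"
    disks: "local-disk 10 HDD"
  }
}

task sum_counts {
  input {
    Array[Int] counts
    String docker
  }
  command <<<
    echo $(( ~{sep=" + " counts} ))
  >>>
  output {
    Int total = read_int(stdout())
  }
  runtime {
    docker: docker
    cpu: 1
    memory: "1 GB"
    disks: "local-disk 10 HDD"
  }
}
```

Calls with no data dependency on each other may run in parallel; a call that uses another call's output waits for it, which is how serial order is expressed.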

Differences

Although pipeline and IVM usage differ in many ways, the underlying analysis commands are essentially the same.

In a pipeline, the commands are simply translated from the workflow language into bash or another language.
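
As a rough sketch of that translation, a shell command that might be typed in an IVM terminal can be placed almost unchanged in the command section of a task. The task name, file names and the command itself are hypothetical examples, and the docker image is again left as an input.

```wdl
version 1.0

# Hypothetical task: names and the example command are illustrative assumptions.
# In an IVM terminal one might type:
#   zcat results.tsv.gz | cut -f 1 | sort -u > first_column_values.txt
task first_column_values {
  input {
    File table_gz
    String docker
  }
  # The same command, with the input path filled in by the ~{...} placeholder
  command <<<
    set -euo pipefail
    zcat ~{table_gz} | cut -f 1 | sort -u > first_column_values.txt
  >>>
  output {
    File values = "first_column_values.txt"
  }
  runtime {
    docker: docker
  }
}
```

The only WDL-specific part of the command is the placeholder, which is replaced with the path of the input file on the VM before the script runs.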

Localization of inputs to the VM started by the WDL, and delocalization of outputs from it, are handled through the special variable type “File”.

Details about WDL are available at:
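
The difference between a “File” and a plain value can be illustrated with a short task. This is a sketch with hypothetical names: the input declared as File is copied onto the VM before the command runs, the String is passed as text only, and the File output is copied back from the VM when the task ends.

```wdl
version 1.0

# Hypothetical task: names are illustrative assumptions.
task compress {
  input {
    File raw        # localized: copied onto the VM before the command runs
    String label    # plain value: passed as text, nothing is copied
    String docker   # container image available in your environment
  }
  command <<<
    gzip -c ~{raw} > ~{label}.gz
  >>>
  output {
    # delocalized: copied from the VM back to storage when the task ends
    File compressed = "~{label}.gz"
  }
  runtime {
    docker: docker
  }
}
```

Declaring a path as String instead of File would pass the text of the path to the command without copying the file, so the command would not find it on the VM.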

Terra support / WDL Documentation

You can launch Pipelines from the Sandbox menu Applications > FinnGen > Pipelines or from the command line.
