Biokepi: Use the demo

Using The Demo Application

Once all set with Ketrew's mini-tutorial, one can submit a few Biokepi “example” pipelines to that same Ketrew daemon.

The application is meant to show how to program with the library, especially how to build pipelines with Biokepi.Pipeline.

For now the demo looks like a somatic variant calling pipeline (hence the use of the words “tumor,” “normal,” etc.). They have not been debugged enough to be used in production.

It is configured with environment variables. It will run the pipelines on a given machine, accessed through SSH. Unfortunately within the demo one cannot specify a scheduler interface.

Variables to set:

  • BIOKEPI_DATASET_NAME: a name for the dataset to be analyzed. This is used as a namespace for file-naming and separating work environments from other analyses run by Biokepi on target machine.
  • BIOKEPI_NORMAL_R1, BIOKEPI_NORMAL_R2, BIOKEPI_TUMOR_R1, and BIOKEPI_TUMOR_R2: the input files. For now each variable may contain a comma-separated list of *.fastq.gz (absolute) files on the running machine.
  • BIOKEPI_SSH_BOX_URI: an URI describing the machine to run on. For example ssh://SshName//home/user/biokepi-test/metaplay where:
    • SshName would be an entry in the .ssh/config of the server running Ketrew.
    • /home/user/biokepi-test/metaplay is the top-level directory where every generated file will go.
  • BIOKEPI_MUTECT_JAR_SCP or BIOKEPI_MUTECT_JAR_WGET: if you use Mutect (the default pipeline does) you need to provide a way to download the JAR file (biokepi would violate its non-free license by doing it itself). So use something like: BIOKEPI_MUTECT_JAR_SCP=MyServer:/path/to/mutect.jar.
    Same goes for GATK (with BIOKEPI_GATK_JAR_{SCP,WGET}).
  • BIOKEPI_CYCLEDASH_URL: if you are using Cycledash (i.e. the option -P) then you need to provide the base URL (Biokepi will append /runs and /upload to that URL).

For example:

export BIOKEPI_DATASET_NAME="CP4242"
export BIOKEPI_NORMAL_R1=/path/to/R1_L001.fastq.gz,/R1_L002.fastq.gz
export BIOKEPI_NORMAL_R2=/path/to/R2_L001.fastq.gz,/R2_L002.fastq.gz
export BIOKEPI_TUMOR_R1=/path/to/R1_L001.fastq.gz,/R1_L002.fastq.gz
export BIOKEPI_TUMOR_R2=/path/to/R2_L001.fastq.gz,/R2_L002.fastq.gz
export BIOKEPI_SSH_BOX_URI=ssh://SshName//home/user/biokepi-test/metaplay
export BIOKEPI_MUTECT_JAR_SCP=MyServer:path/to/mutect.jar
export BIOKEPI_GATK_JAR_WGET=http://example.com/top-secret/gatk.jar
export BIOKEPI_CYCLEDASH_URL=http://cycledash.example.com

Then you can run a few predefined somatic pipelines:

./biokepi-demo list-named-pipelines

will list the names you can use.

./biokepi-demo dump-pipeline -N somatic-crazy

will display the JSON representation of the pipeline named somatic-crazy.

./biokepi-demo run -N somatic-simple-mutect

should submit a the pipeline to your Ketrew server.