Tools: bpipe for building analysis pipelines

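bpipe is a tool for building analysis pipelines. A bpipe script is written in a Groovy-based language, but each stage wraps ordinary shell commands, so it reads much like a shell script (syntax details: http://docs.bpipe.org/Guides/ParallelTasks/). The parts of an example bpipe script are explained below.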

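Annotations

@Transform and @Filter are not mere comments: they tell bpipe how to name a stage's output files. @Transform("sam") replaces the input file's extension with .sam. @Filter("dedupe") keeps the extension and inserts dedupe into the name, so s_1.bam becomes s_1.dedupe.bam. To describe what a step does, use an ordinary // comment:

// align the fq to reference genome
@Transform("sam")
align = {
         exec """
          bwa aln $REFERENCE $input.fq > $output.sai &&
          bwa samse $REFERENCE $output.sai $input.fq > $output.sam
         """
}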

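Stages

The stage above is named align. Its { } body holds exec """ ... """, and the shell commands to run go between the triple quotes. $input and $output stand for the input and output files, and it is best to append the matching suffix (for example $input.fq or $output.sam).

When stages are chained into a pipeline, bpipe works out each stage's inputs and outputs automatically: the output of one stage becomes the input of the next. If the next stage should receive this stage's input rather than its output, add a forward input line below the exec block. Suffixed input and output names make this necessary far less often, because bpipe can then pick the upstream file with the right extension.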
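The following toy Python model illustrates the suffix matching described above. It is not bpipe's implementation. Each stage asks for one extension, and the runner hands it a matching file from the previous stage's outputs. If none matches, the runner falls back to files produced earlier in the pipeline. A stage with forward_input=True passes its input along unchanged. The Stage class, run_chain, and the <stem>.<stage>.<ext> naming are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    reads: str                                  # extension requested via $input.<ext>
    writes: list = field(default_factory=list)  # extensions written via $output.<ext>
    forward_input: bool = False                 # models a `forward input` line


def run_chain(stages, initial_files):
    """Toy model of suffix-based wiring between stages (not bpipe's code).

    A stage gets a file with the extension it asks for, looked up first among
    the previous stage's outputs and then, newest first, among anything produced
    earlier. Output names (<stem>.<stage>.<ext>) are a made-up convention.
    """
    produced = list(initial_files)  # every file seen so far, oldest first
    current = list(initial_files)   # what the previous stage handed on
    trace = []
    for st in stages:
        suffix = "." + st.reads
        candidates = [f for f in current if f.endswith(suffix)]
        if not candidates:
            candidates = [f for f in reversed(produced) if f.endswith(suffix)]
        if not candidates:
            raise ValueError(f"{st.name}: no upstream file ends with {suffix}")
        src = candidates[0]
        stem = src.rsplit(".", 1)[0]
        outputs = [f"{stem}.{st.name}.{ext}" for ext in st.writes]
        trace.append((st.name, src, outputs))
        produced.extend(outputs)
        # `forward input`, or a stage with no outputs, hands its input on unchanged.
        current = [src] if st.forward_input or not outputs else outputs
    return trace


pipeline = [
    Stage("align", reads="fq", writes=["sai", "sam"]),
    Stage("sort", reads="sam", writes=["bam"]),
    Stage("index", reads="bam", forward_input=True),
    Stage("call_variants", reads="bam", writes=["vcf"]),
]
for name, src, outs in run_chain(pipeline, ["s_1.fq"]):
    print(name, src, outs)
```

In this model, call_variants still finds the sorted BAM even though index produces no file of its own. That is the situation forward input is meant for.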

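Parallel tasks

[task1, task2] runs task1 and task2 in parallel, and each task can itself contain further nested parallel or serial tasks. In the complete pipeline below, for example, index can run at the same time as call_variants, so the run block becomes: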

run {
    align + sort + dedupe + [index , call_variants]
}
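The + and [ ] composition can be modeled in a few lines of Python. This is a sketch under assumptions, not how bpipe actually schedules stages:

- A "stage" is any function from a list of file names to a list of file names.
- serial() and parallel() are hypothetical helpers.
- The outputs of parallel branches are simply concatenated in branch order.

```python
from concurrent.futures import ThreadPoolExecutor


def serial(*stages):
    """Model of `a + b + c`: each stage's outputs become the next stage's inputs."""
    def run(files):
        for stage in stages:
            files = stage(files)
        return files
    return run


def parallel(*branches):
    """Model of `[a, b]`: every branch gets the same inputs and runs concurrently;
    the branch outputs are concatenated in branch order."""
    def run(files):
        with ThreadPoolExecutor(max_workers=max(1, len(branches))) as pool:
            futures = [pool.submit(branch, list(files)) for branch in branches]
            merged = []
            for fut in futures:
                merged.extend(fut.result())
        return merged
    return run


def fake_stage(name, ext):
    """Hypothetical stage: one output per input, named <stem>.<name>.<ext>."""
    def run(files):
        return [f"{f.rsplit('.', 1)[0]}.{name}.{ext}" for f in files]
    return run


align = fake_stage("align", "sam")
sort = fake_stage("sort", "bam")
dedupe = fake_stage("dedupe", "bam")
call_variants = fake_stage("call_variants", "vcf")


def index(files):
    # Like `forward input`: the index file sits next to the BAM, so pass the BAM on.
    return files


# align + sort + dedupe + [index, call_variants]
pipeline = serial(align, sort, dedupe, parallel(index, call_variants))
print(pipeline(["s_1.fq"]))
```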

Adapted from https://mp.weixin.qq.com/s?__biz=MzUzMTEwODk0Ng==&mid=2247487816&idx=1&sn=c87971a6a506c24ac80bc0d988287d55&scene=21#wechat_redirect. The complete pipeline1.txt reads:

PICARD_HOME="/usr/local/picard-tools/"
REFERENCE="reference.fa"

// align the fq to reference genome
@Transform("sam")
align = {
         exec """
          bwa aln $REFERENCE $input.fq > $output.sai &&
          bwa samse $REFERENCE $output.sai $input.fq > $output.sam
         """
}

// sort sam file to bam (samtools 1.x syntax)
@Transform("bam")
sort = {
    exec "samtools view -bSu $input.sam | samtools sort -o $output.bam -"
}

// use picard to mark (and here remove) duplicates
// picard-tools 1.x layout; newer Picard releases run "java -jar picard.jar MarkDuplicates"
@Filter("dedupe")
dedupe = {
    exec """
           java -Xmx1g -jar $PICARD_HOME/MarkDuplicates.jar
                            MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000
                            METRICS_FILE=out.metrics
                            REMOVE_DUPLICATES=true
                            ASSUME_SORTED=true
                            VALIDATION_STRINGENCY=LENIENT
                            INPUT=$input.bam
                            OUTPUT=$output.bam
    """
}

// build an index of the bam file; samtools writes <file>.bam.bai next to the BAM,
// so forward the BAM itself to the next stage
index = {
    exec "samtools index $input.bam"
    forward input
}

// call variants (bcftools 1.x syntax; -m multiallelic caller, -v variant sites only)
call_variants = {
    exec "bcftools mpileup -Ou -f $REFERENCE $input.bam | bcftools call -mv -Ov -o $output.vcf"
}

run {
    align + sort + dedupe + index + call_variants
}

To run it: bpipe run pipeline1.txt s_1.fq

To skip a stage, just edit the run block in pipeline1.txt. For example, without index, run becomes:

run{
  align + sort + dedupe + call_variants
}

bpipe supports parallelizing by chromosome:

hello = {
    exec """samtools view test.bam $chr | some_other_tool """
}
Bpipe.run {
  chr(1..10, 'X','Y') * [ hello ]
}

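By default the example above uses the UCSC genome (chromosome names such as chr1 and chrX). Similarly, hg19.split(40) * [ stage1 + stage2 ] splits the genome into 40 parts and runs them in parallel.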
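Both ideas can be sketched in Python. This is a model under stated assumptions, not bpipe's code:

- chr_names mimics what chr(1..10, 'X', 'Y') expands to, assuming UCSC-style "chr" names. Groovy's 1..10 is inclusive, hence range(1, 11).
- split_genome is one simple way to cut a genome into roughly n regions. bpipe's own split may choose boundaries differently.
- The chromosome sizes in the demo are made up.

```python
import math


def chr_names(*specs):
    """Model of chr(1..10, 'X', 'Y'): one UCSC-style name (and branch) each."""
    names = []
    for spec in specs:
        if isinstance(spec, range):
            names.extend(f"chr{i}" for i in spec)
        else:
            names.append(f"chr{spec}")
    return names


def split_genome(sizes, n):
    """Cut a genome (chrom -> length, n > 0) into ~n similar-sized regions.

    Each chromosome is cut into pieces of at most ceil(total / n) bases.
    Pieces never span two chromosomes, so the count can end up a little above n.
    Returns 0-based, half-open (chrom, start, end) tuples.
    """
    total = sum(sizes.values())
    step = math.ceil(total / n)
    regions = []
    for chrom, length in sizes.items():
        for start in range(0, length, step):
            regions.append((chrom, start, min(start + step, length)))
    return regions


def to_samtools_region(region):
    """samtools region strings are 1-based and inclusive: chrom:beg-end."""
    chrom, start, end = region
    return f"{chrom}:{start + 1}-{end}"


# Made-up chromosome sizes, only for the demo
sizes = {"chr1": 100, "chr2": 60, "chr3": 40}
print(chr_names(range(1, 11), "X", "Y"))
for r in split_genome(sizes, 4):
    print(to_samtools_region(r))  # each region would become one parallel branch
```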

Running the same command on multiple files in parallel

Method 1:

Bpipe.run {
   "input_%.txt" * [ hello + world ] + nice_to_see_you
}

bpipe run helloworld.pipe input*.txt

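Note: the run block in helloworld.pipe must be changed as shown, and input*.txt must also be passed on the bpipe command line; only then are the files processed in parallel as a batch. Each distinct value matched by % (for example the 1 in input_1.txt) gets its own parallel branch running hello + world. nice_to_see_you runs once after all branches finish.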
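The grouping step behind "input_%.txt" can be modeled in Python. This is a sketch, not bpipe's matcher, and it makes three assumptions:

- The pattern contains exactly one %.
- The pattern is matched against each file's base name.
- Files that do not match are simply ignored.

group_by_pattern is a hypothetical helper.

```python
import os
import re
from collections import defaultdict


def group_by_pattern(pattern, files):
    """Group files by the text matched by '%' in pattern (one branch per group).

    Simplified model: exactly one '%', matched against the file's base name;
    non-matching files are dropped.
    """
    prefix, suffix = pattern.split("%", 1)
    regex = re.compile(re.escape(prefix) + r"(.+)" + re.escape(suffix))
    groups = defaultdict(list)
    for path in files:
        m = regex.fullmatch(os.path.basename(path))
        if m:
            groups[m.group(1)].append(path)
    return dict(sorted(groups.items()))


files = ["input_1.txt", "input_2.txt", "data/input_3.txt", "notes.md"]
for branch, members in group_by_pattern("input_%.txt", files).items():
    print(branch, members)  # each group would run hello + world in its own branch
```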

Method 2:

// Create a data structure (Map) that maps branches to files
def branches = [
    sample1: ["sample1_2.fastq.gz"],
    sample2: ["sample2_2.fastq.gz"],
    sample3: ["sample3_2.fastq.gz"]
]

align = {
   // keep the sample name as a branch variable
   branch.sample = branch.name 
   ...
}

run { branches * [ align ] }

This processes the three samples in one run, each in its own parallel branch.
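Below is a Python model of the branches-map pattern, a sketch rather than bpipe's behavior. Each map entry becomes one concurrent branch, and the branch name is handed to the stage, much as branch.name is inside a bpipe stage. align and run_per_branch are hypothetical stand-ins, and the output naming is made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Same structure as the Groovy map above: branch name -> that branch's input files
branches = {
    "sample1": ["sample1_2.fastq.gz"],
    "sample2": ["sample2_2.fastq.gz"],
    "sample3": ["sample3_2.fastq.gz"],
}


def align(branch_name, files):
    """Hypothetical stand-in for the align stage.

    branch_name plays the role of branch.name, so the stage can name its
    output after the sample without parsing file names.
    """
    sample = branch_name  # like: branch.sample = branch.name
    # A real stage would run the aligner on `files`; here we only name the output.
    return [f"{sample}.bam"] if files else []


def run_per_branch(branch_map, stage):
    """Model of `branches * [ stage ]`: one concurrent branch per map entry."""
    with ThreadPoolExecutor(max_workers=max(1, len(branch_map))) as pool:
        futures = {name: pool.submit(stage, name, files)
                   for name, files in branch_map.items()}
        return {name: fut.result() for name, fut in futures.items()}


print(run_per_branch(branches, align))
```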

http://docs.bpipe.org/Guides/ParallelTasks/

An example bpipe pipeline for WGS is available on GitHub.