Gap closing after genome sequence assembly: GapCloser

Published at 2020-05-08 13:31

Author: zhixy



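Introduction

GapCloser was developed by BGI as part of the SOAPdenovo2 suite. It fills in the gaps that SOAPdenovo2 or other assemblers introduce when joining contigs into scaffolds.

Recommended installation: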

(base) [user@server ~]# conda install -c bioconda soapdenovo2-gapcloser

Write the configuration file config.lib

#maximal read length
max_rd_len=141
[LIB]
#average insert size
avg_ins=350
#if sequence needs to be reversed
reverse_seq=0
#in which part(s) the reads are used
asm_flags=4
#use only first 100 bps of each read
rd_len_cutoff=100
#in which order the reads are used while scaffolding
rank=1
# cutoff of pair number for a reliable connection (at least 3 for short insert size)
pair_num_cutoff=3
#minimum aligned length to contigs for a reliable read location (at least 32 for short insert size)
map_len=32
#a pair of fastq files, read 1 file should always be followed by read 2 file
q1=../path/to/reads_forward.fastq
q2=../path/to/reads_reverse.fastq
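
The configuration above can also be generated by a small shell function, which helps when the same settings are reused for several samples. This is only a sketch: the function name write_gapcloser_config and its argument order are hypothetical and not part of SOAPdenovo2, and the fixed values simply copy the ones listed above.

```shell
# Hypothetical helper (not part of SOAPdenovo2): write a GapCloser library
# config file. Arguments, in order:
#   $1 output file   $2 read 1 fastq   $3 read 2 fastq
#   $4 maximal read length   $5 average insert size
# The body runs in a subshell "( ... )" so its variables stay local.
write_gapcloser_config() (
    out=$1
    read1=$2
    read2=$3
    max_len=$4
    ins=$5
    cat > "$out" <<EOF
#maximal read length
max_rd_len=${max_len}
[LIB]
#average insert size
avg_ins=${ins}
#if sequence needs to be reversed
reverse_seq=0
#in which part(s) the reads are used
asm_flags=4
#use only first 100 bps of each read
rd_len_cutoff=100
#in which order the reads are used while scaffolding
rank=1
#cutoff of pair number for a reliable connection
pair_num_cutoff=3
#minimum aligned length to contigs for a reliable read location
map_len=32
#a pair of fastq files, read 1 file followed by read 2 file
q1=${read1}
q2=${read2}
EOF
)
```

Calling it with an output name, the two fastq paths, 141 and 350 should write a file equivalent to the config.lib shown above.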

Run GapCloser

(base) [user@server ~]# GapCloser -h
GapCloser: invalid option -- 'h'
Version:
        1.12

Contact:
        soap@genomics.org.cn

Usage:
        GapCloser [options]
        -a      <string>        input scaffold file name, required. # assembly result
        -b      <string>        input library info file name, required. # config.lib
        -o      <string>        output file name, required. # gap-closed result
        -l      <int>           maximum read length (<=155), default=100.
        -p      <int>           overlap param(<=31), default=25.
        -t      <int>           thread number, default=1. # number of CPU cores to use
        -h      -?              output help information.

(base) [user@server ~]# GapCloser -b config.lib -a scaffolds.fa -o scaffolds_gapclosed.fa -t 100
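
To see whether the run actually closed anything, one option is to count the runs of N in the scaffolds before and after. The function below is a minimal sketch, not a GapCloser feature: the name gap_stats is hypothetical, and it assumes that gaps are written as runs of N (or n) inside the FASTA sequences.

```shell
# Hypothetical helper: print the number of gaps (runs of N/n) and the total
# number of N bases in a FASTA file. Sequence lines of each record are joined
# first, so a gap that spans a line break is counted once.
gap_stats() {
    awk '
        function tally() {
            if (seq == "") return
            before = length(seq)
            gaps += gsub(/[Nn]+/, "", seq)    # each run of N is one gap
            n_bases += before - length(seq)   # characters removed are N bases
            seq = ""
        }
        /^>/ { tally(); next }                # header line: close previous record
        { seq = seq $0 }                      # sequence line: append to record
        END { tally(); printf "%d gaps, %d N bases\n", gaps, n_bases }
    ' "$1"
}
```

Running gap_stats on scaffolds.fa and then on scaffolds_gapclosed.fa gives two lines to compare. If gap closing worked, the second should report fewer gaps and fewer N bases.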