Published at 2020-05-08 13:54
Author: zhixy
Like GapCloser, GapFiller is a tool for filling the gaps (runs of N) in the scaffolds produced by a genome assembly.
To obtain the program, submit a request to the authors (https://www.baseclear.com/services/bioinformatics/basetools/gapfiller/). Once you receive the source code, compile and install it.
(base) [user@server ~]# perl /usr/bio/GapFiller/GapFiller.pl
ERROR: Parameter -l is required. Please insert a library file
ERROR: Parameter -s is required. Please insert a scaffold fastA file
Usage: /usr/bio/GapFiller/GapFiller.pl [GapFiller_v1-10]
============ General Parameters ============
-l Library file containing two paired-read files with insert size, error and orientation indication. # library configuration file
-s Fasta file containing scaffold sequences used for extension. # the assembled scaffolds
============ Extension Parameters ============
-m Minimum number of overlapping bases with the edge of the gap (default -m 29)
# Minimum number of bases that must overlap the gap edge. Set it slightly below the read length: for 150 bp reads, for example, use 140 to 149.
-o Minimum number of reads needed to call a base during an extension (default -o 2)
# Minimum number of reads required to extend the sequence by one base during gap filling.
-r Percentage of reads that should have a single nucleotide extension in order to close a gap in a scaffold (Default: 0.7)
# During gap filling, a position is extended only if at least this fraction of the reads agree on the base.
-d Maximum difference between the gapsize and the number of gapclosed nucleotides. Extension is stopped if it matches this parameter + gap size (default -d 50, optional).
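# Maximum allowed difference between the gap length and the length of the filled-in sequence. Once the filled length exceeds the gap length by more than this threshold, filling of that gap stops; below the threshold, no merging is performed.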
-n Minimum overlap required between contigs to merge adjacent sequences in a scaffold (default -n 10, optional)
# Minimum number of overlapping bases required to merge two adjacent contigs within a scaffold.
-t Number of reads to trim off the start and begin of the sequence (usually missambled/low-coverage reads) (default -t 10, optional)
# Bases at gap edges are mostly low quality, so before filling, this many bases are trimmed from each gap edge and treated as N.
-i Number of iterations to fill the gaps (default -i 10, optional)
# Maximum number of iterations.
============ Bowtie Parameters ============
-g Maximum number of allowed gaps during mapping with Bowtie. Corresponds to the -v option in Bowtie. (default -g 1, optional)
============ Additional Parameters ============
-T Number of threads to run (default -T 1)
# Number of CPU cores/threads
-S Skip reading of the input files again
-b Base name for your output files (optional)
# Name of the output folder
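To make the extension parameters above more concrete, here is a toy bash sketch. It is not GapFiller's implementation (GapFiller.pl works on reads mapped to the scaffold edges); it only encodes one simplified reading of the help text. In this reading, -o is the minimum number of reads covering the next position, -r is the fraction of those reads that must agree on the majority base, -d stops extension once the filled length reaches gap size + d, -n is an exact suffix/prefix overlap, and -t masks t bases on each side of a gap with N. The function names call_base, should_stop, merge_if_overlap and mask_gap_edges are hypothetical.

```bash
#!/usr/bin/env bash
# Toy illustration (NOT GapFiller's code) of -o, -r, -d, -n and -t as
# described in the help text. All function names are hypothetical.

# -o / -r: given the next base proposed by each read overlapping the gap
# edge, print the called base, or STOP if coverage or agreement is too low.
call_base() {
  local min_reads=$1 min_ratio=$2
  shift 2
  printf '%s\n' "$@" | awk -v o="$min_reads" -v r="$min_ratio" '
    NF { count[$1]++; total++ }
    END {
      # Too few reads cover this position: stop extending.
      if (total == 0 || total < o) { print "STOP"; exit }
      max = 0
      for (b in count) if (count[b] > max) { max = count[b]; best = b }
      # The majority base must reach the required fraction of all reads.
      if (max / total >= r) print best; else print "STOP"
    }'
}

# -d: stop once the filled sequence is max_diff or more bases longer than the gap.
should_stop() {
  local gap_len=$1 filled=$2 max_diff=${3:-50}
  if (( filled - gap_len >= max_diff )); then echo stop; else echo continue; fi
}

# -n: merge two flanks if the end of the left one exactly matches the start
# of the right one over at least min_overlap bases (longest overlap wins).
merge_if_overlap() {
  local left=$1 right=$2 min_overlap=${3:-10} k
  local max=${#left}
  if (( ${#right} < max )); then max=${#right}; fi
  for (( k = max; k >= min_overlap; k-- )); do
    if [[ "${left: -k}" == "${right:0:k}" ]]; then
      echo "${left}${right:k}"
      return 0
    fi
  done
  echo "NO_MERGE"
}

# -t: mask t bases on each side of every N run with N (low-quality edges).
mask_gap_edges() {
  local seq=$1 t=${2:-10}
  printf '%s\n' "$seq" | awk -v t="$t" '{
    n = length($0)
    for (i = 1; i <= n; i++) mask[i] = 0
    for (i = 1; i <= n; i++) {
      if (toupper(substr($0, i, 1)) != "N") continue
      lo = (i - t < 1) ? 1 : i - t
      hi = (i + t > n) ? n : i + t
      for (j = lo; j <= hi; j++) mask[j] = 1
    }
    out = ""
    for (i = 1; i <= n; i++) out = out (mask[i] ? "N" : substr($0, i, 1))
    print out
  }'
}
```

For example, with the defaults -o 2 and -r 0.7, four reads proposing A, A, A, C give 3/4 = 0.75 agreement, so `call_base 2 0.7 A A A C` calls A, while two A and two C (0.5) stop the extension.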
The library file passed to -l must be prepared beforehand. It has 7 columns separated by spaces, for example:
Lib1 bwa file1.1.fastq file1.2.fastq 400 0.25 FR
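As I understand the GapFiller README, the seven columns are: library name, aligner (bwa in this example), first read file, second read file, insert size, allowed insert-size error as a fraction (0.25 here, i.e. roughly 300 to 500 bp for a 400 bp library), and read orientation (FR or RF). The bash sketch below writes the example line to libraries.txt and runs a hypothetical check_library_file function. That function only verifies the column count and the orientation value. It does not check that the FASTQ files exist or that the aligner name is valid.

```bash
#!/usr/bin/env bash
# Write the one-library example from above to libraries.txt.
cat > libraries.txt <<'EOF'
Lib1 bwa file1.1.fastq file1.2.fastq 400 0.25 FR
EOF

# Hypothetical sanity check: every non-empty line needs exactly 7
# space-separated columns, and column 7 must be FR or RF.
check_library_file() {
  awk '
    NF == 0 { next }
    NF != 7 { printf "line %d: %d columns, expected 7\n", NR, NF; bad = 1; next }
    $7 != "FR" && $7 != "RF" {
      printf "line %d: orientation %s is not FR or RF\n", NR, $7; bad = 1
    }
    END { exit bad }' "$1"
}

check_library_file libraries.txt && echo "libraries.txt looks OK"
```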
(base) [user@server ~]# perl /usr/bio/GapFiller/GapFiller.pl -l libraries.txt -s scaffolds.fa -m 140 -T 100 -b scaffolds_gapfilled.fa
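The -m 140 in this command follows the rule of thumb from the -m comment for 150 bp reads. If you would rather derive it from the data, a hypothetical helper suggest_m can read the length of the first read in a FASTQ file and subtract an offset. The default offset of 10 is an assumption, chosen so that 150 bp reads give the 140 used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: suggest a value for GapFiller's -m as
# "read length minus a small offset", using the first read of a FASTQ file.
suggest_m() {
  local fastq=$1 offset=${2:-10} len
  # Line 2 of a FASTQ file is the sequence of the first record.
  len=$(awk 'NR == 2 { print length($0); exit }' "$fastq")
  if [ -z "$len" ]; then
    echo "no reads found in $fastq" >&2
    return 1
  fi
  echo $(( len - offset ))
}
```

Usage, with the same files as above: perl /usr/bio/GapFiller/GapFiller.pl -l libraries.txt -s scaffolds.fa -m "$(suggest_m file1.1.fastq)" -T 100 -b scaffolds_gapfilled.fa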
Boetzer M, Pirovano W. Toward almost closed genomes with GapFiller. Genome Biol. 2012 Jun 25;13(6):R56. DOI:10.1186/gb-2012-13-6-r56