Genome Sequence Assembly: SPAdes

Published at 2020-05-08 14:57

Author: zhixy


Introduction

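SPAdes is a capable genome assembler. Besides Illumina reads it also handles Ion Torrent reads, and PacBio, Sanger, and Nanopore reads can be added for hybrid assembly. Assemblies produced by other tools can also be supplied as auxiliary input.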

Recommended installation:

(base) [user@server ~]# conda install -c bioconda spades
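
After installation it can help to confirm that `spades.py` is actually reachable from the current shell. The following is a minimal sketch; `check_spades` is a hypothetical helper name, not something SPAdes or conda provides.

```shell
#!/usr/bin/env bash
# check_spades is a hypothetical helper (not part of SPAdes or conda).
# It reports whether spades.py can be found on the current PATH.
check_spades() {
    if command -v spades.py >/dev/null 2>&1; then
        echo "found: $(command -v spades.py)"
        return 0
    fi
    echo "spades.py not found on PATH; is the conda environment activated?" >&2
    return 1
}
```

If the check succeeds, `spades.py --test` (listed in the help output below) runs the assembler on its bundled toy dataset as a further sanity check.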

Parameters

(base) [user@server ~]# spades.py -h
SPAdes genome assembler v3.13.1

Usage: /opt/miniconda3/bin/spades.py [options] -o <output_dir>

Basic options:
-o      <output_dir>    directory to store all the resulting files (required)
--sc                    this flag is required for MDA (single-cell) data # assemble single-cell sequencing data
--meta                  this flag is required for metagenomic sample data # assemble metagenomic sequencing data
--rna                   this flag is required for RNA-Seq data # assemble transcriptome sequencing data
--plasmid               runs plasmidSPAdes pipeline for plasmid detection # assemble plasmids
--iontorrent            this flag is required for IonTorrent data
--test                  runs SPAdes on toy dataset
-h/--help               prints this usage message
-v/--version            prints version

Input data:
--12    <filename>      file with interlaced forward and reverse paired-end reads # interleaved paired-end reads (fastq)
-1      <filename>      file with forward paired-end reads # forward paired-end reads (fastq)
-2      <filename>      file with reverse paired-end reads # reverse paired-end reads (fastq)
-s      <filename>      file with unpaired reads # unpaired reads (fastq)
--merged        <filename>      file with merged forward and reverse paired-end reads # merged paired-end reads (fastq)
(19 parameters omitted here)
--sanger        <filename>      file with Sanger reads # hybrid assembly with Sanger reads
--pacbio        <filename>      file with PacBio reads # hybrid assembly with PacBio reads
--nanopore      <filename>      file with Nanopore reads # hybrid assembly with Nanopore reads

Pipeline options:
--only-error-correction runs only read error correction (without assembling) # error correction only
--only-assembler        runs only assembling (without read error correction) # assembly only
--careful               tries to reduce number of mismatches and short indels
# Runs the MismatchCorrector module to fix mismatches and short indels in the assembly. Using this option is recommended.
--continue              continue run from the last available check-point
(3 parameters omitted here)

Advanced options:
--dataset       <filename>      file with dataset description in YAML format
-t/--threads    <int>           number of threads  [default: 16] # number of CPU cores/threads
-m/--memory     <int>           RAM limit for SPAdes in Gb (terminates if exceeded) [default: 250]
                                # SPAdes is memory-hungry! If the hardware allows, set -m 500 or even higher.
--tmp-dir       <dirname>       directory for temporary files [default: <output_dir>/tmp]
-k              <int,int,...>   comma-separated list of k-mer sizes (must be odd and less than 128) [default: 'auto']
                                # k-mer lengths; several may be given: -k 33,43,55,63,73,89
--cov-cutoff    <float>         coverage cutoff value (a positive float number, or 'auto', or 'off') [default: 'off']
--phred-offset  <33 or 64>      PHRED quality offset in the input reads (33 or 64)  [default: auto-detect]
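
The help text for `-k` states that every k-mer size must be odd and less than 128. As a rough illustration, a list can be checked against those two rules before starting a long run. `valid_kmers` is a hypothetical helper written for this note; it checks only the two constraints quoted above and does not accept the default value 'auto'.

```shell
#!/usr/bin/env bash
# valid_kmers is a hypothetical helper, not part of SPAdes.
# It returns 0 if every value in a comma-separated list is a positive
# odd integer below 128 (the rule quoted in the -k help text), else 1.
valid_kmers() {
    local list=$1 k
    [ -n "$list" ] || return 1
    case $list in
        *,) return 1 ;;                    # trailing comma
    esac
    local IFS=,
    for k in $list; do
        case $k in
            ''|0*|*[!0-9]*) return 1 ;;    # empty, leading zero, or non-numeric
        esac
        [ "$k" -lt 128 ] && [ $((k % 2)) -eq 1 ] || return 1
    done
    return 0
}
```

For example, `valid_kmers 33,43,55,63,73,89` succeeds, while `valid_kmers 32` fails (even) and `valid_kmers 127,129` fails (129 is not below 128).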

Running SPAdes

(base) [user@server ~]# spades.py -1 forward.fastq -2 reverse.fastq -s unpaired.fastq --careful -t 100 -o spades -m 500 -k 73
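
The values `-t 100` and `-m 500` above suit a large server; on other machines they may exceed what is available. The sketch below derives both from the local hardware and only prints the resulting command. `build_spades_cmd` is a hypothetical helper, the 10% memory headroom is an arbitrary assumption, and it relies on Linux's `/proc/meminfo` and GNU `nproc`.

```shell
#!/usr/bin/env bash
# build_spades_cmd is a hypothetical helper, not part of SPAdes.
# It prints (does not run) a spades.py command whose -t and -m values
# are taken from the current machine instead of being hard-coded.
build_spades_cmd() {
    local fwd=$1 rev=$2 outdir=$3
    local threads mem_gb
    threads=$(nproc)    # number of available CPU cores (GNU coreutils)
    # MemTotal in /proc/meminfo is in kB; convert to GB and keep ~10% headroom
    # (the headroom figure is an assumption, adjust as needed).
    mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024 * 0.9)}' /proc/meminfo)
    [ "$mem_gb" -ge 1 ] || mem_gb=1
    # File names are printed unquoted, so this is meant for simple paths only.
    printf 'spades.py -1 %s -2 %s --careful -t %s -m %s -o %s\n' \
        "$fwd" "$rev" "$threads" "$mem_gb" "$outdir"
}

build_spades_cmd forward.fastq reverse.fastq spades
```

The printed line can be inspected and then executed by hand; options such as `-s unpaired.fastq` or `-k 73` from the command above can be appended as needed.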

References

Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012 May;19(5):455-77. DOI:10.1089/cmb.2012.0021