Overview of DNA sequencing techniques and corresponding assembly algorithms
Li Yanhui (李艳慧); Zhang Shaoqiang (张少强)
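Abstract:
DNA sequencing technology has gone through four generations of development. This paper reviews these four generations of DNA sequencing technology and the read assembly algorithms associated with them. It introduces the characteristics of each generation, analyzes in detail the main ideas and characteristics of the corresponding read assembly algorithms, and compares the four generations of sequencing technology. Finally, it analyzes the challenges that assembly algorithms currently face and points out new research directions for read assembly algorithms.
Keywords: DNA sequencing technology; reads; assembly algorithm
Foundation: National Natural Science Foundation of China (61572358); Natural Science Foundation of Tianjin (16JCYBJC23600)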
DOI: 10.19638/j.issn1671-1114.20180501
References:
- [1] LI Z,CHEN Y,MU D,et al. Comparison of the two major classes of assembly algorithms:Overlap-layout-consensus and De-Bruijn-graph[J]. Briefings in Functional Genomics,2012,11(1):25-37.
- [2] BATZOGLOU S,JAFFE D B,STANLEY K,et al. ARACHNE:A whole-genome shotgun assembler[J]. Genome Research,2002,12(1):177-189.
- [3] MYERS E W,SUTTON G G,DELCHER A L,et al. A whole-genome assembly of Drosophila[J]. Science,2000,287(5461):2196-2204.
- [4] HUANG X,MADAN A. CAP3:A DNA sequence assembly program[J]. Genome Research,1999,9(9):868-877.
- [5] HUANG X,YANG S P. Generating a genome assembly with PCAP[J]. Current Protocols in Bioinformatics,2005,11(3):1-23.
- [6] DE LA BASTIDE M,MCCOMBIE W R. Assembling genomic DNA sequences with PHRAP[J]. Current Protocols in Bioinformatics,2007,11(4):1-15.
- [7] MULLIKIN J C,NING Z. The phusion assembler[J]. Genome Research,2003,13(1):81-90.
- [8] KIM D,LANGMEAD B,SALZBERG S L. HISAT:A fast spliced aligner with low memory requirements[J]. Nature Methods,2015,12(4):357-360.
- [9] TRAPNELL C,PACHTER L,SALZBERG S L. TopHat:Discovering splice junctions with RNA-seq[J]. Bioinformatics,2009,25(9):1105-1111.
- [10] FENG J,LI W,JIANG T. Inference of isoforms from short sequence reads[J]. Journal of Computational Biology,2011,18(3):305-321.
- [11] AU K F,JIANG H,LIN L,et al. Detection of splice junctions from paired-end RNA-seq data by SpliceMap[J]. Nucleic Acids Research,2010,38(14):4570-4578.
- [12] WANG K,SINGH D,ZENG Z,et al. MapSplice:Accurate mapping of RNA-seq reads for splice junction discovery[J]. Nucleic Acids Research,2010,38(18):178-188.
- [13] SAMMETH M,FOISSAC S,GUIGÓ R. A general definition and nomenclature for alternative splicing events[J]. PLoS Computational Biology,2008,4(8):147-159.
- [14] MARETTY L,SIBBESEN J A,KROGH A. Bayesian transcriptome assembly[J]. Genome Biology,2014,15(10):1-11.
- [15] TRAPNELL C,WILLIAMS B A,PERTEA G,et al. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation[J]. Nature Biotechnology,2010,28(5):511-515.
- [16] GUTTMAN M,GARBER M,LEVIN J Z,et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs[J]. Nature Biotechnology,2010,28(5):503-512.
- [17] LI W,FENG J,JIANG T. IsoLasso:a LASSO regression approach to RNA-seq based transcriptome assembly[J]. Journal of Computational Biology,2011,18(11):1693-1707.
- [18] MEZLINI A M,SMITH E J M,FIUME M,et al. iReckon:Simultaneous isoform discovery and abundance estimation from RNA-seq data[J]. Genome Research,2013,23(3):519-529.
- [19] LI W,JIANG T. Transcriptome assembly and isoform expression level estimation from biased RNA-seq reads[J]. Bioinformatics,2012,28(22):2914-2921.
- [20] TOMESCU A I,KUOSMANEN A,RIZZI R,et al. A novel min-cost flow method for estimating transcript expression with RNA-seq[C]//BMC bioinformatics. BioMed Central,2013,14(5):15.
- [21] CANZAR S,ANDREOTTI S,WEESE D,et al. CIDANE:Comprehensive isoform discovery and abundance estimation[J]. Genome Biology,2016,17(1):16-34.
- [22] SHAO M,KINGSFORD C. Accurate assembly of transcripts through phase-preserving graph decomposition[J]. Nature Biotechnology,2017,35(12):1167-1169.
- [23] HAAS B J,DELCHER A L,MOUNT S M,et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies[J]. Nucleic Acids Research,2003,31(19):5654-5666.
- [24] BIROL I,JACKMAN S D,NIELSEN C B,et al. De novo transcriptome assembly with ABySS[J]. Bioinformatics,2009,25(21):2872-2877.
- [25] LI R,YU C,LI Y,et al. SOAP2:An improved ultrafast tool for short read alignment[J]. Bioinformatics,2009,25(15):1966-1967.
- [26] SCHULZ M H,ZERBINO D R,VINGRON M,et al. Oases:Robust de novo RNA-seq assembly across the dynamic range of expression levels[J]. Bioinformatics,2012,28(8):1086-1092.
- [27] PENG Y,LEUNG H C M,YIU S M,et al. IDBA-tran:A more robust de novo de Bruijn graph assembler for transcriptomes with uneven expression levels[J]. Bioinformatics,2013,29(13):326-334.
- [28] LIU J,LI G,CHANG Z,et al. BinPacker:Packing-based de novo transcriptome assembly from RNA-seq data[J]. PLoS Computational Biology,2016,12(2):1-15.
- [29] CHANG Z,LI G,LIU J,et al. Bridger:a new framework for de novo transcriptome assembly using RNA-seq data[J]. Genome Biology,2015,16(1):30-39.
- [30] GRABHERR M G,HAAS B J,YASSOUR M,et al. Full-length transcriptome assembly from RNA-seq data without a reference genome[J]. Nature Biotechnology,2011,29(7):644-658.
- [31] LI H,HOMER N. A survey of sequence alignment algorithms for next-generation sequencing[J]. Briefings in Bioinformatics,2010,11(5):473-483.
- [32] KIM H,KIM J,SELBY H,et al. A short survey of computational analysis methods in analysing ChIP-seq data[J]. Human Genomics,2011,5(2):117-123.
- [33] WILBANKS E G,FACCIOTTI M T. Evaluation of algorithm performance in ChIP-seq peak detection[J]. PLoS ONE,2010,5(7):1-12.
- [34] SZALKOWSKI A M,SCHMID C D. Rapid innovation in ChIP-seq peak-calling algorithms is outdistancing benchmarking efforts[J]. Briefings in Bioinformatics,2010,12(6):626-633.
- [35] LAAJALA T D,RAGHAV S,TUOMELA S,et al. A practical comparison of methods for detecting transcription factor binding sites in ChIP-seq experiments[J]. BMC Genomics,2009,10(1):618-632.
- [36] ZAMBELLI F,PESOLE G,PAVESI G. Motif discovery and transcription factor binding sites before and after the next-generation sequencing era[J]. Briefings in Bioinformatics,2012,14(2):225-237.
- [37] LANGMEAD B,TRAPNELL C,POP M,et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome[J]. Genome Biology,2009,10(3):25-34.
- [38] LI H,DURBIN R. Fast and accurate short read alignment with Burrows-Wheeler transform[J]. Bioinformatics,2009,25(14):1754-1760.
- [39] WU T D,NACU S. Fast and SNP-tolerant detection of complex variants and splicing in short reads[J]. Bioinformatics,2010,26(7):873-881.
- [40] BAILEY T,KRAJEWSKI P,LADUNGA I,et al. Practical guidelines for the comprehensive analysis of ChIP-seq data[J]. PLoS Computational Biology,2013,9(11):326-333.
- [41] ZHANG Y,LIU T,MEYER C A,et al. Model-based analysis of ChIP-Seq(MACS)[J]. Genome Biology,2008,9(9):1-9.
- [42] JI H,JIANG H,MA W,et al. An integrated software system for analyzing ChIP-chip and ChIP-seq data[J]. Nature Biotechnology,2008,26(11):1293-1300.
- [43] ZANG C,SCHONES D E,ZENG C,et al. A clustering approach for identification of enriched domains from histone modification ChIP-seq data[J]. Bioinformatics,2009,25(15):1952-1958.
- [44] XU H,HANDOKO L,WEI X,et al. A signal-noise model for significance analysis of ChIP-seq with negative control[J]. Bioinformatics,2010,26(9):1199-1204.
- [45] RASHID N U,GIRESI P G,IBRAHIM J G,et al. ZINBA integrates local covariates with DNA-seq data to identify broad and narrow regions of enrichment,even within amplified genomic regions[J]. Genome Biology,2011,12(7):67-86.
- [46] BAILEY T L. DREME:Motif discovery in transcription factor ChIP-seq data[J]. Bioinformatics,2011,27(12):1653-1659.
- [47] MACHANICK P,BAILEY T L. MEME-ChIP:Motif analysis of large DNA datasets[J]. Bioinformatics,2011,27(12):1696-1697.
- [48] PAVESI G,MEREGHETTI P,MAURI G,et al. Weeder Web:Discovery of transcription factor binding sites in a set of sequences from coregulated genes[J]. Nucleic Acids Research,2004,32(suppl2):199-203.
- [49] BAILEY T L,ELKAN C. The value of prior knowledge in discovering motifs with MEME[C]//ISMB. 1995,3:21-29.
- [50] HERNANDEZ D,FRANÇOIS P,FARINELLI L,et al. De novo bacterial genome sequencing:Millions of very short reads assembled on a desktop computer[J]. Genome Research,2008,18(5):802-809.
- [51] FLICEK P,BIRNEY E. Sense from sequence reads:Methods for alignment and assembly[J]. Nature Methods,2009,6(11):6-13.
- [52] WARREN R L,SUTTON G G,JONES S J M,et al. Assembling millions of short DNA sequences using SSAKE[J]. Bioinformatics,2006,23(4):500-501.
- [53] DOHM J C,LOTTAZ C,BORODINA T,et al. SHARCGS,a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing[J]. Genome Research,2007,17(11):1697-1706.
- [54] JECK W R,REINHARDT J A,BALTRUS D A,et al. Extending assembly of short DNA sequences to handle error[J]. Bioinformatics,2007,23(21):2942-2944.
- [55] BRYANT D W,WONG W K,MOCKLER T C. QSRA-a quality-value guided de novo short read assembler[J]. BMC Bioinformatics,2009,10(1):1-6.
- [56] CHAISSON M J,PEVZNER P A. Short read fragment assembly of bacterial genomes[J]. Genome Research,2008,18(2):324-330.
- [57] BUTLER J,MACCALLUM I,KLEBER M,et al. ALLPATHS:De novo assembly of whole-genome shotgun microreads[J]. Genome Research,2008,18(5):810-820.
- [58] ZERBINO D R,BIRNEY E. Velvet:Algorithms for de novo short read assembly using de Bruijn graphs[J]. Genome Research,2008,18(5):821-829.
- [59] CHIN C S,ALEXANDER D H,MARKS P,et al. Nonhybrid,finished microbial genome assemblies from long-read SMRT sequencing data[J]. Nature Methods,2013,10(6):563-571.
- [60] MYERS G. Efficient local alignment discovery amongst noisy long reads[C]//International Workshop on Algorithms in Bioinformatics. Springer,Berlin,Heidelberg,2014:52-67.
- [61] MIYAMOTO M,MOTOOKA D,GOTOH K,et al. Performance comparison of second-and third-generation sequencers using a bacterial genome with two chromosomes[J]. BMC Genomics,2014,15(1):699-707.
- [62] BERLIN K,KOREN S,CHIN C S,et al. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing[J]. Nature Biotechnology,2015,33(6):623-630.
- [63] YE C,MA Z S. Sparc:A sparsity-based consensus algorithm for long erroneous sequencing reads[J]. PeerJ,2016,4:16-27.
- [64] KOREN S,et al. Reducing assembly complexity of microbial genomes with single-molecule sequencing[J]. Genome Biology,2013,14(9):101-116.
- [65] MYERS E W,SUTTON G G,DELCHER A L,et al. A whole-genome assembly of Drosophila[J]. Science,2000,287(5461):2196-2204.
- [66] CHAISSON M J,TESLER G. Mapping single molecule sequencing reads using basic local alignment with successive refinement(BLASR):Application and theory[J]. BMC Bioinformatics,2012,13(1):238-255.
- [67] GORDON S,TSENG S,SALAMOV A,et al. Widespread polycistronic transcripts in mushroom-forming fungi revealed by single-molecule long-read mRNA sequencing[J]. PLOS ONE,2014,10(7):1-37.
- [68] SALMELA L,WALVE R,RIVALS E,et al. Accurate self-correction of errors in long reads using de Bruijn graphs[J]. Bioinformatics,2016,33(6):799-806.
- [69] AU K F,UNDERWOOD J G,LEE L,et al. Improving PacBio long read accuracy by short read alignment[J]. PLOS ONE,2012,7(10):679-686.
- [70] KOREN S,SCHATZ M C,WALENZ B P,et al. Hybrid error correction and de novo assembly of single-molecule sequencing reads[J]. Nature Biotechnology,2012,30(7):693-700.
- [71] HACKL T,HEDRICH R,SCHULTZ J,et al. Proovread:Large-scale high-accuracy PacBio correction through iterative short read consensus[J]. Bioinformatics,2014,30(21):3004-3020.
- [72] LEE H,GURTOWSKI J,YOO S,et al. Error correction and assembly complexity of single molecule sequencing reads[J]. BioRxiv,2014:006395.
- [73] DESHPANDE V,FUNG E D K,PHAM S,et al. Cerulean:A hybrid assembly using high throughput short and long reads[C]//International Workshop on Algorithms in Bioinformatics. Springer,Berlin,Heidelberg,2013:349-363.
- [74] SALMELA L,RIVALS E. LoRDEC:Accurate and efficient long read error correction[J]. Bioinformatics,2014,30(24):3506-3514.
- [75] MICLOTTE G,HEYDARI M,DEMEESTER P,et al. Jabba:Hybrid error correction for long sequencing reads[J]. Algorithms for Molecular Biology,2016,11(1):10-21.
- [76] GOODWIN S,GURTOWSKI J,ETHE-SAYERS S,et al. Oxford nanopore sequencing and de novo assembly of a eukaryotic genome[J]. BioRxiv,2015(1):1-15.
- [77] MADOUI M A,ENGELEN S,CRUAUD C,et al. Genome assembly using Nanopore-guided long and error-free DNA reads[J]. BMC Genomics,2015,16(1):327-336.
- [78] JAIN M,OLSEN H E,PATEN B,et al. The Oxford Nanopore MinION:Delivery of nanopore sequencing to the genomics community[J]. Genome Biology,2016,17(1):239-250.
- [79] WARREN R L,YANG C,VANDERVALK B P,et al. LINKS:Scalable,alignment-free scaffolding of draft genomes with long reads[J]. GigaScience,2015,4(1):1-11.
- [80] BOETZER M,PIROVANO W. SSPACE-LongRead:Scaffolding bacterial draft genomes using long read sequence information[J]. BMC Bioinformatics,2014,15(1):211-219.