Plant genome assembly

Revision as of 08:57, 13 September 2020

A plant genome assembly represents the complete genomic sequence of a plant species, assembled into chromosomes and organellar genomes from DNA (deoxyribonucleic acid) fragments obtained with different sequencing technologies.

Structure

Plant genomes vary in structure and complexity, from small genomes such as those of green algae (15 Mbp)^[1] to very large and complex genomes that typically have higher ploidy, higher rates of heterozygosity and more repetitive elements than species from other kingdoms.^[2] One of the most complex plant genome assemblies available is that of loblolly pine (22 Gbp).^[3] Because of this complexity, plant genome sequences cannot be assembled back into chromosomes using only the short reads produced by next-generation sequencing (NGS) technologies,^[4]^[5] and most plant genome assemblies that relied on NGS alone are therefore highly fragmented, contain large numbers of contigs, and leave genome regions unfinished. Highly repetitive sequences, often longer than 10 kbp, are the main challenge in plants.^[6]^[7] Most of the chromosomal sequence is produced by the activity of mobile genetic elements (MGEs) in plant genomes.^[8] MGEs are divided into two classes: class I, or retrotransposons, and class II, or DNA transposons. In plants, long terminal repeat (LTR) retrotransposons are predominant and constitute from 15%^[9] to 90% of the genome.^[10] Polyploidy is another challenge in assembling a plant genome, and it is estimated that ~80% of plants are polyploids.^[11]

Assemblies

The first complete plant genome assembly, that of Arabidopsis thaliana, was finished in 2000,^[12] making it the third multicellular eukaryotic genome published, after C. elegans^[13] and D. melanogaster.^[14] Unlike many other plants (e.g. Malus), Arabidopsis has convenient traits such as a small nuclear genome (135 Mbp) and a short generation time (8 weeks from seed to seed). The genome has five chromosomes and is approximately 4% of the size of the human genome. It was sequenced and annotated by the Arabidopsis Genome Initiative (AGI).

The initiative to sequence the genome of rice (Oryza sativa)^[15] began in September 1997, when scientists from many nations agreed to an international collaboration, forming the International Rice Genome Sequencing Project (IRGSP). With an estimated size of 400–430 Mb, approximately four times that of A. thaliana, rice has the smallest genome of the major cereal crops.^[15]

Between 2000 and 2008 a total of 10 plant genomes were published, while in 2012 alone 13 plant genomes were published. Since then the number has increased steadily, and more than 400 plant genomes are now available in the NCBI genome database, of which 72 have been re-annotated [NCBI].

Databases

EnsemblPlants^[16] is part of the Ensembl Genomes database and contains resources for a limited number of sequenced plant species (45 as of October 2017). It mainly provides genome sequences, gene models, functional annotations and polymorphic loci. For some plant species, additional information is provided, including population structure, individual genotypes, linkage, and phenotype data.

Gramene^[17] is an online web database resource for plant comparative genomics and pathway analysis based on Ensembl technology.

Plant Genome DataBase Japan^[18] (PGDBj) is a website that gathers genome-related information on model and crop plants from multiple databases. It has three main components: an ortholog database, a DNA marker and linkage map database, and a plant resource database, in which plant resources accumulated by different institutes are integrated. The aim is “to provide a platform, enabling comparative searches of different resources” (pgdbj.jp).

PlantsDB^[19] is a resource for analysing and storing genetic and genomic information from various plants, and offers tools to query these data and to perform comparative analysis with the help of in-house tools.

PLAZA^[20] is another online resource for comparative genomics that integrates plant sequence data and comparative genomic methods, and performs evolutionary analysis within the green plant lineage (Viridiplantae).

The Arabidopsis Information Resource (TAIR)^[21] maintains a web database of the “model higher plant Arabidopsis thaliana”.

Assembly strategies

In general, the strategies used to sequence and assemble large and complex genomes such as those of plants depend on the technologies that were available when a project started.

Sanger clone-by-clone

Clone-by-clone sequencing strategies are based on the construction of a map for each chromosome before the sequencing, and rely on libraries made from large-insert clones. The most common type of large-insert clone is the bacterial artificial chromosome (BAC).

With BACs, the genome is first split into smaller pieces whose locations are recorded. The pieces of DNA are then inserted into BAC clones, which are multiplied by introducing them into fast-growing bacterial cells. These pieces are further fragmented into overlapping smaller pieces that are placed into a vector and then sequenced. The small pieces are assembled into contigs by overlapping them. Finally, using the map from the first step, the contigs are assembled back into the chromosomes.
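The step in which small sequenced pieces are joined into contigs by their overlaps can be illustrated with a toy example. The sketch below is a minimal greedy overlap merge written for illustration only; the function names and the `min_overlap` parameter are assumptions, and it presumes error-free fragments with no repeats, which real assemblers cannot assume.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences that shares the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap_length(a, b, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            # No remaining pair overlaps by at least `min_overlap` bases.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical fragments, invented for this example.
fragments = ["ATGGCGT", "GCGTACC", "TACCTTA"]
print(greedy_assemble(fragments))
```

Fragments that share no sufficiently long overlap stay as separate contigs, which is the situation the chromosome map from the first step is then used to resolve.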

The first complete plant genome assembly (also the first plant genome published) to use this technique was that of Arabidopsis thaliana, in 2000.^[12] Different large-insert libraries, including BACs, P1 artificial chromosomes (PACs), yeast artificial chromosomes (YACs) and transformation-competent artificial chromosomes (TACs), were combined to assemble the genome. Physical maps were constructed from restriction fragment fingerprints of the clones, by comparing the fingerprint patterns and by hybridization or polymerase chain reaction (PCR). The physical maps were integrated with genetic maps to identify contig positions and orientations. End sequences from 47,788 BAC clones were used to extend contigs from anchored BACs and to select a minimum tiling path. A total of 1,569 clones in the minimum tiling path were selected and sequenced. Direct PCR products were used to clone the remaining gaps, and YACs allowed the characterization of telomere sequences. The sequenced regions covered 115.4 Mb of the predicted 125 Mb genome and contained a total of 25,498 protein-coding genes.
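The idea of selecting a minimum tiling path can be sketched as an interval-cover problem: given clones placed on a physical map, pick as few clones as possible that together span the region. The code below is an illustrative greedy sketch, not the procedure used by the Arabidopsis Genome Initiative; the `Clone` type, the coordinates and the clone names are hypothetical, and it lets adjacent clones merely abut, whereas real tiling paths require some overlap between neighbours.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # map position where the clone begins
    end: int    # map position where the clone ends


def minimum_tiling_path(clones: list[Clone], region_start: int, region_end: int) -> list[Clone]:
    """Greedy interval cover: at each step take the clone reaching furthest right."""
    ordered = sorted(clones, key=lambda c: c.start)
    path: list[Clone] = []
    covered_to = region_start
    idx = 0
    while covered_to < region_end:
        best = None
        # Consider every clone that starts inside the part already covered.
        while idx < len(ordered) and ordered[idx].start <= covered_to:
            if best is None or ordered[idx].end > best.end:
                best = ordered[idx]
            idx += 1
        if best is None or best.end <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best)
        covered_to = best.end
    return path


# Hypothetical clone placements on a 400-unit region.
clones = [
    Clone("A", 0, 100), Clone("B", 50, 180), Clone("C", 90, 200),
    Clone("D", 150, 300), Clone("E", 190, 320), Clone("F", 310, 400),
]
print([c.name for c in minimum_tiling_path(clones, 0, 400)])
```

When no clone bridges a position, the function raises an error, which corresponds to the gaps that in the Arabidopsis project were closed with direct PCR products.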

The same strategy was used to sequence and assemble the genome of Oryza sativa (japonica).^[15] For Oryza sativa, a total of 3,401 mapped clones in a minimum tiling path were selected from the physical map and assembled.

Maize (Zea mays), one of the most important crops in the world, is the last plant genome project based primarily on the Sanger BAC-by-BAC strategy.^[22] The maize genome, at 2.3 Gb across 10 chromosomes,^[22] is significantly larger than those of rice and Arabidopsis.^[22] To assemble it, a set of 16,848 minimally overlapping BAC clones derived from combined physical and genetic maps was selected and sequenced. The assembly also drew on external data: cDNA sequences, sequences from methyl-filtered DNA libraries (libraries that exploit the fact that bases in genic sequences tend to be less heavily methylated than those in non-genic regions) and sequences obtained with high-C0t techniques.

The Sanger clone-by-clone strategy has the advantage of working in small units, which reduces complexity and computational requirements and minimizes problems associated with the misassembly of highly repetitive DNA. It is therefore an attractive solution for assembling plant genomes and other complex eukaryotic genomes. The main disadvantages of this method are the cost and the resources required. The cost of the first plant genome assemblies was estimated at between 70 million dollars^[23] and 200 million dollars per assembly.^[24]

Sanger whole-genome shotgun (WGS)

In WGS sequencing, the fragments are sequenced in no particular order. The DNA is randomly sheared, and the cloned fragments are sequenced and assembled using computational methods. This approach reduces the cost and time associated with constructing the maps and relies instead on computational resources.

A considerable number of important plant genomes, such as grapevine (Vitis vinifera),^[25] papaya (Carica papaya)^[26] and cottonwood (Populus trichocarpa),^[27] were sequenced and assembled with the Sanger WGS strategy.

The draft genome of grapevine^[25] was the fourth genome published for a flowering plant and the first for a fruit crop. The sequences were obtained from different types of libraries, such as plasmids, fosmids and BACs. All the data were generated by paired-end sequencing of cloned inserts using Sanger technology on ABI 3730xl sequencers. To assemble the reads, Arachne (2002),^[28] software designed to analyze reads obtained from both ends of plasmid clones, was used. In total, 6.2 million paired-end tag reads were produced. The software produced 20,784 contigs that were combined into 3,830 supercontigs with an N50 value of 64 kb. The supercontigs had a total size of 498 Mb.
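The N50 value quoted above is a standard summary of assembly contiguity: it is the length of the sequence at which, going from the longest sequence downwards, the running total first reaches half of the total assembly size. A minimal sketch follows; the function name and the example lengths are invented for illustration and are not grapevine data.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Hypothetical contig lengths (in kb): total 300, so half is 150.
print(n50([80, 70, 50, 40, 30, 20, 10]))
```

A higher N50 for the same total size means that more of the assembly sits in long sequences, which is why the statistic is often reported alongside contig and supercontig counts.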

The supercontigs were anchored along the genome by first joining them together using paired BAC end sequences. The resulting ultracontigs and the remaining supercontigs were then aligned along the genetic map of the genome. Later improvements of this strategy enabled the sequencing of Brachypodium distachyon,^[29] Sorghum bicolor^[30] and soybean.^[31]

Next-generation sequencing

Because it is relatively cheap compared with previous methods, most recent plant genomes were sequenced and assembled using data from next-generation sequencing (NGS) technology. In general, the NGS data are used in combination with Sanger sequencing or with long reads obtained from third-generation sequencing.

The genome of the cucumber (Cucumis sativus)^[32] was one of the plant genomes that combined Illumina NGS reads with Sanger sequences. High-quality base pairs amounting to 72.2-fold genome coverage were generated, of which 3.9-fold coverage came from Sanger reads and 68.3-fold from Illumina GA reads. Two assemblies were produced, one per sequencing technology, and the resulting contigs were compared with each other, giving a total assembled genome length of 243.5 Mb. This is about 30% smaller than the genome size estimated by flow cytometry of isolated nuclei stained with propidium iodide (367 Mb). A genetic map was constructed to anchor the assembled genome, and 72.8% of the assembled sequences were successfully anchored onto the seven chromosomes.

Another plant genome that combined NGS with Sanger sequencing was that of Theobroma cacao (2010),^[33] an economically important tropical fruit tree crop. The genome was sequenced by the International Cocoa Genome Sequencing consortium (ICGS), which produced a total of 17.6 million 454 single-end reads, 8.8 million 454 paired-end reads, 398.0 million Illumina paired-end reads and about 88,000 Sanger BAC reads. First, the genome assembly software Newbler produced an assembly of 25,912 contigs and 4,792 scaffolds from the Roche/454 and Sanger raw data. This assembly had a total length of 326.9 Mb, which represents 76% of the estimated genome size. The Illumina reads were then used to complement the 454 assembly by aligning the short reads to the cocoa genome assembly with the SOAP software.

A similar strategy combining NGS reads and Sanger sequencing was used for other important plant species, including the first published apple genome (Malus domestica),^[34] cotton (Gossypium raimondii),^[35] the draft genome of sweet orange (Citrus sinensis)^[36] and the domesticated tomato (Solanum lycopersicum) genome.^[37]
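The coverage and size figures quoted for cucumber and cacao follow from simple ratios, which the short sketch below reproduces. The helper names are assumptions made for this example, and the read count and read length passed to `fold_coverage` are hypothetical; only the cucumber and cacao numbers come from the text.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Fold coverage = sequenced bases divided by genome size (same units)."""
    return total_bases / genome_size


def assembled_fraction(assembled_size: float, estimated_size: float) -> float:
    """Share of the estimated genome size that the assembly represents."""
    return assembled_size / estimated_size


# Cucumber: Sanger and Illumina contributions add up to the total coverage.
total_fold = 3.9 + 68.3
print(round(total_fold, 1))

# Cucumber: 243.5 Mb assembled against 367 Mb estimated by flow cytometry.
print(round(100 * assembled_fraction(243.5, 367.0), 1))

# Cacao: 326.9 Mb is stated to be 76% of the estimate, which implies this size in Mb.
print(round(326.9 / 0.76))

# Hypothetical run: one million 100-base reads over a 10 Mb genome.
print(fold_coverage(1_000_000 * 100, 10_000_000))
```

The cucumber assembly therefore represents roughly two thirds of the flow-cytometry estimate, consistent with the shortfall of about 30% stated in the text.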

Third-generation

With the emergence of third-generation sequencing (TGS), some of the limitations of previous methods of sequencing and assembling plant genomes have started to be addressed. This technology is characterized by the parallel sequencing of single DNA molecules, which yields reads up to 54 kbp long (PacBio RS II).^[38] In general, long reads from TGS have relatively high error rates (~10% on average),^[39] and repeated sequencing of the same DNA fragments is therefore required. The price of the technology is still quite high, so it is generally used in combination with short reads from NGS.

One of the first plant genomes to use long reads from TGS (Pacific Biosciences) in combination with short reads from NGS was that of spinach,^[40] with an estimated genome size of 989 Mb. For this, 60× coverage of the genome was generated, with 20% of the reads longer than 20 kb. The data were assembled using PacBio’s hierarchical genome assembly process (HGAP),^[41] and the long-read assembly showed a 63-fold improvement in contig size over an Illumina-only assembly.

Another recently published plant genome that combined long and short reads is the improved assembly of the apple genome.^[42] This project used a hybrid approach combining data from several sequencing technologies: PacBio RS II, Illumina paired-end (PE) reads and Illumina mate-pair (MP) reads. As a first step, the Illumina paired-end reads were assembled with the well-known de novo assembly software SOAPdenovo.^[43] The resulting contigs were then combined with the PacBio long reads using the hybrid assembly pipeline DBG2OLC.^[44] The assembly was polished by mapping the Illumina paired-end reads to the contigs with BWA-MEM,^[45] and the corrected contigs were scaffolded by mapping the mate-pair reads onto them. BioNano (https://bionanogenomics.com/) optical maps with a total length of 649.7 Mb were then used in the hybrid assembly pipeline together with the scaffolds from the previous step. The resulting scaffolds were anchored to a genetic map constructed from 15,417 single-nucleotide polymorphism (SNP) markers. RNA sequencing (RNA-seq) data were used to better characterize the number and diversity of the genes identified. The resulting genome is 643.2 Mb in size, closer to the estimated genome size than the previously published assembly,^[34] and has a smaller number of protein-coding genes.
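The point that high per-read error rates are offset by sequencing the same DNA repeatedly can be illustrated with a per-position majority vote. This is a deliberately simplified sketch and not the algorithm used by HGAP or any other named tool: it assumes the reads are already aligned, have equal length and contain only substitution errors, whereas real long-read errors are largely insertions and deletions. The reads are invented for the example.

```python
from collections import Counter


def consensus(aligned_reads: list[str]) -> str:
    """Call the most frequent base at each position of pre-aligned reads."""
    if not aligned_reads:
        raise ValueError("at least one read is required")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    # zip(*reads) yields one column of bases per position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


# Five hypothetical reads of the same fragment, three of them with one error each.
reads = ["ACGTTGCA", "ACGATGCA", "ACGTTGGA", "TCGTTGCA", "ACGTTGCA"]
print(consensus(reads))
```

As long as errors fall at different positions in different reads, the correct base wins the vote at each position, which is why deeper coverage compensates for a higher raw error rate.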

The use of long reads in plant genome assemblies has become more popular because it reduces the number of scaffolds and increases the quality of the genome by improving assembly and coverage in regions that are not clearly resolved by NGS assemblies.

References

^ Moreau H, Verhelst B, Couloux A, Derelle E, Rombauts S, Grimsley N, Van Bel M, Poulain J, Katinka M, Hohmann-Marriott MF, Piganeau G, Rouzé P, Da Silva C, Wincker P, Van de Peer Y, Vandepoele K (August 2012). "Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage". Genome Biology. 13 (8): R74. doi:10.1186/gb-2012-13-8-r74. PMC 3491373. PMID 22925495.
^ Gregory TR (January 2005). "The C-value enigma in plants and animals: a review of parallels and an appeal for partnership". Annals of Botany. 95 (1): 133–46. doi:10.1093/aob/mci009. PMC 4246714. PMID 15596463.
^ Zimin A, Stevens KA, Crepeau MW, Holtz-Morris A, Koriabine M, Marçais G, Puiu D, Roberts M, Wegrzyn JL, de Jong PJ, Neale DB, Salzberg SL, Yorke JA, Langley CH (March 2014). "Sequencing and assembly of the 22-gb loblolly pine genome". Genetics. 196 (3): 875–90. doi:10.1534/genetics.113.159715. PMC 3948813. PMID 24653210.
^ Deschamps, Stéphane; Campbell, Matthew A. (2010-04-01). "Utilization of next-generation sequencing platforms in plant genomics and genetic variant discovery". Molecular Breeding. 25 (4): 553–570. doi:10.1007/s11032-009-9357-9. S2CID 29239452.
^ Shendure J, Ji H (October 2008). "Next-generation DNA sequencing". Nature Biotechnology. 26 (10): 1135–45. doi:10.1038/nbt1486. PMID 18846087. S2CID 6384349.
^ Treangen TJ, Salzberg SL (November 2011). "Repetitive DNA and next-generation sequencing: computational challenges and solutions". Nature Reviews. Genetics. 13 (1): 36–46. doi:10.1038/nrg3117. PMC 3324860. PMID 22124482.
^ Harrison GE, Heslop-Harrison JS (February 1995). "Centromeric repetitive DNA sequences in the genus Brassica". TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik. 90 (2): 157–65. doi:10.1007/BF00222197. PMID 24173886. S2CID 20591213.
^ Lanciano S, Carpentier MC, Llauro C, Jobet E, Robakowska-Hyzorek D, Lasserre E, Ghesquière A, Panaud O, Mirouze M (February 2017). "Sequencing the extrachromosomal circular mobilome reveals retrotransposon activity in plants". PLOS Genetics. 13 (2): e1006630. doi:10.1371/journal.pgen.1006630. PMC 5338827. PMID 28212378.
^ Michael TP, VanBuren R (April 2015). "Progress, challenges and the future of crop genomes". Current Opinion in Plant Biology. 24: 71–81. doi:10.1016/j.pbi.2015.02.002. PMID 25703261.
^ Flavell RB, Gale MD, O'dell M, Murphy G, Moore G, Lucas H (1993). "Molecular organization of genes and repeats in the large cereal genomes and implications for the isolation of genes by chromosome walking". Chromosomes Today. Dordrecht: Springer. pp. 199–213. doi:10.1007/978-94-011-1510-0_16. ISBN 9789401046602.
^ Meyers LA, Levin DA, Geber M (2006-06-01). "On the abundance of polyploids in flowering plants". Evolution. 60 (6): 1198–1206. doi:10.1554/05-629.1. PMID 16892970. S2CID 198156503.
^ ^a ^b "Analysis of the genome sequence of the flowering plant Arabidopsis thaliana". Nature. 408 (6814): 796–815. December 2000. doi:10.1038/35048692. PMID 11130711.
^ The C. elegans Sequencing Consortium (1998). "Genome Sequence of the Nematode C. elegans: A Platform for Investigating Biology". Science. 282 (5396): 2012–2018. doi:10.1126/science.282.5396.2012. JSTOR 2897605. PMID 9851916.
^ Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, et al. (March 2000). "The genome sequence of Drosophila melanogaster". Science. 287 (5461): 2185–95. CiteSeerX 10.1.1.549.8639. doi:10.1126/science.287.5461.2185. PMID 10731132.
^ ^a ^b ^c Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, et al. (April 2002). "A draft sequence of the rice genome (Oryza sativa L. ssp. japonica)". Science. 296 (5565): 92–100. doi:10.1126/science.1068275. PMID 11935018. S2CID 2960202.
^ Bolser, Dan; Staines, Daniel M.; Pritchard, Emily; Kersey, Paul (2016). Plant Bioinformatics. Methods in Molecular Biology. Vol. 1374. Humana Press, New York, NY. pp. 115–140. doi:10.1007/978-1-4939-3167-5_6. ISBN 9781493931668. PMID 26519403.
^ Gupta P, Naithani S, Tello-Ruiz MK, Chougule K, D'Eustachio P, Fabregat A, et al. (November 2016). "Gramene Database: Navigating Plant Comparative Genomics Resources". Current Plant Biology. 7–8: 10–15. doi:10.1016/j.cpb.2016.12.005. PMC 5509230. PMID 28713666.
^ Nakaya, Akihiro; Ichihara, Hisako; Asamizu, Erika; Shirasawa, Sachiko; Nakamura, Yasukazu; Tabata, Satoshi; Hirakawa, Hideki (2017). Plant Genomics Databases. Methods in Molecular Biology. Vol. 1533. Humana Press, New York, NY. pp. 45–77. doi:10.1007/978-1-4939-6658-5_3. ISBN 9781493966561. PMID 27987164.
^ Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai; Gundlach, Heidrun; Mayer, Klaus F. X. (2017). Plant Genomics Databases. Methods in Molecular Biology. Vol. 1533. Humana Press, New York, NY. pp. 33–44. doi:10.1007/978-1-4939-6658-5_2. ISBN 9781493966561. PMID 27987163.
^ Vandepoele, Klaas (2017). "A Guide to the PLAZA 3.0 Plant Comparative Genomic Database". Plant Genomics Databases. Methods in Molecular Biology. Vol. 1533. Humana Press, New York, NY. pp. 183–200. doi:10.1007/978-1-4939-6658-5_10. ISBN 9781493966561. PMID 27987171.
^ Reiser L, Berardini TZ, Li D, Muller R, Strait EM, Li Q, Mezheritsky Y, Vetushko A, Huala E (2016-01-01). "Sustainable funding for biocuration: The Arabidopsis Information Resource (TAIR) as a case study of a subscription-based funding model". Database. 2016: baw018. doi:10.1093/database/baw018. PMC 4795935. PMID 26989150.
^ ^a ^b ^c Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, et al. (November 2009). "The B73 maize genome: complexity, diversity, and dynamics". Science. 326 (5956): 1112–5. doi:10.1126/science.1178534. PMID 19965430. S2CID 21433160.
^ Feuillet C, Leach JE, Rogers J, Schnable PS, Eversole K (February 2011). "Crop genome sequencing: lessons and rationales". Trends in Plant Science. 16 (2): 77–88. doi:10.1016/j.tplants.2010.10.005. PMID 21081278.
^ Saegusa A (April 1999). "US firm's bid to sequence rice genome causes stir in Japan". Nature. 398 (6728): 545. doi:10.1038/19123. PMID 10217128.
^ ^a ^b Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, et al. (September 2007). "The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla". Nature. 449 (7161): 463–7. doi:10.1038/nature06148. PMID 17721507.
^ Ming R, Hou S, Feng Y, Yu Q, Dionne-Laporte A, Saw JH, et al. (April 2008). "The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus)". Nature. 452 (7190): 991–6. doi:10.1038/nature06856. PMC 2836516. PMID 18432245.
^ Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, et al. (September 2006). "The genome of black cottonwood, Populus trichocarpa (Torr. & Gray)". Science (Submitted manuscript). 313 (5793): 1596–604. doi:10.1126/science.1128691. PMID 16973872. S2CID 7717980.
^ Swan KA, Curtis DE, McKusick KB, Voinov AV, Mapa FA, Cancilla MR (July 2002). "High-throughput gene mapping in Caenorhabditis elegans". Genome Research. 12 (7): 1100–5. doi:10.1101/gr.208902. PMC 186621. PMID 12097347.
^ International Brachypodium Initiative (February 2010). "Genome sequencing and analysis of the model grass Brachypodium distachyon". Nature. 463 (7282): 763–8. doi:10.1038/nature08747. PMID 20148030.
^ Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, et al. (January 2009). "The Sorghum bicolor genome and the diversification of grasses". Nature. 457 (7229): 551–6. doi:10.1038/nature07723. PMID 19189423.
^ Schmutz, Jeremy (2009). "Genome sequence of the palaeopolyploid soybean". Nature. 463 (7278): 178–83. doi:10.1038/nature08670. PMID 20075913. S2CID 4372224.
^ Huang S, Li R, Zhang Z, Li L, Gu X, Fan W, et al. (December 2009). "The genome of the cucumber, Cucumis sativus L". Nature Genetics. 41 (12): 1275–81. doi:10.1038/ng.475. PMID 19881527.
^ Argout X, Salse J, Aury JM, Guiltinan MJ, Droc G, Gouzy J, et al. (February 2011). "The genome of Theobroma cacao". Nature Genetics. 43 (2): 101–8. doi:10.1038/ng.736. PMID 21186351. S2CID 4685532.
^ ^a ^b Velasco R, Zharkikh A, Affourtit J, Dhingra A, Cestaro A, Kalyanaraman A, et al. (October 2010). "The genome of the domesticated apple (Malus × domestica Borkh.)". Nature Genetics. 42 (10): 833–9. doi:10.1038/ng.654. PMID 20802477.
^ Wang K, Wang Z, Li F, Ye W, Wang J, Song G, et al. (October 2012). "The draft genome of a diploid cotton Gossypium raimondii". Nature Genetics. 44 (10): 1098–103. doi:10.1038/ng.2371. PMID 22922876. S2CID 38495587.
^ Xu Q, Chen LL, Ruan X, Chen D, Zhu A, Chen C, et al. (January 2013). "The draft genome of sweet orange (Citrus sinensis)". Nature Genetics. 45 (1): 59–66. doi:10.1038/ng.2472. PMID 23179022.
^ Tomato Genome Consortium (May 2012). "The tomato genome sequence provides insights into fleshy fruit evolution". Nature. 485 (7400): 635–41. doi:10.1038/nature11119. PMC 3378239. PMID 22660326.
^ Bleidorn, Christoph (2015). "Third generation sequencing: technology and its potential impact on evolutionary biodiversity research". Systematics and Biodiversity.
^ Lee, Hayan; Gurtowski, James; Yoo, Shinjae; Nattestad, Maria; Marcus, Shoshana; Goodwin, Sara; McCombie, W. Richard; Schatz, Michael (2016-04-13). "Third-generation sequencing and the future of genomics". bioRxiv: 048603. doi:10.1101/048603.
^ van Deynze A (2015). "Using spinach to compare technologies for whole genome assemblies". Plant & Animal Genomics XXIII Conference.
^ Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, Turner SW, Korlach J (June 2013). "Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data". Nature Methods. 10 (6): 563–9. doi:10.1038/nmeth.2474. PMID 23644548. S2CID 205421576.
^ Daccord N, Celton JM, Linsmith G, Becker C, Choisne N, Schijlen E, et al. (July 2017). "High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development" (PDF). Nature Genetics. 49 (7): 1099–1106. doi:10.1038/ng.3886. PMID 28581499. S2CID 24690391.
^ Luo, Ruibang (2012). "SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler". Gigascience. 1. doi:10.1186/2047-217X-1-18. PMID 23587118. S2CID 2681931.
^ Ye C, Hill CM, Wu S, Ruan J, Ma ZS (August 2016). "DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies". Scientific Reports. 6 (1): 31900. doi:10.1038/srep31900. PMC 5004134. PMID 27573208.
^ Li, Heng (2013). "Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM". Broad Institute of Harvard and MIT. arXiv:1303.3997.

[1] Moreau H, Verhelst B, Couloux A, Derelle E, Rombauts S, Grimsley N, Van Bel M, Poulain J, Katinka M, Hohmann-Marriott MF, Piganeau G, Rouzé P, Da Silva C, Wincker P, Van de Peer Y, Vandepoele K (August 2012). "Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage". Genome Biology. 13 (8): R74. doi:10.1186/gb-2012-13-8-r74. PMC 3491373. PMID 22925495.{{cite journal}}: CS1 maint: unflagged free DOI (link)

[2] Gregory TR (January 2005). "The C-value enigma in plants and animals: a review of parallels and an appeal for partnership". Annals of Botany. 95 (1): 133–46. doi:10.1093/aob/mci009. PMC 4246714. PMID 15596463.

[3] Zimin A, Stevens KA, Crepeau MW, Holtz-Morris A, Koriabine M, Marçais G, Puiu D, Roberts M, Wegrzyn JL, de Jong PJ, Neale DB, Salzberg SL, Yorke JA, Langley CH (March 2014). "Sequencing and assembly of the 22-gb loblolly pine genome". Genetics. 196 (3): 875–90. doi:10.1534/genetics.113.159715. PMC 3948813. PMID 24653210.

[4] Deschamps, Stéphane; Campbell, Matthew A. (2010-04-01). "Utilization of next-generation sequencing platforms in plant genomics and genetic variant discovery". Molecular Breeding. 25 (4): 553–570. doi:10.1007/s11032-009-9357-9. S2CID 29239452. {{cite journal}}: Unknown parameter |name-list-format= ignored (|name-list-style= suggested) (help)

[5] Shendure J, Ji H (October 2008). "Next-generation DNA sequencing". Nature Biotechnology. 26 (10): 1135–45. doi:10.1038/nbt1486. PMID 18846087. S2CID 6384349.

[6] Treangen TJ, Salzberg SL (November 2011). "Repetitive DNA and next-generation sequencing: computational challenges and solutions". Nature Reviews. Genetics. 13 (1): 36–46. doi:10.1038/nrg3117. PMC 3324860. PMID 22124482.

[7] Harrison GE, Heslop-Harrison JS (February 1995). "Centromeric repetitive DNA sequences in the genus Brassica". TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik. 90 (2): 157–65. doi:10.1007/BF00222197. PMID 24173886. S2CID 20591213.

[8] Lanciano S, Carpentier MC, Llauro C, Jobet E, Robakowska-Hyzorek D, Lasserre E, Ghesquière A, Panaud O, Mirouze M (February 2017). "Sequencing the extrachromosomal circular mobilome reveals retrotransposon activity in plants". PLOS Genetics. 13 (2): e1006630. doi:10.1371/journal.pgen.1006630. PMC 5338827. PMID 28212378.{{cite journal}}: CS1 maint: unflagged free DOI (link)

[9] Michael TP, VanBuren R (April 2015). "Progress, challenges and the future of crop genomes". Current Opinion in Plant Biology. 24: 71–81. doi:10.1016/j.pbi.2015.02.002. PMID 25703261.

[10] Flavell RB, Gale MD, O'dell M, Murphy G, Moore G, Lucas H (1993). "Molecular organization of genes and repeats in the large cereal genomes and implications for the isolation of genes by chromosome walking". Chromosomes Today. Dordrecht: Springer. pp. 199–213. doi:10.1007/978-94-011-1510-0_16. ISBN 9789401046602.

[11] Meyers LA, Levin DA, Geber M (2006-06-01). "On the abundance of polyploids in flowering plants". Evolution. 60 (6): 1198–1206. doi:10.1554/05-629.1. PMID 16892970. S2CID 198156503.

[:0-12] "Analysis of the genome sequence of the flowering plant Arabidopsis thaliana". Nature. 408 (6814): 796–815. December 2000. doi:10.1038/35048692. PMID 11130711.

[13] The C. elegans Sequencing Consortium (1998). "Genome Sequence of the Nematode C. elegans: A Platform for Investigating Biology". Science. 282 (5396): 2012–2018. doi:10.1126/science.282.5396.2012. JSTOR 2897605. PMID 9851916.

[14] Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, et al. (March 2000). "The genome sequence of Drosophila melanogaster". Science. 287 (5461): 2185–95. CiteSeerX 10.1.1.549.8639. doi:10.1126/science.287.5461.2185. PMID 10731132.

[:1-15] Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, et al. (April 2002). "A draft sequence of the rice genome (Oryza sativa L. ssp. japonica)". Science. 296 (5565): 92–100. doi:10.1126/science.1068275. PMID 11935018. S2CID 2960202.

[16] Bolser, Dan; Staines, Daniel M.; Pritchard, Emily; Kersey, Paul (2016). Plant Bioinformatics. Methods in Molecular Biology. Vol. 1374. Humana Press, New York, NY. pp. 115–140. doi:10.1007/978-1-4939-3167-5_6. ISBN 9781493931668. PMID 26519403. {{cite book}}: Unknown parameter |name-list-format= ignored (|name-list-style= suggested) (help)

[17] Gupta P, Naithani S, Tello-Ruiz MK, Chougule K, D'Eustachio P, Fabregat A, et al. (November 2016). "Gramene Database: Navigating Plant Comparative Genomics Resources". Current Plant Biology. 7–8: 10–15. doi:10.1016/j.cpb.2016.12.005. PMC 5509230. PMID 28713666.

[18] Nakaya, Akihiro; Ichihara, Hisako; Asamizu, Erika; Shirasawa, Sachiko; Nakamura, Yasukazu; Tabata, Satoshi; Hirakawa, Hideki (2017). Plant Genomics Databases. Methods in Molecular Biology. Vol. 1533. Humana Press, New York, NY. pp. 45–77. doi:10.1007/978-1-4939-6658-5_3. ISBN 9781493966561. PMID 27987164. {{cite book}}: Unknown parameter |name-list-format= ignored (|name-list-style= suggested) (help)

[19] Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai; Gundlach, Heidrun; Mayer, Klaus F. X. (2017). Plant Genomics Databases. Methods in Molecular Biology. Vol. 1533. Humana Press, New York, NY. pp. 33–44. doi:10.1007/978-1-4939-6658-5_2. ISBN 9781493966561. PMID 27987163. {{cite book}}: Unknown parameter |name-list-format= ignored (|name-list-style= suggested) (help)

[20] Vandepoele, Klaas (2017). "A Guide to the PLAZA 3.0 Plant Comparative Genomic Database". Plant Genomics Databases. Methods in Molecular Biology. Vol. 1533. Humana Press, New York, NY. pp. 183–200. doi:10.1007/978-1-4939-6658-5_10. ISBN 9781493966561. PMID 27987171.

[21] Reiser L, Berardini TZ, Li D, Muller R, Strait EM, Li Q, Mezheritsky Y, Vetushko A, Huala E (2016-01-01). "Sustainable funding for biocuration: The Arabidopsis Information Resource (TAIR) as a case study of a subscription-based funding model". Database. 2016: baw018. doi:10.1093/database/baw018. PMC 4795935. PMID 26989150.

[22] Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, et al. (November 2009). "The B73 maize genome: complexity, diversity, and dynamics". Science. 326 (5956): 1112–5. doi:10.1126/science.1178534. PMID 19965430. S2CID 21433160.

[23] Feuillet C, Leach JE, Rogers J, Schnable PS, Eversole K (February 2011). "Crop genome sequencing: lessons and rationales". Trends in Plant Science. 16 (2): 77–88. doi:10.1016/j.tplants.2010.10.005. PMID 21081278.

[24] Saegusa A (April 1999). "US firm's bid to sequence rice genome causes stir in Japan". Nature. 398 (6728): 545. doi:10.1038/19123. PMID 10217128.

[25] Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, et al. (September 2007). "The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla". Nature. 449 (7161): 463–7. doi:10.1038/nature06148. PMID 17721507.

[26] Ming R, Hou S, Feng Y, Yu Q, Dionne-Laporte A, Saw JH, et al. (April 2008). "The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus)". Nature. 452 (7190): 991–6. doi:10.1038/nature06856. PMC 2836516. PMID 18432245.

[27] Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, et al. (September 2006). "The genome of black cottonwood, Populus trichocarpa (Torr. & Gray)". Science (Submitted manuscript). 313 (5793): 1596–604. doi:10.1126/science.1128691. PMID 16973872. S2CID 7717980.

[28] Swan KA, Curtis DE, McKusick KB, Voinov AV, Mapa FA, Cancilla MR (July 2002). "High-throughput gene mapping in Caenorhabditis elegans". Genome Research. 12 (7): 1100–5. doi:10.1101/gr.208902. PMC 186621. PMID 12097347.

[29] International Brachypodium Initiative (February 2010). "Genome sequencing and analysis of the model grass Brachypodium distachyon". Nature. 463 (7282): 763–8. doi:10.1038/nature08747. PMID 20148030.

[30] Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, et al. (January 2009). "The Sorghum bicolor genome and the diversification of grasses". Nature. 457 (7229): 551–6. doi:10.1038/nature07723. PMID 19189423.

[31] Schmutz, Jeremy (2009). "Genome sequence of the palaeopolyploid soybean". Nature. 463 (7278): 178–83. doi:10.1038/nature08670. PMID 20075913. S2CID 4372224.

[32] Huang S, Li R, Zhang Z, Li L, Gu X, Fan W, et al. (December 2009). "The genome of the cucumber, Cucumis sativus L". Nature Genetics. 41 (12): 1275–81. doi:10.1038/ng.475. PMID 19881527.

[33] Argout X, Salse J, Aury JM, Guiltinan MJ, Droc G, Gouzy J, et al. (February 2011). "The genome of Theobroma cacao". Nature Genetics. 43 (2): 101–8. doi:10.1038/ng.736. PMID 21186351. S2CID 4685532.

[34] Velasco R, Zharkikh A, Affourtit J, Dhingra A, Cestaro A, Kalyanaraman A, et al. (October 2010). "The genome of the domesticated apple (Malus × domestica Borkh.)". Nature Genetics. 42 (10): 833–9. doi:10.1038/ng.654. PMID 20802477.

[35] Wang K, Wang Z, Li F, Ye W, Wang J, Song G, et al. (October 2012). "The draft genome of a diploid cotton Gossypium raimondii". Nature Genetics. 44 (10): 1098–103. doi:10.1038/ng.2371. PMID 22922876. S2CID 38495587.

[36] Xu Q, Chen LL, Ruan X, Chen D, Zhu A, Chen C, et al. (January 2013). "The draft genome of sweet orange (Citrus sinensis)". Nature Genetics. 45 (1): 59–66. doi:10.1038/ng.2472. PMID 23179022.

[37] Tomato Genome Consortium (May 2012). "The tomato genome sequence provides insights into fleshy fruit evolution". Nature. 485 (7400): 635–41. doi:10.1038/nature11119. PMC 3378239. PMID 22660326.

[38] Bleidorn, Christoph (2015). "Third generation sequencing: technology and its potential impact on evolutionary biodiversity research". Systematics and Biodiversity.

[39] Lee, Hayan; Gurtowski, James; Yoo, Shinjae; Nattestad, Maria; Marcus, Shoshana; Goodwin, Sara; McCombie, W. Richard; Schatz, Michael (2016-04-13). "Third-generation sequencing and the future of genomics". bioRxiv: 048603. doi:10.1101/048603.

[40] van Deynze A (2015). "Using spinach to compare technologies for whole genome assemblies". Plant & Animal Genomics XXIII Conference.

[41] Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, Turner SW, Korlach J (June 2013). "Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data". Nature Methods. 10 (6): 563–9. doi:10.1038/nmeth.2474. PMID 23644548. S2CID 205421576.

[42] Daccord N, Celton JM, Linsmith G, Becker C, Choisne N, Schijlen E, et al. (July 2017). "High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development" (PDF). Nature Genetics. 49 (7): 1099–1106. doi:10.1038/ng.3886. PMID 28581499. S2CID 24690391.

[43] Luo, Ruibang (2012). "SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler". GigaScience. 1. doi:10.1186/2047-217X-1-18. PMID 23587118. S2CID 2681931.

[44] Ye C, Hill CM, Wu S, Ruan J, Ma ZS (August 2016). "DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies". Scientific Reports. 6 (1): 31900. doi:10.1038/srep31900. PMC 5004134. PMID 27573208.

[45] Li, Heng (2013). "Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM". Broad Institute of Harvard and MIT. arXiv:1303.3997.

@@ Line 4: / Line 4: @@
 == Structure ==
-The genome of plants can vary in their structure and complexity from small genomes like [[green algae]] (15 Mbp).<ref>{{cite journal | vauthors = Moreau H, Verhelst B, Couloux A, Derelle E, Rombauts S, Grimsley N, Van Bel M, Poulain J, Katinka M, Hohmann-Marriott MF, Piganeau G, Rouzé P, Da Silva C, Wincker P, Van de Peer Y, Vandepoele K | title = Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage | journal = Genome Biology | volume = 13 | issue = 8 | pages = R74 | date = August 2012 | pmid = 22925495 | pmc = 3491373 | doi = 10.1186/gb-2012-13-8-r74 }}</ref> to very large and complex genomes that have typically much higher [[ploidy]], higher rates of [[heterozygosity]] and repetitive elements than species from other kingdoms.<ref>{{cite journal | vauthors = Gregory TR | title = The C-value enigma in plants and animals: a review of parallels and an appeal for partnership | journal = Annals of Botany | volume = 95 | issue = 1 | pages = 133–46 | date = January 2005 | pmid = 15596463 | pmc = 4246714 | doi = 10.1093/aob/mci009 }}</ref> One of the most complex plant genome assemblies available is that of [[Pinus taeda|loblolly pine]] (22 Gbp).<ref>{{cite journal | vauthors = Zimin A, Stevens KA, Crepeau MW, Holtz-Morris A, Koriabine M, Marçais G, Puiu D, Roberts M, Wegrzyn JL, de Jong PJ, Neale DB, Salzberg SL, Yorke JA, Langley CH | title = Sequencing and assembly of the 22-gb loblolly pine genome | journal = Genetics | volume = 196 | issue = 3 | pages = 875–90 | date = March 2014 | pmid = 24653210 | pmc = 3948813 | doi = 10.1534/genetics.113.159715 }}</ref> Due to their complexity, the plants’ genome sequences can't be assembled back into [[chromosome]]s using only short reads provided by [[DNA sequencing|next-generation- sequencing technologies]] (NGS),<ref>{{cite journal|last=Deschamps|first=Stéphane|last2=Campbell|first2=Matthew A. 
| name-list-format = vanc |date=2010-04-01|title=Utilization of next-generation sequencing platforms in plant genomics and genetic variant discovery |journal=Molecular Breeding|volume=25|issue=4|pages=553–570|doi=10.1007/s11032-009-9357-9 }}</ref><ref>{{cite journal | vauthors = Shendure J, Ji H | title = Next-generation DNA sequencing | journal = Nature Biotechnology | volume = 26 | issue = 10 | pages = 1135–45 | date = October 2008 | pmid = 18846087 | doi = 10.1038/nbt1486 }}</ref> and therefore most plant genome assemblies available that used NGS alone are highly fragmented, contain large numbers of contigs, and genome regions are not finished. Highly repetitive sequences, often larger than 10kbp, are the main challenge in plants.<ref>{{cite journal | vauthors = Treangen TJ, Salzberg SL | title = Repetitive DNA and next-generation sequencing: computational challenges and solutions | journal = Nature Reviews. Genetics | volume = 13 | issue = 1 | pages = 36–46 | date = November 2011 | pmid = 22124482 | pmc = 3324860 | doi = 10.1038/nrg3117 }}</ref><ref>{{cite journal | vauthors = Harrison GE, Heslop-Harrison JS | title = Centromeric repetitive DNA sequences in the genus Brassica | journal = TAG. Theoretical and Applied Genetics. 
Theoretische und Angewandte Genetik | volume = 90 | issue = 2 | pages = 157–65 | date = February 1995 | pmid = 24173886 | doi = 10.1007/BF00222197 }}</ref> Most of the chromosomal sequences are produced by the activity of [[mobile genetic elements]] (MGEs) in the plant genomes.<ref>{{cite journal | vauthors = Lanciano S, Carpentier MC, Llauro C, Jobet E, Robakowska-Hyzorek D, Lasserre E, Ghesquière A, Panaud O, Mirouze M | title = Sequencing the extrachromosomal circular mobilome reveals retrotransposon activity in plants | journal = PLoS Genetics | volume = 13 | issue = 2 | pages = e1006630 | date = February 2017 | pmid = 28212378 | pmc = 5338827 | doi = 10.1371/journal.pgen.1006630 }}</ref> MGEs are divided into two classes: class I or [[retrotransposon]]s, and class II or [[DNA transposon]]s. In plants, [[Long terminal repeat|long- terminal repeat]] (LTR) retrotransposons are predominant and constitute from 15%<ref>{{cite journal | vauthors = Michael TP, VanBuren R | title = Progress, challenges and the future of crop genomes | journal = Current Opinion in Plant Biology | volume = 24 | pages = 71–81 | date = April 2015 | pmid = 25703261 | doi = 10.1016/j.pbi.2015.02.002 }}</ref> to 90% of the genome.<ref>{{cite book |title=Chromosomes Today | vauthors = Flavell RB, Gale MD, O'dell M, Murphy G, Moore G, Lucas H |date=1993|publisher=Springer | location = Dordrecht|isbn=9789401046602|pages=199–213|doi=10.1007/978-94-011-1510-0_16| chapter = Molecular organization of genes and repeats in the large cereal genomes and implications for the isolation of genes by chromosome walking }}</ref> Polyploidy is another challenge in assembling a plant genome, and it is estimated that ~80% of plants are polyploids.<ref>{{cite journal | vauthors= Meyers LA, Levin DA, Geber M |date=2006-06-01|title=On the abundance of polyploids in flowering plants |journal=Evolution|volume=60|issue=6|pages=1198–1206|doi=10.1554/05-629.1 }}</ref>
+The genome of plants can vary in their structure and complexity from small genomes like [[green algae]] (15 Mbp).<ref>{{cite journal | vauthors = Moreau H, Verhelst B, Couloux A, Derelle E, Rombauts S, Grimsley N, Van Bel M, Poulain J, Katinka M, Hohmann-Marriott MF, Piganeau G, Rouzé P, Da Silva C, Wincker P, Van de Peer Y, Vandepoele K | title = Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage | journal = Genome Biology | volume = 13 | issue = 8 | pages = R74 | date = August 2012 | pmid = 22925495 | pmc = 3491373 | doi = 10.1186/gb-2012-13-8-r74 }}</ref> to very large and complex genomes that have typically much higher [[ploidy]], higher rates of [[heterozygosity]] and repetitive elements than species from other kingdoms.<ref>{{cite journal | vauthors = Gregory TR | title = The C-value enigma in plants and animals: a review of parallels and an appeal for partnership | journal = Annals of Botany | volume = 95 | issue = 1 | pages = 133–46 | date = January 2005 | pmid = 15596463 | pmc = 4246714 | doi = 10.1093/aob/mci009 }}</ref> One of the most complex plant genome assemblies available is that of [[Pinus taeda|loblolly pine]] (22 Gbp).<ref>{{cite journal | vauthors = Zimin A, Stevens KA, Crepeau MW, Holtz-Morris A, Koriabine M, Marçais G, Puiu D, Roberts M, Wegrzyn JL, de Jong PJ, Neale DB, Salzberg SL, Yorke JA, Langley CH | title = Sequencing and assembly of the 22-gb loblolly pine genome | journal = Genetics | volume = 196 | issue = 3 | pages = 875–90 | date = March 2014 | pmid = 24653210 | pmc = 3948813 | doi = 10.1534/genetics.113.159715 }}</ref> Due to their complexity, the plants’ genome sequences can't be assembled back into [[chromosome]]s using only short reads provided by [[DNA sequencing|next-generation- sequencing technologies]] (NGS),<ref>{{cite journal|last1=Deschamps|first1=Stéphane|last2=Campbell|first2=Matthew A. 
| name-list-format = vanc |date=2010-04-01|title=Utilization of next-generation sequencing platforms in plant genomics and genetic variant discovery |journal=Molecular Breeding|volume=25|issue=4|pages=553–570|doi=10.1007/s11032-009-9357-9 |s2cid=29239452}}</ref><ref>{{cite journal | vauthors = Shendure J, Ji H | title = Next-generation DNA sequencing | journal = Nature Biotechnology | volume = 26 | issue = 10 | pages = 1135–45 | date = October 2008 | pmid = 18846087 | doi = 10.1038/nbt1486 | s2cid = 6384349 }}</ref> and therefore most plant genome assemblies available that used NGS alone are highly fragmented, contain large numbers of contigs, and genome regions are not finished. Highly repetitive sequences, often larger than 10kbp, are the main challenge in plants.<ref>{{cite journal | vauthors = Treangen TJ, Salzberg SL | title = Repetitive DNA and next-generation sequencing: computational challenges and solutions | journal = Nature Reviews. Genetics | volume = 13 | issue = 1 | pages = 36–46 | date = November 2011 | pmid = 22124482 | pmc = 3324860 | doi = 10.1038/nrg3117 }}</ref><ref>{{cite journal | vauthors = Harrison GE, Heslop-Harrison JS | title = Centromeric repetitive DNA sequences in the genus Brassica | journal = TAG. Theoretical and Applied Genetics. 
Theoretische und Angewandte Genetik | volume = 90 | issue = 2 | pages = 157–65 | date = February 1995 | pmid = 24173886 | doi = 10.1007/BF00222197 | s2cid = 20591213 }}</ref> Most of the chromosomal sequences are produced by the activity of [[mobile genetic elements]] (MGEs) in the plant genomes.<ref>{{cite journal | vauthors = Lanciano S, Carpentier MC, Llauro C, Jobet E, Robakowska-Hyzorek D, Lasserre E, Ghesquière A, Panaud O, Mirouze M | title = Sequencing the extrachromosomal circular mobilome reveals retrotransposon activity in plants | journal = PLOS Genetics | volume = 13 | issue = 2 | pages = e1006630 | date = February 2017 | pmid = 28212378 | pmc = 5338827 | doi = 10.1371/journal.pgen.1006630 }}</ref> MGEs are divided into two classes: class I or [[retrotransposon]]s, and class II or [[DNA transposon]]s. In plants, [[Long terminal repeat|long- terminal repeat]] (LTR) retrotransposons are predominant and constitute from 15%<ref>{{cite journal | vauthors = Michael TP, VanBuren R | title = Progress, challenges and the future of crop genomes | journal = Current Opinion in Plant Biology | volume = 24 | pages = 71–81 | date = April 2015 | pmid = 25703261 | doi = 10.1016/j.pbi.2015.02.002 }}</ref> to 90% of the genome.<ref>{{cite book |title=Chromosomes Today | vauthors = Flavell RB, Gale MD, O'dell M, Murphy G, Moore G, Lucas H |date=1993|publisher=Springer | location = Dordrecht|isbn=9789401046602|pages=199–213|doi=10.1007/978-94-011-1510-0_16| chapter = Molecular organization of genes and repeats in the large cereal genomes and implications for the isolation of genes by chromosome walking }}</ref> Polyploidy is another challenge in assembling a plant genome, and it is estimated that ~80% of plants are polyploids.<ref>{{cite journal | vauthors= Meyers LA, Levin DA, Geber M |date=2006-06-01|title=On the abundance of polyploids in flowering plants |journal=Evolution|volume=60|issue=6|pages=1198–1206|doi=10.1554/05-629.1 |pmid=16892970|s2cid=198156503}}</ref>
 == Assemblies ==
 The first complete plant genome [[sequence assembly|assembly]], that of ''[[Arabidopsis thaliana]]'', was finished in 2000,<ref name=":0">{{cite journal | vauthors =  | title = Analysis of the genome sequence of the flowering plant Arabidopsis thaliana | journal = Nature | volume = 408 | issue = 6814 | pages = 796–815 | date = December 2000 | pmid = 11130711 | doi = 10.1038/35048692 | doi-access = free }}</ref> being the third multicellular [[Eukaryote|eukaryotic]] genome published after [[Caenorhabditis elegans|C. elegans]]<ref>{{cite journal| author = The C. elegans Sequencing Consortium |date=1998|title=Genome Sequence of the Nematode C. elegans: A Platform for Investigating Biology|journal=Science|volume=282|issue=5396|pages=2012–2018|jstor=2897605|doi=10.1126/science.282.5396.2012|pmid=9851916}}</ref> and [[Drosophila melanogaster|D. melanogaster]].<ref>{{cite journal | vauthors = Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, George RA, Lewis SE, Richards S, Ashburner M, Henderson SN, Sutton GG, Wortman JR, Yandell MD, Zhang Q, Chen LX, Brandon RC, Rogers YH, Blazej RG, Champe M, Pfeiffer BD, Wan KH, Doyle C, Baxter EG, Helt G, Nelson CR, Gabor GL, Abril JF, Agbayani A, An HJ, Andrews-Pfannkoch C, Baldwin D, Ballew RM, Basu A, Baxendale J, Bayraktaroglu L, Beasley EM, Beeson KY, Benos PV, Berman BP, Bhandari D, Bolshakov S, Borkova D, Botchan MR, Bouck J, Brokstein P, Brottier P, Burtis KC, Busam DA, Butler H, Cadieu E, Center A, Chandra I, Cherry JM, Cawley S, Dahlke C, Davenport LB, Davies P, de Pablos B, Delcher A, Deng Z, Mays AD, Dew I, Dietz SM, Dodson K, Doup LE, Downes M, Dugan-Rocha S, Dunkov BC, Dunn P, Durbin KJ, Evangelista CC, Ferraz C, Ferriera S, Fleischmann W, Fosler C, Gabrielian AE, Garg NS, Gelbart WM, Glasser K, Glodek A, Gong F, Gorrell JH, Gu Z, Guan P, Harris M, Harris NL, Harvey D, Heiman TJ, Hernandez JR, Houck J, Hostin D, Houston KA, Howland TJ, Wei MH, Ibegwam C, 
Jalali M, Kalush F, Karpen GH, Ke Z, Kennison JA, Ketchum KA, Kimmel BE, Kodira CD, Kraft C, Kravitz S, Kulp D, Lai Z, Lasko P, Lei Y, Levitsky AA, Li J, Li Z, Liang Y, Lin X, Liu X, Mattei B, McIntosh TC, McLeod MP, McPherson D, Merkulov G, Milshina NV, Mobarry C, Morris J, Moshrefi A, Mount SM, Moy M, Murphy B, Murphy L, Muzny DM, Nelson DL, Nelson DR, Nelson KA, Nixon K, Nusskern DR, Pacleb JM, Palazzolo M, Pittman GS, Pan S, Pollard J, Puri V, Reese MG, Reinert K, Remington K, Saunders RD, Scheeler F, Shen H, Shue BC, Sidén-Kiamos I, Simpson M, Skupski MP, Smith T, Spier E, Spradling AC, Stapleton M, Strong R, Sun E, Svirskas R, Tector C, Turner R, Venter E, Wang AH, Wang X, Wang ZY, Wassarman DA, Weinstock GM, Weissenbach J, Williams SM, Worley KC, Wu D, Yang S, Yao QA, Ye J, Yeh RF, Zaveri JS, Zhan M, Zhang G, Zhao Q, Zheng L, Zheng XH, Zhong FN, Zhong W, Zhou X, Zhu S, Zhu X, Smith HO, Gibbs RA, Myers EW, Rubin GM, Venter JC | display-authors = 6 | title = The genome sequence of Drosophila melanogaster | journal = Science | volume = 287 | issue = 5461 | pages = 2185–95 | date = March 2000 | pmid = 10731132 | doi = 10.1126/science.287.5461.2185 | citeseerx = 10.1.1.549.8639 }}</ref> Arabidopsis, unlike other plants’ genomes (e.g. [[Malus]]) has convenient traits, such as a small nuclear genome (135Mbp) and a short generation time (8 weeks from seed to seed). The genome has five chromosomes reflecting approximately 4% of the [[human genome]] size. The genome was sequenced and annotated by the Arabidopsis Genome Initiative (AGI).
-The initiative for sequencing the genome of rice (''[[Oryza sativa]]''),<ref name=":1">{{cite journal | vauthors = Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, Paszkowski U, Zhang S, Colbert M, Sun WL, Chen L, Cooper B, Park S, Wood TC, Mao L, Quail P, Wing R, Dean R, Yu Y, Zharkikh A, Shen R, Sahasrabudhe S, Thomas A, Cannings R, Gutin A, Pruss D, Reid J, Tavtigian S, Mitchell J, Eldredge G, Scholl T, Miller RM, Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson R, Feldhaus J, Macalma T, Oliphant A, Briggs S | display-authors = 6 | title = A draft sequence of the rice genome (Oryza sativa L. ssp. japonica) | journal = Science | volume = 296 | issue = 5565 | pages = 92–100 | date = April 2002 | pmid = 11935018 | doi = 10.1126/science.1068275 }}</ref> began in September 1997, when scientists from many nations agreed to an international collaboration to sequence the rice genome, forming “The International Rice Genome Sequencing Project” (IRGSP). At an estimated size between 400-430 Mb, approximatively four times larger in dimensions than ''A. thaliana'', rice has the smallest of the major cereal crop genomes.<ref name=":1" />
+The initiative for sequencing the genome of rice (''[[Oryza sativa]]''),<ref name=":1">{{cite journal | vauthors = Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, Paszkowski U, Zhang S, Colbert M, Sun WL, Chen L, Cooper B, Park S, Wood TC, Mao L, Quail P, Wing R, Dean R, Yu Y, Zharkikh A, Shen R, Sahasrabudhe S, Thomas A, Cannings R, Gutin A, Pruss D, Reid J, Tavtigian S, Mitchell J, Eldredge G, Scholl T, Miller RM, Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson R, Feldhaus J, Macalma T, Oliphant A, Briggs S | display-authors = 6 | title = A draft sequence of the rice genome (Oryza sativa L. ssp. japonica) | journal = Science | volume = 296 | issue = 5565 | pages = 92–100 | date = April 2002 | pmid = 11935018 | doi = 10.1126/science.1068275 | s2cid = 2960202 }}</ref> began in September 1997, when scientists from many nations agreed to an international collaboration to sequence the rice genome, forming “The International Rice Genome Sequencing Project” (IRGSP). At an estimated size between 400-430 Mb, approximatively four times larger in dimensions than ''A. thaliana'', rice has the smallest of the major cereal crop genomes.<ref name=":1" />
 Between 2000 and 2008 in total 10 plant genomes were published while in 2012 alone, 13 plant genomes were published. Since then the number was constantly increasing, and now more than 400 plant genomes are available in the NCBI genome database, of which 72 were re-annotated [NCBI].
 === Databases ===
-EnsemblPlants<ref>{{cite book |title=Plant Bioinformatics|volume=1374|last=Bolser|first=Dan|last2=Staines|first2=Daniel M.|last3=Pritchard|first3=Emily|last4=Kersey|first4=Paul | name-list-format = vanc |date=2016|publisher=Humana Press, New York, NY|isbn=9781493931668|series=Methods in Molecular Biology|pages=115–140|doi=10.1007/978-1-4939-3167-5_6|pmid=26519403}}</ref> is part of [[Ensembl Genomes|EnsemblGenome]] database and contains resources for a reduced number of sequenced plant species (45, Oct. 2017). It mainly provides genome sequences, gene models, functional annotations and polymorphic loci. For some of the plant species, additional information is provided including population structure, individual genotypes, linkage, and phenotype data.
+EnsemblPlants<ref>{{cite book |title=Plant Bioinformatics|volume=1374|last1=Bolser|first1=Dan|last2=Staines|first2=Daniel M.|last3=Pritchard|first3=Emily|last4=Kersey|first4=Paul | name-list-format = vanc |date=2016|publisher=Humana Press, New York, NY|isbn=9781493931668|series=Methods in Molecular Biology|pages=115–140|doi=10.1007/978-1-4939-3167-5_6|pmid=26519403}}</ref> is part of [[Ensembl Genomes|EnsemblGenome]] database and contains resources for a reduced number of sequenced plant species (45, Oct. 2017). It mainly provides genome sequences, gene models, functional annotations and polymorphic loci. For some of the plant species, additional information is provided including population structure, individual genotypes, linkage, and phenotype data.
 Gramene<ref>{{cite journal | vauthors = Gupta P, Naithani S, Tello-Ruiz MK, Chougule K, D'Eustachio P, Fabregat A, Jiao Y, Keays M, Lee YK, Kumari S, Mulvaney J, Olson A, Preece J, Stein J, Wei S, Weiser J, Huerta L, Petryszak R, Kersey P, Stein LD, Ware D, Jaiswal P | display-authors = 6 | title = Gramene Database: Navigating Plant Comparative Genomics Resources | journal = Current Plant Biology | volume = 7-8 | pages = 10–15 | date = November 2016 | pmid = 28713666 | pmc = 5509230 | doi = 10.1016/j.cpb.2016.12.005 }}</ref> is an online web database resource for plant comparative genomics and pathway analysis based on Ensembl technology.
-Plant Genome DataBase Japan<ref>{{cite book |title=Plant Genomics Databases |volume=1533|last=Nakaya|first=Akihiro|last2=Ichihara|first2=Hisako|last3=Asamizu|first3=Erika|last4=Shirasawa|first4=Sachiko|last5=Nakamura|first5=Yasukazu|last6=Tabata|first6=Satoshi|last7=Hirakawa|first7=Hideki | name-list-format = vanc |date=2017|publisher=Humana Press, New York, NY|isbn=9781493966561|series=Methods in Molecular Biology|pages=45–77|doi=10.1007/978-1-4939-6658-5_3|pmid=27987164}}</ref> (PGDBj) is a website that contains information related to genomes of model and crop plants from databases. It has three main components: ortholog db, DNA marker and linkage map db, and plant resource db, where multiple plant resources accumulated by different institutes are integrated. The aim is “to provide a platform, enabling comparative searches of different resources” (pgdbj.jp).
+Plant Genome DataBase Japan<ref>{{cite book |title=Plant Genomics Databases |volume=1533|last1=Nakaya|first1=Akihiro|last2=Ichihara|first2=Hisako|last3=Asamizu|first3=Erika|last4=Shirasawa|first4=Sachiko|last5=Nakamura|first5=Yasukazu|last6=Tabata|first6=Satoshi|last7=Hirakawa|first7=Hideki | name-list-format = vanc |date=2017|publisher=Humana Press, New York, NY|isbn=9781493966561|series=Methods in Molecular Biology|pages=45–77|doi=10.1007/978-1-4939-6658-5_3|pmid=27987164}}</ref> (PGDBj) is a website that contains information related to genomes of model and crop plants from databases. It has three main components: ortholog db, DNA marker and linkage map db, and plant resource db, where multiple plant resources accumulated by different institutes are integrated. The aim is “to provide a platform, enabling comparative searches of different resources” (pgdbj.jp).
-PlantsDB<ref>{{cite book |title=Plant Genomics Databases |volume=1533 |last=Spannagl |first=Manuel |last2=Nussbaumer |first2=Thomas |last3=Bader |first3=Kai |last4=Gundlach |first4=Heidrun |last5=Mayer |first5=Klaus F. X. | name-list-format = vanc |date=2017|publisher=Humana Press, New York, NY|isbn=9781493966561|series=Methods in Molecular Biology|pages=33–44|doi=10.1007/978-1-4939-6658-5_2|pmid=27987163 }}</ref> is a resource for analysing and storing genetic and genomic information from various plants, and offers tools to query these data and to perform comparative analysis with the help of in-house tools.
+PlantsDB<ref>{{cite book |title=Plant Genomics Databases |volume=1533 |last1=Spannagl |first1=Manuel |last2=Nussbaumer |first2=Thomas |last3=Bader |first3=Kai |last4=Gundlach |first4=Heidrun |last5=Mayer |first5=Klaus F. X. | name-list-format = vanc |date=2017|publisher=Humana Press, New York, NY|isbn=9781493966561|series=Methods in Molecular Biology|pages=33–44|doi=10.1007/978-1-4939-6658-5_2|pmid=27987163 }}</ref> is a resource for analysing and storing genetic and genomic information from various plants, and offers tools to query these data and to perform comparative analysis with the help of in-house tools.
 PLAZA<ref>{{cite book |title=Plant Genomics Databases|volume=1533|last=Vandepoele|first=Klaas | name-list-format = vanc |date=2017|publisher=Humana Press, New York, NY|isbn=9781493966561|series=Methods in Molecular Biology|pages=183–200|doi=10.1007/978-1-4939-6658-5_10|pmid=27987171|chapter=A Guide to the PLAZA 3.0 Plant Comparative Genomic Database}}</ref> is another online resource for comparative genomics that integrates plant sequence data and comparative genomic methods, and performs evolutionary analysis within the green plant lineage ([[Viridiplantae]]).
@@ Line 38: / Line 38: @@
 To sequence and assemble the genome of Oryza Sativa (japonica),<ref name=":1" /> the same strategy was used. For Oryza Sativa a total of 3,401 mapped clones in a minimum tiling path were selected from the physical map and assembled.
-One of the most important crops in the world, maize (Zea mays), is the last plant genome project primarily based on Sanger BAC-by-BAC strategy.<ref name=":2" /> The genome size of Maize, 2.3 Gb and 10 chromosomes,<ref name=":2">{{cite journal | vauthors = Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, Minx P, Reily AD, Courtney L, Kruchowski SS, Tomlinson C, Strong C, Delehaunty K, Fronick C, Courtney B, Rock SM, Belter E, Du F, Kim K, Abbott RM, Cotton M, Levy A, Marchetto P, Ochoa K, Jackson SM, Gillam B, Chen W, Yan L, Higginbotham J, Cardenas M, Waligorski J, Applebaum E, Phelps L, Falcone J, Kanchi K, Thane T, Scimone A, Thane N, Henke J, Wang T, Ruppert J, Shah N, Rotter K, Hodges J, Ingenthron E, Cordes M, Kohlberg S, Sgro J, Delgado B, Mead K, Chinwalla A, Leonard S, Crouse K, Collura K, Kudrna D, Currie J, He R, Angelova A, Rajasekar S, Mueller T, Lomeli R, Scara G, Ko A, Delaney K, Wissotski M, Lopez G, Campos D, Braidotti M, Ashley E, Golser W, Kim H, Lee S, Lin J, Dujmic Z, Kim W, Talag J, Zuccolo A, Fan C, Sebastian A, Kramer M, Spiegel L, Nascimento L, Zutavern T, Miller B, Ambroise C, Muller S, Spooner W, Narechania A, Ren L, Wei S, Kumari S, Faga B, Levy MJ, McMahan L, Van Buren P, Vaughn MW, Ying K, Yeh CT, Emrich SJ, Jia Y, Kalyanaraman A, Hsia AP, Barbazuk WB, Baucom RS, Brutnell TP, Carpita NC, Chaparro C, Chia JM, Deragon JM, Estill JC, Fu Y, Jeddeloh JA, Han Y, Lee H, Li P, Lisch DR, Liu S, Liu Z, Nagel DH, McCann MC, SanMiguel P, Myers AM, Nettleton D, Nguyen J, Penning BW, Ponnala L, Schneider KL, Schwartz DC, Sharma A, Soderlund C, Springer NM, Sun Q, Wang H, Waterman M, Westerman R, Wolfgruber TK, Yang L, Yu Y, Zhang L, Zhou S, Zhu Q, Bennetzen JL, Dawe RK, Jiang J, Jiang N, Presting GG, Wessler SR, Aluru S, Martienssen RA, Clifton SW, McCombie WR, Wing RA, Wilson RK | display-authors = 6 | title = The B73 maize genome: complexity, diversity, and dynamics | journal = Science | volume = 
326 | issue = 5956 | pages = 1112–5 | date = November 2009 | pmid = 19965430 | doi = 10.1126/science.1178534 | url = https://lib.dr.iastate.edu/stat_las_pubs/210 }}</ref> is significantly larger than that of rice and Arabidopsis.<ref name=":2" /> To assemble the genome of maize a set of 16,848 minimally overlapping BAC clones derived
+One of the most important crops in the world, maize (Zea mays), is the last plant genome project primarily based on Sanger BAC-by-BAC strategy.<ref name=":2" /> The genome size of Maize, 2.3 Gb and 10 chromosomes,<ref name=":2">{{cite journal | vauthors = Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, Minx P, Reily AD, Courtney L, Kruchowski SS, Tomlinson C, Strong C, Delehaunty K, Fronick C, Courtney B, Rock SM, Belter E, Du F, Kim K, Abbott RM, Cotton M, Levy A, Marchetto P, Ochoa K, Jackson SM, Gillam B, Chen W, Yan L, Higginbotham J, Cardenas M, Waligorski J, Applebaum E, Phelps L, Falcone J, Kanchi K, Thane T, Scimone A, Thane N, Henke J, Wang T, Ruppert J, Shah N, Rotter K, Hodges J, Ingenthron E, Cordes M, Kohlberg S, Sgro J, Delgado B, Mead K, Chinwalla A, Leonard S, Crouse K, Collura K, Kudrna D, Currie J, He R, Angelova A, Rajasekar S, Mueller T, Lomeli R, Scara G, Ko A, Delaney K, Wissotski M, Lopez G, Campos D, Braidotti M, Ashley E, Golser W, Kim H, Lee S, Lin J, Dujmic Z, Kim W, Talag J, Zuccolo A, Fan C, Sebastian A, Kramer M, Spiegel L, Nascimento L, Zutavern T, Miller B, Ambroise C, Muller S, Spooner W, Narechania A, Ren L, Wei S, Kumari S, Faga B, Levy MJ, McMahan L, Van Buren P, Vaughn MW, Ying K, Yeh CT, Emrich SJ, Jia Y, Kalyanaraman A, Hsia AP, Barbazuk WB, Baucom RS, Brutnell TP, Carpita NC, Chaparro C, Chia JM, Deragon JM, Estill JC, Fu Y, Jeddeloh JA, Han Y, Lee H, Li P, Lisch DR, Liu S, Liu Z, Nagel DH, McCann MC, SanMiguel P, Myers AM, Nettleton D, Nguyen J, Penning BW, Ponnala L, Schneider KL, Schwartz DC, Sharma A, Soderlund C, Springer NM, Sun Q, Wang H, Waterman M, Westerman R, Wolfgruber TK, Yang L, Yu Y, Zhang L, Zhou S, Zhu Q, Bennetzen JL, Dawe RK, Jiang J, Jiang N, Presting GG, Wessler SR, Aluru S, Martienssen RA, Clifton SW, McCombie WR, Wing RA, Wilson RK | display-authors = 6 | title = The B73 maize genome: complexity, diversity, and dynamics | journal = Science | volume = 
326 | issue = 5956 | pages = 1112–5 | date = November 2009 | pmid = 19965430 | doi = 10.1126/science.1178534 | s2cid = 21433160 | url = https://lib.dr.iastate.edu/stat_las_pubs/210 }}</ref> is significantly larger than that of rice and Arabidopsis.<ref name=":2" /> To assemble the genome of maize a set of 16,848 minimally overlapping BAC clones derived
 from a combination of physical and genetic maps were selected and sequenced. The assembly of [[maize]] was supplemented with external data, obtained from [[Complementary DNA|cDNA]] and from sequences of libraries prepared with methyl-filtered DNA (libraries that exploit the fact that bases in genic sequences tend to be less heavily methylated than those in non-genic regions) and with high-C<sub>0</sub>t techniques.
@@ Line 47: / Line 47: @@
 In the [[Whole genome sequencing|WGS]] strategy, the fragments are sequenced in no predetermined order. The DNA is randomly sheared, and the cloned fragments are sequenced and then assembled using computational methods. This strategy reduces the cost and time associated with constructing the maps and relies instead on computational resources.
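The idea of ordering randomly sheared fragments purely by computation can be illustrated with a toy greedy overlap assembler. This is a minimal sketch and not the method used by any assembler named in this article: the function names `overlap` and `greedy_assemble`, the `min_len` parameter and the example reads are all hypothetical, and the reads are assumed to be error-free and to come from a sequence without repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Hypothetical toy model: no sequencing errors, no reverse complements,
    no repeats. Returns the list of contigs left when nothing overlaps.
    """
    # Drop exact duplicates (keeping order) and reads contained in other reads.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]

    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Three unordered reads sampled from the made-up sequence ATGCGTACCTGAAGT.
reads = ["CTGAAGT", "ATGCGTAC", "GTACCTGA"]
assembled = greedy_assemble(reads)
print(assembled)
```

When a genome contains repeats longer than the reads, several merges can score equally well and a greedy approach like this one may join the wrong fragments, which is one reason repetitive plant genomes are hard to assemble from short reads.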
-A considerable number of important plant genomes like grapevine ([[Vitis vinifera|Vitis Vinifer]]),<ref name=":3">{{cite journal | vauthors = Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, Vezzi A, Legeai F, Hugueney P, Dasilva C, Horner D, Mica E, Jublot D, Poulain J, Bruyère C, Billault A, Segurens B, Gouyvenoux M, Ugarte E, Cattonaro F, Anthouard V, Vico V, Del Fabbro C, Alaux M, Di Gaspero G, Dumas V, Felice N, Paillard S, Juman I, Moroldo M, Scalabrin S, Canaguier A, Le Clainche I, Malacrida G, Durand E, Pesole G, Laucou V, Chatelet P, Merdinoglu D, Delledonne M, Pezzotti M, Lecharny A, Scarpelli C, Artiguenave F, Pè ME, Valle G, Morgante M, Caboche M, Adam-Blondon AF, Weissenbach J, Quétier F, Wincker P | display-authors = 6 | title = The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla | journal = Nature | volume = 449 | issue = 7161 | pages = 463–7 | date = September 2007 | pmid = 17721507 | doi = 10.1038/nature06148 | doi-access = free }}</ref> papaya ([[Papaya|Carica papaya]]),<ref>{{cite journal | vauthors = Ming R, Hou S, Feng Y, Yu Q, Dionne-Laporte A, Saw JH, Senin P, Wang W, Ly BV, Lewis KL, Salzberg SL, Feng L, Jones MR, Skelton RL, Murray JE, Chen C, Qian W, Shen J, Du P, Eustice M, Tong E, Tang H, Lyons E, Paull RE, Michael TP, Wall K, Rice DW, Albert H, Wang ML, Zhu YJ, Schatz M, Nagarajan N, Acob RA, Guan P, Blas A, Wai CM, Ackerman CM, Ren Y, Liu C, Wang J, Wang J, Na JK, Shakirov EV, Haas B, Thimmapuram J, Nelson D, Wang X, Bowers JE, Gschwend AR, Delcher AL, Singh R, Suzuki JY, Tripathi S, Neupane K, Wei H, Irikura B, Paidi M, Jiang N, Zhang W, Presting G, Windsor A, Navajas-Pérez R, Torres MJ, Feltus FA, Porter B, Li Y, Burroughs AM, Luo MC, Liu L, Christopher DA, Mount SM, Moore PH, Sugimura T, Jiang J, Schuler MA, Friedman V, Mitchell-Olds T, Shippen DE, dePamphilis CW, Palmer JD, Freeling M, Paterson AH, Gonsalves D, Wang L, Alam M | 
display-authors = 6 | title = The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus) | journal = Nature | volume = 452 | issue = 7190 | pages = 991–6 | date = April 2008 | pmid = 18432245 | pmc = 2836516 | doi = 10.1038/nature06856 }}</ref> and cottonwood ([[Populus trichocarpa]])<ref>{{cite journal | vauthors = Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, Schein J, Sterck L, Aerts A, Bhalerao RR, Bhalerao RP, Blaudez D, Boerjan W, Brun A, Brunner A, Busov V, Campbell M, Carlson J, Chalot M, Chapman J, Chen GL, Cooper D, Coutinho PM, Couturier J, Covert S, Cronk Q, Cunningham R, Davis J, Degroeve S, Déjardin A, Depamphilis C, Detter J, Dirks B, Dubchak I, Duplessis S, Ehlting J, Ellis B, Gendler K, Goodstein D, Gribskov M, Grimwood J, Groover A, Gunter L, Hamberger B, Heinze B, Helariutta Y, Henrissat B, Holligan D, Holt R, Huang W, Islam-Faridi N, Jones S, Jones-Rhoades M, Jorgensen R, Joshi C, Kangasjärvi J, Karlsson J, Kelleher C, Kirkpatrick R, Kirst M, Kohler A, Kalluri U, Larimer F, Leebens-Mack J, Leplé JC, Locascio P, Lou Y, Lucas S, Martin F, Montanini B, Napoli C, Nelson DR, Nelson C, Nieminen K, Nilsson O, Pereda V, Peter G, Philippe R, Pilate G, Poliakov A, Razumovskaya J, Richardson P, Rinaldi C, Ritland K, Rouzé P, Ryaboy D, Schmutz J, Schrader J, Segerman B, Shin H, Siddiqui A, Sterky F, Terry A, Tsai CJ, Uberbacher E, Unneberg P, Vahala J, Wall K, Wessler S, Yang G, Yin T, Douglas C, Marra M, Sandberg G, Van de Peer Y, Rokhsar D | display-authors = 6 | title = The genome of black cottonwood, Populus trichocarpa (Torr. & Gray) | journal = Science | volume = 313 | issue = 5793 | pages = 1596–604 | date = September 2006 | pmid = 16973872 | doi = 10.1126/science.1128691 | url = https://digital.library.unt.edu/ark:/67531/metadc883930/ | type = Submitted manuscript }}</ref> were sequenced and assembled with Sanger WGS strategy.
+A considerable number of important plant genomes like grapevine ([[Vitis vinifera|Vitis Vinifer]]),<ref name=":3">{{cite journal | vauthors = Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, Vezzi A, Legeai F, Hugueney P, Dasilva C, Horner D, Mica E, Jublot D, Poulain J, Bruyère C, Billault A, Segurens B, Gouyvenoux M, Ugarte E, Cattonaro F, Anthouard V, Vico V, Del Fabbro C, Alaux M, Di Gaspero G, Dumas V, Felice N, Paillard S, Juman I, Moroldo M, Scalabrin S, Canaguier A, Le Clainche I, Malacrida G, Durand E, Pesole G, Laucou V, Chatelet P, Merdinoglu D, Delledonne M, Pezzotti M, Lecharny A, Scarpelli C, Artiguenave F, Pè ME, Valle G, Morgante M, Caboche M, Adam-Blondon AF, Weissenbach J, Quétier F, Wincker P | display-authors = 6 | title = The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla | journal = Nature | volume = 449 | issue = 7161 | pages = 463–7 | date = September 2007 | pmid = 17721507 | doi = 10.1038/nature06148 | doi-access = free }}</ref> papaya ([[Papaya|Carica papaya]]),<ref>{{cite journal | vauthors = Ming R, Hou S, Feng Y, Yu Q, Dionne-Laporte A, Saw JH, Senin P, Wang W, Ly BV, Lewis KL, Salzberg SL, Feng L, Jones MR, Skelton RL, Murray JE, Chen C, Qian W, Shen J, Du P, Eustice M, Tong E, Tang H, Lyons E, Paull RE, Michael TP, Wall K, Rice DW, Albert H, Wang ML, Zhu YJ, Schatz M, Nagarajan N, Acob RA, Guan P, Blas A, Wai CM, Ackerman CM, Ren Y, Liu C, Wang J, Wang J, Na JK, Shakirov EV, Haas B, Thimmapuram J, Nelson D, Wang X, Bowers JE, Gschwend AR, Delcher AL, Singh R, Suzuki JY, Tripathi S, Neupane K, Wei H, Irikura B, Paidi M, Jiang N, Zhang W, Presting G, Windsor A, Navajas-Pérez R, Torres MJ, Feltus FA, Porter B, Li Y, Burroughs AM, Luo MC, Liu L, Christopher DA, Mount SM, Moore PH, Sugimura T, Jiang J, Schuler MA, Friedman V, Mitchell-Olds T, Shippen DE, dePamphilis CW, Palmer JD, Freeling M, Paterson AH, Gonsalves D, Wang L, Alam M | 
display-authors = 6 | title = The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus) | journal = Nature | volume = 452 | issue = 7190 | pages = 991–6 | date = April 2008 | pmid = 18432245 | pmc = 2836516 | doi = 10.1038/nature06856 }}</ref> and cottonwood ([[Populus trichocarpa]])<ref>{{cite journal | vauthors = Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, Schein J, Sterck L, Aerts A, Bhalerao RR, Bhalerao RP, Blaudez D, Boerjan W, Brun A, Brunner A, Busov V, Campbell M, Carlson J, Chalot M, Chapman J, Chen GL, Cooper D, Coutinho PM, Couturier J, Covert S, Cronk Q, Cunningham R, Davis J, Degroeve S, Déjardin A, Depamphilis C, Detter J, Dirks B, Dubchak I, Duplessis S, Ehlting J, Ellis B, Gendler K, Goodstein D, Gribskov M, Grimwood J, Groover A, Gunter L, Hamberger B, Heinze B, Helariutta Y, Henrissat B, Holligan D, Holt R, Huang W, Islam-Faridi N, Jones S, Jones-Rhoades M, Jorgensen R, Joshi C, Kangasjärvi J, Karlsson J, Kelleher C, Kirkpatrick R, Kirst M, Kohler A, Kalluri U, Larimer F, Leebens-Mack J, Leplé JC, Locascio P, Lou Y, Lucas S, Martin F, Montanini B, Napoli C, Nelson DR, Nelson C, Nieminen K, Nilsson O, Pereda V, Peter G, Philippe R, Pilate G, Poliakov A, Razumovskaya J, Richardson P, Rinaldi C, Ritland K, Rouzé P, Ryaboy D, Schmutz J, Schrader J, Segerman B, Shin H, Siddiqui A, Sterky F, Terry A, Tsai CJ, Uberbacher E, Unneberg P, Vahala J, Wall K, Wessler S, Yang G, Yin T, Douglas C, Marra M, Sandberg G, Van de Peer Y, Rokhsar D | display-authors = 6 | title = The genome of black cottonwood, Populus trichocarpa (Torr. & Gray) | journal = Science | volume = 313 | issue = 5793 | pages = 1596–604 | date = September 2006 | pmid = 16973872 | doi = 10.1126/science.1128691 | s2cid = 7717980 | url = https://digital.library.unt.edu/ark:/67531/metadc883930/ | type = Submitted manuscript }}</ref> were sequenced and assembled with Sanger WGS strategy.
 The draft genome of grapevine<ref name=":3" /> was the fourth genome published for a flowering plant and the first for a fruit crop. The sequences were obtained from several types of libraries, including plasmids, fosmids and BACs. All the data were generated by paired-end sequencing of cloned inserts using Sanger technology on ABI 3730xl sequencers. The reads were assembled with Arachne (2002),<ref>{{cite journal | vauthors = Swan KA, Curtis DE, McKusick KB, Voinov AV, Mapa FA, Cancilla MR | title = High-throughput gene mapping in Caenorhabditis elegans | journal = Genome Research | volume = 12 | issue = 7 | pages = 1100–5 | date = July 2002 | pmid = 12097347 | pmc = 186621 | doi = 10.1101/gr.208902 }}</ref> software designed to analyze reads obtained from both ends of plasmid clones. In total, 6.2 million paired-end tag reads were produced. The software produced 20,784 contigs that were combined into 3,830 supercontigs, with an [[N50, L50, and related statistics|N50]] value of 64 kb. The supercontigs had a total size of 498 Mb.
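The N50 statistic quoted above can be computed from a list of contig or supercontig lengths. The following is a minimal sketch using the common definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly). The function name `n50` and the example lengths are hypothetical and are not taken from the grapevine assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Made-up contig lengths in kb: total 290, so half is 145.
# 80 + 70 = 150 >= 145, so the N50 is 70.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

A larger N50 means that more of the assembly sits in long sequences, which is why it is often reported alongside the number of contigs and the total assembled size.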
 To anchor the supercontigs along the genome, supercontigs were first joined together using paired BAC end sequences. The resulting ultracontigs and the remaining supercontigs were then aligned along the genetic map of the genome.
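Aligning sequences along a genetic map can be pictured as assigning each supercontig to a linkage group and ordering it by the map position of the markers it carries. The sketch below is only an illustration under simple assumptions and is not the procedure used in the grapevine project: the function name `anchor_scaffolds`, the marker tuples and the rule "majority linkage group, then median position" are all hypothetical choices.

```python
from collections import Counter, defaultdict
from statistics import median


def anchor_scaffolds(markers):
    """Order scaffolds along linkage groups using genetic markers.

    `markers` is an iterable of (scaffold, linkage_group, position_cM)
    tuples. Each scaffold is placed on the linkage group that holds most
    of its markers, at the median position of those markers.
    Returns {linkage_group: [scaffolds ordered by position]}.
    """
    hits_by_scaffold = defaultdict(list)
    for scaffold, group, position in markers:
        hits_by_scaffold[scaffold].append((group, position))

    placed = defaultdict(list)
    for scaffold, hits in hits_by_scaffold.items():
        # Majority vote over linkage groups; stray markers are ignored.
        group, _count = Counter(g for g, _ in hits).most_common(1)[0]
        position = median(p for g, p in hits if g == group)
        placed[group].append((position, scaffold))

    return {group: [s for _, s in sorted(items)] for group, items in placed.items()}


# Made-up markers: scaffold s2 has one stray marker on LG2.
example_markers = [
    ("s2", "LG1", 40.0),
    ("s1", "LG1", 5.0),
    ("s1", "LG1", 7.0),
    ("s3", "LG2", 12.0),
    ("s2", "LG1", 44.0),
    ("s2", "LG2", 3.0),
]
print(anchor_scaffolds(example_markers))
```

Scaffolds that carry no markers cannot be placed this way, which is why assemblies are usually reported with the fraction of sequence that was successfully anchored.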
-Later improvements of this strategy enabled the sequencing of [[Brachypodium distachyon]],<ref>{{cite journal | title = Genome sequencing and analysis of the model grass Brachypodium distachyon | journal = Nature | volume = 463 | issue = 7282 | pages = 763–8 | date = February 2010 | pmid = 20148030 | doi = 10.1038/nature08747 | author1 = International Brachypodium Initiative | url = http://digitalcommons.unl.edu/usdaarsfacpub/1040 | doi-access = free }}</ref> [[Sorghum bicolor]]<ref>{{cite journal | vauthors = Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, Schmutz J, Spannagl M, Tang H, Wang X, Wicker T, Bharti AK, Chapman J, Feltus FA, Gowik U, Grigoriev IV, Lyons E, Maher CA, Martis M, Narechania A, Otillar RP, Penning BW, Salamov AA, Wang Y, Zhang L, Carpita NC, Freeling M, Gingle AR, Hash CT, Keller B, Klein P, Kresovich S, McCann MC, Ming R, Peterson DG, Ware D, Westhoff P, Mayer KF, Messing J, Rokhsar DS | display-authors = 6 | title = The Sorghum bicolor genome and the diversification of grasses | journal = Nature | volume = 457 | issue = 7229 | pages = 551–6 | date = January 2009 | pmid = 19189423 | doi = 10.1038/nature07723 | doi-access = free }}</ref> and [[soybean]].<ref>{{cite journal|last=Schmutz|first=Jeremy| name-list-format = vanc |date=2009|title=Genome sequence of the palaeopolyploid soybean |journal=Nature |volume=463|pages=|via=}}</ref>
+Later improvements of this strategy enabled the sequencing of [[Brachypodium distachyon]],<ref>{{cite journal | title = Genome sequencing and analysis of the model grass Brachypodium distachyon | journal = Nature | volume = 463 | issue = 7282 | pages = 763–8 | date = February 2010 | pmid = 20148030 | doi = 10.1038/nature08747 | author1 = International Brachypodium Initiative | url = http://digitalcommons.unl.edu/usdaarsfacpub/1040 | doi-access = free }}</ref> [[Sorghum bicolor]]<ref>{{cite journal | vauthors = Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, Schmutz J, Spannagl M, Tang H, Wang X, Wicker T, Bharti AK, Chapman J, Feltus FA, Gowik U, Grigoriev IV, Lyons E, Maher CA, Martis M, Narechania A, Otillar RP, Penning BW, Salamov AA, Wang Y, Zhang L, Carpita NC, Freeling M, Gingle AR, Hash CT, Keller B, Klein P, Kresovich S, McCann MC, Ming R, Peterson DG, Ware D, Westhoff P, Mayer KF, Messing J, Rokhsar DS | display-authors = 6 | title = The Sorghum bicolor genome and the diversification of grasses | journal = Nature | volume = 457 | issue = 7229 | pages = 551–6 | date = January 2009 | pmid = 19189423 | doi = 10.1038/nature07723 | doi-access = free }}</ref> and [[soybean]].<ref>{{cite journal|last=Schmutz|first=Jeremy| name-list-format = vanc |date=2009|title=Genome sequence of the palaeopolyploid soybean |journal=Nature |volume=463|issue=7278|pages=178–83|doi=10.1038/nature08670|pmid=20075913|s2cid=4372224}}</ref>
 === Next-generation sequencing ===
 Because it is cheaper than previous methods, most recent plant genomes have been sequenced and assembled using data from next-generation sequencing (NGS) technology. In general, NGS data are used in combination with Sanger sequencing or with long reads obtained from [[Third-generation sequencing|third-generation sequencing]].
 The genome of the [[cucumber]] (Cucumis sativus)<ref>{{cite journal | vauthors = Huang S, Li R, Zhang Z, Li L, Gu X, Fan W, Lucas WJ, Wang X, Xie B, Ni P, Ren Y, Zhu H, Li J, Lin K, Jin W, Fei Z, Li G, Staub J, Kilian A, van der Vossen EA, Wu Y, Guo J, He J, Jia Z, Ren Y, Tian G, Lu Y, Ruan J, Qian W, Wang M, Huang Q, Li B, Xuan Z, Cao J, Wu Z, Zhang J, Cai Q, Bai Y, Zhao B, Han Y, Li Y, Li X, Wang S, Shi Q, Liu S, Cho WK, Kim JY, Xu Y, Heller-Uszynska K, Miao H, Cheng Z, Zhang S, Wu J, Yang Y, Kang H, Li M, Liang H, Ren X, Shi Z, Wen M, Jian M, Yang H, Zhang G, Yang Z, Chen R, Liu S, Li J, Ma L, Liu H, Zhou Y, Zhao J, Fang X, Li G, Fang L, Li Y, Liu D, Zheng H, Zhang Y, Qin N, Li Z, Yang G, Yang S, Bolund L, Kristiansen K, Zheng H, Li S, Zhang X, Yang H, Wang J, Sun R, Zhang B, Jiang S, Wang J, Du Y, Li S | display-authors = 6 | title = The genome of the cucumber, Cucumis sativus L | journal = Nature Genetics | volume = 41 | issue = 12 | pages = 1275–81 | date = December 2009 | pmid = 19881527 | doi = 10.1038/ng.475 | doi-access = free }}</ref> was one of the plant genomes assembled from NGS [[Illumina dye sequencing|Illumina]] reads in combination with Sanger sequences. In total, 72.2-fold genome coverage of high-quality base pairs was generated, of which Sanger reads provided 3.9-fold coverage and Illumina GA reads provided 68.3-fold coverage. Two assemblies were produced, one for each sequencing technology. The resulting contigs were compared with each other, yielding an assembled genome with a total length of 243.5 Mb. This is about 30% smaller than the genome size estimated by [[flow cytometry]] of isolated nuclei stained with [[propidium iodide]] (367 Mb). A genetic map was constructed to anchor the assembled genome, and 72.8% of the assembled sequences were successfully anchored onto the seven chromosomes.
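The figures quoted for the cucumber assembly can be checked with a few lines of arithmetic. This is only a worked example using the numbers stated above; the function name `assembly_summary` and its parameter names are hypothetical.

```python
def assembly_summary(sanger_cov, illumina_cov, assembled_mb, estimated_mb):
    """Combine per-technology coverage and compare assembly size to an estimate.

    Returns (total fold coverage, fraction of the estimated genome that
    was assembled, fraction that is missing from the assembly).
    """
    total_cov = sanger_cov + illumina_cov
    assembled_fraction = assembled_mb / estimated_mb
    missing_fraction = 1.0 - assembled_fraction
    return total_cov, assembled_fraction, missing_fraction


total_cov, assembled_fraction, missing_fraction = assembly_summary(
    sanger_cov=3.9,       # fold coverage from Sanger reads
    illumina_cov=68.3,    # fold coverage from Illumina GA reads
    assembled_mb=243.5,   # total assembled length in Mb
    estimated_mb=367.0,   # flow-cytometry estimate in Mb
)
print(f"total coverage: {total_cov:.1f}-fold")
print(f"assembled: {assembled_fraction:.1%} of the estimate")
print(f"missing: {missing_fraction:.1%} of the estimate")
```

The two coverage values add up to the stated 72.2-fold, and the assembly covers roughly two thirds of the flow-cytometry estimate, in line with the "about 30% smaller" description.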
-Another plant genome that combined NGS with Sanger sequencing was the genome of [[Theobroma cacao]], 2010,<ref>{{cite journal | vauthors = Argout X, Salse J, Aury JM, Guiltinan MJ, Droc G, Gouzy J, Allegre M, Chaparro C, Legavre T, Maximova SN, Abrouk M, Murat F, Fouet O, Poulain J, Ruiz M, Roguet Y, Rodier-Goud M, Barbosa-Neto JF, Sabot F, Kudrna D, Ammiraju JS, Schuster SC, Carlson JE, Sallet E, Schiex T, Dievart A, Kramer M, Gelley L, Shi Z, Bérard A, Viot C, Boccara M, Risterucci AM, Guignon V, Sabau X, Axtell MJ, Ma Z, Zhang Y, Brown S, Bourge M, Golser W, Song X, Clement D, Rivallan R, Tahi M, Akaza JM, Pitollat B, Gramacho K, D'Hont A, Brunel D, Infante D, Kebe I, Costet P, Wing R, McCombie WR, Guiderdoni E, Quetier F, Panaud O, Wincker P, Bocs S, Lanaud C | display-authors = 6 | title = The genome of Theobroma cacao | journal = Nature Genetics | volume = 43 | issue = 2 | pages = 101–8 | date = February 2011 | pmid = 21186351 | doi = 10.1038/ng.736 }}</ref> an economically important tropical fruit tree crop. The genome was sequenced in a consortium, “The International Cocoa Genome Sequencing consortium (ICGS) “ and produced a total of 17.6 million 454 single end reads, 8.8 million 454 paired-end reads, 398.0 million Illumina paired-end reads and about 88,000 Sanger BAC reads. First by using genome assembly software, Newbler, an assembly was produced with 25,912 contigs and 4,792 scaffolds from the reads obtained from Roche/454 and Sanger raw data. This had a total length of 326.9 Mb, which represents 76% of the estimated genome size.
+Another plant genome that combined NGS with Sanger sequencing was the genome of [[Theobroma cacao]], 2010,<ref>{{cite journal | vauthors = Argout X, Salse J, Aury JM, Guiltinan MJ, Droc G, Gouzy J, Allegre M, Chaparro C, Legavre T, Maximova SN, Abrouk M, Murat F, Fouet O, Poulain J, Ruiz M, Roguet Y, Rodier-Goud M, Barbosa-Neto JF, Sabot F, Kudrna D, Ammiraju JS, Schuster SC, Carlson JE, Sallet E, Schiex T, Dievart A, Kramer M, Gelley L, Shi Z, Bérard A, Viot C, Boccara M, Risterucci AM, Guignon V, Sabau X, Axtell MJ, Ma Z, Zhang Y, Brown S, Bourge M, Golser W, Song X, Clement D, Rivallan R, Tahi M, Akaza JM, Pitollat B, Gramacho K, D'Hont A, Brunel D, Infante D, Kebe I, Costet P, Wing R, McCombie WR, Guiderdoni E, Quetier F, Panaud O, Wincker P, Bocs S, Lanaud C | display-authors = 6 | title = The genome of Theobroma cacao | journal = Nature Genetics | volume = 43 | issue = 2 | pages = 101–8 | date = February 2011 | pmid = 21186351 | doi = 10.1038/ng.736 | s2cid = 4685532 }}</ref> an economically important tropical fruit tree crop. The genome was sequenced in a consortium, “The International Cocoa Genome Sequencing consortium (ICGS) “ and produced a total of 17.6 million 454 single end reads, 8.8 million 454 paired-end reads, 398.0 million Illumina paired-end reads and about 88,000 Sanger BAC reads. First by using genome assembly software, Newbler, an assembly was produced with 25,912 contigs and 4,792 scaffolds from the reads obtained from Roche/454 and Sanger raw data. This had a total length of 326.9 Mb, which represents 76% of the estimated genome size.
 The Illumina reads were then used to complement the [[454 Sequencing|454]] assembly by aligning the short reads to the cocoa genome assembly using the SOAP software.
-A similar strategy that combined NGS reads and Sanger Sequencing was used for other important plant species like the first published apple genome ([[Apple|Malus domestica]]),<ref name=":4">{{cite journal | vauthors = Velasco R, Zharkikh A, Affourtit J, Dhingra A, Cestaro A, Kalyanaraman A, Fontana P, Bhatnagar SK, Troggio M, Pruss D, Salvi S, Pindo M, Baldi P, Castelletti S, Cavaiuolo M, Coppola G, Costa F, Cova V, Dal Ri A, Goremykin V, Komjanc M, Longhi S, Magnago P, Malacarne G, Malnoy M, Micheletti D, Moretto M, Perazzolli M, Si-Ammour A, Vezzulli S, Zini E, Eldredge G, Fitzgerald LM, Gutin N, Lanchbury J, Macalma T, Mitchell JT, Reid J, Wardell B, Kodira C, Chen Z, Desany B, Niazi F, Palmer M, Koepke T, Jiwan D, Schaeffer S, Krishnan V, Wu C, Chu VT, King ST, Vick J, Tao Q, Mraz A, Stormo A, Stormo K, Bogden R, Ederle D, Stella A, Vecchietti A, Kater MM, Masiero S, Lasserre P, Lespinasse Y, Allan AC, Bus V, Chagné D, Crowhurst RN, Gleave AP, Lavezzo E, Fawcett JA, Proost S, Rouzé P, Sterck L, Toppo S, Lazzari B, Hellens RP, Durel CE, Gutin A, Bumgarner RE, Gardiner SE, Skolnick M, Egholm M, Van de Peer Y, Salamini F, Viola R | display-authors = 6 | title = The genome of the domesticated apple (Malus × domestica Borkh.) 
| journal = Nature Genetics | volume = 42 | issue = 10 | pages = 833–9 | date = October 2010 | pmid = 20802477 | doi = 10.1038/ng.654 | doi-access = free }}</ref> cotton ([[Gossypium raimondii|Gossypium Raimond]]),<ref>{{cite journal | vauthors = Wang K, Wang Z, Li F, Ye W, Wang J, Song G, Yue Z, Cong L, Shang H, Zhu S, Zou C, Li Q, Yuan Y, Lu C, Wei H, Gou C, Zheng Z, Yin Y, Zhang X, Liu K, Wang B, Song C, Shi N, Kohel RJ, Percy RG, Yu JZ, Zhu YX, Wang J, Yu S | display-authors = 6 | title = The draft genome of a diploid cotton Gossypium raimondii | journal = Nature Genetics | volume = 44 | issue = 10 | pages = 1098–103 | date = October 2012 | pmid = 22922876 | doi = 10.1038/ng.2371 }}</ref> draft genome of sweet orange ([[Citrus × sinensis|Citrus sinensis]])<ref>{{cite journal | vauthors = Xu Q, Chen LL, Ruan X, Chen D, Zhu A, Chen C, Bertrand D, Jiao WB, Hao BH, Lyon MP, Chen J, Gao S, Xing F, Lan H, Chang JW, Ge X, Lei Y, Hu Q, Miao Y, Wang L, Xiao S, Biswas MK, Zeng W, Guo F, Cao H, Yang X, Xu XW, Cheng YJ, Xu J, Liu JH, Luo OJ, Tang Z, Guo WW, Kuang H, Zhang HY, Roose ML, Nagarajan N, Deng XX, Ruan Y | display-authors = 6 | title = The draft genome of sweet orange (Citrus sinensis) | journal = Nature Genetics | volume = 45 | issue = 1 | pages = 59–66 | date = January 2013 | pmid = 23179022 | doi = 10.1038/ng.2472 | doi-access = free }}</ref> and the domesticated tomato ([[Tomato|Solanum lycopersicum]]) genome<ref>{{cite journal | author = Tomato Genome Consortium | title = The tomato genome sequence provides insights into fleshy fruit evolution | journal = Nature | volume = 485 | issue = 7400 | pages = 635–41 | date = May 2012 | pmid = 22660326 | pmc = 3378239 | doi = 10.1038/nature11119 }}</ref>
+A similar strategy that combined NGS reads and Sanger Sequencing was used for other important plant species like the first published apple genome ([[Apple|Malus domestica]]),<ref name=":4">{{cite journal | vauthors = Velasco R, Zharkikh A, Affourtit J, Dhingra A, Cestaro A, Kalyanaraman A, Fontana P, Bhatnagar SK, Troggio M, Pruss D, Salvi S, Pindo M, Baldi P, Castelletti S, Cavaiuolo M, Coppola G, Costa F, Cova V, Dal Ri A, Goremykin V, Komjanc M, Longhi S, Magnago P, Malacarne G, Malnoy M, Micheletti D, Moretto M, Perazzolli M, Si-Ammour A, Vezzulli S, Zini E, Eldredge G, Fitzgerald LM, Gutin N, Lanchbury J, Macalma T, Mitchell JT, Reid J, Wardell B, Kodira C, Chen Z, Desany B, Niazi F, Palmer M, Koepke T, Jiwan D, Schaeffer S, Krishnan V, Wu C, Chu VT, King ST, Vick J, Tao Q, Mraz A, Stormo A, Stormo K, Bogden R, Ederle D, Stella A, Vecchietti A, Kater MM, Masiero S, Lasserre P, Lespinasse Y, Allan AC, Bus V, Chagné D, Crowhurst RN, Gleave AP, Lavezzo E, Fawcett JA, Proost S, Rouzé P, Sterck L, Toppo S, Lazzari B, Hellens RP, Durel CE, Gutin A, Bumgarner RE, Gardiner SE, Skolnick M, Egholm M, Van de Peer Y, Salamini F, Viola R | display-authors = 6 | title = The genome of the domesticated apple (Malus × domestica Borkh.) 
| journal = Nature Genetics | volume = 42 | issue = 10 | pages = 833–9 | date = October 2010 | pmid = 20802477 | doi = 10.1038/ng.654 | doi-access = free }}</ref> cotton ([[Gossypium raimondii|Gossypium Raimond]]),<ref>{{cite journal | vauthors = Wang K, Wang Z, Li F, Ye W, Wang J, Song G, Yue Z, Cong L, Shang H, Zhu S, Zou C, Li Q, Yuan Y, Lu C, Wei H, Gou C, Zheng Z, Yin Y, Zhang X, Liu K, Wang B, Song C, Shi N, Kohel RJ, Percy RG, Yu JZ, Zhu YX, Wang J, Yu S | display-authors = 6 | title = The draft genome of a diploid cotton Gossypium raimondii | journal = Nature Genetics | volume = 44 | issue = 10 | pages = 1098–103 | date = October 2012 | pmid = 22922876 | doi = 10.1038/ng.2371 | s2cid = 38495587 }}</ref> draft genome of sweet orange ([[Citrus × sinensis|Citrus sinensis]])<ref>{{cite journal | vauthors = Xu Q, Chen LL, Ruan X, Chen D, Zhu A, Chen C, Bertrand D, Jiao WB, Hao BH, Lyon MP, Chen J, Gao S, Xing F, Lan H, Chang JW, Ge X, Lei Y, Hu Q, Miao Y, Wang L, Xiao S, Biswas MK, Zeng W, Guo F, Cao H, Yang X, Xu XW, Cheng YJ, Xu J, Liu JH, Luo OJ, Tang Z, Guo WW, Kuang H, Zhang HY, Roose ML, Nagarajan N, Deng XX, Ruan Y | display-authors = 6 | title = The draft genome of sweet orange (Citrus sinensis) | journal = Nature Genetics | volume = 45 | issue = 1 | pages = 59–66 | date = January 2013 | pmid = 23179022 | doi = 10.1038/ng.2472 | doi-access = free }}</ref> and the domesticated tomato ([[Tomato|Solanum lycopersicum]]) genome<ref>{{cite journal | author = Tomato Genome Consortium | title = The tomato genome sequence provides insights into fleshy fruit evolution | journal = Nature | volume = 485 | issue = 7400 | pages = 635–41 | date = May 2012 | pmid = 22660326 | pmc = 3378239 | doi = 10.1038/nature11119 }}</ref>
 === Third-generation ===
-With the emergence of [[third-generation sequencing]] (TGS), some of the limitations of previous methods of sequencing and assembling plant genomes have started to be addressed. This technology is characterized by the parallel sequencing of single molecules of DNA, which yields reads up to 54 kbp long ([[Pacific Biosciences|PacBio]] RS II).<ref>{{cite journal|last=Bleidorn|first=Christoph| name-list-format = vanc |date=2015|title=Third generation sequencing: technology and its potential impact on evolutionary biodiversity research |journal=Systematics and Biodiversity|volume=|pages= }}</ref> In general, long reads from TGS have relatively high error rates (~10% on average),<ref>{{cite journal|last=Lee|first=Hayan|last2=Gurtowski|first2=James|last3=Yoo|first3=Shinjae|last4=Nattestad|first4=Maria|last5=Marcus|first5=Shoshana|last6=Goodwin|first6=Sara|last7=McCombie|first7=W. Richard|last8=Schatz|first8=Michael | name-list-format = vanc |date=2016-04-13|title=Third-generation sequencing and the future of genomics |journal=bioRxiv|pages=048603|doi=10.1101/048603|doi-access=free}}</ref> so the same DNA fragments must be sequenced repeatedly. The technology is still quite expensive and is therefore generally used in combination with short reads from NGS.
+With the emergence of [[third-generation sequencing]] (TGS), some of the limitations of previous methods of sequencing and assembling plant genomes have started to be addressed. This technology is characterized by the parallel sequencing of single molecules of DNA, which yields reads up to 54 kbp long ([[Pacific Biosciences|PacBio]] RS II).<ref>{{cite journal|last=Bleidorn|first=Christoph| name-list-format = vanc |date=2015|title=Third generation sequencing: technology and its potential impact on evolutionary biodiversity research |journal=Systematics and Biodiversity|volume=|pages= }}</ref> In general, long reads from TGS have relatively high error rates (~10% on average),<ref>{{cite journal|last1=Lee|first1=Hayan|last2=Gurtowski|first2=James|last3=Yoo|first3=Shinjae|last4=Nattestad|first4=Maria|last5=Marcus|first5=Shoshana|last6=Goodwin|first6=Sara|last7=McCombie|first7=W. Richard|last8=Schatz|first8=Michael | name-list-format = vanc |date=2016-04-13|title=Third-generation sequencing and the future of genomics |journal=bioRxiv|pages=048603|doi=10.1101/048603|doi-access=free}}</ref> so the same DNA fragments must be sequenced repeatedly. The technology is still quite expensive and is therefore generally used in combination with short reads from NGS.
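Why repeated sequencing helps with a ~10% error rate can be seen from a simple binomial calculation. The sketch below assumes that errors are independent between reads and counts a base as wrong whenever more than half of the reads covering it are wrong. Both are simplifying assumptions; real long-read errors are often insertions and deletions, which this model ignores. The function name `majority_error` is hypothetical.

```python
from math import comb


def majority_error(n_reads: int, per_read_error: float) -> float:
    """Probability that more than half of `n_reads` reads are wrong at one base.

    Assumes independent errors with the same probability in every read.
    Because wrong reads need not agree with each other, this is a
    pessimistic bound on the error of a majority-vote consensus for odd
    `n_reads`.
    """
    p = per_read_error
    first_majority = n_reads // 2 + 1
    return sum(
        comb(n_reads, k) * p**k * (1 - p) ** (n_reads - k)
        for k in range(first_majority, n_reads + 1)
    )


# Per-base error bound for 1, 3, 5 and 15 reads at a 10% per-read error rate.
for n in (1, 3, 5, 15):
    print(n, majority_error(n, 0.10))
```

Under these assumptions the bound drops from 10% with a single read to 2.8% with three reads and below 1% with five, which illustrates why either deeper long-read coverage or polishing with accurate short reads is used.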
-One of the first plant genomes to use long reads from TGS (Pacific Biosciences) in combination with short reads from NGS was the genome of [[Spinach]],<ref>{{cite journal | vauthors = van Deynze A | date = 2015 |title=Using spinach to compare technologies for whole genome assemblies |journal=Plant & Animal Genomics XXIII Conference|volume=|pages=|via=}}</ref> which has an estimated genome size of 989 Mb. For this project, 60× coverage of the genome was generated, with 20% of the reads longer than 20 kb. The data were assembled using PacBio’s hierarchical genome assembly process (HGAP),<ref>{{cite journal | vauthors = Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, Turner SW, Korlach J | title = Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data | journal = Nature Methods | volume = 10 | issue = 6 | pages = 563–9 | date = June 2013 | pmid = 23644548 | doi = 10.1038/nmeth.2474 }}</ref> and the long-read assembly showed a 63-fold improvement in contig size over an Illumina-only assembly.
+One of the first plant genomes to use long reads from TGS (Pacific Biosciences) in combination with short reads from NGS was the genome of [[Spinach]],<ref>{{cite journal | vauthors = van Deynze A | date = 2015 |title=Using spinach to compare technologies for whole genome assemblies |journal=Plant & Animal Genomics XXIII Conference|volume=|pages=|via=}}</ref> which has an estimated genome size of 989 Mb. For this project, 60× coverage of the genome was generated, with 20% of the reads longer than 20 kb. The data were assembled using PacBio’s hierarchical genome assembly process (HGAP),<ref>{{cite journal | vauthors = Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, Turner SW, Korlach J | title = Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data | journal = Nature Methods | volume = 10 | issue = 6 | pages = 563–9 | date = June 2013 | pmid = 23644548 | doi = 10.1038/nmeth.2474 | s2cid = 205421576 }}</ref> and the long-read assembly showed a 63-fold improvement in contig size over an Illumina-only assembly.
-Another plant genome that was recently published that used long reads in combination with short reads is the improved assembly of the apple genome.<ref>{{cite journal | vauthors = Daccord N, Celton JM, Linsmith G, Becker C, Choisne N, Schijlen E, van de Geest H, Bianco L, Micheletti D, Velasco R, Di Pierro EA, Gouzy J, Rees DJ, Guérif P, Muranty H, Durel CE, Laurens F, Lespinasse Y, Gaillard S, Aubourg S, Quesneville H, Weigel D, van de Weg E, Troggio M, Bucher E | display-authors = 6 | title = High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development | journal = Nature Genetics | volume = 49 | issue = 7 | pages = 1099–1106 | date = July 2017 | pmid = 28581499 | doi = 10.1038/ng.3886 | url = https://openpub.fmach.it/bitstream/10449/42064/4/2017%20NG.pdf }}</ref> In this project a hybrid approach was used, combining different data types from sequencing technologies. The sequences used came from: PacBio RS II, Illumina paired-end reads (PE) and Illumina mate- pair reads (MP). As a first step an assembly from Illumina paired-end reads was performed using a well-known de novo assembly software SOAPdevo.<ref>{{cite journal|last=Luo|first=Ruibang | name-list-format = vanc |date=2012|title=SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler |journal=Gigascience|volume=|pages=|via=}}</ref> Then using a hybrid assembly pipeline DBG2OLC.<ref>{{cite journal | vauthors = Ye C, Hill CM, Wu S, Ruan J, Ma ZS | title = DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies | journal = Scientific Reports | volume = 6 | issue = 1 | pages = 31900 | date = August 2016 | pmid = 27573208 | pmc = 5004134 | doi = 10.1038/srep31900 }}</ref> the contigs obtained at the first step and the long reads from PacBio were combined. 
The assembly was then polished with the help of Illumina paired-end reads by mapping them to the contigs using BWA-MEM.<ref>{{cite journal|last=Li|first=Heng | name-list-format = vanc |date=2013|title=Aligning sequence reads, clone sequences and assembly contigs with BWA- MEM |journal=Broad Institute of Harvard and MIT |arxiv=1303.3997 }}</ref> By mapping the mate-pair reads on the corrected contigs they scaffold the assembly. Further BioNano (https://bionanogenomics.com/) optical mapping analysis with a total length of 649.7 Mb, were used in the hybrid assembly pipeline together with the scaffolds obtained from the previous step. The resulting scaffolds were anchored to a genetic map constructed from 15,417 single-nucleotide [[Polymorphism (biology)|polymorphisms]] (SNPs) markers. For better understanding of the number and diversity of genes that were identified, ribonucleic acid RNA-seq, were used.
+Another plant genome, published in 2017, that combined long reads with short reads is the improved assembly of the apple genome.<ref>{{cite journal | vauthors = Daccord N, Celton JM, Linsmith G, Becker C, Choisne N, Schijlen E, van de Geest H, Bianco L, Micheletti D, Velasco R, Di Pierro EA, Gouzy J, Rees DJ, Guérif P, Muranty H, Durel CE, Laurens F, Lespinasse Y, Gaillard S, Aubourg S, Quesneville H, Weigel D, van de Weg E, Troggio M, Bucher E | display-authors = 6 | title = High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development | journal = Nature Genetics | volume = 49 | issue = 7 | pages = 1099–1106 | date = July 2017 | pmid = 28581499 | doi = 10.1038/ng.3886 | s2cid = 24690391 | url = https://openpub.fmach.it/bitstream/10449/42064/4/2017%20NG.pdf }}</ref> This project used a hybrid approach that combined data from several sequencing technologies: PacBio RS II long reads, Illumina paired-end (PE) reads and Illumina mate-pair (MP) reads. As a first step, the Illumina paired-end reads were assembled using the de novo assembler SOAPdenovo.<ref>{{cite journal|last=Luo|first=Ruibang | name-list-format = vanc |date=2012|title=SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler |journal=Gigascience|volume=1|pages=|doi=10.1186/2047-217X-1-18 |pmid=23587118 |s2cid=2681931 }}</ref> The contigs obtained in this first step were then combined with the PacBio long reads using the hybrid assembly pipeline DBG2OLC.<ref>{{cite journal | vauthors = Ye C, Hill CM, Wu S, Ruan J, Ma ZS | title = DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies | journal = Scientific Reports | volume = 6 | issue = 1 | pages = 31900 | date = August 2016 | pmid = 27573208 | pmc = 5004134 | doi = 10.1038/srep31900 }}</ref> 
The assembly was then polished with Illumina paired-end reads, which were mapped to the contigs using BWA-MEM.<ref>{{cite journal|last=Li|first=Heng | name-list-format = vanc |date=2013|title=Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM |journal=Broad Institute of Harvard and MIT |arxiv=1303.3997 }}</ref> The mate-pair reads were mapped onto the corrected contigs to scaffold the assembly. BioNano (https://bionanogenomics.com/) optical maps with a total length of 649.7 Mb were then used in the hybrid assembly pipeline together with the scaffolds obtained from the previous step. The resulting scaffolds were anchored to a genetic map constructed from 15,417 single-nucleotide [[Polymorphism (biology)|polymorphism]] (SNP) markers. RNA sequencing (RNA-seq) data were used to better characterize the number and diversity of the genes identified.
 The resulting genome assembly has a size of 643.2 Mb, closer to the estimated genome size than the previously published assembly,<ref name=":4" /> and contains a smaller number of protein-coding genes.