User:EmaadP92/PatternHunter
This is not a Wikipedia article: It is an individual user's work-in-progress page, and may be incomplete and/or unreliable.
PatternHunter
PatternHunter is a commercially available homology search tool that combines established techniques with proprietary ones. It was first developed in 2002 by three scientists, Bin Ma, John Tromp and Ming Li (Ma, Tromp and Li 440). Their aim was to solve a problem that many investigators face in genomics and proteomics: such studies rely heavily on homology searches that find short seed matches and then extend them into longer alignments. Identifying homologous genes is an essential part of most evolutionary studies and is crucial to understanding the evolution of gene families and the relationship between domains and families (Joseph 7). Homologous genes can be studied effectively only with search tools that find similar regions, or local alignments, between two protein or nucleic acid sequences (Pevsner 15). Homology is quantified by scores for matching positions together with “mismatch and gap scores” (Li et al. 164).
Factors Contributing to the Development of PatternHunter
Comparative genomics often requires comparing very large sequences, such as the chromosomes of the human genome. The rapid growth of genomic data strains the available homology search methods, which face a trade-off: enlarging the seed lowers sensitivity, while shrinking it slows computation. Several programs have been developed to detect homology between sequences. These include FASTA, the BLAST family, QUASAR, MUMmer, SENSEI, SIM, and REPuter (Ma, Tromp and Li 440). Many of them build on the Smith-Waterman alignment technique, which compares every base of one sequence against every base of the other and is therefore too slow for large inputs. BLAST improves on this technique by first finding short, exact seed matches and then extending them into longer alignments (Pearson 737). However, when dealing with lengthy sequences, these programs are extremely slow and require considerable memory. SENSEI is more efficient than the others, but its strength lies in ungapped alignments and it handles other forms of alignment poorly. The output quality of MegaBLAST, on the other hand, is poor, and the program does not adapt well to large sequences. MUMmer and QUASAR employ suffix trees, which are designed for exact matches, so they apply only to sequences that are highly similar. These limitations called for a fast, reliable tool that can handle all types of sequences efficiently without consuming excessive computing resources. PatternHunter was designed to meet this need.
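The cost of base-against-base comparison can be seen in a minimal sketch of Smith-Waterman local alignment scoring. The function name and the scoring values (match 1, mismatch -1, linear gap -2) are assumptions chosen for illustration and are not taken from any of the programs named above. The nested loops visit every pair of positions, which is why the running time grows with the product of the two sequence lengths.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions; gaps use a simple
    linear penalty rather than an affine one.
    """
    # prev holds the previous row of the dynamic-programming table.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 4: the shared "ACGT"
```

Seed-based tools avoid filling most of this table by running the expensive alignment step only around seed hits.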
Approach
PatternHunter uses multiple seeds (short search strings) whose matching positions are optimally spaced. Seed-based searches are very fast because they evaluate homology only where hits are found. The sensitivity of a seed is strongly influenced by the spacing between its matching positions. Large seeds fail to find isolated homologies, whereas small ones generate numerous random hits that slow computation. PatternHunter balances the two by choosing an optimal spacing. It uses k nonconsecutive letters (k = 11) as a seed, in contrast with BLAST, which uses k consecutive letters. The first stage of a PatternHunter analysis is a filtering phase, in which the program looks for matches at the k spaced positions specified by the most advantageous pattern (Zhang 11). The second stage is the alignment phase, which is identical to that of BLAST. PatternHunter can also use more than one seed at a time, which raises sensitivity without reducing speed.
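The difference between consecutive and spaced seeds can be sketched as follows. This is an illustration of the filtering idea only, not PatternHunter's actual implementation. The pattern string is the weight-11, span-18 seed reported by Ma, Tromp and Li, where a 1 marks a position that must match and a 0 marks a position that is ignored. The function names `seed_key` and `find_hits` and the toy sequences are hypothetical.

```python
from collections import defaultdict

# Weight-11, span-18 spaced seed as reported by Ma, Tromp and Li (2002).
SPACED = "111010010100110111"
# BLAST-style seed for comparison: 11 consecutive required matches.
CONSECUTIVE = "1" * 11


def seed_key(seq, start, pattern):
    """Collect the letters of seq at the '1' positions of pattern."""
    return "".join(seq[start + i] for i, c in enumerate(pattern) if c == "1")


def find_hits(query, target, pattern):
    """Return (query_pos, target_pos) pairs whose seed keys are equal."""
    span = len(pattern)
    # Index every window of the target by its seed key.
    index = defaultdict(list)
    for t in range(len(target) - span + 1):
        index[seed_key(target, t, pattern)].append(t)
    # Look up every window of the query in that index.
    hits = []
    for q in range(len(query) - span + 1):
        for t in index.get(seed_key(query, q, pattern), []):
            hits.append((q, t))
    return hits


# Toy example: the query differs from the target at positions 3, 8 and 14,
# all of which fall on ignored ('0') positions of the spaced seed.
target = "ACGTACGTACGTACGTAC"
query = list(target)
for pos, base in [(3, "A"), (8, "T"), (14, "C")]:
    query[pos] = base
query = "".join(query)

print(find_hits(query, target, SPACED))       # [(0, 0)]
print(find_hits(query, target, CONSECUTIVE))  # []
```

In this toy case the three substitutions break every run of 11 consecutive matches, so the consecutive seed finds nothing, while the spaced seed still reports a hit that the alignment phase could then extend.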
Speed
PatternHunter analyzes all types of sequences quickly. On a modern computer, it takes a few seconds to handle prokaryotic genomes, minutes to process Arabidopsis sequences and several hours to process a human chromosome (Ma, Tromp and Li 440). PatternHunter is reported to be approximately a hundred times faster than BLAST and MegaBLAST (“PatternHunter” 1 par. 2) and about 3000 times faster than a Smith-Waterman algorithm. In addition, the program has a user-friendly interface that allows users to customize the search parameters.
Sensitivity
PatternHunter can attain optimal sensitivity while retaining the speed of a conventional BLAST search.
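One way to make a sensitivity comparison concrete is to estimate, by simulation, how often a seed produces at least one hit inside a random ungapped homologous region. The sketch below is a rough Monte Carlo estimate under stated assumptions: the region length of 64, the 70% similarity, the trial count and the function name `hit_probability` are all illustrative choices, and the spaced pattern is the weight-11 seed reported by Ma, Tromp and Li. It is not how PatternHunter itself selects or evaluates seeds.

```python
import random


def hit_probability(pattern, region_len=64, similarity=0.7,
                    trials=5000, rng_seed=0):
    """Estimate the chance that a seed hits somewhere in a random region.

    Each column of the region matches independently with probability
    `similarity` (an assumed, simplified model of homology).
    """
    rng = random.Random(rng_seed)
    # Offsets within the seed window that are required to match.
    care = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    hits = 0
    for _ in range(trials):
        # True marks a matching column in the ungapped region.
        region = [rng.random() < similarity for _ in range(region_len)]
        # The seed hits if all required offsets match at some start position.
        if any(all(region[s + i] for i in care)
               for s in range(region_len - span + 1)):
            hits += 1
    return hits / trials


spaced = hit_probability("111010010100110111")
consecutive = hit_probability("1" * 11)
print(f"spaced seed: {spaced:.3f}, consecutive seed: {consecutive:.3f}")
```

Both seeds require 11 matching letters, so they produce random hits at a similar rate, yet under this model the spaced pattern should come out with the higher estimated hit probability. That is the sense in which spacing raises sensitivity without an obvious cost in speed.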
Specifications
PatternHunter is written in Java. Consequently, the program runs in any Java 1.4 environment (“PatternHunter” 2).
Future Advances
Homology search remains a time-consuming procedure. Challenges remain in DNA-DNA searches as well as translated DNA-protein searches because of the vast size of the databases and the short queries used. PatternHunter has been upgraded to PatternHunter II, which speeds up DNA-protein searches a hundredfold without reducing sensitivity. There are further plans to improve PatternHunter so that it attains the high sensitivity of the Smith-Waterman algorithm at BLASTP speed. A novel translated PatternHunter intended to speed up tBLASTx (Li et al. 174) is also in development.
References
Joseph, M. Jacob. On the Identification and Investigation of Homologous Gene Families, with Particular Emphasis on the Accuracy of Multidomain Families. 2012. PDF file. 26 Nov. 2013. <http://reports-archive.adm.cs.cmu.edu/anon/lane/CMU-CB-12-103.pdf>.
Li, Ming, Bin Ma, Derek Kisman, and John Tromp. “PatternHunter II: Highly Sensitive and Fast Homology Search.” Genome Informatics 14 (2003): 164-175. Web. 30 Nov. 2013.
Ma, Bin, John Tromp, and Ming Li. “PatternHunter: Faster and More Sensitive Homology Search.” Bioinformatics 18.2 (2002): 440-445. Web. 30 Nov. 2013.
“PatternHunter.” n.d. Web. 30 Nov. 2013. <www.bioinfor.com/images/stories/pdf/patternhunterbrochure.pdf>.
Pearson, W. R. “Searching Protein Sequence Libraries: Comparison of the Sensitivity and Selectivity of the Smith-Waterman and FASTA Algorithms.” Genomics 11 (1991): 635-650.
Pevsner, Jonathan. Bioinformatics and Functional Genomics. New Jersey: Wiley Blackwell, 2009. Print.
Zhang, Louxin. Sequence Database Search Techniques I: Blast and PatternHunter tools. Web. 30 Nov. 2013. <http://www.bii.astar.edu.sg/.../Sequence%20Database%20Search%20Techniqu...>.