List of gene prediction software

From Wikipedia, the free encyclopedia

This is a list of software tools and web portals used for gene prediction.

Ab initio methods

Name | Description | Species | Links | References
ATGpr | Identifies translation initiation sites in cDNA sequences | | |
AUGUSTUS | Eukaryote gene predictor | Eukaryotes | Predict, Train AUGUSTUS | [1]
BGF | Hidden Markov model (HMM) and dynamic programming based ab initio gene prediction program | | webserver |
DIOGENES | Fast detection of coding regions in short genome sequences | | |
Dragon Promoter Finder | Program to recognize vertebrate RNA polymerase II promoters | | |
EUGENE | Integrative gene finding | Eukaryotes, prokaryotes | EuGene webserver | [2]
FGENESH | HMM-based gene structure prediction: multiple genes, both DNA strands | Eukaryotes | webserver |
FRAMED | Finds genes and frameshifts in GC-rich prokaryote sequences | Prokaryotes | webserver | [3]
GENIUS | Links ORFs in complete genomes to protein 3D structures | | |
geneid | Predicts genes, exons, splice sites, and other signals along DNA sequences | Eukaryotes | webserver |
GENEPARSER | Parses DNA sequences into introns and exons | | |
GeneMark | Family of gene prediction programs | Prokaryotes, eukaryotes | webserver | [4]
GeneTack | Predicts genes with frameshifts in prokaryote genomes | Prokaryotes | webserver | [5]
GENOMESCAN | Predicts locations and exon-intron structures of genes in genome sequences from a variety of organisms | | webserver |
GENSCAN | Predicts complete gene structures in genomic DNA using a generalized hidden Markov model | | webserver | [6]
GLIMMER | Finds genes in microbial DNA | Prokaryotes | source code, webserver |
GLIMMERHMM | Eukaryotic gene-finding system | Eukaryotes | webserver | [7]
GrailEXP | Predicts exons, genes, promoters, poly(A) sites, CpG islands, EST similarities, and repeat elements in DNA sequence | | |
mGene | Support vector machine (SVM) based system to find genes | Eukaryotes | webserver | [8]
mGene.ngs | SVM-based system to find genes using heterogeneous evidence such as RNA-seq and tiling array data | Eukaryotes | | [9]
MORGAN | Decision tree system to find genes in vertebrate DNA | Eukaryotes | |
NNPP | Neural network promoter prediction | | |
NNSPLICE | Neural network splice site prediction | | |
ORF FINDER | Graphical analysis tool to find all open reading frames | | |
Regulatory Sequence Analysis Tools | Series of modular computer programs to detect regulatory signals in non-coding sequences | | |
SPLICEPREDICTOR | Identifies potential splice sites in (plant) pre-mRNA by sequence inspection using Bayesian statistical models | Eukaryotes | |
VEIL | Hidden Markov model to find genes in vertebrate DNA | Eukaryotes | Server |
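To illustrate the kind of search described for ORF FINDER, here is a minimal sketch of a six-frame open reading frame scan. It is not the implementation of any tool in the table; the function names, the `min_codons` threshold, and the choice to report only ATG-initiated, stop-terminated ORFs (using the first ATG in each open stretch) are assumptions made for illustration.

```python
# Hypothetical six-frame ORF scan; a sketch, not the ORF FINDER implementation.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int, int]]:
    """Find ORFs that start at ATG and end at the next in-frame stop codon.

    Returns (strand, frame, start, end) tuples with 0-based, end-exclusive
    coordinates. Reverse-strand coordinates refer to the reverse-complemented
    sequence. min_codons counts the stop codon. ORFs that run off the end
    of the sequence without a stop codon are not reported.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # first ATG of the currently open reading frame
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, end))
                    start = None  # close this ORF and look for the next ATG
    return orfs


print(find_orfs("CCATGAAATTTTAGGG", min_codons=2))
```

On the short test sequence this reports one forward-strand ORF in frame 2 (ATG AAA TTT TAG). Real genomes would use a much larger `min_codons`, since short ORFs occur frequently by chance.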
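Several entries (BGF, FGENESH, GLIMMERHMM, VEIL) are described as hidden Markov model based, and the GeneTack reference names the Viterbi algorithm. The sketch below shows Viterbi decoding on a toy two-state HMM that labels each base as non-coding (`N`) or coding (`C`). All probabilities are hypothetical illustration values, and the "coding state prefers G/C" emission model is a deliberate simplification: production gene finders use many more states (codon positions, exons, introns, splice signals) and trained parameters.

```python
import math

# Toy two-state HMM. Every probability here is a hypothetical illustration
# value, not a trained parameter from any gene finder.
STATES = ("N", "C")  # N = non-coding, C = coding
START = {"N": 0.9, "C": 0.1}
TRANS = {
    "N": {"N": 0.95, "C": 0.05},
    "C": {"N": 0.05, "C": 0.95},
}
EMIT = {
    "N": {"A": 0.3, "T": 0.3, "C": 0.2, "G": 0.2},  # AT-leaning background
    "C": {"A": 0.2, "T": 0.2, "C": 0.3, "G": 0.3},  # GC-leaning "coding" state
}


def viterbi(seq: str) -> str:
    """Return the most probable state path (one label per base), in log space."""
    seq = seq.upper()  # assumes only A, C, G, T
    if not seq:
        return ""
    # score[s]: best log-probability of any path ending in state s so far
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backptrs = []  # backptrs[t][s]: best previous state for state s at base t + 1
    for base in seq[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state to recover the full path
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("AT" * 5 + "GC" * 15 + "AT" * 5))
```

Working in log space avoids numerical underflow on long sequences. With these parameters, the 30-base GC run is labelled `C` and the flanking AT runs `N`, because the emission gain over the run outweighs the cost of two state switches.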
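GrailEXP is listed as predicting CpG islands, among other features. How GrailEXP does this is not stated here, so the following is only a sketch of a widely used sliding-window heuristic: a window of at least 200 bp whose GC content exceeds 50% and whose observed/expected CpG ratio exceeds 0.6. The function names and default thresholds are assumptions for illustration.

```python
def cpg_stats(window: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")  # CG dinucleotides can never overlap, so count() is exact
    gc_frac = (c + g) / n if n else 0.0
    # Expected CpG count under independence is c * g / n
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_frac, obs_exp


def find_cpg_islands(seq: str, window: int = 200, min_gc: float = 0.5,
                     min_obs_exp: float = 0.6) -> list[tuple[int, int]]:
    """Merge overlapping passing windows into (start, end) islands.

    Coordinates are 0-based and end-exclusive. Sequences shorter than
    `window` yield no islands.
    """
    seq = seq.upper()
    islands: list[list[int]] = []
    for start in range(len(seq) - window + 1):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc > min_gc and oe > min_obs_exp:
            end = start + window
            if islands and start <= islands[-1][1]:
                islands[-1][1] = end  # overlaps the previous island: extend it
            else:
                islands.append([start, end])
    return [(s, e) for s, e in islands]


print(find_cpg_islands("AT" * 200 + "CG" * 150 + "AT" * 200))
```

Sliding the window one base at a time is simple but costs O(n × window); island boundaries are only as precise as the window granularity.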

References

  1. Keller O, Kollmar M, Stanke M, Waack S (2011). "A novel hybrid gene prediction method employing protein multiple sequence alignments". Bioinformatics. 27 (6): 757–763.
  2. Foissac S, Gouzy JP, Rombauts S, Mathé C, Amselem J, Sterck L, Van de Peer Y, Rouzé P, Schiex T (2008). "Genome annotation in plants and fungi: EuGene as a model platform". Curr. Bioinform. 3: 87–97.
  3. Schiex T, Gouzy J, Moisan A, de Oliveira Y (2003). "FrameD: a flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences". Nucleic Acids Res. 31 (13): 3738–3741.
  4. Lukashin A, Borodovsky M (1998). "GeneMark.hmm: new solutions for gene finding". Nucleic Acids Res. 26 (4): 1107–1115. doi:10.1093/nar/26.4.1107. PMC 147337. PMID 9461475.
  5. Antonov I, Borodovsky M (2010). "GeneTack: frameshift identification in protein-coding sequences by the Viterbi algorithm". J. Bioinform. Comput. Biol. 8 (3): 535–551. doi:10.1142/S0219720010004847. PMID 20556861.
  6. Burge C, Karlin S (1997). "Prediction of complete gene structures in human genomic DNA". J. Mol. Biol. 268 (1): 78–94. doi:10.1006/jmbi.1997.0951. PMID 9149143.
  7. Majoros WH, Pertea M, Salzberg SL (2004). "TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders". Bioinformatics. 20 (16): 2878–2879.
  8. Schweikert G, Zien A, et al., Rätsch G (2009). "mGene: accurate SVM-based gene finding with an application to nematode genomes". Genome Res. 19 (11): 2133–2143.
  9. Gan X, Stegle O, Behr J, et al., Rätsch G, Mott R (2011). "Multiple reference genomes and transcriptomes for Arabidopsis thaliana". Nature. 477: 419–423.