Description: For automated bacterial genome annotation and chromosomal map generation
Data types
Data input: Raw genome sequence (FASTA format), labeled genome sequence (FASTA format) or predicted/labeled proteome sequence (FASTA format)
Data output: Fully annotated genome along with an interactive, annotated genome map
Research center: University of Alberta
Laboratory: David S. Wishart
Primary citation: [1]
Download URL
Data release
Last update: 2012
Curation policy: Manually curated

BASys (Bacterial Annotation System) is a freely available web server for automated, comprehensive annotation of bacterial genomes.[2] With the advent of next-generation DNA sequencing, the complete genome of a bacterium (typically ~4 million bases) can be sequenced within a single day. This has led to a rapid rise in the number of fully sequenced microbes: as of 2013, more than 2700 fully sequenced bacterial genomes had been deposited with GenBank. A continuing challenge in microbial genomics, however, is finding the resources or tools to annotate the large number of newly sequenced genomes. BASys was developed in 2005 in anticipation of these needs and was the world's first publicly accessible microbial genome annotation web server. Because of its widespread popularity, the BASys server was updated in 2011 with additional server nodes to handle the large number of queries it was receiving.

The BASys server accepts either assembled genome data (raw DNA sequence) or complete proteome assignments as input. If raw DNA sequence is provided, BASys uses Glimmer (version 2.1.3) to identify the genes.[1] The output is a comprehensive genome-wide annotation (with ~60 annotation subfields for each gene) and a zoomable, hyperlinked map of the query genome. BASys uses nearly 30 different programs to determine and annotate gene/protein names, GO functions, COG functions, possible paralogues and orthologues, molecular weight, isoelectric point, operon structure, subcellular localization, signal peptides, transmembrane regions, secondary structure, 3D structure, reactions and pathways. The full list of programs used by BASys is given below:

Name | Method
Glimmer 2.1.3 | Glimmer is a popular and very accurate ab initio gene-finding program for microbial DNA. In a study of 31 complete bacterial and archaeal genomes, Glimmer achieved an average gene prediction accuracy of 99.36%. Glimmer uses interpolated Markov models to distinguish coding regions from noncoding DNA. Its performance decreases with increasing GC content: for genomes with high GC content (>60%), Glimmer may generate a large number of false-positive predictions and should therefore be used with caution.
HMMER 2.3.2 | Used for local Pfam searches.
Homodeller 2.0 | Locally developed homology modelling program.
SignalP 3.0 | Signal peptide prediction.
TMHMM 2.0 | Prediction of transmembrane helices in proteins.
PSIPRED 2.45 | Secondary structure prediction. PSIPRED achieves an average Q3 score of 80.6%.
PS_scan | Tool for local PROSITE scans.
VADAR 1.4 | Locally developed protein structure analysis tool. BASys uses VADAR to analyze protein structures for secondary structural information.
PSORT-B 2.0.4 | Used to predict subcellular location. PSORT-B attains a precision of 96% for Gram-positive and Gram-negative bacteria.
ProteinNameExtractor 1.0 | BASys function prediction module. This module was validated against a set of expertly annotated proteins from C. trachomatis.
FindParalogs 1.0 | BASys module for paralog identification. The paralogs database is created from the conceptual translations of the identified coding regions supplied to BASys by Glimmer or by the submitter.
FindHomologs 1.0 | BASys module for homolog identification. Searches model organism databases for possible homologs.
GOSearch 1.0 | BASys module for extracting Gene Ontology information from various sources.
OperonFinder 1.0 | BASys module for identifying operons.
StructureManager 1.0 | BASys module for manipulating protein structure files.
StructureClassifier 1.0 | BASys module for determining structure class from secondary structure information.
Structure Finder 1.0 | BASys module for generating protein structures from various sources.
COG_Finder 1.0 | BASys module for identifying COG functional categories and accessions.
Secondary Structure Manager 1.0 | BASys module for generating secondary structure information from various sources.
ECNumber_Finder | BASys module for mapping EC numbers to and from various sources.
SwissProt Annotation Manager 1.0 | BASys module for comparing and transitively applying annotations from SwissProt records.
CCDB Annotation Manager 1.0 | BASys module for comparing and transitively applying annotations from CCDB records.
Gene Identifier 1.0 | BASys module for coordinating gene identification information from Glimmer or user submissions.
BASys Annotation Manager 1.0 | The BASys pipeline manager.
KEGG Search Manager | BASys module for searching and extracting metabolic information from KEGG.
SubCellLocalization Manager 1.0 | BASys module for generating subcellular location annotation from various sources.
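
The caution about Glimmer on high-GC genomes suggests a simple check that a submitter could run on a FASTA file before relying on the predicted genes. The following is a minimal sketch, not part of BASys itself: the function names (`read_fasta`, `gc_fraction`, `needs_caution`) and the demo sequence are hypothetical, and the 60% cut-off is simply the figure quoted in the table above.

```python
from typing import Dict, Iterable

HIGH_GC = 0.60  # threshold quoted in the table above (>60% GC); an assumption, not a Glimmer setting


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_name: sequence}, joining wrapped sequence lines."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else "unnamed"
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def needs_caution(seq: str) -> bool:
    """True when the GC fraction exceeds the high-GC threshold."""
    return gc_fraction(seq) > HIGH_GC


if __name__ == "__main__":
    demo = [">demo_chromosome hypothetical example", "GGCCGGCCAT", "GCGCNNAT"]
    for rec_name, rec_seq in read_fasta(demo).items():
        print(rec_name, round(gc_fraction(rec_seq), 2), needs_caution(rec_seq))
```

Ambiguous bases are left out of the denominator so that runs of N in a draft assembly do not lower the apparent GC content.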
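
The Q3 score quoted for PSIPRED is the percentage of residues whose predicted three-state secondary structure (helix, strand or coil) matches the observed state. As a rough illustration of how such a score is computed, and not the code used by PSIPRED or BASys, the sketch below assumes the common one-letter state codes H, E and C; the function name `q3_score` and the example strings are hypothetical.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Percentage of positions where the predicted state equals the observed state.

    Both strings must have the same non-zero length and use only the
    three-state alphabet H (helix), E (strand) and C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    allowed = set("HEC")
    if not (set(predicted) <= allowed and set(observed) <= allowed):
        raise ValueError("only the states H, E and C are allowed")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    observed = "HHHHEEEECC"   # hypothetical reference assignment
    predicted = "HHHEEEEECC"  # hypothetical prediction; one residue differs
    print(q3_score(predicted, observed))
```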

In addition to its extensive annotation for each gene/protein in the query genome, BASys also generates colorful, clickable and fully zoomable circular maps of each input chromosome. These bacterial genome maps are generated using a program called CGView (Circular Genome Viewer), which was developed in 2004.[3] The genome maps are designed to allow rapid navigation and detailed visualization of all the BASys-generated gene annotations. A complete BASys run takes approximately 16 h for an average bacterial chromosome (approximately 4 megabases). BASys annotations may be viewed and downloaded anonymously or through a password-protected access system. BASys stores its bacterial genome annotations on the server for a maximum of 180 days and handles approximately 1000 submissions a year. BASys is accessible at
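
The figures in this paragraph (about 16 h per run, about 1000 submissions a year, results kept for up to 180 days) allow a back-of-the-envelope estimate of the load on the server. The sketch below is only an illustration under stated assumptions: each run is assumed to occupy one server node for its full duration, submissions are assumed to arrive evenly through the year, and every result is assumed to be kept for the full retention period. The function names are hypothetical and nothing here describes how BASys actually schedules jobs.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8760 hours


def min_nodes(submissions_per_year: float, hours_per_run: float) -> int:
    """Smallest number of fully utilized nodes able to cover the yearly workload."""
    busy_hours = submissions_per_year * hours_per_run
    return math.ceil(busy_hours / HOURS_PER_YEAR)


def stored_annotations(submissions_per_year: float, retention_days: float) -> float:
    """Steady-state number of stored results (arrival rate multiplied by retention time)."""
    return submissions_per_year * retention_days / 365


if __name__ == "__main__":
    # 1000 runs x 16 h = 16,000 node-hours a year, more than one node's 8760 hours
    print(min_nodes(1000, 16))
    print(round(stored_annotations(1000, 180)))
```

Under these assumptions the workload exceeds what a single node can process in a year, which is consistent with the move to multiple server nodes described earlier.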

Scope and Access

All data in BacMap is non-proprietary or is derived from a non-proprietary source. It is freely accessible and available to anyone. In addition, nearly every data item is fully traceable and explicitly referenced to the original source. BacMap data is available through a public web interface and downloads.

See also


  1. Van Domselaar GH, Stothard P, Shrivastava S, Cruz JA, Guo A, Dong X, Lu P, Szafron D, Greiner R, Wishart DS (July 2005). "BASys: a web server for automated bacterial genome annotation". Nucleic Acids Res. 33 (Web Server issue): W455–9. doi:10.1093/nar/gki593. PMC 1160269. PMID 15980511.
  2. Stothard P, Van Domselaar G, Shrivastava S, Guo A, O'Neill B, Cruz J, Ellison M, Wishart DS (2005). "BacMap: an interactive picture atlas of annotated bacterial genomes". Nucleic Acids Res. 33 (Database issue): D317–20. doi:10.1093/nar/gki075. PMC 540029. PMID 15608206.
  3. Stothard P, Wishart DS (2005). "Circular genome visualization and exploration using CGView". Bioinformatics. 21 (4): 537–9. doi:10.1093/bioinformatics/bti054. PMID 15479716.