Hypothetical protein

From Wikipedia, the free encyclopedia

In biochemistry, a hypothetical protein is a protein whose existence has been predicted but for which there is no experimental evidence that it is expressed in vivo. The sequencing of many genomes has produced numerous predicted open reading frames to which functions cannot readily be assigned. These proteins, classed as either orphan or conserved hypothetical proteins, make up roughly 20% to 40% of the proteins encoded in each newly sequenced genome.[1] Even when techniques such as microarray analysis and mass spectrometry provide enough evidence that the gene product is expressed, assigning it a function remains difficult because it lacks sequence identity to proteins with annotated biochemical function. Most protein sequences are now inferred from computational analysis of genomic DNA sequence, and hypothetical proteins arise from gene prediction software during genome analysis: when the bioinformatic tool used for gene identification finds a large open reading frame without a characterised homologue in the protein database, it returns "hypothetical protein" as an annotation remark.
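The annotation step described above can be illustrated with a minimal Python sketch. It is not how any particular gene prediction tool works: it scans only the forward strand, uses the standard genetic code, and replaces the homology search with an exact-match lookup in a toy dictionary. The names `find_orfs`, `annotate`, and `known_proteins` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def find_orfs(dna, min_codons=5):
    """Return translated ORFs (ATG ... stop) from the three forward frames.

    Simplification: the reverse strand is ignored, and an ORF must span
    at least `min_codons` codons (start included, stop excluded).
    """
    dna = dna.upper()
    proteins = []
    for frame in range(3):
        codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
        start = None
        for idx, codon in enumerate(codons):
            if start is None and codon == "ATG":
                start = idx
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if idx - start >= min_codons:
                    # Unknown codons (e.g. containing N) become "X".
                    proteins.append(
                        "".join(CODON_TABLE.get(c, "X") for c in codons[start:idx])
                    )
                start = None
    return proteins


def annotate(protein, known_proteins):
    """Toy stand-in for a homology search: exact lookup in a dictionary."""
    return known_proteins.get(protein, "hypothetical protein")


if __name__ == "__main__":
    known_proteins = {}  # assumed empty database: nothing is characterised
    for orf in find_orfs("ATGAAACCCGGGTTTTAA"):
        print(orf, "->", annotate(orf, known_proteins))
```

In practice the lookup would be a similarity search against a protein database rather than an exact match, and the length threshold would be far larger than five codons.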

The function of a hypothetical protein can be predicted, with varying levels of confidence, by domain homology searches. Conserved domains found in a hypothetical protein are compared with the domains of known families, which allows the protein to be classified into a particular protein family even though it has not been investigated in vivo. Function can also be predicted by homology modelling: the hypothetical protein is aligned with a protein sequence whose three-dimensional structure is known, and if a structure can be modelled from that alignment, the protein's capacity to function can be assessed computationally.
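As a rough illustration of classifying a sequence by a conserved domain, the sketch below scans a protein sequence for the Walker A (P-loop) consensus motif, [AG]-x(4)-G-K-[ST], using a regular expression. This is a deliberate simplification: real domain searches rely on profile-based methods over curated domain databases rather than a single regex, and the names `FAMILY_MOTIFS` and `classify_by_motif` are hypothetical.

```python
import re

# Toy "domain database": one family, identified by a short consensus motif.
# Assumption: a regex match is treated as weak evidence of family membership.
FAMILY_MOTIFS = {
    "P-loop NTPase (Walker A motif)": re.compile(r"[AG].{4}GK[ST]"),
}


def classify_by_motif(protein):
    """Return (family, 0-based start position) for every motif hit."""
    return [
        (family, match.start())
        for family, pattern in FAMILY_MOTIFS.items()
        for match in pattern.finditer(protein)
    ]


if __name__ == "__main__":
    # Made-up sequence containing the motif; not a real protein.
    print(classify_by_motif("MKLGESGAGKSTLL"))
```

A hit of this kind only suggests a family assignment; the confidence of the prediction depends on how specific the motif or domain model is.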

Further approaches to annotating the function of hypothetical proteins include determining their three-dimensional structures through structural genomics initiatives, characterising the nature and mode of prosthetic group or metal ion binding, identifying fold similarity with proteins of known function, and annotating possible catalytic and regulatory sites.[2] Structure prediction combined with biochemical function assessment, by screening against various substrates, is another promising approach to annotating function.[3]

References


  1. Galperin MY (2001). "Conserved 'hypothetical' proteins: new hints and new puzzles". Comp Funct Genomics. 2 (1): 14–18. doi:10.1002/cfg.66. PMC 2447192. PMID 18628897.
  2. Eisenstein E; et al. (2000). "Biological function made crystal clear - annotation of hypothetical proteins via structural genomics". Curr Opin Biotechnol. 11 (1): 25–30. PMID 10679350.
  3. Srinivasan B; et al. (2015). "Prediction of substrate specificity and preliminary kinetic characterization of the hypothetical protein PVX_123945 from Plasmodium vivax". Exp Parasitol. 151-152: 56–63. doi:10.1016/j.exppara.2015.01.013. PMID 25655405.

External links