Phylogenetic footprinting
Introduction
Researchers have found that non-coding regions of DNA contain binding sites for regulatory proteins that govern the spatiotemporal expression of genes. These transcription factor binding sites (TFBS), or regulatory motifs, have proven hard to identify, primarily because they are short and can vary in sequence. Because understanding transcriptional regulation matters to many fields of biology, researchers have developed strategies for predicting the presence of TFBS, many of which have led to publicly available databases. One such technique is phylogenetic footprinting.
Phylogenetic footprinting is a technique used to identify TFBS within a non-coding region of DNA of interest by comparing it to the orthologous sequence in different species. It relies on two major concepts. 1) The function and DNA binding preferences of transcription factors are well conserved between diverse species. 2) Non-coding DNA sequences that are essential for regulating gene expression are under stronger selective pressure than the rest of the non-coding genome, so TFBS change at a slower rate than the sequence around them.[1]
History
Phylogenetic footprinting was first used and published by Tagle et al. in 1988; it allowed researchers to predict evolutionarily conserved cis-regulatory elements responsible for embryonic ε and γ globin gene expression in primates.[2] Before phylogenetic footprinting, DNase footprinting was used: a protein bound to a transcription factor binding site protects that stretch of DNA from DNase digestion. One problem with that technique was the time and labor it required. Unlike DNase footprinting, phylogenetic footprinting relies on evolutionary constraints within the genome, with the “important” parts of the sequence being conserved among the different species.[3]
Protocol
When using this technique, it is important to decide which genomes the sequence should be aligned to. More divergent species have less sequence similarity between orthologous genes. Therefore, the key is to pick species that are related enough for homology to be detected, but divergent enough that nonfunctional sequence no longer aligns, which keeps background “noise” out of the alignment. The approach consists of several steps:
1) Decide on the gene of interest.
2) Carefully choose species with orthologous genes.
3) Decide on the length of the upstream or downstream region to be examined.
4) Align the sequences.
5) Look for conserved regions and analyse them.
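Step 5 can be sketched in a few lines of Python. The example below is a minimal illustration, not a published footprinting algorithm: it assumes the sequences have already been aligned (step 4) into equal-length strings with "-" for gaps, and it treats a column as conserved only if every species carries the same base. The function names, the window size and the toy sequences are all hypothetical.

```python
def column_is_conserved(column):
    """A column counts as conserved if all species share one base and none has a gap."""
    first = column[0]
    return first != "-" and all(base == first for base in column)


def conserved_windows(aligned, window=6, min_fraction=1.0):
    """Return merged (start, end) half-open column intervals in which at least
    min_fraction of the columns of a sliding window are conserved."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    conserved = [column_is_conserved(col) for col in zip(*aligned)]
    intervals = []
    for start in range(length - window + 1):
        hits = sum(conserved[start:start + window])
        if hits / window >= min_fraction:
            end = start + window
            # Merge with the previous interval when the windows overlap or touch.
            if intervals and start <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], end)
            else:
                intervals.append((start, end))
    return intervals


# Toy, made-up upstream sequences (already aligned, no gaps).
human = "ACGTTGACGTCAGGCT"
mouse = "ATGATGACGTCAGTCA"
chick = "GCGCTGACGTCATGCC"

for start, end in conserved_windows([human, mouse, chick], window=6):
    print(start, end, human[start:end])  # prints: 4 12 TGACGTCA
```

Requiring perfect identity in every column is the strictest possible setting; lowering min_fraction tolerates some variation inside a window at the cost of more candidate regions to analyse.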
Problems with losing true TFBS
1) Some binding sites have no significant matches in most other species.
2) Some binding sites show excellent conservation, but only over a region shorter than the motif length that was searched for.
3) Some binding sites show conservation but have had insertions or deletions (although it is not obvious that sequences with such insertions or deletions are still functional).
4) Some motifs are quite well conserved but are not statistically significant.
5) Some transcription factors bind as dimers. Therefore, their binding sites may consist of two conserved regions, separated by a few variable nucleotides.
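The last case, two conserved half-sites separated by a short variable spacer, can be searched for explicitly. The sketch below is only an illustration under stated assumptions: it takes a per-column conservation mask as a string ("*" for a column identical in all species, "." otherwise), and the function names and the min_half and max_spacer thresholds are hypothetical choices, not values from the cited literature.

```python
from itertools import groupby


def conserved_runs(mask, min_len=3):
    """Return (start, end) half-open intervals of consecutive '*' columns
    that are at least min_len columns long."""
    runs = []
    pos = 0
    for is_conserved, group in groupby(mask, key=lambda c: c == "*"):
        n = len(list(group))
        if is_conserved and n >= min_len:
            runs.append((pos, pos + n))
        pos += n
    return runs


def bipartite_candidates(mask, min_half=3, max_spacer=5):
    """Pair neighbouring conserved runs whose separation is between 1 and
    max_spacer columns. Conserved stretches shorter than min_half are
    treated as part of the spacer."""
    runs = conserved_runs(mask, min_half)
    return [
        (left, right)
        for left, right in zip(runs, runs[1:])
        if 1 <= right[0] - left[1] <= max_spacer
    ]


# Hypothetical conservation mask for a 22-column alignment.
mask = "..****...****......***"
print(bipartite_candidates(mask))
```

Here the two four-column runs are three columns apart and are reported as one candidate, while the third run lies six columns away and is left unpaired.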
Accuracy
It is important to keep in mind that not all conserved sequences are under selective pressure; some are conserved by chance. To eliminate false positives, statistical analysis must be performed to show that the reported motifs have a mutation rate meaningfully lower than that of the surrounding nonfunctional sequence.[4]
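One simple way to picture such a test is to compare the number of variable columns inside a candidate motif with what the surrounding sequence would predict. The sketch below is a deliberately simplified stand-in, not the method of the cited work: it assumes alignment columns vary independently, ignores the phylogenetic tree relating the species, and uses a one-sided binomial tail as the significance measure. All names are hypothetical.

```python
from math import comb


def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def motif_conservation_pvalue(variable, start, end):
    """variable: list of booleans, True where an alignment column differs
    between species. Columns [start, end) are the candidate motif; all other
    columns serve as the background.

    Returns the probability of seeing this few variable columns in the motif
    if it mutated at the background rate."""
    motif = variable[start:end]
    background = variable[:start] + variable[end:]
    if not motif or not background:
        raise ValueError("motif and background must both be non-empty")
    background_rate = sum(background) / len(background)
    return binomial_lower_tail(sum(motif), len(motif), background_rate)


# Made-up data: half of the flanking columns vary, the 8-column motif never does.
variable = [True, False] * 5 + [False] * 8 + [True, False] * 5
p_value = motif_conservation_pvalue(variable, 10, 18)
print(p_value)  # 0.5 ** 8 = 0.00390625
```

A small value suggests the motif changes more slowly than its surroundings; when species are closely related the background rate itself is low, and even a perfectly conserved motif will not stand out.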
References
1. Neph, S. and Tompa, M. 2006. MicroFootPrinter: a tool for phylogenetic footprinting in prokaryotic genomes. Nucleic Acids Res. 34:366-368.
2. Tagle, D. A., Koop, B. F., Goodman, M., Slightom, J. L., Hess, D., and Jones, R. T. 1988. Embryonic ε and γ globin genes of a prosimian primate (Galago crassicaudatus): nucleotide and amino acid sequences, developmental regulation, and phylogenetic footprints. J. Mol. Biol. 203:439-455.
3. Zhang, Z. and Gerstein, M. 2003. Of mice and men: phylogenetic footprinting aids the discovery of regulatory elements. J. Biol. 2:11-11.4.
4. Blanchette, M. and Tompa, M. 2002. Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res. 12:739-748.