Serial analysis of gene expression
Serial analysis of gene expression (SAGE) is a technique used by molecular biologists to produce a snapshot of the messenger RNA population in a sample of interest, in the form of small tags that correspond to fragments of those transcripts. The original technique was developed by Dr. Victor Velculescu at the Oncology Center of Johns Hopkins University and published in 1995. Several variants have been developed since, most notably LongSAGE (a more robust version of the original), RL-SAGE and, most recently, SuperSAGE. Many of these improve on the original technique by capturing longer tags, which enables more confident identification of the source gene.
Briefly, SAGE experiments proceed as follows:
- The mRNA of an input sample (e.g. a tumour) is isolated and a reverse transcriptase and biotinylated primers are used to synthesize cDNA from mRNA.
- The cDNA is bound to Streptavidin beads via interaction with the biotin attached to the primers, and is then cleaved using a restriction endonuclease called an anchoring enzyme (AE). The location of the cleavage site and thus the length of the remaining cDNA bound to the bead will vary for each individual cDNA (mRNA).
- The cleaved cDNA downstream from the cleavage site is then discarded, and the remaining immobile cDNA fragments upstream from cleavage sites are divided in half and exposed to one of two adapter oligonucleotides (A or B) containing several components in the following order upstream from the attachment site: 1) Sticky ends with the AE cut site to allow for attachment to cleaved cDNA; 2) A recognition site for a restriction endonuclease known as the tagging enzyme (TE), which cuts about 15 nucleotides downstream of its recognition site (within the original cDNA/mRNA sequence); 3) A short primer sequence unique to either adapter A or B, which will later be used for further amplification via PCR.
- After adapter ligation, the cDNAs are cleaved using the TE to remove them from the beads, leaving only a short "tag" of about 11 nucleotides of original cDNA (15 nucleotides minus the 4 corresponding to the AE recognition site).
- The cleaved cDNA tags are then repaired with DNA polymerase to produce blunt end cDNA fragments.
- These cDNA tag fragments (with adapter primers and AE and TE recognition sites attached) are ligated, sandwiching the two tag sequences together with adapters A and B flanking them at either end. These new constructs, called ditags, are then PCR amplified using primers specific to adapters A and B.
- The ditags are then cleaved using the original AE, and allowed to link together with other ditags, which will be ligated to create a cDNA concatemer with each ditag being separated by the AE recognition site.
- These concatemers are then transformed into bacteria for amplification through bacterial replication.
- The cDNA concatemers can then be isolated and sequenced using modern high-throughput DNA sequencers, and these sequences can be analysed with computer programs which quantify the recurrence of individual tags.
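The tag-generation steps above can be mimicked in silico. The following is a minimal sketch, not part of the SAGE protocol itself: it assumes the anchoring enzyme recognises CATG (as NlaIII does), that the transcript is read 5' to 3', and that the tag is the 11 nucleotides immediately downstream of the 3'-most anchoring site. The function name `extract_tag` and the example sequence are hypothetical.

```python
from typing import Optional


def extract_tag(cdna: str, anchor_site: str = "CATG", tag_len: int = 11) -> Optional[str]:
    """Return the tag next to the 3'-most anchoring-enzyme site, or None.

    Assumptions (illustrative only): the sequence is given 5'->3', the
    anchoring enzyme recognises `anchor_site`, and the tagging enzyme
    leaves `tag_len` nucleotides of transcript after that site.
    """
    cdna = cdna.upper()
    pos = cdna.rfind(anchor_site)  # 3'-most occurrence of the AE site
    if pos == -1:
        return None  # transcript has no AE site, so it yields no tag
    start = pos + len(anchor_site)
    tag = cdna[start:start + tag_len]
    # Discard tags that run off the end of the sequence
    return tag if len(tag) == tag_len else None


# Made-up transcript with two CATG sites; only the 3'-most one defines the tag
example = "GGCATGAAACCCGGGTTTACATGTTGACCTAGGCAAAAAAAA"
print(extract_tag(example))
```

Transcripts without an anchoring site return `None` here, which mirrors the fact that such transcripts cannot be detected by the method.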
The output of SAGE is a list of short sequence tags and the number of times each is observed. Using sequence databases, a researcher can usually determine, with some confidence, from which original mRNA (and therefore which gene) the tag was extracted.
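As an illustration of how such a tag list might be produced from sequenced concatemers, the sketch below splits each read at the anchoring-enzyme site and counts the two tags in every ditag. It is a simplified example under stated assumptions: the AE site is taken to be CATG, every tag is 11 nucleotides long, and the second tag of a ditag is read as the reverse complement of the ditag's 3' end. The names `count_tags` and `reverse_complement` are hypothetical and do not refer to any published SAGE software.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_tags(concatemers, anchor_site: str = "CATG", tag_len: int = 11) -> Counter:
    """Count tags in concatemer reads whose ditags are separated by the AE site."""
    counts = Counter()
    for read in concatemers:
        for ditag in read.upper().split(anchor_site):
            if len(ditag) < 2 * tag_len:
                continue  # empty piece, flanking sequence or truncated ditag
            # First tag: 5' end of the ditag, read as-is
            counts[ditag[:tag_len]] += 1
            # Second tag: 3' end of the ditag, read on the opposite strand
            counts[reverse_complement(ditag[-tag_len:])] += 1
    return counts


# Build a toy concatemer from two made-up tags
tag_a, tag_b = "TTGACCTAGGC", "AAACCCGGGTT"
ditag = tag_a + reverse_complement(tag_b)
read = "CATG" + ditag + "CATG" + ditag + "CATG"
for tag, n in count_tags([read]).most_common():
    print(tag, n)
```

Mapping each counted tag to a gene would then be a lookup against a tag-to-transcript table derived from a sequence database, which is omitted here.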
Statistical methods can be applied to tag and count lists from different samples in order to determine which genes are more highly expressed. For example, a normal tissue sample can be compared against a corresponding tumour to determine which genes tend to be more (or less) active.
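A comparison of this kind can be sketched as follows. This is only an illustration: it normalises raw counts to tags per million and reports a log2 ratio between two libraries, with a small pseudocount (an assumption of this sketch) so that tags absent from one library do not cause a division by zero. It is not a substitute for a proper statistical test, and the function names and counts are hypothetical.

```python
import math


def tags_per_million(counts: dict) -> dict:
    """Scale raw tag counts so that libraries of different size are comparable."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


def log2_fold_changes(sample_a: dict, sample_b: dict, pseudocount: float = 0.5) -> dict:
    """Return log2(b / a) of library-size-normalised counts for every tag.

    Positive values mean the tag is relatively more abundant in sample_b.
    """
    total_a = sum(sample_a.values())
    total_b = sum(sample_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both libraries must contain tags")
    changes = {}
    for tag in set(sample_a) | set(sample_b):
        a = (sample_a.get(tag, 0) + pseudocount) / total_a
        b = (sample_b.get(tag, 0) + pseudocount) / total_b
        changes[tag] = math.log2(b / a)
    return changes


# Made-up counts for a normal and a tumour library
normal = {"TTGACCTAGGC": 10, "AAACCCGGGTT": 90}
tumour = {"TTGACCTAGGC": 40, "AAACCCGGGTT": 60}
for tag, lfc in sorted(log2_fold_changes(normal, tumour).items()):
    print(tag, round(lfc, 2))
```

In practice, tags with low counts give unstable ratios, which is one reason dedicated statistical methods are applied to SAGE data.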
Although SAGE was originally conceived for use in cancer studies, it has been successfully used to describe the transcriptome of other diseases and in a wide variety of organisms.
In 1979 teams at Harvard and Caltech extended the basic idea of making DNA copies of mRNAs in vitro to amplifying a library of such copies in bacterial plasmids.
In 1982-3, the idea of selecting random or semi-random clones from such a cDNA library for sequencing was explored by Greg Sutcliffe and coworkers, and by Putney et al., who sequenced 178 clones from a rabbit muscle cDNA library.
In 1991 Adams and co-workers coined the term Expressed Sequence Tag (EST) and initiated more systematic sequencing of cDNAs as a project (starting with 600 brain cDNAs). The identification of ESTs has proceeded rapidly, with approximately 72.6 million ESTs available in public databases as of 11 May 2011 (e.g. GenBank, all species).
In 1995, the idea of reducing the tag length from 100-800 bp to 10-22 bp helped reduce the cost of mRNA surveys.
Comparison to DNA microarrays
The general goal of the technique is similar to that of the DNA microarray. However, SAGE sampling is based on sequencing mRNA output, not on hybridization of mRNA output to probes, so transcription levels are measured more quantitatively than by microarray. In addition, the mRNA sequences do not need to be known a priori, so genes or gene variants which are not known can be discovered. Microarray experiments are much cheaper to perform, so large-scale studies do not typically use SAGE. Quantification of gene expression is more exact in SAGE because it involves directly counting the number of transcripts, whereas spot intensities in microarrays fall in non-discrete gradients and are prone to background noise.
MicroRNAs, or miRNAs for short, are small (~22 nt) segments of RNA which have been found to play a crucial role in gene regulation. One of the most commonly used methods for cloning and identifying miRNAs within a cell or tissue was developed in the Bartel Lab and published in a paper by Lau et al. (2001). Since then, several variant protocols have arisen, but most have the same basic format. The procedure is quite similar to SAGE: the small RNAs are isolated, linkers are added to each, and the RNA is converted to cDNA by RT-PCR. Following this, the linkers, which contain internal restriction sites, are digested with the appropriate restriction enzyme and the sticky ends are ligated together into concatemers. Following concatenation, the fragments are ligated into plasmids, which are used to transform bacteria to generate many copies of the plasmid containing the inserts. These may then be sequenced to identify the miRNAs present and to analyse the expression level of a given miRNA by counting the number of times it is present, similar to SAGE.
LongSAGE and RL-SAGE
LongSAGE, developed in 2002, was a more robust version of the original SAGE with a higher throughput, using 20 μg of mRNA to generate a cDNA library of thousands of tags. Robust LongSAGE (RL-SAGE) further improved on the LongSAGE protocol with the ability to generate a library from 50 ng of mRNA, much less than the 2 μg of mRNA used by the previous LongSAGE protocol, and with fewer ditag polymerase chain reactions (PCR) needed to obtain a complete cDNA library.
SuperSAGE is a derivative of SAGE that uses the type III endonuclease EcoP15I of phage P1 to cut 26 bp long sequence tags from each transcript's cDNA, expanding the tag size by at least 6 bp compared with the predecessor techniques SAGE and LongSAGE. The longer tag size allows for a more precise allocation of the tag to the corresponding transcript, because each additional base increases the precision of the annotation considerably.
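The effect of tag length can be made concrete with a short calculation. Assuming, purely for illustration, that all four bases are equally likely at every position, a tag of length n can take 4^n distinct values, so each additional base multiplies the number of distinguishable tags by four. Real transcript sequences are not uniformly random, so these figures are upper bounds rather than measured precision. The tag lengths below are those quoted in this article (about 11 nt for SAGE, 20 bp for LongSAGE, 26 bp for SuperSAGE).

```python
def tag_space(tag_len: int) -> int:
    """Number of distinct sequences of length tag_len over the alphabet A, C, G, T."""
    return 4 ** tag_len


# Tag lengths as quoted in the article text
for name, length in [("SAGE", 11), ("LongSAGE", 20), ("SuperSAGE", 26)]:
    print(f"{name}: {length} nt, {tag_space(length):,} possible tags")

# Six extra bases (20 bp to 26 bp) enlarge the tag space by a factor of 4**6
print(tag_space(26) // tag_space(20))
```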
As in the original SAGE protocol, so-called ditags are formed, using blunt-ended tags. However, SuperSAGE avoids the bias observed during the less random LongSAGE 20 bp ditag ligation. By direct sequencing with high-throughput sequencing techniques (next-generation sequencing, e.g. pyrosequencing), hundreds of thousands or millions of tags can be analyzed simultaneously, producing very precise and quantitative gene expression profiles. Tag-based gene expression profiling, also called "digital gene expression profiling" (DGE), can therefore provide highly accurate transcription profiles that overcome the limitations of microarrays.
- Velculescu VE; Zhang L; Vogelstein B; Kinzler KW. (1995). "Serial analysis of gene expression". Science. 270 (5235): 484–7. doi:10.1126/science.270.5235.484. PMID 7570003.
- Saha S, et al. (2002). "Using the transcriptome to annotate the genome". Nat Biotechnol. 20 (5): 508–12. doi:10.1038/nbt0502-508. PMID 11981567.
- Gowda M; Jantasuriyarat C; Dean RA; Wang GL. (2004). "Robust-LongSAGE (RL-SAGE): a substantially improved LongSAGE method for gene discovery and transcriptome analysis". Plant Physiol. 134 (3): 890–7. doi:10.1104/pp.103.034496. PMID 15020752.
- Matsumura H; Ito A; Saitoh H; Winter P; Kahl G; Reuter M; Krüger DH; Terauchi R. (2005). "SuperSAGE". Cell Microbiol. 7 (1): 11–8. doi:10.1111/j.1462-5822.2004.00478.x. PMID 15617519.
- Sim GK; Kafatos FC; Jones CW; Koehler MD; Efstratiadis A; Maniatis T (December 1979). "Use of a cDNA library for studies on evolution and developmental expression of the chorion multigene families". Cell. 18 (4): 1303–16. doi:10.1016/0092-8674(79)90241-1.
- Sutcliffe JG; Milner RJ; Bloom FE; Lerner RA (August 1982). "Common 82-nucleotide sequence unique to brain RNA". Proc Natl Acad Sci U S A. 79 (16): 4942–6. doi:10.1073/pnas.79.16.4942.
- Putney SD; Herlihy WC; Schimmel P (1983). "A new troponin T and cDNA clones for 13 different muscle proteins, found by shotgun sequencing". Nature. 302: 718–21. doi:10.1038/302718a0.
- Adams MD, Kelley JM, Gocayne JD, et al. (Jun 1991). "Complementary DNA sequencing: expressed sequence tags and human genome project". Science. 252 (5013): 1651–6. doi:10.1126/science.2047873. PMID 2047873.
- Matsumura, H.; Reich, S.; Ito, A.; Saitoh, H.; Kamoun, S.; Winter, P.; Kahl, G.; Reuter, M.; Krüger, D.; Terauchi, R. (2003). "Gene expression analysis of plant host-pathogen interactions by SuperSAGE". Proceedings of the National Academy of Sciences. 100 (26): 15718–15723. Bibcode:2003PNAS..10015718M. doi:10.1073/pnas.2536670100. PMID 14676315.
- Shendure, J. (2008). "The beginning of the end for microarrays?". Nature Methods. 5 (7): 585–7. doi:10.1038/nmeth0708-585. PMID 18587314.
- Matsumura, H.; Bin Nasir, K. H.; Yoshida, K.; Ito, A.; Kahl, G. N.; Krüger, D. H.; Terauchi, R. (2006). "SuperSAGE array: the direct use of 26-base-pair transcript tags in oligonucleotide arrays". Nature Methods. 3 (6): 469–74. doi:10.1038/nmeth882. PMID 16721381.