Cancer Genome Anatomy Project

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Nervousnerve (talk | contribs) at 14:29, 6 September 2014. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The Cancer Genome Anatomy Project (CGAP), created by the National Cancer Institute in 1997 and introduced by Al Gore, is an online database on normal, pre-cancerous and cancerous genomes. It also provides tools for viewing and analysing the data, allowing identification of genes involved in various aspects of tumor progression. The goal of CGAP is to decipher the molecular anatomy of the cancerous cell, with all generated data readily available to the public without restriction. There is also a focus on developing software tools that make large and complex datasets easier to use [1][2]. The project is directed by Daniela S. Gerhard and includes several sub-projects, notably the Cancer Chromosome Aberration Project (CCAP) and the Genetic Annotation Initiative (GAI).

The intended outcomes of CGAP include establishing a correlation between a particular cancer's progression and its therapeutic outcome, improved evaluation of treatment, and development of novel techniques for prevention, detection and treatment. This is achieved by characterising the mRNA products of biological tissue.

Research

Background

The fundamental cause of cancer is the inability of a cell to regulate its gene expression. To characterise a specific type of cancer, either the proteins produced from the altered gene expression or their mRNA precursors can be examined. CGAP works to associate a particular cell's expression profile, or molecular signature, which is essentially the cell's fingerprint, with the cell's phenotype. Expression profiles are therefore recorded with respect to cancer type and stage of progression [3].

Sequencing

CGAP's initial goal was to establish a Tumor Gene Index (TGI) to store the expression profiles, contributing to both new and existing databases [4]. This work produced two kinds of libraries: first EST libraries, deposited in dbEST, and later SAGE libraries. Libraries are built and sequenced in a series of steps [3]:

  • Cell contents are washed over plates coated with poly-T sequences. These bind the poly-A tails that exist only on mRNA molecules, selectively retaining mRNA.
  • The isolated mRNA is converted into cDNA through reverse transcription and DNA polymerisation reactions.
  • The resulting double-stranded DNA is then incorporated into E. coli plasmids. Each bacterium now contains one unique cDNA and is replicated to produce clones with the same genetic information. This is termed a cDNA library.
  • The library can then be sequenced by high-throughput sequencing techniques. This can characterise both the different genes expressed by the original cell and the amount of expression of each gene.
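
The final step, reading expression levels off a sequenced library, amounts to counting how many clones match each gene. The following is a minimal sketch of that idea and not CGAP's own software; the gene names and the `expression_profile` helper are hypothetical, and it assumes each sequenced clone has already been matched to a gene.

```python
from collections import Counter

def expression_profile(clone_genes):
    """Return {gene: (clone_count, fraction_of_library)} for one cDNA library."""
    counts = Counter(clone_genes)
    total = sum(counts.values())
    return {gene: (n, n / total) for gene, n in counts.items()}

# Hypothetical library: each entry is the gene a sequenced clone was matched to.
library = ["GENE_A", "GENE_B", "GENE_A", "GENE_C", "GENE_A", "GENE_B"]
profile = expression_profile(library)

for gene, (n, frac) in sorted(profile.items()):
    print(f"{gene}: {n} clones, {frac:.2f} of library")
```

The clone count reflects how strongly a gene is represented in the library, and the fraction allows libraries of different sizes to be compared.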

The TGI initially focused on prostate, breast, ovarian, lung and colon cancers, and CGAP later extended its research to other cancers. In practice, issues arose that CGAP addressed as new technologies became available.

Many cancers occur in tissues with multiple cell types. Traditional techniques took the whole tissue sample and produced bulk tissue cDNA libraries. This cellular heterogeneity made the gene expression information less accurate in terms of cancer biology. An example is prostate cancer tissue, in which the epithelial cells that have been shown to give rise to cancer make up only 10% of the cell count. This led to the development of laser capture microdissection (LCM), which can isolate individual cell types or individual cells, and which gave rise to cDNA libraries of specific cell types [4].

Sequencing a cDNA in full recovers the entire mRNA transcript from which it was generated. In practice, only part of the sequence is required to uniquely identify the mRNA or its associated protein. This part of the sequence is termed the expressed sequence tag (EST) and is always at the end of the sequence, close to the poly-A tail. EST data are stored in a database called dbEST. ESTs need only be around 400 bases long, but with next-generation sequencing (NGS) techniques reads of this length are still of low quality. Therefore, another method called serial analysis of gene expression (SAGE) is also used. For each cDNA transcript molecule produced from a cell's gene expression, this method identifies a region only 10-14 bases long, anywhere along the read sequence, that is sufficient to uniquely identify that cDNA transcript. These bases are cut out and linked together, then incorporated into bacterial plasmids as described above. SAGE libraries have better read quality and generate a larger amount of data when sequenced [4].
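
The SAGE idea of reducing each transcript to a short tag and then counting tags can be sketched as follows. This is an illustration under stated assumptions, not CGAP's pipeline: it assumes the tag is the 10 bases following the 3'-most CATG site (the recognition site of the NlaIII anchoring enzyme commonly used in SAGE, which the text above does not specify), and the sequences and function names are made up.

```python
from collections import Counter

ANCHOR = "CATG"  # assumed anchoring-enzyme site (NlaIII); not stated in the text
TAG_LEN = 10     # lower end of the 10-14 base range given above

def sage_tag(cdna, anchor=ANCHOR, tag_len=TAG_LEN):
    """Return the tag after the 3'-most anchor site, or None if none fits."""
    pos = cdna.rfind(anchor)
    if pos == -1:
        return None  # no anchor site, so no tag can be cut from this cDNA
    start = pos + len(anchor)
    tag = cdna[start:start + tag_len]
    return tag if len(tag) == tag_len else None

def count_tags(cdnas):
    """Count how often each tag occurs across a collection of cDNA sequences."""
    tags = (sage_tag(seq) for seq in cdnas)
    return Counter(t for t in tags if t is not None)

# Made-up cDNA sequences; the first two carry the same tag.
transcripts = [
    "GGACATGTTTCATGACGTACGTTAGGC",
    "TTCATGACGTACGTTAAAA",
    "CCCATGGGGTTTCCCAAAT",
    "ACGTACGT",
]
tag_counts = count_tags(transcripts)
print(tag_counts)

# Number of distinct tags possible at this length.
print(4 ** TAG_LEN)
```

A 10-base tag can take 4^10 = 1,048,576 distinct values, which is why so short a region can be enough to tell transcripts apart.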

Resources

Following sequencing and establishment of libraries, CGAP incorporates the data along with existing data sources and provides various databases and tools for analysis.

Digital Differential Display

An early technique used by CGAP is digital differential display (DDD), which uses the Fisher exact test to compare libraries against each other and find significant differences between them. CGAP ensured that DDD could compare all cDNA libraries in dbEST, not just those generated by CGAP [4].
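
The kind of comparison the Fisher exact test makes can be sketched on a 2x2 table: ESTs matching a gene of interest versus all other ESTs, in each of two libraries. The code below is a generic two-sided Fisher exact test written with the standard library only; it is not the DDD implementation, and the library sizes and counts are invented for illustration.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))

# Invented counts: ESTs for one gene vs. all other ESTs in two libraries.
tumour_hits, tumour_other = 12, 988
normal_hits, normal_other = 2, 998
p_value = fisher_exact_two_sided(tumour_hits, tumour_other,
                                 normal_hits, normal_other)
print(f"p = {p_value:.4g}")
```

A small p-value suggests the gene is represented differently in the two libraries; the threshold, and any correction for testing many genes at once, is left to the analyst.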

Genetic Annotation Initiative

The goal of the Cancer Genome Anatomy Project Genetic Annotation Initiative (CGAP-GAI) is to discover and catalogue single nucleotide polymorphisms (SNPs) that correlate with cancer initiation and progression [4]. CGAP-GAI has created a variety of tools for the discovery, analysis and display of SNPs. SNPs are valuable in cancer research because they can be used in several kinds of genetic study, commonly to track transmission, identify alternate forms of genes and analyze the complex molecular pathways that regulate cell metabolism, growth or differentiation [5].

SNPs in the CGAP-GAI are found either by resequencing genes of interest in different individuals or by searching existing human EST databases and making comparisons [2]. The initiative examines transcripts from healthy individuals, individuals with disease, tumour tissue and cell lines drawn from a large set of individuals; the database is therefore more likely to include rare disease mutations in addition to high-frequency variants [6]. A common challenge in SNP detection is distinguishing sequencing errors from actual polymorphisms. Candidate SNPs undergo statistical analysis in the CGAP SNP pipeline to calculate the probability that the variant is in fact a polymorphism. High-probability SNPs are validated, and tools are available that predict whether function is altered [2].
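
To illustrate why read quality matters when separating sequencing errors from real polymorphisms, the sketch below estimates how likely it is that every read showing a variant base is a miscall. It is a deliberately naive toy and not the CGAP SNP pipeline: it assumes Phred-scaled base qualities, independent errors, and an equal chance of miscalling to each of the three wrong bases, and the function names are hypothetical.

```python
from math import prod

def phred_to_error(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)

def prob_all_errors(variant_quals):
    """Chance that all variant-supporting reads are miscalls to the same base.

    Assumes independent errors and that a miscall is equally likely to
    produce any of the three incorrect bases (both are simplifications).
    """
    return prod(phred_to_error(q) / 3 for q in variant_quals)

# Invented base qualities at one site for the reads showing the variant base.
low_quality_support = [10, 12]
high_quality_support = [30, 30, 35]

print(prob_all_errors(low_quality_support))
print(prob_all_errors(high_quality_support))
```

Under these assumptions, a variant seen in several high-quality reads is far less likely to be explained by sequencing error alone than one seen in a few low-quality reads.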

To make the data easily accessible, CGAP-GAI provides a number of tools that display both a sequence alignment and an assembly overview, in the context of the sequences from which the SNPs were predicted. SNPs are annotated, and integrated genetic/physical maps are often determined [6].

Cancer Chromosome Aberration Project (CCAP)

Genomic instability is a common feature of cancer; understanding structural and chromosomal abnormalities can therefore give insight into the progression of disease. The Cancer Chromosome Aberration Project (CCAP) is a CGAP-supported initiative for defining chromosome structure and characterizing the rearrangements associated with malignant transformation [4][7]. It incorporates the online version of Mitelman's database, a compilation of known chromosomal rearrangements created by Felix Mitelman, Bertil Johansson and Fredrik Mertens prior to the creation of CGAP. The CCAP has several goals [7]:

  • Integration of cytogenetic and physical maps of the human genome
  • Generate a clone repository of BAC clones across the genome that are genetically and physically mapped
  • Develop a platform for parallel database correlation of cancer associated aberrations (FISH-mapped BAC clone database)
  • Integrating three cytogenetic analyses techniques (spectral karyotyping, comparative genome hybridization, and FISH) to refine defining nomenclature for karyotypic aberrations.

The database contains cytogenetic information from over 64,000 patient cases, including more than 2,000 gene fusions [1].

References

  1. ^ a b Riggins, G. J. (2001). "Genome and genetic resources from the Cancer Genome Anatomy Project". Human Molecular Genetics. 10 (7): 663–667. doi:10.1093/hmg/10.7.663. ISSN 1460-2083.
  2. ^ a b c Strausberg, Robert L.; Buetow, Kenneth H.; Emmert-Buck, Michael R.; Klausner, Richard D. (2000). "The Cancer Genome Anatomy Project: building an annotated gene index". Trends in Genetics. 16 (3): 103–106. doi:10.1016/S0168-9525(99)01937-X. ISSN 0168-9525.
  3. ^ a b "Understanding Cancer". Retrieved 2014-09-04.
  4. ^ a b c d e f Krizman, David B.; Wagner, Lukas; Lash, Alex; Strausberg, Robert L.; Emmert-Buck, Michael R. (1999). "The Cancer Genome Anatomy Project: EST Sequencing and the Genetics of Cancer Progression". Neoplasia. 1 (2): 101–106. doi:10.1038/sj.neo.7900002. ISSN 1476-5586.
  5. ^ Clifford, R. (2000). "Expression-based Genetic/Physical Maps of Single-Nucleotide Polymorphisms Identified by the Cancer Genome Anatomy Project". Genome Research. 10 (8): 1259–1265. doi:10.1101/gr.10.8.1259. ISSN 1088-9051.
  6. ^ a b Clifford, Robert J.; Edmonson, Michael N.; Nguyen, Cu; Scherpbier, Titia; Hu, Ying; Buetow, Kenneth H. (2004). "Bioinformatics Tools for Single Nucleotide Polymorphism Discovery and Analysis". Annals of the New York Academy of Sciences. 1020 (1): 101–109. doi:10.1196/annals.1310.011. ISSN 0077-8923.
  7. ^ a b "The Cancer Chromosome Aberration Project (CCAP)". Retrieved 2014-09-05.
