Human Genome Project

The Human Genome Project (HGP) is a project to map the human genome down to the nucleotide (or base pair) level and to identify all the genes present in it.

History and ongoing developments

The Project was launched in 1986 by Charles DeLisi, who was then Director of the US Department of Energy's Health and Environmental Research Programs. The goals and general strategy of the Project were outlined in a two-page memo to the Assistant Secretary in April 1986, which helped garner support from the DOE, the United States Office of Management and Budget (OMB) and the United States Congress, especially Senator Pete Domenici. A series of scientific advisory meetings and complex negotiations with senior federal officials resulted in a line item for the Project in the 1987 Presidential budget submission to the Congress.

Initiation of the Project was the culmination of several years of work supported by the US Department of Energy, in particular a feasibility workshop in 1986 and a subsequent detailed description of the Human Genome Initiative in a report that led to the formal sanctioning of the initiative by the Department of Energy^[1]. This 1987 report stated boldly, "The ultimate goal of this initiative is to understand the human genome" and "Knowledge of the human genome is as necessary to the continuing progress of medicine and other health sciences as knowledge of human anatomy has been for the present state of medicine." Candidate technologies were already being considered for the proposed undertaking at least as early as 1985^[2].

James D. Watson was Head of the National Center for Human Genome Research at the National Institutes of Health (NIH) in the United States starting from 1988. Largely due to his disagreement with his boss, Bernadine Healy, over the issue of patenting genes, he was forced to resign in 1992. He was replaced by Francis Collins in April 1993 and the name of the Center was changed to the National Human Genome Research Institute (NHGRI) in 1997.

The $3-billion project was formally founded in 1990 by the United States Department of Energy and the U.S. National Institutes of Health, and was expected to take 15 years. In addition to the United States, the international consortium comprised geneticists in China, France, Germany, Japan, and the United Kingdom.

Due to widespread international cooperation and advances in the field of genomics (especially in sequence analysis), as well as huge advances in computing technology, a rough draft of the genome was finished in 2000 (announced jointly by then US President Bill Clinton and British Prime Minister Tony Blair on June 26, 2000), two years earlier than planned.

President Clinton had already awarded the Citizens Medal to DeLisi in January 2001 for his seminal role in the Project, before the completion of the Project was announced.

In May 2006, another milestone was passed on the way to completion of the project, when the sequence of the last chromosome was published in the journal Nature.

The project is still ongoing. Further announcements are anticipated on topics such as the role of "junk DNA" and on reading and using the sequence of the human genome.

The role of Celera Genomics

In 1998, a similar, privately funded quest was launched by the American researcher Craig Venter and his firm Celera Genomics. The $300 million Celera effort was intended to proceed at a faster pace and at a fraction of the cost of the roughly $3 billion taxpayer-funded project.

Celera used a newer, riskier technique called whole genome shotgun sequencing, which had been used to sequence bacterial genomes.
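The idea behind shotgun sequencing can be illustrated with a toy example: the DNA is broken into many short overlapping fragments ("reads"), each read is sequenced separately, and software then stitches the reads back together by finding where they overlap. The Python sketch below is a deliberately simplified greedy assembler written for illustration only. The function names, the minimum overlap of 3 bases, and the example reads are all assumptions made here; it is not Celera's method, and real assemblers must also cope with sequencing errors, repeats, and both DNA strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembly: repeatedly merge the two reads that overlap most.

    Hypothetical helper for illustration; assumes error-free reads that all
    come from the same strand.
    """
    # Drop exact duplicates, then drop reads fully contained in another read.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]

    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave the contigs separate
        # Join the pair, keeping the overlapping bases only once.
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Three made-up reads covering a made-up 14-base sequence.
    print(greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))
```

The greedy strategy is the simplest to read, but it can misassemble repeated regions. That weakness is one reason the whole-genome approach was regarded as risky for a genome as large and repetitive as the human one.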

Celera initially announced that it would seek patent protection on "only 200-300" genes, but later amended this to seeking "intellectual property protection" on "fully-characterized important structures" amounting to 100-300 targets. Contrary to its public promises, the firm eventually filed patent applications on 6,500 whole or partial genes.

Celera also promised to publish their findings in accordance with the terms of the 1996 "Bermuda Statement," by releasing new data quarterly (the HGP released its new data daily), although, unlike the publicly-funded project, they would not permit free redistribution or commercial use of the data.

In March 2000, President Clinton announced that the genome sequence could not be patented, and should be made freely available to all researchers. The statement sent Celera's stock plummeting and dragged down the biotech-heavy Nasdaq. The biotech sector lost about $50 billion in market capitalization in two days.

Although the working draft was announced in June 2000, it was not until February 2001 that Celera and the HGP scientists published details of their drafts. Special issues of Nature (which published the publicly-funded project's scientific paper) and Science (which published Celera's paper) described the methods used to produce the draft sequence and offered analysis of the sequence. These drafts were expected to provide a 'scaffold' covering about 90% of the genome, with gaps to be filled later.

The competition proved to be very good for the project. The rivals agreed to pool their data, but the agreement fell apart when Celera refused to deposit its data in the unrestricted public database GenBank. Celera had incorporated the public data into its genome, but forbade the public effort from using Celera data.

On 14 April 2003, a joint press release announced that the project had been completed by both groups, with 99% of the genome sequenced with 99.99% accuracy.

Each draft sequence was checked at least four to five times to increase 'depth of coverage' and therefore accuracy. About 47% of the draft consisted of high-quality sequence. The final version was to be checked eight to nine times, giving an error rate of 1 in 10,000 bases.
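The figures in this section are related by simple arithmetic, sketched below in Python. An error rate of 1 in 10,000 bases is the same as the 99.99% accuracy quoted above, and depth of coverage is the total number of bases read divided by the length of the genome. The function names, the read count, and the 500-base read length are illustrative assumptions, not figures from the project; the genome size of roughly 3 billion base pairs is the one given in this article.

```python
# Roughly 3 billion base pairs, as stated elsewhere in this article.
HUMAN_GENOME_BASES = 3_000_000_000


def depth_of_coverage(num_reads: int, read_length: int,
                      genome_length: int = HUMAN_GENOME_BASES) -> float:
    """Average number of times each base is read."""
    return num_reads * read_length / genome_length


def accuracy_percent(errors_per_bases: int) -> float:
    """Convert '1 error in N bases' into percent accuracy."""
    return (1 - 1 / errors_per_bases) * 100


def expected_errors(errors_per_bases: int,
                    genome_length: int = HUMAN_GENOME_BASES) -> float:
    """Expected number of wrong bases across the whole genome."""
    return genome_length / errors_per_bases


if __name__ == "__main__":
    # Hypothetical run: 30 million reads of 500 bases each.
    print(depth_of_coverage(30_000_000, 500))
    print(accuracy_percent(10_000))
    print(expected_errors(10_000))
```

Even at this accuracy, a 3-billion-base genome would still be expected to contain on the order of hundreds of thousands of erroneous bases, which is why repeated checking matters.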

The HGP is one of several international genome projects aimed at sequencing the DNA of a specific organism. While the human DNA sequence offers the most tangible benefits, important developments in biology and medicine are predicted as a result of the sequencing of model organisms, including mice, fruit flies, zebrafish, yeast, nematodes, and many microbial organisms and parasites.

In October 2004, researchers from the International Human Genome Sequencing Consortium (IHGSC) of the HGP announced a new estimate of 20,000 to 25,000 genes in the human genome^[3]. Previously 30,000 to 40,000 had been predicted, while estimates at the start of the project ranged as high as 2,000,000.

Goals

The goals of the original HGP were not only to determine all 3 billion base pairs in the human genome with a minimal error rate, but also to identify all the genes in this vast amount of data. This part of the project is still ongoing, although a preliminary count indicated about 30,000 genes in the human genome (an estimate later revised downward to 20,000 to 25,000), far fewer than most scientists had predicted.

Another goal of the HGP was to develop faster, more efficient methods for DNA sequencing and sequence analysis and the transfer of these technologies to industry.

The sequence of the human DNA is stored in databases available to anyone on the Internet. The U.S. National Center for Biotechnology Information (and sister organizations in Europe and Japan) houses the gene sequence in a database known as GenBank, along with sequences of known and hypothetical genes and proteins. Other organizations, such as the University of California, Santa Cruz, and Ensembl, present additional data and annotation and powerful tools for visualizing and searching it. Computer programs have been developed to analyse the data, because the data themselves are difficult to interpret without them.

The process of identifying the boundaries between genes and other features in raw DNA sequence is called genome annotation and is the domain of bioinformatics. While expert biologists make the best annotators, their work proceeds slowly, and computer programs are increasingly used to meet the high-throughput demands of genome sequencing projects. The best current technologies for annotation make use of statistical models that take advantage of parallels between DNA sequences and human language, using concepts from computer science such as formal grammars.
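As a rough illustration of what automated annotation involves, the Python sketch below scans a DNA string for open reading frames: stretches that begin with the start codon ATG and run, three bases at a time, to one of the stop codons TAA, TAG or TGA. This is only the simplest signal a gene finder can use and is not the statistical or grammar-based modeling described above. The function name, the `min_codons` threshold, and the example sequence are assumptions made for this sketch, and only the forward strand is examined.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) positions of open reading frames in `seq`.

    Positions are 0-based, `end` is exclusive and includes the stop codon.
    Hypothetical helper for illustration: forward strand only, no introns.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):              # three possible reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i               # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None            # close it and keep scanning
    return sorted(orfs)


if __name__ == "__main__":
    dna = "CCATGAAATTTTAGGG"            # made-up sequence
    for begin, end in find_orfs(dna):
        print(begin, end, dna[begin:end])
```

In human DNA, genes are interrupted by introns and surrounded by long non-coding stretches, so a scan like this finds many false candidates and misses real genes. That gap is what the statistical models mentioned above are meant to close.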

Another, often overlooked, goal of the HGP is the study of its ethical, legal, and social implications. It is important to research these issues and find the most appropriate solutions before they become large dilemmas whose effect will manifest in the form of major political concerns.

All humans have unique gene sequences; therefore, the data published by the HGP do not represent the exact sequence of each and every individual's genome. It is the combined genome of a small number of anonymous donors. The HGP genome is a scaffold for future work in identifying differences between individuals. Most of the current effort in identifying differences between individuals involves single nucleotide polymorphisms and the HapMap.

Benefits

The work on interpretation of genome data is still in its initial stages. It is anticipated that detailed knowledge of the human genome will provide new avenues for advances in medicine and biotechnology. Clear practical results of the project emerged even before the work was finished. For example, a number of companies, such as Myriad Genetics, started offering easy ways to administer genetic tests that can show predisposition to a variety of illnesses, including breast cancer, disorders of hemostasis, cystic fibrosis, liver diseases and many others. Also, research into the etiologies of cancers, Alzheimer's disease and other areas of clinical interest is considered likely to benefit from genome information, which may lead in the long term to significant advances in their management.

There are also many tangible benefits for biological scientists. For example, a researcher investigating a certain form of cancer may have narrowed down their search to a particular gene. By visiting the human genome database on the World Wide Web, this researcher can examine what other scientists have written about this gene, including (potentially) the three-dimensional structure of its product, its function(s), its evolutionary relationships to other human genes, or to genes in mice or yeast or fruit flies, possible detrimental mutations, interactions with other genes, body tissues in which this gene is activated, diseases associated with this gene or other datatypes.

Further, deeper understanding of disease processes at the level of molecular biology may determine new therapeutic procedures. Given the established importance of DNA in molecular biology and its central role in determining the fundamental operation of cellular processes, it is likely that expanded knowledge in this area will facilitate medical advances in numerous areas of clinical interest that may not have been possible without it.

The analysis of similarities between DNA sequences from different organisms is also opening new avenues in the study of the theory of evolution. In many cases, evolutionary questions can now be framed in terms of molecular biology; indeed, many major evolutionary milestones (the emergence of the ribosome and organelles, the development of embryos with body plans, the vertebrate immune system) can be related to the molecular level. Many questions about the similarities and differences between humans and our closest relatives (the primates, and indeed the other mammals) are expected to be illuminated by the data from this project.

The Human Genome Diversity Project (HGDP), spinoff research aimed at mapping the DNA that varies between human ethnic groups, was rumored to have been halted but in fact continued and has yielded new conclusions. In the future, the HGDP could yield new data for disease surveillance, human development and anthropology. It could help explain, and suggest new strategies for managing, the vulnerability of ethnic groups to certain diseases (see race in biomedicine). It could also show how human populations have adapted to these vulnerabilities.

Whose genome was sequenced?

This answer is posted as supplied by Dr. Marvin Stodolsky, U.S. DOE Office of Biological and Environmental Research, Office of Science. This statement is believed to be in the public domain since it is a work of the United States government.

Whose genome was sequenced in the public (HGP) and private project?

The human genome reference sequences do not represent any one person’s genome. Rather, they serve as a starting point for broad comparisons across humanity. The knowledge obtained is applicable to everyone because all humans share the same basic set of genes and genomic regulatory regions that control the development and maintenance of their biological structures and processes.

In the international public-sector Human Genome Project (HGP), researchers collected blood (female) or sperm (male) samples from a large number of donors. Only a few of many collected samples were processed as DNA resources. Thus the donor identities were protected so neither donors nor scientists could know whose DNA was sequenced. DNA clones from many different libraries were used in the overall project.

Technically, it is much easier to prepare DNA cleanly from sperm than from other cell types because of the much higher ratio of DNA to protein in sperm and the much smaller volume in which purifications can be done. Using sperm does provide all chromosomes for study, including equal numbers of sperm with the X (female) or Y (male) sex chromosomes. However, HGP scientists also used white cells from the blood of female donors so as to include female-originated samples.

In the Celera Genomics private-sector project, DNA samples from a few different genomes were mixed and processed for sequencing. The DNA resources used for these studies came from anonymous donors of European, African, American (North, Central, South), and Asian ancestry. The lead scientist of Celera Genomics at that time, Craig Venter, has since acknowledged that his DNA was one of those in the pool.

Many small regions of DNA that vary among individuals (called polymorphisms) also were identified during the HGP, mostly single nucleotide polymorphisms (SNPs). Most SNPs are without physiological effect, although a minority contribute to the delightful and beneficial diversity of humanity. A much smaller minority of polymorphisms affect an individual’s susceptibility to disease and response to medical treatments.
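To make the idea of a single nucleotide polymorphism concrete, the Python sketch below compares a sample sequence against a reference, position by position, and reports each place where a single base differs. It assumes the two sequences are already aligned and of equal length, with no insertions or deletions, which real comparisons cannot take for granted. The function name and the example sequences are invented for this illustration.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List single-base differences between two pre-aligned sequences.

    Returns (position, reference_base, sample_base) tuples, 0-based.
    Hypothetical helper: assumes equal length and no insertions/deletions.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


if __name__ == "__main__":
    reference = "ACGTACGT"   # made-up reference fragment
    individual = "ACGTTCGT"  # made-up sample differing at one position
    print(find_snps(reference, individual))
```

A difference found this way in one individual is only a candidate; it counts as a polymorphism when the variant recurs across a population, which is the kind of pattern the HapMap effort described below sets out to catalogue.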

Although the HGP has been completed, SNP studies continue in the International HapMap Project, whose goal is to identify patterns of SNP groups (called haplotypes, or “haps”). The DNA samples for the HapMap came from a total of 270 individuals: Yoruba people in Ibadan, Nigeria; Japanese in Tokyo; Han Chinese in Beijing; and the French Centre d’Etude du Polymorphisme Humain (CEPH) resource, which consisted of residents of the United States having ancestry from Western and Northern Europe.^[4]

References

[1] Barnhart, Benjamin J. (1989). "DOE Human Genome Program". Human Genome Quarterly. 1: 1. Retrieved 2005-02-03.
[2] DeLisi, Charles (2001). "Genomes: 15 Years Later A Perspective by Charles DeLisi, HGP Pioneer". Human Genome News. 11: 3–4. Retrieved 2005-02-03.
[3] International Human Genome Sequencing Consortium (2004). "Finishing the euchromatic sequence of the human genome". Nature. 431: 931–945.
[4] Stodolsky, Marvin. Oak Ridge National Laboratory website.

DNA Testing Goes DIY, Associated Press via Wired News, March 07, 2005.

External links

National Human Genome Research Institute (NHGRI). NHGRI led the National Institutes of Health's (NIH's) contribution to the International Human Genome Project. This project, which had as its primary goal the sequencing of the 3 billion base pairs that make up the human genome, was successfully completed in April 2003.

Human Genome News. Published from 1989 to 2002 by the US Department of Energy, this newsletter was a major communications method for coordination of the Human Genome Project. Complete online archives are available.
Project Gutenberg hosts e-texts for Human Genome Project, titled Human Genome Project, Chromosome Number # (# denotes 01-22, X and Y). This information is raw sequence, released in November 2002; access to entry pages with download links is available through http://www.gutenberg.org/etext/3501 for Chromosome 1 sequentially to http://www.gutenberg.org/etext/3524 for the Y Chromosome. Note that this sequence might not be considered definitive due to ongoing revisions and refinements. In addition to the chromosome files, there is a supplementary information file dated March 2004 which contains additional sequence information.
The HGP information pages
Ensembl project, an automated annotation system and browser for the human genome
UCSC genome browser. This site contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides a portal to the ENCODE project.
Nature magazine's human genome gateway, including the HGP's paper on the draft genome sequence
Wellcome charitable trust description of HGP "Your Genes, your health, your future".
Learning about the Human Genome. Part 1: Challenge to Science Educators. ERIC Digest.
Learning about the Human Genome. Part 2: Resources for Science Educators. ERIC Digest.
Patenting Life by Merrill Goozner
Prepared Statement of Craig Venter of Celera. Venter discusses Celera's progress in deciphering the human genome sequence and its relationship to healthcare and to the federally funded Human Genome Project.
Cracking the Code of Life. Companion website to the 2-hour NOVA program documenting the race to decode the genome, including the entire program hosted in 16 parts in either Quicktime or RealPlayer format.

