UCSC Genome Browser
|Description||The UCSC Genome Browser|
|Research center||University of California Santa Cruz|
|Laboratory||Center for Biomolecular Science and Engineering, Baskin School of Engineering|
|Primary citation||Tyner & al. (2016)|
The UCSC Genome Browser is an online genome browser hosted by the University of California, Santa Cruz (UCSC). It is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. The Browser is a graphical viewer optimized for fast interactive performance. It is an open-source, web-based tool suite built on top of a MySQL database for rapid visualization, examination, and querying of the data at many levels. The Genome Browser Database, browsing tools, downloadable data files, and documentation can all be found on the UCSC Genome Bioinformatics website.
The UCSC Genome Browser was initially built in 2000 by Jim Kent, then a graduate student, and David Haussler, professor of Computer Science (now Biomolecular Engineering) at the University of California, Santa Cruz, who still manage it. It began as a resource for the distribution of the initial fruits of the Human Genome Project. Funded by the Howard Hughes Medical Institute and the National Human Genome Research Institute, NHGRI (one of the US National Institutes of Health), the browser offered a graphical display of the first full-chromosome draft assembly of human genome sequence. Today the browser is used by geneticists, molecular biologists and physicians, as well as students and teachers of evolution, for access to genomic information.
In the years since its inception, the UCSC Browser has expanded to accommodate genome sequences of all vertebrate species and selected invertebrates for which high-coverage genomic sequence is available, now including 46 species. High coverage is necessary to allow overlap to guide the construction of larger contiguous regions. Genomic sequences with less coverage are included in multiple-alignment tracks on some browsers (more below on multiple-alignment tracks), but the fragmented nature of these assemblies makes them unsuitable for building full-featured browsers. The species hosted with full-featured genome browsers are shown in the table.
|human, baboon, bonobo, chimp, gibbon, gorilla, orangutan|
|bushbaby, marmoset, mouse lemur, rhesus macaque, squirrel monkey, tarsier, tree shrew|
|mouse, alpaca, armadillo, cat, Chinese hamster, cow, dog, dolphin, elephant, ferret, guinea pig, hedgehog, horse, kangaroo rat, manatee, Minke whale, naked mole-rat, opossum, panda, pig, pika, platypus, rabbit, rat, rock hyrax, sheep, shrew, sloth, squirrel, Tasmanian devil, tenrec, wallaby, white rhinoceros|
|American alligator, Atlantic cod, budgerigar, chicken, coelacanth, elephant shark, Fugu, lamprey, lizard, medaka, medium ground finch, Nile tilapia, painted turtle, stickleback, Tetraodon, turkey, Xenopus tropicalis, zebra finch, zebrafish|
|Caenorhabditis spp. (5), Drosophila spp. (11), Ebola virus, honey bee, lancelet, mosquito, P. pacificus, sea hare, sea squirt, sea urchin, yeast|
The large amount of data about biological systems that is accumulating in the literature makes it necessary to collect and digest information using the tools of bioinformatics. The UCSC Genome Browser presents a diverse collection of annotation datasets (known as "tracks" and presented graphically), including mRNA alignments, mappings of DNA repeat elements, gene predictions, gene-expression data, disease-association data (representing the relationships of genes to diseases), and mappings of commercially available gene chips (e.g., Illumina and Agilent). The basic paradigm of display is to show the genome sequence in the horizontal dimension, and show graphical representations of the locations of the mRNAs, gene predictions, etc. Blocks of color along the coordinate axis show the locations of the alignments of the various data types. The ability to show this large variety of data types on a single coordinate axis makes the browser a handy tool for the vertical integration of the data.
To find a specific gene or genomic region, the user may type in the gene name (e.g., BRCA1), an accession number for an RNA, the name of a genomic cytological band (e.g., 20p13 for band 13 on the short arm of chr20) or a chromosomal position (e.g., chr17:38,450,000-38,531,000 for the region around the gene BRCA1).
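The chromosomal-position form of a query can be handled programmatically as well. The following is a minimal sketch, not the Browser's own parser, of how a string such as chr17:38,450,000-38,531,000 might be split into a chromosome name and two integer coordinates. The function name `parse_position` and the regular expression are illustrative assumptions.

```python
import re

# Hypothetical pattern for positions such as "chr17:38,450,000-38,531,000".
# Commas inside the coordinates are optional thousands separators.
_POSITION_RE = re.compile(
    r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$"
)


def parse_position(text):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    match = _POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chromosomal position: {text!r}")
    # int() raises ValueError if a coordinate contains no digits at all.
    start = int(match.group("start").replace(",", ""))
    end = int(match.group("end").replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return match.group("chrom"), start, end


if __name__ == "__main__":
    chrom, start, end = parse_position("chr17:38,450,000-38,531,000")
    print(chrom, start, end)
```

Gene names, accession numbers and cytological bands do not match this pattern and raise a `ValueError`, so a caller could use the function to decide whether a query is a position or a name.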
Presenting the data in a graphical format allows the browser to link each annotation to detailed information about it. The gene details page of the UCSC Genes track provides a large number of links to more specific information about the gene at many other data resources, such as Online Mendelian Inheritance in Man (OMIM) and SwissProt.
Designed for the presentation of complex and voluminous data, the UCSC Browser is optimized for speed. By pre-aligning the 55 million RNAs of GenBank to each of the 81 genome assemblies (many of the 46 species have more than one assembly), the browser allows instant access to the alignments of any RNA to any of the hosted species.
The juxtaposition of the many types of data allows researchers to display exactly the combination of data that will answer specific questions. A PDF/PostScript output function allows export of a camera-ready image for publication in academic journals.
One unique and useful feature that distinguishes the UCSC Browser from other genome browsers is the continuously variable nature of the display. Sequence of any size can be displayed, from a single DNA base up to the entire chromosome (human chr1 = 245 million bases, Mb) with full annotation tracks. Researchers can display a single gene, a single exon, or an entire chromosome band, showing dozens or hundreds of genes and any combination of the many annotations. A convenient drag-and-zoom feature allows the user to choose any region in the genome image and expand it to occupy the full screen.
Researchers may also use the browser to display their own data via the Custom Tracks tool. This feature allows users to upload a file of their own data and view the data in the context of the reference genome assembly. Users may also use the data hosted by UCSC, creating subsets of the data of their choosing with the Table Browser tool (such as only the SNPs that change the amino acid sequence of a protein) and display this specific subset of the data in the browser as a Custom Track.
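As an illustration of what such an uploaded file can look like, here is a minimal sketch that writes a small custom track in BED format: a `track` header line followed by tab-separated chromosome, start, end and name fields. The function name, track name and example region are hypothetical. The coordinates reuse the BRCA1 region mentioned earlier purely as sample input.

```python
def bed_custom_track(name, description, regions, visibility="pack"):
    """Return the text of a simple BED custom track.

    `regions` is an iterable of (chrom, start, end, label) tuples.
    BED start coordinates are 0-based and the end is exclusive.
    """
    lines = [
        f'track name="{name}" description="{description}" '
        f"visibility={visibility}"
    ]
    for chrom, start, end, label in regions:
        if start >= end:
            raise ValueError(f"empty or reversed region: {label}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example data; replace with your own regions.
    text = bed_custom_track(
        "myRegions",
        "Example regions",
        [("chr17", 38450000, 38531000, "region1")],
    )
    print(text, end="")
```

The resulting text could be saved to a file and uploaded through the Custom Tracks tool.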
Any browser view created by a user, including those containing Custom Tracks, may be shared with other users via the Saved Sessions tool.
Many types of variation data are also displayed. For example, the entire contents of each release of the dbSNP database from NCBI are mapped to human, mouse and other genomes. This includes the fruits of the 1000 Genomes Project, as soon as they are released in dbSNP. Other types of variation data include copy-number variation data (CNV) and human population allele frequencies from the HapMap project.
The Genome Browser offers a unique set of comparative-genomic data for most of the species hosted on the site. The comparative alignments give a graphical view of the evolutionary relationships among species. This makes it a useful tool both for the researcher, who can visualize regions of conservation among a group of species and make predictions about functional elements in unknown DNA regions, and in the classroom as a tool to illustrate one of the most compelling arguments for the evolution of species. The 44-way comparative track on the human assembly clearly shows that the farther one goes back in evolutionary time, the less sequence homology remains, but functionally important regions of the genome (e.g., exons and control elements, but not introns typically) are conserved much farther back in evolutionary time.
More than simply a genome browser, the UCSC site hosts a set of genome analysis tools. These include a full-featured graphical interface for mining the information in the browser database (the Table Browser) and a fast sequence alignment tool (BLAT), which is also useful for simply finding sequences in the massive sequence (human genome = 2.8 billion bases, Gb) of any of the featured genomes.
A liftOver tool uses whole-genome alignments to allow conversion of sequences from one assembly to another or between species. The Genome Graphs tool allows users to view all chromosomes at once and display the results of genome-wide association studies (GWAS). The Gene Sorter displays genes grouped by parameters not linked to genome location, such as expression pattern in tissues.
Many users of the Genome Browser gather data of their own in Excel spreadsheets and would like to create links to the Browser using data in the spreadsheet. For example, a clinical geneticist may have lists of regions for a patient that are duplicated or deleted, as determined by comparative genomic hybridization (CGH). These regions can be the source information for a browser view allowing access to each region with a single click.
Careful use of Excel's "copy" and "move" functions should allow the links on this sheet to be used without modification.
The contents of the last cell in the image above (cell G21 in the actual spreadsheet) are as follows:
This example shows how to create a link that turns on specific tracks of interest. In this case, three tracks are explicitly turned on:
- Database of Genomic Variants (table: dgv)
- UCSC Genes (table: knownGene)
- OMIM Genes (table: omimGene2)
Each track is set to "pack" in the link as follows:
dgv=pack knownGene=pack omimGene2=pack
Any track that has been open in a session will remain in the view when the new browser window opens.
A new track can be added using the tableName and a visibility of choice:
&snp142=dense (for hg19 or hg38)
Simply add any other desired tableName=visibility to the end of the URL, connected to it by an ampersand (&). The simplest way to learn the name of the table underlying a track is to do a mouseover in a Genome Browser image and read the URL at the bottom of the browser page. The table is shown in the URL as
Visibility options include:
- hide
- dense
- squish
- pack
- full
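The link-building rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the formula from the original spreadsheet cell: the base address `https://genome.ucsc.edu/cgi-bin/hgTracks`, the `position` parameter and the function names are assumptions that do not appear in the text above, while `db`, the table names and the visibility values come from it. The second function wraps the link in Excel's `HYPERLINK` function so it could be pasted into a spreadsheet cell.

```python
from urllib.parse import urlencode

# Assumed base address of the Browser's track display page.
BASE_URL = "https://genome.ucsc.edu/cgi-bin/hgTracks"
VISIBILITIES = ("hide", "dense", "squish", "pack", "full")


def browser_link(db, position, tracks):
    """Build a link that opens `position` with the given track settings.

    `tracks` maps a table name (e.g. "knownGene") to a visibility.
    """
    for table, visibility in tracks.items():
        if visibility not in VISIBILITIES:
            raise ValueError(f"unknown visibility for {table}: {visibility}")
    params = {"db": db, "position": position}
    params.update(tracks)  # each track becomes &tableName=visibility
    return BASE_URL + "?" + urlencode(params, safe=":")


def excel_hyperlink(url, label):
    """Wrap a URL in an Excel HYPERLINK formula (quotes are doubled)."""
    def esc(value):
        return value.replace('"', '""')
    return f'=HYPERLINK("{esc(url)}","{esc(label)}")'


if __name__ == "__main__":
    link = browser_link(
        "hg19",
        "chr17:38450000-38531000",
        {"dgv": "pack", "knownGene": "pack", "omimGene2": "pack"},
    )
    print(link)
    print(excel_hyperlink(link, "view region"))
```

Adding another track is a matter of adding one more entry to the dictionary, for example `"snp142": "dense"`.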
To browser information
Direct links to the browser can be created by adjusting the URL. For example, to link directly to the UCSC details page (also known as the hgGene page) for the SIRT1 gene in the human genome hg19 assembly, create a URL in which "db=" assigns the database and "hgg_gene=" assigns the gene:
Similarly, a direct link to the SIRT1 gene position in the hg19 database can be created by using the knownCanonical table, which selects one representative isoform, often the longest, of the related UCSC gene group:
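A hedged sketch of how both kinds of direct link might be assembled is given below. The parameters `db` and `hgg_gene` are taken from the text. The host name, the `hgGene` and `hgTracks` page addresses, and the `singleSearch` and `position` parameters are assumptions based on the public UCSC site and should be checked against its documentation before use.

```python
from urllib.parse import urlencode

# Assumed address of the UCSC CGI programs.
CGI_BASE = "https://genome.ucsc.edu/cgi-bin/"


def gene_details_link(db, gene):
    """Link to the gene details (hgGene) page for `gene` in assembly `db`."""
    return CGI_BASE + "hgGene?" + urlencode({"db": db, "hgg_gene": gene})


def gene_position_link(db, gene):
    """Link to the browser view at the position of `gene`.

    Assumption: "singleSearch=knownCanonical" restricts the search to
    the knownCanonical table so that a single isoform is chosen.
    """
    params = {"db": db, "singleSearch": "knownCanonical", "position": gene}
    return CGI_BASE + "hgTracks?" + urlencode(params)


if __name__ == "__main__":
    print(gene_details_link("hg19", "SIRT1"))
    print(gene_position_link("hg19", "SIRT1"))
```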
To host remote user information
You may want to look at the Sharing your annotation track with others section at UCSC. If you are hosting a track hub, a direct link that opens the browser and displays the hub can be created in the following format, where "&hubUrl" is assigned the online location of your hub.txt file (add db= to refer to the hub's genome):
If you are hosting a custom track in an annotation file (in this example a bed file displaying blue and red ticks on the human db=hg19 assembly), you can add &hgt.customText= to point to the URL of your annotation file:
Similarly, if you are hosting a custom track, you can put the annotation details directly into the URL with &hgct_customText, where track type, name, description, visibility, and bigDataUrl can be specified:
Those hosting a session can direct the browser to the location by pointing to hgTracks and adding hgS_doLoadUrl=submit and &hgS_loadUrlName= pointing to the saved session file location.
You can also combine "&hubUrl" for your hub with the session parameters hgS_doLoadUrl=submit and &hgS_loadUrlName=, if a saved session of your hub is available at a URL, and further add "&hgt.customText" with a URL to custom tracks for your hub. Note that hub tracks should normally be defined in your trackDb.txt with bigDataUrls pointing to the binary files, such as bigBed, bigWig, BAM, and VCF. The approach below is an example of how another party could add custom track annotations to your hub without access to edit your trackDb.txt, with everything saved on a remote server so the data will never expire:
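The parameters described in this section can be combined mechanically. The sketch below assembles such a link from optional parts. The parameter names `db`, `hubUrl`, `hgt.customText`, `hgS_doLoadUrl` and `hgS_loadUrlName` come from the text, whereas the base address, the function name and the `example.org` locations are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumed base address of the Browser's track display page.
BASE_URL = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def remote_link(db=None, hub_url=None, custom_text_url=None, session_url=None):
    """Build a browser link from remotely hosted hub, track and session files."""
    params = {}
    if db:
        params["db"] = db
    if hub_url:
        params["hubUrl"] = hub_url  # location of hub.txt
    if custom_text_url:
        params["hgt.customText"] = custom_text_url  # annotation file
    if session_url:
        params["hgS_doLoadUrl"] = "submit"
        params["hgS_loadUrlName"] = session_url  # saved session file
    # Keep ":" and "/" unescaped so the embedded URLs stay readable.
    return BASE_URL + "?" + urlencode(params, safe=":/")


if __name__ == "__main__":
    # Hypothetical file locations, for illustration only.
    print(remote_link(db="hg19", hub_url="https://example.org/myHub/hub.txt"))
    print(
        remote_link(
            hub_url="https://example.org/myHub/hub.txt",
            custom_text_url="https://example.org/myHub/extra.bed",
            session_url="https://example.org/myHub/session.txt",
        )
    )
```

Leaving an argument out simply omits the corresponding parameter, so the same function covers a hub alone, a custom track alone, or the combined case.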
Open source / mirrors
The UCSC Browser code base is open-source for non-commercial use, and is mirrored locally by many research groups, allowing private display of data in the context of the public data. The UCSC Browser is mirrored at several locations worldwide, as shown in the table.
|official European mirror site|
|European mirror—maintained by UCSC at University of Bielefeld, Germany|
|Cold Spring Harbor Lab, NY|
|Aarhus University, Denmark|
|Genomics Virtual Lab, Australia|
- Tyner C, Barber GP, Casper J, Clawson H, Diekhans M, Eisenhart C, Fischer CM, Gibson D, Gonzalez JN, Guruvadoo L, Haeussler M, Heitner S, Hinrichs AS, Karolchik D, Lee BT, Lee CM, Nejad P, Raney BJ, Rosenbloom KR, Speir ML, Villarreal C, Vivian J, Zweig AS, Haussler D, Kuhn RM, Kent WJ (29 November 2016). "The UCSC Genome Browser database: 2017 update". Nucleic Acids Res. PMID 27899642.
- Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M, Barber GP, Clawson H, Coelho A, Diekhans M, Dreszer TR, Giardine BM, Harte RA, Hillman-Jackson J, Hsu F, Kirkup V, Kuhn RM, Learned K, Li CH, Meyer LR, Pohl A, Raney BJ, Rosenbloom KR, Smith KE, Haussler D, Kent WJ (January 2011). "The UCSC Genome Browser database: update 2011". Nucleic Acids Res. 39 (Database issue): D876–82. PMID 20959295. doi:10.1093/nar/gkq963.
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D (June 2002). "The human genome browser at UCSC". Genome Res. 12 (6): 996–1006. PMID 12045153. doi:10.1101/gr.229102.
- Kuhn RM, Karolchik D, Zweig AS, Wang T, Smith KE, Rosenbloom KR, Rhead B, Raney BJ, Pohl A, Pheasant M, Meyer L, Hsu F, Hinrichs AS, Harte RA, Giardine B, Fujita P, Diekhans M, Dreszer T, Clawson H, Barber GP, Haussler D, Kent WJ (January 2009). "The UCSC Genome Browser Database: update 2009". Nucleic Acids Res. 37: D755–D761. PMID 18996895. doi:10.1093/nar/gkn875.
- "High-coverage" here means 6x coverage, or six times more total sequence than the size of the genome.
- Kent WJ (April 2002). "BLAT - the BLAST-like alignment tool". Genome Res. 12 (4): 656–64. PMID 11932250. doi:10.1101/gr.229202.