I don't think the term metabase rightly applies to GeneCards and euGenes. The link to GenLoc is broken so I can't tell. I think SOURCE is rightly called a metabase.
I would like to propose the following database classification...
- Primary database - compiles the results of basic scientific experiments. Like a primary witness, it is a basic (first hand) source of data.
- Secondary database - A database including computationally derived information from the primary data. These databases apply processing in the form of various algorithms to produce 'secondary' data from the primary data. A secondary database may link several primary databases using hyperlinks, but no serious integration effort is involved.
- Ternary database - An integrated database which combines primary and/or secondary databases into a derived 'classification' database.
- Middleware - the technology for producing a ternary database should not be confused with the database itself. This is confusing because many middleware technologies develop a ternary database to show off the technology 'in action', and it is hard to distinguish the two. One example of this is the ECOCYC database.
If there are no objections I will add this classification to the mainpage. --188.8.131.52 14:49, 16 Nov 2004 (UTC)
Sorry, I didn't see the TALK before my last edit... I do believe databases like euGenes should be called meta or secondary dbs. It describes itself as "euGenes provides a common summary of gene and genomic information from eukaryotic organism databases", which fits well with the description I put on the page. What do you think?
I'm not aware of the further classification into Ternary dbs in this context. But please add your knowledge if you have more details on this.
I would suggest putting the more technical things into a separate topic, like "data integration" or something like it.
Here are a few suggestions to improve this article and the category such databases are linked in:
- categories listed here are a mixture of content and meta descriptions
- meta database is a weak classification, as most bio databases have a mix of direct-from-the-lab data and (meta)data about this data.
- primary, secondary, tertiary classing is another less useful attribute, hard to determine
- the classes well used by biologists deal with content: dna, protein (sequence, structure, interactions), genome, etc.
- rename genome browsers class, which is not a database per se but a view of genome data, to genome databases (several of the items in this are instances of genome databases).
- drop the 'primary' and replace it with DNA or nucleic acid for the first sequence database class. While these were historically the first (in GenBank and EMBL), today they have many widely used peer databases with other content: literature (Medline/PubMed), microarray/gene expression (ArrayExpress, GEO), ontologies (OBO), phylogenetics, etc.
- the Category:Bioinformatics databases should be renamed to Biological databases. These are data collections from biologists and for biologists, not from/for bioinformaticians.
Dongilbert 04:02, 15 October 2007 (UTC)
Genome browser category doesn't belong
The genome browser category doesn't list biological databases but examples of genome informatics tools (of which there are many more), and it probably should move elsewhere. I've added separate, more relevant genome databases as a category. Dongilbert 05:34, 15 October 2007 (UTC)
Source for the number of 5000 databases
Link list is poor article style
This article has turned into a List of biology databases, with people adding their favorites without regard to whether this helps explain what a biological database is; see Wikipedia:List of guidelines. I suggest all the database examples be moved to a separate page titled List of Biological databases, per Wikipedia:Manual of Style (lists).