Talk:NoSQL

From Wikipedia, the free encyclopedia
WikiProject Databases / Computer science (Rated Mid-importance)
This article is within the scope of WikiProject Databases, a collaborative effort to improve the coverage of database related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on the project's quality scale.
This article has been rated as Mid-importance on the project's importance scale.
This article is supported by WikiProject Computer science (marked as Mid-importance).

WikiProject Computer science (Rated Start-class, Low-importance)
This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as Low-importance on the project's importance scale.
 

Table

In the short comparison table: I'm not sure that influences and sponsors are related. Perhaps it would be better to use two different columns, such as "based on" for the influences and "support" for the commercial or free groups that provide support. After all, the business model of many open-source NoSQL suppliers is support, making it a prime attribute of their software. aary (talk)

Why was the table removed? If there were concerns about the content, it would be better to replace it than to remove the table. I'm tempted to reverse the edits. Bhaskar (talk) 14:22, 13 November 2009 (UTC)


I'd like to propose changing the NoSQL wiki in the following way:

  1. move the table with all the NoSQL databases to a different page similar to: [Comparison_of_SQL_database_management_systems] or perhaps like here: [Comparison_of_business_integration_software]
  2. elaborate on the different types of NoSQL solutions and their respective uses:
    1. Graph-based databases
    2. Document-based databases
    3. K/V stores
  3. for each one, write why and how they are different from SQL and what problems they came to solve

This is just a suggestion, but if no one objects we can all contribute from our expertise and make this a truly detailed overview of the kind of solutions NoSQL offers (what NoSQL means) rather than just a list of available options. aary (talk) 22:54, 13 November 2009 (UTC)

Agreed. Moved the comparison table to structured storage (the formal name used in the papers), however each of the different types of databases deserves (and generally already has) a dedicated page. This article should focus on the NoSQL group itself, who its members are, what it believes, etc. -- samj inout 13:09, 26 November 2009 (UTC)
Is this what the NPOV dispute was about? Can the tag be removed now? Pcap ping 14:03, 9 January 2010 (UTC)

I would like to see some pros and cons of when to use or not to use NoSQL databases.
Pros:

  • have tons of data (large datasets)
  • have sparse data
  • fast read/write access (but depends on primary-key/indexing)
  • good for Web 2.0 applications...
  • good communities which will support you
  • schemaless
  • no query language (need Map/Reduce function to query data)

Cons:

  • schemaless (sometimes important for interoperability, reusability)
  • still in development (you could experience some missing features like authentication, performance problems, ...)
  • no query language (need Map/Reduce function to query data)
    • in projects many partners are familiar with SQL
  • you require a server-cluster to get the full performance of NoSQL DBs
  • ACID transactions are sacrificed for performance.
  • using aggregations —Preceding unsigned comment added by 129.26.162.152 (talk) 10:00, 22 September 2010 (UTC)

Legitimacy

I'm concerned about the legitimacy of this page. The only reference links to a Rackspace blog. The paragraph following the paragraph containing the reference claims that a Rackspace employee coined the term. Without any other corroborating links or references, this entry appears to be little more than a marketing ploy to legitimize the term 'NoSQL'. Dancrumb (talk) 20:24, 15 November 2009 (UTC)

If it helps, NoSQL is essentially an advocacy group (a "database movement", whatever that is) and the article should focus on it, its members, structure, events, "beliefs", etc. Indeed many who actively promote this new breed of databases (myself included) don't subscribe to the abrasive approach taken by this particular group, and I doubt all those listed in the comparison table are willing participants either.
In terms of Wikipedia policy on verifiable notability I suspect that NoSQL would pass, which is why I have resisted the urge to AfD this article and have in fact encouraged them to contribute to Wikipedia rather than create another repository elsewhere. Hopefully the article will improve significantly however as it is currently not great. -- samj inout 12:56, 23 November 2009 (UTC)
I feel this page is totally legitimate. With all the recent press about the NOSQL movement we need to get the dispute tag removed ASAP. This page can be a great place for people to get more information on alternatives to RDBMS systems. It provides a great service to the community. I vote we add more references and remove the dispute tag now.--Dan 11:40, 4 December 2009 (UTC)
Check the companies on a Google search who are buying keywords for "nosql" -- I believe this is a totally legit topic and will only get bigger. --209.204.139.40 (talk) 06:14, 7 December 2009 (UTC)
Great point about the Google Keywords. I would like to remove the disputed tag today unless we get rational arguments. --Dan 13:27, 9 December 2009 (UTC)
It's disputed "again" since NoSQL isn't English, and use as a buzzword by promoters doesn't make it English. Structured storage is already a legit redirect, as are filter, map, reduce and map, filter, reduce. Sadly MapReduce (CamelCaps are not English, folks!) is a heavily overloaded term used by several companies... — Preceding unsigned comment added by Corrector623 (talkcontribs) 17:48, 29 March 2012 (UTC)
I am changing this sentence: "The reduced run time flexibility compared to full SQL systems is compensated by significant gains in scalability and performance." Several RDBMSs have essentially unlimited scalability. — Preceding unsigned comment added by 173.183.76.129 (talk) 18:54, 15 February 2012 (UTC)
Unlimited scale for what? The statement should be clarified, but it's not inaccurate. It's scalable on tasks where traditional RDBMS systems choke, such as graph traversal, which would require a join, join, join, join. It all depends on what you're doing. Morphh (talk) 21:46, 15 February 2012 (UTC)
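The graph-traversal point can be made concrete. In a relational schema each extra hop is another self-join on an edge table, whereas a graph-oriented store can follow adjacency lists directly. A minimal Python sketch, assuming a hypothetical in-memory adjacency map (`follows`) rather than the API of any real graph database:

```python
from collections import deque


def within_hops(graph: dict[str, list[str]], start: str, max_hops: int) -> set[str]:
    """Return every node reachable from `start` in at most `max_hops` edges.

    Each step follows an adjacency list directly (breadth-first search);
    the relational equivalent needs one self-join on an edges table per hop.
    The start node itself is excluded from the result.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    seen.discard(start)
    return seen


# Hypothetical "follows" graph, used only for illustration.
follows = {
    "alice": ["bob"],
    "bob": ["carol", "dave"],
    "carol": ["erin"],
    "dave": [],
    "erin": ["alice"],
}

print(sorted(within_hops(follows, "alice", 2)))  # ['bob', 'carol', 'dave']
```

Raising `max_hops` costs one more loop iteration per level here, rather than one more join in the query.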


This page is a gross misrepresentation of the topic, its emergence and its major players. I am tempted to believe that a few, like Rackspace, are using the page to promote their own agenda. For example, the statement that Eric Evans reintroduced the term is really a stretch. Also, it's a shame that there is no mention of Google, which is the flag bearer of the NoSQL story with BigTable and its success. Someone needs to delete this entire page and write a fresh and accurate one in its place. — Preceding unsigned comment added by Tshanky (talkcontribs) 00:59, 3 April 2011 (UTC)

Agreed; it now has an NPOV tag. Attempted to propose some alternate generic names, map, filter, reduce and filter, map, reduce, which are at least generic English, not CamelCaps crap, and also better than fold (higher-order function) ("higher"?!?!?). The MapReduce article has similar problems; the generic pipeline the simplest DBs are using needs to be described and clearly differentiated from:
The article says " Academic researchers typically refer to these databases as structured storage," So why is structured storage not the primary disambiguation article that explains the differences between the above? — Preceding unsigned comment added by Corrector623 (talkcontribs) 17:43, 29 March 2012 (UTC)

Why not just call the article Non-relational databases or storage? That is essentially what they are, methinks. What would be best, I assume, is an acceptable consensus among academics, which "structured storage" seems to be (even if, by strict definitions, relational databases are also structured storage). Of course, I'm new here and I'm not familiar with the guidelines of Wikipedia so this is only a suggestion. Luord (talk) 00:18, 6 May 2013 (UTC)

eBay does use an RDBMS

I believe eBay have purchased Greenplum, which is an RDBMS based on PostgreSQL. There's some information on DBMS2's site. Do eBay have any NoSQL implementations?


Note: I do not work for either Greenplum or eBay, but have spoken with Greenplum's salespeople, which is how I know about this. Nic Doye (talk) 18:01, 23 November 2009 (UTC)

Also:

Facebook uses MySQL for most of its data management

Twitter uses MySQL for most of its data management.

I removed their names from the list of companies whose needs go beyond SQL for this reason. I'd like to see reference if we are to restore those names. — Preceding unsigned comment added by Whimsley (talkcontribs) 20:28, 16 April 2012 (UTC)

Taxonomy of NoSQL Systems

I would like to add a taxonomy section to the main page.

Here is a list that Srini V. Srinivasan posted on the list from Steve Yen's talk at NoSQL Oakland. I have also added a few additional items.

  1. Key-Value APIs
    1. key-value cache (memcached, repcached, coherence, infinispan, eXtreme scale, jboss cache, velocity, terracotta)
    2. key-value store (keyspace, flare, schema-free, RAMCloud)
    3. clustered key-value-store (dynamo, voldemort, Dynomite, SubRecord, MotionDb, Dovetaildb)
    4. ordered-key-value-store (tokyo tyrant, lightcloud, NMDB, luxio, memcachedb, actord)
  2. data-structures database (redis)
  3. tuple-store (gigaspaces, coord, apache river)
  4. object database (ZopeDB, db4o, Shoal)
  5. document store (couchDB, Mongo, jackrabbit, ThruDB, CloudKit, Persevere, Riak Basho, Scalaris, Citrusleaf)
  6. Native XML Databases (MarkLogic, eXist-db)
  7. wide columnar store (BigTable, Hbase, Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI)
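Roughly, what separates an "ordered-key-value-store" from a plain key-value store in this taxonomy is that keys are kept in sorted order, which makes range and prefix scans cheap instead of requiring a scan of every key. A minimal Python sketch, assuming a hypothetical `OrderedKV` class built on the standard `bisect` module; it is not the interface of any product in the list above:

```python
import bisect
from typing import Optional


class OrderedKV:
    """Toy ordered key-value store: keys are kept sorted so range scans are cheap.

    Hypothetical and illustrative only; not the API of any real product.
    """

    def __init__(self) -> None:
        self._keys: list[str] = []         # keys, always in sorted order
        self._values: dict[str, str] = {}  # key -> value

    def put(self, key: str, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while keeping the list sorted
        self._values[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._values.get(key)

    def scan(self, low: str, high: str) -> list[tuple[str, str]]:
        """Return (key, value) pairs with low <= key < high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        return [(k, self._values[k]) for k in self._keys[start:end]]


store = OrderedKV()
store.put("user:002", "bob")
store.put("order:100", "book")
store.put("user:001", "alice")
print(store.scan("user:000", "user:999"))  # [('user:001', 'alice'), ('user:002', 'bob')]
```

A hash-based key-value store answers `get` just as well, but answering the same `scan` would mean examining every key.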

Why do people keep calling Riak a document store if it has a Dynamo architecture and does not support changing a single field of a document once it is in the database? Is it because it can store and read documents? How is that different from the pluggable serialization in Voldemort? —Preceding unsigned comment added by 188.82.70.107 (talk) 11:13, 17 October 2010 (UTC)


Jackrabbit NoSQL?

In my opinion Jackrabbit is not a NoSQL database. It's a content repository, like the Ariadne Content Repository, which provides a unified interface to access the contents of a content repository. Perhaps it internally uses a NoSQL database (I don't know), but Jackrabbit is not a NoSQL database! —Preceding unsigned comment added by 129.26.162.152 (talk) 06:59, 22 September 2010 (UTC)

Essence of NoSQL systems

I think that the essence of the NoSQL is about two things:

  1. Not ACID-compliant because full transaction support becomes too time consuming when the amount of data is large, when the data is distributed and when relatively cheap hardware is used.
  2. No joins between tables. Joins become too slow when the amount of data is large. No joins means that data has to be stored in a denormalized format.

It's not about the lack of a schema. Cassandra has structures called column families, which are something similar to a schema, and it isn't easy to change such a column family in Cassandra. I tend to think that Cassandra isn't schemaless. Heelmijnlevenlang (talk) 21:32, 4 December 2009 (UTC)
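Point 2 (no joins, so data is stored denormalized) can be illustrated with plain Python dictionaries standing in for storage. This is a minimal sketch; the record layouts and field names are hypothetical and not taken from any particular database:

```python
# Normalised (relational-style) layout: an order references its customer by id,
# so reading an order together with the customer's name needs a join.
customers = {1: {"name": "Alice", "city": "Leeds"}}
orders = [{"order_id": 10, "customer_id": 1, "total": 25.0}]


def order_with_customer_joined(order_id: int) -> dict:
    order = next(o for o in orders if o["order_id"] == order_id)
    customer = customers[order["customer_id"]]  # the "join" step
    return {**order, "customer_name": customer["name"]}


# Denormalised layout: the customer fields the application needs are copied
# into each order record, so a single key lookup returns everything.
orders_by_id = {
    10: {"order_id": 10, "total": 25.0, "customer": {"id": 1, "name": "Alice"}},
}


def order_denormalised(order_id: int) -> dict:
    return orders_by_id[order_id]


print(order_with_customer_joined(10)["customer_name"])  # Alice
print(order_denormalised(10)["customer"]["name"])       # Alice
```

The trade-off is on the write side: if Alice changes her name, every order that embeds a copy of it has to be updated, which a normalised layout avoids.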

The feature intrinsically defining NoSQL is that these are all distributed systems for processing big data. At these volumes RDBMSs simply fall short, become tedious and unmanageable, and require large and expensive hardware. By adopting distribution, one can run performant data stores on commodity hardware or even on cloud infrastructure. However, being distributed, these systems are intrinsically subject to Eric Brewer's CAP theorem: such a system must drop one of the three CAP properties in favor of the other two. Hence, each of the systems on the example list has very distinctive properties when compared to the others. CAp, CaP and cAP would effectively categorize every NoSQL datastore by its feature set, which every adopter must evaluate to favor one over the other, mostly depending on their own requirements. Amazon, for example, cares less about consistency, and hence prefers eventual consistency in its Dynamo database, favoring its always-available property. By dropping typical RDBMS properties, and hence not supporting SQL, this range of data stores has been named 'NoSQL' datastores. wimvanleuven (talk) 10:04, 16 December 2009 (UTC)
Brewer's CAP theorem does seem the best way to categorize NoSQL implementations, as well as bearing on their key advantage over RDBMS (large volume). But we don't seem to have a CAP entry anywhere in en.wikipedia. http://www.julianbrowne.com/article/viewer/brewers-cap-theorem has some references; do they add up to enough to make a relevant article? Jackrepenning (talk) 21:51, 3 March 2010 (UTC)
See CAP theorem. Heelmijnlevenlang (talk) 20:04, 21 March 2010 (UTC)
I think it is important to differentiate RDBMS (ie the platform) from a relational database (normalised data structures). Data Warehouses are often built on an RDBMS platform and deal with large data volumes and often involve multiple servers. Additionally the data is generally structured in a denormalised manner (eg star schema) and often in a non-row based physical storage format. Finally, bulk updates to warehouses are often done in a way that is not ACID compliant. So they share many of the characteristics of NoSQL, except for the obvious one that they are often queried through SQL (though that is often generated by a tool). But the discussion focuses on transactional workloads rather than the aggregation or analysis associated with datawarehouse activity. It isn't so much data volumes that define the NoSQL use case, but massively concurrent access to detailed data items. —Preceding unsigned comment added by 203.15.73.30 (talk) 00:30, 23 March 2010 (UTC)

VoltDB

Where does VoltDB fit into the list? I'd like to find a list of RDBMS NoSQL databases. --Ysangkok (talk) 18:35, 10 June 2010 (UTC)

Makes sense to add VoltDB, but since it's new, that's probably the reason it wasn't on the list in the first place. Leave that to the original author to add that if needed. Captchad (talk) 20:53, 16 June 2010 (UTC)

RavenDB

I believe that RavenDB (pushed hard by Ayende Rahien) also fits the NoSQL category, pitching itself as a second-generation document DB. http://ravendb.net/features Nonnb (talk) 06:23, 30 May 2012 (UTC)

I think you are right, but we need an article first! See WP:WTAF for more!--Kgfleischmann (talk) 16:23, 30 May 2012 (UTC)

Relation to article Document-oriented database

As a software developer with a memory of more than just the current hype, I'd like to propose merging this article and the article about 'Document-oriented database'. The reasons that led to the development of document-oriented databases were more or less the same. The only difference is the use of the current buzzwords in the NoSQL groups. In a way, NoSQL systems have taken over the role of document-oriented databases with regard to SQL servers on the other side. 12:55, 30 July 2010 (UTC) —Preceding unsigned comment added by 91.40.129.120 (talk)

Transactional support to BigTable

There is a paper from 2008 presented by Google at SIGMOD/PODS about a transactional manager for BigTable called Megastore. I can't find the paper "Megastore: A Scalable Data System for User Facing Applications", but there is a description of the presentation in James Hamilton's blog. Now, I don't know the relation of this system to Percolator, but it should be mentioned in the same section, no? —Preceding unsigned comment added by 188.82.70.107 (talk) 11:24, 17 October 2010 (UTC)

Non SQL vs. NoSQL

NoSQL equals "not only SQL". But what about "non SQL"?--217.162.253.165 (talk) 16:34, 13 November 2010 (UTC)

I agree. NoSQL means having no SQL processor, i.e. SQL queries are not supported. --Okisan (talk) 01:24, 9 April 2013 (UTC)


Who cares what we think? It's what real world reliable sources define it as that matters. GimliDotNet (Speak to me,Stuff I've done) 11:50, 9 April 2013 (UTC)

Taxonomy section proposal

Hello,

Any objections to moving the info in the Taxonomy section to the Structured storage table? 205.228.108.185 (talk) 04:30, 25 February 2011 (UTC)

I oppose; the Structured storage table is of poor quality, for example lots of entries without articles (proofs of notability), see WP:WTAF for more. --Kgfleischmann (talk) 06:50, 25 February 2011 (UTC)
Fair enough, but what you mention is a good reason to improve that table, not to stuff a separate table in this article. 205.228.108.185 (talk) 09:39, 25 February 2011 (UTC)
Begin improving it; let's continue this discussion afterwards--Kgfleischmann (talk) 12:11, 25 February 2011 (UTC)
Sure, can you be more specific about what you don't like in that table? In fact, can you actually lead by example and start working on it? 220.100.23.139 (talk) 13:40, 25 February 2011 (UTC)
OK, I've given it a go. Can you please have a look and let me know if this is what you had in mind? Are there any more objections? 121.102.42.157 (talk) 23:40, 25 February 2011 (UTC)
Also, I would point out that the tables in the Taxonomy section currently also suffer from the same notability problem, see e.g. Tokyo Cabinet. 219.111.119.62 (talk) 04:28, 5 March 2011 (UTC)
I oppose, as the Taxonomy section is about NoSQL. Sae1962 (talk) 11:09, 9 March 2011 (UTC)
Ditto to Sae1962 — Preceding unsigned comment added by 198.175.196.56 (talk) 17:38, 14 March 2012 (UTC)
According to the lead, Structured Storage is how NoSQL systems are referred-to in academic papers, so the two terms can be used as synonyms, and indeed Structured Storage redirects to NoSQL. OK in theory Structured Storage also includes relational DBMSs, but that is only an argument to move Comparison of structured storage software to Comparison of NoSQL software.
In the Taxonomy section I would certainly welcome some kind of technical description of the various types, with eminent examples for each category. But effectively rebuilding the same table without merging the efforts does not make sense to me. 205.228.108.58 (talk) 02:19, 10 March 2011 (UTC)
I oppose. NoSQL systems usually support Unstructured (or Semi-structured) content and that differentiates them from Structured storage systems that deal poorly with this kind of data. Many of the NoSQL systems also tout not requiring a schema, which is a mainstay of many structured storage systems. EricBloch (talk) 00:11, 19 July 2011 (UTC)

Requested move

The following discussion is an archived discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. No further edits should be made to this section.

The result of the move request was: move per consensus that the particular RDBMS is not the primary topic. ErikHaugen (talk | contribs) 17:26, 2 May 2011 (UTC)


NoSQL (concept) → NoSQL — To revert the recent unilateral move. I believe the mover is inventing an alternate, novel interpretation of the term "NoSQL" (namely, "NoSQL (RDBMS)", apparently defined as "a relational database management system that intentionally avoids the use of SQL"), which amounts to original research. NoSQL (RDBMS) is also currently a nonexistent page, making any disambiguation unnecessary at this time. I maintain that the "concept" is the primary topic, and the namesake of NoSQL (RDBMS) (which is of questionable notability). --Cybercobra (talk) 05:45, 6 April 2011 (UTC)

Ya, the base definition of NoSQL is that it is non-relational. From (http://nosql-database.org/) NoSQL DEFINITION: Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontal scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge data amount, and more. So the misleading term "nosql" (the community now translates it mostly with "not only sql") should be seen as an alias to something like the definition above. [based on 5 sources, 11 constructive feedback emails (thanks!)...] Morphh (talk) 15:19, 6 April 2011 (UTC)
  • Support move to NoSQL. "concept" is a horrible disambiguator; if the separate RDBMS article holds up, then it can be linked in a hatnote from this article. SnowFire (talk) 02:59, 30 April 2011 (UTC)
  • Support Vast majority of reliable published sources mean the concept when they say NoSQL. This article should be at the main title. I was very surprised to see it was the opposite recently. With only one other article in the dab page, a hatnote works just fine. Steven Walling 10:05, 1 May 2011 (UTC)
  • Support Per above comment Morphh (talk) 13:25, 1 May 2011 (UTC)
The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page. No further edits should be made to this section.

NoSQL (RDBMS)

Should the NoSQL (RDBMS) article be deleted or merged? The sources are extremely weak for that article if not unreliable altogether. I expect it doesn't meet WP:NOTE or WP:V. Morphh (talk) 18:37, 2 May 2011 (UTC)

I agree. WP:PROD-ed accordingly. --Cybercobra (talk) 06:09, 3 May 2011 (UTC)
An IP has since added some sources and deprodded. --Cybercobra (talk) 22:48, 3 May 2011 (UTC)

Redirect to new filter, map, reduce article?

The vast majority of "no SQL" databases that are actually described that way (as opposed to network databases, semantic web technologies like triple stores, object-oriented databases and other ways of storing data) use the "filter, map, reduce" paradigm. The current article on MapReduce inanely describes it as Google-specific, when Riak and many other tools support similar functions.
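The "filter, map, reduce" pipeline described here can be shown with Python's built-in `filter` and `map` plus `functools.reduce`. This is a minimal single-process sketch over hypothetical documents; it does not reproduce Riak's or any other database's query API, and a distributed store would typically run the map step on each node and then combine partial reductions:

```python
from functools import reduce

# Hypothetical documents, e.g. as they might sit in a document store.
docs = [
    {"type": "post", "author": "ann", "words": 120},
    {"type": "comment", "author": "bob", "words": 15},
    {"type": "post", "author": "bob", "words": 300},
    {"type": "post", "author": "ann", "words": 80},
]

# filter: keep only the documents of interest
posts = filter(lambda d: d["type"] == "post", docs)

# map: emit one (key, value) pair per document
pairs = map(lambda d: (d["author"], d["words"]), posts)


# reduce: fold the pairs into per-key totals
def add_pair(acc: dict[str, int], pair: tuple[str, int]) -> dict[str, int]:
    key, value = pair
    acc[key] = acc.get(key, 0) + value
    return acc


words_per_author = reduce(add_pair, pairs, {})
print(words_per_author)  # {'ann': 200, 'bob': 300}
```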

Taxonomy section should be tabular

Any objections to having a table instead of a list of lists? It would also allow easy comparisons, sorting, blah. 63.217.82.139 (talk) 07:33, 11 April 2012 (UTC)

Are Network and Graph DBMS: NoSQL, NonSQL or PreSQL????

Wondering where Network model and Graph DBMS fit in the brave new NoSQL world. Is there a consensus? --83.104.51.74 (talk) 18:32, 7 September 2012 (UTC)

Removed mention of UnQL

UnQL was merely a specification which never achieved notability. The Wikipedia article about it was deleted on 2012-09-10: http://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/UnQL. And the citation for it here was just an interview with one of its proposers. Peter Gulutzan (talk) 14:41, 23 June 2013 (UTC)

Second paragraph in lead

This paragraph in the lead, "ACID vs BASE NoSQL cannot necessarily give full ACID guarantees. Usually eventual consistency is guaranteed for transactions limited to single data items. This means that given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system.", just pops out of nowhere, seems to have no relationship to the previous paragraph, and simply does not belong in the lead as a summary. I am going to remove it; however, someone who knows the subject may wish to insert it in the appropriate place and work out what the previous contributor was trying to do. See Wikipedia:Manual of Style/Lead section. billinghurst sDrewth 23:50, 9 September 2013 (UTC)
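The eventual-consistency behaviour described in the quoted paragraph (a single data item, no new changes for a while, all updates eventually propagate) can be simulated in a few lines. This is a minimal sketch under stated assumptions: five in-memory replicas, a last-write-wins rule on a logical timestamp (one common conflict rule, not the only one), and a simplified ring-shaped anti-entropy exchange in place of real network gossip. All names are hypothetical:

```python
from typing import Optional


class Replica:
    """One copy of a single data item, resolved by last-write-wins on a logical timestamp."""

    def __init__(self) -> None:
        self.value: Optional[str] = None
        self.stamp = 0  # logical timestamp of the version currently held

    def write(self, value: Optional[str], stamp: int) -> None:
        # Keep the incoming version only if it is newer than the one held.
        if stamp > self.stamp:
            self.value, self.stamp = value, stamp

    def merge_from(self, other: "Replica") -> None:
        # Anti-entropy step: adopt the peer's version if it is newer.
        self.write(other.value, other.stamp)


def converge(replicas: list[Replica], max_rounds: int = 100) -> int:
    """Run anti-entropy rounds until all replicas hold the same version.

    Simplification: in each round replica i pulls from replica (i + 1) % n,
    a deterministic stand-in for randomized gossip. Returns the rounds used.
    """
    n = len(replicas)
    rounds = 0
    while len({(r.value, r.stamp) for r in replicas}) > 1:
        if rounds == max_rounds:
            raise RuntimeError("replicas did not converge")
        for i in range(n):
            replicas[i].merge_from(replicas[(i + 1) % n])
        rounds += 1
    return rounds


replicas = [Replica() for _ in range(5)]
replicas[0].write("v1", stamp=1)  # first update reaches one replica only
replicas[3].write("v2", stamp=2)  # a later update reaches a different replica
# No further writes arrive, so the system is allowed to settle.
rounds = converge(replicas)
print(rounds, [r.value for r in replicas])
```

Once writes stop, every replica ends up holding "v2", the version with the highest timestamp. While writes are still arriving, replicas can disagree at any given moment, which is the weaker guarantee the quoted paragraph describes.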

Classification based on data model

Just to say I've tidied up this table, removing items that aren't in the cited source, adding one that was (Redis), and wiki-linking to main articles where I could. Steven Yen's classification isn't a great source. The original presentation slide (#54, here) is labeled "Northscale, Inc. Proprietary and Confidential" and anyway, by quoting wholesale from his classification, the table is set in concrete. You could probably remove any of his listed technologies that no longer exists, but if you add anything, then it's no longer his document. Ironically, that means that Couchbase doesn't belong in the table, because it didn't exist at the time of Yen's presentation. The best thing would be to find a more authoritative taxonomy. - Pointillist (talk) 14:39, 7 November 2013 (UTC)

Proposing new section: NoSQL Databases on the Cloud

There is growing use of NoSQL databases in the cloud. A while ago I wrote the entry Cloud database which explains how databases are deployed on the cloud - both SQL and NoSQL. I now want to do an expanded version of the NoSQL side of this topic, as a new section here on the NoSQL page, with a link to Cloud database as main article.

Suggested structure of the new section (comments are welcome):

  1. Intro explaining three deployment models - install on cloud as Virtual Machine Image, Database as a Service, Native Cloud NoSQL DBs (like Amazon SimpleDB)
  2. Table providing notable examples of NoSQL databases available on the cloud in each of these deployment models, for example MongoDB which can be installed directly on Amazon, or as a service via the commercial provider Mongolab.

Columns in the table: Deployment Model | Database Technology | Provider | Cloud-Specific Features | Pricing Model

For example:

  • Virtual Machine Image | Cassandra | Apache Cassandra (link to machine image for Amazon EC2) | None | Open source, Amazon instances pay per use
  • Database as a Service | Cassandra | InstaClustr | Performance tuning, backups etc. | Paid plans based on storage and RAM

Any comments / objections before I move forward? Anne.naimoli (talk) 10:29, 25 December 2013 (UTC)

I see there is a "silent consensus", so I'm moving forward :) Anne.naimoli (talk) 11:54, 29 December 2013 (UTC)