Column family: Difference between revisions

From Wikipedia, the free encyclopedia
Revision as of 12:00, 29 March 2011

A column family is a NoSQL object that contains columns of related data. It consists of key-value pairs, where each key is mapped to a value that is a set of columns. By analogy with relational databases, a column family is like a "table", with each key-value pair being a "row".[1] Each column is a tuple (triplet) consisting of a column name, a value, and a timestamp.[2] In a relational database table, this data would be grouped together within a table with other non-related data.[3]

A column family is a container for columns sorted by their names; column families are referenced and sorted by row key.[4]

The column consists of a (unique) name, a value, and a timestamp.
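
The structure described above can be sketched in Python as nested mappings. This is only an illustrative model, not the implementation of any particular data store: the names `Column`, `ColumnFamily`, `insert`, `get_row` and `row_keys` are hypothetical, and the rule that a newer timestamp replaces an older value is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Column:
    """A column: a (name, value, timestamp) triplet."""
    name: str
    value: str
    timestamp: int


class ColumnFamily:
    """Illustrative model: row key -> {column name -> Column}."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, Dict[str, Column]] = {}

    def insert(self, row_key: str, column: Column) -> None:
        # Column names are unique within a row. Assumption of this sketch:
        # a column with a newer (or equal) timestamp replaces the old one.
        row = self.rows.setdefault(row_key, {})
        current = row.get(column.name)
        if current is None or column.timestamp >= current.timestamp:
            row[column.name] = column

    def get_row(self, row_key: str) -> List[Column]:
        # Return the columns of one row, sorted by column name.
        row = self.rows.get(row_key, {})
        return [row[name] for name in sorted(row)]

    def row_keys(self) -> List[str]:
        # Rows are referenced and sorted by their row key.
        return sorted(self.rows)


# Hypothetical sample data.
users = ColumnFamily("Users")
users.insert("jsmith", Column("surname", "Smith", 1))
users.insert("jsmith", Column("first_name", "John", 1))
users.insert("adoe", Column("first_name", "Alice", 2))
```

Here `users.get_row("jsmith")` yields the columns in name order regardless of insertion order, and `users.row_keys()` lists the row keys in sorted order.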

Benefits

Accessing the data in a distributed data store would be expensive (time-consuming) if it were stored in the form of a table. It would also be inefficient to read all the column families that make up a row of a relational table and assemble them into that row, because the data is distributed over a large number of nodes. Therefore, the user accesses only the related information required.

As an example, a relational table could consist of the columns UID, first name, surname, birthdate, gender, etc. In a distributed data store, the same table would be implemented by creating column families for "UID, first name, surname", "birthdate, gender", etc. If one needs only the males born between 1950 and 1960, a query against the relational database has to read the whole table. In a distributed data store, it suffices to access only the second column family, as the rest of the information is irrelevant.
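
The example can be sketched in Python with two dictionaries standing in for the two column families. The sample records, the variable names and the function `males_born_between` are hypothetical, and the sketch assumes that the UID serves as the row key shared by both families.

```python
# Hypothetical data: two column families keyed by the same UID.
names = {
    "u1": {"first_name": "John", "surname": "Smith"},
    "u2": {"first_name": "Mary", "surname": "Jones"},
    "u3": {"first_name": "Paul", "surname": "Brown"},
}
demographics = {
    "u1": {"birthdate": "1955-04-12", "gender": "male"},
    "u2": {"birthdate": "1958-09-30", "gender": "female"},
    "u3": {"birthdate": "1972-01-05", "gender": "male"},
}


def males_born_between(family, first_year, last_year):
    """Return the UIDs of males born in [first_year, last_year].

    Only the given column family is read; the family holding the
    names is never touched.
    """
    result = []
    for uid, columns in family.items():
        birth_year = int(columns["birthdate"][:4])
        if columns["gender"] == "male" and first_year <= birth_year <= last_year:
            result.append(uid)
    return sorted(result)


matching_uids = males_born_between(demographics, 1950, 1960)
```

The query runs entirely against `demographics`; `names` would be consulted only if the names of the matching users were needed afterwards.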

Column families vs. rows

Column families are schemaless: each of their "rows" can contain a different number of columns, and even the column names can differ from row to row.[5] Their rows are therefore a very different concept from the rows of a relational database management system (RDBMS). This is one of the reasons why the concept is not trivial even for an experienced RDBMS user.
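
A minimal Python sketch of this property follows. The row keys and column names are taken from the example quoted in reference 5; the values, the variable names and the helper `column_names` are hypothetical.

```python
# One column family whose two rows carry different column sets.
users = {
    "Cassandra": {"emailAddress": "cassandra@example.org", "age": "20"},
    "TerryCho": {"emailAddress": "terry@example.org", "gender": "male"},
}


def column_names(family, row_key):
    """Return the column names present in one row, sorted by name."""
    return sorted(family[row_key])


# Columns that both rows happen to share.
shared = set(users["Cassandra"]) & set(users["TerryCho"])
```

Nothing forces the two rows to agree: only `emailAddress` appears in both, while `age` and `gender` each exist in a single row.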

Examples

In JSON-like notation, a super column definition would look as follows:

 {
    name: "homeAddress",
    value: {
        // note the keys are the name of the Column
        street: {name: "street", value: "1234 x street", timestamp: 123456789},
        city: {name: "city", value: "san francisco", timestamp: 123456789},
        zip: {name: "zip", value: "94107", timestamp: 123456789},
    }
 }

An example of a super column family is:

 AddressBook = { // this is a ColumnFamily of type Super
    phatduckk: {    // this is the key to this row inside the Super CF
        // the key here is the name of the owner of the address book

        // now we have an infinite # of super columns in this row
        // the keys inside the row are the names for the SuperColumns
        // each of these SuperColumns is an address book entry
        friend1: {street: "8th street", zip: "90210", city: "Beverly Hills", state: "CA"},

        // this is the address book entry for John in phatduckk's address book
        John: {street: "Howard street", zip: "94404", city: "FC", state: "CA"},
        Kim: {street: "X street", zip: "87876", city: "Balls", state: "VA"},
        Tod: {street: "Jerry street", zip: "54556", city: "Cartoon", state: "CO"},
        Bob: {street: "Q Blvd", zip: "24252", city: "Nowhere", state: "MN"},
        ...
        // we can have an infinite # of SuperColumns (aka address book entries)
    }, // end row
    ieure: {     // this is the key to another row in the Super CF
        // all the address book entries for ieure
        joey: {street: "A ave", zip: "55485", city: "Hell", state: "NV"},
        William: {street: "Armpit Dr", zip: "93301", city: "Bakersfield", state: "CA"},
    },
 }

Types

Two types of column families exist:[6]

  • Standard column family: contains only columns
  • Super column family: contains a map of super columns.
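
The difference between the two types shows up as one extra level of nesting on lookup. The following Python sketch reuses values from the examples above; timestamps are omitted for brevity, and the function names `get_column` and `get_sub_column` are hypothetical.

```python
# Standard column family: row key -> column name -> value.
standard_family = {
    "phatduckk": {"street": "1234 x street", "city": "san francisco", "zip": "94107"},
}

# Super column family: row key -> super column name -> column name -> value.
super_family = {
    "phatduckk": {
        "John": {"street": "Howard street", "zip": "94404", "city": "FC", "state": "CA"},
        "Kim": {"street": "X street", "zip": "87876", "city": "Balls", "state": "VA"},
    },
}


def get_column(family, row_key, column_name):
    """Look up a column value in a standard column family."""
    return family[row_key][column_name]


def get_sub_column(family, row_key, super_column_name, column_name):
    """Look up a column value inside a super column."""
    return family[row_key][super_column_name][column_name]


city = get_column(standard_family, "phatduckk", "city")
john_zip = get_sub_column(super_family, "phatduckk", "John", "zip")
```

A standard lookup needs a row key and a column name, whereas a lookup in a super column family also needs the name of the super column.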

References

  1. ^ "Column Families 101". Max's Output. Retrieved 2011-03-18.
  2. ^ Max's Output. "A Quick Introduction to the Cassandra Data Model: 1) Cassandra is based on a key-value model". Max Grinev. Retrieved 2011-03-18. A column family is a set of key-value pairs. I know the terminology is confusing but so far it is just basic key-value model. Drawing an analogy with relational databases, you can think about column family as table and a key-value pair as a record in a table.
  3. ^ "Column Families 101". Toad for Cloud. Retrieved 2011-03-18.
  4. ^ "Cassandra's data model cheat sheet: Column Family". Chaker Nakhli's Blog - Yet another technical blog. Retrieved 2011-03-29. A container for columns sorted by their names. Column Families are referenced and sorted by row keys.
  5. ^ Posted by Terry (2010-03-22). "Apache Cassandra Quick tour". Terry.Cho's blog. Retrieved 2011-03-25. One of interest thing is each row can have different scheme. Cassandra row has "emailAddress" ,"age" column. TerryCho row has "emailAddress","gender" column. This characteristic is called as "Schemeless" (Data structure of each row in column family can be different).
  6. ^ "A ColumnFamily Can Be Super Too". Arin Sarkissian. Retrieved 2011-03-18. Now, a ColumnFamily can be of type Standard or Super. What we just went over was an example of the Standard type. What makes it Standard is the fact that all the Rows contains a map of normal (aka not-Super) Columns… there's no SuperColumns scattered about. When a ColumnFamily is of type Super we have the opposite: each Row contains a map of SuperColumns. The map is keyed with the name of each SuperColumn and the value is the SuperColumn itself. And, just to be clear, since this ColumnFamily is of type Super, there are no Standard ColumnFamily's in there.
