Column (database)

From Wikipedia, the free encyclopedia

In a relational database, a column is a set of data values of a particular simple type, one value for each row of the table.[1] A column may contain text values, numbers, or even pointers to files in the operating system.[2] Some relational database systems allow columns to contain more complex data types, such as whole documents, images, or video clips.[3] A column can also be called an attribute.

For example, a database that represents companies might have the following columns:

  • ID (integer identifier, unique to each row)
  • Name (text)
  • Address line 1 (text)
  • Address line 2 (text)
  • City (integer identifier, drawn from a separate table of cities, from which any state or country information would be drawn)
  • Postal code (text)
  • Industry (integer identifier, drawn from a separate table of industries)
  • etc.

Each row would provide a data value for each column and would then be understood as a single structured data value, in this example a company. More formally, each row can be interpreted as a tuple composed of a set of pairs, with each pair consisting of two items: the name of the relevant column and the value this row provides for that column.
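The company example can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a recommended schema: the table names, column names, and sample values are assumptions chosen to mirror the list above.

```python
import sqlite3

# In-memory database; all table names, column names, and values are
# illustrative assumptions, not taken from any real system.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows reading values by column name

conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE industries (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE companies (
        id            INTEGER PRIMARY KEY,
        name          TEXT,
        address_line1 TEXT,
        address_line2 TEXT,
        city_id       INTEGER REFERENCES cities(id),
        postal_code   TEXT,
        industry_id   INTEGER REFERENCES industries(id)
    );
    INSERT INTO cities VALUES (1, 'Springfield');
    INSERT INTO industries VALUES (1, 'Manufacturing');
    INSERT INTO companies VALUES
        (1, 'Example Corp', '1 Main St', NULL, 1, '00000', 1);
""")

row = conn.execute("SELECT * FROM companies WHERE id = 1").fetchone()

# One row viewed as a set of (column name, value) pairs.
pairs = {name: row[name] for name in row.keys()}
print(pairs)
```

Each key in `pairs` is a column name and each value is what this particular row stores in that column, which matches the name/value description of a row given above.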

         Column 1           Column 2
Row 1    Row 1, Column 1    Row 1, Column 2
Row 2    Row 2, Column 1    Row 2, Column 2
Row 3    Row 3, Column 1    Row 3, Column 2

Examples of relational database systems include PostgreSQL, MySQL, SQL Server, Access, Oracle, Sybase, and DB2. Columns in these systems are defined and queried using SQL (Structured Query Language).

See also: SQL.

Field

The word 'field' is normally used interchangeably with 'column'.[4] However, database purists prefer to use the word 'field' for a specific value, that is, a single item of a column. In this sense, a field is the intersection of a row and a column.
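The distinction can be sketched in a few lines of Python. The table contents and the helper names `column` and `field` are hypothetical and exist only for this illustration.

```python
# A tiny table as a list of rows; the data is made up for illustration.
table = [
    {"id": 1, "name": "Acme", "postal_code": "10001"},
    {"id": 2, "name": "Globex", "postal_code": "94105"},
    {"id": 3, "name": "Initech", "postal_code": "73301"},
]

def column(rows, name):
    """Return every value stored under one column name, one per row."""
    return [row[name] for row in rows]

def field(rows, row_index, name):
    """Return the single value where one row meets one column."""
    return rows[row_index][name]

names = column(table, "name")        # the whole 'name' column
one_field = field(table, 1, "name")  # second row, 'name' column
print(names)
print(one_field)
```

Here `names` holds one value per row, while `one_field` is a single value picked out by a row and a column together.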

Row database vs column database

Relational databases mainly use row-based data storage, but column-based storage can be more useful for many business applications. For example, a column database can read only the columns a query needs instead of scanning entire rows, which gives faster access during query processing. Any column can also serve as an index. Row-based applications, by contrast, typically process one record at a time and normally need to access one or two complete records. Column databases also compress better, because most columns contain only a few distinct values compared to the number of rows.[5] Furthermore, in a column store, data is already vertically partitioned. This vertical organization allows operations on different columns to be processed in parallel: if multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core. A row-based database, on the other hand, must read entire rows even when a query needs data from only a few columns, so requests over a large amount of data can take a long time. In column database tables, the values of a column are stored physically next to each other, which significantly increases the speed of certain queries.[6]
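The difference between the two layouts can be sketched as follows. This is a simplified in-memory model, not how any particular database lays out its pages; the sample data and the helper name `to_columns` are assumptions.

```python
# Sample records; names and values are made up for illustration.
rows = [
    (1, "Acme", "NY"),
    (2, "Globex", "CA"),
    (3, "Initech", "TX"),
]
column_names = ("id", "name", "state")

def to_columns(rows, column_names):
    """Pivot row tuples into one list of values per column."""
    return {
        name: [row[i] for row in rows]
        for i, name in enumerate(column_names)
    }

columns = to_columns(rows, column_names)

# Row layout: the values of one record sit next to each other.
row_layout = [value for row in rows for value in row]

# Column layout: the values of one column sit next to each other.
column_layout = [value for name in column_names for value in columns[name]]

print(row_layout)
print(column_layout)
```

In `column_layout`, all `state` values form one contiguous run, so a query that needs only that column can read a single slice; in `row_layout` the same values are spread across every record.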

Advantages

The main benefit of keeping data in a column database is that some queries can run very quickly. For instance, if you want to know the average age of all users, you can jump directly to the area where the 'age' data is stored and read just the data needed, instead of looking up the age in each record row by row. During querying, columnar storage avoids reading irrelevant data. Therefore, aggregation queries that only need a subset of the total data run faster than in row-oriented databases.[7]
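The average-age example can be sketched by counting how many stored values each layout has to touch. The user data is invented, and counting values read is only a rough stand-in for real disk or memory access costs.

```python
# Invented sample data for illustration.
users = [
    {"id": 1, "name": "Ann", "age": 30, "city": "Oslo"},
    {"id": 2, "name": "Bob", "age": 40, "city": "Lima"},
    {"id": 3, "name": "Cy", "age": 50, "city": "Pune"},
]

def avg_age_row_store(rows):
    """Scan whole records: every value of every row counts as read."""
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)  # the full record is loaded
        total += row["age"]
    return total / len(rows), values_read

def avg_age_column_store(columns):
    """Read only the 'age' column; other columns are never touched."""
    ages = columns["age"]
    return sum(ages) / len(ages), len(ages)

# Build the column-oriented view of the same data.
columns = {key: [user[key] for user in users] for key in users[0]}

row_avg, row_reads = avg_age_row_store(users)
col_avg, col_reads = avg_age_column_store(columns)
print(row_avg, row_reads)
print(col_avg, col_reads)
```

Both functions return the same average, but in this model the row-oriented scan touches four values per user while the column-oriented scan touches one.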

Furthermore, because all values in a column share the same data type, compression algorithms applied to each column achieve better compression, which helps queries return results more quickly.[8]
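As one illustration of why per-column compression works well, the sketch below applies run-length encoding to the same data in both layouts. Run-length encoding is chosen here only as a simple example; the cited sources do not name a specific algorithm, and the sample data and the helper name `run_length_encode` are assumptions.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse each run of equal adjacent values into (value, run length)."""
    return [(value, len(list(group))) for value, group in groupby(values)]

# Invented sample rows: (id, country code).
rows = [
    (1, "DE"), (2, "DE"), (3, "DE"),
    (4, "FR"), (5, "FR"), (6, "US"),
]

# Column layout: all country values are adjacent, so runs are long.
country_column = [country for _, country in rows]
encoded_column = run_length_encode(country_column)

# Row layout: ids and countries alternate, so no run is longer than one.
interleaved = [value for row in rows for value in row]
encoded_rows = run_length_encode(interleaved)

print(encoded_column)
print(len(encoded_rows))
```

The six country values shrink to three runs when stored as a column, while the interleaved row layout yields no compression at all under this scheme.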

Disadvantages

There are many situations where multiple fields from each row are needed. Column databases are usually not the best option for these types of queries: the more fields that need to be read per record, the smaller the benefit of storing data in a column-oriented fashion. If queries only look up the values of a specific user, row-oriented databases usually perform those queries faster. Secondly, writing new data can take more time in columnar storage.[9] For instance, if you're inserting a new record into a row-oriented database, you can write it in one operation. However, if you're inserting a new record into a column database, you need to write to each column one by one. As a result, loading new data or updating many values takes longer in a columnar database.[10]
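The insert cost can be sketched with two toy stores that count write operations. The classes `RowStore` and `ColumnStore` are hypothetical, and a write counter is a deliberate simplification of real storage I/O.

```python
class RowStore:
    """Toy row-oriented store: one write per inserted record."""

    def __init__(self):
        self.records = []
        self.writes = 0

    def insert(self, record):
        self.records.append(dict(record))  # whole record in one step
        self.writes += 1


class ColumnStore:
    """Toy column-oriented store: one write per column per record."""

    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}
        self.writes = 0

    def insert(self, record):
        for name, values in self.columns.items():
            values.append(record[name])  # each column is written separately
            self.writes += 1


# Invented sample records.
new_records = [
    {"id": 1, "name": "Ann", "age": 30, "city": "Oslo"},
    {"id": 2, "name": "Bob", "age": 40, "city": "Lima"},
]

row_store = RowStore()
column_store = ColumnStore(["id", "name", "age", "city"])
for record in new_records:
    row_store.insert(record)
    column_store.insert(record)

print(row_store.writes, column_store.writes)
```

With four columns, each insert costs the toy column store four writes against one for the row store, which mirrors the loading and update overhead described above.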

References

  1. ^ The term "column" also has equivalent applications in other, more generic contexts. See e.g., Flat file database, Table (information).
  2. ^ "Columnar databases in a big data environment". dummies.com (Big dummies book). Retrieved 2015-11-05. 
  3. ^ "What is Database Column? - Definition from Techopedia". Techopedia.com. Retrieved 2015-11-05. 
  4. ^ "An introduction to databases". www.ucl.ac.uk. Retrieved 2015-11-05. 
  5. ^ "Introduction to column oriented databases". 2012-11-30. 
  6. ^ "SAP HANA Tutorial". saphanatutorial.com. Retrieved 2015-11-05.
  7. ^ "What's Unique About a Columnar Database? | FlyData". FlyData. Retrieved 2015-11-05. 
  8. ^ "What's So Unique About a Columnar Database?". 2015-02-06. 
  9. ^ "Column Oriented Database Technologies | DB Best Chronicles". www.dbbest.com. Retrieved 2015-11-05. 
  10. ^ "The Database Decision: A Guide". Data Informed. Retrieved 2015-11-05.