User:Mckaysalisbury/Relational database

From Wikipedia, the free encyclopedia

A relational database is simply a database that conforms to the relational model. The term is used to refer to the data, and the structure of that data. The software used to create a relational database is called the Relational Database Management System (RDBMS), but sometimes that software is mistakenly called the relational database.

The term was originally defined and coined by E.F. Codd[1]. Codd's definition is now not the only usage of the term, as many modern DBMS manufacturers have adopted a more relaxed usage of the term.

Definitions[edit]

As mentioned above, a relational database is a database that conforms to the relational model. A relational database could also be defined as a set of relations or a database built in an RDBMS.

An RDBMS is sometimes incorrectly called a relational database. But, strictly speaking, Oracle, Microsoft SQL Server, and MySQL are *not* relational databases. Under popular usage of the term, these software packages are called "Relational Database Management Systems" (RDBMS), and as such they can be used to create relational databases. There is some disagreement as to whether or not they can be considered "relational", because they do not fully conform to the relational model. For a more full explanation on the requirements for a DBMS to be considered "relational", see RDBMS#Current Usage.

Contents[edit]

Strictly speaking, a relational database is merely a collection of relations (frequently called tables). Other items are frequently considered part of the database, as they help to organize and structure the data, in addition to forcing the database to conform to a set of requirements.

Relations or Tables[edit]

A relation is defined as a set of tuples that all have the same attributes. This is usually represented by a table, which is data organized in rows and columns. In a relational database, all of the data stored in a column should be in the same domain (i.e. data type). In the relational model, the tuples should not have any ordering. This means both, that there should be no order to the tuples, and that the tuples should not impose an order of the attributes. This is the same as saying that neither the rows nor the columns should have an order. While this is the desired result, it is not universally achieved. All data stored in a computer has to have an order, as the memory of a computer is linear. Also, when the data is returned, there must be an order in which the data is returned (because all transfer protocols are linear, and coincidentally enough, humans read in a linear fashion). The point here is that this order must never make a logical difference in the system. Frequently orders are imposed which impact performance, but they should never change the result of a query on the database. In practice, several of the DBMSs that are considered "relational" impose an order that makes a logical difference.

Constraints[edit]

Constraints are a way of providing restrictions on the kinds of data that can be stored in the relations. These are usually defined (formally) in the form of expressions that result in a boolean value, indicating whether or not the constraint holds. Constraints are a way of implementing business rules into the database.

Under the strictest sense, constraints are not considered part of the relational database, but because of the integral role which they play in organizing data, they are usually considered part of the database.

Keys[edit]

A tuple usually represents some object, and the data associated with that object, whether that object is a physical object, or a concept. A key is a kind of constraint that that object, or critical information about that object, isn't duplicated. For example, in an immediate family, the head of that family would probably like to have a constraint such that no two people in the family have the same name. If information about family members were stored in a database, a key could be placed over the family member's name. In a University, they have no such luxury. Each student is typically assigned a Student ID, which is then made the key. Keys can have more than one column, for example, a nation may impose a restriction that a province can't have two cities by the same name. So, when cities are stored in a relation, there would be a key defined over province and city name. This would allow two different provinces to have a town called Springfield (because their province would be different), but not two cities with the same name in the same province. A key over more than one attribute is called a compound key. Theoretically, a key can even be over zero attributes. This would enforce that there cannot be more than 1 tuple in the relation.

Most relations have at least one key defined on it. Because a relation is defined as being a set, it can't have duplicate rows. While some DBMSs don't enforce this, if it does, it means that there is always at least one key on each relation, namely the key involving all of the attributes of the relation.

A key could be defined formally by requiring that the cardinality of the relation should be equal to the cardinality of the relation projected over the columns of the key.

A key, in this context, refers to any set of attributes which uniquely span the relation. In particular, this is called a superkey. A candidate key is a minimal superkey, meaning that, none of the attributes in the key could be removed from the key, and still have that attribute set be a key. Some DBMSs have a concept of a primary key, The primary key (usually a candidate key) is the key most often used to identify a tuple. In many RDBMSs, the primary key of a base relvar is the storage key (sometimes clustered key), meaning that that is how the data is stored physically. If the value of the primary key is actual interesting data with logical ties to the data (like a name) for the tuple, it is called a natural key. If the key is generated and doesn't have any logical connection to the rest of the data in the tuple, it is called a surrogate key. Other candidate keys that were not chosen as the primary key are called alternate keys.

Foreign Keys[edit]

A foreign key is not a key by the previous definition. Rather, a foreign key is a reference to a key in another table. Meaning that the referencing tuple has, as it's attributes, the values of a key in the referenced tuple that corresponds to the relationship.

For example, a company has several departments, and each employee is a member of one department. This is enforced in the database by a foreign key. Imagine there is a relation for departments. Each department has a department name, and a surrogate key for the department called "DepartmentID". The employee relation would have "DepartmentID" as an attribute, with a foreign key reference to the aforementioned surrogate key of the department relation. The DBMS would then enforce that each employee could not be created without specifiying a valid department tuple, and department tuples could not be deleted if there are employees referencing that department.

A foreign key could be described formally as "For all tuples in the referencing relation projected over the referencing attributes, there must exist a tuple in the referenced relation projected over those same attributes such that the values in each of the referencing attributes match the corresponding values in the referenced attributes".

Transition Constraints[edit]

A transition constraint is a way of enforcing that the data doesn't enter an impossible state because of a previous state. For example, it shouldn't be possible for a female to change from being "married" to being "single, never married". The only valid states after "married" might be "divorced", "widow", or "deceased".

Other constraints[edit]

Other constraints of various different kinds can be created to enforce various kinds of business rules. They can be as simple as "the number of cars an individual owns must be non-negative" or complex patterns like "If the work that an employee performs is 'Hazardous Materials Transport' then that employee's age must be 18 years of age, and the employee's certifications must include 'Hazmat endorsement', and company insurance for that employee must include life insurance."

Other Relvars[edit]

A relvar is a "relation variable". In a relational database, all data is stored and accessed via relations. The data that is actually stored in the database are stored as relations. These relations are sometimes called "base relvars". This is equivalent to a "table". Other relvars do not have their data stored in them, but are a result of applying relational operations, to other relvars. These relvars are sometimes called "derived relvars", meaning that their information is derived from other sources. These are equivalent to "views" or "queries". Derived relvars are convenient in that though they may grab information from several relvars, it is presented externally as a single relvar for a simpler perspective. Also, it can be used as an abstraction layer.

Derived relvars are not always considered part of a relational database, partially because they are not essential to the functioning of the database.

Stored Procedures[edit]

A stored procedure is executable code that is associated with the database. Stored procedures usually store how to perform common operations, like inserting a tuple into a relation, or gathering statistical information about usage patterns. Frequently they are used as an application programming interface for security or simplicity. These are usually written as Imperative programming code extending the Data Definition Language and/or the Data Manipulation Language for the DBMS.

Stored procedures are not always considered part of a relational database, partially becacuse they are not essential to the functioning of the database.

Indexes[edit]

An index is a way of providing quicker access to the data in a relational database. Indexes can be created on any combination of attributes on a relation. Then when tuples in a relation need to be looked up, similar to how a book's index works, the index can be accessed. Rather than having to check all of the tuples, the index tells the DBMS where the tuple is. Indexes are usually implemented via B+ trees.

Indexes are usually not considered part of the database, as they are considered an implementation detail, though indexes are usually maintained by the same group that maintains the other parts of the database.

Relational operations[edit]

Queries made against the relational database, and the derived relvars in the database are expressed in a relational calculus or a relational algebra. Some relvars merely restrict the tuples that are returned, rename the attributes, or remove some of the attributes (often: project) from the result set. These three operations are called unary operations. Other operations that can be performed involve combining more than one relvar together. Examples of these are set union, set intersection, cartesian product (often called a "cross product"), and various different kinds of joins. These are all binary operations.

Normalization[edit]

Normalization is a process of altering the structure of the database to make the database conform to one or more best practices, to assist in performance and ease of data manipulation. By far, the most common forms of normalization applied to databases are called the normal forms. The most common normal forms that databases are normalized to are:

  • First normal form
Ensures that the RDBMS can access all of the data using relational means.
  • Second normal form
  • Third normal form
  • Boyce-Codd normal form
  • Fourth normal form
  • Fifth normal form
All reduce functional dependencies, and therefore, various kinds of data duplication, and therefore ease of correctly modifying the data.

Competition[edit]

International Data Corporation reports that as of 2005 the worldwide market for RDBMSs is 9.4% over the prior year at $14.6 billion[2]. Even though relational systems are popular, there are other options.

Older models[edit]

Flat file databases are simple. This prevents them from storing complicated information easily, but in flat file systems can provide better performance. Databases based on the hierarchical model and the network model are older, and not as robust as the relational model.

Objects[edit]

Object databases allow for even more complicated data structures, and are easier to integrate with Object-oriented programming tools. The price of these two features comes with an increased cost of development.

The relational model has a strong basis in predicate logic and set theory, so RDBMSs can find optimization patterns that most humans typically can't. In object databases, programmers can implement complicated access patterns themselves, and might be able to get better performance than an RDBMS, but with substantially increased query writing time.

Relational[edit]

Most commercial RDBMSs do not strictly conform to the relational model. In recent years, there has been siginificant growth on DBMSs that more faithfully adhere to the relational model. Dataphor, Duro, and Rel are three such implementations. They believe that being stricter adherents to Codd's 12 rules will make their systems easier to use. Hugh Darwen in presentations mentions that DBMSs that fail to adhere to the relational model are making logical differences, and each of these logical differences are big differences[3].

References[edit]

  1. ^ Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM. 13 (6): 377–387.
  2. ^ 2005 RDBMS Market
  3. ^ Slideshow for his presentation The Askew Wall

See also[edit]

Application[edit]

Implementation[edit]

Theory[edit]

History[edit]

External links[edit]