# Database storage structures

Database tables and indexes may be stored on disk in one of a number of forms, including ordered/unordered flat files, ISAM, heap files, hash buckets, or B+ trees. Each form has its own particular advantages and disadvantages. The most commonly used forms are B+ trees and ISAM. Such forms or structures are one aspect of the overall schema used by a database engine to store information.

## Unordered

Unordered storage typically stores the records in the order they are inserted. Such storage offers good insertion efficiency (${\displaystyle O\left(1\right)}$), but inefficient retrieval times (${\displaystyle O\left(n\right)}$). Typically these retrieval times are better, however, as most databases use indexes on the primary keys, resulting in retrieval times of ${\displaystyle O\left(\log n\right)}$ or ${\displaystyle O\left(1\right)}$ for keys that are the same as the database row offsets within the storage system.[citation needed]

## Ordered

Ordered storage typically stores the records in order and may have to rearrange or increase the file size when a new record is inserted, resulting in lower insertion efficiency. However, ordered storage provides more efficient retrieval as the records are pre-sorted, resulting in a complexity of ${\displaystyle O\left(\log n\right)}$.[citation needed]

## Structured files

### Heap files

• Simplest and most basic method
• insert efficient, with new records added at the end of the file, providing chronological order
• retrieval inefficient as searching has to be linear
• deletion is accomplished by marking selected records as "deleted"
• requires periodic reorganization if file is very volatile (changed frequently)
• efficient for relatively small relations as indexing overheads are avoided
• efficient when retrievals involve large proportion of stored records
• not efficient for selective retrieval using key values, especially if large
• sorting may be time-consuming
• not suitable for volatile tables

Heap files are lists of unordered records of variable size. Although sharing a similar name, heap files are widely different from in-memory heaps. In-memory heaps are ordered, as opposed to heap files.

### Hash buckets

• Hash functions calculate the address of the page in which the record is to be stored based on one or more fields in the record
• ‘occupancy’ is generally 40% to 60% of the total file size
• unique address not guaranteed so collision detection and collision resolution mechanisms are required
• Chained/unchained overflow
• Pros and cons
• efficient for exact matches on key field
• not suitable for range retrieval, which requires sequential storage
• calculates where the record is stored based on fields in the record
• hash functions ensure even spread of data
• collisions are possible, so collision detection and restoration is required

### B+ trees

These are the most commonly used in practice.

• Time taken to access any record is the same because the same number of nodes is searched
• Index is a full index so data file does not have to be ordered
• Pros and cons
• versatile data structure – sequential as well as random access
• access is fast
• supports exact, range, part key and pattern matches efficiently.
• volatile files are handled efficiently because index is dynamic – expands and contracts as table grows and shrinks
• less well suited to relatively stable files – in this case, ISAM is more efficient

## Data orientation

Most conventional relational databases use "row-oriented" storage, meaning that all data associated with a given row is stored together. By contrast, column-oriented DBMS store all data from a given column together in order to more quickly serve data warehouse-style queries. Correlation databases are similar to row-based databases, but apply a layer of indirection to map multiple instances of the same value to the same numerical identifier.