Jump to content

Hash join

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Hjzla (talk | contribs) at 23:55, 18 March 2006. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The Hash Join is an example of a join algorithm and is used in the implementation of a relational database management system.

The basic problem of a join algorithm is to find, for each distinct value of the join attribute, the set of tuples in each relation which display that value.

Applying the hash join algorithm on a inner join of two relations proceeds as follows: first prepare a hash table for the smaller relation, by applying a hash function to the join attribute of each row, and then scan the larger relation and find the relevant rows by looking in the hash table.

Hash joins require an equi-join predicate (a predicate comparing values from one table with values from the other table using the equals operator '='). Hash joins have the advantage that each table is read only once and that no sort operations are required. In their simplest form they require that the smaller table fits in memory. Various schemes (e.g. Grace Join or Hybrid Hash Join) have been proposed to handle cases where the smaller table does not fit into memory.

Hash joins were first mentioned in 1984 in a paper "Hashing Methods and Relational Algebra Operations" by Kjell Bratbergsengen.