BK-tree

From Wikipedia, the free encyclopedia
Jump to navigation Jump to search

A BK-tree is a metric tree suggested by Walter Austin Burkhard and Robert M. Keller[1] specifically adapted to discrete metric spaces. For simplicity, consider integer discrete metric . Then, BK-tree is defined in the following way. An arbitrary element a is selected as root node. The root node may have zero or more subtrees. The k-th subtree is recursively built of all elements b such that . BK-trees can be used for approximate string matching in a dictionary.[2][example needed]

Example[edit]

An example of BK-tree

This picture depicts the BK-tree for the set of words {"book", "books", "cake", "boo", "boon", "cook", "cake", "cape", "cart"} obtained by using the Levenshtein distance

  • each node is labeled by a string of ;
  • each arc is labeled by where denotes the word assigned to .

The BK-tree is built so that:

  • for all node of the BK-tree, the weight assigned to its egress arcs are distinct;
  • for all arc labeled by , each descendant of satisfies the following equation: :
    • Example 1: Consider the arc from "book" to "books". The distance between "book" and any word in {"books", "boo", "boon", "cook"} is equal to 1;
    • Example 2: Consider the arc from "books" to "boo". The distance between "books" and any word in {"boo", "boon", "cook"} is equal to 2.

Insertion[edit]

The insertion primitive is used to populate a BK-tree according to a discrete metric .

Input:

  • : the BK-tree;
    • denotes the weight assigned to an arc ;
    • denotes word assigned to a node );
  • : the discrete metric used by (e.g. the Levenshtein distance);
  • : the element to be inserted into ;

Output:

  • The node of corresponding to

Algorithm:

  • If the is empty:
    • Create a root node in
    • Return
  • Set to the root of
  • While exists:
    • If :
      • Return
    • Find the child of such that
    • If is not found:
      • Create the node
      • Create the arc
      • Return

Lookup[edit]

Given a searched element , the lookup primitive traverses the BK-tree to find the closest element of . The key idea is to restrict the exploration of to nodes that can only improve the best candidate found so far by taking advantage of the BK-tree organization and of the triangle inequality (cut-off criterion).

Input:

  • : the BK-tree;
  • : the corresponding discrete metric (e.g. the Levenshtein distance);
  • : the searched element;
  • : the maximum distance allowed between the best match and , defaults to ;

Output:

  • : the closest element to stored in and according to or if not found;

Algorithm:

  • If is empty:
    • Return
  • Create a set of nodes to process, and insert the root of into .
  • While :
    • Pop an arbitrary node from
    • If :
    • For each egress-arc :
      • If : (cut-off criterion)
        • Insert into .
  • Return

See also[edit]

References[edit]

External links[edit]

  • A BK-tree implementation in Common Lisp with test results and performance graphs.
  • An explanation of BK-Trees and their relationship to metric spaces [3]
  • An explanation of BK-Trees with an implementation in C# [4]
  • A BK-tree implementation in Lua [5]
  • A BK-tree implementation in Python [6]