Talk:Extendible hashing

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Computing  
WikiProject icon This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
 ???  This article has not yet received a rating on the project's quality scale.
 ???  This article has not yet received a rating on the project's importance scale.
 

Implementations[edit]

I just found that gdbm implements this algorithm. Might be of interest for students. --phil (talk) 20:51, 26 June 2009 (UTC)

Ronald Fagin has his own page here on Wikipedia, how to link it? I ? Unicode (talk) 12:40, 10 September 2009 (UTC)

Sample Code[edit]

I think i found a bug in the implementation: What happens if during a split the items of the bucket can't be redistributed (i.e. all items stay in one bucket) because one additional bit is not enough to distinguish between the hashvalues of the key.

Example

I try to add subsequently keys whose hashvalues are (I use the most significant bits). The capacity of my buckets are 1.

100
110

in the beginning everything is empty

gd=0        
[---] ---> [---]        d=0

Adding 100

gd=0       
[---] ---> [100]        d=0

Adding 110

[---] ---> [100] + 110  d=0  bucket is full => initiate split
                                               result of split: [100] d=1 [---] d=1

split according to most significant bit => *boom* all key-values stay in one bucket

the right thing to do would be doubling the directory more than once

or more algorithmically speaking: double size of directory until at least one value of the bucket to split is moved into the new bucket

situation after *one* split and *one* doubling of directory

gd=1
[0--] --- > [---]       d=1
[1--] --- > [100] + 110 d=1 

=> double directory and split *again*   result of split [100] d=2 [110] d=2

gd=2
[00-] ---> [---] d=1
[01-] -------^
[10-] ---> [100] d=2
[11-] ---> [110] d=2

I'd suggest the following fix.

    void put(K k, V v) {
        Page<K, V> p = getPage(k);
        if (p.full()) {
            Page<K, V> p1;
            Page<K, V> p2;
            do {
                p1 = new Page<K, V>();
                p2 = new Page<K, V>();
                // double directory
                if (p.d == gd) {
                    List<Page<K, V>> pp2 = new ArrayList<EH2.Page<K, V>>(pp);
                    pp.addAll(pp2);
                    ++gd;
                }
                // redistribute values
                for (Map.Entry<K, V> entry : p.m.entrySet()) {
                    K k2 = entry.getKey();
                    V v2 = entry.getValue();
 
                    int h = k2.hashCode() & ((1 << gd) - 1);
 
                    if (((h >> p.d) & 1) == 1) {
                        p2.put(k2, v2);
                    } else {
                        p1.put(k2, v2);
                    }
                }
                // update directory
                for (int i = 0; i < pp.size(); ++i) {
                    if (pp.get(i) == p) {
                        if (((i >> p.d) & 1) == 1) {
                            pp.set(i, p2);
                        } else {
                            pp.set(i, p1);
                        }
                    }
                }
                // try to add new value into correct page
                int h = k.hashCode();
                if (((h >> p.d) & 1) == 1) {
                    if (!p2.full()) {
                        p2.put(k, v);
                    }
                } else {
                    if (!p1.full()) {
                        p1.put(k, v);
                    }
                }
                p1.d = p2.d = p.d + 1;
                // if all values went to one page we have to do everything again
            } while (p1.empty() || p2.empty());
 
        } else {
            p.put(k, v);
        }
    }
}

Of course you have to add something like an empty()-method which returns (m.size() == 0) into the page class

212.5.16.228 (talk) 16:38, 1 July 2010 (UTC)