Talk:Trie

Erroneous Python example[edit]

The Python example seems to be wrong as it lacks a recursive "find". — Preceding unsigned comment added by 178.2.130.142 (talk) 10:12, 15 February 2014 (UTC)
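
A minimal sketch of such a find, assuming each node is a plain dict mapping characters to child nodes, with a special 'value' entry marking terminal nodes (illustrative only, not the article's code):

    def find(node, key):
        if not key:                    # whole key consumed
            return node.get('value')   # the value if terminal, else None
        child = node.get(key[0])       # descend one character
        if child is None:
            return None                # no branch for this character
        return find(child, key[1:])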

Disadvantages[edit]

I think there should be a section describing the disadvantages of tries. Currently, there are a few disadvantages listed under the "comparison with other data structures" section.

And one disadvantage that I don't see mentioned anywhere is the space/time tradeoff of managing child nodes: either you need a (potentially large and sparse) array of indexes at each node, or you need to implement a secondary search algorithm to find the appropriate child (e.g., the Python example uses a hash lookup over a dictionary of children). While the indexed approach may not be an issue for relatively shallow tries using ASCII (i.e., the example at the top of the page), it will quickly become an issue with deep tries and/or large value spaces (e.g., Unicode). Perhaps someone with more practical experience can comment on this? —Preceding unsigned comment added by 64.94.198.98 (talk) 17:21, 25 May 2010 (UTC)
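
A sketch of the two approaches described above (illustrative; the array variant assumes single-byte characters):

    # Array-indexed children: O(1) child lookup, but 256 slots per
    # node even when most are unused (far worse for Unicode).
    class ArrayNode:
        def __init__(self):
            self.children = [None] * 256   # index = ord(character)
            self.terminal = False

    # Dict-based children: compact for sparse nodes, but each step
    # pays for a hash lookup instead of a direct index.
    class DictNode:
        def __init__(self):
            self.children = {}             # character -> child node
            self.terminal = False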

Misleading Diagram[edit]

A better diagram for the page would show a trie where there are more than two edges from each node. The current diagram makes a trie seem too much like a binary search tree, and the idea that each node can have multiple edges depending on the lexicon is lost. —Preceding unsigned comment added by 134.197.40.199 (talk) 13:29, 11 February 2009 (UTC)

This has now been fixed. me_and (talk) 09:39, 20 May 2010 (UTC)

As replacement of other data structures (hash table)[edit]

The article currently says that "The worst-case lookup speed in an imperfect hash table is O(log(N)) time." I would think that the worst case would be O(N), in the case where all keys hash to the same bucket and the resulting linked list of length N must be traversed. Tom Hubbard 02:27, 11 September 2007 (UTC)

OK, it looks like someone fixed this -- thanks. Tom Hubbard (talk) 14:00, 28 November 2007 (UTC)

Complexity of tries[edit]

I am just wondering, Wolfskeeper, whether you have understood complexity theory and tries yet, or whether you just go around breaking articles because you have not (yet). Perhaps we should sort this out before you make further changes that might need to be undone as well. --ISee 10:28, 28 July 2005 (UTC)


The pseudocode for lookup looks wrong. It will never find strings that are prefixes of other strings that are also stored in the trie. Am I mistaken?

  • That's exactly what I thought. I think the other algorithm was actually wrong entirely, since it would only return true if there was a NULL node (no node at all) at the end of the word. I've made the correction of adding a "terminal" field which denotes the end of a word. —EatMyShortz 13:52, 30 April 2006 (UTC)
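
To illustrate the bug and the fix (a sketch assuming nodes with a children dict and a terminal flag, not the article's exact code): with "ten" and "tent" both stored, the node for "ten" has a child, so a test for childlessness misses it.

    # Wrong: treats "end of word" as "node has no children", so a key
    # that is a prefix of another key ("ten" inside "tent") is missed.
    def contains_broken(node, word):
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return not node.children       # False for "ten"

    # Right: an explicit terminal field marks where keys end.
    def contains(node, word):
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.terminal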


"Average lookup speed is theoretically the same, but a trie is faster in practice." That seems like a highly dubious claim, unsupported by evidence! In my experience the opposite is usually true (the main benefit of tries is thus that it can grow easily, and it may take less space since prefixes of keys are shared).

  • The performance is highly related to the usage pattern. In a hash table, you can retrieve a value while touching only 1-2 cache lines on average, whereas in a trie this can be up to its depth (the length of the prefix). However, a hash table will almost always require these cache fetches, whereas the trie, especially in a tight loop, will often have most or all of the lines cached (and especially for negative results). --66.7.145.210 18:27, 5 October 2006 (UTC)
    • Also, as far as I know, a hash table lookup requires the search key to be hashed first, while the lookup in a trie can start almost instantly. Does anybody else know the Judy Arrays or Google sparsehash? I think these are an implementation of tries (at least its JudySL class), although the term "trie" is never used to describe either of them. --195.3.81.25 10:15, 27 August 2007 (UTC)


I'm not certain enough of this to edit the document without research, but... if you are going to take the length of keys into account when discussing the complexity of lookup with tries, then you should do the same with the other methods -- otherwise you are comparing apples to oranges. Normally, in discussing binary trees and hash tables, it is assumed that the keys are small constant-sized objects. When discussing tries, you have to think of strings. In this case, each comparison in a binary-tree search is an O(m) string compare, so the complexity of the lookup is not O(log n) but O(m log n). Similarly, the expected lookup time for a hash table is O(m), not O(1), because forming the hash typically takes O(m) time (you can avoid looking at all characters, but that probably affects the expected lookup time). --David Gudeman —Preceding unsigned comment added by 69.181.140.228 (talk) 04:22, 14 July 2008 (UTC)

Diagram Numbering[edit]

In the diagram of trie on the right hand side of http://en.wikipedia.org/w/index.php?title=Trie&oldid=65534606 the elements in the trie are numbered. What do these numbers represent? I can't for the life of me work it out - could an explanation also be added to the article?

I don't think they are intended to have any meaning. The use of numbers may not have been the most effective way to make their point, which is that only nodes with valid keys store values. Notice that the "t" and "te" nodes are not marked with numbers. This is because "t" and "te" are not real English words; therefore, they cannot serve as keys to any value (this _seems_ to be the idea). I don't think readers will be confused by this in general though, because the third paragraph provides an explanation. Maybe this was added after you made your request. Danielx 08:19, 21 August 2006 (UTC)

I agree with Danielx about the meaning of the numbers (i.e. they are merely arbitrary "values" associated with the string "keys"). However I have a problem with the diagram caption. Since, as Danielx points out, "t" and "te" are not marked with numbers then they are not "keys", but just key prefixes. The caption lists them as if they were keys and I believe that may be confusing. Paritybit (talk) 03:48, 10 September 2008 (UTC)

Binary?[edit]

It looks like tries are not binary (i.e., a node can have more than two children). This would be much clearer if the diagram featured a node with more than two children. (Whoops, forgot to sign, here it goes:) --Thenickdude 05:03, 2 September 2006 (UTC)

True, that. I should fix the diagram. Deco 02:33, 2 September 2006 (UTC)
I updated it so a couple of nodes have three children. Superm401 - Talk 23:03, 8 March 2009 (UTC)

Knuth reference[edit]

When looking up the reference by Knuth (The Art of Computer Programming, Volume 3: Sorting and Searching), I couldn't find a third edition, only a second edition from March 1998 (first printing).

More information can also be found on the publisher's website: http://www.aw-bc.com/catalog/academic/product/0,1144,0201896850,00.html

Maybe someone with more experience (this is my first Wikipedia post) could verify this and modify the article.

Thanks in advance. Jwk3402 13:04, 2 April 2007 (UTC)

Other names[edit]

Is discrimination tree another name for a trie? Wikipedia doesn't have an entry for D. T. and Google isn't helping. -- Ralph Corderoy 11:50, 28 May 2007 (UTC)

Huffman coding[edit]

Maybe Huffman coding should be added to "See also", since the basic idea seems to be very similar to me. Subwy (talk) 20:22, 13 August 2008 (UTC)

Yeah sort of. You can interpret a Huffman tree as being a trie, and Huffman coding as describing the search paths followed when you look up a sequence of characters in that trie. It's not a bad analogy. Dcoetzee 20:43, 13 August 2008 (UTC)
I too think so, I am adding it to the article. --Jyoti (talk) 06:46, 15 November 2008 (UTC)

Why did Edward Fredkin choose that word?[edit]

Since he pronounced it as a homophone of ‘tree’, didn't he realize that it was a pretty stupid choice, because that would make it impossible to distinguish the two words in speech? If he was so desperate to combine ‘tree’ and ‘retrieve’, surely he could have done better? Shinobu (talk) 22:06, 5 October 2008 (UTC)

Example for implicit branch labels?[edit]

The article currently states that the label of the branches is often implicit in the ordering of the branches. I'm trying to think of a good example of this but all I can think of are binary prefix codes, which sort of use binary lookup trees anyway. All the examples in the article seem to assume every node has a map of its children (though my Haskell isn't all that up to scratch). Perhaps an explicit example would be useful... Wppds (talk) 09:05, 8 January 2009 (UTC)

If keys were made up of US-ASCII lower-case letters, and each node had an array of 26 child pointers, you wouldn't have to say "this character is a 'c'", because you could know that from the pointer to the node being in the 3rd position. — Preceding unsigned comment added by JimJJewett (talkcontribs) 21:51, 7 May 2013 (UTC)
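
A sketch of that layout (assuming lower-case US-ASCII keys; the names are illustrative):

    # The branch label is implicit in the slot position: slot 0 means
    # 'a', slot 2 means 'c', so no node stores the character itself.
    class Node:
        def __init__(self):
            self.children = [None] * 26    # one slot per letter
            self.terminal = False

    def child_for(node, ch):
        return node.children[ord(ch) - ord('a')]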

Section on lexicographic sorting using a trie is unclear and in some cases, wrong[edit]

Only a pre-order traversal can be used to sort the keys in lexicographic order. Furthermore, a lexicographic ordering of edge pointers within each node is required in order to use the pre-order traversal. In other words, the children of a node would have to be traversed in lexicographic order.

Please see this link for a reference: http://books.google.com/books?id=G_U8ghxhT4wC&pg=PA379&dq=lexographic+sorting+using+trie
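
A sketch of that traversal (assuming nodes with a children dict and a terminal flag; sorted() supplies the lexicographic ordering of the edges):

    # Pre-order traversal, visiting children in lexicographic order,
    # yields the stored keys in sorted order.
    def sorted_keys(node, prefix=""):
        keys = [prefix] if node.terminal else []
        for ch in sorted(node.children):
            keys.extend(sorted_keys(node.children[ch], prefix + ch))
        return keys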

Calvin (talk) 17:22, 18 February 2009 (UTC)

Lexicographic ordering and post-order traversal[edit]

Calvin is right: the article mistakenly claims that post-order traversal of the trie will result in lexicographically-reversed output. That's quite misleading and wrong. A better, pedagogical approach might be to consider an Euler traversal (generalized to trees with more than two children) in this part of the discussion. An Euler tour of a tree structure covers both the pre-order and post-order traversal of the tree in the same pass. Pre-order events can be used to construct and output the strings in lexicographic order, but you also need to capture post-order events in order to know when to remove characters from the prefix string.
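
A sketch of that idea (illustrative; assumes nodes with a children dict and a terminal flag, and that the empty string is not a stored key): pre-order events extend the prefix and emit keys, post-order events trim it.

    # Iterative Euler tour: each node is handled on the way down
    # (pre-order: extend the prefix, maybe emit a key) and on the way
    # back up (post-order: drop its character from the prefix).
    def sorted_keys(root):
        out, prefix = [], []
        stack = [("pre", "", root)]
        while stack:
            event, ch, node = stack.pop()
            if event == "post":
                prefix.pop()               # leaving node: trim prefix
                continue
            if ch:                         # root carries no character
                prefix.append(ch)
                stack.append(("post", ch, node))
                if node.terminal:
                    out.append("".join(prefix))
            # push children in reverse so the smallest is popped first
            for c in sorted(node.children, reverse=True):
                stack.append(("pre", c, node.children[c]))
        return out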

--Here2L8 (talk) 20:26, 5 April 2009 (UTC)

trie = DFA ?[edit]

Other than the confusing name, how is this different from a deterministic finite automaton? 12.108.187.222 (talk) 20:25, 7 May 2009 (UTC)

It's a special case of a DFA -- precisely, it's a DFA whose transition graph is a tree. --196.215.6.126 (talk) 14:35, 17 July 2010 (UTC)
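
To make the correspondence concrete, here is a sketch (illustrative naming, not from the article) that numbers the trie nodes to obtain explicit DFA states: transitions follow child edges, and accepting states are the terminal nodes.

    # delta maps (state, character) -> state; state 0 is the start.
    def trie_to_dfa(root):
        delta, accepting, states = {}, set(), [root]
        i = 0
        while i < len(states):
            node = states[i]
            if node.terminal:
                accepting.add(i)              # end of a key = accepting
            for ch, child in sorted(node.children.items()):
                delta[(i, ch)] = len(states)  # transition on ch
                states.append(child)
            i += 1
        return delta, accepting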

sorting[edit]

In the sorting section, the two statements

Pre-order traversal is a kind of depth-first traversal. In-order traversal is another kind of depth-first traversal that is more appropriate for outputting the values that are in a binary search tree rather than a trie.

are confusing. Should one of them be breadth-first traversal? Kowloonese (talk) 17:37, 30 October 2009 (UTC)

Compared to hash tables[edit]

The article currently says, “[…] tries have the peculiar feature that the time to insert, or to delete or to find is almost identical … (unlike say hash tables where deletion is extremely fast, but finding and especially insertion is considerably slower).“ The part in parentheses makes no sense. First of all, finding must be performed before deletion so deletion cannot be faster (and isn’t). It also shouldn’t be faster than insertion, neither in open nor in closed addressing. I’m removing that part since it’s wrong (or at best easily misunderstood), and also unnecessary. —Preceding unsigned comment added by 77.186.145.151 (talk) 15:51, 18 July 2010 (UTC)

To be fair I could interpret this as referring to deletion via an iterator, which could maintain a pointer to the data item, in which case it does appear to represent an actual (but relatively unimportant) distinction between tries and hash tables. I support removing it. The distinction of actual importance is the prefix sharing by tries. Dcoetzee 18:33, 18 July 2010 (UTC)
I think the phrase in parentheses is simply reversed -- hash tables may take extraordinarily long to delete an element if using a closed hash, because otherwise the entire table needs to be re-hashed so that probing still works: subsequent searches need to know that they should consider the deleted table entry a possible collision. This can be avoided by using tombstone values, or by instead using a chained/open hash table. —Preceding unsigned comment added by 216.239.45.4 (talk) 19:24, 25 August 2010 (UTC)
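
A sketch of the tombstone approach in a linear-probing table (illustrative, and assuming the table is never completely full):

    DELETED = object()   # tombstone: slot was occupied once

    # table: fixed-size list of None / DELETED / (key, value) slots.
    # Writing a tombstone instead of None keeps later probe chains
    # intact, so the table does not have to be re-hashed on delete.
    def delete(table, key):
        i = hash(key) % len(table)
        while table[i] is not None:          # None = never used: miss
            if table[i] is not DELETED and table[i][0] == key:
                table[i] = DELETED
                return True
            i = (i + 1) % len(table)
        return False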

Is a trie a data structure or an algorithm?[edit]

The opening sentence says a trie is a data structure. But then the first section is headed Advantages relative to other search algorithms, and it begins, "Unlike most other algorithms, tries...".

So is a trie a data structure or an algorithm? On this talk page under trie = DFA ? an anonymous poster concludes that a trie is a DFA. That seems to be more accurate than either data structure or algorithm. IOLJeff (talk) 14:27, 3 July 2011 (UTC)

It's a data structure, of course: it stores data. Like other data structures, it comes with algorithms for accessing (searching) and updating it - that's what the first section seems to be talking about. You could view a trie as a DFA, but note that you can insert or remove strings from a trie at any time, and this results in different DFAs, so this viewpoint applies only to immutable tries. -- X7q (talk) 18:18, 3 July 2011 (UTC)

Advantages/Disadvantages mess[edit]

The article is currently a real mess when it comes to trying to understand why and when tries are used instead of hash tables, for example. There is the "advantages" section, and there is the "instead of other data structures" section, both trying to say the same thing, but both completely missing the point:

Tries are mostly (or only?) used to hold a relatively dense list of words - it could be a spell-checking dictionary (the only good example given in this article), a routing table (see Patricia tree, a variant of the trie), and so on. The advantages of tries over hash tables here are: 1. they often take less memory (as whole words aren't stored); 2. they are MUCH faster to build given a sorted word list (as well as easily producing a sorted word list); 3. while hash tables can only do exact word matching, tries make it easy to do longest-prefix matching, approximate string matching, and so on.
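
For instance, longest-prefix matching (point 3) falls out of a single walk down the trie; a sketch, assuming nodes with a children dict and a terminal flag (and that the empty string is not a stored key):

    # Walk down the trie, remembering the last terminal node passed:
    # that is the longest stored key that prefixes the query.
    def longest_prefix(root, query):
        node, best = root, None
        for i, ch in enumerate(query):
            if ch not in node.children:
                break
            node = node.children[ch]
            if node.terminal:
                best = query[:i + 1]
        return best    # None if no stored key prefixes the query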

195.110.40.7 (talk) 08:47, 13 March 2012 (UTC)

There are also some statements in there for which we have to make, let's say, charitable assumptions in order for them to be true at all - for instance, the claim that searching a binary tree takes O(m log n) time, which assumes that comparing a key takes O(m) time. Elsewhere the article describes hash tables as constant-time, which requires the opposite (and more customary) assumption that key-sized operations are constant-time. I suggest removing the entire advantages/disadvantages section. It does not seem to reflect nor contribute to a good understanding of what's actually going on in these data structures, and it is uncited. 130.179.29.61 (talk) 19:18, 15 March 2013 (UTC)

Please verify the "A compiling Python version" version[edit]

The code uses a defaultdict, but then the algorithms sometimes rely on getting a KeyError, which is precisely what a defaultdict with a factory prevents. Perhaps setdefault would work better.
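
To spell out the problem (a sketch, not the article's exact code): with a defaultdict, merely probing for a child silently creates it, so a lookup can never raise the KeyError the algorithms depend on; a plain dict with setdefault keeps insertion convenient without breaking lookup.

    from collections import defaultdict

    # With a defaultdict, lookup mutates the trie: missing children
    # are created on the spot, so the expected KeyError never fires
    # (and every failed search quietly grows the structure).
    node = defaultdict(dict)
    _ = node['x']              # side effect: child 'x' now exists

    # A plain dict restores the distinction: setdefault for inserts,
    # ordinary indexing (raising KeyError on a miss) for lookups.
    node = {}
    node.setdefault('x', {})   # insert path: create child if missing
    try:
        node['y']              # lookup path
    except KeyError:
        pass                   # a miss is detectable again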

JimJJewett (talk) 21:59, 7 May 2013 (UTC)

The "compiling Python version" compiles but doesn't work indeed :) Like you say, the "remove" never hits the False clause because of the factory passed to defaultdict, and prefix goes out of range when going down to empty string. I've re-written it to work as folows, although the prefix method should be removed, or the lookup method should be re-written not to cache searches so that they make sense together. BACbKA (talk) 19:04, 12 May 2013 (UTC)

Since nobody seems to care, I'm moving my code into the article, even though I haven't addressed the prefix vs lookup issue. At least it works now... Thanks for the alert, JimJJewett! BACbKA (talk) 21:07, 11 June 2013 (UTC)

Confusion about keys[edit]

Would be clearer to change: Unlike a binary search tree, no node in the tree stores the key associated with that node

to

Unlike a binary search tree, no node in the tree stores the entire key associated with that node --129.42.208.183 (talk) 22:00, 20 January 2014 (UTC)

Your formulation would suggest that nodes store part of the key. That's not true: it's the edges (links) that encode the keys, not the nodes. The nodes only carry values or "final" bits. QVVERTYVS (hm?) 22:33, 20 January 2014 (UTC)