Talk:Binary search tree

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Computing (Rated B-class, Low-importance)
WikiProject icon This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
B-Class article B  This article has been rated as B-Class on the project's quality scale.
 Low  This article has been rated as Low-importance on the project's importance scale.
 
WikiProject Computer science (Rated B-class, High-importance)
WikiProject icon This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
B-Class article B  This article has been rated as B-Class on the project's quality scale.
 High  This article has been rated as High-importance on the project's importance scale.
 

"Value" versus "Key"[edit]

The definition and description use the term "value", and some of the code examples use the term "key". I believe they are interchangeable in this article. We should pick one and be consistent. I propose we pick "key". I think it is more precise. —Preceding unsigned comment added by Adamuu (talkcontribs) 21:20, 21 July 2009 (UTC)

(This issue is hardly unique to this article.) " ...each node has a comparable key (and an associated value)' certainly implies that they are distinct, but what - a tree is not a hash!

68.183.92.210 (talk)

Nodes with Equal Sort Value[edit]

Does anyone know what a tree with duplicate sort values looks like? -Smoke003723 —Preceding unsigned comment added by 203.143.250.6 (talk) 10:41, 7 July 2008 (UTC)

"Introduction to Algorithms" by Cormen, Liesersonand ,Rivest define the binary search tree as: "Let x be a node in a binary search tre. If y is a node in the left subtree of x, then key[y] <= key[x]. If y is a node in the right subtree of x, then key[x] <= key[y]" (Section 13.1 p 245 16th printing 1996). According to this then the duplicates could be on either side of a node which seems a little odd. Functions like equal_range() in C++ STL which do a binary search in a sorted vector return a range of values via a pair of iterators. —Preceding unsigned comment added by 67.164.72.147 (talk) 20:13, 28 February 2010 (UTC)

bst[edit]

Shouldn't the definition of BST be that every node's right subtree has values greater than or equal to the node's value? The current definition says it's strictly greater, which isn't the case.

You can put equal nodes either on the left or the right, but not both. It's an arbitrary decision. I'll say this somewhere. Deco 01:17, 12 May 2005 (UTC)

I disagree. Here is an excerpt from the lecture I once took at UC Berkeley: "A binary search tree satisfies the _binary_search_tree_invariant_: for any node X, every key in the left subtree of X is less than or equal to X's key, and every key in the right subtree of X is greater than or equal to X's key.". The full lecture can be found here (http://www.cs.berkeley.edu/~jrs/61b/lec/26). I've also verified this with my reference book on algorithms.

I'm sorry, it can be done this way (although there are authoritative references which don't do it this way). It makes some operations more difficult, such as finding all the nodes with a given value, but it really makes almost no difference which way it is defined. In the article I've used your definition, then clarified that in some definitions equal values can be excluded from the left or right side. Deco 01:02, 14 May 2005 (UTC)

By this definition self-balancing binary search trees are not binary search tree. Rotations do not keep this _binary_search_tree_invariant_. To be concrete: Consider an empty AVL tree and insert three equal values. After the rotation there are equal values to both sides of the root node. So you either need to exclude equal values from binary search trees or allow equal values to both sides of a node. E.g: "The left subtree of a node contains only values less or equal to the node's value. The right subtree of a node contains only values greater than or equal to the node's value." 141.84.26.134 01:04, 27 February 2007 (UTC)

It is confusing that the article starts by claiming that equal nodes should always be in the right subtree, while the example image puts equal nodes (such as the seven, which equals the root) in the left subtree. 128.84.153.196 13:29, 29 September 2006 (UTC)

Copyright violation?[edit]

Why does the blurb about Optimal BSTs have a reference to "section 15.5 of Introduction to Algorithms" ? Is this text copied and pasted out of a book? Is this a (c) violation?

Copyright does not cover ideas but an explicit realization of those ideas (see idea-expression divide). The text here, written by me, was based on my understanding of the material in CLRS and not copied verbatim from that book. Deco 00:41, 15 December 2005 (UTC)

Why are the examples in Python?[edit]

I know that the example algorithms are merely for show - but wouldn't examples in a more mainstream language like C/C++ or Java be better for the purpose of understanding? The Phython deletion algorithm, for instance, uses return structures that are foreign to C/C++ or similar languages.

If you think a mainstream programmer might find them difficult to understand that's a legitimate concern. I agree that "find_remove_max" is quite obscure in its use of tuples of two return values. On the other hand, I think the use of tuples or records combined with pattern matching really abbreviates things like the insertion and search algorithms, while still being pretty clear, which is a pretty good reason to use them instead of C/C++/Java. I'm going to look at rewriting parts of the deletion code in a more intuitive way... er, as soon as I learn a little Python. :-) Deco 22:27, 15 February 2006 (UTC)
I decided to write "destructive" versions of insert and delete in C++, which lends itself quite well to this purpose, as you can see from the relative simplicity of the resulting code. I left in the Python insertion to demonstrate the concept of persistent trees, but took out the excessively complex Python deletion example. I hope this makes things clearer for you! Deco 03:48, 16 February 2006 (UTC)

Yes - thank you very much. The C++ example is much more clear to me - and you are right, it was the returning tuples that was causing my head to spin. --208.186.134.102 06:00, 16 February 2006 (UTC) Wouldn't the code in the new C++ delete example cause memory leaks when you set "node=node->right" and don't delete node->right? (or node->left)... (thanks for adding it... it's a good, simple example. very clean) ++Ryangardner

Good point. I've gotten too used to garbage collection. :-) Thanks for your positive feedback. Deco 06:27, 16 February 2006 (UTC)

I see a few other (minor) flaws in the algorithm on the page. If you are deleting a leaf node, you must remove the link from the parent of that node to the node itself. I also think that the delete method as it is implemented would cause an error, because you are deleting the node and then trying to get data from it... Ryangardner

Slightly off topic, but could the python examples be made clearer by the use of exceptions instead of returning numbers (eg. -1, -2)? --RohanMitchell (talk) 00:10, 11 June 2008 (UTC)

I also find it bizarre & irritating that the examples are in python. C is the appropriate choice here. —Preceding unsigned comment added by 124.168.85.52 (talk) 14:53, 30 August 2010 (UTC)

I can't understand the "Traversal" example because I don't know what "callback" is or does. An example in another language is needed.

68.183.92.210 (talk) — Preceding undated comment added 00:22, 15 May 2014 (UTC)

Merge with binary search algorithm[edit]

It seems to me this article details the same concept as Binary search algorithm. I suggest a merger.He Who Is 01:59, 25 May 2006 (UTC)

No, they are very different. The binary search algorithm is an algorithm that works on sorted arrays. To use it, you need an array and you need to sort it beforehand (which is expensive). The binary search tree is a data structure. Unlike arrays, binary search trees do not preserve order of insertion. Binary search trees are used because they have the advantage that the tree structure encodes the relative ordering of the elements (i.e. it is always "sorted"), without needing to sort. --Spoon! 00:04, 24 August 2006 (UTC)

Errors[edit]

The deletion algorithm should check for the case of both left and right being NULL before checking either side.

No, the code works fine - I tested it. If both sides are NULL, then instead of assigning the other side it will assign NULL (after deleting the node) - which is correct behavior. —Preceding unsigned comment added by 68.207.121.5 (talk) 05:35, 9 November 2007 (UTC)

Indeed, the deletion doesn't seem to work. Could anybody fix this? Jogers (talk) 15:14, 6 September 2006 (UTC)
Yup, it does seem to be broken. Here is some code that should work. It compiles, but I don't feel up to verifying that it works 100% by actually testing it. It will be better than what's there, though...
 void DeleteNode(struct node* node) {
     struct node **parentLink;
 
     // If there is no parent, we are a root node
     if (parent == NULL) { 
         free(node);
         return;
     }
       
     // Figure out which child of the parent we are
     if (node == node->parent->left)
         parentLink = &(node->parent->left);
     else if (node == node->parent->right) 
         parentLink = &(node->parent->right);
 
     // This will handle cases where both children are null.
     if (node->left == NULL) { 
         *parentLink = node->right;
         free(node); // Free any memory allocated for the node
     } else if (node->right == NULL) {
         *parentLink = node->left;
         free(node); // In C++, delete node could be used here
     } else {
         // Follow the right side of the tree until there is an empty space
         struct node* temp = node->left;
         while (temp->right != NULL) {
             temp = temp->right;
         }
         node->value = temp->value;
         DeleteNode(temp);
     }
 }
Wolever 02:50, 31 March 2007 (UTC)

I worked on understanding the code in the article and found some issues. I described them on my blog, see http://kasterma.wordpress.com/2008/06/15/the-wikipedia-entry-on-bst/ 20:50, 15 June 2008 (GMT+1) —Preceding unsigned comment added by 83.117.8.28 (talk)


I hate to keep just pasting code around like this, but here's my take on the C version of removeNode.

void removeNode(bst *myBST, bstnode* node) {
	bstnode* other = NULL;
	if (node)
	{
		if (node->left)
		{
			other = node->left;
			if (other->right)
			{
				while (other->right)
				{
					other = other->right;
				}
				other->parent->right = NULL;
			}
		}
		else if(node->right)
		{
			other = node->right;
			if (other->left)
			{
				while (other->left)
				{
					other = other->left;
				}
				other->parent->left = NULL;
			}
		}
		if (other){
			if (myBST->head == node)
			{
				myBST->head = other;
				other->parent = NULL;
			}
			else
			{
				other->parent = node->parent;
				if (node == node->parent->left)
				{
					node->parent->left = other;
				}
				else
				{
					node->parent->right = other;
				}
			}
		}
		free(node);
	}
}

Tiger of Doom (talk) 01:33, 26 June 2008 (UTC)

Does it necessarily need a total ordering?[edit]

C++ STL's "Sorted Associative Containers" (which includes "set", "multiset", "map", and "multimap" classes, all of which are binary search trees) only requires its key types to be a strict weak ordering: http://www.sgi.com/tech/stl/SortedAssociativeContainer.html --Spoon! 00:08, 24 August 2006 (UTC)

You're right, a strict weak ordering suffices to order the tree and enable useful operations. The details behind this are rather complicated though and I think would be best relegated to their own section. Deco 04:09, 24 August 2006 (UTC)

Optimal binary search trees[edit]

In the section about Optimal binary search trees, it is mentioned that:

We can then use a dynamic programming solution, detailed in section 15.5 of Introduction to Algorithms by Thomas H. Cormen Sec Edition, to construct the tree with the least possible expected search cost.

I wish that someone who has the book (or has a nearby library from which he can borrow it) could take the time to write about it.

Thanks. --Hosam Aly 20:26, 7 January 2007 (UTC)

And the given code-piece "procedure Optimum Search Tree" is obviously incomplete. --Nomen4Omen (talk) 21:00, 22 December 2012 (UTC)

Insertion[edit]

I think the insertion code is not very clear, so I suggest this one, written in pseudocode:

x is current node, y always points to x's parent.

Tree-Insert(T, z)
 y = null                        //x will be set to root, root's parent is null
 x = root(T)
 while x != null do              //we will traverse the tree until we find the leaf
    y = x                        //we are going to change x, so store x as it will became next parent
    if key(z) < key(x)           //if the value of inserted node is lesser then actual node value, move left
      then x = left(x)
      else x = right(x)
                                 //now we should have y pointing to the leaf, and x to null (notice when the while loop terminates)
 parent(z) = y                   //this leaf will become our parent
 if y = null                     //this is a special case, when we are inserting into an empty tree
   then root(T) = z              //so the inserted node will be the root node
   else
     if key(z) < key(y)          //tree is non-empty, so insert the node
       then left(y) = z
       else right(y) = z


Kremso 17:15, 11 January 2007 (UTC)

Heck, why not just re-write all the code in psudo code? It will keep people from directly copy+pasting it in to their code (which generally won't work so well) and it is quite a bit easier to understand. Wolever 02:50, 31 March 2007 (UTC)
I agree. Pseudocode would be much better than the current two-language mess (but it would be even worse if we had Python, C and pseudocode). LittleDantalk 04:39, 15 April 2007 (UTC)
I disagree. Using a specific language like C or C++ makes it clear how the memory is being modified. If you use pseudo code it might not be clear the operations, because generally much of the traversal code deals with double pointers (or references to pointers). You want to be able to modify not only the children of the node, but also may want to NULL out the node itself (after freeing it), so any other nodes that point to the node will point to a NULL reference. That might not be clear if using psuedo code. —Preceding unsigned comment added by 68.207.121.5 (talk) 15:18, 10 November 2007 (UTC)

The C++ code shown in the example is basically a C code with stripped structs, can we add the structs and call it C? Using pointers and not using templates makes it an eyesore for a C++ programmer. — Preceding unsigned comment added by 77.92.12.214 (talk) 07:15, 1 December 2012 (UTC)

Deletion[edit]

"Once we find either the in-order successor or predecessor, swap it with N, and then delete it. Since both the successor and the predecessor must have fewer than two children, either one can be deleted using the previous two cases. In a good implementation, it is generally recommended[citation needed] to avoid consistently using one of these nodes, because this can unbalance the tree."

I disagree with this. Depending on which side of a node the equal values go to (left or right), you can only base this deletion schema in predecessor OR sucessor (not arbitrary). There are cases where taking the wrong choice will make equal values appear to the side they shouldn't be in. 13:11, 5 June 2007

I think the point is mute, as well. If you were truely concerned with balanced trees, you would use red-black or AVL trees. Otherwise, it is pretty hard to ensure balanced trees unless you carefully insert the whole dataset in a specific order —Preceding unsigned comment added by 68.207.121.5 (talk) 15:21, 10 November 2007 (UTC)
I'm pretty sure you're right if you allow equal nodes in the tree, you have to be consistent in your insertion and deletion policy. If you disallow them, then it's pretty obvious that you'll get an unbalanced tree if you consistently choose to fiddle with the left or right subtree.
Aside: Am I the only one that thinks Wikipedia is getting a bit funny with the citations for the bloody obvious? It really does feel like http://bash.org/?757724 sometimes. (`Here is my impression of Wikipedia. "There are five fingers on the human hand [citation needed]"') 129.11.126.179 20:05, 12 June 2007 (UTC)

I'm just wondering whether the example code for Deletion is correct... couldn't a tree be built such that it would lose additional nodes given the example code (e.g. if your nodes are integers and you add them via the above code in the order 6 3 5 4 and then delete 5 would you not then lose Node 4 as well?

I created an iterative delete function for a binary search tree:

//by: Eric Nevala
void BSTree::makeEmpty()
{
	if(root == NULL)
		return;
	else
	{
 
		/*
		My thinking: Recursive deletes are ugly. It requires a helper function and consumes 
		stack space for big trees. Here is my iterative approach to deleting every node in a 
		binary search tree:
		I need a list of every node in the tree. Since I don't know the size of the tree to
		begin with, I will need a dynamic sizing datastructure. Linked lists of some sort? I use
		the STL deque. The list will need to be a list of root nodes which need to be deleted.
		before each root node is deleted, any child nodes it has will be added to the list. Then,
		the root node is deleted from memory and then dequeued from the list. The next node in the
		queue is processed until there are no nodes left in the list. Thus, the tree is deleted.
		*/
 
		deque<Node*> DeleteList;
		deque<Node*>::iterator it;
 
		//we KNOW that we have a root node, so we push that onto the list. 
		//It should be the only item in the list.
		DeleteList.push_back(root);
 
		//set the iterator to the beginning of the list
		it = DeleteList.begin();
 
		//as long as the iterator doesn't reach the end of the list, there are things to delete
		while(it != DeleteList.end())
		{
			//set a current pointer to the front of the list
			Node* Current = DeleteList.front();
 
			//if there is a left node, add it to the back of the list.
			if(Current->left != NULL)
				DeleteList.push_back(Current->left);
 
			//if there is a right node, add it to the back of the list.
			if(Current->right != NULL)
				DeleteList.push_back(Current->right);
 
			//delete the current node data and the node
			delete Current->data;
			delete Current;
 
			//remove the current node from the front of the list
			DeleteList.pop_front();
 
			//reset the iterator to the front of the list
			it = DeleteList.begin();
		}
 
		root = NULL;
	}
}

omega n squared worst case for insertion?[edit]

The current text says:

In either version, this operation requires time proportional to the height of the tree in the worst case, which is O(log n) time in the average case over all trees, but Ω(n2) time in the worst case.

Isn't Omega(n) the worst case?

-- nyenyec  23:15, 10 November 2007 (UTC)

Yes, assuming n is the number of items already in the tree. It's Ω(n2) in the worst case to insert n items, and that's probably where the mix-up came from. Dcoetzee 23:24, 10 November 2007 (UTC)

Clarity Rewrite[edit]

I'm a student at Cal Poly State University San Luis Obispo, and as part of a class on teaching technical subjects we were asked to find an article that was unclear or otherwise inaccessible for the wider audience and clarify it, with an emphasis on improving understanding from a teaching point of view.

Most of the changes I am making are directed towards improving the _clarity_ of the subject matter, and NOT an attempt to make them more factually accurate.

One large change I made is to remove the section on BST's being sets/multisets from the introduction. I'm sure most people reading this article aren't interested in graph/set theory, so that information is more pertinent in other articles. Feel free to reintegrate that information if you'd like, but in my opinion it doesn't belong in this article, or if it does it belongs in it's own section.

My wiki markup skills are not very good, so feel free to correct any incorrect wiki-links I may inadvertently create. I've done my best to make them at least link somewhere correct, but I can't get the hang of linking to sections in other articles. Mgius (talk) 23:14, 26 February 2008 (UTC)

Improved Image[edit]

 the left subtree of a node contains only values less than the node's value;

So the image can contain one or more duplicated values, to show where they go. —Preceding unsigned comment added by 79.36.54.169 (talk) 18:02, 12 June 2008 (UTC)

The search tree may become:

 def in_binary_tree(node, key):
     if node is None:
         return False # key not found
     if key < node.key:
         return in_binary_tree(node.left, key)
     elif key > node.key:
         return in_binary_tree(node.right, key)
     else: # key is equal to node key
         return True # found key

This allows to look for None items too

External link[edit]

I have done several articles on BST, which can be located at Binary Search Tree Tutorial. Java and C++ implementations. I am asking senior wikipedia editors to approve this link and add it to the current article. Thanks in advance. —Preceding unsigned comment added by 92.39.232.52 (talk) 23:12, 19 January 2009 (UTC)

Iterative search algorithm[edit]

I added an iterative algorithm to search a BST in C++. I also made the intro clearer. I am pretty sure the code is correct, but any tips would be helpful. GRHooked (talk) 05:15, 12 July 2009 (UTC)

this recent edit[edit]

Regarding the commit comment, there is no way for val to be neither equal to, less than nor greater than next->value, well, technically, there is. What if either value is NaN? -- RoySmith (talk) 18:45, 1 August 2010 (UTC)

Observations[edit]

Hi, What do you think of adding an observations section with regard to the data structure? it's like deductions based on its state. for example, to know the expected range of numbers below you, if you're a left child, go up to get the end, and then left to get the beginning of the range.

more observations, every node's children are already sorted, so if you delete the node's parent, you can add that node to the tree's root. just one insertion. this would be simpler for people to understand, though not most effective. Another observation is how to get the next or previous. for example, go to the right child and then choose left repeatedly to get the next item. if you don't have a child, then it depends if you're a left or right child. if you're a left, then your parent is the next. if you're a right, you need to go up until you see the first node that is a left-child, and then get its parent. etc... What about modified structures where the root contains the 'count', or has 'max' and 'min' fields, so it can have fewer operations in such cases? (for example, directly adding to the max reference, instead of recursing the tree)

So should I edit away? or how things really work here? sorry, still a beginner. I don't have much background in CS, but I wrote a binary-tree vocabulary once.

worst-case theta(n^2) for sorting[edit]

I changed this from theta(n^2) to O(n^2).. it does not make sense for something to have worst case complexity defined by theta notation. Theta notation bounds from above and below by a constant factor so saying that it is theta(n^2) implies that it is at least n^2 time (e.g. big-omega). —Preceding unsigned comment added by 128.12.16.152 (talk) 07:29, 7 December 2010 (UTC)

There is no parent pointer (there shouldn't be)[edit]

The Python code assumes a parent pointer. The definition of a BST does not include a parent pointer. --Ysangkok (talk) 16:18, 31 March 2013 (UTC)

No Duplicates?[edit]

It seems that there is disagreement in the definition of a binary search tree. The article currently says that there must be no duplicates (and there are sources online that back it up), but many standard implementations of BSTs support duplicates, and Nick Parlante, at Stanford, says that they can contain duplicates (http://cslibrary.stanford.edu/110/BinaryTrees.html). Does anyone have an authoritative source (eg, CLRS or something by Knuth) on the subject? Meviin (talk) 21:15, 27 November 2013 (UTC)

There's also this - in direct contradiction - "The BST property--every node on the right subtree has to be larger than the current node and every node on the left subtree has to be smaller than (or equal to) the current node"
It's not even mentioned in many trees-related articles I've seen here. Either duplicates are permitted or they're not, but it's often not stated. If there's some disagreement about this, at least note it, rather than hoping no on will notice the omission. Also, if you are going to assume yes or no, make sure that all the rest of the article complies.

--Akwillis1 (talk) 22:58, 17 August 2014 (UTC) A value is greater, lesser, or equal to the node, it can't be a combination, so effectively there can't be duplicates. If these rules are broken then the invariant associated with the bst breaks. You can effectively think of a bst as a concrete implementation of the abstract data type Set.

isBST method code[edit]

The code checks the validity of BST is, in my opinion, wrong. At least, clarification should be added. First, I think rightData & leftData are always MinValue & MaxValue (respectively), instead of changing and checking all the nodes across the tree. Second, it seems like these fields should be vice versa (i.e, the right should be left and vice versa). 06:41, 9 July 2014 (UTC)

I was thinking the same thing, and am glad to see I'm not the only one. I'll look into it more. Huttarl (talk) 21:05, 19 July 2014 (UTC)