Talk:Red–black tree

From Wikipedia, the free encyclopedia
Jump to: navigation, search
          This article is of interest to the following WikiProjects:
WikiProject Computing (Rated C-class, Low-importance)
WikiProject icon This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
C-Class article C  This article has been rated as C-Class on the project's quality scale.
 Low  This article has been rated as Low-importance on the project's importance scale.
 
WikiProject Computer science (Rated C-class, High-importance)
WikiProject icon This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
C-Class article C  This article has been rated as C-Class on the project's quality scale.
 High  This article has been rated as High-importance on the project's importance scale.
 

Properties[edit]

Property 1 is trivial. Property 2 prevents the children of the root node from being red-black trees, since their roots may need to be recolored black. This seems like an undesired property in a theoretical context. Property 3 is useful, but only because it allows us to say "black" instead of "black or NIL" or "not red." Properties 4 and 5 are the important invariants.

Given this, why not renumber the properties? 99.11.197.75 (talk) 00:42, 5 May 2011 (UTC)

NIL nodes yet again[edit]

The current description of "NIL" nodes is confusing. First, I would suggest that it is more precise to call a "NIL node" "the empty tree". Then one could say that NIL is a symbol for the empty tree. Second, the use of NIL nodes is not peculiar to red-black trees. In fact, many tree implementations use the empty tree such that every non-empty tree has two children, and one doesn't need separate cases for nodes with two children, one left child, one right child, or no children. However, it should be clear that this is an implementation detail, not a requirement. Calling these NIL nodes leaves is unnecessarily confusing, as a "leaf node" could also refer to a node with both children empty.

The only thing about NIL nodes that is special for red-black trees is that it is useful to consider them to be black. 99.11.197.75 (talk) 00:04, 5 May 2011 (UTC)

Insert description more complicated than necessary?[edit]

The following is from [1]:

                        B(z)
                        /  \
                      R(x)  d
                      /  \
                     a   R(y)
                         /  \
                        b    c
  
                         ||
                         \/

      B(z)              R(y)                 B(x)
      /  \              /  \                 /  \
    R(y)  d    =>     B(x) B(z)    <=       a   R(y)
    /  \              / \  / \                  /  \
  R(x)  c             a b  c d                 b   R(z)
  /  \                                             /  \
 a    b                                           c    d

                         /\
                         ||

                       B(x)
                       /  \
                      a   R(z)
                          /  \
                        R(y)  d
                        /  \
                       b    c

These changes are followed by recursion if R(y)'s new parent is also red. The only other step is to recolor R(y) Black if it is the new root. 128.146.32.245 (talk) 00:08, 29 April 2011 (UTC)

Mistake in the formal proof?[edit]

I think the following sentence from the formal proof is wrong (problematic part highlighted):

As such it has two children both of which have a black-height of either bh(v') or bh(v')-1 (depending on whether v' is red or black).

Given the definition of the black-height bh (relevant part highlighted):

bh(v) = the number of black nodes (not counting v if it is black) from v to any leaf in the subtree (called the black-height).

I think the black-height of the children is bh(v') or bh(v') - 1 depending on whether these children are red or black, and not depending on v' color.

Also, the sentence may be interpreted as if both children have the same color, which is not necessarily the case (the children may have different colors).

Can someone confirm? These details do not invalidate the proof, but they make it a bit harder to follow. Olivier Teuliere (talk) 10:16, 21 March 2011 (UTC)

I made that edit, and you are correct. I'm sorry about that, I've reverted it. 193.77.101.149 (talk) 09:44, 23 July 2011 (UTC)

Deletion error?[edit]

It cannot be correct to say that if N is the new root then the tree is balanced. In a three element tree where all elements are black deleting the root node produces a tree where the black height is different on either left or right unless the child element is coloured red in an additional step. Or have I missed something here? 79.69.42.27 (talk) 21:23, 17 July 2010 (UTC)

The various deletion cases all refer to a node with at most one (non-leaf) child. The case of nodes with 2 children (like in your example) is dealt with in the first paragraph of the "Removal" section. Olivier Teuliere (talk) 09:56, 21 March 2011 (UTC)

rotate missing[edit]

for completeness, adn to be able to see what is going on at one point, there should be implementations of the rotation functions available. —Preceding unsigned comment added by 213.61.9.75 (talk) 09:20, 9 June 2010 (UTC)

tree[edit]

I would like to say thank you for this great article. Also, it might be good to link to http://webpages.ull.es/users/jriera/Docencia/AVL/AVL%20tree%20applet.htm. It's a java applet that demonstrates a red-black tree, as well as AVL and splay trees. --Anonymous CS Student

I think the tree in the image is actually invalid, since the left-left-left path and the left-right-right path only passes through two black nodes. Κσυπ Cyp   01:46, 30 Nov 2003 (UTC)

Now right-right-right-... passes through 3 black nodes, the rest through 2. Swapping the colour of 15 and 17 should work. Nice tree, though... Κσυπ Cyp   01:41, 4 Dec 2003 (UTC)
Oops, I'm a complete idiot. I was trying not to copy an existing image, but I guess I didn't understand red-black trees as well as I thought. Sorry for the problems.
Derrick Coetzee 14:58, 4 Dec 2003 (UTC)
I fixed the image, but I'm wondering if it could be misleading now, since no node now has a red and a black child (not counting nil children). I don't want to give any misimpressions.
Derrick Coetzee 15:05, 4 Dec 2003 (UTC)
The tree seems correct now. To have a node with both a red and a black child, it would be possible to remove node 6, and invert the colour of node 1, 8 and 11. And then if the tree looks too small, maybe add a red child for 15. Nice tree, in either case. Κσυπ Cyp   16:58, 4 Dec 2003 (UTC)

tree does not seem correct. The shorted path is not all black nodes.

The tree image is not correct since nodes 6, 22, and 27 are red even though they are also leaves. Red-Black trees don't allow for red leaf nodes. —Preceding unsigned comment added by 99.5.101.121 (talk) 06:08, 30 May 2009 (UTC)

About that whole nil-leaves discussion: Why do you need to remember that there are nil leaves? What does this actually do to change the definition? The example tree without the nil leaves would not violate the definition, and I cannot think of any others that would. Tjdw 19:43, 30 May 2004 (UTC)

It would violate the requirement that all leaves are black. It may be a good idea to add this to the article. I think nil leaves aren't strictly necessary, but red-black trees are typically presented this way. Derrick Coetzee 22:18, 30 May 2004 (UTC)
More importantly, it would violate the rule that each direct path must have the same number of black nodes. If one disregards the leaves, a black root with a black left child would pass the rule (the only path is two black nodes). Include the leaves and one can more clearly see that there's actually 3 paths, one with 2 black nodes, and two with 3 black nodes (including the leaves).
The picture should be updated to include a black node that has a red and a black child, this would make it more clear why the leaves are important to include in the counting.
It is possible though to have a correctly balanced red black tree without taking leaves into account, by replacing the counting rule by three simpler rules:
1) Any black node, except for the root node, must always have a sibling (ie, it cannot be an only child).
2) A red node that has a sibling always has exactly two black children.
3) A red node that has no sibling always has no children. John Hendrikx (talk) 13:10, 4 March 2010 (UTC)

This article leaves me confused as to how the behaviour and performance of a red-black tree differs from other binary trees. Could someone who understands the subject please add a note explaining why one might choose this structure over e.g. AVL trees? Alternatively, might a new topic like "binary tree implementations: advantages and disadvantages" be helpful?

A comparison is a great idea; of course, we know their theoretical (asympototic worst-case) performances are all the same. I don't really know how they differ in practice. Maybe someone does? Derrick Coetzee 14:09, 27 Aug 2004 (UTC)

On another note, this page needs well-organized, detailed explanations of the operations and their performance. I'll do this eventually if no one else does. Derrick Coetzee 16:22, 27 Aug 2004 (UTC)

And sure enough, I did, complete with pretty diagrams. It'd be really helpful if someone could fact-check me, though — this stuff is pretty complicated. Derrick Coetzee 07:41, 18 Sep 2004 (UTC)

I'm not sure if the description of the insertion operation is complete. Consider the following tree(. is a placeholder, ignore it):
0(B)
.\
.45(R)

After inserting the value 207, the algorithm yields:
207(B)
./
45(B)
.\
..0(R)

Which isn't a valid Red-Black tree(it violates property 4). --Ryan Stone

You may find it easier to write your trees using s-expression syntax, like this:
(0 black (45 red nil nil) nil)
(0 black (45 red nil nil) (207 black nil nil))
This insertion does fall into Case 2, but this tree does not actually occur, because newly inserted nodes are always initially coloured red. The actual tree produced would look like this:
(0 black (45 red nil nil) (207 red nil nil))
This satisfies property 4. If you think the text is unclear about this, please edit. Perhaps a diagram for Case 2 would help. Derrick Coetzee 22:29, 20 Sep 2004 (UTC)
That tree isn't a binary search tree, is it? You have 0 at the root and 47 in the left subtree, which can't happen.
No, it isn't; just the example they gave. Good catch. I'm sure they meant to put a negative number, though, just as you meant to put 45. Derrick Coetzee 07:34, 21 Sep 2004 (UTC)
Ok, maybe I'm just applying the algorithm wrong, but here's what I get:
Step 1: Start with: (0 black nil (45 red nil nil))
Step 2: Insert value 207 into tree as in normal binary search tree: (0 black nil(45 red nil (207 red nil nil)))
Step 3: Parent is red, uncle is black; apply case 4(rotate around parent): (0 black nil(207 red (45 red nil nil) nil))
Step 4: Apply case 5(rotate around grandparent): (207 black (0 red nil (45 black nil nil)) nil)
--Ryan Stone

Ok, I think that I've found the algorithm that will handle the above case(and similar ones correctly):
Cases 1-3 are unchanged
Cases 4 and 5 only apply when the parent(P in the diagram) is a left child of the grandparent
Case 6: The parent(P) is red, the uncle(U) is black, the new node(N) is the left child of P and P is a right child of its parent(G). Perform a right rotation on P, and then apply case 7 to P.
Case 7: The parent(P) is red, the uncle(U) is black, the new node(N) is the right child of P and P is a right child of its parent(G). Perform a left rotation on G, paint G red and paint P black.
I've tried implementing this and it seems to work, but I'm hesitant to make any changes to the article until somebody who actual knows this stuff confirms that this should work.
--Ryan Stone

It's true that when P is the right child of its parent, we're in a different, symmetrical situation, where N must be the opposite child of its parent, and rotations are performed in the opposite direction. I admit this isn't obvious. One way of dealing with this is to add the steps you have described. Another way is to say that when P is on the right side, right and left are reversed throughout steps 4 and 5; we are effectively acting on the mirrored tree. I've added a note to this effect — do you think this is a good solution? Derrick Coetzee 15:56, 21 Sep 2004 (UTC)
A similar note was added for deletion, where left and right must be reversed if N is the right child instead of the left. Although I didn't put it in the article, reversing left and right throughout is always okay, because you're effectively acting on the mirrored red-black tree, which is a binary search tree when considered with the opposite order. Derrick Coetzee 16:03, 21 Sep 2004 (UTC)


I don't understand a couple of things about the deletion algorithm. The cases are supposed to handle the case deleting a black node with two black children, at least one of which is leaf. However, wouldn't property 4 be violated if a black node had one child who was a leaf and the other child was black but was not a leaf node?(in other words, wouldn't be simpler just to say the cases apply to a black node with two leaf nodes as children?)

My second question is about case 3 of the deletion algorithm. I can see that the algorithm will ensure that all paths through P will have one fewer black node. However, won't any path that does not pass through P be unaffected, and have one more black node than any path through P?
--Ryan Stone

Ok, I see the problem. If you look closely at case 2a of the San Diego State University: CS 660, after colouring the sibling red, you colour the parent double black and then perform the same procedure on it. I think that this also answers my first question, as well.
--Ryan Stone
I've changed the article to note that the parent node becomes double-black after case 3 is applied. Am I correct when I say that a double-black node has one fewer black node along any paths running through it? Also, is my wording clear enough?--Ryan Stone 19:03, 22 Sep 2004 (UTC)


Case 3 still mentions double-black nodes. I'm not sure how to reword that to avoid using the term double-black; anyone else have any ideas?--Ryan Stone 16:13, 24 Sep 2004 (UTC)

I've removed the reference to double-black nodes in case 3 of deletion. I've also removed the statement in case 1 that it applies when the root node was deleted and replaced. Because the rebalancing algorithm is recursive, it's possible to fall into case 1 even if the root node was not deleted.--Ryan Stone 21:24, 29 Sep 2004 (UTC)

I've slightly reworded deletion case 3 to make it clearer why property 4 is still violated. I've made a change to the operations paragraph. Case 3 of the insertion rebalancing procedure and case 3 of the deletion rebalancing procedure are the only two recursive cases. Both only involve colour changes; not rotations, so we can say that the insertion and deletion operations require O(log n) colour changes and no more than 2 rotations. Because colour changes are so quick(in practice, merely changing the value of a variable), it gives a better picture of the speed of the insertion and deletion algorithms IMO.

Now, the article states that insertion and deletion require O(log n) space. I assume that's because of the stack space needed when we hit a recursive case. However, aren't both algorithms tail-recursive?(ie wouldn't it be possible to convert them to iterative implementations?) The algorithms are complex enough that I'm not sure if I'd like to make that change, but I think that it would be quite possible to do so.--Ryan Stone 16:03, 30 Sep 2004 (UTC)

Ok, I understand now. Persistent implementations for Red-Black Trees require O(log n) space, but not the normal version.--Ryan Stone 14:50, 1 Oct 2004 (UTC)
Hey, thanks for your changes. I believe they're all correct. Sorry that the statement about the persistent implementations was unclear. I was confused why the deletion algorithm didn't seem to be recursive before, so that helps. Thanks again. Derrick Coetzee 22:03, 1 Oct 2004 (UTC)

I don't understand the purpose of property #2, "All leaves are black", if you're going to use "nil" nodes for all leaves, and then say you don't even need to draw them. If you just got rid of property #2, and "nil" nodes, would it not work just as well?

(Yes, the nil nodes confused me, too. You started out by saying nodes hold data, and leaf nodes are black, and then show a diagram where the leaves appear to be red -- and then 3 paragraphs later say that the boxes that are shaped differently from the other nodes and don't hold any data are actually the leaves.)

The terminology section does conflict with the use of nil leaves. Leaf nodes do not store data in this presentation. It is possible to eliminate nil leaves by removing rule 2 and changing rule 3 to "red nodes only have black children". Unfortunately, this introduces a large number of special cases in the later proofs, since we have to deal with each internal node having 1 or 2 children. By introducing nil leaves and colouring them black, we can now say there is a black child where before we'd have to say "black or no child". I would like to admit the simplest presentation possible of the main invariants, but not if it sacrifices the overall presentation too much. Thanks for your feedback though; I realise nil leaves are often confusing and will think about how to ameliorate this problem. Deco 12:15, 28 Dec 2004 (UTC)

Leaf nodes again[edit]

Note that the use of the leaf-nodes in the articles conflict with the link in the terminology section.--Loading 12:11, 11 May 2005 (UTC)

Perhaps these "Nil" nodes code be done away with altogether? I think it'd be easier just to say "black or missing" (or "non-red") than to introduce a funny construct to make certain special cases disappear. It's also not that hard in actual code to do without them.--ghakko

Well, it's easy to say "black or missing" but it's not so easy to draw black or missing in the diagrams. I copied the nil leaf presentation from an online source, but it could be presented either way, it would just be a lot of work to change over and I don't see a particularly compelling reason not to do it this way. Deco 19:42, 14 May 2005 (UTC)
A very important information is missing: Red Black Trees are isomorphics to 2-3-4 Trees (B-Trees of order 4) (http://pedia.nodeworks.com/2/2-/2-3/2-3-4_trees/). 22:15, 28 May 2005 (UTC)

I think, we should add an explanation to the Delete case about the possibility that N (the deleted node's child) may be a "Nil" node. In order that the tree remains well defined, N must remain a leaf node even after all transformations. You can verify (for example from the images) that it will, but I think it is not obvious. Should I add the explanation? Another thing is that maybe the text would be simplified if we used consistently "P" instead of "N's parent" throughout the text. What do you think? Drak 22:58, 6 January 2006 (UTC)

Go for it. I'll review your changes. Thanks for looking over this. Deco 00:39, 7 January 2006 (UTC)

Deleting a red node[edit]

In the early paragraphs of the Deletion section, you state "If we are deleting a red node, we can simply replace it with its black child, if any." This is indeed the case if there is at most one child. If this red node has two children, however, the algorithm becomes a lot more complicated, unless you simply replace the red node by one of the children and reinsert every element of the other child's subtree. Schellhammer

You can assume that the node has at most one child. The reason for this is explained in the previous paragraph. I'm sorry that it wasn't clear. The idea is that you delete the node exactly as you would delete a node in a BST. To do this, you must find a node that immediately precedes or follows the node to be deleted in inorder traversal order. We need to copy its value into the node to be deleted, then delete the node it was copied from. Such a node will always have zero or one children, which simplifies the remaining cases. Deco 23:52, 28 July 2005 (UTC)
Okay, got you now. Thanks!
Something else. I was just wondering: in deleting a node with two children, is there's a simple way to determine if replacing the value by the largest preceeding or the smallest following value is the faster alternative? Schellhammer
The simplest way I can think of is to add two values to each node indicating the number of steps needed to reach the smallest and largest node in that subtree. These are easy to maintain in log time and allow you to choose optimally. This requires quite a bit of space and time to maintain though; the most effective strategy is probably just to choose one at random. Deco 02:10, 3 August 2005 (UTC)

I think I encountered a problem in the deletion algorithm yet again connected with the leaf-nodes. You state: "If the node has no non-leaf children, we simply remove it." However, if the node is black and has only leaf children (which are also black), we reduce the black height in this branch if we simply remove the node. Consider the tree built by inserting the integers 1 to 4 and then delete the integer 1. Here's a nice applet which can be used to see the restructuring following the deletion. (It is German, but if you consider the translations Anfang=start Zurück=back Einfügen=insert Weiter=proceed and Löschen=delete (mark node to delete after hitting the button) it should be easy to handle.) Schellhammer

Thanks for catching this. I'm afraid I've forgotten far too much about red-black trees. I believe we proceed through all the cases whether or not the child is a leaf node or non-leaf node, and they all work out if the child is simply a leaf node since they don't assume that N has children. Deco 02:10, 3 August 2005 (UTC)
For what it's worth, to avoid further correctness issues I'm going to try to come up with some ML code that precisely mirrors the discussion. I should be able to test it and inject it into the article then to provide some concreteness. Deco 02:52, 3 August 2005 (UTC)
I've just finished implementing the red-black tree as Visual-Basic classes (don't ask why!) and I took the deletion algorithm from this wiki-page. Empiric evidence (i.e. deleting any one of 1000 nodes in the tree) shows that the algorithm now is correct. Schellhammer 13:16, 3 August 2005 (GMT+1)
Thanks for your feedback, that's reassuring. I've got some C code for insertion and as soon as I figure out how the article formatting will work, I'll add a bit of code to each case. Then I'll do the same for deletion. This should help clarify things further. Out of curiosity, does your code use parent pointers or do you get around this somehow? The description in this article seems to almost require it. Deco 19:19, 3 August 2005 (UTC)
Yes, my code uses parent pointers as well. Actually, I don't know how you could do without, seeing that Insertion Case 3 has a recursive call upwards. By the way, the note in the diagram for this case should read "May violate property 2". Schellhammer 11:13, 4 August 2005 (UTC)
It's possible to do without parent pointers by keeping a linked list during downward traversal; push a pointer to the current node onto the list just before descending into a child. The list elements can allocated on the stack. My C implementation was like this because I was being silly and trying to save space.
The only other way I can think of handling Insertion Case 3 without parent pointers by having all the insertion functions return boolean values; this would let the caller check if the rebalancing has propagated upwards and continue it if necessary. Deletion is like this, but much messier (because of the need to swap two elements just before starting). I had to do it this way in Haskell (which has no pointers), and it was very painful.
Ghakko 14:30, 4 August 2005 (UTC)

The code[edit]

I've now added some C code to this page. All this code has been compiled and thoroughly tested, not just against crashes but that the red-black properties are preserved after every single insert/delete operation. I tried to keep the code as simple and readable as possible, and in line with what the text and diagrams describe to the letter. I hope it adds some concreteness to the article. I chose C because it's good for this presentation, since it deals well with trees with parent pointers (cyclic data structures are kinda annoying in functional languages) and is familiar to most programmers. Deco 07:32, 4 August 2005 (UTC)

Nice job! That'll help a lot in future implementations. However, in the Deletion part, you state "the insertion code works with either representation", i.e. designing leaf-nodes as node objects or NULL pointers. I think you rather meant the insertion algorithm, since you use constructions like child->color in the delete_one_child function and child may be a leaf if the original n (the one we deleted) had no non-leaf children. Similarly, calling any of the delete_caseX functions with a NULL pointer as parameter will cause problems. The code probably still works if you'd add the (admittedly irritating) if-statements everywhere to catch this case. Schellhammer 11:32, 4 August 2005 (UTC)
Sorry, I meant the code in the insertion section. There was only one place in which a leaf could be encountered during insertion, but deletion turned out to be much trickier, because it may be that the parameter n is a leaf, which prevents me from accessing any of the surrounding nodes if it's represented with NULL. I would have to additionally pass in at least its parent, which requires updating the parent as we perform rotations, and it's just a mess. Deco 17:30, 4 August 2005 (UTC)

I find it confusing that the insertion code is using NULL-pointer as leaves, but the removal code suddenly implies that you use real nodes as leaves. Perhaps it should be either both null pointer, or both real nodes. 21:53, 12. March 2010 —Preceding unsigned comment added by 85.177.97.53 (talk) 20:53, 12 March 2010 (UTC)

Errors in algorithm[edit]

(copied from an e-mail sent to Deco by Pavel)

case 3. if G was a root, you violate rule 2.

Also, if its parent is red, it violates rule 4. This is mentioned in the text and the code, and used to be in the image, but I wanted to keep the image easy to update. I fixed the text to mention that it might violate rule 2 also, and to be clearer that we always recursively process G.

case 5. path from leaf 4 and 5 contains 2 black nodes, path from other leafs contains 1 black nodes, that violates rule 5.

The triangles are not meant to represent leaves but subtrees. Some subtrees have black circles at the top to indicate that their root must be black. The rotation does not alter the number of black nodes along any path (it is the same before and after), and so does not threaten rule 5. The reason rule 5 is not violated initially is that subtrees 4 and 5 are not as deep as subtrees 1, 2, and 3; in fact, the uncle may even be a leaf itself.
If there's any way I can clarify this further please ask. Thanks again. Deco 14:16, 18 August 2005 (UTC)

Top-Down Insertion[edit]

Can someone change this to the more effecient single pass top-down insertion method?

It sounds interesting. Can you give a reference? However, I am not completely sure if it should replace the algorithm explained in the article. One of the objectives of this article is to explain the well-established algorithm for red-black trees, and I think the current article does it very well (thank you, authors!). Of course a note about a more efficient algorithm would be helpful.
Here's a reference on the top-down insertion: [1]. The notes paraphrase Data Structures & Problem Solving Using Java by Mark Allen Weiss.
The top-down approach to insertion works by preventing insertion case 3. During the tree descent, any black node with two red children is recolored to red and its children recolored to black (like in case 3). This preserves property 5 but may violate property 4. When property 4 is violated, it is guaranteed that the uncle of the new red node is black, since the grandparent node would have been recolored during the descent if both of its children were black. (See the reference if this is confusing.) As a result, one or two rotations (like in case 5) can be applied to restore property 4. The search then continues.
Once the leaf is ready to be inserted, insertion case 3 is impossible. This prevents the need to recursively ascend back up the tree. This makes the entire operation top-down. Rotations may still be required to preserve properties 4 and 5 after inserting the leaf.
Top-down insertion seems to be less efficient than bottom-up insertion. More rotations are required since each insertion can require multiple sets of rotations. Rather than letting the leaf node "push up" on the tree to find the proper rotation point, the top-down approach rotates at every such rotation point along the path on the way down. I will try to make a diagram to explain this better.
I wrote a C program to test my hypothesis that the top-down insertion is less efficient, and it appears to be correct. A top-down insertion requires more rotations for the same data set. However the top-down algorithm does work and it may make sense to publish it anyway. Cintrom 22:39, 5 May 2006 (UTC)
Thank you, Cintrom! I see that even if the top-down algorithm is less efficient than the bottom-up one, it is sometimes useful for educational purposes because of its simplicity. In addition, it is probably good to keep in mind that there is also a top-down algorithm for red-black trees, because it may come in handy when you derive a new algorithm from red-black trees.

Error in Insertion Case 4?[edit]

According to the rules of a R-B tree, "both children of a red node are black". In the Case 4 example, N and P are both red after insertion.

Case 4 does not complete the insertion, as you can see from the sample code - the node P is then processed using Case 5. The text says, "then, we deal with the former parent node P using Case 5". Deco 17:33, 17 October 2006 (UTC)

In case 4 the code that calls insert_case5 is indented as if there were a line with an else missing, but from the narrative, I think the code is right, but the indention is misleading. Also, is it too obvious to say in the last sentence that property 4 is still violated? I found the parenthesized comment about the sub-tree labelled "1" confusing, so I tried to touch it up. The other two things I left alone. Bill Smith 23:35, 24 November 2006 (UTC)

I implemented the insertion algorithm in OCaml and verified that the indention was wrong, so I corrected it.Bill Smith 17:11, 26 November 2006 (UTC)
The misleading indention was probably a result of tabs. There is no else. Your clarifications are welcome. Deco 14:42, 27 November 2006 (UTC)

The code[edit]

Is the C code correct? For example:

node grandparent(node n) {
    return n->parent->parent;
}

I don't think that's correct: shouldn't dots (.) instead of arrows (->) for accessing members in a struct that isn't referenced by a pointer? And isn't the parent member a pointer to the parent struct? If it is, then the function definition should be:

node* grandparent(node n)

Tell me if I'm wrong.

--wj32 talk | contribs 07:22, 8 November 2006 (UTC)

You are correct that the arrow operator is used with pointers. I just assumed that somewhere in the code that isn't shown there was "typedef struct rbtreenode * node;". This explains the lack of the struct keyword, as well. Since the article is about the data structure rather than the C language, I don't think that noting C language syntax is strictly necessary. If we were going to present a play-by-play on how to implement a red/black tree in C, we'd have to note that code is required to keep track of which node is the root node, maybe reccomend the definition of a structure to hold the root and a comparator function pointer, etc.
Perhaps the wording should be changed to something like "We'll demonstrate each case with example C-like code"? Conor H. 06:54, 23 November 2006 (UTC)
This is actual C code copied from a real working C program. That was how I verified its correctness. Yes, node is a pointer type. I suppose my stylistic conventions are a bit unusual. Deco 05:53, 28 November 2006 (UTC)

I think that the assertion that deletions are O(1) complexity must be incorrect. If the deleted node has two subtrees, then it must find the rightmost node of the left subtree or leftmost node of the right subtree. This find is admittedly quick since it is just following pointers, but nevertheless could be the height of the tree, so it is O(log(n)). Any comments? Caviare 01:08, 25 January 2007 (UTC)

Yes, it should be log n... just check other reference sources and that's what they'll say. enochlau (talk) 02:47, 25 January 2007 (UTC)

npov[edit]

A- This article isn't neutral

B- It uses "we" an unnecessary number of times.

You need to actually provide instances of NPOV. You can't slap a template up there and expect us to guess at what you're talking about. I'm removing the template until you do so. TheWarlock 01:37, 9 May 2007 (UTC)
Oh, and I forgot to mention: sign your posts. Put four tildes after your post so we know who you are. TheWarlock 01:38, 9 May 2007 (UTC)
Should it still be changed to use a non-we structure? j.engelh 08:20, 11 May 2007 (UTC)
It is preferable to have it use "we" than for it to be unintelligible but use the third-person out of grammatical tradition. If you can rephrase out the "we" and still have the article at least as understandable as it is now, go for it. TheWarlock 01:01, 12 May 2007 (UTC)
The use of "we" has nothing to do with NPOV. If you don't like the style feel free to fix it, just make it readable. Dcoetzee 20:06, 11 May 2007 (UTC)
Just out of curiosity, what does "neutral" even mean when you're talking about a computer data structure? Whether the math is correct? I suppose you could argue over whether it or some other structure is best used in <something> - but that would necessarily mean an example, which either means it's citing someone else saying something, or it's OR, which is bad on the grounds that it's OR. Now, maybe this article did have OR in it - the article itself reads like a chapter out of "Data Structures and Algorithms" by Weiss.. — Preceding unsigned comment added by Jimw338 (talkcontribs) 03:19, 27 March 2012 (UTC)

Mistake in proof of correctness in Step 3?[edit]

The article currently says

Note that this [call to insert_case1] is the only recursive call, and it occurs prior to any rotations, which proves that a constant number of rotations occur.

This is far from evident. I'm not saying it's false that a constant number of rotations occur, only that this can't properly be called proof. It's easy to imagine a recursive function that makes its recursive call before doing some chunk of work (like a rotation) and yields a linear number of chunks of work done. For example, consider

postorder_tree_print(node n)
{
    postorder_tree_print(n->left);
    postorder_tree_print(n->right);
    print_node(n);
}

The only recursive calls occur before print_node, but calling postorder_tree_print on the root of a tree results in a number of calls to print_node equal to the number of nodes in the tree, which is not a constant.

Like Heawood reading Kempe's four-color proof, I'm smart enough to see the mistake but I'm not smart enough (or at least not interested enough) to fix it.

Pqrstuv 23:13, 1 July 2007 (UTC)

The fact that it's the only recursive call, and that it returns immediately after calling it, is what makes this evident. Apologies for not being clearer - I'll attempt to reword it. Dcoetzee 00:31, 19 January 2008 (UTC)

Removed from article[edit]

Every simple path from a node to a descendant leaf contains the same number of black nodes, either counting or not counting the null black nodes. (Counting or not counting the null black nodes does not affect the structure as long as the choice is used consistently.). (Someone needs to fix this wording, since it is not true in its present form. It could be interpreted to mean that a path from node A to a descendant leaf would have the same number of black nodes as a path from node B to a descendent leaf. It must be made clear that the "from a node" is fixed but the "descendant leaf" is allowed to vary.)

The section from (Someone needs... has been removed because it's not part of the article, it's a comment on the article. Text moved here so that someone who knows about the subject can hopefully fix it. -- Roleplayer (talk) 11:49, 18 January 2008 (UTC)

Possible insert case 5 error[edit]

Insertion, case 5 says:

"In this case, a right rotation on the parent P is performed; the result is a tree where the former parent P is now the parent of both the new node N and the former grandparent G."

It should say

"...a right rotation on the GRANDparent G is performed;..."

Most implementations define the rotation functions RotateRight(Node* n) or RotateLeft(Node* n) such that n is the root of the subtree being rotated, rather than one of the children. The code to implement the line in question would then be:

RotateRight(g);

rather than

RotateRight(p);

This could be just a terminology issue, but it will confuse people when they read source code and see irreconcilable differences with what is listed here.

Iterative implementation[edit]

The current article contains the iterative version (the version without using recursion) of implemenation of the removal operation. However, I doubt its usefulness because of the following reasons. Before removing the code, I would like to know what others think.

  • I do not think a long code like this fits well in the Wikipedia. The aim of the article is not to build a ready-to-use library, but to explain clearly what a red-black tree is, including how it works and how it is used. The article does its job well without this code.
  • It contains a small bug. In the lines
    /* delete_case3(n): */
     
    //s = sibling(n);//not needed
    
    the assignment to s is needed, as is explained in the text of case 2. I hesitate to correct just this because I am not sure about the correctness of the other part.
  • IMHO this code is hard to understand because almost everything is written inside the single for(;;) loop. Cases 4–6 (and possibly also case 2) should be moved out of the loop for better readability.

In addition, I do not see any reasons to include the iterative implementation only for the removal operation and not for the insertion operation. If the current code ever helps understanding of how removal works, then I think the iterative implementation of insertion would also be helpful. --fcp2007 (talk) 14:31, 4 May 2008 (UTC)

I personally am using the iterative implementation in benchmarks vs treaps. I find it very useful. It's kind of funny... I only read this discussion because I wanted to post about the bug described above! The same bug exists in case 6. Smilindog2000 (talk) 14:48, 21 May 2008 (UTC)
I agree about removing the iterative implementation personally. Nobody should be using code from Wikipedia articles, they're not appropriately licensed - the code is only there to explain the concepts. Dcoetzee 02:07, 25 December 2008 (UTC)

C code vs. pseudo-code[edit]

I find that pseudo-code would be far more appropriate for this article. After all, the article is about a data structure and the theory behind implementing it. A reference implementation should be hosted elsewhere, not inside an encyclopedia. I would recommend replacing the C code with pseudo-code, however I didn't want to do this myself until certain everybody agrees. -- Jerome Baum —Preceding unsigned comment added by 77.176.73.67 (talk) 01:19, 15 December 2008 (UTC)

There used to be pseudocode. I replaced it with C code fragments originally because of a series of spurious "fixes" that made the cases incorrect; it was the only way I could verify with any certainty that the code was valid, since with runnable code these fixes could be tested. However, nowadays I've moved more towards using pseudocode and the code block templates to control spurious fixes. I think the article should also give more attention to the diversity of different methods for implementing the operations, and to various choices that can be made for the properties. Dcoetzee 01:50, 25 December 2008 (UTC)

citation for Rudolf Bayer[edit]

Paper title: Symmetric binary B-Trees: Data structure and maintenance algorithms there is a pdf available here for 34 bucks. http://www.springerlink.com/content/qh51m2014673513j/ —Preceding unsigned comment added by 140.177.205.91 (talk) 20:55, 14 August 2009 (UTC)

"Simple path"?[edit]

The fifth listed property under "Properties" gave me long pause. Red-black trees do not contain cycles. Qualifying "paths" with "simple" suggests otherwise, and merely confused me. Is there any reason "simple" should not be deleted? Strebe (talk) 20:42, 12 April 2010 (UTC)

I think the intended meaning is that the path does not repeat edges. For example, it doesn't go from the root down to the leaf, then back up, then back down to some other leaf (this is normally called a walk). In the context of binary trees I think this clarification is unnecessary. Dcoetzee 21:31, 14 April 2010 (UTC)

Is insertion case 4 mandatory? why?[edit]

I am very new to this subject, but with careful study, I am able to follow this article. However I found insertion case 4 to be quite confusing. It took a little too much pondering and a little guessing to get to where I think I understand it. A couple points:

1) Unlike all the other cases, this case doesn't appear to ever complete any insertions. This is kinda confusing in itself. So I suggest that it is explained that this step does not solve a particular case, but rather converts the case, so it's the same as case 5. (I would do this edit myself if I was sure I'm right about what's going on.)

2) The language makes this step sound optional. ie it says that the rotation "can be performed" not "must be performed" or "is performed". This coupled with my confusion as to why this is being done in the first place makes for a confused me. So I suggest that this step be made explicitly mandatory (assuming I'm right and it is.)

I hope the above feedback enables someone to make this article even more clear.

Thank you, - JasonWoof (talk) 02:12, 19 April 2010 (UTC)

Removal:case 3[edit]

>> Case 3: P, S, and S's children are black
We remove N and all N's childs are leafs, so if P ans S are black, S's children should be leafs too(becouse we must have the same number of black nodes on every path). On the picture we see them as nodes that have children... —Preceding unsigned comment added by 91.103.66.4 (talk) 16:32, 18 May 2010 (UTC)

Incomplete case 5[edit]

When inserting 3, 2, 1 in order you get the following tree:

 3[black] (child: 2[red] (left: 1[red]))

And then you run into case 5, which attempts to rotate using 1's grand parent (3) as pivot, which fails, because 3 has no parent. In this case, 2 should have been the pivot. So the wording instead of:

In this case, a right rotation on the parent of P is performed;

should be

In this case, a right rotation on the parent of P, G is performed, if G is not the root node. In that case, a rotation on P is performed.

And the code accordingly

 rotate_right( g->parent ? g : n->parent );
 rotate_left( g->parent ? g : n->parent );

Also it should be clarified that the "rotate_"-functions use the Pivot as the argument (which is the explained in Tree rotation)

--77.8.172.59 (talk) 02:59, 20 May 2010 (UTC)

You're right about the pivot being the argument, but your suggestion for a fix is wrong. I don't know why, but I've tried with and without your suggestion, and your suggestion causes problems. ;-) ArlenCuss (talk) 02:59, 20 April 2011 (UTC)

Insertion Case 3[edit]

According to the wiki:

Case 3: If both the parent P and the uncle U are red, then both nodes can be repainted black and the grandparent G becomes red (to maintain Property 5 (All paths from any given node to its leaf nodes contain the same number of black nodes)).

Shouldn't it be to maintain Property 4? As Property 5 isn't violated by the addition of a red node.

Edit: After proper analysis I think both Properties should be mentioned. Property 4 for the reason why P and U are turned black and Property 5 for the reason why G turns red. RevoltPT (talk) 15:04, 11 June 2010 (UTC)

C++ reference[edit]

The "often" in "In the C++ Standard Template Library, the containers std::set<Value> and std::map<Key,Value> are often based on red-black trees" does not make sense to me...134.58.42.46 (talk) 15:34, 22 December 2010 (UTC)

Most (possibly all) implementations of these STL containers use red-black trees, but any data structure that fulfills the behavioral requirements could be used. Hence, “often”. Strebe (talk) 19:04, 22 December 2010 (UTC)
I'll change it to "typically". (Prevents giving the impression that any red-black implementation might actually turn out to be based on a different data structure on Wednesdays and bank holidays.)  Card Zero  (talk) 08:26, 7 February 2011 (UTC)

Possible Error in "Removal" Section[edit]

We're referring to this paragraph:

"The complex case is when both M and C are black. (This can only occur when deleting a black node which has two leaf children, ..... (S cannot be a leaf because if N is black, which we presumed, then P's one subtree which includes N counts two black-height and thus P's other subtree which includes S must also count two black-height, which cannot be the case if S is a leaf node)."

Our comments: Since N, which used to be called C, is a leaf, P's one subtree which includes N can not count two black heights as stated in the original article.

Also note that this changes a lot of the cases in the Removal procedure. For example, any figure that shows N having subtrees hanging from it is incorrect.

In fact, in case M and C are both black, perhaps the way to do Removal is just to replace M with C, thereby messing up the black height. This is then to be followed by some form of black height rebalancing which we haven't thought out completely.

Vitit Kantabutra, Idaho State University, Pocatello, ID, U.S.A. Vkantabu (talk) 20:12, 3 November 2011 (UTC)

"All leaves are the same color of the root"[edit]

Hi,

In the section Red-black_tree#Properties, it's said in point 3 number that "All leaves are the same color of the root". However, on the red black tree example picture, there are 3 leaves which are red. What did I miss? :)

Regards, J 195.83.155.55 (talk) 03:31, 28 May 2012 (UTC)

Nodes are circles; leaves are boxes. Glrx (talk) 18:03, 30 May 2012 (UTC)

Request move[edit]

The following discussion is an archived discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review. No further edits should be made to this section.

The result of the move request was: not moved. Favonian (talk) 19:11, 9 August 2012 (UTC)


Red–black treeRed-black tree

Per WP:COMMONNAME. It is spelled with a hyphen in almost every single source in google books and google scholar.


Some will quote WP:DASH without providing any source that backs the dash spelling, but the Manual of Style warns that "The title of an article should be based on the Article titles policy.", and WP:COMMONNAME is part of that policy. --Enric Naval (talk) 19:41, 2 August 2012 (UTC)

  • Oppose. It is a tree of red and black nodes. The conjunction is "and", so WP:DASH#En dashes: other uses number 2 applies. See example of "red–green colorblind". COMMONNAME does not address punctuation style; whether hypen or dash, we are using the common name. See WP:MOS#Allowable typographical changes, styling of dashes and hypens bullet. Glrx (talk) 23:32, 2 August 2012 (UTC)
  • Oppose – sources that have a style of distinguishing the roles of hyphen and en dash as WP has (see MOS:DASH) do use the en dash for this one. Selected sources: [2], [3], [4], [5]. I never move articles without verifying the sources support my interpretation of the structure that the hyphen or en dash represents, in this case an alternation between parallel items, not a red-black color like a hot poker. I suspect that ErikHaugen also wouldn't move without checking. In fact, I see an article cited already with the en dash correctly in its title, from sciencedirect.com, a publisher that has a style that distinguishes. Enric, please stop running around doing controversial moves that are contrary to WP style. Styling is an an MOS issue, not a COMMONNAME issue; there's no controversy about the name or the spelling; it's just a matter of styling the puctuation per our MOS. Dicklyon (talk) 00:35, 3 August 2012 (UTC)
  • Oppose—I have an occasional looks at your contribs list, Enric. This time I was disappointed to see that there's some kind of anti-MoS punctuation/typography campaign aswing. Please do not change against the style guides, which have been carefully built through consensus, without a very good reason: I don't see one, and citing poor typographical practices elsewhere isn't going to cut the mustard. Tony (talk) 04:51, 3 August 2012 (UTC) PS, I::Enric, I'm not sure about these ones from Googlebooks: [6], [7], [8]. Don't they look like en dashes to you? And these from Scholar: [9], [10], [11]. Just checking that we're working with the same data. Tony (talk) 07:32, 3 August 2012 (UTC)
  • Oppose Per Dick and Tony. —Ruud 12:19, 3 August 2012 (UTC)
  • (replying to several people above) OK, another styling choice. But there are hundreds of sources from reputable publishers using a hyphen (in google books[12] and in google scholar[13]). You are hand-picking the few sources that use a dash and claiming that those use the "correct" styling. I have a problem when someone claims that the huge majority of the sources uses the "wrong" styling. --Enric Naval (talk) 15:51, 3 August 2012 (UTC)
The way to test whether sources disagree here is to find a source that uses the en dash to connect parallel terms, but nevertheless uses the hyphen in red-black tree. If you find more of those than sources using the en dash, then there's a case to be made that sources prefer the hyphen. But the "hundreds" that you refer to are just the typical majority of sources who have a style of representing the role of the en dash (or long hyphen as some guides call it) with an ordinary hyphen. These are not "wrong", just a different style that what we have in MOS:DASH; the hyphen would be "wrong" in our style, but not in theirs. Dicklyon (talk) 16:36, 3 August 2012 (UTC)
There have been numerous occasions where Tony has tried—sometimes succesfully, sometimes not—to force his arguably idiosyncratic views on proper use of spelling, grammar and typography upon us, against proper use of proper nouns, proper names, common names and common sense, generally wasting a lot of editor's time in the process for little to no gain. This isn't one of them. Even without discounting the poor typographical choices made in many sources, you have not demonstrated the use of a hyphen is the ubiquitously used common name of this data structure. As Dick pointed out red–black here indicates "an alternation between parallel items, not a red-black color", making a strong argument for choosing this particular typography from among the choices offered to use by the sources. Ergo, leave the article where it is and has been for a long time. —Ruud 16:51, 3 August 2012 (UTC)
Now I see why they call you Ruud. Dicklyon (talk) 16:54, 3 August 2012 (UTC)
Either because that was the name my parents gave me, or because you're mispronouncing it. —Ruud 16:57, 3 August 2012 (UTC)
Both, probably. Dicklyon (talk) 17:48, 3 August 2012 (UTC)
In all fairness, Knuth—a recognized expert on both typography and computer science—in The Art of Computer Programming, as well as the standard textbooks by Cormen et al. and Goodrich & Tamassia and the article by Guibas & Sedgewick introducing this term, all write "red-black tree" with a hyphen. —Ruud 18:15, 3 August 2012 (UTC)
In Knuth's style, the hyphen would be correct; he advocates, and is perfectly entitled to, a style in which the en dash is used for nothing but number ranges. If you want to make a point of someone choosing the hyphen when their style uses en dash to connect parallel items, you'll have to look harder. Dicklyon (talk) 21:22, 4 August 2012 (UTC)
That was not my point, or I wouldn't still be opposing this move. Just an observation that the world at large doesn't seem to care as much about "correct" usage of hyphens and dashes, or doens't quite agree with us what a correct usage constitutes; I had expected at least one of those sources to use a hyphen. Out of interest, where did D.E.K. state the en dash should only be used for number ranges? —Ruud 01:01, 5 August 2012 (UTC)
On p.4 of the TeXbook, he gives only the one use for the en dash. He doesn't explicitly say there are no others, but that's his implication. Dicklyon (talk) 01:29, 5 August 2012 (UTC)
  • N.B.: WP:COMMONNAME does not apply to punctuation; punctuation decisions are left to our internal style standards. Powers T 23:31, 4 August 2012 (UTC)
The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page or in a move review. No further edits should be made to this section.

Origin of name[edit]

Source: https://class.coursera.org/algo-003/lecture/76 - Coursera Algorithms : Design and Analysis course by Tim Roughgarden of Stanford

At about 5:08 in this lecture Tim explains the origin of the name as he had asked the question to Guibas. Indeed it acquired the name from the paper mentioned in the article. The reason was due to new limited colour printing technology of the journal which the authors were keen to use and have red and black pictures of the structure. The story has a twist in that due to a problem the technology wasn't available for use, but the name stuck.

Sedgewick explains the naming here in Algorithms, Part 1: https://class.coursera.org/algs4partI-002/lecture/50 at 31:55 — Preceding unsigned comment added by 82.34.15.102 (talk) 09:16, 10 March 2013 (UTC)

David — Preceding unsigned comment added by 86.18.202.252 (talk) 02:06, 5 March 2013 (UTC)

How does U in case 4 not have any leaves?[edit]

I know why: Because it would cause there to be 3 black nodes from the root node to the leaves for the leaves attached to U, as opposed to 2 black nodes elsewhere. The only thing is, How would the program know not to create nodes? Does it check all the paths or something? --Cornince (talk) 11:03, 27 May 2013 (UTC)


Usage in functional programming[edit]

The article says "Red–black trees are also particularly valuable in functional programming, where they are one of the most common persistent data structures" which is wrong. The most used data structure is AVL trees because they are simpler and support set operations easily. Currently in the functional programming standard libraries - AVL are used by OCaml, FSharp - Red-black are used by SML-NJ - Size balanced trees are used by Haskell — Preceding unsigned comment added by 90.46.151.222 (talk) 01:26, 12 December 2013 (UTC)
Cite error: There are <ref> tags on this page, but the references will not show without a {{reflist}} template (see the help page).