Talk:AVL tree

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Computer science (Rated C-class, High-importance)
WikiProject icon This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
C-Class article C  This article has been rated as C-Class on the project's quality scale.
 High  This article has been rated as High-importance on the project's importance scale.
 
edit·history·watch·refresh Stock post message.svg To-do list for AVL tree:

Just to make it explicit, here are some things that need to be done on this page:

  • Some new diagrams showing the balance values for each node
  • A detailed description of the insertion and deletion operations, along with diagrams showing the relevent transformations
  • A comparison to other types of self-balancing binary search trees

Derrick Coetzee 22:34, 20 Sep 2004 (UTC)

Priority 5

Vacuous Page - No Content[edit]

Now that you have removed the 4 language pages, your presentation on AVL Trees is almost non-existent. Come on guys, it has been more than 50 years since the invention of AVL Trees - is that the best you can do?NNcNannara (talk) 03:21, 9 August 2016 (UTC)

By the way, the definition of Balance Factor is still incorrect (and it is practically all that is left).NNcNannara (talk) 03:22, 9 August 2016 (UTC)

What is it for ?[edit]

Nowhere in this article is a mention of what these trees are used for. Which problem do they solve ? — Preceding unsigned comment added by 2A01:E35:1382:320:845A:88D5:419:48D0 (talk) 13:05, 31 July 2015 (UTC)

Balance Factor?[edit]

balance factor = height of left subtree minus height of right subtree. read CLRS book for more info — Preceding unsigned comment added by 98.209.224.234 (talk) 23:44, 28 March 2013 (UTC)

Balance Factor![edit]

The balance factor is the height difference between two subtrees, which for a node are its right and left children.

The range in the article is confusing because it is switched. 1 means left heavy and -1 means right heavy. Most people would do it left to right like the normal number sequence (-1, 0, 1). It also lies about the range which is actually [1, 0, 1-] and exact number only are allowed. You can get it by height(node.children.left) - height(node.children.right). However this is expensive so the difference in height (balance factor) is updated as the tree is updated which is log n because a change in a node only affects parents.

Comment[edit]

The term balance factor is mentioned several times in this article but not really explained. I understand what an unbalanced tree is generally but I do not know what balance factor is exactly (or how to calculate it etc) and this article makes no attempt at introducing it (or linking to it somewhere else). I think it is reasonable to assume people learning about AVL's might also not know about balance factors and they (balance factors) should be clarified or linked early in the article. Just my 2c. — Preceding unsigned comment added by 24.87.108.195 (talk) 01:51, 25 July 2012 (UTC)

Question[edit]

Question from an amateur: are AVL trees generally considered worse (in terms of performance) than red-black or splay trees? Also, are they easier to implement? I suspect both to be the case. If so, someone should mention that in the article. User:MIT Trekkie 17:44, 2 Dec 2004 (UTC)

AVL Trees will give better search times than Red Black Trees. I believe the Splay Trees give better performance if you're consistently searching for a small subset of all the items in the tree. I'm not sure, but I think that AVL Trees give better search times if you do approximately the same number of searches for each item in the tree, and you do the searches in a random order. I couldn't tell you how the insertion and deletion operations will compare. --Ryan Stone 01:16, 12 Dec 2004 (UTC)
AVL Trees and also Red Black Trees both perform in O(lg n). However, for AVL trees it is < 1.44 lg n, whereas for Red Black Trees it is < 2 lg n, hence AVL trees are faster. However, AVL trees are much more complicated to implemenent. Gulliveig 04:47, 11 June 2006 (UTC)
I strongly disagree. I have yet to find an explanation of red-black trees which actually explains WHY they work - rather than merely listing the rules for insertion and deletion and going "there you are!". When I see red-black implementations, I typically see comments like "I don't really understand this, but it works!". AVL trees on the other hand I found I was able to *comprehend*, to understand *why* they work, easily. Once you understand why something works, implementation is basically simple, because you know what has to be done and why. Toby Douglass (talk) 10:42, 27 March 2009 (UTC)
The various operations need to be considered. AVL tree has quite excellent search, and sometimes poor insert characteristics. To make a statement so general isn't neutral in my opinion. For large database operations on disk, for example, especially for permanent records which are never deleted, AVL outperforms practically all other types of trees. So not only should the performance of various operations be compared, but also the type of application to which the structure will be used. Ste4k 18:25, 2 July 2006 (UTC)
So what would be an example of a time when you should use Red-Black trees then? -- Solberg 03:11, 26 August 2006 (UTC)Solberg
How can anyone claim that Red-Black trees are easier to implement? There's no obvious justification for this argument, and given that they both do the rotation thing (which is the only hard part) I'm having trouble imagining a justification. I'm betting Red-Black trees were merely described in detail in some textbook in whcih AVL trees were summarized. —The preceding unsigned comment was added by 67.169.10.74 (talkcontribs) .

Note[edit]

In the german wikipedia we use some very good pics to explain the balancing. see de:AVL-Baum. The are on the commons, so perhaps they are what youre searching for? -- de:Benutzer:MGla

Lazy deletion[edit]

Quoting from the article:

Deletion from an AVL tree may be carried out by rotating the node to be deleted down into a leaf node, and then pruning off that leaf node directly. Since at most log n nodes are rotated during the rotation into the leaf, and each AVL rotation takes constant time, the deletion process in total takes O(log n) time.
Practically, this is a large overhead and complex to program. Therefore, it's more common to implement a lazy deletion -- leave the deletion target in place, flag it as "deleted", and replace it with an inserted node if they would occupy the same spot.

I assume that the tree will not (usually) be balanced after a lazy deletion, so an implemention which uses lazy deletion is not strictly an AVL tree.--Malcohol 15:53, 28 June 2006 (UTC)

Actually, as long as you "clean out" the tree whenever it falls below a certain percentage full, such as 50%, I think you still get amortized O(log n) operations. In any case, whether or not this is "properly" an AVL tree, it's pretty clearly a very simple variation on one. Deco 11:54, 29 June 2006 (UTC)
If you eagerly delete any node that has at most one child, then at most one half of all nodes will be pending deletion.

While we're on the topic of deletion (operation description missing btw), the text quoted above mentions rotating a node downwards into a leaf. One may also rotate a node downwards until it has only one child, and delete it by replacing it by its only child. Jesselong 18:45, 30 October 2006 (UTC)

Insert Operation[edit]

Quoting from the article:

Insertion into an AVL tree may be carried out by inserting the given value into the tree as if it were an unbalanced binary search tree, and then retracing one's steps toward the root, rotating about any nodes which have become unbalanced during the insertion (see tree rotation). Since at most 1.5 times lg n nodes are retraced on the journey back to the root, and each AVL rotation takes constant time, the insertion process in total takes O(log n) time.

This is correct, but imprecise: After the insert, only one single or double rotation is required. Afterwards, the tree will be balanced, and no more rotations are necessary. However, finding the node that needs to be rotated about requires a recursive ascent, along which one may update the balance factors. See Knuth, Art of Comp. Prog., Vol. 3, Section 6.2.3, p. 454ff (for example, p. 454 "the new trees have height h + 2, which is exactly the height that was present before the insertion; hence the rest of the tree (if any) that was originally above node A always remains balanced", or p. 456f Algorithm A, which executes either A8 or A9 exactly once, or p. 461 Table 1, which gives probabilities when each case occurs etc). marcus

I'm not sure I understand what you're quoting. what does h represent?
Hmm, I don't believe that this is correct. Feel free to grab a sheet of paper or a white board, as I'm not going to be doing the ascii drawings here.
h refers to the height of the rotated subtree, starting at the root of the rootation or double-rotation. Please look up Knuth for more specific details. If you think this is incorrect, you may just as well provide some drawings: If you submit the to Knuth, you can cash in some real money (he rewards bug spotting with cash). marcus

Imagine inserts into a BST in the following order: 3 1 8 2 5 10 4 7 9 11 6. This is now still an AVL tree with no rotations necessary. Then, if a 6 is inserted, it is unbalanced. No single or double rotation at any level, will fix this imbalance. This structure can similarly be represented infinitely up the tree, requiring that each rotation fix the level above it, thus requiring a non-constant number of rotations. I can't prove that it is Log(n) rotations (like it could be log2n or something), but I'm sure it's more than one. In fact, I think that, the simplicity of what the article currently states might not be enough, because I think that actually getting this table into an "AVL" balanced state requires a much more complicated algorithm than just the analyzing the heights of the nodes on the way up the tree. McKay 19:27, 15 August 2006 (UTC)

I believe you mean inserting 6 after "3 1 8 2 5 10 4 7 9 11" (without the trailing 6). After that, the root has 3 with balance factor "+" (right subtree larger than left by 1), and the right subtree starts with 8 and is perfectly balanced (all balance factors 0). Now insert a 6 in the right subtree, then the balance factor of 8 becomes "+" while the tree is now unbalanced at the root (because of the 3 "+"). A single double rotation at the root makes the tree balanced, as it must. The new tree is [5: [3: [1: nil . 2] . 4] . [8: [7: 6. nil] [10: 9 . 11]]]. marcus
Another note: O(Log n) rebalances can be necessary after a delete operation. The worst case example are "Fibonacci-trees". Also, insertion of course remains O(log n), because the place for insertion needs to be looked up and the right parent node for a rotation or double rotation needs to be found, updating balance factors on the way up. marcus

unsourced?[edit]

This appears to be only a POV statement: "While AVL trees are theoretically quite sound, they are not commonly implemented due to their high implementation complexity to keep it balanced..." Ste4k 18:20, 2 July 2006 (UTC)

(also completely false) —The preceding unsigned comment was added by 67.169.10.74 (talkcontribs) .

I managed insertion and am 90% through deletion but that is subjective. What is certain is that the article here does not always help. It has a few things that are contrary to expectation and a lot of ambiguity. The deletion instructions almost make sense in principle until you hit line 4 which consists of waffle. It then isn't clear how to join that with the retrace steps below as a result. The prototype for insert has some confusing bits as well. The rotate node function and the image don't match up because it is taking the top node to rotate rather than the bottom.

Reason for revert.[edit]

I reverted quite a bit of work done by User:Toby Douglass. I've got reasons, so I'm writing them here:

  • first edit Not all AVL trees are balanced after every insert. If many operations are going to be performed, sometimes the tree doesn't always meet the requirements. It is possible to have a balance factor off by more than 2. Also, insertions aren't always this simple. Take for example the example I gave above, under "Insert Operation"
I could be wrong, but I think the normal case is to insert or delete and then rebalance; it's not that common for a tree to bear multiple inserts or deletes and then be rebalanced. For sure this happens, and obviously it helps optimise the tree since you don't need to do a slow rebalance so often, but it's a more advanced use and I would say doesn't rightfully belong in an initial explanation of the tree.
I would definitely disagree here. Which is more common? I'm not sure, but if there are several inserts or deletes to be performed it is more efficient to hold the balancing until it's done. I think the article should mention both.
I have no objection to both being present, but for sure, the simple, *pedagogical*, case where a single insert and rebalance should be present. It is this approach which is most useful to someone learning how AVL trees work, since it is the simplest AVL tree. If you wish to document the more sophisticated trees, please do so. I won't be doing that, since the tree I've done rebalances after each insert and so I'm not familar enough with the solution to write it up. Toby Douglass 21:17, 21 December 2006 (UTC)
  • second edit Deletions can't be done that way. What if the deleted node has both left and right children, and the inorder successor has both left and right children. Two children will be lost in the shuffle.
I could definetely be wrong here :-) I'm under the impression that it is impossible for the in-order successor to have both left and right children; if it were so, the in-order successor would actually be the left child of that supposed sucessor (assuming a tree with smaller nodes to the left side). Note also that this is the method used in the AVL applet linked to from the article.
Yes, you are mostly correct in this case. I was mistaken in this fact. The inorder successor (or predecessor) can't have both left and right children. But in my defense, the description you gave did forget to mention what should be done with the successor's children.
  • third edit typo from second edit.
  • fourth edit incorrect assertion, see first edit.
  • fifth edit dependent on changes from second edit.
  • sixth edit not proper usage of Big O. Constants like the 1.44 and the 2 are dropped. Lookups for both are big O identical. O(log n).
This notation is used in one of the links from the AVL page, which goes to a University study by a couple of professors comparing AVL with RB. Also, I think it is normal practise to write order-of to whatever degree of accuracy is appropriate for the point you're making - and here, although it's quite right to say both AVL and RB are log n, AVL is a *better* log n than RB and I assert that is the correct notation to represent that information.
Just because it's used elsewhere, doesn't mean it's right. Saying O(2) is wrong, because you should be using O(1). Smiliarly O(1.44 log n) is also wrong. If you want to talk about the order without using Big O, that might be appropriate for this article.

I'm not saying that the page can't be improved. Far from it. I think it's got a lot of necessary changes that need to be made, but I can't work on it right now. I think this will be my next project though. (this was really me) McKay 14:53, 22 September 2006 (UTC)

Let me know what you think of my rebuttals. Right now I accept your point about multiple inserts, and that should be explained in the text, and I'll point out the algorithm description presumes a single insert/delete; we need to discuss the in-order successor issue and see if delete can be done in that way or not; I disagree that the O() notion is improper. Toby Douglass 16:12, 22 September 2006 (UTC)
Well, I'm pretty sure that it is incorrect. Feel free to read up on Big_O_notation, and tell me where I'm wrong (my argument is the line that says "coefficients become irrelevant as well" and "Big O captures what remains"). One "additional" point of my argument is that you use the tightest Big O possible, and while something may take f(x) = Log2n, The big O is O(g(x)), g(x) = log(x). The definition of Big O says that as long as I can come up with constants to satisfy
for all x > a0, c0g(x) > f(x).
My Big O is satisfied. So in all cases, I can choose a0 = 3, c0 = 3. Now I can say that all are identical from a big O perspective. log x, 2 log x, 1.44 log x. Each can be said to be big O of both of the other two. Coefficients should be ignored in big O. McKay 15:57, 26 September 2006 (UTC)
You're both right. For the purposes of the discussion, it is important to note the constants hidden by the O-notation, but it is technically inappropriate to use O(c*n) for c a constant. Thus, unless exact runtimes are known (up to a scale), the best wording for comparison of algorithms asymptotically equivalent is e.g.: "Both are O(lg n) but the constants hidden by the notation are much smaller in algorithm X." --Quintopia 22:19, 10 November 2006 (UTC)
Very very close. The purpose of big oh, is for order of magnitude differences of the worst case scenario with respect to n. Because of this, constants are necessarily removed for the purpose of Big oh. "Constants hidden by the notation" is a wee bit misleading, because it's actually more precisely "constants hidden by the worst case scenario are much smaller in algorithm X" Big O was chosen to do a purpose, and it does it well (order of magnitude differences of the worst case scenario with respect to the size of the input set), but it need not be used in every case. Saying "constants hidden by the worst case scenario are much smaller in algorithm X" is almost worthless. A much more valuable approach would to probably do a big Theta, or an average case. The problem here is that average case for an AVL tree and for Red Black trees is non-trivial to calculate. So, the better way of comparing algorithms is using the easy Big-O, "Both are O(log n)" talking about the "constants hidden by the [Big-O] notation" is demeaning (for lack of a better word) to the notation, and doesn't really mean much. Who really cares if red black trees are actually 1.333 times worse in the worst case. These worst case scenarios aren't even likely, so the Big-O is sufficent. McKay 08:48, 11 November 2006 (UTC)
Some people care deeply! this is why Oracle uses AVL trees and not red-black trees. Also, if you come to choose a tree, why choose the less efficient tree when you could just as well choose one that is better?
Order-of is usually used in the way largely described here - you deal, as the name implies, with the order of magnitude of the operation. However, I see absolutely no harm and instead, benefit, in also using the notion to express smaller differences when the point of that particular use is simply to indicate that that information is available.
My view in this is that this deep concern to be exactly legal in the use of order-of notation is harmful. It's unimaginative and impedes explaining the concept in hand to the reader. Toby Douglass 21:17, 21 December 2006 (UTC)

Problem?[edit]

In the depicted unbalanced and balanced trees, the balancing of the leftmost 3-element subtree doesn't appear to be able to be done by tree rotations as they are defined on the tree rotations page. When the 9 is rotated out and the 14 in, the twelve will switch to the opposite side, maintaining the imbalance. When this imbalance is "fixed" by another rotation, it will simply be returned to its original state. Thus, this subtree can never be balanced by rotations. Clearly, it is a special case for implementation, and should be noted either on this page or the tree rotations page. I intend to do so if no one objects. --Quintopia 22:13, 10 November 2006 (UTC)

Well, I don't think the article necessarily needs to explain how how to convert an arbitrary binary search tree like that 3-element subtree you mentioned into an AVL tree. But I agree, it would be interesting to state an algorithm how to do it. A simple one might be to create an empty tree, then walk the original unbalanced tree in-order, and insert each node into the new tree, using the AVL-insert. That should take O(n*log n). --Allefant 14:24, 24 November 2006 (UTC)

Diagrams[edit]

There are some diagrams of AVL trees at Commons:Category:AVL-trees that may be helpful. Deco 06:43, 21 February 2007 (UTC)

Rewrite[edit]

the example section got needlessly complex with references to null structures... I did a rewrite of that section, feel free to adjust or rever to a previous version if you like. I also removed the images because rotations aren't covered in this article directly, but in the tree rotation article. McKay 22:32, 14 May 2007 (UTC)

Explanation[edit]

I'm angry. I created a new section, trying to write a *human-readable* explanation of AVL trees (e.g. one which you can read near enough from first principles, without having to already know so much computer science that you probably already know what AVL trees are anyway), did quite a bit of work and the section is a work-in-progress; I've not finished, and I'm currently creating a set of images to explain the flow of explanation.

And you - Mckaysalisbury - have taken out 4kb of what I've written without so much as a by-your-leave.

I'm going to reinstate the material you've taken out and if you don't like it, and you think it should be removed, you can damn well show enough respect to discuss it here first.

Toby Douglass 07:21, 15 May 2007 (UTC)

I did mention it here. One of the wikipedia policies is Be Bold. I was doing just that. I also mentioned that a revert or adjustment was allowed (encouraging you to be bold)
I read that. You wrote the "example section". The section is called the "explanation section". I thought that comment was about a different edit. Moreover, Be Bold doesn't mean you get to Be Rude. When people put time and effort in, wiping it out with a mass removal without prior discussion or mentioning it or even mentioning to them in their talk is *unnecessary*. There is nothing stopping you showing consideration; we're not in a race.
The "Explanation" section was rewritten by Toby Douglass. This is his version. If you like I'll provide a more critical commentary of why I removed the content.
  • It isn't written in the formal tone of an encyclopedia (consistent mentionings of "you", weird phrases like "the tricky bit", ...)
Quite so. Work in progress.
  • It fails to adhere to the WP:MOS (wikilinks were non-existent, as were any formatting, spacing strange)
Quite so. Work in progress.
  • I find it funny that the version I wrote is criticised as requiring so much knowledge of computer science, but the version I removed contained phrases like "NULL pointer" which is strictly a computer science term.
Quite so. Work in progress. Moreover, there are but one or two cases of such usage and such usage is very simple and common; anyone reading it will be in the computing field and will possess basic knowledge. I want to avoid more domain specific knowledge.
  • It contains misleading OR phrases about a "naive first try at doing this" mentioning element counts, which I claim are needlessly complicated for AVL presentation. AVL works off of the height of a subtree, not the element counts. Why needlessly add element counts?
Because it helps explain the AVL concept. AVL is not immediately obvious. I want to lead people into it, showing them that you can keep information to know about rebalancing, and then dealing with the most obvious idea (which most people seem to come across by themselves) of element counts. Understanding why element counts *DON'T* work is almost properly tantamount to understanding why and how AVL trees DO work.
  • The section seems to have poor writing style, in that it leaves the reader hanging
Quite so. Work in progress. It isn't finished. I've spent the last two days enhancing a bitmap API I've written so I can programmatically produce image files for the article. I want to demonstrate how the element count approach fails and then show the AVL tree behaviour.
  • On the other hand. I did like how toby did try to create a section about why AVL trees are better
So, I felt it was best to rewite the section in proper tone, adhering to MOS, remove CS terms, remove OR (though I could source much of this), simplify explanation, while keeping the intent of the author, who, wanted to make the section better. McKay 19:20, 15 May 2007 (UTC)
Second level comments from Toby Douglass 14:20, 16 May 2007 (UTC)
Yes, I understand what you were trying to accomplish, please don't get mad at me for improving a section that you admit to being a work in progress. I liked a few of the things you were trying to accomplish, so I went ahead and improved based on your ideas. I didn't like the specific direction you were taking the content, so I took it in a direction I thought was better.
I'm mad at you for taking about SO much material without a by-your-leave, not for what you were trying to achieve. Your goal does not justify or permit your method. Toby Douglass 17:34, 16 May 2007 (UTC)
I mentioned it on the talk page. You yourself said the material was a work in progress. I felt like the content you added was a blister on the page, and needed to be rectified as soon as possible. The first thing I did was start to clean it up, and then I realized that I objected to the content on the page. So I rewrote it. The content was factually inaccurate due to your OR. McKay 18:12, 16 May 2007 (UTC)
I'm simplifying the concept of NULLs and saying "doesn't have a child". It shows the concept without dealing about implementation specific issues. Threading and other related concepts might use the data and would make it not null but still not have children. Mentioning NULL is inappropriate.
I concur that NULL is inappropriate. Toby Douglass 17:34, 16 May 2007 (UTC)
Your opinion about element counts doesn't belong in wikipedia. Personally I think element counts *do* work, but it's more complicated than your model. Height balancing was chosen because it's simpler than element count, but still produces the requisite O(log(n)) insertions and lookups. In any case, mentioning element counts in terms of AVL is OR and doesn't belong here. AVL is about height balancing, that's what should be explained, I don't think this article should go off in a different direction than what AVL actually is. McKay 15:43, 16 May 2007 (UTC)
Eh, what? my opinion that they're a useful explanatory aid? that opinion doesn't belong in wikipedia? surely the whole of the wikipedia is people's opinions about what explains things? the purpose of an article is to explain what it is about. AVL is not immediately intuitive. There will be MANY ways in which people can be led to understand. In what way *CAN* my choice of explanation be "an opinion which doesn't belong in wikipedia"? Toby Douglass 17:34, 16 May 2007 (UTC)
What is OR is that:
  • AVL is less intuitive than element counts
  • "However, the problem with element counts though is that the number of elements on the left or right of an element doesn't tell you anything about how they are arranged." In fact the entire section where you're talking about number of elements is factually inaccurate. Element count is actually better than Height balancing, but it's harder to implement, and doesn't provide O(Log(n)) insertion times. The entire section from where I just quoted to "what you in fact end up with are linked lists on each side of each element." is factually incorrect, because an element balanced tree is better than a height balanced tree. It needed to be removed as quick as possible. It would have been wrong for me to wait for a "by your leave". WP:BOLD. McKay 18:12, 16 May 2007 (UTC)


Image[edit]

I removed the image again for a couple of reasons:

  • The image doesn't seem to fit anywhere. We don't have a section on rotations, we've deferred most of that concept to the Tree rotation page.
  • The image is inaccurate. I doesn't adequately show how to perform a rotation, specifically, it doesn't show what to do with the other children of the pivot. McKay 15:43, 16 May 2007 (UTC)
Maybe we should add a small section about rotations here? McKay 18:12, 16 May 2007 (UTC)

Worst case search order[edit]

Hi. Accessing the deepest node in the most unbalanced AVL leads to a search time loggolden_ratio n, not 1.44 or 1.5 lg n as specified in the current version. I don't have time to fix this but, whoever wants this article to be accurate should change this. Martlau 13:41, 23 May 2007 (UTC)

  1. Most importantly, can you attribute that?
  2. doing the math, because I'm curious:
    • logphi(n) = lg(n)/lg(phi) ~~ 1.44042009 * lg(n)
    • so your statement is probably correct, and properly represented in the article, Maybe it should be worded that way, but I personally think that use of lg n is best used in this article relating to binary trees McKay 15:08, 23 May 2007 (UTC)

I'm somehow puzzled by those "log"s here. what is lg(n) exactly ? log2(n) ? log10(n) ? ln(n) ? —Preceding unsigned comment added by 139.165.223.18 (talk) 08:08, 8 May 2008 (UTC)

Logarithm. log2(n) is logarithm base 2, log10 is base 10, ln(n) is natural logarithm, base e (mathematical constant) -- intgr [talk] 10:58, 9 May 2008 (UTC)
I'm puzzeled as well. lg(n) usually means the base 10 logarithm, the base 2 logarithm is ld or lb. The constant 1.44042009, however, is correct for the base 2 logarithm only, so I think the text should say: 1.44042009 * ld(n). 213.217.69.77 (talk) 10:37, 23 March 2009 (UTC)

Explanation[edit]

I've completely replaced the current explanation, which, IMO, totally failed to explain anything, and replaced it with something which I have written to be a human-readable explanation of how an AVL tree works, so that someone who doens't know how AVL trees work can read it and then know. Toby Douglass 12:39, 17 October 2007 (UTC)

I note that User:Mikkalai has completely removed this section, on the basis that this is not a "Data Structures for Dummies". I contest this; if the article does not effectively explain how an AVL tree works, it cannot claim to be an article about AVL trees! Toby Douglass 17:00, 26 October 2007 (UTC)
Wikipedia articles must be written basing on printed books and supplied with references.
Then you'd better delete the rest of the article, because there isn't a single citation in it. Everything that's written is someone's attempt to explain how an AVL tree works.
What you wrote was your interpretation how it works. Please keep in mind that the reason of the efficiency of AVL tree is described in the Self-balancing binary search tree, linked at the very beginning. Please explain what is unclear in the current description.
It utterly fails to explain how an AVL tree functions: you cannot read this article and then implement an AVL tree. By and large, the explanations in the page explain *to someone who already understands* - which is utterly, UTTERLY useless.
The text describes the operations and their computationaal complexity. If you want to clarify something, please follow the current structure of the article, and don't please forget to provide references where your new information came from. `'Míkka 17:09, 26 October 2007 (UTC)
P.S. I've just noticed that I am not the first person who disagrees with your way of writing (looking higher in this talk page). `'Míkka 17:22, 26 October 2007 (UTC)
Yes, you are not the only person who doesn't see the point of an article actually telling people how to implement an AVL tree. There seems to be a view that the article should almost *clinically* describe the AVL tree, as if that is the purpose and end of the page. What possible use it that? the only people who would find it useful would be those who already understood how the tree works. Everyone else would find it incredibly hard going to try and figure out the principle underlying the tree. Toby Douglass 23:22, 2 November 2007 (UTC)

I am all for simplifying the language of terse computer science articles, but I find the writing style of this section somewhat lacking. For example, it refers to unbalanced trees as "normal binary trees", while in fact they are quite abnormal in practice (nearly all binary tree implementations in real applications are some sort of balanced trees). Linked lists are described as "a single long line". Second, it uses the grammatical first person which is discouraged (WP:MOS); the paragraphs are too short, phrases like "in fact" are used too often. In general what I find a "non-encyclopedic" writing style. I also agree that it doesn't quite fit into the article's structure. -- intgr [talk] 23:02, 3 November 2007 (UTC)

Implementations[edit]

I reintegrated a selected few of Mikkalai's massive rm of links (as I was looking for one which I thought I had seen once). It is quite accepted with other data structures, that a maximum of one implementation per programming language is linked, as an example for interested students and programmers. However, I collected them under a new title Implementations rather than under the previous External Links. Is this acceptable also for you, Mikkalai? --Gulliveig 02:46, 3 November 2007 (UTC)

External link lists should be collected under the "external links" section. I realize that links to AVL tree source code are very relevant and useful, but I dislike long external link lists because they invite everyone to add their own favourite to the list, so they build up cruft over time and degrade into uselessness again; not to mention the problem with spammers. Coming up with a non-arbitrary criteria for external links is very hard. -- intgr [talk] 22:33, 3 November 2007 (UTC)

Comparision[edit]

To help the beginners understand, I think comparision is very important. For example, comparision between insertion and deletion, comparision with other self-balancing trees, etc. Visame (talk) 19:59, 7 June 2008 (UTC)

I think an explanation which people who do not understand AVL can read and then understand AVL is very useful. What we have now is utterly lacking. It does not fully describe the algorithm and it can anyway only be understood by people who *already* understand how AVL works. Toby Douglass (talk) 10:43, 27 March 2009 (UTC)

Contradiction[edit]

The balance factor of a node is the height of its left subtree minus the height of its right subtree.

However, in the rotations list we have: Right-Right case and Right-Left case: If the balance factor of P is +2, then the right subtree outweighs the left subtree of the given node Left-Left case and Left-Right case: If the balance factor of P is -2, then the left subtree outweighs the right subtree of the given node

These statements are contradiction. The left minus the right would yield a +2 if the left subtree outweighs the right one. Not the other way. Am I missing somethingMichael miceli (talk) 20:34, 15 October 2009 (UTC)

Diagrams for Insert and Delete[edit]

I must say that the diagrams for post-insertion rebalancing are beautiful! Whoever originally did them did a good job. They very clearly show what's going on, and I would suggest that if anyone has some time, to create some equally descriptive and nicely-coloured ones for the post-deletion rebalancing cases as well. Troglodyto (talk) 19:36, 25 February 2010 (UTC)

Deleting diagram[edit]

I think the picture in the "Deletion" is not correct because we're talking about nodes with one child, and it shows a node with two children. Am I wrong ? Guitow (talk) 20:33, 27 April 2010 (UTC)

The image is correct. Moberg (talk) 15:54, 23 November 2010 (UTC)
What the poorly captioned image is trying to say is as follows: (middle tree) If we want to delete a node with two children, such as "7". We can either (left tree) write a "6" where the "7" was and delete the "6"; or we can (right tree) write a "9" where the "7" was and delete the "9". In either case we're now trying to delete a node with only one child, which is a simpler problem. What's not in the diagram is that removing a node with a single child may-or-may-not involve rebalancing the tree. So, if we hit a case where removing the "6" requires rebalancing but removing the "9" doesn't, then we obviously choose to remove the "6". Clear? :p 132.165.76.2 (talk) 09:58, 5 January 2011 (UTC)

Deleting leaves an ambigous rotation situation[edit]

After deleting a node, I end up with a unbalanced tree that can be fixed with wither a right-right followed by right-left rotation or only a right-left rotation. Which one should I choose? To get the situation: Draw the tree by inserting {1, 3, 2, 4} (without rotating for balance). Moberg (talk) 15:54, 23 November 2010 (UTC)

Both actions will balance the tree (at least as far as in the differences in height being 1). I choose the single rotation as even against an optimised double rotation it used less operations, is simpler and is faster (easier to update balances). The impact on balance seems equally random. However I have not proven this so don't take me at my word. As far as I can tell in this case that rotating once will invert the balance of parent. The double rotate will preserve the balance of the parent node. — Preceding unsigned comment added by 2001:978:2300:30A:2FC6:5A8A:5E23:429 (talk) 16:34, 26 February 2016 (UTC)

Issues with this article[edit]

  • Conventionally, the balance factor is computed by subtracting the height of the right subtree from the left; I say conventionally since all the books and academic papers I read are that way...
  • The sentence: "If the balance factor of R is ≤ 0" is wrong and confusing, it should be: If the balance factor of R is < 0, same for the mirror case... (not when you inserting anyway...)
  • And this article doesn't really talk about how to adjust balance factors all the way to the top of the tree...

— Preceding unsigned comment added by Codingrecipes (talkcontribs) 00:43, 3 March 2011 (UTC)

A source for the "citation needed"[edit]

The third paragraph needs a citation for the sentence, "however, red-black trees are faster for insertion and removal." Both insertion and removal are logarithmic for finding the place for the operation, but the rebalancing is where they differ. One source that compares the number of operations for each type of tree is Comparison of Binary Search Trees (BSTs). Two sources that state the time for AVL trees are 8.3 AVL Trees and Balanced Binary Search Trees. One source that states the time for red-black trees after insertion is 9. Red-Black Trees. It might be easier to use a source that includes information on both trees in the same place. The first link I gave does this, and my blog post does this as well. It would be flattering to have that cited, but perhaps a source without the excessive bantering that gets right to the point would be ideal. I am unsure about the guidelines for sources on Wikipedia pages which is why I have listed these five. ComputerGhos (talk) 08:36, 9 May 2012 (UTC)

Most blogs are not regarded as reliable sources, as anyone can create a blog and write anything they like there. Looking at the three non-blog sources you mention, although they all contain some relevant material, as far as I can see none of them actually gives a comparison of speed of insertion and removal for AVL and red-black trees, which is what is needed. JamesBWatson (talk) 09:02, 9 May 2012 (UTC)

Inconsistent sign for balance factor[edit]

The article states that

balanceFactor = height(left-subtree) - height(right-subtree)

But then uses if (balance_factor(L) > 0) to test if subtree L is taller on the right than on the left, breaking consistency. I think we should change the definition of balance factor to:

balance-factor = height(right-subtree) - height(left-subtree)

This supports the intuition that -1 means "left-heavy" and +1 means "right-heavy".

Joeyadams (talk) 15:27, 18 July 2013 (UTC)

AVL Tree Bounds[edit]

Knuth defines the the bounds of an AVL tree as a function of the size of the tree where the size of the tree is defined as the number of interior nodes. In other instances the size of the tree is defined as the total number of nodes. The wikipedia entry on binary trees for example is inconsistent in this regard. On that page the text defines size as the number of internal nodes but the figure defines the size as the number of total nodes. — Preceding unsigned comment added by 173.79.116.101 (talk) 20:05, 8 February 2014 (UTC)

Are you kidding me?[edit]

Let us first assume the balance factor of a node L is 2 (as opposed to the other possible unbalanced value -2).

OK

This case is depicted in the left column of the illustration with L=5.

OK, I see the left column.

We then look at the left subtree (the larger one) with root P.

Yes sir, I am looking.

If this subtree does not lean to the right - i.e. P has balance factor 0 or 1 -

I see. Balance factor is left height - right height, hence it has to be positive. But what happens when the balance factor is > 1?

we can rotate the whole tree to the right to get a balanced tree.

WHAT THE F**K??? ARE YOU KIDDING ME?? What is a 'whole' tree? Is it like whole grain? Whatever happened to L? And P? What does it mean to rotate a tree? Who wrote this piece of shit?

This is labelled as the "Left Left Case"

Rotating a whole tree means rotating the top most node of the subtree you are looking at. The height of every node changes, hence you are rotating the whole tree. Technically in the alternating case where you rotate the subtree you're also rotating an entire tree just not the one identified by the balance anomaly which is one higher and has a whole tree on the other side that gets pushed down by the rotation.

I am enlightened!

Could someone who actually knows how this algorithm works rewrite this section? The current section has been written by someone who hates knowledge.--Cubancigar11 (talk) 07:07, 7 April 2014 (UTC)

Some simple amendments. --Nomen4Omen (talk) 19:44, 1 February 2016 (UTC)

Insertion/Deletion average complexity[edit]

Every reference cited lists the average complexity of insert and delete as O(log n). This is because an insert or delete operation is considered to be done without regard to the state of the tree. The operation must start at the root node of the tree and search until it finds the appropriate place to either insert or delete the desired element. There is no dependence on any previous traversal of the tree for this operation to happen. — Preceding unsigned comment added by Jmonty42 (talkcontribs) 21:38, 29 September 2015 (UTC)

It is quite common to list the average complexity of insert and delete as O(log n). The above contribution also states: "The operation must start at the root node of the tree". But this is not absolutely true. Assume a tree having more than one access path to an object, e.g. a free storage management system with two access paths to each free space chunk, one with size as key, which will be used at storage acquisition, and the second with start address, which will be used at storage return. At storage acquisition, a chunk of sufficient size has to be searched for, removed from the set of free chunks in both trees and a possible remainder reentered into the set of free chunks in both trees. The walk from the node in the size-tree to the node in the address-tree can be accomplished within the chunk, so that no second search (for the address-chunk) is required. So in case the size fits exactly, we have a deletion (size-tree) with O(log n) and a deletion (address-tree) with a deletion cost where the operation does not start at the root node, it can start at the target node so that no searching is required. The walk from the node in the size-tree to the node in the address-tree is one (1!) instruction. For most balanced BSTs the worst case update cost amounts to O(log n), but on average it is O(1). If the size does not fit exactly, we have additionally to reinsert the remaining size (requiring a second size-search). In the address tree, because the chunk is already in correct (address-)sort order, we can immediately reuse the node. Thus, the cost is from 1 search and 2 mere (= without search) deletions up to 2 searches and 2 mere deletions and 2 mere insertions.
At storage return time, the user arrives with (start address, size). This has to be checked for conflicts with present free chunks. If there is no conflict it has to be looked for adjacent free chunks, so that these can be joined to make up a bigger free chunk. Thus we have, in the non-adjacent case, an insertion (address-tree) with search and O(log n) and an insertion (size-tree) with search and O(log n). The check for adjacency requires a 1-step traversal in both directions, address up and address down, which also amounts to O(log n) in the worst case, but on average is O(1). So, at storage return time we have 2 search operations, although we maintain at least two trees and in each tree up to three objects, i.e. we update from at least 2 up to 6 objects altogether.
The only source of which I know so far, that to some extent separates search cost from mere update cost, is Mehlhorn [7 Sorted Sequences]. On p.158 under "7.4 Amortized Analysis of Update Operations": "Theorem 25. Consider an (a, b)-tree with b ≥ 2a that is initially empty. For any sequence of n insert or remove operations, the total number of split (p.151) or fuse(p.152) operations is O(n)." (Split or fuse are needed only for rebalancing operations.) And on p.165: "AVL trees do not support constant amortized update costs."
Because O(log n) and O(1) always sum up to O(log n) there is IMHO nothing wrong when stating: update cost is O(log n) in the worst case and O(1) on average when search is not included. So I would appreciate very much to find another and maybe more explicit source for this separation. --Nomen4Omen (talk) 17:24, 1 February 2016 (UTC)
Refer to our previous discussion on the Red-Black Tree talk page. Firstly, the data structure you describe with multiple access paths to each node is NOT a binary search tree, AVL tree, or any other tree-like data structure as defined by anyone. Every single source that has been cited on any page about this has noted that you cannot insert into a binary search tree without searching first. Please carefully read the source you cited. In the opening paragraph of section 7.4 it reads "In the best case, we basically pay for locating the affected element, for updating the sequence, and for updating the bottommost internal node." It clearly states that searching must be included in insertion into a binary search tree. There is no logical way to consider it otherwise and nothing you have ever cited supports that. The example you give above is nonsensical and irrelevant to discussions about binary search trees. The data structure you describe is by definition not a binary search tree. Jmonty42 (talk) 02:22, 8 February 2016 (UTC)
Dear User:Jmonty42, you are right: the data structure I described above is NOT ONE (1!) binary search tree – it is TWO (2!) binary search trees, combined into ONE data structure. And it is only ONE example of combining data structures in an application. There are certainly lots of other applications which combine data structures together in one application. I chose this one, because it has an inherent motivation, although there are certainly lots of other applications, where it is useful to have secondary, tertiary etc. access paths to a node besides the intrinsic binary search. (But of course, if they shall have some motivating power they can not be totally simple. And I do not claim that I found the simplest one.)
You partially repeat what I said in the first sentences of my above contribution that I do not know of a source which explicitly modularizes INSERT and DELETE into SEARCH and UPDATE costs – except that citation from Mehlhorn. His statement that "AVL trees do not support constant amortized »update« costs" would be totally pointless if modularisation between lookup and update would not be possible. He knows and we all know that »update« costs for RB trees are indeed constant on average as they are for AVL trees. So he makes this statement for marking a (small) advantage of RB trees over AVL trees, an advantage which were completely nonsensical and irrelevant as you would say if modularisation at this point were impossible. No need to stress that software modularisation is principally a good thing. It allows to (re-)combine the modules in a way useful to an application with other modules, twice, and three or more times.
And if modularisation is possible (of course it is) and is foreseen from the beginning, there is no need to quote e.g. this (as you say: nonsensical and irrelevant) example of free storage management. (And btw, modularisation is already halfway foreseen in the articles and by all authors in that SEARCH is a module of its own. What if the application has to take a look at the data before deleting; and if it decides to delete, has it to search for the second time with the same result?)
Finally, it is not a crime to look for a source – in this case for one which explicitly takes into account modularisation. --Nomen4Omen (talk) 17:33, 8 February 2016 (UTC)

Article to implementation - Contains Jibberish - Sabotage?[edit]

I've tried making an implementation from this article. It is very close to functional however this article has not helped in a way that I think goes beyond any kind of confusion about academic, legacy or field specific language. It is almost as though someone has mangled it to stop people copying and pasting or reading and rewriting into assignments. It just isn't written well.

It mixes several styles of demonstrating things and I'm not sure if that's because of multiple authors or something else (someone writing each part in whichever way seems easiest). However it gets progressively worse, perhaps because people haven't gone all the way through it. It gets worst on deleting a node. It is written in a way that seems over complicated to me and even makes no sense in parts. Consider this:

"Let node X be the node with the value we need to delete, and let node Y be a node in the tree we need to find to take node X's place, and let node Z be the actual node we take out of the tree."

"4. Choose node Z to be all the child and parent links of old node Y = those of new node X."

How can node Z be a node but then be a set of edges? How can we take any node out of the tree but X? Even if we swap X with Y, we're still removing X from the tree. Here it suggests to create a temporary node Z which is a copy of old Y but actually the new X which we already have after swapping so it's redundant. I find the concept of Z as described in the article to be really confusing and it seems like someone's implementation quirk that requires a virtual node but then it does not make sense because why would you delete the virtual node in a later step and not X? In reality I only see three real nodes directly in play.

  • Always: The node to delete (X).
  • Sometimes: The node to swap the node to delete with (Y). -- This operation should have no side effects regarding the following operations.
  • Sometimes: The node to replace X with (Z). -- This may be null in most implementations with a check needed only for updating its parent.

Y in play if count_children(original(X)) == 2, Z in play if count_children(updated(X)) == 1

As far as I can establish so far all you need to do is swap X with a candidate further down if there is one so that your delete affects as little as possible (occurs on a leaf or on a node containing only one leaf).

  1. If X has less than two children go to 4.
  2. Let Y be closest value to X starting with either one of its children (I choose the highest one or arbitrarily if equal). You can do this with the same search used for insertion reusing the value of X starting with the chosen child (assumes values are unique). Simpler, if you choose left, then keep going right until you can't because going left is moving further away from X and vice versa. Y is guaranteed to have one leaf child or no children with any child being more distant to X.
  3. Swap all of the parent child links for X and Y, including the respective child links in their parents (may be root). The tree structure is now identical although the order is no longer preserved (X is now out of order but that is acceptable as it is to be deleted). If node identity is unimportant, you may simply swap the value of the two nodes then let X be Y.
  4. If X has a child let Z be that child otherwise let Z be null (empty).
  5. Replace X with Z so that X's parent now points to Z instead (or update root with Z if X is root) and if Z is not null let Z's parent be X's parent. X will no longer exist in the tree. It can be deleted now but its parent will need to be preserved for the next step.
  6. Letting N be Z and P being the parent of X retrace the tree back up the path correcting the balance as needed (while instead of do while).
  7. Explicitly delete X if necessary.

6 should now integrate with the pseudo code which can be halved as the code for the other side is the same with left/right and 1/-1 swapped. On second taught, 6 has a problem if X is empty. The first iteration should go by which ever side X was on. Perhaps a simply approach is to delete X last of all and use it to seed N.

The pseudo code also is a bit misleading as far as I can tell as it doesn't manage updating the balance factors after a rotation. For the alternate case it's a small distinction/optimisation but you don't need to double rotate. You can update only the relations needed in a single action.

I have not verified that the above is 100% correct in all cases so I have not attempted to update the article.

I have made a working implementation which passes full testing. I believe much of the problem at least concerning the pseudo code is that implementation is not trivial. I've seen algorithms elsewhere that can be implemented completely from the pseudocode or are well abstracts giving a complete overview. The former is unlikely to be possible here. Some have tried but haven't been able to get it right. There are a lot of specific details when it comes to implementation as well as varying approaches. The depth that parts of this article go to is inconsistent and ultimately arbitrary. One category of details will be included here but not there. One format here and another there.

I suggest a single format should be the target of this article when describing steps in implementation. Either well abstracted pseudo code from someone who has a good working implementation in particular with less duplication of symmetrical cases (use words such as opposite, alternating, etc) or in the list form but written in a way that is less waffle and more abstract about differences, for example, just swap X and Y.

It isn't completely useless. I did manage to make a good implementation from this. However the article caused far more confusion and impedance than I think it really should have to if written to the expected standard.

— Preceding unsigned comment added by 2001:978:2300:30A:2FC6:5A8A:5E23:429 (talk) 19:42, 24 February 2016 (UTC)

Original research[edit]

Please read carefully the major wikipedia policy WP:NOR and don't write texts in wikipedia articles without backing all statements with references from reliable sources. - üser:Altenmann >t 15:23, 24 July 2016 (UTC)

Article in ruins[edit]

It’s really a shame what happened to the article five months ago on 24. Jul. 2016. Since then the article is missing almost completely the sections most important to the subject, namely the ones about the modifying operations insert and delete. For these operations the data structure has been invented at its time, but the article no longer reflects this.

The destruction has been verbally justified by „rm unreferened original computer science research“ or here, in Talk:AVL tree, by „don't write texts ... without backing all statements with references from reliable sources“. A closer look at the old article would have shown that there is not really original computer science research (maybe it is kind of Synthesis of published material) and that especially the code portions have been missing references. But for the main text there were reliable references such as Georgy Adelson-Velsky, Evgenii Landis; Knuth; Sedgewick; Pfaff; Alexander. Anyway, for calling for more or better references there could have been other means than total erasure. However, also the rather old template messages in sections Searching and Traversal about additional citations have not been very encouraging to authors all the time for five months now. Almost nothing essential has happened since then. So the question remains: Why? — Preceding unsigned comment added by 92.203.216.3 (talk) 07:21, 28 November 2016 (UTC)

Changing language to haskell?[edit]

AVL Trees have a very functional structures and using a functional programming language like haskell highly simplifies the code. Haskell removes all the boilerplate and brings the implementation very close to pseudocode. See https://gist.github.com/gerard/109729 Vishal0123 (talk) 01:11, 3 May 2017 (UTC)

Changes by user:Thislooksfun[edit]

Sorry, as far as I can see you are a newcomer.

Sorry, you are wrong. Please read the text carefully. I guess everything is explained. Y is never the inserted node. Y is a subtree. The inserted node could be the Z of section #insert. Maybe you find an even better explanation. But the statements are correct. --Nomen4Omen (talk) 20:06, 7 May 2017 (UTC)

Re: Changes by user:Thislooksfun[edit]

user:Nomen4Omen,

You are correct, I am new, and I apologize for the misunderstanding, I did not realize that Y was never the inserted node. In that case, the code as it stands is correct.

However, it does prompt me to suggest that the article is incomplete, since it doesn't appear to deal with the case where Y is the newly inserted node. Having just implemented my own version of an AVL tree that didn't have a deletion operation (for a class), I omitted all code that wasn't needed when only performing insertions. However, when omitting the case BalanceFactor(Y) == 0, my code produced in incorrect tree. Upon adding it in, everything worked as expected.

If you can, in fact, guarantee that Y is never the inserted node—which as far as I can see, the code outlined doesn't—then omitting that section during insertion is fine.

If, however, Y might be the inserted node—which, once again, the code appears to allow—then that block is needed for both insertion and deletion operations.

It's entirely possible I'm missing something obvious as to why Y is never the inserted node in the provided pseudocode, but in that case I both request a reference to that point, and suggest making that fact more clear throughout the article to avoid future confusion.


-- Thislooksfun (talk) 01:34, 9 May 2017 (UTC)


(Additionally, as I am new, am I using Talk correctly with this? It's my first time on this side of the magic.)

@Thislooksfun: Both, #insert and #delete, are described in the respective sections. There the main and the "intrinsic" steps of insert resp. delete are handled – and both in an iterative loop. In some circumstances (also described there) the balance of the tree is (temporarily) not AVL, so that the tree has to be rebalanced (which is done by means of rotations). Because this may happen in both, insert and delete, these code portions are placed in a separate (and "common") section called #Rebalancing – as subroutines. Despite this "common" code, there are small differences when called from insert or delete, which may suggest a high performance programmer to write separate code for rotations during insert viz. delete. Btw, another difference between insert and delete is, that after a rotation the insertion is always complete while a deletion may have to continue the loop.
Unfortunately, the figures related to the code portions specific to insert resp. delete, are only in the article Binary tree. But you may add some by your own and apply the correct node names, as they occur in the AVL article. --Nomen4Omen (talk) 09:38, 9 May 2017 (UTC)