|WikiProject Computer science||(Rated B-class, High-importance)|
- 1 Untitled
- 2 Deletion from an internal node is NOT clear
- 3 Intro to Algorithms
- 4 duplicates
- 5 Nodes structure
- 6 variants
- 7 Error in illustration of B+ tree
- 8 Concurrent access
- 9 Repeated info
- 10 Picture
- 11 weight-balancing scheme
- 12 Relaxed Balance
- 13 Minor Quibble
- 14 hop?
- 15 Boeing citation
- 16 Leaves "carry no information"?
- 17 Big Oh notation
- 18 I fixed some things; more improvements needed
- 19 Where is the complexity?
- 20 English
- 21 Recent edit to Knuth's description of a B-tree
- 22 Request revision on the Rebalancing after deletion
- 23 Is 2-3 tree a B-tree?
- 24 Animation
- 25 key - copies contradiction
- 26 The B-tree uses all those
- 27 2014-11-21 BS Comments
This article needs a complete rewrite from an expert. It's completely useless in many ways. I will now erase the Access concurrency section; the reference links to a non-public site! WTF?!
In response to the user Wolfkeeper: Thanks for explaining the Wikipedia policy about external links, however I still think this section is wrong, because in order to understand what it means, you must check the link. If it is not removed, at least I expect a more serious explanation. —Preceding unsigned comment added by 18.104.22.168 (talk) 04:39, 17 May 2009 (UTC)
Shouldn't the majority of this be under (a,b)-tree? B-trees are a variant of (a,b) trees, so the (a,b) tree article should contain pretty much everything except stuff like the fact that a = B/4 and b = B/2 in a B-tree.
Moved this text to the discussion section from the main definition. Chadloder 18:34 Jan 22, 2003 (UTC)
The original author used B+-tree and B-tree interchangeably. If I remember correctly, the only difference is that one only stores values in leaf nodes.
Also, should this article be titled B-tree instead of B tree? And does anyone know what the B stands for? I think it is balanced.
- German Wikipedia states that it is unclear what the B stands for and that the inventors didn't give an explanation for it. It could be Balanced, Bayer, Broad, Bushy, or Boeing. (One of the inventors of the B-tree is Rudolf Bayer, who worked for the Boeing Scientific Research Labs.)
- Bayer tree is a common name for them, at least in German. This is the name by which I was taught about them as a computer science student. A balanced tree is a tree that is balanced in general. I'm not familiar with any of the terms broad tree, bushy tree, or Boeing tree. So my vote is for Bayer. Rp (talk) 17:51, 8 July 2008 (UTC)
- I moved the article from B tree to B-tree. The old title might have been that way because the author was trying to write about B+trees and the '+' got gobbled up. --Mrwojo 15:32 Feb 15, 2003 (UTC)
- Sometimes B+Trees are also named B*Trees.
There is some disagreement on what the B stands for. Some people think it stands for 'balanced', 'bushy', or even 'Bayer' after one of the authors. Chadloder 18:35 Jan 22, 2003 (UTC)
Deletion from an internal node is NOT clear
In the first case, joining will lead to disaster if the left and right subtrees are not leaves. Example: for a B-tree with a maximum node size of 2, insert the values 1 to 11 and then delete 4. —Preceding unsigned comment added by 22.214.171.124 (talk) 04:52, 15 May 2009 (UTC)
Intro to Algorithms
I deleted the sentence about the pseudocode for deletion being in Intro to Algorithms because I have the newest edition of the book and there is no pseudocode in the deletion section. --Anon —Preceding unsigned comment added by 126.96.36.199 (talk) 03:44, 21 July 2008 (UTC)
B-trees can have duplicates, right? This could be added to the article. Meonkeys 19:16, May 16, 2005 (UTC)
Yes. The B-tree representing the "name index" in a phone book database probably has lots of duplicate "John Smith" entries.
Also, is a B-tree the same as an n-ary tree? Meonkeys 19:16, May 16, 2005 (UTC)
A B-tree can be seen as a very specialized form of n-ary tree, one that adds a pile of special restrictions. The restrictions make insertion and deletion a little more complicated -- but, in return, the B-tree can guarantee very fast searches.
- Some nodes in some n-ary trees have only 1 or 2 children. B-trees are constructed so every internal node has at least L children.
- Some n-ary trees do not have all the leaves on the same level. (For example, it's impossible for a binary tree with 5 leaf nodes to put them all on the same level). B-trees are constructed so that every leaf is at the same level.
--DavidCary 20:52, 30 July 2005 (UTC)
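The two restrictions listed above (a minimum number of children per internal node, and all leaves on the same level) can be checked mechanically. The following is a minimal, illustrative sketch of my own (not from any cited source), assuming a toy representation where each node is a dict with a "keys" list and internal nodes carry a "children" list; L is the assumed minimum number of children for non-root internal nodes:

```python
# Hypothetical sketch: verifying the two B-tree restrictions mentioned
# above on a simple dict-based node representation. Leaves are dicts
# without a "children" entry.

def check_btree(node, L, is_root=True):
    """Return the uniform leaf depth, or raise if an invariant fails."""
    children = node.get("children", [])
    if not children:                      # leaf node
        return 0
    if not is_root and len(children) < L:
        raise ValueError("internal node has fewer than L children")
    depths = {check_btree(c, L, is_root=False) for c in children}
    if len(depths) != 1:
        raise ValueError("leaves are not all on the same level")
    return depths.pop() + 1
```

An arbitrary n-ary tree would typically fail one of these two checks; a B-tree passes both by construction.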
- Really. It is just a matter of choosing the less-than-or-equal relation in place of less-than. That of course complicates the search algorithm, because when we find an item 'equal' to the given key, we also need to check its siblings for equality, and the corresponding subtrees as well. In the degenerate case of a tree of N 'equal' nodes we need to find and return all N nodes as the result of a single search. That's difficult, so we usually would rather keep a single instance of every key, and link duplicates in some additional piece of storage, say a kind of linked list. --CiaPan (talk) 19:47, 14 May 2014 (UTC)
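The single-instance-plus-linked-duplicates approach described above can be sketched as follows. This is a hypothetical illustration: a plain dict stands in for the B-tree's key lookup, and a per-key list plays the role of the linked chain of duplicate records.

```python
# Hypothetical sketch: keep a single instance of each key and chain
# duplicates in additional storage, so a single search finds them all.

from collections import defaultdict

index = defaultdict(list)          # key -> chain of duplicate records

def insert(key, record):
    index[key].append(record)      # duplicates share one key instance

def search(key):
    return index[key]              # one lookup yields every duplicate

insert("John Smith", {"phone": "555-0100"})
insert("John Smith", {"phone": "555-0199"})
```

The point is that the tree itself never holds two equal keys, so the search algorithm stays simple.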
I'm wondering if the U (maximum number of elements) in the article is not actually 2L (twice the minimum number of elements).
- There may be a terminology problem here. The usage of L and U appears inconsistent below. For a simple definition, the number of elements (aka keys) should vary by a factor of 2 to allow splitting and merging to work. Comer uses d as the minimum number of keys in a node, and 2d as the maximum number of keys. The number of children will then vary from d+1 to 2d+1. Some use the order to mean the maximum branching factor (max children) and others use it as the maximum number of keys in a node. Glrx (talk) 20:53, 18 August 2009 (UTC)
Other sources from Google say U = 2L (the minimization factor): http://www.bluerwhite.org/btree/ (the structure of B-tree), http://www.semaphorecorp.com/btp/algo.html (3rd paragraph). --Loading 07:29, 18 Jun 2005 (UTC)
Certainly a programmer *could* pick U=2L. That type of B-tree is the simplest kind for a teacher to try to explain. But often programmers pick a maximum number of elements U that is an odd number, such as 2-3 trees, with L=2, U=3. Donald Knuth seems to like U = L * 3/2, in the form of a B*-tree. --DavidCary 20:52, 30 July 2005 (UTC)
I am teaching B-trees at San Jose State University. I have three sources (other than the Wikipedia article) that pairwise disagree on this L-U issue. Here are what they say:
(1) The well-known algorithms textbook, Introduction to Algorithms, by Cormen, Leiserson, Rivest, and Stein (CLRS), defines B-trees in terms of one parameter t, the "minimal degree" (p. 439). Thus L = t and U = 2t. It is required that t be an integer, so in particular U has to be even.
- Do L and U refer to limits on keys or the limits on branching? If they refer to keys, then setting t=1 would create a 1-key/2-key B-tree which would be the ordinary 2-3 B-tree. It depends on the definition of "minimal degree". CLRS may be using t in the same sense as Comer's d. Glrx (talk) 20:53, 18 August 2009 (UTC)
(2) The B-tree animation applet of slady (linked from the article) also uses a single parameter, which he calls the "order". He doesn't explain it, but experiments with the applet give, for order 1, L=2 and U = 3, and for order 2, L=3 and U = 5. In particular you always get U odd, so the animation applet can't duplicate any of the trees discussed in CLRS. (Do you suppose this confuses my students?!)
- This would fit if Slady's "order" of a B-tree is Comer's d -- the minimum number of keys in a node. Then L = d+1 and U = 2d+1. U will always be odd. Here L and U are referring to the number of children/subnodes and not the number of keys. Glrx (talk) 20:53, 18 August 2009 (UTC)
(3) Knuth's "Art of Computer Programming", vol. 3, p. 473, defines B-trees in terms of the "order" m. Comparing it to the Wikipedia article, we see U=m and L = ceiling(m/2); we need ceiling since m is not required to be even. So CLRS covers the case when m is even, m = 2t; but Knuth also allows the case L=t and U = 2t-1; this seems to be the case the applet covers, where the "order" you enter in the applet is t-1.
Some restriction on the relation between L and U has to be made; when we split a full node (containing U-1 elements) upon inserting a new key, we will get two new nodes containing U/2 elements each (for even U) or one with floor(U/2) and one with ceiling(U/2) elements, for odd U. We must require that L be at most floor(U/2) so that these nodes are legal. Moreover, L can't be too small either, or deletion won't work right. On 1 April 2006 I edited the article to correct this point, and also to correct the description of the deletion algorithm. --User and date unknown
- The above description is incorrect because a key must be promoted to the parent level during the split. We split a full node (with U-1 elements) only when inserting a new element (for a total of U elements). The result of the split will keep U-1 elements at the current level and promote one element to the parent level. If U is even, then one sibling has (U-2)/2 elements and the other U/2. If U is odd, then both siblings have (U-1)/2 elements. Glrx (talk) 20:53, 18 August 2009 (UTC)
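The split-with-promotion described above can be sketched on a bare sorted key list. This is an illustrative toy of my own, not code from any of the cited texts:

```python
# Toy sketch: a node that has grown to U keys splits around its median,
# and the median is promoted to the parent. For even U one sibling keeps
# (U-2)/2 keys and the other U/2; for odd U both keep (U-1)/2.

def split_node(keys):
    """Split an overfull sorted key list into (left, median, right)."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]
```

For example, splitting [1, 2, 3, 4, 5] (odd U) yields siblings [1, 2] and [4, 5] with 3 promoted; splitting [1, 2, 3, 4] (even U) yields [1, 2] and [4] with 3 promoted.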
In the following I'll try to explain the definition taught at the Technical University of Munich, where Bayer used to work. This definition seems to make sense, and is consistent with the 3 definitions above. Only the terms used differ slightly. As above, I'll use the letters L and U for the minimum and maximum number of children of a node, though the letters a and b are commonly used at TUM.
Any search tree that works the way explained in the article needs to satisfy U >= 2L-1. Otherwise splitting a node upon insertion will not work. A balanced search tree complying with this rule is called an (a,b)-tree. An (a,b)-tree where U = 2L-1 is called a B-tree.
These definitions have the following implications:
- The above definitions 2 and 3 comply to this definition of a B-tree. (In the second edition of Knuth's "Art of Computer Programming", vol. 3 it's on page 483, and it's a bit vague: does m/2 mean integer division here?)
- The above definition 1 is not valid for a B-tree in this strict sense, but for another specialization of an (a,b)-tree.
- The values U = L * 3/2 for B*-trees do not comply with this definition of a B-tree or an (a,b)-tree. Therefore a B*-tree is not a specialization of a B-tree, but a different though related data structure.
- The term (a,b)-tree gives another possible explanation for the meaning of the B in B-tree. For a B-tree, only the b needs to be known, and it's always odd. The a can be calculated as ceiling(b/2).
I don't know if the above nomenclature is common outside of TUM. I'm not an expert on the topic. The only references I can provide are two lecture notes in German: ead_ws9697.ps.gz and ead_ws9798.ps.gz.
Maybe someone should check the original publications! However, this node structure issue needs to be clarified in the article. In particular the rule U >= 2L-1 should be mentioned, even if it turns out that all (a,b)-trees are commonly called B-trees.
Michael R. 22:55, 16 August 2006 (UTC)
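The parameter constraints discussed above can be captured in a couple of small predicates. This is only my own illustration of the TUM-style definitions as given here, not code from the lecture notes; L and U are the minimum and maximum numbers of children of a node:

```python
# Illustrative predicates for the (a,b)-tree parameter constraints
# described above.

def valid_ab_tree_params(L, U):
    """An (a,b)-tree needs L >= 2 and U >= 2L - 1 so node splits work."""
    return L >= 2 and U >= 2 * L - 1

def is_b_tree_params(L, U):
    """A B-tree, in the strict sense used above, is the case U = 2L - 1."""
    return valid_ab_tree_params(L, U) and U == 2 * L - 1

def min_children_from_max(U):
    """For a B-tree, a = ceiling(b/2): recover L from an odd U."""
    return (U + 1) // 2
```

Under these definitions a 2-3 tree (L=2, U=3) and a 3-5 tree (L=3, U=5) are B-trees, while L=3, U=4 is not even a valid (a,b)-tree, because 4 < 2*3 - 1.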
Should *this* article, B-tree, have a section that explains variants such as B*-tree, B+ tree, etc.? Then I want a section that explains pros and cons -- why would someone use one variant instead of another, or instead of a simple binary tree? Some people find that preferable to having a bunch of very short stub articles that say "just like a B-tree except that ..." -- or, worse, a bunch of fully-fleshed-out articles that repeat the preliminary explanation over and over. --DavidCary 20:52, 30 July 2005 (UTC)
Are 2-3-4 trees a kind of B-tree with L=2, U=4 ?
Error in illustration of B+ tree
It appears that there is an error in the illustration. According to the text, all values in the leftmost subtree should be less than a1. In the illustration, however, the leftmost subtree contains the value 3, which is equal to a1.
I agree, from "Database Systems: The Complete Book"
-- if j pointers are used then there will be j - 1 keys, e.g. K_1, K_2, ..., K_j-1 so the first pointer points to a part of the B tree where some of the records with values < K_1 will be found; the second pointer points to where records with values >= K_1 and < K_2 will be found and so on.
This ordering is also used in 'Silberschatz, Korth, Sudarshan: Database System Concepts, 4th Edition'. Schmid 12:39, 13 August 2006 (UTC)
The paragraph at the end of the insertion section suggests that an improvement involves splitting nodes that might fill on a leaf node split on the way down the tree. This buys nothing. It only reduces the effective node size by one.
Unless the nodes are locked there is no guarantee the nodes will remain non-full on the way back up during a split. To allow maximal concurrency, you don't want to lock nodes on the insert descent.
--Karl Malbrain 20:52, 18 May 2006 (UTC)
What is "completely off" about this section? I have a copy of the reference material and summarized it for this section. N.b. there is no requirement that reference materials be available on the web for free.
If I publish an online paper (free or not) about "B-Tree Programming", that alone is not enough to make it worth referencing, unless we call this advertising. If this is not the case, the least I expect from a summary is what the alternatives are, in which cases this one is better (not the -why-, I can check the reference if I am interested enough), and optionally which vendor implements it. —Preceding unsigned comment added by 188.8.131.52 (talk) 14:59, 20 May 2009 (UTC)
The paper was published by the Association for Computing Machinery, it is not an on-line paper. Are you challenging this source in specific? I added the section in an attempt to flesh out the material from the perspective of a high-school understanding of computer science. I don't know which vendors use the forward pointers in their B-tree structures. Again, what is "completely off" about this section?
Thanks for the response. The "off" I meant is that it sounds like summarizing JFK as a president assassinated, with a reference to a book about his life. What are the different strategies? (The current summary sounds like there are no other alternatives. If this is the case, then at least it must say that there are none.) In which cases is this specific strategy convenient? Are there any known engines that implement it? (MyISAM maybe?) —Preceding unsigned comment added by 184.108.40.206 (talk) 00:52, 21 May 2009 (UTC)
A lot of information is repeated redundantly throughout the article. Please consider rewriting it to make it a bit more concise. --220.127.116.11 17:37, 21 May 2006 (UTC)
- I like the use of "repeated redundancy" - Yogi Berra would be proud. The article is basically a joke, full of too many ambiguities and self-contradictions. One problem that seems to occur in every Wikipedia article about trees is the failure to distinguish between "keys" and "values" - or state that the two terms are synonymous. Here, that's just one factor that makes this article useless. — Preceding unsigned comment added by 18.104.22.168 (talk) 18:32, 21 May 2014 (UTC)
I think the picture should be removed. Why would you put a picture of a B+ tree on the B-tree page? It would be like looking up the US capital and finding a picture of the Eiffel Tower. Not to mention the B+ tree picture is incorrect: 3 should be in the middle node and 5 in the last. The pictures are great, but people look at the picture instead of reading, and the info in the picture is wrong. I would recommend removing the pictures unless they can be corrected. —The preceding unsigned comment was added by 22.214.171.124 (talk • contribs).
Could someone explain the weight-balancing scheme invented by Arge and Vitter? It seems to have better performance than the 2-3 B-trees.
There has been quite a bit of work by Kim S. Larsen and Rolf Fagerberg (e.g. http://citeseer.ist.psu.edu/51205.html ) showing how the balance of a B-tree can be maintained lazily, with several benefits. For the purposes of elucidation, relaxed balance is particularly good because you can describe insertion or deletion by separating the basic steps of the algorithm from the steps for the maintenance of balance. Another benefit is that all modifications to the structure of the tree are local, so concurrency is more easily exploited.
I could be missing something obvious, but I don't see why tuning a B-tree to disk block sizes would result in a 257-513 B-tree. Shouldn't it be a 257-512 B-tree? —Preceding unsigned comment added by 126.96.36.199 (talk) 12:47, 31 July 2008 (UTC)
This depth will increase slowly as elements are added to the tree, but an increase in the overall depth is infrequent, and results in all leaf nodes being one more hop further removed from the root.
Is that "hop" just the writer's voice, or do people frequently use the word "hop" to describe that? I know there are a lot of terms used in computer science that don't sound nearly as technical as they are, but on the other hand, "hop" only occurs once in the entire page. (Not saying it's a problem and needs to be fixed, just wondering.) --188.8.131.52 (talk) 19:45, 6 November 2008 (UTC)
- It's a term borrowed from networking, apparently. I would use "one node," and have modified it accordingly. Dcoetzee 21:16, 6 November 2008 (UTC)
There is a "citation needed" marked beside the word Boeing.
The citation is in the external link mentioned below, named NIST's Dictionary of Algorithms and Data Structures: B-tree at http://www.nist.gov/dads/HTML/btree.html written as this:
- The origin of "B-tree" has never been explained by the authors. ... "balanced," "broad," or "bushy" might apply. Others suggest that the "B" stands for Boeing. [Bayer and McCreight were at Boeing Scientific Research Labs in 1972.] Because of his contributions, however, it seems appropriate to think of B-trees as "Bayer"-trees.
- - Douglas Comer, The Ubiquitous B-Tree, Computing Surveys, 11(2):123, June 1979.
Leaves "carry no information"?
All leaves appear in the same level, and carry no information.
That's not how the pictures look.
Big Oh notation
"can be done in O(log2N) comparisons. If the table had 1,000,000 records, then a specific record could be located with about 20 comparisons: log21,000,000 = 19.931". It's a detail, but that's not the meaning of the Big-Oh notation. Depending on the implementation it may need a million comparisons or even more. The big Oh notation only says something about how the performance scales as N increases. I understand, though, that this may be harder to explain in Layman's terms and that it might confuse readers who don't know about complexity analysis, but what is written now is just plain wrong. 184.108.40.206 (talk) 15:44, 8 May 2010 (UTC)
I fixed some things; more improvements needed
I'll get right down to it.
Problem with definition
First, in the list of five Definition rules, the second rule is:
2. Every node (except root) has at least m⁄2 children.
But, leaf nodes have no children. In defining the number of elements by the number of children, the article later leaves ambiguous the number of elements in leaf nodes.
Furthermore, the section describing bulkloading uses a different set of rules where the leaf nodes and internal nodes have different maximum sizes, and seems to imply an expectation of the disconnection of the number of elements in leaf nodes from the number of elements in internal nodes. However, the rest of the article above this section assumes leaf nodes have the same maximum number of elements as internal nodes (as I understand it).
Near the top of the article, some inconsistencies in terminology in the literature about B-trees are raised. Perhaps this is at least part of the origin of some of these problems. The article would be better if, after presenting the variations in terminology, it would choose and assert definitions for a system of terms to be used within the remainder of the article. That would make the article clearer both in the minds of editors and to the reader, and would provide a framework on which judgements of correctness (by editors) and building of understanding (by interested end-user readers) could be based.
B-trees in filesystems and FAT filesystem
I have corrected an incorrect application of the terms "logical block" and "physical block" in the section of B-trees in file systems. Almost nothing in modern file systems refers to physical blocks, as disks present an interface in terms of Logical Block Addresses (LBAs) or logical, not physical, CHS addresses. (Floppy disks still use physical CHS addresses, but are becoming rarer and rarer.) Furthermore, even DOS file systems like FAT12 and FAT16, considered primitive by most today, use logical addresses in the FAT which are relative to the data area of the file system volume and are not the physical addresses of the disk nor addresses relative to the start of the partition.
Therefore, I have changed this section of the article to use the terms "file block" and "disk block". I think it is clearly enough implied that a file block is a block addressed relative to a file, and it is obvious that such a block address is logical, not physical, so I have not used the word "logical" with "file block", and I have not explicitly defined what a file block address is. As for disk blocks, it is not really pertinent to the B-tree discussion whether they are physical or logical, so long as it is clear that disk block addresses address actual blocks on the disk, not abstract blocks in an abstract file, using an address scheme that is global over at least the part of the disk used by the file system. Again, I think implication suffices to expose this; anyone who has a basic understanding of the purpose, nature, and rudimentary function of a file system should understand this, and anyone who does not will not understand, or probably be interested in, the discussion of B-trees in file systems anyway. The point is that file systems map file blocks--parts of a file--to disk blocks--units of storage on a disk--and that this is what B-trees (and File Allocation Tables, and other alternative concepts and structures) are used for in file systems. The physical/logical dichotomy here is both incorrect and overly narrow so as to confuse or at least complicate the essential concepts about B-trees in file systems.
- I think that the term "Disk Block" is still very problematic for a number of reasons, starting with the fact that in most OS environments and jargon it has a very specific meaning that is different from how it is used here. "File Block" is confusing because it is unfamiliar and unclear what it means. The term actually used by almost all implementers, practitioners and analysts of extant B-Tree indexing is "Bucket". This is the term that I would suggest be used in this article. RBarryYoung (talk) 15:24, 28 February 2013 (UTC)
Algorithms - deletion of a node
I had made more improvements in this section, but all my work was unceremoniously reverted. I've reinstated the parts I think actually weren't in dispute. The other parts, in this section, really need improvements, as I think others above have said. Anyway, my suggested edits are in the history, if someone wants to revive them.
I have many issues with these edits, and that is why I reverted all of them.
While there are some improvements (such as FAT12 being 4080-2), there are many inaccuracies. In the B-tree, entries are rotated. Entries are not duplicated. There are unwarranted claims about the size of a file. There are extraneous comments about disk utilities.
I have reverted one of the reinstatements. I am considering reverting the FAT edits again.
Where is the complexity?
Recent edit to Knuth's description of a B-tree
I'm tempted to revert/modify this edit. The edit uses another reference to overrule what is ostensibly Knuth's definition. The literature is troubled with differing terminology. See B-tree#Terminology. Knuth's terminology has an unusual bottom level of the tree -- leaves are below lowest keys. "Carry information" is also troubling; in some variations, the associated information is present above the leaves; in others, only key information is above the leaf; in still others, key information is copied above but retained below. Ordinary leaves, however, would have either the associated data or a pointer to that data and carry information. In Knuth's definition, the leaves appear to be the records -- and therefore carry information. Glrx (talk) 18:54, 28 December 2011 (UTC)
Request revision on the Rebalancing after deletion
Under section 5.3.3 Rebalancing after deletion there are
1. If the right sibling has more than the minimum number of elements
- 3. Append the first child of the right sibling as the last child of the deficient node
2. Otherwise, if the left sibling has more than the minimum number of elements
- 3. Insert the last child of the left sibling as the first child of the deficient node
From my understanding of B-trees, these two steps make no sense, since after you execute the first two steps described in that section, the separator and the first/last child of the sibling have already been rotated. There should be no extra step to append or insert. I have no idea where this step comes from, but maybe I am wrong. I hope someone could explain this further.
- The section appears to describe the operation for a B+-tree rather than a B-tree. In the former, the parent only has a copy of the separator; there is not a full rotation in the B-tree sense. I've changed the section to use the B-tree rotation (and suggest that a left or right sibling may not exist). Glrx (talk) 16:20, 6 May 2013 (UTC)
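For concreteness, the B-tree rotation mentioned above (borrowing from a right sibling) can be sketched like this. This is an illustrative toy of my own using dict-based nodes, not the article's pseudocode; `i` is assumed to index the separator between the deficient child and its right sibling:

```python
# Toy sketch of a B-tree rotation: the separator in the parent moves
# down into the deficient node, and the right sibling's first key moves
# up to replace it. Internal nodes also hand over the corresponding
# child pointer.

def rotate_from_right(parent, i):
    deficient = parent["children"][i]
    sibling = parent["children"][i + 1]
    deficient["keys"].append(parent["keys"][i])    # separator moves down
    parent["keys"][i] = sibling["keys"].pop(0)     # sibling's first key up
    if "children" in sibling:                      # internal nodes only
        deficient["children"].append(sibling["children"].pop(0))
```

In a B+-tree, by contrast, the parent holds only a copy of the separator, so the corresponding operation is not a full rotation in this sense, which is consistent with the confusion described above.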
Is 2-3 tree a B-tree?
By the definition in the book Introduction to Algorithms, a B-tree of order t is such that every non-root node contains at least t children, and every node contains at most 2t children. The reason for this is that splitting a node into two nodes is possible only if it contains at least 2t nodes. This is a fundamental requirement for the data-structure algorithms on B-trees to work. In particular then, a 2-3 tree is not a B-tree; its algorithms are different from those of B-trees. The text currently uses 2-3 trees as a recurring example; those examples should be changed to refer to the 2-3-4 tree, the smallest example of a B-tree. What do you think? --Kaba3 (talk) 18:15, 26 September 2013 (UTC)
- A 2–3 tree is a B-tree; it fits Knuth's definition. There are many definitions of order and many variations of the B-tree, but the basic algorithms are similar. If I take your definition literally, a 2–3 tree is a B-tree of order 2, because non-root nodes contain at least 2 children and every node contains at most 3 children (which is less than 2t but does not violate 2t). Your second sentence is not the driving force behind balancing the tree; if a node may have nmax children, then there is no reason to split that node. The node is happy as it stands. There is also a subtle issue about keys/separators and children; a node has one more child than it has keys. A 2–3 tree has either one or two keys in each internal node. If an internal node of a 2–3 tree has 2 keys (3 children) already and I need to add a third key (a 4th child), then I can split the node into two nodes with one key each (total of 4 children) and promote the remaining key to the parent. Glrx (talk) 22:48, 27 September 2013 (UTC)
Hi. I am trying to develop a B-tree toy at . This article already has a link to an animation applet. Since applets are becoming unfashionable, I hope to replace this applet some day. I would appreciate it a lot if you would try out the toy and tell me how it can be improved. Maybe we could replace the applet linked then. --Ysangkok (talk) 13:41, 3 March 2014 (UTC)
key - copies contradiction
"In the B+-tree, copies of the keys are stored in the internal nodes; the keys and records are stored in leaves" vs. " The keys act as separation values which divide its subtrees"
Since a B+-tree is a B-tree, one must be incorrect. The "copies" can't be in the internal nodes when the original keys are.
The B-tree uses all those
"The B-tree uses all of the ideas described above." (What ideas?) "In particular, a B-tree:" . . . "keeps the index balanced with an elegant recursive algorithm" (where is balancing mentioned above?)
2014-11-21 BS Comments
"In B-trees, internal (non-leaf)" - 'non-leaf' is not mentioned in the referred to article. Could you please choose a different term than 'non-leaf', or define 'non-leaf' in the referred to article. i.e. In referring to something to explain something else, that something does not exist to be able to follow what is trying to be explained.
"within some pre-defined range." - range of what?
There is a basic problem here of a lack of definition of what a node is. There is also a problem that "A node is a structure which may contain a value or condition, or represent a separate data structure (which could be a tree of its own)." is too vague or intangible to be meaningful. In particular, 'condition' above is gibberish (to the reader at this point in their reading). Better: a node [which has an address, i.e. a path (address) by which it can be found <- this 'address' is the missing piece or attribute of the explanation] contains a piece of data. That data could be a value (such as the number 4, or the word 'red'), or a pointer (address) of (i.e. to) another node. [Missing, or, unless, a btree node can (must?) contain at least a value, AND zero or more pointers - at most one to the left (by definition, decreasing?), AND (at most) one to the right (by definition, increasing?)]. So, more succinctly, a node (with/having an address) contains a piece of data. Which is to say, a node is an address/(data and/or left and right pointing) pair.
- now this (above) paragraph may be completely wrong, and that's fine, but in being wrong demonstrates that my (the reader's) purpose in coming to this article is not getting satisfied. Either the above is correct and the article does not so explain 'simplistically' (too conceptual / intangible too fast), or does not well explain soon and simplistically enough what is actually meant (is true / actually is).
"When data is inserted or removed from a node, its number of child nodes changes." is baffling. Probably because it is self-referencing. (It is becoming apparent to me that I don't understand the lingo, or at least the established/recognized lingo, and if I don't, then that says the lingo is not being well explained.) So this bit ("When data is inserted or removed from a node, its number of child nodes changes.") says to me that what should be said is a node is a address/data+pointer 'pair', and nodes (and this is the problem with the use of the same term 'node' twice) contain SETS of nodes (or node sets). Thus the referred to diagram does not make sense (node-wise) as written. However, if one says nodes contain sets of nodes, it starts to come together. (That one does not understand how a node could contain a set of nodes, being a circular description, is ok - one can suspend understanding, expecting such to be explained in subsequent text.) The basic problem with the diagram is that it does not show lines between the .#. pairs. Saying that differently - the 'problem' is either that the diagram is 'wrong' OR the explanation thus far is, since I don't understand / haven't followed.
- by the diagram, the above node = address/data+pointer pairs is wrong. By the diagram, it would appear more correct to say that nodes contain (ordered) sequences of data, WITH POINTERS BETWEEN EACH PAIR OF DATA TO NODES THAT DESCRIBE (COVER) the data/value space between those adjacent values.
So, said differently, one assumes, or soon comes to assume b-tree = binary-tree, which is to say, a node could have at most left and right pointers. By this diagram, a node could have many pointers, and this is not made clear enough early on. So, perhaps earlier in the article it can say a b-tree is not (necessarily) a tree of binary nodes, but a tree of nodes, period. Binary or otherwise. Saying (noting), although possibly considered as redundant, noting that being a tree, nodes have descendants, and by its address knows its ascendant (if the latter is true - does a node's address tell the node where its parent is?).
Summarizing all of the above - the article, within only a few sentences, is as clear as mud. Since the purpose of writing an article is to clear up the mud, perhaps some reworking could be done? [I came here to learn / read / understand, so I'm not the guy to do the rework. i.e. If I understood the material I could contribute to the material, but because I came here to learn the material I can't. I can say, having come here to learn the material - I don't understand. [Therefore purpose in being written hasn't been satisfied. (?)]
"but may waste some space, since nodes are not entirely full" - does not yet follow / make sense. One would assume that a node size is dynamic, therefore only as big as the data contained, and so can waste no space. For the statement to be true, nodes are of a pre-defined size - something that should be stated beforehand. That this is true in the diagram is only now apparent to me. Adding to the diagram, or it's description, '4-element nodes' would make grasping what's being said much more clear much more readily. (Assuming 'element' is the correct term, there.)
The '2–3_tree' diagram illustrates what I say above - there is a diagram depicting a node (to my mind). The diagram depicted within this (b-tree) article shows something (node) that I can't understand as being a node (as I can in the 2-3 article's diagram) - they appear different. Thus I have a mental image of two different things being called 'nodes', and I don't understand (it hasn't been explained in this article) how the term 'node' is accurate for both diagrams. ('One of these things is not like the other' - so, i.e., understanding is not present [same name for two obviously different things]).