Talk:B+ tree

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.Computer scienceWikipedia:WikiProject Computer scienceTemplate:WikiProject Computer scienceComputer science articles

High

This article has been rated as High-importance on the project's importance scale.

Things you can help WikiProject Computer science with:

Here are some tasks awaiting attention:

Article requests :
- Requested articles/Applied arts and sciences/Computer science, computing, and Internet
Cleanup :
- Computer science articles needing attention
- Computer science articles needing expert attention
Copyedit :
- Computing
Expand :
- Computer science
Infobox :
- Computer science articles without infoboxes
Maintain :
- Timeline of computing 2020–present
Photo :
- Find pictures for the biographies of computer scientists (see List of computer scientists)
- Computing articles needing images
Stubs :
- Computer science stubs
Unreferenced :
- WikiProject Computer science/Unreferenced BLPs
Project-related :
- Tag all relevant articles in Category:Computer science and sub-categories with {{WikiProject Computer science}}

This article was the subject of a Wiki Education Foundation-supported course assignment, between 19 January 2022 and 4 May 2022. Further details are available on the course page. Student editor(s): 00Nexiom (article contribs). Peer reviewers: Bulaaf.

inner insert breay, this diagram erringly splits nodes right when they reach capacity, thus precluding there ever being any "full" nodes. B-tree algorithms (and B+-tree) are supposed to split nodes only after capacity is reached and a new value is tried to be inserted in a full node. 132.198.12.98 21:30, 30 November 2007 (UTC)[reply]

The diagram is incorrect and has been removed. While there are some implementations of B+ trees that do split before the block is full (this allows you to perform the split after insertion is completed), this is not the "normal" procedure.78.91.39.182 15:13, 1 December 2007 (UTC)[reply]

Diagram[edit]

The diagram isn't very clear. It doesn't show that the data matches the keys, and it makes the 'linked list' pointers much different than the data pointers. —Preceding unsigned comment added by 24.26.131.178 (talk) 04:46, 4 December 2007 (UTC)[reply]

There should be step-by-step insertion/deletion diagrams showing a few common scenarios. --Aelon (talk) 10:32, 22 December 2007 (UTC)[reply]

The diagram contains a major mistake: in a B+ tree, right subtrees have keys greater than or equal to those in parent node. The root node must therefore be corrected to 4 - 6, and the issue is dealt with. Note: the very insertion algorithm described in this article ("Insert the new leaf's smallest key and address into parent") confirms the same. 26.09.2009

Merge suggested[edit]

It seems to me like the main difference between B-Trees and B+-Trees is a relatively minor one that has little effect on how operations proceed, which renders much of this article redundant with B-tree. I suggest this article be merged into B-tree and become a section of that article. Dcoetzee 03:58, 2 December 2008 (UTC)[reply]

Can't support this. The differences might not be obvious but lie in details. B+ trees have much more problems which are not discussed in the B-Tree article. For example how do you handle duplicate keys during deletion? Duplicate keys do not exist for regular B-Trees, therefore it's not an issue there. The differences affect all the algorithms, therefore it would not make sense to merge both articles. — Preceding unsigned comment added by 2001:41B8:83C:FA01:49DA:DF76:DF61:15D3 (talk) 12:12, 13 November 2018 (UTC)[reply]

Diagram[edit]

Am I not correct that the current diagram doesn't show a B+ Tree at all? First, nodes will have between N and 2N entries. Which integer N is 3 twice?

You are not correct. B+ trees are order B trees (a node must have between 1 and B children), and B doesn't have to be 2, it can take any integer value greater than 1.Michealt (talk) 11:47, 29 August 2011 (UTC)[reply]

Tree Order[edit]

Ramakrishnan's book on DBMS says that tree order of d means size of each internal node can range between d <= m <= 2d. However this article says it ranges between d/2 <= m <= d. I am not sure which one is correct. But majority of the CS Schools around take Ramakrishnan's book as reference. So this may be written wrong in the article. — Preceding unsigned comment added by 95.15.75.81 (talk) 18:18, 17 December 2011 (UTC)[reply]

This depends on whether you define "tree order d" as minimum tree order or maximum tree order. Both are valid. 2001:41B8:83C:FA01:49DA:DF76:DF61:15D3 (talk) 12:14, 13 November 2018 (UTC)[reply]

Fair enough, but that is not consistent with the definition given in the related article about B-trees, which uses the formula of the original paper (Bayer et al., 1972^[1]), although that one never explicitly talks about "order" in the sense of minimum or maximum number of index entries on a page. I would propose to go with the textbook definition of "order" (d <= m <= 2d) here which also is consistent with the B-tree article on here. 134.34.231.91 (talk) 09:24, 11 July 2019 (UTC)[reply]

References

^ https://link.springer.com/article/10.1007%2FBF00288683

Merge or Additional Clarity[edit]

The only sentence that separates this from the B-tree article is In a B+ tree, in contrast to a B-tree, all records are stored at the leaf level of the tree; only keys are stored in interior nodes.

Can someone add why you'd use a B+ tree instead of a B tree? There seem to be a lot of database solutions using B+ trees, so there has to be a reason, but without that reason, this article doesn't differ substantially from the B tree article. Dean (talk) 14:14, 22 July 2010 (UTC)[reply]

I've added some content to the implementation section describing at least one advantage of a B+-tree over a B-tree in the context of database systems. It seems to me that merging these two articles wouldn't be a great idea; while it's true that the articles are similar, that's largely due to a lack of detail on the insert and delete algorithms. There are subtle differences in how these work. If someone were to put in detailed pseudocode for these algorithms, they'd be somewhat different. That would make for quite a large merged article. While it's true that merging them now without that content might make some sense, it just means that someday, when someone wants to put in the detailed pseudocode, someone else would suggest splitting them. Maybe someone should just put in detailed code now? SeparateWays (talk) 19:02, 22 July 2010 (UTC)[reply]

Dean, as I understand it, a B-tree node takes up 50% more space (depending on data types, of course). This means that with the same block size a B+-tree fans out quicker, allowing you to traverse less tree levels to get to the leaf. — Preceding unsigned comment added by 82.139.81.0 (talk) 03:17, 18 May 2012 (UTC)[reply]

Actually, B+ Trees have several advantages over B-trees. As noted, all records are stored at the leaf nodes. The power of this is realised by another feature of B+ Trees, namely that all leaf nodes are connected with either uni-directional or bi-directional links. This allows sequential reading of all leaf key values by simply scanning the leaf nodes, following the leaf-node links. This is quite useful for efficiently using the inner nodes to locate the first key in a sequence, and then just reading the subsequent keys without having to traverse the tree up and down, as you would have to do with a B-tree. This is especially useful because B+ Trees allow duplicate key values, and B-Trees do not. Handling of duplicates presents several issues that B+ Trees have to contend with, which would never even be mentioned in the article about B-Trees. Kenbkop — Preceding unsigned comment added by Kenbkop (talk • contribs) 17:42, 15 July 2020 (UTC)[reply]

SQlite's use of B+ trees[edit]

At the very top of the article it is mentioned that SQlite uses B+ trees for its indices. The referenced link [[1]] says that B+ trees are only used to store data and not table indices. Should the reference to SQlite be dropped entirely or should there be an extra clause to say "SQlite just uses them for data!"? — Preceding unsigned comment added by 137.48.72.149 (talk) 13:54, 25 January 2012 (UTC)[reply]

I think it's fair to keep SQLite on that list as it is. The term "B+ tree" is only relevant when data records consist of a lookup field (e.g. table's primary key) and non-lookup data fields (rest of table data).

In the case of regular database indexes, the whole index record is the lookup key, therefore a B+ tree and B-tree would be equivalent anyway. I suspect that this observation also applies to other databases. -- intgr [talk] 13:40, 2 February 2012 (UTC)[reply]

Deletion of seperator value from index node[edit]

as seperator keys are copied from leaf and can be subsequently moved to higher index nodes, while deleting a key value we have to delete the key twice, once from the leaf and once from the index, right?

                     9
                  /     \
           2 5 - -       11    17        21     -
          / | \         /      |         |  \
              5 6 7 - 9 10 - - 11 12 13 -

like in this case if we want to delete 9, we first delete 9 from leaf; which will be prevented from underflowing by borrowing 11 from right sibling, adjusting their seperator entry in parent to 12 subsequently. The tree will then look like

                     9
                  /     \
           2 5 - -       12     17       21     -
          / | \         /       |        |  \
              5 6 7 - 10 11 - - 12 13 - -

but should we then delete the entry 9 from the root? which will need another deletion operation. Or we just keep that entry, it would take some space though and can acumulate to a larger space.

--Samik ganguly (talk) 10:30, 5 April 2012 (UTC)[reply]

Traditionally all keys correspond to actual data in leaves. If you don't delete the interior node value when the corresponding data is deleted, when will you? This is traditionally done by simply move the 10 to where the 9 is. A delete_min function typically does that using a pass-by reference parameter to the original data in the interior node's key value to be replaced.132.160.49.90 (talk) 04:02, 27 March 2016 (UTC)[reply]

Maximum possible number of keys in the root node as a leaf[edit]

Hello,

Quotation from the article: "(The root is also the single leaf, in this case.) This node is permitted to have as little as one key if necessary, and at most b."

Are you sure that the root node possible maximum number of keys as a leaf is b and not (b - 1) like any other "regular" leaf?

Thank you. — Preceding unsigned comment added by 217.128.106.129 (talk) 21:17, 10 November 2012 (UTC)[reply]

PostgreSQL's use of B+ trees[edit]

PostgreSQL is capable of using B+ trees: http://www.postgresql.org/about/ Ddugovic (talk) 16:59, 25 December 2012 (UTC)[reply]

That page only mentions B+ trees in relation to GiST, which is a framework for building advanced inverted index structures. The fact that it *can* be used for building B+ trees doesn't mean that anyone uses it that way. I think it would be very misleading to list it on this article because of GiST.

Normal ("btree" type) indexes in Postgres are not B+ trees. The distinction between B+ trees and B-trees is kind of nonsense for database indexes in the first place -- all the columns in the index itself *are* the lookup key, and they're the same on the leaf level as any other level. The *record* itself is generally stored in a separate structure -- in the case of Postgres, it's the table heap.

The distinction does make sense for databases which store whole table data in a tree, such as InnoDB (which uses a normal B-tree) or Oracle's index-organized tables (and others). Postgres always stores table data in a heap, not a tree, thus the B+ tree/B-tree classificaiton is not applicable. -- intgr [talk] 23:03, 25 December 2012 (UTC)[reply]

So, what is a B+tree?[edit]

I. Here it states, "A B+ tree can be viewed as a B-tree in which each node contains only keys (not key-value pairs)"

II. However, in the B tree article : "In the B+-tree, copies of the keys are stored in the internal nodes; the keys and records are stored in leaves"

Is is I, II, or neither?

A B+ tree is[edit]

A B+ tree is simply a B tree with data stored at the leaves instead of with the keys in the internal nodes.

Although it is possible to link the leaf nodes as a single or double link list, that is not a requirement. You could also do that with other trees, but it's just cleaner and more elegant to do with B+ trees than most others

There are misleading statements should be removed like the definition here that seems to be someone's inaccurate opinion coming from a typical variation. 132.160.49.90 (talk) 04:09, 27 March 2016 (UTC)[reply]

These statements are misleading, and technically incorrect. B+ Trees are an extension to B-trees, and as such are typically used as indexes for commercial database systems. The B+ Tree comprises two parts: a sequential index containing an entry for every record in the file, and a B-tree acting as a multilevel index to the sequential index entries. The sequential index is what makes it very useful for database engines, and it is also what helps to support duplicate key values. Kenbkop — Preceding unsigned comment added by Kenbkop (talk • contribs) 17:50, 15 July 2020 (UTC)[reply]

Search algorithm, insertion, and merging gibberish[edit]

First, a function just calls another function. What is k_0 or k_i? Why are there no brackets like in k_[i+1]? Where does p come from? etc .. etc... etc ...

"B-trees grow at the root and not at the leaves" Umm ....

Is there any reason whey the "merging" garbage should not be deleted?

Insertion: "B-trees grow at the root and not at the leaves."[edit]

I read the referenced "fundamentals of database systems" and after that I do not understand this saying either. If a leaf node is split the tree grows at its bottom as well, which is the common case. Maybe it would be better to say that **new levels** are added by splitting the root node and adding a new root.

External links modified[edit]

Hello fellow Wikipedians,

I have just added archive links to one external link on B+ tree. Please take a moment to review my edit. If necessary, add {{cbignore}} after the link to keep me from modifying it. Alternatively, you can add {{nobots|deny=InternetArchiveBot}} to keep me off the page altogether. I made the following changes:

Added archive https://web.archive.org/20090912082150/http://1978th.net:80/tokyocabinet/ to http://1978th.net/tokyocabinet/

When you have finished reviewing my changes, please set the checked parameter below to true to let others know.

Y An editor has reviewed this edit and fixed any errors that were found.

If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—^{cyberbot II}_{Talk to my owner:Online} 22:42, 9 January 2016 (UTC)[reply]

Terrible[edit]

I have more than twenty years experience as a developer, I have implemented AVL B-trees and coming to this article and reading it, I have absolutely *no idea* what a B+ tree is.

This is I have to say though my normal experience on the Wikipedia for anything scientific, mathematic or in computing; it is explained in a way that only someone who *already completely understands* can comprehend. In that light, this article is no worse and no better than all or almost all other such articles.

95.91.213.121 (talk) 11:02, 24 June 2017 (UTC)[reply]

I read your comment and it's terrible. After reading it, I am still absolutely no smarter about what's wrong with this article. If you're going to spend the time to write a comment, find something constructive to say. Or maybe even contribute to improving the articles. -- intgr [talk] 23:26, 28 June 2017 (UTC)[reply]

I have some computer experience also, and noticed that the diagram disagrees with the insertion pseudo-code. So, I came to this discussion page. Perhaps all that is needed to clarify this algorithm is an explanation of how to step through the pseudo code to search for a few keys, based on the tree shown in the diagram. (And, when that fails, replace the pseudo-code and/or the diagram. ;^)

I read your comment as well, and you described the exact opposite experience I had. I was able to understand what a B+ tree was after reading this article well enough to immediately then help a friend with their work on a related data structure. I'm going to go improve it a bit now too. Lcdrovers (talk) 05:23, 4 June 2022 (UTC)[reply]

Bug in Pseudo Code ?[edit]

Please double check me on this. The text introducing the Algorithms section suggests that a node has "d" children. The algorithm then suggests that the node contains keys k_0 .. k_d. That's d+1 keys. That suggests there are d+2 children. That's the first apparent bug.

The pseudo code then indicates that for k<=k_0 one should visit p0. That's reasonable, but if that's the case, then if k_{d-1} < k <= k_{d}, one should visit p_{d}. If that's the case, then if k_d < k, one should visit p_{d+1}. But that's not what the pseudocode suggests. The last pseudocode case suggests visiting and p_d, not p_{d+1}. That is the second apparent bug.

Please double check me on this. Jasonnet (talk) 07:50, 27 October 2020 (UTC)[reply]

Diagram[edit]

The diagram is incorrect (the top node has too small a range) and also confusing. I had to look at many other visualizations on the internet to understand B+ Trees.

There are a few other issues with the diagram, IMO:

The "next" pointers in the leaves being at the same level as the "data" slots make it look as though they are meant to be part of the "data" array
The alignment of the boxes, while not bad, don't make it as clear as it could be that the sub-trees are the intervals of keys between each parent key

There is a great visualization tool here, which corrects both of these flaws:

https://roy2220.github.io/bptree/visualization/#4,+1,+2,+3,+4,+5,+6,+7,+8,+9,+10

I recorded a GIF to which I think is much more informative, perhaps we could replace or add this to the page to help other learners in the future?

GavinRay97 (talk) 17:10, 8 January 2023 (UTC)[reply]

Hi, i posted the problem below. Since you visualized the algorithm with an even number for b (4) and I see a problem with odd values of b - maybe you could kindly verify or falsify my claim that the rules don't work for odd numbers. Thanks! 2.243.54.247 (talk) 06:08, 4 April 2023 (UTC)[reply]

Root-Leaf node size[edit]

In the example given with b=7 it seems to be impossible to construct a valid tree with 7 entries.

For tree sizes from 0-6 the root node can point directly to the records. Fine
For tree sizes >= 8 the data can be balanced between 2 or more leaf nodes.
For tree size 7 a single root node cannot have all records as children, so it cannot be the only node of the tree. So we need a "regular" root node with at least 2 children. As each child requires at least 4 children, we need at least 8 records to construct the leaf pages.

(This problem seems to arise for b entries whenever b is an odd number.)

Am I missing something or does one of the restrictions be modified a little to accommodate that case? 2.243.54.247 (talk) 06:05, 4 April 2023 (UTC)[reply]

[1] ttps://link.springer.com/article/10.1007%2FBF00288683

[1]