Talk:K-d tree

Kd-tree should be named kd-trie and vice versa

If the term kd-trie is accepted, then the current kd-tree article should be about a kd-trie and vice versa. Here's the rationale (this a copy from kd-trie's discussion page). The idea of a trie is to store strings (or finite sequences of any ordered set) such that symbols are shared between words with the same prefix. This delivers O(m) time complexity for insertion, deletion and searching, where m is the length of the word. Now consider the data structure in the kd-tree article. The first inserted point should be taken as an "empty word". Every point has an associated axis-aligned plane which contains it. The subsequent points will be classified by this plane as being on the "negative" or "positive" side (for any direction). Now if you insert another point, then it implicitly has an associated string with it that is given by traversal to its insertion place in the tree. If in the traversal you choose the negative sub-tree, then append "n" to the string, otherwise append "p" to the string. This way you can see that the points are associated with strings of the form "nppnpppnnnpnpnp" etc. Thus you can see that these associated strings form a trie. But they are special kinds of tries: every point is part of the set, and so every associated substring is contained in the word-set —Preceding unsigned comment added by Kaba3 (talk • contribs) 21:46, 16 January 2008 (UTC)[reply]

I would estimate that many people know both implementations as KD-Trees, the distinction between a trie and a tree is minor and can be pointed out in text. The discussion in kd-trie is not extensive by any means, and the first comment there "kd-trie is my own invention..." is somewhat concerningUser A1 (talk) 01:52, 17 January 2008 (UTC)[reply]

I agree. It would be better to embed the discussion in the kd-trie page to the kd-tree page. I hope someone does something, because currently the kd-trie term is misleading. I don't have the guts because I am a wikipedia-newbie and don't want to step on anyone's shoes:) --Kaba3 (talk) 19:52, 18 January 2008 (UTC)[reply]

I don't particularly agree that a kd-tree is a trie in this sense. It's the same as claiming a binary-tree is a trie since you can form a string of l's and r's representing whether you went left or right to find the node. If those directions had some greater meaning then it might be useful. Anyhow bdodson (talk) 08:47, 7 December 2008 (UTC)[reply]

Contradiction in text?

Doesn't the statement "In addition, every node of a kd-tree, from the root to the leaves, stores a point" contradict the end of the first paragraph "kd-tries are a variant that store data only in leaf nodes" ? I'd change it, but I know almost nothing about the topic. Also, while you are at it, "point is infite" should be "point is infinite." Nice article. It helped me a lot.

There are at least two different definitions of kd-tree. The most popular one calls for storing points in the intermediate nodes. Preparata/Shamos book gives that definition. However, de Berg/van Kreveld/Overmars/Schwarzkopf book gives an alternative definiton, where points are stored in the leaves only. In the latter variant the splitting planes are still chosen as medians and point (or points) lying on the plane are simply forced into one of the two partitions. —The preceding unsigned comment was added by 198.182.56.5 (talk) 01:31, 28 April 2007 (UTC).[reply]

Yes, I think there is a contradiction too. I left a message at the kd-trie article's discussion page with a rationale, but haven't got a response yet. In short, my opinion is that the current kd-tree should be called a kd-trie, and the current kd-trie a kd-tree. --Kaba3 (talk) 09:43, 24 November 2007 (UTC)[reply]

definitions are contradictory. But both are valid

Picture Included

The picture included really isn't the best choice for demonstrating a kd-tree to someone who isn't familiar with it. has a much better picture, for instance

The problem I have with the picture (we're talking about the pic at the top, yes?) is that the vertices of the splitting planes appear, visually speaking, to be tree nodes, when in fact that is not correct. And the tree nodes themselves are not represented. wwheeler 18 January 2007

Go for it. I made the existing one but agree that it's not too explanatory. Btyner 05:17, 20 November 2005 (UTC)[reply]

I also made an SVG version of this new one, but for some reason, commons didn't recognize it as such. Btyner 22:21, 21 February 2006 (UTC)[reply]

Also, some psuedo code for construction, insertion, deletion, nearest neighbor, etc. would be very useful. --Numsgil 13:07, 13 October 2005 (UTC)[reply]

Python?

Does the Python version of the pseudocode really add anything? It seems like it is basically a restatement of the pseudocode with a slightly different (less clear) syntax. Do we really need both? Neilc 22:37, 15 April 2006 (UTC)[reply]

I personally found it useful to have an executable version to try things out on and improve my understanding. That's also why I wrote the code that produced that 2D results picture. Granted, this page may not be the best place for an implementation, though I don't know a better place for such a tiny snippet. KiwiSunset 04:51, 19 April 2006 (UTC)[reply]

Content

Ok, so I can understand that balancing a kd tree takes care, but how do I actually do it? This article really needs someone with that knowledge to add that information in. ACiD2

I've been working on tracking down the existing work done by Overmars on this topic, which is very similar to AVL trees. I might end up just re-deriving it, getting some review, and giving the citation as the source of the idea. If anyone has access to the pertinent journal article, it would be very helpful. There was also a technical report published. M. H. Overmars and J. van Leeuwen, Dynamic multi-dimensional data structures based on quad- and k-d trees, Acta Informatica 17, 3(1982), 267-285.

Search complexity?

The articles contains no information about the complexity of searching a kd-tree for an exact nearest neighbor?

The algorithm analyzed in the following paper is fairly generic (it uses region queries), but does apply to kd-trees. I don't believe there's an existing article about the algorithm itself, but it would be nice to have one. Cleary, J. G. 1979. Analysis of an Algorithm for Finding Nearest Neighbors in Euclidean Space. ACM Trans. Math. Softw. 5, 2 (Jun. 1979), 183-192. DOI= http://doi.acm.org/10.1145/355826.355832

Adding elements?

Doesnt adding elements in the method described lead to an unbalanced tree? If this is so then i think it should be made explicitly clear, and a rebalancing algorithm (or a way to preserve balance) should be given User A1 07:21, 1 December 2006 (UTC)[reply]

kd-Tree is a static data structure, so yeah adding an element will introduce disorder. Rebalancing it is perhaps outside the realms of the basic description. IlyaTheMuromets 22:45, 8 Jan 2007 (EST)

Nearest neighbour

Hello,

I added the NN algorithm, but i have hidden it in comments as it was a little off the cuff. Feel free to fix it, i will go over it again later.

Thanks User A1 12:54, 2 December 2006 (UTC)[reply]

I don't suppose you can clarify this section, I find the way it is explained higly confusing (Not that confusing me is particularly hard). For example 'the tree is searched in a depth-first fashion, refining the nearest distance' this should be put in a more understandable way, such what do you mean by depth-first, refine the nearest distance how, and what do you mean by nearest difference. We need to assume the readers are not programmers, as they most likely aren't (though however, I am). --Chase-san (talk) 10:34, 11 January 2008 (UTC)[reply]

A year later and someone replies! I made some very minor changes, (wikified depth-first and hypersphere) later I will add a reference and look into making an animation. User A1 (talk) 13:31, 11 January 2008 (UTC)[reply]

I'm not sure that the proposed algorithm is correct. The explanation gives us an idea how the algorithm works but no reassurance that it is correct and runs in logarithmic time. It seems to me that nearest neighbour with an Euclidean (L2) metric given any Kd-tree produced by a process which only considers the orders defined by the points x and y coordinates respectively is

\Omega (n^{0.5})

, where n is the number of points. Consider the following argument. Let n-1 >= 4 be square and place

{\sqrt {n-1}}

times

{\sqrt {n-1}}-1

points on coordinates (x,y) for integers

0\leq x<{\sqrt {n-1}},0<y<{\sqrt {n-1}}

. Place n points on coordinates

(i,n-{\sqrt {n^{4}-i^{2}}}-e_{i})

for

0\leq i<{\sqrt {n-1}}

, with e_i = 0 except for k, for which it is simply small enough but greater than zero. Place one final point on (0, -n^2). For any given Kd-tree on this instance produced by the above methods, every other choice of k will produce the same order and therefore also the same Kd-tree. Evaluating the above equation, k will be the point closest to (0, -n^2). For any given Kd-tree of this kind, and for any deterministic algorithm inspecting at most sqrt(n)-2 of the points on the bottom row, the algorithm must fail for at least one instance. It follows that nearest neighbour given a Kd-tree produced by order alone is

\Omega (n^{0.5})

. Is it also

O(n^{0.5})

? — C. lorenz (talk) 19:35, 25 September 2008 (UTC)[reply]

Have a quick look at "An introductory tutorial on kd-trees" by Andrew Moore. To be honest I have not taken the time to understand your comment at this time, though I intend to at some time. User A1 (talk) 23:45, 25 September 2008 (UTC)[reply]

The log(n) comes from the structure of the tree, operating under the assumption of a spatially random distribution of points (there do exist scenarios where the worst case running time is considerably worse than log(n) -- the scenario where the NNs are almost at the same radius from the search point.User A1 (talk) 23:52, 25 September 2008 (UTC)[reply]

I'd buy average-case log n under the uniform distribution but it's important to make that distinction. Would you clarify this and do the same for the section on Kd-trees under Nearest_neighbor_search? Perhaps a reference to e.g. this tutorial? Thanks for the quick reply! — C. lorenz (talk) 06:15, 26 September 2008 (UTC)[reply]

Hi all. I rewrote the algorithm description to hopefully be much clearer. I know it was a tricky algorithm for me to understand when I implemented it. bdodson (talk) 08:41, 7 December 2008 (UTC)[reply]

Hi, I was trying really hard to read through the algorithm description, but I'm finding it really confusing. Is there any way we could get some psuedocode on this? It would be nice to have the implementation of the kd-tree (already in psuedocode and is really helpful) and then how to find neighbors to see the power of kd-trees.

Missing title in title bar

The title of this article does not seem to be showing up in my browser title bar. Is this just me, and if not any suggested soultions?

MadScientistVX 05:09, 23 February 2007 (UTC)[reply]

Odd. I second that, firefox 1.5.0.7 linux x86, despite the presence of <title>Kd-tree - Wikipedia, the free encyclopedia</title> User A1 15:23, 23 February 2007 (UTC)[reply]

Removing elements

This piece is not really clear to me, is someone could clarify it a bit, that would be excellent! Laurens 11:07, 27 April 2007 (UTC)[reply]

Yes, "removing the invariant" is quite unclear. The point that they wish to make is that when deleting a point from the tree, one has to ensure that sub-points are merged back into the original tree in a sensible fashion, ie one that doesn't break the spatial nature of the tree itself.

There are several possible techniques for doing this. Firstly one may simply mark the element that one wishes to delete with some form of ignore flag, such that this node no longer contributes to the algorithms that are run over the tree. This has the nice effect that it is fast to perform on any given node, but the downside is that the tree still has that node in it, so it is still taking up computational space. Secondly one could unlink the node, which we can call node "A" then generate a tree using the subnodes of "A" then link its children back into the tree, this has a higher up front computational cost, but removes "A" from the tree. Also this method may result in unbalanced trees, such that search algorithms run past it may perform in a non-optimal fashion. Finally one could completely destroy the tree and rebuild it, but without using "A". This costs the most computationally, but will produce an optimal (in some sense) tree.

I haven't included this into the article as the bit about linking and unlinking is intentionally vague, as i never actually implemented this when i wrote a KD tree algorithm for NN searching. User A1 13:58, 27 April 2007 (UTC)[reply]

Thanks, that helps a bit :) Laurens 15:07, 27 April 2007 (UTC)[reply]

Bug in query function?

It seems to me that the first d levels in the tree won't be checked against the bounding box in all dimensions. So querying for points in ([6,8], [3,4]) in the example tree would return (7, 2) even though 2 is not in [3,4]. Or have I missed something? 206.248.181.57 (talk) 23:07, 17 November 2007 (UTC)[reply]

No you havent, that algorithm should work if tree definition would be that nodes can't store values - only leafs. You have to check explicitly for boudaries - or you'll get false positives. —Preceding unsigned comment added by 149.156.124.14 (talk) 23:28, 15 January 2008 (UTC)[reply]

Sample code in Ruby

class for generating trees

class KDTree

 attr_reader :root # root node
 attr_reader :points # search results for ortagonal search range

 # creates new tree from list of points - with 'dim' dimensions
 def initialize(points, dim)
   @dim = dim
   @root = KDNode.new(dim).parase(points)
 end

 # adds node to tree - creates unbalance
 def add_node(point)
   @root.add(point)
 end

 def find(*range)
   @points = []
   query(range, @root)
   @points
 end

 def query(range, node)
   axis = node.axis
   median = node.location[axis]

   if node.left && (median >= range[axis].begin)
     query(range, node.left);  #/* Skip left if max of left tree (median) is out of range */
   end
   if node.right && (median <= range[axis].end)
     query(range, node.right); #/* Skip right if min of right tree (median) is out of range */
   end
   if (0..@dim-1).all?{|ax| range[ax].include? node.location[ax]} # check if all constrains are fullfilled
     @points << node.location;
   end
 end

 #print tree
 def print
   @root.print
 end

end

class KDNode

 attr_reader :left, :right
 attr_reader :location

 attr_reader :axis

 def initialize(dim, location=nil, left=nil, right=nil)
   @dim = dim
   @location = location
   @left = left
   @right = right
 end

 # node parasing
 def parase(points, depth = 0)
   @axis = depth % @dim

   points = points.sort_by{|point| point[@axis]} # that's why this algorithm isn't o(nlog(n))
   half = points.length / 2 # simplest median choosing

   @location = points[half]
   @left = KDNode.new(@dim).parase(points[0..half-1], depth+1) unless half < 1
   @right = KDNode.new(@dim).parase(points[half+1..-1], depth+1) unless half+1 >= points.length
   self
 end

 def add(point)
   if @location[@axis] < point[@axis]
     @left ? @left.add(point) : @left = KDNode.new(point)
   else
     @right ? @right.add(point) : @right = KDNode.new(point)
   end
 end

 def remove
   self.parse(@left.to_a + @right.to_a, @axis)
 end

 def to_a
   @left.to_a + [@location] + @right.to_a
 end

 def print(l=0)
   @left.print(l+1) if @left
   puts("  "*l + @location.inspect)
   @right.print(l+1) if @right
 end

end

a = []

10.times do |x|

 10.times do |y|
   10.times do |z|
     a << [x, y, z]
   end
 end

end

tree = KDTree.new(a, 3) tree.print

puts " -------------- "

tree.find([0,4], [1,4], [5,7])

pp tree.points.sort_by{|po| po[0]} —Preceding unsigned comment added by 149.156.124.14 (talk) 23:37, 15 January 2008 (UTC)[reply]

A Reference to a Generic Java Implementation of the KD Tree Needed

Recently I posted a link to a Java implementation of the KD tree which uses the new Java 5 construct of generics (similar to templates in C++). The link was as follows:

Savarese Algorithms

However the addition was removed with the comment that the contributor (Savarese, not myself) was "just an individual" and that although the library was open source that apparently it was somehow not open enough to merit mentioning on the wikipedia page.

Having done a fair bit of research on the web I can attest that Generic Java implementations of KDTrees are not well represented and that, all other concerns aside, the Savarese implementation is one of unusually high quality (modern implementation, full JUnit tests, consistent code quality). I, as a Java programmer, would find a reference to a implementation in my language of choice most useful, as I imagine others would.

So my question is this, how is it possible to get a reference on the Wikipedia KD Tree page to a high quality Java KD Tree implementation?

Would it be necessary for me to do a clean room implementation of a KD Tree using Generics then I suppose get some sanctioning body to accept it so that it won't just be "some guy" implementing a KD Tree? If so it seems rather a high bar and not really in the spirit of individual contributors as I thought Wikipedia to be.

As a side note, and hardly fully definitive, the crowd seems to give a nod to the Savarese implementation as it is the first google hit if you search "KD Tree Java". —Preceding unsigned comment added by Ltburch (talk • contribs) 17:35, 21 April 2008 (UTC)[reply]

bug in python code?

I think this is wrong

   node.leftChild = kdtree(pointList[0:median-1], depth+1)
   node.rightChild = kdtree(pointList[median+1:], depth+1)

and should be

   node.leftChild = kdtree(pointList[0:median], depth+1)
   node.rightChild = kdtree(pointList[median:], depth+1)

 —Preceding unsigned comment added by LodeLeroy (talk • contribs) 14:41, 5 August 2008 (UTC)[reply]

Bug correction in python code.

The slicing notation in the python code would lead to losing a predecessor to the median. --Revpj (talk) 03:51, 10 August 2008 (UTC)[reply]

Balancing

Hey. I'd be happy to write up about rebalancing the tree if I knew how it's done or I could find a paper on it. At the very least we should have a reference to something. I believe it's possible, but I can't find anything easily. If you can point me to a paper, that would be great. bdodson (talk) 08:44, 7 December 2008 (UTC)[reply]