
Talk:Disjoint-set data structure

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 45.49.18.32 (talk) at 18:18, 23 January 2016. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks. This article has been rated as Start-class on Wikipedia's content assessment scale; it has not yet received a rating on the project's importance scale.

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks. This article has been rated as Start-class on Wikipedia's content assessment scale and as High-importance on the project's importance scale.

disjoint set

This is also known as MF-Set (Merge-Find-Set), isn't it?--216.244.232.1 03:37, 18 Oct 2004 (UTC)

I was not aware of this. Thanks. I've added this information and the appropriate redirects (merge-find set, merge-find-set, merge-find data structure). Derrick Coetzee 05:45, 18 Oct 2004 (UTC)

Some references

The first description of the disjoint-sets forest idea is found in:

Bernard A. Galler and Michael J. Fischer. "An improved equivalence algorithm", Communications of the ACM, Volume 7, Issue 5 (May 1964), pages 301-303.

Further reference:

Zvi Galil and Giuseppe F. Italiano. "Data structures and algorithms for disjoint set union problems", ACM Computing Surveys, Volume 23, Issue 3 (September 1991), pages 319-344.

Thanks for the references. I'll add them. Deco 04:10, 24 December 2005 (UTC)[reply]

Does the source code belong here?

While the source code is no doubt useful, it's not really commented on that much in the rest of the entry. Might it belong in Wikisource? --Jacj 14:21, 8 May 2006 (UTC)[reply]

some notes

wouldn't the Fredman and Saks result need a reference? (at the end of the "Disjoint-set forests" section)

also, isn't this the same algorithm that physicists know as Hoshen-Kopelman? I will try to add that in some appropriate way later.

Disjoint-set forests algorithm complexity

On the page http://en.wikipedia.org/wiki/Big_O_notation#Common_orders_of_functions it says that the complexity of the find algorithm of Hopcroft and Ullman on a disjoint set is iterated logarithmic. The iterated logarithm is the inverse of tetration. On this page, at the end of the section Disjoint-set forests algorithm complexity, however, it says that the complexity is the inverse of the Ackermann function.

It also says on this page that the inverse of the Ackermann function is less than 5 for all practical n, while on the Iterated_logarithm page it says the inverse of tetration (= the iterated logarithm) is less than 5 for all practical n.

Tetration and the Ackermann function are related, but they are not the same.

What is correct and what isn't?

Lodev 15:10, 10 September 2007 (UTC)[reply]

Both are correct, in that both are proven upper bounds for the runtime of these operations. The inverse Ackermann function grows more slowly than the iterated logarithm, so it's a better bound. This is already described briefly in the History section of this article. Dcoetzee 18:24, 10 September 2007 (UTC)[reply]
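As a quick illustration of how slowly these bounds grow, here is a sketch of the iterated logarithm in Python; the inverse Ackermann function grows more slowly still. The function name is illustrative, not taken from the article.

```python
import math

def log_star(n):
    """Iterated logarithm: how many times log2 must be applied
    before the value drops to 1 or below."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

print(log_star(16))        # 16 -> 4 -> 2 -> 1, so 3
print(log_star(2 ** 64))   # still tiny for astronomically large inputs
```

The value stays at or below 5 for every input up to 2^65536, which is why both bounds are "effectively constant" in practice.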

Hantao Zhang says that the time is linear

This paper of Hantao Zhang [1], a CS professor at the University of Iowa, claims that the bound given in the Wikipedia article is not the best possible, and that a more careful analysis shows the algorithm to have linear worst-case time. I am following up with Zhang about the provenance of the paper, and whether it was published. -- Dominus (talk) 23:58, 21 January 2008 (UTC)[reply]

I was going to say that there's a proven lower bound, but the paper addresses this. However, I'm not confident that this paper, which is still very new, has been thoroughly peer-reviewed. I'd like to see it published in a conference or journal or at least a workshop first. Dcoetzee 03:28, 22 January 2008 (UTC)[reply]
It has not yet been published or peer-reviewed. It is under submission to STOC 2008. Zhang had not been aware of the Fredman and Saks thing, although he had been aware of similar work that proves the same result. I will post an update if I learn anything new. -- Dominus (talk) 23:51, 28 January 2008 (UTC)[reply]
The paper was rejected from STOC 2008. -- Dominus (talk) 15:15, 27 February 2008 (UTC)[reply]
Is there any indication of why the paper was rejected -- did the reviewers find mistakes, or did they simply not think it was important enough? 24.80.10.198 (talk) 03:28, 14 April 2008 (UTC)[reply]
I suppose there were mistakes in it, since Zhang has removed it from his website. Does someone remember the title of the paper? Adrianwn (talk) 09:36, 18 April 2008 (UTC)[reply]
The title was The Union-Find Problem Is Linear. I still have a copy. -- Dominus (talk) 13:56, 18 August 2008 (UTC)[reply]

Shorten Applications section

I will shorten the Applications section; in particular I will remove all subsections except the one related to undirected graphs. Rationale: Wikipedia is not a textbook, it is also not a programming manual or guide. Books or websites about algorithms or programming are better suited for such descriptions; however, references to them would be appropriate. Adrianwn (talk) 09:50, 18 April 2008 (UTC)[reply]

I disagree with this edit and with your interpretation of "Wikipedia is not a textbook". The policy page goes on to say: "It is not appropriate to create or edit articles which read as textbooks, with leading questions and step-by-step problem solutions as examples." This seems to imply that this has more to do with style than content; applications of a data structure are certainly relevant to the data structure, just as carbon dioxide describes industrial uses of that chemical in great detail. Dcoetzee 20:57, 18 April 2008 (UTC)[reply]
You are right, the applications of this data structure should be listed. However, the Applications section contained a lot of those step-by-step instructions. Could you please rewrite that section in a shorter, more concise way? I would do it myself if I understood more about these specific subjects. Adrianwn (talk) 07:20, 19 April 2008 (UTC)[reply]
You're right, it ought to summarize the applications rather than detail the application algorithms, but the point is to describe the mapping of various application domain events onto disjoint-set operations. I'll take a look at it again, at least the ones that I wrote. Dcoetzee 06:06, 30 April 2008 (UTC)[reply]
Also, "weighted union heuristic" is a term taken directly from CLRS, and it's used in many publications; you would have found many more Google hits if you tried searching for it with the hyphen ("weighted-union heuristic"). For this reason I've added it back. Dcoetzee 21:02, 18 April 2008 (UTC)[reply]
Nope, still gives about 200 hits (Google doesn't care much about hyphens), and only 11 hits in Google Scholar. Also, as far as I remember, CLRS never calls it "the" weighted-union heuristic, but rather "a" weighted-union heuristic (i.e., a description of the method instead of a name for it). What do you think about: "When the length of each list is tracked, the time needed can be ameliorated by always appending the smaller list to the longer. Using this "weighted-union heuristic", a sequence of [...]" ? Adrianwn (talk) 06:59, 19 April 2008 (UTC)[reply]
Sure, sounds fine. Dcoetzee 06:08, 30 April 2008 (UTC)[reply]
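For concreteness, the heuristic quoted above ("always appending the smaller list to the longer") can be sketched as follows; the class and names are illustrative, not taken from the article or CLRS:

```python
class ListSets:
    """Disjoint sets as Python lists, merging the smaller list into
    the larger one (the weighted-union heuristic discussed above)."""
    def __init__(self):
        self.set_of = {}          # element -> the list that holds it

    def make_set(self, x):
        self.set_of[x] = [x]

    def find(self, x):
        # The representative is the first element of x's list.
        return self.set_of[x][0]

    def union(self, x, y):
        a, b = self.set_of[x], self.set_of[y]
        if a is b:
            return
        if len(a) < len(b):       # always append the smaller list
            a, b = b, a
        a.extend(b)
        for elem in b:            # only the smaller list's elements move
            self.set_of[elem] = a

s = ListSets()
for v in "abcd":
    s.make_set(v)
s.union("a", "b")
s.union("c", "d")
s.union("a", "c")
print(s.find("d") == s.find("a"))   # True
```

Because each element can move at most O(log n) times (its list at least doubles on every move), a sequence of n-1 unions costs O(n log n) overall rather than O(n^2).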
If step by step instructions have been removed from the article, maybe they could be transferred to wikibooks and linked to from here. —Preceding unsigned comment added by 207.241.238.233 (talk) 00:08, 30 April 2008 (UTC)[reply]

molecular dynamics

Why is this citation in the reference section? The article doesn't appear to actually refer to it.

J. A. Spirko and A. P. Hickman, Molecular-dynamics simulations of collisions of Ne with La@C82, Phys. Rev. A 57, 3674–3682 (1998). —Preceding unsigned comment added by 207.241.238.233 (talk) 00:06, 30 April 2008 (UTC)[reply]

Path compression algorithm

The find algorithm listed under path compression is the same one as the normal find, it does not include path compression at all. I figure it should be more like the following:

 function Find(x)
     if x.parent == x
        return x
     else
        l = [ x ]
        x = x.parent
        while ( x != x.parent)
           l += x
           x = x.parent
        for y in l
           y.parent = x
        return x

Wppds (talk) 08:02, 20 October 2008 (UTC)[reply]

No, it isn't the same one and yes, it does perform path compression. The difference is in the else-part of the if-statement:
 x.parent := Find(x.parent)
Once Find() terminates, it returns the root node. This root node then becomes the new parent of each traversed node. Adrianwn (talk) 15:35, 20 October 2008 (UTC)[reply]
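The recursive form described above can be sketched in Python; the node class is purely illustrative:

```python
class Node:
    def __init__(self):
        self.parent = self     # a fresh node is its own representative

def find(x):
    if x.parent is not x:
        # Every node on the search path is repointed at the root on
        # the way back out of the recursion: this is path compression.
        x.parent = find(x.parent)
    return x.parent

# Build a chain a <- b <- c, then find(c) flattens it.
a, b, c = Node(), Node(), Node()
b.parent = a
c.parent = b
root = find(c)
print(root is a)        # True
print(c.parent is a)    # True: c now points directly at the root
```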
Right, strange that I missed that. I'll delete this comment in a few days then.Wppds (talk) 09:01, 5 January 2009 (UTC)[reply]
Why? It is neither necessary nor customary to delete comments from a talk page. Often this will be reverted anyway; sometimes it might even be considered vandalism. Don't worry, we all make a mistake sooner or later :-) Adrianwn (talk) 12:38, 5 January 2009 (UTC)[reply]
Heh, I figured they were to be cleaned up at some point. I did read the guidelines now; archivable then, when this gets too long. Wppds (talk) 16:08, 7 February 2009 (UTC)[reply]

first bullet

In the first section, it says "Find: Determine which set a particular element is in. Also useful for determining if two elements are in the same set." But if this is a disjoint set, how can an element be in two sets? —Preceding unsigned comment added by 24.1.30.246 (talk) 16:18, 7 April 2009 (UTC)[reply]

You misread it - it says "determine if two elements are in the same set", not "determine if two sets both contain the same element". Dcoetzee 22:57, 7 April 2009 (UTC)[reply]

Rank Heuristic algorithm error?

Oh gosh, I'm on a roll of stupid edits... Sorry for the noise ;-)

Complexity

The quote from the paper is unclear: is the Ackermann complexity the least an algorithm must use (a lower bound), or is it an upper bound (the truly optimal complexity is no more than the Ackermann complexity)? --Gwern (contribs) 01:36 11 January 2010 (GMT)

I think it means that the Ackermann function is an upper bound for all inputs, and a lower bound for some inputs (i.e., the worst case). — Adrianwn (talk) 07:26, 11 January 2010 (UTC)[reply]

How is Union defined ?

Is it Union(x,y) where x and y are representatives? Cormen writes that they are not. I am confused. The difference is whether we have to find the roots or not. Webskipper (talk) 15:58, 21 May 2010 (UTC)[reply]

It depends on how Union(x,y) is defined; the pseudocode in this article does not assume that x and y are representatives. – Adrianwn (talk) 16:22, 21 May 2010 (UTC)[reply]
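A sketch of that convention, where Union looks up the representatives itself so x and y need not be roots; the dict-based forest below is illustrative, not the article's pseudocode:

```python
parent = {}

def make_set(x):
    parent[x] = x

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving keeps paths short
        x = parent[x]
    return x

def union(x, y):
    # x and y may be any elements: find the roots first, then link them.
    root_x, root_y = find(x), find(y)
    if root_x != root_y:
        parent[root_y] = root_x

for v in (1, 2, 3):
    make_set(v)
union(2, 3)          # neither argument is required to be a root
union(1, 3)
print(find(3) == find(1))   # True
```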


Error in the implementation

The implementation listed as Implementation of Disjoint-set Forests in C++, by Bo Tian, does not seem to update the path (it does not do path compression), which is the whole point. — Preceding unsigned comment added by 85.164.124.173 (talk) 17:57, 20 July 2011 (UTC)[reply]

I've removed the C++ links, as the code was poor quality. Follow the pseudocode in the article instead. The Python implementation also looks reasonably okay. If you're looking to do this in C++ in a production setting you should be using the Boost implementation instead. Dcoetzee 04:46, 21 July 2011 (UTC)[reply]

Linked lists

The article claims that concatenation of linked lists is possible in constant time. This is not so: you have to append the second list to the tail of the first one, which requires O(|L1|) time, where L1 is the first list.

To get such a bound, we need more, such as a pointer to the end of each list. These are not strictly "linked lists", and I suggest that this be mentioned.

STL (for C++) provides "splice" on lists, which is O(1). To support this, lists are actually doubly linked lists; for instance in GNU C++ (libstdc++), they are even circular doubly-linked lists with direct access to the last element.

Akim Demaille (talk) 13:33, 23 October 2011 (UTC)[reply]

Keeping a tail pointer for a linked list is a trivial extension and extremely common in practice (I'd argue, more common than not keeping such a tail pointer). I really don't think this requires any further explanation. It is true that STL splice is useful for implementing list merging in C++, but that's an implementation detail and not terribly relevant here. Dcoetzee 20:10, 24 October 2011 (UTC)[reply]

I see another error, or at least a significant omission, regarding linked lists. The article claims that by adding a pointer back to the head on each node, find can be implemented in constant time, but doesn't explain how this is possible. Find on a linked list typically requires visiting every element, and I don't see how pointers back to the head on each node would change this. 75.21.70.205 (talk) 23:02, 25 October 2011 (UTC)[reply]

You're confusing the linked list "find" operation with the disjoint-set data structure "find" operation. The role of the first is to find a list element with a particular value. The role of the second is, given an element, to find its set representative - which we have chosen to be the head of the list that element belongs to. I've edited to attempt to clarify this. Let me know if this can be improved further. Dcoetzee 01:41, 26 October 2011 (UTC)[reply]
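A toy sketch of the distinction drawn above: each element stores a pointer to the head of its list, so the disjoint-set Find is a single pointer dereference rather than a traversal. The representation is illustrative, not the article's:

```python
class Element:
    def __init__(self, value):
        self.value = value
        self.head = self      # back-pointer to the set's head (representative)
        self.next = None

def find(x):
    # Constant time: no traversal, just follow the head pointer.
    return x.head

def union(x, y):
    # Naive merge: append y's list onto x's and fix the head pointers.
    hx, hy = x.head, y.head
    if hx is hy:
        return
    node = hx
    while node.next is not None:
        node = node.next
    node.next = hy            # a stored tail pointer would make this O(1)
    node = hy
    while node is not None:   # repoint every moved element at the new head
        node.head = hx
        node = node.next

a, b, c = Element("a"), Element("b"), Element("c")
union(a, b)
union(a, c)
print(find(c) is a)   # True
```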

Complexity of path compression

If I use only path compression without height/rank comparing, what is the complexity of the algorithm? I remember that somebody told me that it would be O(log n) but I'm not sure. It would be nice if the article provide this questionable complexity. --Nullzero (talk) 05:51, 12 February 2013 (UTC)[reply]

Run time of MakeSet in Disjoint-set forests

Maybe I'm missing something very simple, but why does it say that the run time of MakeSet in the Disjoint-set forests implementation is O(log(n))? Shouldn't it be O(1)?
Thanks — Preceding unsigned comment added by 79.177.179.58 (talk) 11:00, 14 January 2016 (UTC)[reply]

Fixed. Glrx (talk) 02:54, 16 January 2016 (UTC)[reply]

Can we add a box with the asymptotic runtimes, like some of the other data structures have?