Talk:Hopcroft–Karp algorithm

Pseudocode

Is it just me, or is the pseudocode rather hard to read? For example, the first loop tripped me up a bit, because it is some awkward mixture of high-level notation ((u,v) in E) and low-level implementation (for node u, node v). Furthermore, the code checks for nodes being in M. I do not understand this, as M is a matching and thus a set of edges? --Tetha (talk) 05:49, 11 June 2008 (UTC)[reply]

M is a matching. When the code asks whether a vertex is in M, it means, is it an endpoint of one of the edges in M? I don't think it's any shorter or clearer than what's here, but if it would help you to see actual code for this algorithm, I have some here. (You can ignore the "imperfections" routine in that file; it solves a different and somewhat more complicated problem.) —David Eppstein (talk) 06:32, 11 June 2008 (UTC)[reply]
At least for me, the python code helps a lot. My definition of a matching came from graph theory, where a matching was just a set of edges that encode the matching (that is, an edge (u,v) in M means: u is matched with v). Your matching is a mapping that maps a node u to the node v, with u being matched with v. In your world, it makes a lot of sense to ask: node in matching, in mine, it did not (I rather would ask: exist an x with (x, u) in M.). Thank you. --Tetha (talk) 09:39, 11 June 2008 (UTC)[reply]
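
To make the distinction above concrete, here is a minimal sketch (Python; the names are purely illustrative, not taken from the linked code) of the two representations of a matching and how the question "is this vertex in M?" reads in each:

M_edges = {("u1", "v1"), ("u2", "v2")}    # matching as a set of edges (the graph-theory view)

M_map = {"u1": "v1", "v1": "u1",          # matching as a partner map (the view used in the Python code)
         "u2": "v2", "v2": "u2"}

def is_matched_edge_set(x, M):
    # "x in M" in the edge-set view really means: is x an endpoint of some edge of M?
    return any(x in edge for edge in M)

def is_matched_map(x, M):
    # In the partner-map view the same question is a single O(1) dictionary lookup.
    return x in M

assert is_matched_edge_set("u1", M_edges) == is_matched_map("u1", M_map) == True
assert is_matched_edge_set("u3", M_edges) == is_matched_map("u3", M_map) == False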

LIFO queue?

Page says:

We also maintain a queue(last in first out datastructure)

Last in, first out is a stack, no? —Preceding unsigned comment added by Talex (talkcontribs) 11:40, 20 July 2008 (UTC)[reply]

Queue is FIFO, stack is LIFO... that sentence seems strange to me. 210.122.65.2 (talk) 07:50, 18 August 2008 (UTC)[reply]

Incorrect pseudo-code

I am sorry that I was trigger-happy with the main article. Only after I had completed my edit did I realize that it should have been discussed on this talk page first. (This is my first time editing a Wikipedia page.)

I did not touch the pseudo-code itself, only the part that explains it. It may take me much longer to write a readable and correct version of the pseudo-code.

The pseudo-code is not complicated without reason: it is incorrect. I realized this after I implemented the algorithm, only to see that in some cases it requires almost |V| iterations (rather than the promised O(√|V|)).

The main problem with the pseudo-code (other than clarity) is that the BFS stops immediately when it discovers an augmenting path. This is against the way the original algorithm works, which generates a maximal set of vertex-disjoint augmenting paths in each phase.

The proof that only O(√|V|) iterations are needed is based on the premise that each BFS phase returns strictly longer augmenting paths than the previous one. The current pseudo-code does not work this way.

For example, consider a graph with edges (divided into n subsets):

  • (u01, v01) (u01, v02) (u02, v01)
  • (u11, v11) (u11, v12) (u12, v11)

...

  • (u_n1, v_n1) (u_n1, v_n2) (u_n2, v_n1)

With an initial matching consisting of the first edge of each subset (the edges marked in bold above), the pseudo-code will require n iterations, whereas the real algorithm requires only one. Michael Veksler (talk) 23:09, 9 October 2008 (UTC)[reply]
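
For anyone who wants to reproduce this, here is a small Python sketch that builds the family of graphs described above (the helper name bad_example and the tuple vertex labels are illustrative, not from the article, and it assumes the bold edges are the first edge of each subset):

def bad_example(n):
    # n independent gadgets; gadget i has left vertices u_i1, u_i2 and right
    # vertices v_i1, v_i2, with edges (u_i1, v_i1), (u_i1, v_i2), (u_i2, v_i1).
    adj = {}    # left vertex -> set of right neighbours
    match = {}  # initial matching, left -> right (the edges (u_i1, v_i1))
    for i in range(n):
        u1, u2 = ('u', i, 1), ('u', i, 2)
        v1, v2 = ('v', i, 1), ('v', i, 2)
        adj[u1] = {v1, v2}
        adj[u2] = {v1}
        match[u1] = v1
    return adj, match

# Relative to this matching, each gadget has exactly one augmenting path,
# u_i2 - v_i1 - u_i1 - v_i2, of length 3.  The n paths are vertex-disjoint,
# so a single Hopcroft-Karp phase should find all of them; a BFS that stops
# at the first augmenting path handles only one gadget per phase, hence n phases.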

Copied the suggested pseudo-code to the main page. Removed it from here to avoid redundancy of a big piece of information. Michael Veksler (talk) 07:21, 11 October 2008 (UTC)[reply]

The pseudo-code was introduced by Michael Veksler (talk) 01:31, 10 October 2008 (UTC)[reply]

Minor fixes to the previous version of the algorithm (also by me). Michael Veksler (talk) 06:39, 10 October 2008 (UTC)[reply]

This pseudo-code is huge. Maybe it should be rewritten in Python, as Tetha (talk) has suggested. Unfortunately, I have never written more than a few lines of Python at a time.

Log: Using u.matching instead of M throughout the algorithm. This gives the right sense of the O(1) nature of operations on M. Michael Veksler (talk) 08:30, 10 October 2008 (UTC)[reply]

Log: Put the constraints on the "for all" elements on the "for all" line itself, rather than on a separate if line.

I have implemented the pseudo-code and it does not seem to be working (Kursancew) —Preceding unsigned comment added by 189.95.58.234 (talk) 15:38, 25 November 2008 (UTC)[reply]

I regret that this pseudo-code came out as huge as it is. Maybe it will be possible to refine it into something saner than what I could come up with at the time. I have a Perl implementation of the algorithm that works, but its readability is poor. If anyone wants to have a look at that Perl implementation, I will be happy to mail it or make it public. Michael Veksler (talk) 09:34, 16 December 2008 (UTC)[reply]

Terminology: augmenting and alternating paths

It would be nice if the definitions of augmenting/alternating paths in Matching#Definition and Hopcroft–Karp algorithm#Alternating paths matched. — Miym (talk) 23:04, 5 March 2009 (UTC)[reply]

It turns out that the Matching article's terminology matches the terminology in Ahuja, Magnanti, and Orlin. So I'll fix up this article to match. —David Eppstein (talk) 23:19, 5 March 2009 (UTC)[reply]

Weighted graphs?

I read some papers that, for a maximum weighted bipartite graph matching, claim they use Hopcroft-Karp instead of the Hungarian algorithm, since it's O(n^2.5) instead of O(n^3). See for example "Tracking Across Multiple Cameras With Disjoint Views" (Javed, Shah, 2003). Are these guys just clueless people trying to make it seem like they're using the new "hip" algorithm that is faster instead of the "old" Hungarian algorithm? Hopcroft-Karp has nothing to do with weights. Or am I missing something? This can be considered a widely referenced paper in Computer Vision. Jotaf (talk) 18:22, 31 August 2010 (UTC)[reply]

If you're missing something, so am I; I don't know of a weighted version of Hopcroft-Karp. —David Eppstein (talk) 19:33, 31 August 2010 (UTC)[reply]

Pseudo-code conflicts with the algorithm

The pseudo-code runs the DFS from vertices that have a Dist of 0, which start a path, whereas the algorithm states that we should begin from vertices that end a path. In terms of Dinic's algorithm, this corresponds to BFS from the sink and DFS from the source. I am not sure whether this makes a difference to the time complexity. — Preceding unsigned comment added by Tonyxty (talkcontribs) 02:04, 14 April 2011 (UTC)[reply]

BFS should stop as soon as it reaches NIL

It's been several hours since I implemented this algorithm. This pseudocode is really helpful but...

The BFS function does not do what it is supposed to do. As soon as it reaches NIL, it should clear the queue Q and return true; then there is no need to check the NIL distance at the end of the function body, and false should be returned there instead. As it stands, the algorithm keeps going, causing a (not asymptotically significant) loss of time and possibly strange behavior on the undefined Adj[NIL]. 80.188.71.176 (talk) 02:10, 16 January 2012 (UTC)[reply]
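
For concreteness, here is a minimal Python sketch of the BFS with the suggested early exit. It assumes the article's Dist/Pair_U/Pair_V/Adj/NIL conventions and is only meant to illustrate the point, not to replace the pseudocode:

from collections import deque

INF = float('inf')

def bfs(U, Adj, Pair_U, Pair_V, Dist, NIL):
    Q = deque()
    for u in U:
        if Pair_U[u] == NIL:       # free left vertices start the search
            Dist[u] = 0
            Q.append(u)
        else:
            Dist[u] = INF
    Dist[NIL] = INF
    while Q:
        u = Q.popleft()
        for v in Adj[u]:
            w = Pair_V[v]
            if w == NIL:
                # A free right vertex has been reached: record the shortest
                # augmenting path length for the DFS and stop immediately,
                # so Adj[NIL] is never consulted.
                Dist[NIL] = Dist[u] + 1
                return True
            if Dist[w] == INF:
                Dist[w] = Dist[u] + 1
                Q.append(w)
    return False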

"Analysis"[edit]

"It can be shown that each phase increases the length of the shortest augmenting path by at least one: the phase finds a maximal set of augmenting paths of the given length, so any remaining augmenting path must be longer."

This is not that simple. In fact, showing that the length of the shortest augmenting path increases is perhaps the most difficult part of the entire proof. The error comes down to the distinction between "maximal" and "maximum": there may be other augmenting paths of the same length, just not ones that are vertex-disjoint from those already found. 68.61.30.23 (talk) 16:48, 21 April 2013 (UTC)[reply]

Example Python code

def match_hopcroft_karp(adjLR):
    """Find a maximum matching in a bipartite graph L u R, given as a dict that
    maps each vertex in L to an iterable of its neighbours in R."""

    # 'Globals' shared by bfs, dfs and the main body
    matchLR = {}   # current matching, L -> R
    matchRL = {}   # current matching, R -> L
    lev = {}       # BFS level of each left vertex in the current phase

    def bfs():
        L = {l for l in adjLR if l not in matchLR}   # free left vertices
        lev.clear()
        depth = 0
        seen = set()
        # Build each level
        while len(L) > 0:
            for l in L:
                lev[l] = depth
            L1 = set()
            for l in L:
                for r in adjLR[l]:
                    if r not in seen:
                        # l->r must be unmatched now, since 'seen' prevents return to l's parent on right
                        seen.add(r)
                        if r not in matchRL:
                            return True   # a free right vertex ends a shortest augmenting path
                        L1.add(matchRL[r])
            depth += 1
            L = L1
        return False

    def dfs(l, targl):
        if lev.get(l, None) != targl:
            return False
        lev[l] = None   # mark l as used in this phase
        for r in adjLR[l]:
            if r not in matchRL or dfs(matchRL[r], targl + 1):
                matchLR[l] = r
                matchRL[r] = l
                return True
        return False

    nmatch = 0
    while bfs():
        for l in adjLR:
            if l not in matchLR and dfs(l, 0):
                nmatch += 1
    return (nmatch, matchLR, matchRL)

if __name__ == '__main__':
    adj = {0: {'a', 'b', 'c'}, 1: {'d'}, 2: {'a', 'c', 'e'}, 3: {'d', 'e'}, 4: {'d'}}
    print(match_hopcroft_karp(adj))

Placeholder edit to start conversation with David Eppstein. (It's past my bedtime now.) I've reordered the statement 'lev[l]=None' because I think it's clearer that way, though equivalent. Each vertex gets marked as used as soon as it is processed. I think this means that dfs() can only be called at most 2m times in a given phase, once for each edge direction. (Under the normal assumption that the graph is connected, otherwise we'd have to say 2m+n.) I'm sorry if my reverts were rude - I'm not very familiar with Wikipedia editing and the best way to converse with other editors. Alex Selby (talk) 05:18, 15 April 2017 (UTC)[reply]

My general feeling is that we should not have example code in Wikipedia at all, except possibly for very simple algorithms. It is too hard to verify and too easy to make big mistakes, too much original research (unless inappropriately copied from somewhere else) and too far from something that will actually assist human readers in understanding the subject. I think my earlier objections (that because this doesn't make each individual dfs call fast, it doesn't meet the advertised time bound) may have been mistaken, but I think the bigger point is still valid. —David Eppstein (talk) 07:35, 15 April 2017 (UTC)[reply]
Ah, that is a more fundamental objection than I expected. From my point of view, I find example code a very useful adjunct in understanding a concept (alongside an informal description, which is also very helpful), and judging by the comments here I am not the only one. Compared with an informal description, where a turn of phrase can hide subtle difficulties, actual code is at least unambiguous. I think Python code has both the advantages of pseudocode, in that it reads more-or-less like English, and also the advantages of a formal system, in that it follows an unambiguous system that anyone can execute. For example, in the present pseudocode, Q isn't initialised anywhere. If you were trying to understand the algorithm for the first time, this would be annoying (does Q persist across calls to BFS or not?). And in fact in Hopcroft and Karp's original paper, there is a bug in their DFS pseudocode (missing DELETE). Of course these can be fixed, but my point is they would never have been there in the first place in code that you can actually execute. Obviously I realise that code can have bugs, but it is forced at least to make sense by the discipline of the language.
For these reasons I'd be happy to do away with the pseudocode in favour of Python code (not necessarily the program I submitted). The present pseudocode is virtually Python already, but (as mentioned above) it wouldn't work as it is. (Also, I think it is a less straightforward/direct way of writing the algorithm to introduce the special 'NIL' node, though I suppose that is a matter of taste.)
But I'm not sure how deep your objections go. Perhaps you also think that pseudocode is bad in a Wikipedia article. I would disagree with that too, in that I believe that pseudocode is better than no code at all for similar reasons to those above. Pseudocode imposes more of a discipline than an informal description and makes it harder for the writer to gloss over a tricky point and easier for a reader to cut to the point of what they want to understand. Alex Selby (talk) 11:56, 15 April 2017 (UTC)[reply]
No, pseudocode is ok, because its primary purpose is to be read by humans. Another disadvantage of actual code is that when it is present the articles containing it quickly turn into code farms, with the Ruby and Go and Erlang and Haskell and Javascript programmers all wanting to show how it's done in their languages. —David Eppstein (talk) 15:42, 15 April 2017 (UTC)[reply]
  • Comment. I've removed the Python code. Generally, there is only one implementation of an algorithm per article; other implementations would be redundant; WP does not need editors supplying implementations in their favorite language. This article has a pseudo-code "implementation". Changing to a different implementation needs consensus on the talk page. I'm OK with the pseudocode and would not change to the Python.
Furthermore, algorithms should be algorithms. I'm opposed to any implementation that includes side-effects to a piece of paper.
Glrx (talk) 17:56, 24 April 2017 (UTC)[reply]

Remove the link to Mahajan's lecture notes

In the references section there is a link to lecture notes by Meena Mahajan. Up to a point they are reasonably good, but Corollary 15, which is a central statement in those notes, makes a claim about a matching M*, and the proof assumes that M* is a maximum matching, while the statement is true for any matching; indeed, elsewhere in the same notes it is applied to a generic matching rather than a maximum one. In the original paper of Hopcroft and Karp, Theorem 1 gives the statement for an arbitrary matching in a refined manner and proves it correctly. Unfortunately the original article is still not freely available, but that is the paper I am referring to. — Preceding unsigned comment added by Jakab922 (talkcontribs) 14:06, 14 March 2019 (UTC)[reply]