Talk:Clique problem

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Mathematics (Rated GA-class, Mid-importance)
WikiProject Mathematics
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating:
GA Class
Mid Importance
 Field:  Discrete mathematics
WikiProject Computer science (Rated GA-class, Mid-importance)
WikiProject icon This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
 GA  This article has been rated as GA-Class on the project's quality scale.
 Mid  This article has been rated as Mid-importance on the project's importance scale.
 

I thought a clique was a fully connected sub-graph? The way it is described here makes me think a clique is two adjacent vertices... --Bryanlharris 19:21, May 7, 2004 (UTC)

Pairwise adjacent means every pair of vertices in the set are adjacent. This is equivalent to your definition. Deco 19:38, 26 Mar 2005 (UTC)

Proposed move[edit]

I propose we move this to maximum clique problem, which is a commonly used name that is more accurate and specific. I put a redirect in the way, but I'll take care of the move if there's consent. Deco 23:58, 27 May 2005 (UTC)

Seems like a reasonable idea. MathMartin 09:05, 28 May 2005 (UTC)

Removed proof[edit]

I removed the proof that maximum clique is NP-complete and replaced it by an appeal to the NP-completeness of independent set. It was a good proof, but I think it's easier to go through independent set, because there's a lot less edges to talk about filling in. Admittedly the proof on independent set problem isn't all that nice, and any improvement to it would be great. Deco 01:49, 8 Jun 2005 (UTC)

Rewrote independent set problem now. Should be somewhat better. Deco 04:47, 8 Jun 2005 (UTC)

Algorithm feedback[edit]

I suggest a algorithm here. http://insaint03.cafe24.com/clique-en.html I can't be sure it's correct, I need some comment. --insaint 18:36, 26 May 2007 (UTC)

The problem is proven NP-complete, your algorithm runs in O(n^3) - either you deserve $1million prize or the algorithm is wrong. You might try on one of the larger examples given in the link at the bottom of the article and see whether you find the known answer.

Reference[edit]

Can anyone find a source for the 3-clique algorithm mentioned in version 111018197? Some special cases may be solved in less than exponential time. For k = 3, there exists an O(n1.41) algorithm where n is the number of edges. -- Rampion (talk) 15:27, 16 February 2008 (UTC)

I'm going to go ahead and remove it for now. If anyone wants to restore it, please note the reference. -- Rampion (talk) 15:00, 19 February 2008 (UTC)

Problem with maximal clique heuristic[edit]

This section seems to state that arbitrarily merging cliques will always find a maximal (as opposed to maximum) clique. This is not true. Consider 2 disjoint cliques A and B produced by merging. Assume that there is some edge missing between A and B so that they cannot be merged, but that there are vertices u in A and v in B such that u is adjacent to every vertex in B and v is adjacent to every vertex in A. Then A is contained in A union v and B is contained in B union u, so neither are maximal, but no further merging can be done. Merging in some intelligent order may work, but the algorithm as stated does not necessarily find any maximal cliques. Bjoeris (talk) 16:07, 17 April 2008 (UTC)

It seems wrong indeed. I've removed the passage. --Mellum (talk) 12:11, 12 May 2008 (UTC)

Article title[edit]

Maybe this is a good time to think about what this article should be called?

Currently, there are at least two problems I know that qualify as Clique problem. One is Algorithms for finding a clique or Computing a clique or Algorithmic aspects of finding a large complete subgraph or its complement. The other is the Party problem or the Ramsey number problem. Both terminologies can be found in the literature, but the algorithmic interpretation of course wins a google fight, though Mathworld uses the Ramsey interpretation.

Currently, Party problem redirects to Clique problem, and that’s something we clearly need to rectify. One solution is to let the current algorithm (under the current headline) describe both interpretations, but I can’t see how that would work. An easy alternative is to just fix the redirect. That being said, I am mildly in favour to a more descriptive, and exclusive, article title. Algorithms for clique or Finding a clique sound good (but not quite perfect) to me. Thore Husfeldt (talk) 13:27, 18 December 2009 (UTC)

I'm not opposed to changing the name. The imperfection in those titles, to me, is that they omit the complexity theoretic side of research on the clique problem, since that's less about finding a clique and more about not finding a clique. One possibility, though it doesn't excite me very much, would be Clique (computer science). One issue is whether we should choose a name such as the one we have now that refers only to cliques, or whether the name should also include independent sets (in which case, there's more material that needs to be added to the article, as the clique and independent set problems are quite different in most graph families and I omitted anything that was only about independent sets in my rewrite). —David Eppstein (talk) 17:15, 18 December 2009 (UTC)
Indeed, approximation algorithms in bounded-degree graphs are a fairly natural example... And let's not forget other models of computation. From the perspective of distributed computing, independent sets and cliques are very different things. On the one hand, finding a maximal independent set is a truly central and classical problem in the field (and also closely related to Graph coloring#Parallel and distributed algorithms). There are many seminal positive and negative results, and numerous reductions to and from maximal (or near-maximum) independent sets. To someone studying distributed computing, independent set problems are as central as, say, SAT and linear programming together to someone studying polynomial-time computation. On the other hand, I can't say much about clique-finding algorithms from the perspective of distributed computing – it isn't even obvious how one should formulate a "clique problem" so that it'd be interesting and relevant from the perspective of distributed computing. Hence if we try to cover all computational aspects of cliques and independent sets in one article, at least from this perspective it'd be strange if the article was called "Clique problem". — Miym (talk) 19:41, 18 December 2009 (UTC)

I think expanding the scope of the article is probably not what we need to do right now. Also, based on the large recent expansion, it looks like we're set to get a link to this article from the front page via "Did you know?" some time in the next week or so, so if we do make a decision on a better name we should probably hold off on changing it until later. —David Eppstein (talk) 08:15, 19 December 2009 (UTC)

I think a reasonable approach would be to merge Independent set problem into Independent set (graph theory), explain the computational aspects that are specific to independent sets in Independent set (graph theory) (or Maximal independent set), and refer to Clique problem for everything else. That way no changes are needed here. — Miym (talk) 11:30, 19 December 2009 (UTC)
I think the last suggestion is reasonable. Let this article be only about cliques (and things that are common to clique and independent set). Things that only apply to independent sets, can go in Independent set (graph theory) or Maximal independent set. The Independent set articles can link to this article for common algorithms. That way we can still call this article "clique problem", and let it be a comprehensive article which has everything there is to know about cliques (from an algorithms and complexity point of view). --Robin (talk) 21:08, 30 December 2009 (UTC)

Some details[edit]

Some minor details:

  • Figure caption: "Adapted from Sipser." It'd be nice to have a proper citation; we don't have Sipser in the list of references. I guess it refers to the textbook "Introduction to the Theory of Computation", but I don't have it here now.
  • "Parameterized complexity[33] is the complexity-theoretic study of problems such as finding k-cliques in graphs that are naturally equipped with a small integer parameter k, and for which the problem becomes more difficult as k increases": I guess this should be parsed "... study of problems ... that are naturally equipped with ...", but it's easy to read it "... in graphs that are naturally equipped with ..." Is it just me or should we rephrase this somehow?
  • "due to the fact that the clique number takes on integer values and is NP-hard to compute, it cannot have a fully polynomial-time approximation scheme": Strictly speaking, shouldn't it be "small integer values" instead of "integer values"?

Miym (talk) 19:24, 19 December 2009 (UTC)

I handled your second and third points. I'll let Thore address the first since he drew the figure and knows better than I where it came from. —David Eppstein (talk) 20:32, 19 December 2009 (UTC)

Accessible head[edit]

The topic of this article lives on the borderline of what might want to be accessed by a general audience, so I suggest we at least try to give a non-technical introduction in the article head. Here’s just an idea, to get the ball rolling.

The brute force algorithm finds a 4-clique in this 7-vertex graph (the complement of the 7-vertex Path graph) by systematically checking allC(7,4)=35 4-vertex subgraphs for completeness.

In computer science, the clique problem is to find a complete subgraph in a graph, i.e., a set of elements that are pairwise connected. An example is a social network, where the graph’s vertices represent individuals, and the graph’s edges represent mutual acquaintance. To find a largest subset of individuals, all of which know each other, one can systematically inspect all subsets, a process that is too time-consuming to be practical for instances with more than a few dozen individuals. Despite many attempts to find better algorithms for this problem, no significantly better approach is known, and much of the theory about the clique problem is devoted to establish its computational hardness in many models of computation.

What I aim to convey is what the problem is about, what it models, how an algorithm could look, and what kind of phenomenon the theory tries to address. I think the brute-force illustration is pretty informative; it should be clear both what we’re looking for, what the algorithm is doing, and that it would take a long time. An attractive alternative would be instead to illustrate the behaviour of a poly-time algorithm on a restricted graph class; the message here would then have to be that in general, we can’t get this approach to work. Thore Husfeldt (talk) 17:45, 20 December 2009 (UTC)

Your proposed new lede looks good to me. I agree that the existing one is too technical. —David Eppstein (talk) 17:49, 20 December 2009 (UTC)

Infobox?[edit]

[1]

Graph coloring
6n-graf-clique.svg
Decision
Name Clique, k-Clique
Input Graph G with n vertices. Integer k.
Output Does G contain a complete subgraph with k vertices?
Running time O(n.792k)
Complexity NP-complete, W[1]-complete
Garey–Johnson GT19
Optimisation
Name Clique number, Maximum Clique
Input Graph G with n vertices.
Output \alpha(G)
Complexity NP-hard
Approximability O(n log-3 n log log2n)
Inapproximability n1-\epsilon unless P = NP
Counting problem
Input Graph G with n vertices
Output The number of inclusion-maximal cliques in G
Running time O(3n/3)
Complexity #P-complete
Approximability FPRAS for restricted cases
Inapproximability No PTAS unless P = NP

I made an infobox for Graph coloring once, and maybe we want one here as well. In truth, I still don’t know if this kind of thing is a good idea. It’s certainly a quick way to get some information about the problem (for example, I’ve always found the compendia like Garey–Johnsson or Crescenzi et al. actually useful), but there are lots of problems about what information should go into such a box in the first place. Maybe its just spam or chemistry envy. Thore Husfeldt (talk) 19:22, 20 December 2009 (UTC)

I'm not a huge fan of infoboxes, because it is difficult to convey any nuance in them. For instance, under "decision problem" "running time", you have O(n^k k^2), the running time of the most naive brute force algorithm for the problem. But it doesn't say "running time for the most naive brute force algorithm", it just says "running time". There are algorithms with better worst case running times known (though still with an exponent that's linear in k). And for some algorithms, what we list as the running time isn't necessarily the actual running time of the algorithm even when considered on worst case inputs: it's the best upper bound that we know how to prove, a quite different thing. —David Eppstein (talk) 19:30, 20 December 2009 (UTC)
I think the infobox serves to present the abstracted common properties of problems (like approximability) in a structured way. The most important information here can be fairly condense ("approximation factor: 2" versus several sections of text on the Vertex Cover page that would be vague for someone who just wants the constant). Whoever wants to know more can read the full wikipedia article, go for the technical paper abstracts, or the papers themselves; whichever level of detail they wish to work on. I personally find the highly-condensed resources that Thore listed to be useful and I lament that Wikipedia does not serve that purpose, especially since the general repositories that exist are rarely, if ever, updated and should exist in wiki format.
What David Eppstein explained is precisely how I would interpret the "running time" field -- the best known (proven) upper bound on running time that someone has taken the time to add to Wikipedia, with an somewhat arbitrarily chosen level of detail (e.g. big-oh). I see no problem with the field as it is and, if necessary, a general template explanation or a simple foonote could be given. I see more issues with packing in too much information (e.g. parametrization) or vagueness (e.g. deterministic versus expected or heuristic running time). An example of these issues: [User:C._lorenz/sandbox]. Although I find an arbitrary policy for these problems sufficient for making infoboxes an asset.
Some fields that I am uncertain about are "Input&output" (could be lengthy and perhaps better given in text), "W[1]-complete" in "Complexity" seems like too much detail (we could add a lot more), and "Garey-Johnsson" - is this really useful? —C. lorenz (talk) 16:15, 21 December 2009 (UTC)
I agree on the Garey–Johnsson. It’s there because the original motivation to cook up an infobox in the first place is that I’d like Wikipedia’s entries to be at least as useful and informative as, say, the Maximum Clique entry in the Viggopendium, so I started from that framework, minus the comments. I admit that the Garey–Johnsson number is the clearest example of my chemistry envy. Thore Husfeldt (talk) 16:29, 21 December 2009 (UTC)
I think the infobox can be put in, it does present a lot of useful info in short. I like infoboxes that present the hardness of approximation result along with the best known approximation algorithm so one can compare them. W[1]-completeness might be more detail than needed in an infobox though. --Robin (talk) 05:23, 26 December 2009 (UTC)

References

Maximal cliques[edit]

Since there is so much information about listing all maximal cliques, the article should say somewhere that finding a single maximal clique is really easy (linear time). I don't know where to put this information though. --Robin (talk) 14:47, 21 December 2009 (UTC)

I agree. I guess Maximal cliques could form a section, consisting of the (easy) algorithm for finding one, and the listing/enumeration results. Thore Husfeldt (talk) 13:54, 23 December 2009 (UTC)

Example implementation[edit]

I have a more in depth description of the Tsukiyama et al. algorithm along with a C++ implementation. Since I put quite an effort into understanding the tiny details of the algorithm, I'd add it to references so others don't have to. Can some one please advise me if that would be appropriate? (I'm not a regular Wikipedia contributor, so I'm not sure about the rules yet.) Here is the link: http://c0de-x.com/how-to-find-maximal-cliques/ — Preceding unsigned comment added by Devillmeister (talkcontribs) 08:34, 1 June 2013 (UTC)

Question: Perhaps there should be more explanation on why the algorithim for finding a single maximum clique runs in linear time? I cannot immediately see why. Whether you store a graph as an adjacency matrix or an adjacency list the cost of constructing the sub-graph on K vertices is O(K^2) so one cannot just look at sub-graphs. The obvious implementation of the algorithim requires finding the interesection of two sets of integers. If the sets are stored as lists this is O(n+k) and even if they are stored as hash tables its O(min(n,k)) where n,k are the sizes of the sets.

Can someone explain more details of this algorithim and hy its running time is O(n) for non-sparse graphs? (I can see why it is O(n) for sparse graphs). Alternsatively maybe the author means linear in the size of the graph (so vertices ^ 2) in which case I can see the result. — Preceding unsigned comment added by Princess stargirl (talkcontribs) 03:50, 26 October 2014 (UTC)

Decision tree, help with SVG[edit]

I played around with drawing a decision tree, mainly because I think it’s a very accessible model of computation and makes a pretty picture. (Possibly too large.) Something is messed up in the SVG rendering (the file look alright when I let Safari’s SVG engine render it.) Any pointers to what I can do? Thore Husfeldt (talk) 14:03, 23 December 2009 (UTC)

A decision tree of depth 6 for detecting a 3-clique in a 4-vertex graph. No better constructions are possible, in the sense that it matches the lower bound of n(n-1)/2.
What's different between what it looks like here and what it's supposed to look like? —David Eppstein (talk) 17:37, 23 December 2009 (UTC)
If you look closely, I think the arrows don't look right. But when I render it in Firefox, it looks great. I think this is what Thore Husfeldt is referring to. See this blown up png for instance: [1] --Robin (talk) 17:41, 23 December 2009 (UTC)

GA-class[edit]

Since this article is in pretty good shape right now, has pictures, very detailed references, and at least a few editors actively editing it in the last few days, is anyone up for improving this article further and nominating it for GA-class? (Maybe it is more useful to bring 2 articles up to B-class, than one up to GA-class, so perhaps that isn't such a great idea.) --Robin (talk) 19:59, 27 December 2009 (UTC)

I’d be eager to hear David’s opinion on this, he’s got a few GAs under his belt. But in principle I think the article deserves a much higher rating and in many respects proudly stands as an example of what this medium can accomplish.
However, stability is an issue for GA-ing, and I would like to propose at least two larger changes to this article. So ’d like to wait. To tip my hand: 1. I’d like this article to be about Algorithms for clique and independent set (though not necessarily under that title) and 2. I’d like the structure to abandon the current Algorithm—Lower bounds dichotomy (reasons include (a) the lower bounds are mostly hardness results, not lower bounds, (b) some of the issues don’t really benefit from splitting along this fault line, such as circuit complexity, decision trees, the yet-to-be-written section on counting, etc.).
These proposals are connected—for example, I’d want a section each on Clique algorithms for special graph classes and Independent set algorithms for special graph classes, rather than the current awkward things like “finding cliques in the complement of bipartite graphs”—but need somewhat more time than I can devote to them in the next few days, so we’ll have to wait. Thore Husfeldt (talk) 20:19, 27 December 2009 (UTC)
I agree about the lower bound vs algorithms dichotomy. I don't like this split either. For example, instead of splitting the best algorithm and best bounds known for the decision problem, I would prefer a section on the decision problem which gives the best algorithms and gives the NP-completeness proof. Similarly, a section on the approximation problem, which states the best factor achievable and the best hardness result. Some topics are already covered like this, like decision tree complexity and circuit complexity.
I also agree about what the article should be about, but the article title should probably not be Algorithms for clique and independent set. I can't think of a good name off the top of my head.
About the other sections you want to add, that's great. I'm in no hurry at all, and of course we don't have a WP:DEADLINE. I just mentioned this GA-class idea to get everyone's opinions on it. --Robin (talk) 20:40, 27 December 2009 (UTC)

So in this proposed revision, would you have a separate article for hardness results? Or would you include all the current material as well as the algorithmic parts of Independent set (graph theory) and maximal independent set? I think separating out algorithms from hardness makes sense as a way of organizing the material (more sense than splitting cliques from independent sets) but that split could be into separate articles rather than as now separate sections of the same article, and the article we have now is already large and unwieldy. As for calling it "hardness" rather than "lower bounds", I have no objection to that. I do think the hardness bounds are important, though (at least the inapproximability and FPT ones) and wouldn't want to see them tacked on as a disconnected appendix to an article that's mostly about algorithms. As for GA: I think we have enough material here, written at a high enough level of quality, that we should seriously consider this, but a GA review is mostly going to cover low-level issues of grammar and wording; we need to find the best organization for the material ourselves first. —David Eppstein (talk) 20:51, 27 December 2009 (UTC)

I would have hoped to have had more time for a more though-out proposal, but the sections I envision are
Algorithms (exact algorithms for fixed size and maximum cliques) these are agnostic about whether we look for cliques or independent sets
Special graph classes algorithms for cliques, algorithms for indpendent sets. This could also be a subsection of Algorithms
Hardness (NPC, W[1])
Approximation (current 3.3 and 3.6)
Enumeration (listing maximal cliques, #cliques is #P-complete, but FPRASable for some special cases)
Other models of computation (decision trees, circuits)
(never mind the ordering and the titles) So basically it’s about moving some of the current subsections out of the current two-section Upper/Lower bound structure. Thore Husfeldt (talk) 21:28, 27 December 2009 (UTC)
First, making Other models of computation a separate section is a nice idea. That makes it clear that the rest of the article is about algorithms on real computers (or Turing machines). As far as the other secions go, I think there are 3 different clique problems, the decision problem, the optimization problem and the enumeration/counting problem. Each seems to have its own set of algorithms, and hardness results. In that order, the hardness results would be NP-completeness, inapproximability results, and #P-completeness. We do have a lot more to say about the decision problem, since it is well-studied, and is well-studied in several restricted settings: fixed parameter tractability, special graph classes, etc. So I would propose a structure which first deals with the decision problem, including all algorithms for fixed size, max clique, special graph classes, FPTs, etc. and includes all relevant hardness results about the decision problem. This section might be the largest of all. Then a section on the approximation problem, then a section on the counting problem, and then other models.
In short, all I'm saying is that Algorithms + Special graph classes + Hardness, in the suggestion above should be merged into one section with the theme of "decision problem" and hardness results should be presented along with the best algorithms. For instance, after stating that the best algorithm for maximum clique is like O(1.2^n), we give the NP-completeness proof, which explains why the best known algorithm is exponential time. The other sections are fine. But if this proposal will make one section very large compared to the others, or if this view of combining hardness+algorithms isn't shared by others, I'm fine with Thore's proposal above. --Robin (talk) 21:52, 27 December 2009 (UTC)
Re "I think there are 3 different clique problems": There are many more than three algorithmic versions of the problem about which one can find published papers: (1) yes/no, is there a clique of a given size, (2) list all cliques of a given (typically constant) size, (3) find a single maximal clique (not very interesting on a sequential computer but has been studied in parallel algorithms), (4) list all maximal cliques, (5) find a maximum clique, (6) find a minimum-sized maximal clique (more often studied in the complementary form of finding a minimum independent dominating set), (7) count the cliques of a given size (e.g. my WADS'09 paper with Spiro on counting triangles), (8) list all maximal cliques (or maximal independent sets) whose size is smaller than some bound (e.g. my older paper "Small maximal independent sets and faster exact graph coloring"). In addition I could imagine that there might be algorithms for counting maximal cliques or maximum cliques more quickly than listing them all — listing and counting are in general two different things — but I don't know of any published research on that variant. —David Eppstein (talk) 01:27, 28 December 2009 (UTC)
Fair enough, we can have more sections for the problems. I guess my main point is that for a given problem, I would prefer an organization where the best algorithm and hardness result are stated together, rather than in different parts of the article. Organizing results by the problem solved, i.e. listing max cliques or decision version of max ind. set on planar graphs, (rather than the nature of the result — algorithmic or hardness) seems more natural to me. --Robin (talk) 02:28, 28 December 2009 (UTC)
Where would we cover the distributed computing [and parallel computing] aspects: distributed algorithms for finding independent sets, lower bound results for these problems, the connection to distributed graph colouring, other applications of distributed independent set algorithms, etc.? Some possibilities might be:
  • A subsection on distributed algorithms in the "Other models of computation" section. (Might be a fairly long subsection.)
  • Together with the centralised algorithms and hardness results when we discuss the relevant problems. (I think the most relevant problems are "find a maximal independent set" and "find a large [near-maximum] independent set", both in general graphs and in special graph classes. I'm worried that covering distributed computing aspects in different places might make the text more difficult to follow.)
  • Somewhere else, e.g. Independent set (graph theory) + Maximal independent set?
Miym (talk) 19:39, 30 December 2009 (UTC)
Since I supported the proposal to have only clique-related problems in this article, I would choose option 3. There is a lot of literature on maximal independent sets (and its connection with vertex coloring), which would make a fine addition to an independent set article. Adding it here would make this article too long. Also, the distributed algorithms are really specific to finding an independent set (and not clique), so I think it's fair to put all those results in a different article. --Robin (talk) 02:07, 1 January 2010 (UTC)

Structure and scope[edit]

Since this discussion stopped (more than a month ago) before reaching a conclusion that we all agree with, I thought I should start the discussion again. So the question is, what is the scope of this article, and how should we structure it?

My viewpoint, in short, is the following: This article should only contain results related to cliques -- maximal cliques, counting cliques, listing cliques, decision trees, monotone circuit complexity of max clique, etc. Results that work for clique and ind. set. should be present in this article. Results that only work for ind. set., like ind. set. on planar graphs, or the distributed computing version should be elsewhere. As for structure, I would suggest a problem-oriented structure, where we list a problem, like "list all maximal cliques" and then present algorithms and hardness results. --Robin (talk) 20:57, 15 February 2010 (UTC)

What does "m" stand for exactly here? number of edges or vertices?[edit]

In the section "Cliques of fixed size" it says:

"In a graph with m vertices..." and in the same paragraph "...the time for this algorithm is proportional to the arboricity of the graph (a(G)) times the number of edges, which is O(m a(G))".

Please, what does "m" now stand for exactly in this context? The number of edges of vertices? It is crucial to make this clear, as in a later paragraph it says "...to find triangles in time O(n2.376), which may be faster than O(m3/2) for sufficiently dense graphs.". It makes quite a difference if one assumes the number of edges to be m or to be n in this scenario.

Thank you. —Preceding unsigned comment added by 93.211.185.46 (talk) 13:15, 17 May 2010 (UTC)

It's the number of edges. I've fixed that now. --Robin (talk) 14:02, 17 May 2010 (UTC)

Cliques of fixed size algorithm[edit]

It says, "there are O(nk) subgraphs..."
I think it's a combination (select k vertices from n), that's: , that means O(n) (the worst case if k = 1 is n)
--Vojta.jina (talk) 23:16, 21 November 2010 (UTC)

Why should you assume k = 1?? It's certainly not O(n) when k > 1. —David Eppstein (talk) 23:37, 21 November 2010 (UTC)

You are right, my assumption was wrong: k=1 is definitely not the worst case (that's around k = n/2). I guess, I should remove this paragraph. Thanks for pointing this out... --Vojta.jina (talk) 14:42, 28 November 2010 (UTC)

Hi. Whatever the worst case exactly is, "n choose k" is equal to n!/(k!*(n-k)!) and hence much worse than O(n) even if k=1. Even worse than O(n^k), which is mentioned in the article, so I don't see why it says that there. Can someone explain that or change the paragraph? —Preceding unsigned comment added by 129.215.5.255 (talk) 00:41, 10 March 2011 (UTC)

It's a crude estimate but it's always valid and when k is a constant then it's within a constant factor of the truth. —David Eppstein (talk) 01:13, 10 March 2011 (UTC)

Tsukiyama algorithm stated incorrectly[edit]

The article says that, given the maximal cliques in G\v, the maximal cliques in G are these plus those sets formed from them by adding v and removing nodes not connected to v. Let G={12,13,23,24,34,35,45}. Then the maximal cliques in G\{5} are {123} and {234}. If I add 5 to {123} and remove the nodes not connected to 5 then I get {35}, but this is not maximal in G since it's contained in {345}. I'm guessing that what the article should say is that the cliques of G are the cliques of G\v plus sets formed by adding v to the cliques of G(v), where G(v) is the set of nodes connected to v. --RDBury (talk) 17:24, 19 September 2013 (UTC)

I think I've figured this out. The words "a maximal clique C" in "... each maximal clique in G that does contain v can be formed from a maximal clique C in G \ v by adding v and removing the non-neighbors of v from C," should read "some maximal clique C", not "any maximal clique C" as I was interpreting it. So the lexicographic ordering not only prevents duplicates from appearing, as it states in the next paragraph, but also prevents non-maximal cliques from appearing, since lexicographic maximum implies maximal. I'll try to rephrase this section when a get a firmer grasp on how the algorithm actually works. --RDBury (talk) 21:17, 25 November 2013 (UTC)

GA Review[edit]

RESULT:
Symbol support vote.svg Promoted to GA
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

This review is transcluded from Talk:Clique problem/GA1. The edit link for this section can be used to add comments to the review.

Reviewer: Spinningspark (talk · contribs) 21:38, 26 December 2016 (UTC)


Looking... SpinningSpark 21:38, 26 December 2016 (UTC)

Lead
  • Symbol confirmed.svg You might consider wikilinking adjacent (graph theory). I know that only goes to the glossary, but that could be a good thing for a reader unfamiliar with the subject. It points them early on to a central place where they can get a quick def of unfamiliar terms.
  • Symbol confirmed.svg "worst-case optimal time" (or even "optimal time") is not a phrase found in the article body, nor is it explained or wikilinked.
    • This is a summarized version of the sourced claim in the first paragraph of subsection "Listing all maximal cliques" that "this provides a worst-case-optimal solution to the problem of listing all maximal cliques". Just before that claim, I added a new clarification "matching the number of cliques that might need to be listed" for why this is optimal. —David Eppstein (talk) 02:52, 2 January 2017 (UTC)
General
  • Symbol confirmed.svg There needs to be a link to big-O notation on first use. Likewise big-theta, big-omega, little-o etc.
  • Symbol confirmed.svg The brackets following the big-O etc butts right up. They need a space, or a thin space ( )
    • I added explanations for the first use of each notation. It turns out that   is far too wide but that {{italics correction}} does the job. Here is a comparison:
      • O(n) — {{math|''O''(''n'')}}
      • O(n) — {{math|{{italics correction|''O''}}(''n'')}}
      • O (n) — {{math|''O'' (''n'')}}
      • — <math>O(n)</math>
      • — <math>O\,(n)</math> (the LaTeX equivalent of &thinsp;)
    David Eppstein (talk) 04:38, 2 January 2017 (UTC)
History and applications
  • Symbol confirmed.svg "...reported at the time in major newspapers" is cited to the NYT, but that paper does not say it was reported in other papers. We need either a source saying it directly, cites to several different papers carrying the story, or just say it was reported in NYT.
Definitions
  • Symbol confirmed.svg "The clique decision problem is not of practical importance; it is formulated in this way in order to ..." Is this just a reformulation of the k-clique problem?
    • The problems are different because they have different outputs (a clique vs a boolean). The sentence above the bullets does say that they are closely related. —David Eppstein (talk) 07:59, 3 January 2017 (UTC)
Finding a single maximal clique
  • Symbol confirmed.svg "the one found by the greedy algorithm described above". This is the first mention of greedy algorithm, so it is unclear what this is referring to.
    • It was referring to "the algorithm described above" (in the same paragraph), and describing it as a greedy algorithm. But I moved the greedy link earlier in the paragraph in hope of making this clearer. —David Eppstein (talk) 01:21, 8 January 2017 (UTC)
Cliques of fixed size
  • Symbol confirmed.svg "When k is part of the input to the problem, however, the time is exponential." In what sense was k not part of the input in the prior cases?
    • It was not part of the input in the immediately preceding sentence, which says "whenever k is a fixed constant". Fixed constant means that it is hardcoded into the algorithm rather than being part of the input. I'm at a bit of a loss on how this could be phrased any more clearly; do you have suggestions? —David Eppstein (talk) 01:21, 8 January 2017 (UTC)
Understood. Any of "when k is not fixed...", "when k is variable...", "when k is not a constant..." would have made it clear to me on first reading. I'm not sure how important it is to say "part of the input", but both could be said if necessary "when k is not fixed and forms part of the input..."
  • Ok, copyedited per your suggestions. —David Eppstein (talk) 23:49, 9 January 2017 (UTC)
Listing all maximal cliques
  • Symbol confirmed.svg "...although the reverse of this order is NP-hard to generate". Interesting, but why is that relevant here?
    • Because it means that the ordering makes a big difference in whether it's possible to list the cliques efficiently. Edited to clarify what NP-hard means in this case (no polynomial-delay algorithm exists unless P = NP). —David Eppstein (talk) 01:21, 8 January 2017 (UTC)
  • Symbol confirmed.svg The sentence beginning "In particular, the planar graphs, and more generally..." is very hard to parse. I had to read it several times before understanding. Can the statements be broken into separate sentences?
Finding maximum cliques in arbitrary graphs
  • Symbol confirmed.svg The mention of adiabatic quantum computing concerns me on two counts. Firstly, as far as I am aware, this is still largely speculative, but you wouldn't know that from what is written (or by following the wikilink for that matter). Secondly the ref is a preprint; has this actually been published? I'm not saying don't use preprints as refs, but making solid claims about speculative subjects on the basis of them is a bit iffy. Maybe it could be reworded with "<author> has suggested that..." or some such.
    • Huh? The adiabatic reference, by Childs Farhi Goldstone and Gutmann, is a journal paper, and always has been listed as such. It had a link to the preprint version for convenience since most readers aren't going to be able to access the official journal version very easily. But I added the link for those who can. (Also, with a notable co-author, it might well pass the "recognized expert" clause of WP:SPS, but as a journal paper I don't think we need to worry about that.) —David Eppstein (talk) 23:49, 9 January 2017 (UTC)
Ok on the source then, but it still remains the case that this is a speculative proposal, no? Nobody has actually found cliques using quantum computers have they? Does a quantum computer capable of doing it actually exist? That's what troubles me about it; it just appears in a list of methods as if it is a real thing. I think some kind of caveat would be in order. SpinningSpark 00:28, 10 January 2017 (UTC)
Added "that have been suggested". —David Eppstein (talk) 07:55, 10 January 2017 (UTC)
Special classes of graphs
Approximation algorithms
  • Symbol confirmed.svg "Although the approximation ratio of this algorithm is weak, it is the best known to date" Is this sourced to Feige? If not, where?
    • Feige wouldn't really be a good source for this sort of evaluation of his own work. But I found and added a recent (2015) publication that says the same thing. —David Eppstein (talk) 06:10, 12 January 2017 (UTC)
NP-completeness
  • Symbol confirmed.svg "Karp's NP-completeness proof is a many-one reduction from the Boolean satisfiability problem for formulas in conjunctive normal form, which was proved NP-complete in the Cook–Levin theorem." Ugh! Too much jargon in one sentence. Although the sentence is understandable after following all the links, it is very difficult for a general reader. This might be another case where the same material would be ok if split into more than one sentence.
  • Symbol confirmed.svg "CNF" If the article is to use abbreviations, it should give them in brackets after the first use of the term in full.
    • Done. My usual tendency is just to spell them all out instead of using abbreviations, but I felt that three repetitions of this long phrase in a single paragraph would have been too much. —David Eppstein (talk) 07:57, 10 January 2017 (UTC)
Circuit complexity
Hardness of approximation
  • Symbol confirmed.svg "for every ε > 0" I'm not seeing a definition of ε.
  • Symbol confirmed.svg "However, an invalid proof may sometimes mistakenly be accepted, as long as the probability that a checker makes a mistake of this type is low." The logic of that statement escapes me. If the probability is high then an invalid proof may still be accepted. So surely this just says an invalid proof may sometimes be accepted without qualification? SpinningSpark 16:23, 29 December 2016 (UTC)
    • I don't understand your confusion here, to be honest. What part of the fact that the checker behaves randomly, that it is required that this random behavior should cause it to have low probability of making certain kinds of mistakes, and that other kinds of mistakes are strictly forbidden, is unclear? Why should the fact that a mistake can sometimes happen allow use to forget about the probability of it happening and just say that mistakes can happen without qualification? What needs to be changed here to make this less confusing? —David Eppstein (talk) 08:06, 12 January 2017 (UTC)
My problem is that I don't think the sentence is quite saying that semantically. If I have understood your reply correctly, the meaning is supposed to be that it can be acceptable for the algorithm to return some false positives provided the probability of doing so is low. What the sentence actually says (apparently) is that false positives only occur when there is a low probability that a checker makes such a mistake. Clearly that interpretation is illogical and not correct, but it hides the correct meaning. This leads me on to the meaning of the following sentence: "An instance of the satisfiability problem should have a valid proof if and only if it is satisfiable." How does that square with the possibility of false positives? SpinningSpark 14:58, 12 January 2017 (UTC)
Ah, I see. I hadn't noticed that ambiguity. Reworded, splitting into two sentences. —David Eppstein (talk) 04:18, 13 January 2017 (UTC)

Thanks for the thorough listing of issues! I plan to work on these over the next week or so, responding to them individually; I'll ping you again when I think I've done. —David Eppstein (talk) 20:04, 29 December 2016 (UTC)

  • The external links tool is showing a number of deadlinks (three 404 errors page not found, one server not found, and one 301 error redirecting problem.) While deadlinks are not a cause to fail a GA, that's quite a lot and you might want to do something about them. SpinningSpark 15:12, 12 January 2017 (UTC)
Yes, that's done, promoting to GA. A pleasure doing business with you. SpinningSpark 14:04, 13 January 2017 (UTC)

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.