Talk:Multi-label classification

WikiProject Computer science
This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia.
This article has not yet received a rating on the project's quality scale.
This article has not yet received a rating on the project's importance scale.

Label Combinations

I'm just someone with a casual interest in multi-label classification (i.e., definitely no expert), but assuming both names refer to the same approach, isn't "Label Powerset" more common than "Label Combinations"? I have certainly come across the former far more often in the literature, and had never heard of Label Combinations before reading this article (it gets relatively few hits on Google Scholar, too). (talk) 21:23, 25 February 2013 (UTC)

While Label Powerset is the more common name in the literature, Label Combination is used in the WEKA/MEKA constellation of software. Because that software serves as the reference implementation in many cases, the name is often used. However, I have changed the article as you recommended, since I believe your analysis is correct. Doctorambient (talk) 00:57, 7 April 2014 (UTC)
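
For readers of this thread who have not met the approach under either name: as I understand it, Label Powerset treats every distinct combination of labels seen in the training data as one class of an ordinary single-label problem. Below is a minimal sketch of that transformation only, in plain Python. The toy data and the function name `label_powerset` are made up for illustration and are not taken from WEKA/MEKA or any other library.

```python
# Toy multi-label targets: each row is a binary vector over three
# hypothetical labels (1 = label applies, 0 = it does not).
Y = [
    (1, 0, 1),
    (0, 1, 0),
    (1, 0, 1),
    (1, 1, 0),
]

def label_powerset(rows):
    """Map each distinct label combination to a single class id.

    Ids are assigned in order of first appearance. Returns the encoded
    targets and the combination-to-id mapping.
    """
    class_of = {}
    encoded = []
    for row in rows:
        key = tuple(row)
        if key not in class_of:
            class_of[key] = len(class_of)  # next unused class id
        encoded.append(class_of[key])
    return encoded, class_of

classes, mapping = label_powerset(Y)
print(classes)  # one class id per instance
print(mapping)  # which label combination each id stands for
```

Any ordinary multiclass classifier could then be trained on `classes`, with `mapping` inverted to recover the label set at prediction time.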

Clarification of intro paragraph needed

The intro paragraph perhaps makes sense to those already familiar with the distinctions it tries to make, but for those who need the explanation I don't think it's possible to discern what it's trying to say.

"multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple target labels must be assigned to each instance. Multi-label classification should not be confused with multiclass classification, which is the problem of categorizing instances into more than two classes. Formally, multi-label learning can be phrased as the problem of finding a model that maps inputs x to binary vectors y, rather than scalar outputs as in the ordinary classification problem."

  • What is a "target label" as opposed to just a label?
  • Are "labels" related or unrelated to levels of a categorical (nominal) classification variable?
  • What is the significance of the switch from "Multi-label classification" to "multi-label learning" in sentences 2 and 3? Are they synonyms, or is this a non sequitur?
  • What is a "binary vector"?

Here's my best guess as to what this is trying to say:

"Multi-class classification" apparently refers to the case where, in addition to the x input variable(s), there is a single class variable, and that variable is categorical (nominal) with three or more levels (alternatives). I'm not sure why this needs a special name; it seems to me that the case of having only two possible levels is the oddity. But whatever.

"Multi-label classification" appears to refer to the case where there are multiple class variables. I suppose that if these several variables are represented as a "vector" (i.e., an array with only one index), AND each array entry can only have one of two values ("binary"), then this is the "binary vector" referred to? But why would this be called "multi-label" rather than just "multiple class variables"? And what happens if the class variables have more than two levels?

So, I'm not too confident that this series of suppositions is correct. In any case, the intro paragraph could perhaps be revised to make all this clear.

Gwideman (talk) 03:55, 30 December 2014 (UTC)
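
In case it helps whoever revises the intro, here is a minimal sketch of the distinction as the quoted paragraph seems to describe it: a multiclass target is one value chosen from several classes, while a multi-label target is a binary vector with one 0/1 entry per label. The label names and the helper `to_binary_vector` are hypothetical and only serve the illustration.

```python
# Hypothetical label set for a document-tagging task.
LABELS = ["sports", "politics", "finance"]

def to_binary_vector(assigned, labels=LABELS):
    """Encode a set of assigned labels as a 0/1 vector, one entry per label."""
    return [1 if name in assigned else 0 for name in labels]

# Multiclass: each instance gets exactly one of the classes (a scalar target).
multiclass_target = "politics"

# Multi-label: each instance may get any subset of the labels,
# so the target is a binary vector rather than a single value.
multilabel_target = to_binary_vector({"sports", "finance"})

print(multiclass_target)
print(multilabel_target)
```

On this reading, each entry of the vector answers a yes/no question about one label, which would be why the vector is binary even when the label set is large.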