Talk:Support-vector machine


How is that relevant?

How is the section

When data is unlabelled, supervised learning is not possible, and an unsupervised learning approach is required, which attempts to find natural clustering of the data to groups, and then map new data to these formed groups. The support-vector clustering[2] algorithm, created by Hava Siegelmann and Vladimir Vapnik, applies the statistics of support vectors, developed in the support vector machines algorithm, to categorize unlabeled data, and is one of the most widely used clustering algorithms in industrial applications

relevant for this article? Should we delete it? --Raffamaiden (talk) 12:58, 26 May 2019 (UTC)

See the bottom of this talk page (and that is where you should add new entries, btw.). But I do consider Vladimir Vapnik to be an authority on SVMs... HelpUsStopSpam (talk) 19:27, 26 May 2019 (UTC)

Parameter selection unclear

The parameter selection section introduces a soft margin parameter C. Is this the same as λ from the soft margin section? Could this be defined more explicitly, possibly before this section? I would suggest doing so in the soft margin section. Jasonzutty (talk) 21:35, 16 September 2016 (UTC)
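
One possible way to relate the two, as a sketch only: this assumes the soft margin section minimizes the averaged hinge loss plus λ‖w‖², and that the parameter selection section means the common C-weighted form. Both forms are assumptions about the article's wording, not quotations from it. Multiplying the first objective by 1/(2λ) does not change its minimizer and gives the second:

```latex
\min_{w,b}\ \frac{1}{n}\sum_{i=1}^{n}\max\bigl(0,\,1-y_i(w^{\top}x_i-b)\bigr)+\lambda\lVert w\rVert^{2}
\quad\Longleftrightarrow\quad
\min_{w,b}\ \frac{1}{2}\lVert w\rVert^{2}+C\sum_{i=1}^{n}\max\bigl(0,\,1-y_i(w^{\top}x_i-b)\bigr),
\qquad C=\frac{1}{2n\lambda}.
```

Under these assumptions C and λ are not the same parameter but are inversely related: a large C corresponds to a small λ, that is, to weaker regularization.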

Wrong formula for regression (?)

The formula reported for regression appears to be wrong (to me) in the "subject to" part. In particular, the left-hand sides are the same except for the signs (the first one is the opposite of the second one). As it stands, it does not make sense. — Preceding unsigned comment added by 93.148.3.203 (talk) 10:52, 27 April 2017 (UTC)

I agree that it looks wrong—it looks like the absolute value of the expression ought to be no greater than epsilon. But reference [73], in its expression (2), writes it in the same odd way. Loraof (talk) 22:06, 17 August 2017 (UTC)
Actually, the two constraints do have the effect of saying that the absolute value of the expression must be no greater than epsilon. With simplified notation, the constraints say z ≤ ε and −z ≤ ε. If z≥0, the first constraint says z must be no greater than epsilon in absolute value, while the second constraint says that a negative must be no greater than a positive, which is nonbinding. And if z<0, the second constraint says that the absolute value of z must be no greater than epsilon, while the first constraint says nonbindingly that a negative must be no greater than a positive. So the combined effect is that |z| ≤ ε. Expressing it in absolute value form is more intuitive and readily readable, but expressing it as the article does is how you actually give the instruction to some regression packages. Loraof (talk) 03:51, 18 August 2017 (UTC)
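
The argument above can be checked numerically with a minimal sketch. Here z stands for the expression inside the constraints and eps for epsilon. The function names, the tolerance, and the sample values are all hypothetical and chosen only for illustration.

```python
def satisfies_pair(z: float, eps: float) -> bool:
    """Two one-sided constraints: z <= eps and -z <= eps."""
    return z <= eps and -z <= eps


def satisfies_abs(z: float, eps: float) -> bool:
    """Single absolute-value constraint: |z| <= eps."""
    return abs(z) <= eps


eps = 0.5  # illustrative tolerance, not taken from the article
samples = [-2.0, -0.5, -0.25, 0.0, 0.25, 0.5, 2.0]  # illustrative residuals

# True when both formulations accept and reject exactly the same values.
agree = all(satisfies_pair(z, eps) == satisfies_abs(z, eps) for z in samples)
print(agree)
```

The pair form is the one a solver expects, because each constraint is linear in the unknowns, whereas the absolute-value form is not.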

Citation needed within Applications section?

"[SVMs] have been used to classify proteins with up to 90% of the compounds classified correctly." I suggest this requires a citation. The nearest references that follow do not relate to this point. TiredOfLondon (talk) 13:21, 5 March 2018 (UTC)

Unsourced claims

The statement "The support vector clustering algorithm [...] is one of the most widely used clustering algorithms in industrial applications." is too strong a claim to make without support from any source. I have therefore added a "citation needed" tag.

Note that this statement replaced a much milder claim. The text went from "[Support vector clustering] is used in industrial applications either when data are not labeled or when only some data are labeled as a preprocessing for a classification pass." (which seems reasonable, even without a reference) to the current, much stronger claim that it is "one of the most widely used clustering algorithms in industrial applications".

Off the top of my head, I can think of much more popular clustering algorithms that can be used with "industrial-sized" datasets: K-means clustering, K-medoids clustering, self-organizing maps, DBSCAN, OPTICS, etc.

Also, note that Wikipedia's article on "clustering methods" (https://en.wikipedia.org/wiki/Cluster_analysis) does not mention "support vector clustering", not even once. This should give you an idea of how popular this method is.