Talk:Machine learning

From Wikipedia, the free encyclopedia
This article is of interest to the following WikiProjects:

WikiProject Robotics (Rated Start-class, Top-importance)
Machine learning is within the scope of WikiProject Robotics, which aims to build a comprehensive and detailed guide to Robotics on Wikipedia. If you would like to participate, you can choose to edit this article, or visit the project page (Talk), where you can join the project and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale and as Top-importance on the project's importance scale.

WikiProject Systems (Rated Start-class, High-importance)
This article is within the scope of WikiProject Systems, which collaborates on articles related to systems and systems science.
This article has been rated as Start-Class on the project's quality scale and as High-importance on the project's importance scale.
This article is within the field of Cybernetics.

WikiProject Computing (Rated Start-class, High-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale and as High-importance on the project's importance scale.

WikiProject Computer science (Rated Start-class, High-importance)
This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale and as High-importance on the project's importance scale.

General discussion[edit]

I find the machine learning page pretty good. However, the distinction between machine learning and data mining presented in this article is misleading and probably not right. The terms 'data mining' and 'machine learning' are used interchangeably by the masters of the field, along with plenty of us regular practitioners. The distinction presented in this article, that one deals with knowns and the other with unknowns, just isn't right. I'm not sure how to put it more positively. Data mining and machine learning both deal with knowns and unknowns, because they're really the same thing.

My primary source for there being no difference between the terms is Tom Mitchell, head of the Carnegie Mellon Machine Learning Department and author of the definitive and most highly cited machine learning/data mining text, "Machine Learning" (Mitchell, Tom M. Burr Ridge, IL: McGraw Hill, 1997). Mitchell tackles head-on the lack of a real distinction between the terms in a paper he published in Communications of the ACM in 1999. I've also been in the field for a number of years and support Mitchell's unwillingness to distinguish the two.

Now, I can *imagine* that when we use the term 'data mining' we are also including 'web mining' under the umbrella of 'data mining.' Web mining is a task that may involve data extraction performed without learning algorithms; 'machine learning' places emphasis on the algorithmic learning aspect of mining. The widely used Weka text by Witten and Frank does differentiate the two terms in this way. But more than a few of us in the community felt, when that text came out, that as useful as it is for using Weka and teaching neophytes, the distinction was without precedent. It struck us as something the authors invented while writing the book's first edition. Their distinction is more along the lines of learning versus extraction, but that's a false distinction: learning is often used for extraction when structuring data, and learning patterns in a data set is always a sort of "extraction," "discovery," etc. But even Witten and Frank aren't suggesting that one is more for unknowns and the other for knowns, or one more for prediction and the other for description. Data mining/machine learning is used in a statistical framework, and statistics is quite clearly a field dedicated to handling uncertainty, which is to say situations where it's hard to predict, forecast, or understand the patterns within data.

I feel that 'data mining' should redirect to 'machine learning,' or 'machine learning' redirect to 'data mining,' the section distinguishing the two should be removed, and the contents of the two pages merged. Textminer (talk) 21:44, 11 May 2013 (UTC)

There is no discussion of validation, over-fit and the bias/variance tradeoff. To me this is the whole point and the reason why wide data problems are so elusive. Izmirlig (talk)

— Preceding unsigned comment added by Izmirlig (talkcontribs) 18:42, 12 September 2013 (UTC)

I modified the strong claim that machine learning systems try to create programs without an engineer's intuition. When a machine learning task is specified, a human decides how the data are to be represented (e.g., which attributes will be used or how the data need to be preprocessed). This is the "observation language". The designer also decides the "hypothesis language", i.e., how the learned concept will be represented. Decision trees, neural nets, and SVMs all have subtly different ways of describing the learned concept. The designer also decides on the kind of search that will be used, which biases the end result.

The way the page is written now, there is no distinction between machine learning and pattern recognition. Machine learning is much more than simple classification: robots that learn how to act in groups are doing machine learning but not pattern recognition. I am not an expert in ML, but I am an expert in pattern recognition. So I hope that someone will edit this page and put in more information about machine learning that is not also pattern recognition.

I don't agree with this: I believe that pattern recognition is generally restricted to classification, while this page explicitly says that ML covers classification, supervised learning (which includes regression), unsupervised learning (such as clustering), and reinforcement learning.
Careful not to pigeonhole into the "unsupervised learning is clustering and vice versa". The data mining folks think this way and they're completely wrong, as my ML prof once said. User:
Notice that I said "such as clustering". The article does clearly state that unsupervised learning is modeling. -- hike395 16:02, 2 Mar 2005 (UTC)
Further, I don't think of pattern recognition as a specific method, but rather a collection of methods, generally described in the 1st edition of Duda and Hart. So, I deleted pattern recognition from "common methods". Also, a genetic algorithm is a generic optimization algorithm, not a machine learning algorithm. So, I removed it, too. -- hike395 01:13, 20 Dec 2004 (UTC)
There are those who would disagree on the subject of genetic algorithms and their relation to ML. Machine learning takes its basic principles from those found in naturally occurring systems, as do GAs. You could call evolution a kind of "intelligence", I suppose. Anyway, the call's been made, but there should be some mention in the "related".
I disagree with this statement --- machine learning has completely divorced itself from any natural "intelligent" system: it is a branch of statistics. I think you are thinking of the term "computational intelligence" (which is the new name for an IEEE society). I'm happy to have See also links to AI and CI. -- hike395 16:02, 2 Mar 2005 (UTC)

>You could call evolution a kind of "intelligence"

No. Evolution is not goal-directed.

Blaise 17:32, 30 Apr 2005 (UTC)

Unlike many in the ML community, who want to find computationally lightweight algorithms that scale to very large data sets, many statisticians are currently interested in computationally intensive algorithms. (We're interested in getting models that are as faithful as possible to the situation, and we generally work with smaller data sets, so the scaling isn't such a big issue.) The point I'm making is that the statement that "ML is synonymous with computational statistics" is just plain wrong.

Blaise 17:29, 30 Apr 2005 (UTC)

I had misgivings about that statement, too, so I just deleted it. Notice that I also deleted your edit that statistics deals with data uncertainty only, but ML deals with certain and uncertain data. I'd be willing to bet that you are a frequentist (right?). At the 50 kilometer level, frequentist statisticians deal with data uncertainty, but Bayesian statisticians deal with model uncertainty (keeping the observed data as an absolute, and integrating over different model parameters). I don't think you can make the distinction that statisticians are only frequentist (deal with data uncertainty), since Bayesian statisticians would violently disagree.
Now, if you say that ML people care more about accurate predictions, while statisticians care more about accurate models, that may be true, although I don't believe you can make an absolute statement. --- hike395 23:02, 30 Apr 2005 (UTC)

Algorithm Types[edit]

"Algorithm Types" should probably not link to Taxonomy. It is simpler and more precise to say "machine learning algorithms can be categorized by different qualities." StatueOfMike (talk) 18:18, 26 February 2013 (UTC)

Problem Types[edit]

I find the "Algorithm Types" section very helpful for providing context for the rest of the article. I propose adding a section/subsection "Problem Types" to provide a more complete context. For example, many portions of the rest of the article say something like "is a supervised learning method used for classification and regression". "Supervised Learning" is explained somewhat under the "Algorithm Types" section, but the problem types are not. Structured learning already has a good breakdown of problem types in machine learning. We could incorporate that here, and hopefully expand on it. StatueOfMike (talk) 23:12, 8 February 2013 (UTC)

Reinforcement Learning Placement[edit]

Shouldn't reinforcement learning be a subset of unsupervised learning?

I don't think so. Reinforcement learning is not completely unsupervised: the algorithm has access to a supervision signal (the reward). It's just that it is difficult to determine which action(s) led to the reward, and there's an exploitation vs. exploration tradeoff. So, it isn't strictly supervised learning, either. It's somewhere in-between. -- hike395 July 1, 2005 07:08 (UTC)
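The reward-signal point above can be made concrete with a toy sketch. In the two-armed bandit below (all numbers invented for illustration), the learner is never told which arm is "correct", the way a supervised learner would be; it only receives noisy rewards, yet its running estimates converge on the better arm. This is a minimal illustration only, not a claim about how the article should define RL.

```python
import random

def epsilon_greedy_bandit(arm_means, steps=2000, epsilon=0.1, seed=0):
    """Learn which arm pays more from reward alone (no labels)."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)    # pulls per arm
    values = [0.0] * len(arm_means)  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:   # explore: try a random arm
            arm = rng.randrange(len(arm_means))
        else:                        # exploit: pull the best-looking arm
            arm = max(range(len(arm_means)), key=lambda a: values[a])
        reward = arm_means[arm] + rng.gauss(0, 0.1)  # noisy reward signal
        counts[arm] += 1
        # incremental update of the running mean
        values[arm] += (reward - values[arm]) / counts[arm]
    return values

# The learner infers that arm 1 is better purely from reward.
estimates = epsilon_greedy_bandit([0.2, 0.8])
```

The exploration/exploitation tradeoff mentioned above is exactly the `epsilon` branch: without occasional random pulls, the learner could lock onto a mediocre arm forever.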

Radial basis function[edit]

Should this article link to the "radial basis function" article, instead of linking to the two articles "radial" and "basis function"?

Absolutely. Done --Adoniscik (talk) 20:54, 9 March 2008 (UTC)


Some people, mainly researchers in this field (ML), are blogging about this subject. Some blogs are really interesting. Is there space in an encyclopedia for links to those blogs? I can see three problems with this:

  • advertising for people/blogs?
  • how to select relevant blogs
  • the necessity to check whether those blogs are updated often enough.

What do you think of adding a blog links section ? Dangauthier 14:11, 13 March 2006 (UTC)

It could be interesting; the question is of course which ones to include. I recently posted a list of machine learning blogs on my blog: Damienfrancois 09:09, 7 June 2006 (UTC)

I deleted the link to a supposed ML blog [1] which wasn't relevant and was not in English.

I oppose the inclusion of blogs. Most of the article right now consists of links. See WP:Linkspam --Adoniscik (talk) 21:01, 9 March 2008 (UTC)

Structured Data Mining is missing[edit]

The category Structured Data Mining is missing (see summarization). The sub-categories are also missing:

Two important books are:

  • Kernel Methods in Computational Biology, Bernhard Scholkopf, Koji Tsuda, Jean-Philippe Vert
  • Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology

JKW 11:50, 8 April 2006 (UTC)

deductive learning?[edit]

"At a general level, there are two types of learning: inductive, and deductive."

What's deductive learning? Isn't learning inductive? --Took 01:48, 10 April 2006 (UTC)

From a purely writing view, the rest of the paragraph (after the above quote) goes on to explain what inductive machine learning is, but deductive machine learning isn't covered at all. --Ferris37 03:49, 9 July 2006 (UTC)

I don't think the statement that the two basic learning approaches are inductive and deductive makes any sense. In supervised learning there is inductive and transductive learning, but I am not sure about the "deductive" one; at least I wouldn't know what it is.
The biggest learning categories are usually identified as supervised, semi-supervised, unsupervised, and reinforcement learning, although reinforcement learning can be viewed as a special case of supervised learning.
There are also more subtle categories, such as active learning, online learning, and batch learning.

Non-homogeneous reference format[edit]

It's a minor point, but I see that in this article the format of the references is inconsistent. Bishop is cited once as Christopher M. Bishop and another time as Bishop, C.M. Is there a standard format for Wikipedia references? Jose

I use WP:CITET inside WP:FOOT --Adoniscik (talk) 20:58, 9 March 2008 (UTC)

Help needed with "learn"[edit]


In the following context: "As a broad subfield of artificial intelligence, machine learning is concerned with the design and development of algorithms and techniques that allow computers to 'learn'", no definition of the last word in the sentence, "learn", is given. However, it appears very essential, because it's central to this main definition.

A definition like "machine learning is an algorithm that allows machines to learn" sounds to me like a perfectly tautologous definition.

It's my understanding that this article is about either computer science, or mathematics, or statistics, or some other "exact" discipline. All of these disciplines have quite exact definitions of everything, except for those very few undefined terms that are declared upfront as axioms or undefined concepts. Examples: point, set, the Axiom of Choice.

In this article, the purpose of machine learning and the tools it uses are clear to me as a reader. But the very method is obscure: what exactly does it mean for a machine to 'learn'? Would somebody please define "learn" in precise terms, without resorting to other obscure words, not exactly defined in the technical world, like 'understand' or 'intelligence'?

There must exist a formal definition of 'learn', but if not, then, in my opinion, in order to avoid confusion, it should be clearly stated upfront that the very subject of machine learning is not clearly defined.

Compare this, for example, to how 'mathematics' is defined, or how the functions of ASIMO robot are clearly defined in Wikipedia.

Thanks in advance, Raokramer 13:28, 8 October 2007 (UTC)

There are formal definitions of what "learn" means. Basically it is about generalizing from a finite set of training examples, to allow the learning agent to do something (e.g. make a prediction, a classification, predict a probability, find a good representation) well (according to some mathematically defined criterion, such as prediction error) on new examples (that have something in common with the training examples, e.g., typically they are assumed to come from the same underlying distribution).

Yoshua Bengio March 26th, 2011. —Preceding undated comment added 01:18, 26 March 2011 (UTC).
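The definition above can be illustrated with a minimal sketch: fit a model to a finite training set, then apply the fitted parameters to an input that was never seen during training. Here "learning" is ordinary least squares on a handful of invented points; the criterion is squared error, and generalization is the prediction at a new x.

```python
def fit_line(points):
    """Ordinary least squares for y = a*x + b on training pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = (sum((x - mx) * (y - my) for x, y in points)
         / sum((x - mx) ** 2 for x, _ in points))
    b = my - a * mx
    return a, b

# Invented training examples, roughly y = 2x + 1 plus noise.
train = [(0, 1.0), (1, 3.1), (2, 4.9), (3, 7.2)]
a, b = fit_line(train)
# Generalization: the fitted (a, b) applied to an unseen input.
prediction_at_5 = a * 5 + b  # close to 11
```

The new input (x = 5) "has something in common with the training examples" in exactly the sense Bengio describes: it is assumed to come from the same underlying linear-plus-noise process.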

Are there any learning algorithms that don't work by search?[edit]

Do all learning algorithms perform search? All rule/decision-tree algorithms certainly do search. Are there any exceptions?

Are there any other exceptions? Pgr94 (talk) 12:31, 16 April 2008 (UTC)

Most learning algorithms don't do search. Search is more an AI thing, not so much learning. Many algorithms are based on convex optimization: Support Vector Machines, Conditional Random Fields, logistic regression, etc.
Optimization is a kind of search: pgr94 (talk) 12:02, 1 May 2011 (UTC)
If you define search as "finding the solution to a mathematical formula", as Wikipedia says, then optimization is search, and learning has to be search too. Then naive Bayes is search too, because it solves a mathematical formula. Imho, saying that solving a formula is search is a little misleading. I think the term is mostly used for discrete problems, not continuous ones. But I would agree that most learning algorithms use some kind of optimization.
Also, one might ask the question "What is the search used for?"
Saying learning algorithms work by search sounds like they produce their answer by doing a lookup, which is certainly not the case for most algorithms. Most learning algorithms build some kind of model, usually via some formula. If solving a formula is search, well, then what other choices are there? Btw, this is really the wrong place for this kind of discussion, so I'd be glad if you removed it. If you have questions about machine learning, I'd suggest T3kcit (talk) 06:16, 23 August 2011 (UTC)
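The "optimization, not lookup" point can be sketched concretely. Gradient descent fits a one-parameter model by iteratively reducing squared error; whether one calls this "search" through a continuous parameter space is exactly the terminological question in this thread. The data below are invented for illustration.

```python
def gradient_descent(xs, ys, lr=0.05, steps=500):
    """Fit y = w*x by repeatedly stepping against the gradient
    of the mean squared error; no discrete 'lookup' is involved."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # gradient of (1/n) * sum (w*x - y)^2 with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

w = gradient_descent([1, 2, 3], [2, 4, 6])  # true slope is 2
```

The loop builds a model (the single parameter w) rather than retrieving an answer, which is the distinction the comment above is drawing.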
Thank you for your reply T3k. The article currently does not mention the relationship between learning and search. According to Mitchell's seminal article generalization is a search problem.

One capability central to many kinds of learning is the ability to generalize [...] The purpose of this paper is to compare various approaches to generalization in terms of a single framework. Toward this end, generalization is cast as a search problem, and alternative methods for generalization are characterized in terms of search strategies that they employ. [...] Conclusion: The problem of generalization may be viewed as a search problem involving a large hypothesis space of generalizations. [...] Generalization as search, Tom Mitchell, Artificial Intelligence (1982) doi:10.1016/0004-3702(82)90040-6

I am enquiring here if there are any more recent publications that qualify this very general principle. pgr94 (talk) 20:10, 23 August 2011 (UTC)
Saying that different approaches can be cast as search doesn't mean that they are search, nor that they use search. 20:18, 23 August 2011 (UTC)
I am not quite sure this is what you are looking for but there is the study of empirical risk minimization. This is a standard formulation of the learning problem. You could say that it defines learning as a search problem, also I guess most people would rather call it an optimization problem. T3kcit (talk) 10:24, 24 August 2011 (UTC)
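The empirical risk minimization framing mentioned above can be made concrete in a few lines: over a finite hypothesis class, "learning" literally is a minimum taken over candidates, which is about as search-like as it gets. The hypothesis class and data below are invented for illustration, not taken from any of the cited sources.

```python
def empirical_risk_minimizer(hypotheses, data):
    """Pick the hypothesis with the lowest average squared loss
    on the sample -- learning framed as search over hypotheses."""
    def risk(h):
        return sum((h(x) - y) ** 2 for x, y in data) / len(data)
    return min(hypotheses, key=risk)

# Toy hypothesis class: linear rules y = a*x with integer slopes.
hypotheses = [lambda x, a=a: a * x for a in range(-3, 4)]
data = [(1, 2.1), (2, 3.9), (3, 6.0)]
best = empirical_risk_minimizer(hypotheses, data)
# The slope-2 hypothesis wins on this sample, so best(10) is 20.
```

With a continuous hypothesis class the `min` becomes an optimization problem, which is why, as noted above, most people would call ERM optimization rather than search.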

Archive bin required?[edit]

suggestion = archive bin required Sanjiv swarup (talk) 07:44, 17 September 2008 (UTC)

If you mean that the talk page should be archived, I disagree. It is pretty manageable at the moment. Typically months pass in between comments! --Adoniscik(t, c) 08:04, 17 September 2008 (UTC)

Promoting the article's growth[edit]

Does anyone think snipping the FR section (and moving it here) would encourage people to actually write something? --Adoniscik(t, c) 02:40, 13 October 2008 (UTC)

FR? pgr94 (talk) 12:04, 1 May 2011 (UTC)

Column formatting[edit]

Is there any reason that the See Also section is formatted in columns? Or was that just the result of some vestigial code... WDavis1911 (talk) 20:38, 27 July 2009 (UTC)

"Labeled examples"[edit]

On this page, and the main unsupervised learning page, the phrase "labeled examples" is not explained or defined before being used. Can somebody come up with a concise definition? --Bcjordan (talk) 16:31, 15 September 2009 (UTC)

Ref mess[edit]

In this diff, the "Bibliography" section was converted to "Further reading". Looking at the history, it's clearly an aggregation of actual sources with other things just added for the heck of it. It is sometimes possible to see what an editor was adding when he added a source there, so there are good clues for how we could go about citing sources for the contents of the article. It's too bad it developed so far so early, before there was much of an ethic of actually citing sources, because now it will be a real pain to fix. Anyone up for working on it? Dicklyon (talk) 18:56, 10 April 2011 (UTC)

Be bold! pgr94 (talk) 12:05, 1 May 2011 (UTC)

Representation learning notability[edit]

Is "representation learning" sufficiently notable to warrant a subsection? The Machine Learning journal and the Journal of Machine Learning Research have no articles with "representation learning" in the title. Does anyone have any machine learning textbooks with a chapter on the topic (none of mine do)? There is no Wikipedia article on the subject. Any objections to deleting? pgr94 (talk) 22:38, 15 August 2011 (UTC)

I would agree that it might not yet pass WP:Notability. And that's why it doesn't have its own article. But a paragraph seems OK. Other sources that discuss the topic include this 1991 paper and this 1997 paper, and this 2010 paper; others may use different words for the same ideas. Dicklyon (talk) 00:09, 16 August 2011 (UTC)
Machine learning is a large field spanning 50-odd years; three or four articles is therefore hardly notable. WP:UNDUE states that an article should represent "all significant viewpoints [...] in proportion to the prominence of each viewpoint". Unless there is more evidence for the significance of representation learning, this section needs to be removed. pgr94 (talk) 19:37, 26 August 2011 (UTC)
Is the term "representation learning" what you think is too uncommon? The small paragraph in question is just a very quick survey of some techniques that are in common use these days. There are tons of sources covering the topics of that paragraph. Dicklyon (talk) 22:08, 26 August 2011 (UTC)
I take issue with the term "representation learning", which is uncommon. The section should be renamed dimension reduction; this is the more common term. Do you have any objection? pgr94 (talk) 09:51, 2 October 2011 (UTC)
That leaves out the other end of the spectrum, sparse coding, which is usually a dimension increase. Dicklyon (talk) 16:22, 2 October 2011 (UTC)
I think you're pushing a point of view that is not supported by the literature. As editors, we should reflect the literature, and not seek to adapt it. As I have already said above, there are few references for "representation learning". I really don't see why you're insisting... pgr94 (talk) 16:41, 2 October 2011 (UTC)

Connection to pattern recognition[edit]

This article should definitely link to pattern recognition. And I feel there should be some discussion on what belongs on pattern recognition and what on machine learning. T3kcit (talk) 06:21, 23 August 2011 (UTC)

Adversarial Machine Learning[edit]

Recently I've heard the term Adversarial Machine Learning a few times but I can't find anything about it on Wikipedia. Is this a real field which should be covered in this article, or even get its own article? — Hippietrail (talk) 07:47, 29 July 2012 (UTC)

Lead section badly-written and confusing[edit]

The lead section of the article is badly-written and very confusing.

Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that take as input empirical data, such as that from sensors or databases, and yield patterns or predictions thought to be features of the underlying mechanism that generated the data. A learner can take advantage of examples (data) to capture characteristics of interest of their unknown underlying probability distribution. Data can be seen as instances of the possible relations between observed variables. A major focus of machine learning research is the design of algorithms that recognize complex patterns and make intelligent decisions based on input data. One fundamental difficulty is that the set of all possible behaviors given all possible inputs is too large to be included in the set of observed examples (training data). Hence the learner must generalize from the given examples in order to produce a useful output in new cases.

For example, the word "learner" is introduced without any context. For another example, the beginning sentence is very long and meandering. Finally, the end sentence is very poorly explained and seems to be a detail which does not belong in a lead section. A lot of words are tacked on. This lead certainly does not summarize the article. Thus I am tagging this article. JoshuSasori (talk) 06:09, 28 September 2012 (UTC)

Tried to improve the Lead based on your feedback. Any further feedback that you may provide would be helpful. Thanks. IjonTichyIjonTichy (talk) 15:05, 5 November 2012 (UTC)

A section for preprocessing for learning?[edit]

I recently read an article about distance metric learning, and it appears that there should be a section dedicated to preprocessing techniques. Distance metric learning has to do with learning a Mahalanobis distance which describes whether samples are similar or not. One could proceed to transform the data into a space where irrelevant variation is minimized and the variation that is correlated with the learning task is preserved (relevant component analysis). I think feature selection/extraction should also be mentioned.

I believe a brief section discussing preprocessing and linking to the relevant sections would be beneficial. However, such a change should have the support of the community. Please comment and provide your opinions. — Preceding unsigned comment added by (talk) 22:36, 28 September 2012 (UTC)

This is a good idea; the Mahalanobis distance is used in practice in industry, and should be mentioned here. But probably only briefly, as the article seems to be quite technical already and not so easy to read for non-experts. IjonTichyIjonTichy (talk) 15:10, 5 November 2012 (UTC)
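For readers unfamiliar with the term, the Mahalanobis distance discussed above is straightforward to sketch. The 2-D version below, with an invented covariance matrix and points, shows the key property: with the identity covariance it reduces to the Euclidean distance, and a learned (non-identity) covariance rescales directions, which is what distance metric learning exploits.

```python
def mahalanobis_2d(u, v, cov):
    """Mahalanobis distance between 2-D points u and v under a
    symmetric covariance matrix cov given as [[a, b], [b, d]]."""
    (a, b), (_, d) = cov
    det = a * d - b * b  # assumed nonzero (cov must be invertible)
    # inverse covariance (precision matrix)
    inv = [[d / det, -b / det], [-b / det, a / det]]
    dx = [u[0] - v[0], u[1] - v[1]]
    # quadratic form dx^T * inv * dx
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return q ** 0.5

# With the identity covariance this is just the Euclidean distance:
d = mahalanobis_2d((0, 0), (3, 4), [[1, 0], [0, 1]])  # → 5.0
```

Distance metric learning amounts to choosing `cov` (or its inverse) from data so that similar samples end up close under this distance.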

Definition by Samuel[edit]

The definition by Arthur Samuel (1959) seems to be non-existent. Some papers/books cite his key paper on ML in checkers games, but that doesn't contain a definition whatsoever (better yet, it states "While this is not the place to dwell on the importance of machine-learning procedures, or to discourse on the philosophical aspects", p. 71). So I wonder whether we should keep that definition in the wiki page... Otherwise I'm happy to receive the source+page where that definition is stated :)

Generalization in lede[edit]

From the lede:

The core of machine learning deals with representation and generalization. Representation of data instances and functions evaluated on these instances are part of all machine learning systems. Generalization is the property that the system will perform well on unseen data instances;

This doesn't cover transductive learning, where the data are finite and available upfront, but the pattern is unknown. Much unsupervised learning (clustering, topic modeling) follows this pattern as well. QVVERTYVS (hm?) 17:32, 23 July 2014 (UTC)

I got rid of the offending paragraph and wrote a completely new lede. QVVERTYVS (hm?) 18:07, 23 July 2014 (UTC)