Talk:Backpropagation: Difference between revisions

From Wikipedia, the free encyclopedia

Revision as of 23:21, 31 August 2011

This article is within the scope of WikiProject Robotics, a collaborative effort to improve the coverage of Robotics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-class on Wikipedia's content assessment scale.
This article has been rated as Mid-importance on the project's importance scale.

Questions

"Using the neuron's weights on its incoming connections" - What does this mean? - Zoe 06:22, 31 January 2003 (UTC)

Hopefully this has been clarified. - Eyrian 05:30, 22 Apr 2005 (UTC)

"Backpropagation usually allows quick convergence on local minima" - What does this mean? - Zoe 06:22, 31 January 2003 (UTC)

I think that this is reasonable terminology for use in an article on optimization (though hopefully clarified somewhat). - Eyrian 05:30, 22 Apr 2005 (UTC)

Please translate from German to English

There seems to be much more information in the German version of this page. Please translate it to English. Thanks! --Jcarroll 23:19, 5 May 2006 (UTC)

History of the technique

I had heard at some point that a review of Gauss's work indicated that the technique we call backpropagation was actually developed by Gauss. Werbos would then be an independent re-discoverer of the technique. It may even have been Werbos who told me this; I wasn't making notes at the time (about 1990). But I'm not finding corroboration of this, so I thought I would at least broach the topic here. Until there is such corroboration, of course, there should be no change to the article. --Wesley R. Elsberry 11:58, 27 August 2006 (UTC)

I somehow remember that Yann LeCun is credited with this technique. On his web page, I found an article "generalization using back-propagation" from 1987. (http://yann.lecun.com/exdb/publis/index.html)
The mathematical theory was there for a long time, so a historical review is indeed interesting. —Preceding unsigned comment added by 83.216.234.177 (talk) 09:58, August 28, 2007 (UTC)
Any technique for function minimization (in this case, classification error minimization) that is not simple trial-and-error will trace back to the work of Isaac Newton and/or Leibniz. Perceptron and MLP training are not exceptions; most training algorithms are variations of Gradient Descent or Newton-Raphson (like Levenberg-Marquardt). A historical review of this would be endless.

--Lucas Gallindo (talk) 19:45, 8 September 2009 (UTC)
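The lineage described above can be seen in a minimal gradient-descent loop (a generic sketch; the quadratic objective and learning rate are illustrative, not from any source discussed here). Backpropagation is this same descent, with the gradient computed layer by layer via the chain rule:

```python
# Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
def grad(w):
    return 2.0 * (w - 3.0)  # f'(w)

w, lr = 0.0, 0.1
for _ in range(100):
    w -= lr * grad(w)  # step against the gradient

print(round(w, 3))  # converges toward 3.0
```

Newton-Raphson and its variants differ only in how the step is scaled (by curvature information), not in the basic idea of following derivatives downhill.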

Tautology?

"Backpropagation usually allows quick convergence on satisfactory local minima for error in the kind of networks to which it is suited." —The preceding unsigned comment was added by 129.170.31.11 (talk · contribs) 03:16, 16 January 2007 (UTC).

I don't think so. The sentence indicates that backpropagation works well on certain networks. --Eyrian 05:49, 16 January 2007 (UTC)

Alternatives

Actually I would like some links to alternative methods for training multi-layered networks. Maybe a "See also" section. —Preceding unsigned comment added by 80.164.105.90 (talk) 07:11, 8 September 2007 (UTC)

Algorithm

Citation: Material seems to be directly copied from http://www2.cs.uh.edu/~ceick/ai/nn.ppt (slide 6); if it is, a reference should be included.

Explanation: I'm not understanding how to compute the per-node error. How much blame should be subtracted from each connection's weight, and how much should be propagated back to the inputting nodes? (I'm assuming that it backpropagates according to the proportion of error contributed by each earlier connection.) --Jesdisciple 03:54, 8 October 2007 (UTC)

I detail the error propagation algorithm in multilayer perceptron. Perhaps some of the math should be copied here? SamuelRiv (talk) 13:04, 18 November 2007 (UTC)
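For readers landing here before that math is merged: the per-node attribution asked about above can be sketched as follows for a sigmoid network (a hedged illustration; the weights and activations are made up, and the proportional-blame intuition in the question is exactly what the weighted sum of downstream deltas implements):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy fragment: two hidden units feeding one output unit.
w_out = [0.4, -0.6]                     # hidden -> output weights (made up)
hidden = [sigmoid(0.5), sigmoid(-0.2)]  # hidden activations (made-up net inputs)
output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
target = 1.0

# Output node's error signal: raw error scaled by the sigmoid derivative.
delta_out = (target - output) * output * (1.0 - output)

# Each hidden node's share of the blame is proportional to the weight of
# its connection to the output -- this is the backward propagation step.
delta_hidden = [w * delta_out * h * (1.0 - h) for w, h in zip(w_out, hidden)]

# Each connection's weight is then nudged along the negative gradient.
lr = 0.5
w_out = [w + lr * delta_out * h for w, h in zip(w_out, hidden)]
```

So the error is not "subtracted from" the weights directly; the weights move in proportion to (their node's delta) x (the activation feeding the connection).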

Connection to Biology

The term "backpropagation" is ambiguous

Someone added a section on backpropagation of action potentials in biological neurons. I just backed this change out. The material is irrelevant in the context of this article. (In this article, backpropagation refers to reverse propagation of errors through the computation graph. As far as anyone knows, this has nothing to do with the jump of action potentials invading the proximal regions of the dendritic arbor.) Barak (talk) 19:10, 17 November 2007 (UTC)

How about some tact here. I added 2000 bytes of information, so maybe there should be some discussion before wiping it. The idea of backpropagation as a biological model, as it has been applied by several research groups (IBNS for one), is useless without some justification. A computational model is effectively a neuroscience model, and any information added to this effect is useful. Anyway, if more context or a separate article is needed, then we can add that. But please don't just blank a good half-hour of factual and relevant information. SamuelRiv (talk) 21:14, 17 November 2007 (UTC)
I'm sorry if my change seemed abrupt. The material you added is both interesting and (I say this as someone knowledgeable in the field) entirely correct. The reason I removed it is because it belongs in a different article. Perhaps in an article about action potentials in neurons, or in one about Hebbian synaptic plasticity, or some place like that. But unfortunately, it is not appropriate here: "back propagation" of potential from the soma into the dendritic arbor is a different sense of the term "backpropagation" than that being used here, which refers to a mathematical construction for an estimation problem. Please do not take this personally though, and I am serious when I say that the text you wrote deserves a home. It just doesn't fit here. Barak (talk) 11:05, 18 November 2007 (UTC)
Thanks for the detailed response. I disagree, obviously, because from my perspective artificial and biological neural nets are inseparable concepts. The nonlinear activation function, for example, was partially inspired by biology, and in this case the success of backprop as an algorithm encouraged biologists to find evidence for it in biological systems. I don't think there is enough there for it to warrant a separate article (it still seems like a lot of speculation as to the algorithmic purpose of the retrograde signal). It belongs here because backpropagation is a feedforward algorithm, whereas Hebbian learning, for example, has no such bias (it could work with a feedback loop to transmit errors, for instance). I don't know, personally I can't think of a more appropriate place for this information, especially as this article is pretty short already. I'll think about it more, though, and see if there might be a better fit somewhere. SamuelRiv (talk) 13:02, 18 November 2007 (UTC)
I'm sorry but what you're saying doesn't really support the idea that this particular material should be included in this particular article. You're summarizing a biology paper that happens to have the word "backpropagation" in its title. But that paper is using that word in a completely irrelevant way. Any other paper about some detailed electrophysiological measurement would be just as relevant. This article is short, yes. There are many relevant hunks of information that could be added. E.g., a review of backprop-based work that has cast new light on a variety of biological phenomena, such as NETtalk and language acquisition, Qian et al.'s shape-from-shading model and the bar detector story. Or some hooks into the literature of practical applications could be added: PAPnet is a billion-dollar thousands-of-lives success of backprop. Credit card transaction validation. Loan application testing. ALVINN. Face detection. Facial gender discrimination. Or hooks into the relevant theoretical literature: theory of generalization in such networks, VC dimension bounds, early stopping, optimal brain damage. Convolutive networks and TDNNs enjoy wide application (LeNet is the gold standard for handwritten digit recognition, for example). Pointers to implementations that are really useful, like "lush", so people can go out and build real systems. But a summary of a bit of completely irrelevant biology data just doesn't belong here. (You mention Leon Cooper's group at Brown as if it makes this relevant. But that group doesn't work on backprop or related methods.) Barak (talk) 19:54, 18 November 2007 (UTC)
You still have not addressed why it is irrelevant. Here we have an algorithm that essentially cannot function without some method of backward propagation of signals (simple feedback loops don't work by nature of the correction algorithm). This is used in neural networks, which are almost by definition the best existing model of a functional brain. So as soon as you have an algorithm, you have a brain model. The Brown group works on brain models from both a top-down and bottom-up perspective, so they address both algorithms (mostly RBNs - granted I haven't seen any papers specifically working with backprop) and biological models. So as a model of the brain, which the backpropagation algorithm is, we would naturally want some evidence for it. That's where this addition comes into play.
Now I'd rather have a discussion than an edit war, but I fail to see why, in a dispute, you'd rather default to having less information visible in the article than more. SamuelRiv (talk) 20:15, 18 November 2007 (UTC)
Addendum: I have tagged this page on Wikipedia:Third opinion. Additionally, I want to clarify that IBNS has done work on backprop algorithms, but no paper deals specifically with them. Most use multilayer perceptrons in comparison to RBNs. SamuelRiv (talk) 20:43, 18 November 2007 (UTC)
Let me answer your actual question above, which I paraphrase: "Here we have an algorithm that cannot function without some method of backward propagation of signals ... why is evidence for backward (i.e., into the dendritic arbour) propagation of action potentials not relevant?" The reason is that what is being retrogradely transmitted into the dendrites are action potentials, which is the activity of the neurons. It is not the error, which is what would need to be transmitted to implement backprop. It is backward something, but the wrong thing! It is not backward propagation of errors. Barak (talk) 21:00, 18 November 2007 (UTC)
Okay, new paragraph because I'm sick of typing colons. In response, your point about error is fair, in that error or its energy is not necessarily what is being backpropagated. But it can be. Imagine a 3-layer perceptron that wants to learn to match its output to a memorized binary pattern, say. So its output layer connects in a linear 1-1 fashion with the memory, producing a spike frequency in the memory layer that increases with the amount of error in the output, that is, if the output approaches "0" frequency, the memory wants frequency "1" and spikes at full frequency to signal that error. Once the signals match, the memory stops spiking. So each of these error spikes backpropagates, and then you get the full LTP/LTD effects (this last sentence has not yet been observed). I'll see if I can find a paper on this mechanism when I get to the office. SamuelRiv (talk) 15:15, 19 November 2007 (UTC)
Addendum - see [1] if you can open it for a paper outlining a biological backprop mechanism. Does information from this at least deserve to be in the article? SamuelRiv (talk) 19:04, 19 November 2007 (UTC)
That's not a bad paper, although there are two other efforts I'd look at first. One is a paper by Paul Werbos on implementing backprop in neural hardware. The other is the "Recirculation Neural network" architecture (G. E. Hinton and J. L. McClelland, J. L., "Learning representations by recirculation", in D. Z. Anderson (ed), "Neural Information Processing Systems," pages 358-366, AIP, [2]), which was the start of a line of research that, in a modern probabilistic context, developed into the Helmholtz Machine and the Restricted Boltzmann Machine architectures.
A Wikipedia article on Biologically Plausible NN Learning Algorithms would be a useful place to have pointers to these various strands of research, and to put them all in perspective. Barak (talk) 10:51, 26 November 2007 (UTC)
Yikes, another thing on my "to do" list. I'll explore making that article (I have to look at the literature on biological SVMs and CMs first). Meanwhile, thanks for the papers, except I can't find the Werbos one - do you remember the title or keywords? SamuelRiv (talk) 16:28, 26 November 2007 (UTC)

Request for comments on biological evidence section

The question is whether or not a section which was alternately added and removed is pertinent, relevant, and suitable for inclusion in this article. — Athaenara 03:16, 19 November 2007 (UTC)

Responding to RfC. This article is about a computer algorithm, while the section in question describes a biological process that works in the same way. I don't believe the section should be included, as it is of little relevance to what I believe the article should be about (the algorithm and its properties). But the information presented in the section is interesting and seems to be notable, so I believe it should be added to a separate article and linked to from this article. Labongo (talk) 11:58, 19 November 2007 (UTC)
My main problem is that every neural network algorithm doubles as a biological model for the time being, because we simply don't know for sure what algorithms the brain uses. We have some evidence for several different algorithms, so where do we put all that information? SamuelRiv (talk) 15:17, 19 November 2007 (UTC)
The information should probably be in the other article or some related article, since how the brain works is clearly outside the scope of this article. Labongo (talk) 16:00, 20 November 2007 (UTC)
I created Neural backpropagation, as per the third opinion. However, I frankly think this is ridiculous, as the literature only refers to the phenomenon as "backpropagation" and there is too much overlap between these topics for them to deserve separate articles. I'd also like to say that repeatedly deleting large-scale factual and arguably relevant contributions to an article seems to me to violate the spirit of Wikipedia, as stated in its revert policies. Someone will inevitably come along and suggest these articles be merged, if anyone ever reads them. SamuelRiv (talk) 23:58, 20 November 2007 (UTC)
I added otheruses templates to both articles, to ensure that there is no confusion with regard to the term. Since the issue seems to be resolved I have also removed the RfC tag from the talk page. Labongo (talk) 15:32, 22 November 2007 (UTC)

Poor summary?

It's a very good idea to have a summary of an algorithm like this, but I don't find the summary, as it currently stands, to be very good. Thomas Tvileren (talk) 10:01, 19 January 2009 (UTC)

Much better now! Thanks to Strife911. Thomas Tvileren (talk) 17:42, 23 February 2011 (UTC)

Invented by

The article attributed the first description to "Paul Werbos in 1974". According to my source this is incorrect; I have updated the text to cite Bryson and Ho 1969. I am mentioning this in Talk since I am surprised to see a long-standing error of such importance. 78.105.234.140 (talk) 17:01, 15 October 2009 (UTC)

Relationship to Delta Rule?

The sentence enclosing the link to the Delta rule for this page states:

"It [Back-propagation] is a supervised learning method, and is an implementation of the Delta rule."

This implies that back-propagation is a subset of the Delta Rule, but the link to back-propagation from the Delta Rule page states:

"It [The Delta Rule] is a special case of the more general backpropagation algorithm."

Can this be? Or is the wording simply a bit confusing? —Preceding unsigned comment added by 72.221.98.197 (talk) 02:44, 20 December 2010 (UTC)

If you look in books, it appears that the delta rule only works at outputs where you have a target, and back-prop is a generalization of it. So the first quoted sentence should have "implementation" changed to "generalization", I think. Dicklyon (talk) 03:17, 20 December 2010 (UTC)

If you read the original article by Rumelhart, Hinton & Williams, they explicitly call it the "generalized delta rule." The original delta rule applies to linear single-layer networks, although it can be shown to be identical to the perceptron algorithm. The generalization is to nonlinear networks with a squashing function, and to multilayer networks (hence "backpropagation", which is unnecessary in single-layer networks). -gary cottrell —Preceding unsigned comment added by 75.80.97.109 (talk) 10:00, 25 January 2011 (UTC)
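Cottrell's point can be written out explicitly. A sketch in the style of Rumelhart, Hinton & Williams's notation (here f is the squashing function and net_j the weighted input to unit j):

```latex
% Original delta rule (linear, single-layer): update driven by the output error directly.
\Delta w_{ji} = \eta \,(t_j - y_j)\, x_i

% Generalized delta rule (backpropagation): same form, but the error signal
% \delta_j is defined recursively, so hidden layers can be trained too.
\Delta w_{ji} = \eta\, \delta_j\, x_i, \qquad
\delta_j =
\begin{cases}
  (t_j - y_j)\, f'(\mathrm{net}_j), & j \text{ an output unit},\\
  f'(\mathrm{net}_j) \sum_k \delta_k w_{kj}, & j \text{ a hidden unit}.
\end{cases}
```

The hidden-unit case is where the "backward propagation" happens: each delta is assembled from the deltas of the units it feeds.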

A new algorithm proposal

Could we change the layout of the algorithm? Why not make each instruction a link to additional information? But should this information be put in the same page, or one relating to the 'next level of detail' for the algorithm? —Preceding unsigned comment added by 139.80.206.172 (talk) 08:57, 1 May 2011 (UTC)

Equations

The article is missing the equation(s) for the backpropagation learning algorithm.
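Until someone adds them to the article, here is one complete update written out as runnable code rather than equations (a sketch of a standard stochastic-gradient backprop step on a 2-2-1 sigmoid network; the weights, input, target, and learning rate are all made up for illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, W1, W2):
    # Forward pass: hidden activations, then the single output.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)))
    return h, y

x, t = [1.0, 0.0], 1.0            # one training pattern and its target
W1 = [[0.1, -0.2], [0.3, 0.4]]    # input -> hidden weights (illustrative)
W2 = [0.5, -0.5]                  # hidden -> output weights (illustrative)
lr = 0.5

h, y = forward(x, W1, W2)
err_before = 0.5 * (t - y) ** 2

# Backward pass: output delta, hidden deltas (using the pre-update W2),
# then the weight updates for both layers.
d_out = (t - y) * y * (1.0 - y)
d_hid = [W2[j] * d_out * h[j] * (1.0 - h[j]) for j in range(2)]
W2 = [W2[j] + lr * d_out * h[j] for j in range(2)]
W1 = [[W1[j][i] + lr * d_hid[j] * x[i] for i in range(2)] for j in range(2)]

_, y_new = forward(x, W1, W2)
err_after = 0.5 * (t - y_new) ** 2  # smaller than err_before
```

One such step lowers the squared error on the pattern; training repeats this over many patterns and epochs.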