Recursive neural network

A recursive neural network (RNN) is a kind of deep neural network created by applying the same set of weights recursively over a structured input, to produce a structured prediction over variable-size input structures, or a scalar prediction on one, by traversing the given structure in topological order. RNNs have been successful, for instance, in learning sequence and tree structures in natural language processing, mainly continuous phrase and sentence representations based on word embeddings. RNNs were first introduced to learn distributed representations of structure, such as logical terms.[1] Models and general frameworks have been developed in further work since the 1990s.[2][3]

Architectures

Basic

A simple recursive neural network architecture

In the simplest architecture, nodes are combined into parents using a weight matrix that is shared across the whole network, together with a non-linearity such as tanh. If c_1 and c_2 are n-dimensional vector representations of nodes, their parent will also be an n-dimensional vector, calculated as

    p_{1,2} = \tanh(W [c_1; c_2])

where W is a learned n × 2n weight matrix and [c_1; c_2] denotes the concatenation of the two child vectors.
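
As an illustration, the following minimal Python sketch implements this composition over a toy binary tree; the dimensionality, random initialization, and tree shape are assumptions chosen for the example, not taken from the literature.

    import numpy as np

    n = 4                                      # dimensionality of every node vector
    rng = np.random.default_rng(0)
    W = rng.standard_normal((n, 2 * n)) * 0.1  # shared n x 2n weight matrix

    def compose(c1, c2):
        """Parent vector p = tanh(W [c1; c2]), shared across the whole tree."""
        return np.tanh(W @ np.concatenate([c1, c2]))

    def encode(tree):
        """Recursively encode a binary tree given as nested (left, right) pairs."""
        if isinstance(tree, tuple):
            return compose(encode(tree[0]), encode(tree[1]))
        return tree                            # a leaf is an n-dimensional embedding

    # Encode the tree ((a b) c) from three random leaf embeddings.
    a, b, c = rng.standard_normal((3, n))
    root = encode(((a, b), c))
    print(root.shape)                          # (4,)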

This architecture, with a few improvements, has been used for successfully parsing natural scenes and for syntactic parsing of natural language sentences.[4]

Recursive Cascade Correlation (RecCC)

RecCC is a constructive neural network approach for dealing with tree domains,[2] with pioneering applications to chemistry[5] and extensions to directed acyclic graphs.[6]

Unsupervised RNN

A framework for unsupervised RNNs was introduced in 2004.[7][8]

Tensor

Recursive neural tensor networks use a single, tensor-based composition function for all nodes in the tree.[9]
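
A hedged sketch of such a composition function: each of the n slices of a third-order tensor V contributes a bilinear interaction x^T V[k] x on top of the usual linear term, where x = [c_1; c_2]. The sizes and initialization below are illustrative assumptions.

    import numpy as np

    n = 4
    rng = np.random.default_rng(1)
    W = rng.standard_normal((n, 2 * n)) * 0.1        # standard linear term
    V = rng.standard_normal((n, 2 * n, 2 * n)) * 0.1 # one bilinear slice per output unit

    def compose_tensor(c1, c2):
        x = np.concatenate([c1, c2])                 # [c1; c2], shape (2n,)
        bilinear = np.einsum('i,kij,j->k', x, V, x)  # x^T V[k] x for each slice k
        return np.tanh(bilinear + W @ x)

    p = compose_tensor(rng.standard_normal(n), rng.standard_normal(n))
    print(p.shape)                                   # (4,)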

Training

Stochastic gradient descent

Typically, stochastic gradient descent (SGD) is used to train the network. The gradient is computed using backpropagation through structure (BPTS), a variant of backpropagation through time used for recurrent neural networks.
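
A rough sketch of one SGD step with BPTS for the basic architecture above: the shared matrix W accumulates a gradient contribution from every internal node, and the error signal is split between the two children at each composition. The squared-error loss at the root, the toy tree, and the learning rate are illustrative assumptions.

    import numpy as np

    n = 4
    rng = np.random.default_rng(2)
    W = rng.standard_normal((n, 2 * n)) * 0.1

    def forward(tree):
        """Return the node vector and the activations needed for the backward pass."""
        if isinstance(tree, tuple):
            c1, aux1 = forward(tree[0])
            c2, aux2 = forward(tree[1])
            x = np.concatenate([c1, c2])
            p = np.tanh(W @ x)
            return p, ('node', x, p, aux1, aux2)
        return tree, ('leaf',)

    def backward(aux, grad_p, dW):
        """BPTS: accumulate the shared gradient and push the error down the tree."""
        if aux[0] == 'leaf':
            return
        _, x, p, aux1, aux2 = aux
        delta = grad_p * (1.0 - p ** 2)   # tanh'(z) = 1 - tanh(z)^2
        dW += np.outer(delta, x)          # every node contributes to the same W
        grad_x = W.T @ delta
        backward(aux1, grad_x[:n], dW)    # error for the left child
        backward(aux2, grad_x[n:], dW)    # error for the right child

    # One SGD step on the toy loss 0.5 * ||root - target||^2 for the tree ((a b) c).
    a, b, c, target = rng.standard_normal((4, n))
    root, aux = forward(((a, b), c))
    dW = np.zeros_like(W)
    backward(aux, root - target, dW)      # d(loss)/d(root) = root - target
    W -= 0.1 * dW                         # gradient step with learning rate 0.1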

Properties

The universal approximation capability of RNNs over trees has been proved in the literature.[10][11]

Related models

Recurrent Neural Networks

Recurrent neural networks are recursive artificial neural networks with a certain structure: that of a linear chain. Whereas recursive neural networks operate on any hierarchical structure, combining child representations into parent representations, recurrent neural networks operate on the linear progression of time, combining the previous time step and a hidden representation into the representation for the current time step.
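
To make the correspondence concrete, the sketch below applies the same shared composition along a linear chain, which recovers the familiar recurrent update h_t = tanh(W [h_{t-1}; x_t]); all names and sizes are again illustrative assumptions.

    import numpy as np

    n = 4
    rng = np.random.default_rng(3)
    W = rng.standard_normal((n, 2 * n)) * 0.1

    def compose(h, x):
        """The same shared composition as in a recursive network."""
        return np.tanh(W @ np.concatenate([h, x]))

    h = np.zeros(n)                          # initial hidden state
    for x_t in rng.standard_normal((5, n)):  # a length-5 input sequence
        h = compose(h, x_t)                  # one time step == one chain node
    print(h.shape)                           # (4,)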

Tree Echo State Networks

An efficient approach to implement recursive neural networks is given by the Tree Echo State Network,[12] within the Reservoir Computing paradigm.
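
A very rough sketch of the idea, assuming (per the reservoir computing paradigm) that the recursive transition weights are left fixed and random and only a linear readout from the root state is trained; the scaling, sizes, and least-squares readout below are illustrative choices, not the exact formulation of the cited paper.

    import numpy as np

    n = 8
    rng = np.random.default_rng(4)
    W = rng.standard_normal((n, 2 * n))
    W *= 0.5 / np.linalg.norm(W, 2)          # shrink the fixed reservoir for stability

    def state(tree):
        """Untrained recursive state transition over the tree (the reservoir)."""
        if isinstance(tree, tuple):
            return np.tanh(W @ np.concatenate([state(tree[0]), state(tree[1])]))
        return tree

    # Only the readout is trained: collect root states for a set of trees and
    # fit a linear map to the targets by least squares.
    trees = [((rng.standard_normal(n), rng.standard_normal(n)), rng.standard_normal(n))
             for _ in range(20)]
    y = rng.standard_normal(20)              # toy targets
    X = np.stack([state(t) for t in trees])  # root states, shape (20, n)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)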

Extension to Graphs

Extensions to graphs include the Graph Neural Network (GNN),[13] the Neural Network for Graphs (NN4G),[14] and, more recently, convolutional neural networks for graphs.

References

  1. ^ Goller, C.; Küchler, A. (1996). "Learning task-dependent distributed representations by backpropagation through structure". Proceedings of the IEEE International Conference on Neural Networks (ICNN 1996). doi:10.1109/ICNN.1996.548916.
  2. ^ a b Sperduti, A.; Starita, A. (1997-05-01). "Supervised neural networks for the classification of structures". IEEE Transactions on Neural Networks. 8 (3): 714–735. doi:10.1109/72.572108. ISSN 1045-9227.
  3. ^ Frasconi, P.; Gori, M.; Sperduti, A. (1998-09-01). "A general framework for adaptive processing of data structures". IEEE Transactions on Neural Networks. 9 (5): 768–786. doi:10.1109/72.712151. ISSN 1045-9227.
  4. ^ Socher, Richard; Lin, Cliff; Ng, Andrew Y.; Manning, Christopher D. "Parsing Natural Scenes and Natural Language with Recursive Neural Networks" (PDF). The 28th International Conference on Machine Learning (ICML 2011).
  5. ^ Bianucci, Anna Maria; Micheli, Alessio; Sperduti, Alessandro; Starita, Antonina (2000). "Application of Cascade Correlation Networks for Structures to Chemistry". Applied Intelligence. 12 (1–2): 117–147. doi:10.1023/A:1008368105614. ISSN 0924-669X.
  6. ^ Micheli, A.; Sona, D.; Sperduti, A. (2004-11-01). "Contextual processing of structured data by recursive cascade correlation". IEEE Transactions on Neural Networks. 15 (6): 1396–1410. doi:10.1109/TNN.2004.837783. ISSN 1045-9227.
  7. ^ Hammer, Barbara; Micheli, Alessio; Sperduti, Alessandro; Strickert, Marc (2004). "Recursive self-organizing network models". Neural Networks. 17: 1061–1085.
  8. ^ Hammer, Barbara; Micheli, Alessio; Sperduti, Alessandro; Strickert, Marc (2004-03-01). "A general framework for unsupervised processing of structured data". Neurocomputing. 57: 3–35. doi:10.1016/j.neucom.2004.01.008.
  9. ^ Socher, Richard; Perelygin, Alex; Wu, Jean Y.; Chuang, Jason; Manning, Christopher D.; Ng, Andrew Y.; Potts, Christopher. "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank" (PDF). EMNLP 2013.
  10. ^ Hammer, Barbara (2007-10-03). Learning with Recurrent Neural Networks. Springer. ISBN 9781846285677.
  11. ^ Hammer, Barbara; Micheli, Alessio; Sperduti, Alessandro (2005-05-01). "Universal Approximation Capability of Cascade Correlation for Structures". Neural Computation. 17 (5): 1109–1159. doi:10.1162/0899766053491878.
  12. ^ Gallicchio, Claudio; Micheli, Alessio (2013-02-04). "Tree Echo State Networks". Neurocomputing. 101: 319–337. doi:10.1016/j.neucom.2012.08.017.
  13. ^ Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; Monfardini, G. (2009-01-01). "The Graph Neural Network Model". IEEE Transactions on Neural Networks. 20 (1): 61–80. doi:10.1109/TNN.2008.2005605. ISSN 1045-9227.
  14. ^ Micheli, A. (2009-03-01). "Neural Network for Graphs: A Contextual Constructive Approach". IEEE Transactions on Neural Networks. 20 (3): 498–511. doi:10.1109/TNN.2008.2010350. ISSN 1045-9227.