User:JPxG/Bi-LSTM


Bidirectional long short-term memory (Bi-LSTM) is an artificial neural network architecture used in deep learning.

Background

Since the origin of computing, artificial intelligence has been an object of study, but during the second half of the 20th century, processing power became more easily accessible and computer-based research became more commonplace. The term "machine learning", used as early as 1959 by IBM researcher Arthur Samuel,[1] currently encompasses a broad variety of statistical learning, data science and neural network approaches to computational problems (often falling under the aegis of artificial intelligence). One of the first neural networks, the perceptron, was introduced in 1957 by Frank Rosenblatt.[2] This machine attempted to recognize pictures and classify them into categories. It consisted of a network of "input neurons" and "output neurons"; each input neuron was connected to every output neuron, with "weights" (set with potentiometers) determining the strength of each connection's influence on the output.[3] The architecture of Rosenblatt's perceptron is what would now be referred to as a fully connected single-layer feed-forward neural network (FFNN). Since then, many innovations have occurred, the most significant being the development of deep learning models, in which one or more "layers" of neurons exist between the input and output.[4][5]

Neural networks are typically initialized with random weights and "trained" to give consistently correct output for a known dataset (the "training set") using backpropagation to perform gradient descent, in which the error for a given input/output example is propagated backwards through the network to determine how every weight should be adjusted.[4][5] In traditional feed-forward neural networks (like Rosenblatt's perceptron), each layer processes output from the previous layer only. Information does not flow backwards, which means that the network's structure contains no "cycles".[4] In contrast, a recurrent neural network (RNN) has at least one "cycle" of activation flow, in which neurons can be activated by neurons in subsequent layers.[4]
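
The gradient-descent update described above can be illustrated with a short sketch. The following Python fragment (a minimal illustration, not taken from any cited source; the tiny network, training example and learning rate are hypothetical) performs one backpropagation step on a single-layer network:

    import numpy as np

    # Hypothetical toy setup: one fully connected layer with a sigmoid output.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 1))            # weights are initialized randomly
    x = np.array([[0.5, -1.2, 0.3]])       # one training input
    y = np.array([[1.0]])                  # its known correct output
    learning_rate = 0.1

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    y_hat = sigmoid(x @ W)                 # forward pass
    error = y_hat - y                      # deviation from the training target
    grad_W = x.T @ (error * y_hat * (1 - y_hat))   # gradient backpropagated to the weights
    W -= learning_rate * grad_W            # gradient-descent weight update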

RNNs, unlike FFNNs, are suited to processing sequential data, since they maintain an internal state that lets them produce different output for the same input depending on previous activation states. That is to say, a text-prediction model using recurrence could process the string "The dog ran out of the house, down the street, loudly" and produce "barking", while producing "meowing" for the same input sequence with "cat" in place of "dog". Achieving the same output from a purely feed-forward neural network, on the other hand, would require separate activation pathways to be trained for both sentences in their entirety.[6][7]
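
The context sensitivity described above comes from a hidden state that the network carries from one step to the next. The sketch below (a minimal illustration with hypothetical, randomly initialized weights rather than a trained model) shows an Elman-style recurrent step in which the same input vector yields different outputs because the accumulated hidden state differs:

    import numpy as np

    rng = np.random.default_rng(1)
    W_xh = rng.normal(size=(4, 8))     # input-to-hidden weights
    W_hh = rng.normal(size=(8, 8))     # hidden-to-hidden (recurrent) weights
    W_hy = rng.normal(size=(8, 4))     # hidden-to-output weights

    def rnn_step(x, h):
        # The new hidden state depends on both the current input and the previous state.
        h_new = np.tanh(x @ W_xh + h @ W_hh)
        return h_new @ W_hy, h_new

    x_same = rng.normal(size=(1, 4))           # the same input vector...
    h_dog = np.tanh(rng.normal(size=(1, 8)))   # ...after one hypothetical history
    h_cat = np.tanh(rng.normal(size=(1, 8)))   # ...after a different hypothetical history

    y_dog, _ = rnn_step(x_same, h_dog)
    y_cat, _ = rnn_step(x_same, h_cat)         # different output for the same input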

However, RNNs and FFNNs are both vulnerable to the "vanishing gradient problem": since gradients (stored as numbers of finite precision) must be backpropagated through every layer of a model to train it, a model with a large number of layers tends to see gradients "vanish" to zero or "explode" to infinity before getting all the way across. To resolve this problem, long short-term memory (LSTM) models were introduced by Sepp Hochreiter and Jürgen Schmidhuber in 1995–1997, featuring a novel architecture of multiple distinct "cells" with "input", "output" and "forget" gates.[8][9][10] LSTMs would find use in a variety of tasks that RNNs performed poorly on, such as learning fine distinctions between rhythmic pattern sequences.[11]
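
The role of these gates can be seen in a single cell update. The sketch below is a minimal illustration of the standard gated LSTM update in Python, with hypothetical randomly initialized parameters; it is not drawn from the original papers:

    import numpy as np

    rng = np.random.default_rng(2)
    n_in, n_hid = 4, 8

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # One weight matrix and bias per gate, plus one pair for the candidate cell state.
    W = {g: rng.normal(size=(n_in + n_hid, n_hid)) for g in ("i", "f", "o", "c")}
    b = {g: np.zeros(n_hid) for g in ("i", "f", "o", "c")}

    def lstm_step(x, h, c):
        z = np.concatenate([x, h])
        i = sigmoid(z @ W["i"] + b["i"])        # input gate: how much new information to admit
        f = sigmoid(z @ W["f"] + b["f"])        # forget gate: how much old cell state to keep
        o = sigmoid(z @ W["o"] + b["o"])        # output gate: how much of the cell state to expose
        c_tilde = np.tanh(z @ W["c"] + b["c"])  # candidate cell state
        c_new = f * c + i * c_tilde             # gated cell-state update
        h_new = o * np.tanh(c_new)              # gated hidden-state output
        return h_new, c_new

    h, c = np.zeros(n_hid), np.zeros(n_hid)
    for x in rng.normal(size=(5, n_in)):        # process a short hypothetical sequence
        h, c = lstm_step(x, h, c)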

While LSTMs proved useful for a variety of applications, like handwriting recognition,[12] they remained limited in their ability to process context; a unidirectional RNN or LSTM's output can only be influenced by previous sequence items.[6] Similar to how the history of the Roman Empire is contextualized by its decline, earlier items in a sequence of images or words tend to take on different meanings based on later items. One example is the following sentence:

He loved his bird more than anything, and cared for it well, and was very distraught to find it had a broken propeller.

Here, "bird" is being used as a slang term for an airplane, but this only becomes apparent upon parsing the last word ("propeller"). While a human reading this sentence can update their interpretation of the first part after reading the second, a unidirectional neural network (whether feedforward, recurrent, or LSTM) cannot.[6] To provide this capability, bidirectional architectures were developed: bidirectional RNNs were first described in 1997 by Schuster and Paliwal as an extension of RNNs,[13] and bidirectional LSTMs combine this structure with LSTM cells.
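
In practice, a bidirectional LSTM runs one LSTM over the sequence from left to right and a second one from right to left, then combines (typically concatenates) the two hidden states at every position, so each position sees both preceding and following context. A minimal sketch using PyTorch's built-in bidirectional option (the layer sizes and random input below are hypothetical placeholders, not a trained model):

    import torch
    import torch.nn as nn

    # Hypothetical sizes: 16-dimensional token vectors, 32 hidden units per direction.
    bi_lstm = nn.LSTM(input_size=16, hidden_size=32, bidirectional=True, batch_first=True)

    tokens = torch.randn(1, 10, 16)    # a batch containing one sequence of 10 token vectors
    outputs, (h_n, c_n) = bi_lstm(tokens)

    # Each position now carries context from both directions: the first 32 values
    # come from the forward pass and the last 32 from the backward pass.
    print(outputs.shape)               # torch.Size([1, 10, 64])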

Natural language processing

Bidirectional algorithms have long been used in domains outside of deep learning; in 2011, the state of the art in part-of-speech (POS) tagging consisted of classifiers trained on windows of text whose output was then fed into bidirectional decoding algorithms during inference. Collobert et al. cited examples of high-performance POS tagging systems whose bidirectionality was instantiated in dependency networks and Viterbi decoders.[14]


In a 2015 paper, Wang et al. presented a unified tagging solution using a bidirectional LSTM recurrent neural network with word embedding.[15]

In a 2015 paper, Ling et al. proposed compositional character models for open-vocabulary word representation.[16]

In a 2015 paper, Kawakami and Dyer learned to represent words in context with multilingual supervision.[17]

In a 2016 paper, Li and Huang enhanced sentence relation modeling with auxiliary character-level embedding.[18]

Speech and handwriting recognition

In a 2005 paper, Graves et al. used bidirectional LSTMs for improved phoneme classification and recognition.[19]

In a 2013 paper, Graves et al. used deep bidirectional LSTM for hybrid speech recognition.[20]

In a 2015 paper, Zhang et al. used highway LSTM RNNs for distant speech recognition.[21]

In a 2016 paper, Zayats et al. used a bidirectional LSTM for disfluency detection.[22]


In a 2007 paper, Liwicki et al. presented a novel approach to on-line handwriting recognition based on bidirectional LSTM networks.[23]

Sequence tagging

In a 2015 paper, Huang et al. used bidirectional LSTM-CRF models for sequence tagging.[24]

Other applications

In a 2016 paper, Kiperwasser and Goldberg performed dependency parsing using bidirectional LSTM feature representations.[25]

In a 2021 paper, Zhang et al. proposed a driving behavior recognition model combining a multi-scale CNN with a bidirectional LSTM.[26]

In a 2021 paper, Deligiannidis et al. analyzed the performance and complexity of bidirectional RNNs versus Volterra nonlinear equalizers in digital coherent systems.[27]

In a 2021 paper, Oluwalade et al. performed human activity recognition using smartphone and smartwatch sensor data.[28]

In a 2021 paper, Dang et al. applied long short-term memory models to malware classification.[29]

In a 2016 paper, Lample et al. proposed neural architectures for named entity recognition.[30]

In a 2016 paper, Wang et al. performed image captioning with deep bidirectional LSTMs.[31]



References

  1. ^ Samuel, Arthur (1959). "Some Studies in Machine Learning Using the Game of Checkers". IBM Journal of Research and Development. 3 (3): 210–229. CiteSeerX 10.1.1.368.2254. doi:10.1147/rd.33.0210.
  2. ^ Rosenblatt, Frank (1957). "The Perceptron—a perceiving and recognizing automaton". Report 85-460-1. Cornell Aeronautical Laboratory.
  3. ^ Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer. ISBN 0-387-31073-8.
  4. ^ a b c d Wilson, Bill (24 June 2012). "The Machine Learning Dictionary". www.cse.unsw.edu.au. Archived from the original on 26 August 2018. Retrieved 19 January 2021.
  5. ^ a b Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). "6.5 Back-Propagation and Other Differentiation Algorithms". Deep Learning. MIT Press. pp. 200–220. ISBN 9780262035613. Archived from the original on 2018-01-27. Retrieved 2021-03-16.
  6. ^ a b c Bajpai, Akash (23 February 2019). "Recurrent Neural Networks: Deep Learning for NLP". Towards Data Science. Archived from the original on 16 March 2021. Retrieved 19 January 2021.
  7. ^ Olah, Chris; Carter, Shan (8 September 2016). "Attention and Augmented Recurrent Neural Networks". Distill. Archived from the original on 22 December 2020. Retrieved 22 January 2021.
  8. ^ Sepp Hochreiter; Jürgen Schmidhuber (21 August 1995), Long Short Term Memory, Wikidata Q98967430
  9. ^ Sepp Hochreiter; Jürgen Schmidhuber (1997). "LSTM can Solve Hard Long Time Lag Problems" (PDF). Advances in Neural Information Processing Systems 9. Advances in Neural Information Processing Systems. Wikidata Q77698282.
  10. ^ Sepp Hochreiter; Jürgen Schmidhuber (1997). "Long short-term memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276. S2CID 1915014. Archived from the original on 2021-01-22. Retrieved 2021-03-08.
  11. ^ Felix A., Gers; Nicol N., Schraudolph; Jürgen, Schmidhuber (2002). "Learning precise timing with LSTM recurrent networks". Journal of Machine Learning Research. 3 (1): 115–143. Archived from the original on 2019-04-04. Retrieved 2021-03-16.
  12. ^ Graves, A.; Liwicki, M.; Fernández, S.; Bertolami, R.; Bunke, H.; Schmidhuber, J. (May 2009). "A Novel Connectionist System for Unconstrained Handwriting Recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (5): 855–868. CiteSeerX 10.1.1.139.4502. doi:10.1109/tpami.2008.137. ISSN 0162-8828. PMID 19299860. S2CID 14635907.
  13. ^ Schuster, Mike; Paliwal, Kuldip K. (December 1997). "Bidirectional Recurrent Neural Networks". IEEE Transactions on Signal Processing. 45 (11): 2673–2681. Bibcode:1997ITSP...45.2673S. doi:10.1109/78.650093. S2CID 18375389.
  14. ^ Collobert, Ronan; Weston, Jason; Bottou, Leon; Karlen, Michael; Kavukcuoglu, Koray; Kuksa, Pavel (2011). "Natural Language Processing (Almost) from Scratch" (PDF). Journal of Machine Learning Research. 12: 2493–2537. Archived (PDF) from the original on 2020-12-10. Retrieved 2021-03-08.
  15. ^ Wang, Peilu; Qian, Yao; Soong, Frank K.; He, Lei; Zhao, Hai (2015). "A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding". arXiv:1511.00215 [cs.CL].
  16. ^ Ling, Wang; Luís, Tiago; Marujo, Luís; Ramón Fernandez Astudillo; Amir, Silvio; Dyer, Chris; Black, Alan W.; Trancoso, Isabel (2015). "Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation". arXiv:1508.02096 [cs.CL].
  17. ^ Kawakami, Kazuya; Dyer, Chris (2015). "Learning to Represent Words in Context with Multilingual Supervision". arXiv:1511.04623 [cs.CL].
  18. ^ Li, Peng; Huang, Heng (2016). "Enhancing Sentence Relation Modeling with Auxiliary Character-level Embedding". arXiv:1603.09405 [cs.CL].
  19. ^ Graves, Alex; Fernández, Santiago; Schmidhuber, Jürgen (2005). Bidirectional LSTM networks for improved phoneme classification and recognition (PDF). Springer Berlin Heidelberg. pp. 799–804. Archived (PDF) from the original on 2019-07-07. Retrieved 2021-03-08.
  20. ^ Graves, Alan; Jaitly, Navdeep; Mohamed, Abdel-rahman (2013). "Hybrid speech recognition with deep bidirectional LSTM" (PDF). 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). Archived (PDF) from the original on 2020-06-05. Retrieved 2021-03-08.
  21. ^ Zhang, Yu; Chen, Guoguo; Yu, Dong; Yao, Kaisheng; Khudanpur, Sanjeev; Glass, James (2015). "Highway Long Short-Term Memory RNNS for Distant Speech Recognition". arXiv:1510.08983 [cs.NE].
  22. ^ Zayats, Vicky; Ostendorf, Mari; Hajishirzi, Hannaneh (2016). "Disfluency Detection using a Bidirectional LSTM". arXiv:1604.03209 [cs.CL].
  23. ^ Liwicki, Marcus; Graves, Alex; Bunke, Horst; Schmidhuber, Jürgen (2007). "A novel approach to on-line handwriting recognition based on bidirectional long short-term memory networks" (PDF). Proceedings of the 9th International Conference on Document Analysis and Recognition. 1. Archived (PDF) from the original on 2019-07-07. Retrieved 2021-03-08.
  24. ^ Huang, Zhiheng; Xu, Wei; Yu, Kai (2015). "Bidirectional LSTM-CRF Models for Sequence Tagging". arXiv:1508.01991 [cs.CL].
  25. ^ Kiperwasser, Eliyahu; Goldberg, Yoav (2016). "Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations". Transactions of the Association for Computational Linguistics. 4: 313–327. arXiv:1603.04351. Bibcode:2016arXiv160304351K. doi:10.1162/tacl_a_00101. S2CID 1642392. Archived from the original on 2021-02-24. Retrieved 2021-03-08.
  26. ^ Zhang, He; Nan, Zhixiong; Yang, Tao; Liu, Yifan; Zheng, Nanning (2021). "A Driving Behavior Recognition Model with Bi-LSTM and Multi-Scale CNN". arXiv:2103.00801 [cs.CV].
  27. ^ Deligiannidis, Stavros; Mesaritakis, Charis; Bogris, Adonis (2021). "Performance and Complexity Analysis of Bi-Directional Recurrent Neural Network Models Versus Volterra Nonlinear Equalizers in Digital Coherent Systems". Journal of Lightwave Technology. 39 (18): 5791–5798. arXiv:2103.03832. doi:10.1109/JLT.2021.3092415. S2CID 232135347.
  28. ^ Oluwalade, Bolu; Neela, Sunil; Wawira, Judy; Adejumo, Tobiloba; Purkayastha, Saptarshi (2021). "Human Activity Recognition using Deep Learning Models on Smartphones and Smartwatches Sensor Data". arXiv:2103.03836 [eess.SP].
  29. ^ Dang, Dennis; Fabio Di Troia; Stamp, Mark (2021). "Malware Classification Using Long Short-Term Memory Models". arXiv:2103.02746 [cs.CR].
  30. ^ Lample, Guillaume; Ballesteros, Miguel; Subramanian, Sandeep; Kawakami, Kazuya; Dyer, Chris (2016). "Neural Architectures for Named Entity Recognition". arXiv:1603.01360 [cs.CL].
  31. ^ Wang, Cheng; Yang, Haojin; Bartz, Christian; Meinel, Christoph (2016). "Image Captioning with Deep Bidirectional LSTMS". arXiv:1604.00790 [cs.CV].
