Gated recurrent unit

From Wikipedia, the free encyclopedia

The gated recurrent unit (GRU) is a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al.[1] GRU performance on polyphonic music modeling and speech signal modeling was found to be similar to that of long short-term memory (LSTM). However, GRUs have been shown to exhibit better performance on smaller datasets.[2]

They have fewer parameters than LSTM, as they lack an output gate.[3]

Architecture

There are several variations on the full gated unit, with gating done using the previous hidden state and the bias in various combinations, as well as a simplified form called the minimal gated unit.

The operator $\odot$ denotes the Hadamard product in the following.

Fully gated unit

Gated Recurrent Unit, fully gated version

The fully gated unit is defined by:

$$z_t = \sigma_g(W_z x_t + U_z h_{t-1} + b_z)$$
$$r_t = \sigma_g(W_r x_t + U_r h_{t-1} + b_r)$$
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \sigma_h(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)$$

Initially, for $t = 0$, the output vector is $h_0 = 0$.

Variables

  • $x_t$: input vector
  • $h_t$: output vector
  • $z_t$: update gate vector
  • $r_t$: reset gate vector
  • $W$, $U$ and $b$: parameter matrices and vector

Activation functions

  • $\sigma_g$: the original is a sigmoid function.
  • $\sigma_h$: the original is a hyperbolic tangent.

Alternative activation functions are possible, provided that $\sigma_g(x) \in [0, 1]$.
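As a concrete sketch (not part of the article), one forward step of the fully gated unit can be written in NumPy, assuming the original sigmoid/tanh activations and small illustrative dimensions; all names here are chosen for the example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One forward step of a fully gated GRU cell."""
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev + bz)             # update gate
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev + br)             # reset gate
    h_hat = np.tanh(Wh @ x_t + Uh @ (r_t * h_prev) + bh)   # candidate state
    return z_t * h_prev + (1.0 - z_t) * h_hat              # new hidden state

# Illustrative sizes: 3-dimensional input, 2-dimensional hidden state.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 2
shapes = [(n_hid, n_in), (n_hid, n_hid), (n_hid,)] * 3     # (W, U, b) for z, r, h
params = [rng.standard_normal(s) * 0.1 for s in shapes]

h = np.zeros(n_hid)                  # h_0 = 0, as stated above
x = rng.standard_normal(n_in)
h = gru_step(x, h, *params)
```

Note that `r_t * h_prev` (the Hadamard product) is applied before the recurrent matrix `Uh`, matching the placement of $r_t \odot h_{t-1}$ in the update equation.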


Alternate forms can be created by changing $z_t$ and $r_t$:[4]

  • Type 1, each gate depends only on the previous hidden state and the bias:
    $z_t = \sigma_g(U_z h_{t-1} + b_z)$, $r_t = \sigma_g(U_r h_{t-1} + b_r)$
  • Type 2, each gate depends only on the previous hidden state:
    $z_t = \sigma_g(U_z h_{t-1})$, $r_t = \sigma_g(U_r h_{t-1})$
  • Type 3, each gate is computed using only the bias:
    $z_t = \sigma_g(b_z)$, $r_t = \sigma_g(b_r)$
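The three gate variants differ only in which terms enter the gate pre-activation. A minimal sketch (with hypothetical small dimensions chosen for illustration) makes the distinction explicit:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical 2-dimensional hidden state, random recurrent weights and bias.
rng = np.random.default_rng(1)
n_hid = 2
h_prev = rng.standard_normal(n_hid)
U = rng.standard_normal((n_hid, n_hid))
b = rng.standard_normal(n_hid)

gate_type1 = sigmoid(U @ h_prev + b)  # Type 1: previous state and bias
gate_type2 = sigmoid(U @ h_prev)      # Type 2: previous state only
gate_type3 = sigmoid(b)               # Type 3: bias only (constant over time)
```

Each variant drops terms from the full gate equation, trading expressiveness for fewer parameters; a Type 3 gate, for example, is the same at every time step.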

Minimal gated unit

The minimal gated unit is similar to the fully gated unit, except that the update and reset gate vectors are merged into a single forget gate. This also implies that the equation for the output vector must be changed:[5]

$$f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)$$
$$h_t = (1 - f_t) \odot h_{t-1} + f_t \odot \sigma_h(W_h x_t + U_h (f_t \odot h_{t-1}) + b_h)$$

Variables

  • $x_t$: input vector
  • $h_t$: output vector
  • $f_t$: forget vector
  • $W$, $U$ and $b$: parameter matrices and vector
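For comparison with the fully gated sketch, one forward step of the minimal gated unit can be written the same way (again a NumPy sketch with illustrative dimensions, not the reference implementation); note the single forget gate where the GRU has two gates:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgu_step(x_t, h_prev, Wf, Uf, bf, Wh, Uh, bh):
    """One forward step of a minimal gated unit (MGU) cell."""
    f_t = sigmoid(Wf @ x_t + Uf @ h_prev + bf)             # forget gate
    h_hat = np.tanh(Wh @ x_t + Uh @ (f_t * h_prev) + bh)   # candidate state
    return (1.0 - f_t) * h_prev + f_t * h_hat              # new hidden state

# Illustrative sizes: 3-dimensional input, 2-dimensional hidden state.
rng = np.random.default_rng(2)
n_in, n_hid = 3, 2
Wf, Uf, bf = (rng.standard_normal((n_hid, n_in)),
              rng.standard_normal((n_hid, n_hid)),
              rng.standard_normal(n_hid))
Wh, Uh, bh = (rng.standard_normal((n_hid, n_in)),
              rng.standard_normal((n_hid, n_hid)),
              rng.standard_normal(n_hid))

h = mgu_step(rng.standard_normal(n_in), np.zeros(n_hid), Wf, Uf, bf, Wh, Uh, bh)
```

With two fewer weight matrices and one fewer bias vector than the fully gated unit, the MGU keeps the gated blending of old and candidate states while reducing parameter count.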

See also

References

  1. ^ Cho, Kyunghyun; van Merrienboer, Bart; Gulcehre, Caglar; Bahdanau, Dzmitry; Bougares, Fethi; Schwenk, Holger; Bengio, Yoshua (2014). "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation". arXiv:1406.1078 [cs.CL].
  2. ^ Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].
  3. ^ "Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano – WildML". Wildml.com. Retrieved May 18, 2016.
  4. ^ Dey, Rahul; Salem, Fathi M. (2017-01-20). "Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks". arXiv:1701.05923 [cs.NE].
  5. ^ Heck, Joel; Salem, Fathi M. (2017-01-12). "Simplified Minimal Gated Unit Variations for Recurrent Neural Networks". arXiv:1701.03452 [cs.NE].