Gated recurrent unit

Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014. On polyphonic music modeling and speech signal modeling, their performance was found to be similar to that of long short-term memory (LSTM).^[1] A GRU has fewer parameters than an LSTM, as it lacks an output gate.^[2]
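The parameter difference can be illustrated with a small count. The sketch below assumes the common formulation in which each gate or candidate block has an input matrix, a recurrent matrix and a bias vector, with three such blocks in a GRU and four in a standard LSTM without peephole connections. The function names and the example sizes are hypothetical and chosen only for illustration.

```python
def block_param_count(input_size: int, hidden_size: int) -> int:
    """Parameters in one block: W (hidden x input), U (hidden x hidden), b (hidden)."""
    return hidden_size * input_size + hidden_size * hidden_size + hidden_size


def gru_param_count(input_size: int, hidden_size: int) -> int:
    # Assumed blocks: update gate, reset gate, candidate activation.
    return 3 * block_param_count(input_size, hidden_size)


def lstm_param_count(input_size: int, hidden_size: int) -> int:
    # Assumed blocks: input gate, forget gate, output gate, cell candidate.
    return 4 * block_param_count(input_size, hidden_size)


if __name__ == "__main__":
    n_in, n_hidden = 10, 20  # illustrative sizes, not taken from the article
    print("GRU: ", gru_param_count(n_in, n_hidden))
    print("LSTM:", lstm_param_count(n_in, n_hidden))
```

Under these assumptions a GRU has three quarters as many parameters as an LSTM with the same input and hidden sizes.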

Architecture

The initial state is $h_{0}=0$, and $\circ$ denotes the Hadamard (element-wise) product.

$$
\begin{aligned}
z_{t}&=\sigma_{g}(W_{z}x_{t}+U_{z}h_{t-1}+b_{z})\\
r_{t}&=\sigma_{g}(W_{r}x_{t}+U_{r}h_{t-1}+b_{r})\\
h_{t}&=(1-z_{t})\circ h_{t-1}+z_{t}\circ \sigma_{h}(W_{h}x_{t}+U_{h}(r_{t}\circ h_{t-1})+b_{h})
\end{aligned}
$$

Variables

$x_{t}$ : input vector
$h_{t}$ : output vector
$z_{t}$ : update gate vector
$r_{t}$ : reset gate vector
$W$, $U$ and $b$ : parameter matrices and bias vector

Activation functions

$\sigma _{g}$ : a sigmoid function in the original formulation.
$\sigma _{h}$ : a hyperbolic tangent in the original formulation.
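The update equations above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: it assumes the logistic sigmoid for $\sigma_{g}$ and tanh for $\sigma_{h}$, and the helper names (`init_params`, `gru_step`, `run_gru`), the dictionary layout of the parameters and the random initialization scale are all hypothetical choices made for this example.

```python
import numpy as np


def sigmoid(a):
    """Logistic sigmoid, used here for sigma_g."""
    return 1.0 / (1.0 + np.exp(-a))


def init_params(input_size, hidden_size, seed=0):
    """Create W, U, b for the z, r and h blocks (illustrative initialization)."""
    rng = np.random.default_rng(seed)
    params = {}
    for name in ("z", "r", "h"):
        params[f"W_{name}"] = rng.normal(scale=0.1, size=(hidden_size, input_size))
        params[f"U_{name}"] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        params[f"b_{name}"] = np.zeros(hidden_size)
    return params


def gru_step(x_t, h_prev, p):
    """One time step: returns h_t given x_t and h_{t-1}."""
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])  # update gate
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])  # reset gate
    # Candidate activation; the reset gate scales h_{t-1} element-wise.
    h_cand = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev) + p["b_h"])
    # Element-wise interpolation between the previous state and the candidate.
    return (1.0 - z) * h_prev + z * h_cand


def run_gru(xs, p, hidden_size):
    """Apply gru_step over a non-empty sequence xs of shape (T, input_size)."""
    h = np.zeros(hidden_size)  # h_0 = 0
    outputs = []
    for x_t in xs:
        h = gru_step(x_t, h, p)
        outputs.append(h)
    return np.stack(outputs)


if __name__ == "__main__":
    params = init_params(input_size=3, hidden_size=4)
    sequence = np.ones((5, 3))
    print(run_gru(sequence, params, hidden_size=4).shape)
```

Because $h_{t}$ is an element-wise convex combination of $h_{t-1}$ and a tanh output, and $h_{0}=0$, every component of the state stays within $(-1, 1)$ in this sketch.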

References

[1] Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].
[2] "Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano – WildML". Wildml.com. Retrieved May 18, 2016.
