(Vanilla) Recurrent Neural Network

$$h_t = f_w(h_{t-1}, x_t) \;\rightarrow\; h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t), \qquad y_t = W_{hy}h_t$$
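
The following is a minimal pure-Python sketch of the vanilla RNN recurrence, assuming small hand-picked weight matrices stored as lists of rows. The helper names `matvec`, `rnn_step` and `rnn_forward` are hypothetical. The sketch shows that the same $W_{hh}$, $W_{xh}$ and $W_{hy}$ are reused at every time step.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(h_prev, x, W_hh, W_xh, W_hy):
    """One vanilla RNN step: h_t = tanh(W_hh h_{t-1} + W_xh x_t), y_t = W_hy h_t."""
    a = [p + q for p, q in zip(matvec(W_hh, h_prev), matvec(W_xh, x))]
    h = [math.tanh(v) for v in a]
    y = matvec(W_hy, h)
    return h, y

def rnn_forward(xs, h0, W_hh, W_xh, W_hy):
    """Unroll the SAME weights over every element of the input sequence xs."""
    h, hs, ys = h0, [], []
    for x in xs:
        h, y = rnn_step(h, x, W_hh, W_xh, W_hy)
        hs.append(h)
        ys.append(y)
    return hs, ys

# Hypothetical toy sizes: hidden size 2, input size 1, output size 1.
W_hh = [[0.5, 0.0], [0.0, 0.5]]
W_xh = [[1.0], [-1.0]]
W_hy = [[1.0, 0.0]]  # output reads the first hidden unit
hs, ys = rnn_forward([[1.0], [0.0], [2.0]], [0.0, 0.0], W_hh, W_xh, W_hy)
```

With these toy weights the first output is $\tanh(1) \approx 0.76$. Every hidden value stays in $(-1, 1)$ because of the tanh.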

Sequence to Sequence: a Many2One encoder followed by a One2Many decoder.

Many2One: encode the input sequence into a single vector.

One2Many: produce an output sequence from a single input vector.
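
Here is a toy sketch of the two halves, assuming scalar hidden states and made-up weights. The names `encode` and `decode` are hypothetical. The encoder is Many2One and the decoder is One2Many, and the output length is chosen independently of the input length. This sketch assumes one common design choice: each output is fed back in as the next decoder input.

```python
import math

def encode(xs, w_hh, w_xh):
    """Many2One: fold the whole input sequence into one hidden value (the 'context')."""
    h = 0.0
    for x in xs:
        h = math.tanh(w_hh * h + w_xh * x)
    return h

def decode(context, steps, w_hh, w_xh, w_hy):
    """One2Many: start from the single context value and emit `steps` outputs.
    Each output is fed back as the next input; y = 0.0 stands in for a start token."""
    h, y, ys = context, 0.0, []
    for _ in range(steps):
        h = math.tanh(w_hh * h + w_xh * y)
        y = w_hy * h
        ys.append(y)
    return ys

context = encode([0.5, -1.0, 2.0], w_hh=0.8, w_xh=1.0)            # 3 inputs to 1 value
outputs = decode(context, steps=4, w_hh=0.9, w_xh=0.5, w_hy=2.0)  # 1 value to 4 outputs
```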

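Truncated Backpropagation Through Time: split the sequence into chunks and run the forward and backward passes over one small chunk at a time. The hidden state is carried forward from chunk to chunk, but gradients are only backpropagated within each chunk.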
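
This is a sketch of truncated BPTT on a scalar RNN $h_t = \tanh(w h_{t-1} + u x_t)$ with a squared-error loss on every hidden state. The model, loss, learning rate, chunk length and helper names are all assumptions for illustration. The hidden state returned by one chunk is passed into the next. The backward loop stops at the chunk boundary, so no gradient flows into the carried-in `h0`.

```python
import math

def chunk_forward_backward(xs, targets, h0, w, u):
    """Forward + backward over ONE chunk. h0 comes from the previous chunk
    but is treated as a constant: no gradient is propagated past it."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    # Hypothetical toy objective: squared error on every hidden state.
    loss = sum(0.5 * (h - tgt) ** 2 for h, tgt in zip(hs[1:], targets))
    dw = du = 0.0
    dh_next = 0.0  # gradient flowing back into h_t from step t+1
    for t in reversed(range(len(xs))):
        h, h_prev = hs[t + 1], hs[t]
        dh = (h - targets[t]) + dh_next
        da = dh * (1.0 - h * h)   # back through tanh
        dw += da * h_prev
        du += da * xs[t]
        dh_next = da * w          # after t == 0 this is simply discarded
    return hs[-1], loss, dw, du

def truncated_bptt(xs, targets, chunk_len, w=0.5, u=0.5, lr=0.1):
    """Process the sequence chunk by chunk, updating weights after each chunk
    and carrying only the hidden-state VALUE across chunk boundaries."""
    h = 0.0
    losses = []
    for start in range(0, len(xs), chunk_len):
        h, loss, dw, du = chunk_forward_backward(
            xs[start:start + chunk_len], targets[start:start + chunk_len], h, w, u)
        w -= lr * dw
        u -= lr * du
        losses.append(loss)
    return w, u, losses

# Hypothetical toy task: predict the next value of a sine wave.
xs = [math.sin(0.3 * k) for k in range(12)]
targets = xs[1:] + [0.0]
w, u, losses = truncated_bptt(xs, targets, chunk_len=4)
```

A design note: the weights are updated after every chunk, so the carried-in `h` was computed with slightly older weights. That is the usual trade-off accepted for bounded memory and compute per update.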

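Vanilla RNN Gradient Flow: backpropagating from $h_t$ to $h_{t-1}$ multiplies the gradient by $W_{hh}^T$ (and by the tanh derivative) at every time step. Over many steps, this repeated multiplication tends to make the gradient explode when the largest singular value of $W_{hh}$ is greater than 1. It tends to make the gradient vanish when that singular value is less than 1.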
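
The following small numerical illustration assumes a diagonal $W_{hh}$ and ignores the tanh derivative, which is at most 1 and would only shrink the gradient further. It repeatedly multiplies an upstream gradient by $W_{hh}^T$ and records the norm. The sketch also includes clipping by norm, the usual remedy for exploding gradients. `backprop_norms` and `clip_by_norm` are hypothetical helpers, not a library API.

```python
import math

def matvec_T(W, v):
    """Compute W^T v for W given as a list of rows."""
    return [sum(W[r][c] * v[r] for r in range(len(W))) for c in range(len(W[0]))]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def backprop_norms(W_hh, steps):
    """Norm of the gradient after k = 0..steps multiplications by W_hh^T.
    The tanh derivative is ignored for simplicity."""
    g = [1.0] * len(W_hh)
    norms = [norm(g)]
    for _ in range(steps):
        g = matvec_T(W_hh, g)
        norms.append(norm(g))
    return norms

def clip_by_norm(grad, threshold):
    """Gradient clipping: rescale grad so its L2 norm is at most `threshold`."""
    n = norm(grad)
    if n > threshold:
        return [x * threshold / n for x in grad]
    return grad

shrinking = backprop_norms([[0.5, 0.0], [0.0, 0.5]], steps=20)  # largest singular value 0.5
growing = backprop_norms([[1.5, 0.0], [0.0, 1.5]], steps=20)    # largest singular value 1.5
clipped = clip_by_norm([3.0, 4.0], threshold=1.0)
```

With singular value 0.5, the norm after 20 steps is about $\sqrt{2}\cdot 0.5^{20} \approx 1.3\times10^{-6}$. With singular value 1.5, it is about $\sqrt{2}\cdot 1.5^{20} \approx 4.7\times10^{3}$. Clipping only fixes the exploding case. Vanishing gradients are usually addressed by changing the architecture, for example by using an LSTM.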

LSTM: $\sigma$ denotes the sigmoid function and $\odot$ denotes elementwise multiplication.

$$
\begin{aligned}
\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix} &= \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix} W \begin{pmatrix} h_{t-1} \\ x_t \end{pmatrix} \\
c_t &= f \odot c_{t-1} + i \odot g \\
h_t &= o \odot \tanh(c_t)
\end{aligned}
$$
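
Below is a minimal sketch of one LSTM step written directly from the LSTM equations. It assumes the rows of $W$ are stacked in the order $[i; f; o; g]$ and that there is no bias term, as in the equations. Libraries order and parameterise these blocks in their own ways, so this layout is an assumption and not any particular framework's. `lstm_step` is a hypothetical name.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(h_prev, c_prev, x, W):
    """One LSTM step. W has 4*H rows and H+D columns, rows ordered [i; f; o; g]
    (an assumed layout). No bias term, matching the equations as written."""
    H = len(h_prev)
    assert len(W) == 4 * H
    z = h_prev + x                          # list concatenation: (h_{t-1}; x_t)
    a = [sum(w * v for w, v in zip(row, z)) for row in W]
    i = [sigmoid(v) for v in a[0:H]]        # input gate
    f = [sigmoid(v) for v in a[H:2 * H]]    # forget gate
    o = [sigmoid(v) for v in a[2 * H:3 * H]]  # output gate
    g = [math.tanh(v) for v in a[3 * H:4 * H]]  # gate gate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Hypothetical all-zero weights with H = 1, D = 1: every pre-activation is 0,
# so i = f = o = 0.5 and g = 0, and the cell value is simply halved.
W = [[0.0, 0.0] for _ in range(4)]
h, c = lstm_step([0.0], [2.0], [1.0], W)    # c == [1.0]
```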

input gate ($i$): whether to write to the cell

forget gate ($f$): whether to erase the cell

output gate ($o$): how much of the cell to reveal

gate gate ($g$): how much to write to the cell
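
To see what each gate does, the scalar sketch below sets the gates to hand-picked values. Real gates come out of a sigmoid and lie strictly between 0 and 1, so the values 0 and 1 here are idealised limits. `cell_update` is a hypothetical helper.

```python
import math

def cell_update(c_prev, i, f, o, g):
    """Scalar version of c_t = f*c_{t-1} + i*g and h_t = o*tanh(c_t)."""
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return c, h

c_prev = 3.0
keep = cell_update(c_prev, i=0.0, f=1.0, o=1.0, g=0.9)    # no write, no erase: c stays 3.0
erase = cell_update(c_prev, i=0.0, f=0.0, o=1.0, g=0.9)   # forget gate closed: c becomes 0.0
write = cell_update(c_prev, i=1.0, f=0.0, o=1.0, g=-0.5)  # erase, then write g: c becomes -0.5
hidden = cell_update(c_prev, i=0.0, f=1.0, o=0.0, g=0.9)  # output gate closed: h is 0.0, c still 3.0
```

Because $c_t = f \odot c_{t-1} + i \odot g$ is additive, the local derivative $\partial c_t / \partial c_{t-1}$ is just $f$ (elementwise, holding the gates fixed). The direct path through the cell involves no multiplication by a weight matrix, unlike the vanilla RNN.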