Sequence Models
Used when the input and/or output data is a sequence. Applied to tasks such as speech recognition, music generation, sentiment classification, DNA sequence analysis, machine translation, video activity recognition, etc.
Notation:
x: input. We use x<t> to denote the t-th element of the input sequence.
y: output. We use y<t> to denote the t-th element of the output sequence.
T denotes sequence length: T_x and T_y denote the lengths of the input and output sequences.
x^(i), y^(i) denote the i-th training/test example; T_x^(i), T_y^(i) denote its input and output lengths.
For example, for the input sentence "Harry Potter invented a new spell", x<3> is "invented" and T_x = 6.
Word representation: we can build a vocabulary vector, then represent each word by its one-hot encoding over that vocabulary.
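A minimal sketch of this representation in NumPy. The six-word vocabulary and the `one_hot` helper are hypothetical, for illustration only; a real vocabulary typically has tens of thousands of words.

```python
import numpy as np

# Hypothetical tiny vocabulary (real ones are much larger).
vocab = ["a", "harry", "invented", "new", "potter", "spell"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word, vocab_size=len(vocab)):
    """Return a one-hot column vector of shape (vocab_size, 1) for `word`."""
    v = np.zeros((vocab_size, 1))
    v[word_to_index[word]] = 1.0
    return v

# x<3> for the sentence "harry potter invented a new spell":
print(one_hot("invented").ravel())  # [0. 0. 1. 0. 0. 0.]
```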
RNN
Problems with a standard NN: inputs and outputs can have different lengths across examples, and features learned at one position of the text are not shared across other positions.
RNN:

The activation computed for the previous input (word) is fed into the next time step, so the output at one position can affect subsequent positions. (Unidirectional RNN)
Forward prop: Initialize a<0> = 0. Then for each time step t:
a<t> = g(W_aa a<t-1> + W_ax x<t> + b_a)
ŷ<t> = g(W_ya a<t> + b_y)
where g is an activation function, W_aa is the weight matrix applied to the previous activation when computing the new activation, W_ax is the weight matrix applied to the input x, and so on.
Common choices for g: tanh (sometimes ReLU) for computing a. For ŷ, sigmoid or another function, depending on the problem.
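A minimal NumPy sketch of this forward pass, assuming tanh for the hidden activation and sigmoid for the output. The function name `rnn_forward`, the dimensions `n_a`, `n_x`, `n_y`, and the random initialization are illustrative assumptions, not a definitive implementation.

```python
import numpy as np

def rnn_forward(x_seq, n_a, params=None, seed=0):
    """Forward prop over x_seq, a list of column vectors of shape (n_x, 1).
    Returns the list of outputs y_hat<1>, ..., y_hat<T_x>."""
    n_x = x_seq[0].shape[0]
    n_y = n_x  # assumption: output is a distribution over the same vocabulary
    rng = np.random.default_rng(seed)
    if params is None:  # small random initialization (illustrative)
        params = {
            "W_aa": rng.standard_normal((n_a, n_a)) * 0.01,
            "W_ax": rng.standard_normal((n_a, n_x)) * 0.01,
            "W_ya": rng.standard_normal((n_y, n_a)) * 0.01,
            "b_a": np.zeros((n_a, 1)),
            "b_y": np.zeros((n_y, 1)),
        }
    a = np.zeros((n_a, 1))  # a<0> = 0
    outputs = []
    for x in x_seq:
        # a<t> = tanh(W_aa a<t-1> + W_ax x<t> + b_a)
        a = np.tanh(params["W_aa"] @ a + params["W_ax"] @ x + params["b_a"])
        # y_hat<t> = sigmoid(W_ya a<t> + b_y)
        y_hat = 1.0 / (1.0 + np.exp(-(params["W_ya"] @ a + params["b_y"])))
        outputs.append(y_hat)
    return outputs

# Usage: feed six one-hot vectors (e.g., the sentence from the sketch above).
x_seq = [np.eye(6)[:, [i]] for i in [1, 4, 2, 0, 3, 5]]
y_hats = rnn_forward(x_seq, n_a=16)
print(y_hats[0].shape)  # (6, 1)
```

Note how the same weight matrices W_aa, W_ax, W_ya are reused at every time step; this parameter sharing is exactly what lets the RNN share features across positions, unlike the standard NN above.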