Sequence Model

Used when the input and/or output data is a sequence. Applied to tasks such as speech recognition, music generation, sentiment classification, DNA sequence analysis, machine translation, video activity recognition, etc.

Notation:

  • $x$: input. We use $x^{<t>}$ to denote the $t^{th}$ element of the input sequence.

  • $y$: output. We use $y^{<t>}$ to denote the $t^{th}$ element of the output sequence.

  • $T$ can be used to denote length: $T_x$ and $T_y$ denote the lengths of the input and output sequences.

  • $X_i, y_i$ can be used to denote the $i^{th}$ training/test example; $T_x^i, T_y^i$ denote its input and output lengths (see the sketch below).
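
A concrete illustration of this notation, as a minimal sketch; the sentence, labels, and word-labeling task here are hypothetical:

```python
# Hypothetical example: label each word as part of a name (1) or not (0).
x = ["Harry", "Potter", "invented", "a", "spell"]  # x^{<1>} = "Harry", ..., x^{<5>} = "spell"
y = [1, 1, 0, 0, 0]                                # y^{<t>} labels the t-th word
T_x = len(x)   # T_x = 5
T_y = len(y)   # T_y = 5 here; for other tasks (e.g. sentiment), T_y may differ from T_x
```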

Word representation: build a vocabulary, then represent each word as a one-hot vector over that vocabulary.
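
A minimal sketch of one-hot word representation; the tiny vocabulary here is an assumption (real vocabularies typically have 10,000+ words, plus an `<UNK>` token for out-of-vocabulary words):

```python
import numpy as np

vocab = ["a", "harry", "invented", "potter", "spell", "<UNK>"]  # toy vocabulary (assumed)
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a |V|-dimensional one-hot vector for `word`."""
    vec = np.zeros(len(vocab))
    vec[word_to_index.get(word.lower(), word_to_index["<UNK>"])] = 1.0
    return vec

print(one_hot("Potter"))  # [0. 0. 0. 1. 0. 0.]
```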

RNN

Problems with a standard NN: the input and output can have different lengths across examples, and features learned at one position in the text are not shared across other positions.

RNN:

The activation computed for the previous input (word) is fed into the next step, allowing the output at one position to affect subsequent positions. (Unidirectional RNN.)

Forward prop: initialize $a^{<0>} = 0$. Then $a^{<t>} = g(w_{aa} a^{<t-1>} + w_{ax} x^{<t>} + b_a)$ and $\hat{y}^{<t>} = g(w_{ya} a^{<t>} + b_y)$, where $g$ is an activation function, $w_{aa}$ is the weight matrix for computing the new activation from the previous activation, $w_{ax}$ is the weight matrix for computing the activation from $x$, and so on.
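
A minimal NumPy sketch of these forward-prop equations; the dimensions, random weights, and dummy inputs are illustrative assumptions, with $g$ taken as tanh for the activation and softmax for the output:

```python
import numpy as np

n_x, n_a, n_y, T_x = 6, 4, 3, 5           # input/hidden/output sizes and sequence length (assumed)
rng = np.random.default_rng(0)
w_aa = rng.standard_normal((n_a, n_a))    # previous activation -> activation
w_ax = rng.standard_normal((n_a, n_x))    # input x -> activation
w_ya = rng.standard_normal((n_y, n_a))    # activation -> output y
b_a, b_y = np.zeros(n_a), np.zeros(n_y)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

a = np.zeros(n_a)                                     # a^{<0>} = 0
xs = [rng.standard_normal(n_x) for _ in range(T_x)]   # dummy inputs standing in for x^{<t>}
for x_t in xs:
    a = np.tanh(w_aa @ a + w_ax @ x_t + b_a)          # a^{<t>}
    y_hat = softmax(w_ya @ a + b_y)                   # y_hat^{<t>}
    print(y_hat)
```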

Common activations: tanh (sometimes ReLU) for $a$. For $y$, sigmoid or softmax, depending on the problem.
