Generative Pre-Training
https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
Summary: Large gains on language tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each task. Task-aware input transformations during fine-tuning achieve effective transfer while requiring minimal changes to the model architecture.
This paper: a combination of unsupervised pre-training and supervised fine-tuning. Learn a universal representation that transfers with little adaptation to a wide range of tasks. Employ a two-stage training procedure: 1 - use a language modeling objective on unlabeled data to learn the initial parameters of a neural network; 2 - adapt these parameters to a target task using the corresponding supervised objective.
Model architecture: transformer. Evaluate on 4 language understanding tasks - natural language inference, question answering, semantic similarity, and text classification.
Framework:
Unsupervised pre-training.
Given an unsupervised corpus of tokens U = {u_1, ..., u_n}, use a standard language modeling objective to maximize the following likelihood:

L1(U) = sum_i log P(u_i | u_{i-k}, ..., u_{i-1}; Theta)

Here, k is the size of the context window, and the conditional probability P is modeled using a neural network with parameters Theta.
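A minimal sketch of the L1 objective in plain Python, under assumptions: `model_prob` is a hypothetical stand-in for the neural LM's conditional probability P(token | context; Theta), here faked with a fixed bigram table just to make the windowed sum concrete.

```python
import math

def lm_objective(tokens, model_prob, k):
    """Sum of log P(u_i | u_{i-k}, ..., u_{i-1}) over the corpus.

    model_prob(context, token) is a hypothetical stand-in for the
    network's conditional distribution; k is the context window size.
    """
    total = 0.0
    for i in range(len(tokens)):
        context = tokens[max(0, i - k):i]  # at most k preceding tokens
        total += math.log(model_prob(context, tokens[i]))
    return total

# Toy "model": a fixed bigram table standing in for the network.
bigram = {("a", "b"): 0.9, ("b", "a"): 0.8}

def toy_prob(context, token):
    if context and (context[-1], token) in bigram:
        return bigram[(context[-1], token)]
    return 0.1  # fallback probability for unseen pairs

print(lm_objective(["a", "b", "a", "b"], toy_prob, k=2))
```

In the paper this sum is maximized with SGD over the transformer's parameters; the toy table only illustrates how the windowed log-likelihood is accumulated.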
Transformer decoder for the language model, a variant of the transformer. Applies a multi-headed self-attention operation over the input context tokens, followed by position-wise feedforward layers, to produce an output distribution over target tokens:

h_0 = U W_e + W_p
h_l = transformer_block(h_{l-1}), for l = 1..n
P(u) = softmax(h_n W_e^T)

Where U = (u_{-k}, ..., u_{-1}) is the context vector of tokens, n is the number of layers, W_e is the token embedding matrix, and W_p is the position embedding matrix.
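A shape-level sketch of this forward pass in plain Python. The transformer block itself is deliberately stubbed out (the real one is masked multi-head attention plus a position-wise FFN); the point is only the data flow h_0 = U W_e + W_p, h_l = block(h_{l-1}), P(u) = softmax(h_n W_e^T). All sizes and the random initialization are illustrative assumptions.

```python
import math
import random

random.seed(0)

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

vocab, d_model, n_layers = 5, 4, 2  # toy sizes, not the paper's
W_e = [[random.gauss(0, 0.02) for _ in range(d_model)] for _ in range(vocab)]  # token embeddings
W_p = [[random.gauss(0, 0.02) for _ in range(d_model)] for _ in range(8)]      # position embeddings

def transformer_block(h):
    # Stand-in: the real block applies masked multi-headed self-attention
    # followed by a position-wise feedforward layer. Not reproduced here.
    return h

def forward(context):
    # h_0 = U W_e + W_p  (embed each token, add its position embedding)
    h = [[W_e[tok][j] + W_p[pos][j] for j in range(d_model)]
         for pos, tok in enumerate(context)]
    # h_l = transformer_block(h_{l-1}), l = 1..n
    for _ in range(n_layers):
        h = transformer_block(h)
    # P(u) = softmax(h_n W_e^T): reuse embeddings to score the next token
    W_e_T = [list(col) for col in zip(*W_e)]
    logits = matmul([h[-1]], W_e_T)[0]
    return softmax(logits)

probs = forward([1, 3, 2])  # distribution over the 5-token vocab
```

Note the weight tying: the same W_e embeds inputs and, transposed, produces the output logits, exactly as in the paper's P(u) = softmax(h_n W_e^T).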
Supervised fine-tuning:
After pre-training, we adapt the parameters to the supervised target task. Assume a labelled dataset C, where each instance consists of a sequence of input tokens x^1, ..., x^m along with a label y. The inputs are passed through the pre-trained model to obtain the final transformer block's activation h_l^m, which is then fed into an added linear output layer with parameters W_y to predict y:

P(y | x^1, ..., x^m) = softmax(h_l^m W_y)

This gives the following objective to maximize:

L2(C) = sum over (x, y) in C of log P(y | x^1, ..., x^m)
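A sketch of the added head and the L2 objective in plain Python, assuming the final activation h_l^m is already computed (here the activations and W_y values are made-up placeholders, not anything from the paper):

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def classify(h_final, W_y):
    """P(y | x^1..x^m) = softmax(h_l^m W_y): final activation through the added linear head."""
    logits = [sum(h * w for h, w in zip(h_final, col)) for col in zip(*W_y)]
    return softmax(logits)

def l2_objective(dataset, W_y):
    """L2(C) = sum over (x, y) of log P(y | x), with h_final standing in
    for the pre-trained model's final-layer activation on x."""
    return sum(math.log(classify(h_final, W_y)[y]) for h_final, y in dataset)

# Hypothetical 3-dim activations, 2 classes.
W_y = [[0.5, -0.5], [0.1, 0.2], [-0.3, 0.4]]
data = [([1.0, 0.0, 0.5], 0), ([0.0, 1.0, 0.2], 1)]
print(l2_objective(data, W_y))
```

W_y is the only newly initialized parameter matrix at fine-tuning time (besides delimiter embeddings); everything upstream reuses the pre-trained weights.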
Including language modeling as an auxiliary objective during fine-tuning helped learning by improving generalization of the supervised model and accelerating convergence, so we optimize the following objective (with weight lambda):

L3(C) = L2(C) + lambda * L1(C)

Certain tasks (e.g. question answering, textual entailment) have structured inputs and previously needed a significant amount of task-specific customization. Instead, use a traversal-style approach: convert the structured input into an ordered sequence that the pre-trained model can process, joined with delimiter tokens - see paper for details.
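A sketch of these traversal-style input transformations. The literal token strings below are hypothetical placeholders (the paper uses randomly initialized start, delimiter, and extract token embeddings, not these spellings):

```python
# Placeholder special tokens; the paper learns embeddings for these.
START, DELIM, EXTRACT = "<s>", "$", "<e>"

def entailment_input(premise, hypothesis):
    # Entailment: concatenate premise and hypothesis with a delimiter.
    return [START] + premise + [DELIM] + hypothesis + [EXTRACT]

def similarity_inputs(text_a, text_b):
    # Similarity: no inherent ordering, so build both orderings; the
    # model's two final activations are added element-wise before the head.
    return [entailment_input(text_a, text_b), entailment_input(text_b, text_a)]

def multiple_choice_inputs(context, answers):
    # QA / multiple choice: one sequence per candidate answer; each is
    # scored independently and the scores normalized with a softmax.
    return [entailment_input(context, ans) for ans in answers]

seqs = multiple_choice_inputs(["the", "sky", "is"], [["blue"], ["green"]])
```

Each transformed sequence is then processed by the unchanged pre-trained transformer plus the linear head, which is what lets one architecture cover all four task types.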