Title
Improving Language Understanding by Generative Pre-Training
Abstract
Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task.
We introduced a framework for achieving strong natural language understanding with a single task-agnostic model through generative pre-training and discriminative fine-tuning.
Problem addressed: labeled data for learning specific tasks is scarce, which makes it challenging for discriminatively trained models to perform adequately.
Approach: generative pre-training on unlabeled text, followed by discriminative fine-tuning with task-aware input transformations.
Results: absolute improvements of 8.9% on commonsense reasoning (CR), 5.7% on question answering (QA), and 1.5% on textual entailment (TE).
Introduction
To reduce the reliance on supervised learning in NLP, it is important to learn effectively from raw text; the model offers an alternative to collecting more annotations by extracting linguistic information from unlabeled data.
Leveraging more than word-level information from unlabeled text poses two main challenges:
- it is unclear which optimization objective is most effective for learning transferable representations
- there is no consensus on the most effective way to transfer the learned representations to the target task
pre-training: unsupervised - a single corpus of unlabeled text
fine-tuning: supervised - several datasets with manually annotated training examples (the target tasks)
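Concretely, the paper formalizes the two stages as follows: given an unlabeled corpus $\mathcal{U} = \{u_1, \dots, u_n\}$, pre-training maximizes a standard language modeling objective with context window $k$ and parameters $\Theta$; given a labeled dataset $\mathcal{C}$ of input token sequences $x^1, \dots, x^m$ with label $y$, fine-tuning maximizes the supervised objective, keeping the language modeling loss as an auxiliary term weighted by $\lambda$:

$$
L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \dots, u_{i-1}; \Theta)
$$

$$
L_2(\mathcal{C}) = \sum_{(x, y)} \log P(y \mid x^1, \dots, x^m)
$$

$$
L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \cdot L_1(\mathcal{C})
$$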
Training procedure (a training sketch follows this list):
- use a language modeling objective on the unlabeled data to learn the initial parameters of the neural network
- adapt these parameters to the target task using the corresponding supervised objective
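A minimal PyTorch sketch of these two steps. Everything here (the tiny one-layer stand-in model, the fake random batches, the hyperparameters) is an illustrative assumption rather than the paper's actual 12-layer setup; the point is the structure: stage 1 optimizes the LM loss $L_1$ on unlabeled tokens, stage 2 optimizes the classification loss $L_2$ plus the auxiliary LM term (the paper sets $\lambda$ to 0.5).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for the pre-trained network: token + position embeddings,
# a single causal Transformer layer, and a weight-tied LM head. The real
# model is a 12-layer Transformer decoder; this only mirrors its interface.
class TinyLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_ctx=32):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(n_ctx, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.block = nn.TransformerEncoder(layer, num_layers=1)
        self.d_model = d_model

    def forward(self, tokens):                       # tokens: (batch, seq)
        seq = tokens.size(1)
        pos = torch.arange(seq, device=tokens.device)
        h = self.tok_emb(tokens) + self.pos_emb(pos)
        # Causal mask: each position may only attend to its left context.
        mask = torch.full((seq, seq), float("-inf"), device=tokens.device).triu(1)
        return self.block(h, mask=mask)              # (batch, seq, d_model)

    def lm_logits(self, h):
        return h @ self.tok_emb.weight.T             # weight-tied LM head


def lm_loss(model, tokens):
    """L1: predict each token from its left context."""
    h = model(tokens[:, :-1])
    logits = model.lm_logits(h)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))


# --- Stage 1: unsupervised pre-training on unlabeled token sequences ---
model = TinyLM()
clf_head = nn.Linear(model.d_model, 2)               # task head used in stage 2
opt = torch.optim.Adam(list(model.parameters()) + list(clf_head.parameters()), lr=1e-4)

unlabeled_batches = [torch.randint(0, 1000, (8, 32)) for _ in range(3)]   # fake data
for tokens in unlabeled_batches:
    opt.zero_grad()
    lm_loss(model, tokens).backward()
    opt.step()

# --- Stage 2: supervised fine-tuning with auxiliary LM objective (L3 = L2 + lambda * L1) ---
aux_lm_weight = 0.5                                   # lambda
labeled_batches = [(torch.randint(0, 1000, (8, 32)), torch.randint(0, 2, (8,)))
                   for _ in range(3)]                 # fake (tokens, label) pairs
for tokens, labels in labeled_batches:
    opt.zero_grad()
    h = model(tokens)
    clf_logits = clf_head(h[:, -1])                   # last position stands in for the extract token
    loss = F.cross_entropy(clf_logits, labels) + aux_lm_weight * lm_loss(model, tokens)
    loss.backward()
    opt.step()
```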
model architecture: Transformer
more structured memory: handles long-range dependencies in text and transfers robustly across diverse tasks; structured inputs are converted into a single contiguous sequence of tokens, which allows effective fine-tuning with minimal changes to the pre-trained model's architecture
[^Computing similarity on both orderings of the sentence pair is likely meant to increase robustness, since similarity has no inherent ordering]:
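A sketch of these input transformations, which serialize structured inputs into single contiguous token sequences fed to the same pre-trained Transformer. The special token IDs (`START`, `DELIM`, `EXTRACT`) and the plain-list representation are illustrative assumptions; in the paper the start, delimiter, and extract tokens get randomly initialized embeddings added to the vocabulary.

```python
# Hypothetical special-token IDs (assumed values for illustration).
START, DELIM, EXTRACT = 40000, 40001, 40002

def entailment_input(premise_ids, hypothesis_ids):
    # Premise and hypothesis concatenated around a delimiter token.
    return [START] + premise_ids + [DELIM] + hypothesis_ids + [EXTRACT]

def similarity_inputs(text_a_ids, text_b_ids):
    # Similarity has no inherent ordering, so both orderings are built;
    # their final-layer representations are added before the linear head.
    return (
        [START] + text_a_ids + [DELIM] + text_b_ids + [EXTRACT],
        [START] + text_b_ids + [DELIM] + text_a_ids + [EXTRACT],
    )

def multiple_choice_inputs(context_ids, answer_ids_list):
    # One sequence per candidate answer; each is scored independently
    # and the scores are normalized with a softmax.
    return [[START] + context_ids + [DELIM] + ans + [EXTRACT]
            for ans in answer_ids_list]

# Example with toy "tokenized" sentences (IDs are arbitrary).
seq_ab, seq_ba = similarity_inputs([11, 12, 13], [21, 22])
```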
Natural language understanding spans many tasks, such as textual entailment, QA, SSA (semantic similarity assessment), and document classification. The framework combines unsupervised pre-training with supervised fine-tuning on the target tasks. Unsupervised representation learning has progressed from word-level and phrase-level embeddings to sentence-level embeddings in order to capture higher-level semantics. Unsupervised pre-training is a special case of semi-supervised learning; its goal is to find a better initialization point that helps a neural network train on different tasks (image classification, speech recognition, entity disambiguation, and machine translation). Earlier work captured textual information with LSTMs, which restricts prediction to a short range; this paper uses the Transformer to extend that range. Pre-training also ...
The red box in the figure marks where the Transformer sits, between the token-level input and the Linear layer. The token-level input (structured inputs serialized into token sequences) is the key design choice: it enables finer-grained fine-tuning. The initial parameters are first learned from unlabeled data, and these parameters are then adapted to the specific target under a supervised objective. The target task does not need to come from the same domain as the unlabeled corpus, and the model performs well across tasks.
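In the paper's notation, the forward pass underlying this figure is: the context token matrix $U$ is embedded with the token embedding matrix $W_e$ plus the position embedding matrix $W_p$, passed through $n$ transformer blocks, and decoded either by the tied token embedding for language modeling or, for fine-tuning, by an added linear layer $W_y$ applied to the final block's activation at the last input token, $h_l^m$:

$$
h_0 = U W_e + W_p
$$

$$
h_l = \mathrm{transformer\_block}(h_{l-1}) \quad \forall l \in [1, n]
$$

$$
P(u) = \mathrm{softmax}(h_n W_e^{\top})
$$

$$
P(y \mid x^1, \dots, x^m) = \mathrm{softmax}(h_l^m W_y)
$$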
Conclusion
model: Transformer
dataset: text with long-range dependencies (long stretches of contiguous text)
target tasks: QA, SSA (semantic similarity assessment), TE (textual entailment), TC (text classification)
Strong natural language understanding is achieved by unsupervised pre-training of a Transformer on diverse datasets of long contiguous text, followed by supervised fine-tuning on the target task, which improves the handling of long-range dependencies in text.