Title:
《Very Deep Transformers for Neural Machine Translation》
Q:
how to decrease the variance of the output layer in order to train deep Transformers for NMT?
S:
use an initialization technique named “ADMIN” (Adaptive Model Initialization) to remedy the variance problem and stabilize training
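(A minimal sketch of the ADMIN-style residual connection, assuming the post-LN formulation x_{i+1} = LayerNorm(x_i * omega_i + f_i(x_i)) where omega_i is fixed from a variance-profiling forward pass; class and method names here are illustrative, not the repo's actual API.)

```python
import torch
import torch.nn as nn

class AdminResidual(nn.Module):
    """Residual connection x_{i+1} = LayerNorm(x_i * omega_i + f_i(x_i)).

    omega_i is a per-dimension scale set after a profiling forward pass,
    so each sub-layer's contribution to output variance stays bounded
    as the network grows deeper.
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        # omega starts at 1 (plain post-LN residual); profiling resets it.
        self.omega = nn.Parameter(torch.ones(d_model))

    def forward(self, x: torch.Tensor, sublayer_out: torch.Tensor) -> torch.Tensor:
        # Scale the skip branch by omega before the usual add-and-normalize.
        return self.norm(x * self.omega + sublayer_out)

    @torch.no_grad()
    def set_omega_from_profile(self, accumulated_variance: torch.Tensor) -> None:
        # Profiling phase: set omega to the square root of the variance
        # accumulated over earlier sub-layers, estimated on real batches.
        self.omega.copy_(accumulated_variance.clamp_min(1e-6).sqrt())
```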
C:
the deep models (up to a 60-layer encoder and 12-layer decoder) outperform their 6-layer baseline, with up to 2.5 BLEU improvement.
The code and trained models will be publicly available at: https://github.com/namisan/exdeep-nmt.