Title
[Pay Less Attention with Lightweight and Dynamic Convolutions](https://arxiv.org/abs/1901.10430)
Abstract
Self-attention determines the importance of each context element by comparing the current time step against every element. The paper proposes a lightweight convolution and a dynamic convolution.
We predict separate convolution kernels based solely on the current time-step in order to determine the importance of context elements. The number of operations required by this approach scales linearly in the input length, whereas self-attention is quadratic.
Separate convolution kernels are predicted based solely on the current time-step; the number of operations scales linearly with the input length, whereas self-attention is quadratic.
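As a rough back-of-the-envelope comparison (the symbols n, d, and k below are my own notation, not defined in these notes):

```latex
% n: sequence length, d: model dimension, k: convolution kernel width (fixed, k << n)
\text{self-attention:}\quad O(n^2 \cdot d)
\qquad
\text{lightweight / dynamic convolution:}\quad O(n \cdot k \cdot d)
```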
1. Introduction
RNNs integrate context information by updating a hidden state at every time-step, CNNs summarize a fixed-size context through multiple layers, while self-attention directly summarizes all of the context.
Figure: Self-attention computes attention weights by comparing all pairs of elements to each other (a), while dynamic convolutions predict separate kernels for each time-step (b).
This paper introduces the lightweight convolution, which is depth-wise separable, softmax-normalized, and shares weights over the channel dimension. The lightweight convolution has orders of magnitude fewer weights than a standard convolution, and it reuses the same weights over the context regardless of the current time-step.
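A minimal PyTorch sketch of this idea, assuming my own names (`LightConvSketch`, `d_model`, `kernel_size`, `num_heads`); the GLU gating, DropConnect, and input/output projections used in the paper are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightConvSketch(nn.Module):
    """Softmax-normalized depthwise convolution with weights shared across
    the channels of each head (a simplified sketch, not the fairseq code)."""

    def __init__(self, d_model: int, kernel_size: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        assert kernel_size % 2 == 1, "centered padding below assumes an odd kernel"
        self.num_heads = num_heads
        self.kernel_size = kernel_size
        # One kernel per head rather than per channel: only H * k weights in total,
        # orders of magnitude fewer than a standard d_in x d_out x k convolution.
        self.weight = nn.Parameter(torch.randn(num_heads, 1, kernel_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        B, T, C = x.shape
        H, K = self.num_heads, self.kernel_size
        # Normalize every kernel over its temporal width with a softmax.
        weight = F.softmax(self.weight, dim=-1)
        # Repeat each head's kernel for all C // H channels of that head so a
        # grouped (depthwise) conv1d applies the shared kernel everywhere.
        weight = weight.repeat_interleave(C // H, dim=0)          # (C, 1, K)
        out = F.conv1d(x.transpose(1, 2), weight, padding=K // 2, groups=C)
        return out.transpose(1, 2)                                 # (batch, time, d_model)

# e.g. lconv = LightConvSketch(d_model=512, kernel_size=7, num_heads=16)
#      y = lconv(torch.randn(2, 10, 512))   # -> shape (2, 10, 512)
```

Note how the kernel here is a fixed parameter: the same softmax-normalized weights are applied at every position, which is exactly what the dynamic convolution below changes.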
Dynamic convolution builds on the lightweight convolution and predicts a different convolution kernel at every time-step. The weights are not fixed after training; instead, they are generated dynamically by the model.
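A similarly hedged sketch of the per-time-step kernel prediction, again with hypothetical names (`DynamicConvSketch`, `weight_proj`) and without the paper's gating, DropConnect, and output projection:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConvSketch(nn.Module):
    """Kernels predicted from the current time-step only (simplified sketch)."""

    def __init__(self, d_model: int, kernel_size: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        assert kernel_size % 2 == 1, "centered windows below assume an odd kernel"
        self.num_heads = num_heads
        self.kernel_size = kernel_size
        # A linear function of the current time-step predicts one kernel per head.
        self.weight_proj = nn.Linear(d_model, num_heads * kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        B, T, C = x.shape
        H, K = self.num_heads, self.kernel_size
        R = C // H
        # Kernels depend only on x_t, not on the whole sequence, and are
        # softmax-normalized over the kernel width as in the lightweight conv.
        weight = F.softmax(self.weight_proj(x).view(B, T, H, K), dim=-1)
        # Collect the K-wide context window around every position.
        x_pad = F.pad(x.transpose(1, 2), (K // 2, K // 2))   # (B, C, T + K - 1)
        windows = x_pad.unfold(2, K, 1).reshape(B, H, R, T, K)
        # Weighted sum over each window with that position's predicted kernel,
        # shared across the R channels of the head.
        out = torch.einsum('bhrtk,bthk->bthr', windows, weight)
        return out.reshape(B, T, C)

# e.g. dconv = DynamicConvSketch(d_model=512, kernel_size=7, num_heads=16)
#      y = dconv(torch.randn(2, 10, 512))   # -> shape (2, 10, 512)
```

The operation count stays linear in the sequence length because each position only looks at its K-wide window, while the kernel prediction itself is a pointwise linear map of the current input.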