Title
[Pay Less Attention with Lightweight and Dynamic Convolutions](https://arxiv.org/abs/1901.10430)
Abstract
Self-attention determines the importance of each context element by comparing the current time step against every element. The paper proposes a lightweight convolution and a dynamic convolution.
We predict separate convolution kernels based solely on the current time-step in order to determine the importance of context elements. The number of operations required by this approach scales linearly in the input length, whereas self-attention is quadratic.
Separate convolution kernels are predicted based solely on the current time-step; the number of operations scales linearly with the input length, whereas self-attention is quadratic.
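As a rough back-of-the-envelope comparison (the symbols n, d, and k below are my own notation, not defined in these notes):

```latex
% n: sequence length, d: model dimension, k: convolution kernel width (fixed, k << n)
\text{self-attention:}\quad O(n^2 \cdot d)
\qquad
\text{lightweight / dynamic convolution:}\quad O(n \cdot k \cdot d)
```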
1. Introduction
RNNs integrate context information by updating a hidden state at every time-step, CNNs summarize a fixed-size context through multiple layers, while self-attention directly summarizes all of the context.
Figure: Self-attention computes attention weights by comparing all pairs of elements to each other (a), while dynamic convolutions predict separate kernels for each time-step (b).
This paper introduces the lightweight convolution, which is depth-wise separable, softmax-normalized, and shares weights over the channel dimension. The lightweight convolution has orders of magnitude fewer weights than a standard convolution, and it reuses the same weights over the context regardless of the current time-step.
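A minimal PyTorch sketch of this idea, assuming my own names (`LightConvSketch`, `d_model`, `kernel_size`, `num_heads`); the GLU gating, DropConnect, and input/output projections used in the paper are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightConvSketch(nn.Module):
    """Softmax-normalized depthwise convolution with weights shared across
    the channels of each head (a simplified sketch, not the fairseq code)."""

    def __init__(self, d_model: int, kernel_size: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        assert kernel_size % 2 == 1, "centered padding below assumes an odd kernel"
        self.num_heads = num_heads
        self.kernel_size = kernel_size
        # One kernel per head rather than per channel: only H * k weights in total,
        # orders of magnitude fewer than a standard d_in x d_out x k convolution.
        self.weight = nn.Parameter(torch.randn(num_heads, 1, kernel_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        B, T, C = x.shape
        H, K = self.num_heads, self.kernel_size
        # Normalize every kernel over its temporal width with a softmax.
        weight = F.softmax(self.weight, dim=-1)
        # Repeat each head's kernel for all C // H channels of that head so a
        # grouped (depthwise) conv1d applies the shared kernel everywhere.
        weight = weight.repeat_interleave(C // H, dim=0)          # (C, 1, K)
        out = F.conv1d(x.transpose(1, 2), weight, padding=K // 2, groups=C)
        return out.transpose(1, 2)                                 # (batch, time, d_model)

# e.g. lconv = LightConvSketch(d_model=512, kernel_size=7, num_heads=16)
#      y = lconv(torch.randn(2, 10, 512))   # -> shape (2, 10, 512)
```

Note how the kernel here is a fixed parameter: the same softmax-normalized weights are applied at every position, which is exactly what the dynamic convolution below changes.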
Dynamic convolution builds on the lightweight convolution and predicts a different convolution kernel at every time-step. The weights are not fixed after training; instead, they are generated dynamically by the model.
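A similarly hedged sketch of the per-time-step kernel prediction, again with hypothetical names (`DynamicConvSketch`, `weight_proj`) and without the paper's gating, DropConnect, and output projection:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConvSketch(nn.Module):
    """Kernels predicted from the current time-step only (simplified sketch)."""

    def __init__(self, d_model: int, kernel_size: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        assert kernel_size % 2 == 1, "centered windows below assume an odd kernel"
        self.num_heads = num_heads
        self.kernel_size = kernel_size
        # A linear function of the current time-step predicts one kernel per head.
        self.weight_proj = nn.Linear(d_model, num_heads * kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        B, T, C = x.shape
        H, K = self.num_heads, self.kernel_size
        R = C // H
        # Kernels depend only on x_t, not on the whole sequence, and are
        # softmax-normalized over the kernel width as in the lightweight conv.
        weight = F.softmax(self.weight_proj(x).view(B, T, H, K), dim=-1)
        # Collect the K-wide context window around every position.
        x_pad = F.pad(x.transpose(1, 2), (K // 2, K // 2))   # (B, C, T + K - 1)
        windows = x_pad.unfold(2, K, 1).reshape(B, H, R, T, K)
        # Weighted sum over each window with that position's predicted kernel,
        # shared across the R channels of the head.
        out = torch.einsum('bhrtk,bthk->bthr', windows, weight)
        return out.reshape(B, T, C)

# e.g. dconv = DynamicConvSketch(d_model=512, kernel_size=7, num_heads=16)
#      y = dconv(torch.randn(2, 10, 512))   # -> shape (2, 10, 512)
```

The operation count stays linear in the sequence length because each position only looks at its K-wide window, while the kernel prediction itself is a pointwise linear map of the current input.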