dynamicconv


Title

[Pay Less Attention With Lightweight and Dynamic Convolutions](https://arxiv.org/abs/1901.10430)

Abstract

Self-attention rates the importance of each context element by comparing it against every other element at the current time step. This paper instead proposes a lightweight convolution and a dynamic convolution.

We predict separate convolution kernels based solely on the current time-step in order to determine the importance of context elements. The number of operations required by this approach scales linearly in the input length, whereas self-attention is quadratic.

The separate convolution kernels are predicted based solely on the current time step, so the number of operations grows linearly with the input length, whereas self-attention is quadratic in it.
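To make the linear-vs-quadratic claim concrete, here is a rough per-layer cost comparison (my own back-of-the-envelope accounting, not an equation from the paper), for sequence length n, model dimension d, and kernel width k, where k is a small constant:

```latex
% Cost of aggregating context at all n positions (constant factors ignored):
% self-attention scores every pair of positions,
% a (lightweight or dynamic) convolution applies a width-k kernel at each position.
\underbrace{O(n^{2}\, d)}_{\text{self-attention}}
\qquad \text{vs.} \qquad
\underbrace{O(n\, k\, d)}_{\text{dynamic convolution}}
```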

1. Introduction

RNNs integrate context information by updating a hidden state at every time-step, CNNs summarize a fixed size context through multiple layers, while self-attention directly summarizes all context.

That is, RNNs build up context by updating a hidden state at each time step, CNNs only see a fixed-size window of context per layer, and self-attention attends to the entire context directly.

(Figure from the paper) Self-attention computes attention weights by comparing all pairs of elements to each other (a), while dynamic convolutions predict separate kernels for each time-step (b).

This paper introduces the lightweight convolution: it is depthwise-separable, softmax-normalized, and shares weights across the channel dimension. A lightweight convolution has several orders of magnitude fewer weights than a standard convolution, and it reuses the same kernel for every context, regardless of the current time step. A minimal sketch follows.
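The sketch below is a minimal PyTorch-style illustration of a lightweight convolution: a depthwise 1D convolution whose kernels are softmax-normalized over their width and shared by all channels within a head. The class and argument names (LightweightConv, d_model, kernel_size, num_heads) are my own illustrative choices, not the paper's official code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightConv(nn.Module):
    """Sketch of a lightweight convolution: depthwise, softmax-normalized,
    with kernel weights shared across the channels of each head."""

    def __init__(self, d_model, kernel_size, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.kernel_size = kernel_size  # assumed odd for simple 'same' padding
        self.num_heads = num_heads
        # One kernel per head, shared by all d_model / num_heads channels in that head.
        self.weight = nn.Parameter(torch.randn(num_heads, kernel_size))

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        B, T, C = x.size()
        H, K = self.num_heads, self.kernel_size
        # Normalize each kernel over its width with a softmax.
        w = F.softmax(self.weight, dim=-1)                    # (H, K)
        # Expand the shared per-head kernel to every channel for a depthwise conv.
        w = w.repeat_interleave(C // H, dim=0).unsqueeze(1)   # (C, 1, K)
        x = x.transpose(1, 2)                                 # (B, C, T)
        # Depthwise (groups=C) 1D convolution; padding keeps the sequence length.
        out = F.conv1d(x, w, padding=K // 2, groups=C)
        return out.transpose(1, 2)                            # (B, T, C)
```

In the full model the paper additionally wraps this module with input/output projections and a gated linear unit, and uses causal padding on the decoder side; those details are omitted here for brevity.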

Dynamic convolution builds on the lightweight convolution and predicts a different kernel at every time step. Its weights are not fixed after training; instead they are generated dynamically by the model as a function of the current time step's input (see the sketch below).
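Here is a correspondingly minimal sketch of dynamic convolution under the same assumptions: the kernel used at position i is predicted by a linear function of the input at position i alone, softmax-normalized, and then used to mix a width-k window of that head's channels. Names and the unfold-based implementation are illustrative, not the authors' optimized fairseq code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    """Sketch of dynamic convolution: per-head kernels are predicted from
    the current time step's input, softmax-normalized, and applied depthwise."""

    def __init__(self, d_model, kernel_size, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.kernel_size = kernel_size  # assumed odd for simple 'same' padding
        self.num_heads = num_heads
        # Linear function of the current time step predicts one kernel per head.
        self.kernel_proj = nn.Linear(d_model, num_heads * kernel_size)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        B, T, C = x.size()
        H, K = self.num_heads, self.kernel_size
        R = C // H                                              # channels per head
        # Predict and softmax-normalize a kernel for every position and head.
        w = F.softmax(self.kernel_proj(x).view(B, T, H, K), dim=-1)   # (B, T, H, K)
        # Gather a K-wide window of inputs around every position.
        x_pad = F.pad(x.transpose(1, 2), (K // 2, K // 2))      # (B, C, T + K - 1)
        windows = x_pad.unfold(-1, K, 1)                        # (B, C, T, K)
        windows = windows.reshape(B, H, R, T, K).permute(0, 3, 1, 2, 4)  # (B, T, H, R, K)
        # Each head's predicted kernel weights the windows of its own channels.
        out = torch.einsum('bthrk,bthk->bthr', windows, w)      # (B, T, H, R)
        return out.reshape(B, T, C)
```

Because the kernel depends only on the current position's input rather than on comparisons with every other position, the cost stays linear in the sequence length, which is exactly the trade-off the abstract describes.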


Author: Weiruohe