ConvBERT: Improving BERT with Span-based Dynamic Convolution

Prerequisites for this post: Light Weight Convolution. For details, see the earlier post on replacing the attention mechanism with lightweight convolution and dynamic convolution.

2021-02-12 · Deep Learning · NLP · CNN · Attention