LDConv: Linear deformable convolution for improving convolutional neural networks

Neural networks based on convolutional operations have achieved remarkable results in deep learning, but standard convolution has two inherent flaws. First, the convolution operation is confined to a local window, so it cannot capture information from other locations, and its sampling shape is fixed. Second, the size of the convolutional kernel is fixed to $k \times k$, a square shape, so the number of parameters grows quadratically with kernel size. Although Deformable Convolution (Deformable Conv) addresses the fixed sampling of standard convolution, its number of parameters also grows quadratically. To address these problems, this work explores Linear Deformable Convolution (LDConv), which gives the convolution kernel an arbitrary number of parameters and an arbitrary sampling shape, providing richer options for the trade-off between network overhead and performance. In LDConv, a novel coordinate generation algorithm is defined to generate different initial sampling positions for convolutional kernels of arbitrary size. To adapt to changing targets, offsets are introduced to adjust the sampling shape at each position. LDConv reduces the parameter growth of standard convolution and Deformable Conv from quadratic to linear. Moreover, it performs efficient feature extraction through irregular convolutional operations and opens up more options for exploring convolutional sampling shapes. Object detection experiments on the representative datasets COCO2017, VOC 7+12, and VisDrone-DET2021 demonstrate the advantages of LDConv. LDConv is a plug-and-play convolutional operation that can replace the standard convolutional operation to improve network performance. The code for the relevant tasks can be found at https://github.com/CV-ZhangXin/LDConv.

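To make the parameter-growth claim in the abstract concrete, the following minimal Python sketch compares the weight count of a standard $k \times k$ convolution with that of a kernel that has an arbitrary number $N$ of sampling points. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, biases and any offset-prediction layers are ignored, and `initial_sample_positions` is only one plausible way to lay out $N$ initial points (rows of width `round(sqrt(N))` followed by a partial last row), not necessarily the coordinate generation algorithm defined in LDConv.

```python
import math


def standard_conv_weights(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def n_point_conv_weights(c_in, c_out, n):
    """Weights when the kernel has n sampling points (bias ignored).

    Assumption: one weight per sampling point per input/output channel pair;
    parameters of any offset-prediction layer are not counted here.
    """
    return c_in * c_out * n


def initial_sample_positions(n):
    """Hypothetical initial layout for n sampling points.

    Fills rows of width round(sqrt(n)) from the top-left corner and leaves
    the last row partial when n is not a multiple of the width. This is an
    illustrative assumption, not the algorithm from the paper.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    width = max(1, round(math.sqrt(n)))
    return [(i // width, i % width) for i in range(n)]


if __name__ == "__main__":
    c_in = c_out = 64
    # A square kernel can only step 1 -> 9 -> 25 -> 49 points (k = 1, 3, 5, 7).
    for k in (1, 3, 5, 7):
        print(f"k={k}: {standard_conv_weights(c_in, c_out, k)} weights")
    # An n-point kernel can use any count in between, one point at a time.
    for n in (3, 5, 7, 9, 11):
        print(f"N={n}: {n_point_conv_weights(c_in, c_out, n)} weights, "
              f"positions={initial_sample_positions(n)}")
```

Under these assumptions, moving from a 3 x 3 to a 5 x 5 square kernel adds 16 sampling points per channel pair at once, whereas an $N$-point kernel adds them one at a time, which is the finer-grained trade-off between overhead and performance that the abstract describes.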