In computer vision, 2D convolution is arguably the most important operation performed by a ConvNet. Unsurprisingly, it has been the focus of intense software and hardware optimization and enjoys highly efficient implementations. In this work, we ask an intriguing question: can we make a ConvNet work without 2D convolutions? Surprisingly, we find that the answer is yes: we show that a ConvNet consisting entirely of 1D convolutions can perform just as well as a 2D ConvNet on ImageNet classification. Specifically, we find that one key ingredient of a high-performing 1D ConvNet is oriented 1D kernels: 1D kernels that are oriented not just horizontally or vertically, but also at other angles. Our experiments show that oriented 1D convolutions can not only replace 2D convolutions but also augment existing architectures with large kernels, leading to improved accuracy with a minimal increase in FLOPs. A key contribution of this work is a highly optimized custom CUDA implementation of oriented 1D kernels, specialized to the depthwise convolution setting. Our benchmarks demonstrate that our custom CUDA implementation almost perfectly realizes the theoretical advantage of 1D convolution: at any angle, it is faster than a native horizontal convolution. Code is available at https://github.com/princeton-vl/Oriented1D.
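To make the idea of an oriented 1D kernel concrete, the following is a minimal NumPy sketch for a single channel. It is not the authors' CUDA implementation, and the details are assumptions made for illustration: the function names `oriented_offsets` and `oriented_conv1d` are hypothetical, the kernel length is assumed to be odd, the angle is measured from the horizontal axis toward increasing row index, and the rule for snapping the rotated line to the pixel grid (step one pixel along the dominant axis, round along the other) is one plausible choice rather than the one confirmed by the paper.

```python
import math
import numpy as np

def oriented_offsets(length, theta):
    """Integer (dy, dx) pixel offsets for a 1D kernel of odd `length` at angle `theta` (radians).

    Assumed discretization: step by one pixel along the dominant axis and
    round the other coordinate, so every tap lands on a distinct pixel.
    """
    half = length // 2
    c, s = math.cos(theta), math.sin(theta)
    offsets = []
    for k in range(-half, half + 1):
        if abs(c) >= abs(s):
            # Mostly horizontal line: advance in x, round the y displacement.
            dy, dx = int(round(k * s / c)), k
        else:
            # Mostly vertical line: advance in y, round the x displacement.
            dy, dx = k, int(round(k * c / s))
        offsets.append((dy, dx))
    return offsets

def oriented_conv1d(image, weights, theta):
    """Cross-correlate a 2D single-channel `image` with a 1D kernel oriented at `theta`.

    Zero padding keeps the output the same shape as the input.
    """
    length = len(weights)
    half = length // 2
    h, w = image.shape
    padded = np.pad(image, half)  # zero padding on all four sides
    out = np.zeros((h, w), dtype=float)
    for weight, (dy, dx) in zip(weights, oriented_offsets(length, theta)):
        # Shifted view of the padded image: out[y, x] += weight * image[y + dy, x + dx]
        out += weight * padded[half + dy : half + dy + h, half + dx : half + dx + w]
    return out
```

With `theta = 0` this reduces to an ordinary horizontal 1D convolution, and with `theta = math.pi / 2` to a vertical one; a kernel of length K touches K pixels per output instead of the K × K touched by a dense 2D kernel, which is the theoretical advantage the abstract refers to. A depthwise layer in this sketch's terms would call `oriented_conv1d` once per channel, each with its own weights.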