In this paper, we present a comprehensive study of, and propose several novel techniques for, implementing 3D convolutional blocks using 2D and/or 1D convolutions on only 4D and/or 3D tensors. Our motivation is that 3D convolutions on 5D tensors are computationally expensive and may not be supported by some edge devices used in real-time applications such as robotics. Existing approaches mitigate the cost by splitting 3D kernels into spatial and temporal components, but their implementations still rely on 3D convolutions over 5D tensors. We resolve this issue by introducing appropriate 4D/3D tensor reshaping together with new techniques for combining the spatial and temporal splits. The proposed implementations improve significantly on both efficiency and accuracy: in our experiments, the proposed spatio-temporal processing structure outperforms the original model in speed and accuracy while using only 4D tensors and fewer parameters.
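To make the reshaping idea concrete, the following is a minimal NumPy sketch of one way a spatial-then-temporal split can be computed without ever convolving a 5D tensor. It is an illustration under our own assumptions, not the specific blocks proposed in the paper: the function names (`conv2d_4d`, `conv1d_3d`, `spatial_then_temporal`, `reference_5d`), the `(N, C, T, H, W)` layout, and the choice of unpadded, stride-1 cross-correlation are all hypothetical. Time is folded into the batch axis for the 2D step, and space is folded into the batch axis for the 1D step.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_4d(x, w):
    """Unpadded, stride-1 2D cross-correlation on a 4D tensor.

    x: (B, C_in, H, W), w: (C_out, C_in, kH, kW) -> (B, C_out, H', W')
    """
    kh, kw = w.shape[2:]
    # Windows have shape (B, C_in, H', W', kH, kW).
    win = sliding_window_view(x, (kh, kw), axis=(2, 3))
    return np.einsum("bchwij,ocij->bohw", win, w)


def conv1d_3d(x, w):
    """Unpadded, stride-1 1D cross-correlation on a 3D tensor.

    x: (B, C_in, T), w: (C_out, C_in, kT) -> (B, C_out, T')
    """
    kt = w.shape[2]
    # Windows have shape (B, C_in, T', kT).
    win = sliding_window_view(x, kt, axis=2)
    return np.einsum("bctk,ock->bot", win, w)


def spatial_then_temporal(x, w_s, w_t):
    """Assumed layout x: (N, C, T, H, W). Only 4D and 3D tensors are convolved."""
    n, c, t, h, w = x.shape
    # Fold time into the batch axis: (N, C, T, H, W) -> (N*T, C, H, W).
    x4 = x.transpose(0, 2, 1, 3, 4).reshape(n * t, c, h, w)
    y4 = conv2d_4d(x4, w_s)                      # (N*T, M, H', W')
    m, h2, w2 = y4.shape[1:]
    # Fold space into the batch axis: -> (N*H'*W', M, T).
    y3 = (y4.reshape(n, t, m, h2, w2)
            .transpose(0, 3, 4, 2, 1)
            .reshape(n * h2 * w2, m, t))
    z3 = conv1d_3d(y3, w_t)                      # (N*H'*W', C_out, T')
    c_out, t2 = z3.shape[1:]
    # Restore the video layout: -> (N, C_out, T', H', W').
    return z3.reshape(n, h2, w2, c_out, t2).transpose(0, 3, 4, 1, 2)


def reference_5d(x, w_s, w_t):
    """Same factorized operation computed directly on 5D tensors, for checking."""
    kh, kw = w_s.shape[2:]
    kt = w_t.shape[2]
    win = sliding_window_view(x, (kh, kw), axis=(3, 4))   # (N, C, T, H', W', kH, kW)
    y = np.einsum("ncthwij,mcij->nmthw", win, w_s)
    win_t = sliding_window_view(y, kt, axis=2)            # (N, M, T', H', W', kT)
    return np.einsum("nmthwk,omk->nothw", win_t, w_t)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 6, 8, 8))      # (N, C, T, H, W)
w_s = rng.standard_normal((4, 3, 3, 3))       # spatial kernel, 3x3
w_t = rng.standard_normal((5, 4, 3))          # temporal kernel, length 3
out = spatial_then_temporal(x, w_s, w_t)
```

The reshapes only permute and regroup axes, so the result should agree with the 5D reference up to floating-point rounding; the cost is the extra transposes, which a real implementation would want to minimize or fuse.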