Convolutional neural networks (CNNs) have achieved impressive success in computer vision over the past few decades. The image convolution operation helps CNNs perform well on image-related tasks. However, it also has high computational complexity and is hard to parallelize. This paper proposes a novel Element-wise Multiplication Layer (EML) to replace convolution layers; EMLs can be trained in the frequency domain. Theoretical analyses show that EMLs have lower computational complexity and are easier to parallelize. Moreover, we introduce a Weight Fixation mechanism to alleviate over-fitting, and we analyze the behavior of Batch Normalization and Dropout in the frequency domain. To balance computational complexity against memory usage, we propose a new network structure, the Time-Frequency Domain Mixture Network (TFDMNet), which combines the advantages of both convolution layers and EMLs. Experimental results indicate that TFDMNet achieves good performance on the MNIST, CIFAR-10 and ImageNet datasets with fewer operations than the corresponding CNNs.
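The idea of replacing convolution with element-wise multiplication rests on the convolution theorem: a circular convolution in the time (spatial) domain corresponds to an element-wise product in the frequency domain. The following is a minimal one-dimensional sketch of that identity only, not the EML described in the paper; the function names (`dft`, `circular_convolve`, `frequency_domain_convolve`) and the toy signal and kernel are assumptions made for illustration, and a naive transform is used in place of a fast Fourier transform to keep the example self-contained.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustrative; not an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        # The inverse transform carries the 1/n normalisation.
        out.append(s / n if inverse else s)
    return out


def circular_convolve(x, h):
    """Direct circular convolution: y[t] = sum_m x[m] * h[(t - m) mod n]."""
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


def frequency_domain_convolve(x, h):
    """Same result via transform, element-wise multiply, inverse transform."""
    X, H = dft(x), dft(h)
    Y = [a * b for a, b in zip(X, H)]  # element-wise multiplication
    return [v.real for v in dft(Y, inverse=True)]


# Hypothetical toy data; the kernel is zero-padded to the signal length.
signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.0, -1.0, 0.0]

direct = circular_convolve(signal, kernel)
via_frequency = frequency_domain_convolve(signal, kernel)

# The two results agree up to floating-point rounding.
assert all(abs(a - b) < 1e-9 for a, b in zip(direct, via_frequency))
```

The element-wise product has no dependencies between output positions, which is one way to read the claim that such a layer is easier to parallelize; the same identity extends to two-dimensional images by applying the transform along each axis.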