TFDMNet: A Novel Network Structure Combines the Time Domain and Frequency Domain Features

from arxiv, This paper is the updated edition of our paper Learning Convolutional Neural Networks in the Frequency Domain (arXiv:2204.06718). Comparing with the previous edition, we design a mixture model to get the balance between the computation complexity and memory usage

Convolutional neural network (CNN) has achieved impressive success in computer vision during the past few decades. The image convolution operation helps CNNs to get good performance on image-related tasks. However, it also has high computation complexity and hard to be parallelized. This paper proposes a novel Element-wise Multiplication Layer (EML) to replace convolution layers, which can be trained in the frequency domain. Theoretical analyses show that EMLs lower the computation complexity and easier to be parallelized. Moreover, we introduce a Weight Fixation mechanism to alleviate the problem of over-fitting, and analyze the working behavior of Batch Normalization and Dropout in the frequency domain. To get the balance between the computation complexity and memory usage, we propose a new network structure, namely Time-Frequency Domain Mixture Network (TFDMNet), which combines the advantages of both convolution layers and EMLs. Experimental results imply that TFDMNet achieves good performance on MNIST, CIFAR-10 and ImageNet databases with less number of operations comparing with corresponding CNNs.

翻译：卷积神经网络（CNN）在过去几十年中在计算机视觉领域取得了显著成功。图像卷积运算使CNN能够在图像相关任务中表现出色，但其计算复杂度高且难以并行化。本文提出了一种新颖的逐元素乘法层（EML）替代卷积层，该层可在频域中进行训练。理论分析表明，EML降低了计算复杂度且更易于并行化。此外，我们引入了权重固定机制以缓解过拟合问题，并分析了批归一化和Dropout在频域中的工作行为。为平衡计算复杂度与内存占用，我们提出了一种新的网络结构——时频域混合网络（TFDMNet），该结构结合了卷积层与EML的优势。实验结果表明，与对应的CNN相比，TFDMNet在MNIST、CIFAR-10和ImageNet数据集上以更少的运算次数取得了良好性能。

相关内容

Networking

关注 23

Networking：IFIP International Conferences on Networking。 Explanation：国际网络会议。 Publisher：IFIP。 SIT： http://dblp.uni-trier.de/db/conf/networking/index.html

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【CHI2020-微软】解释可解释性:理解数据科学家使用机器学习的可解释性工具，Interpreting Interpretability: Understanding Data Scientists’Use of Interpretability Tools for Machine Learning

专知会员服务

55+阅读 · 2020年3月8日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日