Image classification is one of the most fundamental tasks in computer vision. In practical applications, datasets are usually far less abundant than those available in laboratory and simulation settings, a limitation often referred to as the data-hungry problem. Extracting the information in the available data more completely and effectively is therefore very important. To this end, this paper proposes an adaptive data augmentation framework based on the tensor T-product operator. The framework triples each training image and combines the results from all three images, while increasing the number of parameters by less than 0.1%. It also performs column image embedding and global feature intersection, enabling the model to obtain information in both the spatial and frequency domains and thus improving prediction accuracy. Numerical experiments on several models demonstrate the effectiveness of the framework: it improves the performance of the original neural network models by 2%, yielding results competitive with state-of-the-art methods.
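For readers unfamiliar with the tensor T-product, the following is a minimal sketch of the standard operator, computed as a Fourier transform along the third axis, a slice-wise matrix product, and an inverse transform. This is where the connection between the spatial and frequency domains comes from. The function names `t_product` and `identity_tensor` are illustrative assumptions, and the sketch does not attempt to reproduce the framework itself, whose use of the operator to triple an image is not specified in this abstract.

```python
import numpy as np


def t_product(A, B):
    """Standard T-product of A (n1 x n2 x n3) and B (n2 x m x n3).

    Equivalent to circular convolution of the frontal slices along axis 2,
    evaluated in the Fourier domain.
    """
    if A.ndim != 3 or B.ndim != 3:
        raise ValueError("both inputs must be third-order tensors")
    if A.shape[1] != B.shape[0] or A.shape[2] != B.shape[2]:
        raise ValueError("incompatible shapes for the T-product")

    # Move to the frequency domain along the third axis.
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)

    # Multiply matching frontal slices: C_hat[:, :, k] = A_hat[:, :, k] @ B_hat[:, :, k].
    C_hat = np.einsum("ijk,jlk->ilk", A_hat, B_hat)

    # Return to the spatial domain; for real inputs the imaginary part is rounding noise.
    return np.fft.ifft(C_hat, axis=2).real


def identity_tensor(n, n3):
    """Identity element of the T-product: first frontal slice is I, the rest are zero."""
    I = np.zeros((n, n, n3))
    I[:, :, 0] = np.eye(n)
    return I
```

As a hypothetical usage, an RGB image of height H and width W can be treated as an H x W x 3 tensor and multiplied from the left by an H x H x 3 tensor; whether that tensor is learned, and how, is an assumption outside what the abstract states.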