Deep learning has achieved remarkable results in many domains, but it still depends on large-scale labeled data. Data augmentation has therefore become a critical strategy for training deep learning models. However, data augmentation suffers from information loss and performs poorly in small-sample settings. To overcome these drawbacks, we propose a feature augmentation method based on shape space theory, Geodesic Curve Feature Augmentation (GCFA for short). First, we extract features from images with a neural network. Then, the features of multiple images are projected into a pre-shape space. In the pre-shape space, a geodesic curve is constructed to fit the projected features. Finally, the features generated along the geodesic curve are used to train various machine learning models. The GCFA module can be integrated seamlessly with most machine learning methods, and the proposed method is simple, effective, and robust on small-sample datasets. Several examples demonstrate that GCFA can greatly improve the performance of the data preprocessing model in small-sample settings.
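The projection and geodesic steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: each feature vector is treated as a single configuration, the pre-shape projection is taken to be mean removal followed by scaling to unit norm, and the curve fitted to many features is simplified to the great-circle geodesic between two features. The names `to_preshape`, `geodesic_points`, and `augment_pair` are hypothetical.

```python
import numpy as np

def to_preshape(x):
    """Assumed pre-shape projection: remove the mean, then scale to unit norm."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    norm = np.linalg.norm(centered)
    if norm == 0:
        raise ValueError("a constant feature vector has no pre-shape")
    return centered / norm

def geodesic_points(a, b, ts):
    """Points on the great circle through unit vectors a and b at parameters ts."""
    theta = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    if theta < 1e-12:
        # a and b coincide, so the geodesic degenerates to a single point.
        return np.tile(a, (len(ts), 1))
    if np.pi - theta < 1e-12:
        raise ValueError("antipodal points do not define a unique geodesic")
    ts = np.asarray(ts, dtype=float)[:, None]
    # Spherical linear interpolation keeps every point on the unit sphere.
    return (np.sin((1 - ts) * theta) * a + np.sin(ts * theta) * b) / np.sin(theta)

def augment_pair(f1, f2, n_new=5):
    """Generate n_new features strictly between two projected features."""
    a, b = to_preshape(f1), to_preshape(f2)
    ts = np.linspace(0.0, 1.0, n_new + 2)[1:-1]  # drop the two endpoints
    return geodesic_points(a, b, ts)
```

Calling `augment_pair(f1, f2, n_new=5)` on two feature vectors of length `d` returns an array of shape `(5, d)`. Every row is zero-mean with unit norm, so the generated features stay in the assumed pre-shape space and can be added to the training set of a downstream model.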