Developing generalizable models that learn effectively from limited data with minimal reliance on human supervision is a significant objective within the machine learning community, particularly in the era of deep neural networks. To achieve data-efficient learning, researchers typically explore approaches that exploit additional related or unlabeled data without requiring further manual labeling, such as Semi-Supervised Learning (SSL), Transfer Learning (TL), and Data Augmentation (DA). SSL leverages unlabeled data during training, TL transfers knowledge from related data distributions, and DA enlarges the dataset by synthesizing new data from existing examples. However, the additional knowledge contained within the labels themselves has been largely overlooked. In this paper, we propose a novel perspective on data efficiency that exploits the semantic information contained in the labels of the available data. Specifically, we introduce a Language Semantic Graph (LSG), which is constructed from labels expressed as natural language descriptions. On this graph, an auxiliary graph neural network is trained to extract high-level semantic relations and is then used to guide the training of the primary model, making fuller use of label knowledge. Across image, video, and audio modalities, we apply the LSG method in both TL and SSL scenarios and show that it significantly improves performance over other data-efficient learning approaches. Additionally, our in-depth analysis shows that the LSG method also speeds up training.
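To make the idea of a graph built from label descriptions more concrete, the following is a minimal sketch and not the method's actual implementation. Everything in it is an assumption made for illustration: the bag-of-words `embed` function stands in for a real language-model embedding, the similarity threshold is arbitrary, and `propagate` is a single untrained mean-aggregation step rather than a trained graph neural network. How the graph guides the primary model is not shown.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in (assumption) for a language-model embedding: word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_label_graph(descriptions, threshold=0.5):
    """Connect two labels when their description similarity reaches the threshold.

    The threshold value is a hypothetical choice for this toy example.
    """
    names = list(descriptions)
    vecs = {n: embed(descriptions[n]) for n in names}
    edges = {n: set() for n in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if cosine(vecs[u], vecs[v]) >= threshold:
                edges[u].add(v)
                edges[v].add(u)
    return edges

def propagate(edges, features):
    """One message-passing step: each node averages itself and its neighbours."""
    out = {}
    for node, nbrs in edges.items():
        group = [node, *sorted(nbrs)]
        dim = len(features[node])
        out[node] = [sum(features[g][k] for g in group) / len(group)
                     for k in range(dim)]
    return out

# Hypothetical label descriptions and node features, for illustration only.
descriptions = {
    "cat": "a photo of a small domestic animal",
    "dog": "a photo of a loyal domestic animal",
    "car": "a road vehicle with four wheels",
}
graph = build_label_graph(descriptions)
features = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "car": [5.0, 5.0]}
smoothed = propagate(graph, features)
```

In this toy setup, the labels with similar descriptions ("cat" and "dog") become neighbours and their features are pulled together by the aggregation step, while "car" stays isolated and unchanged. This is the kind of label-level relation a graph over natural language descriptions can expose.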