Technological advances have made multimodal data increasingly easy to acquire, posing a challenge for recognition systems while also offering an opportunity to exploit the heterogeneous nature of the information to improve model generalization. An often overlooked issue is the cost of the labeling process, which is typically high because it requires a significant investment of time and money from human experts. Existing semi-supervised learning methods tend to operate in the feature space created by fusing the available modalities, neglecting the complementary information that each modality can contribute to the others. To address this problem, we propose Cross-Modality Clustering-based Self-Labeling (CMCSL). Starting from a small set of pre-labeled data, CMCSL clusters the instances of each modality in the deep feature space and then propagates the known labels within the resulting clusters. Next, information about each instance's class membership is exchanged between modalities based on Euclidean distance to ensure more accurate labeling. Experimental evaluation on 20 datasets derived from the MM-IMDb dataset indicates that cross-propagation of labels between modalities, especially when the number of pre-labeled instances is small, can allow for more reliable labeling and thus increase classification performance in each modality.
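The two-stage procedure described above (within-modality label propagation via clustering, then cross-modality exchange) can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the centroid-seeded clustering, the tie-breaking rule for disagreements, and all function names are assumptions made for the example.

```python
import numpy as np

def self_label_modality(features, labels, n_iters=10):
    """Semi-supervised clustering sketch for one modality: one cluster per
    known class, centroids seeded from the pre-labeled instances
    (label -1 marks unlabeled), then known labels are propagated to every
    member of the final clusters. A simplification of clustering in a
    deep feature space."""
    classes = np.unique(labels[labels != -1])
    cent = np.array([features[labels == c].mean(axis=0) for c in classes])
    for _ in range(n_iters):
        # assign each instance to its nearest centroid (squared Euclidean)
        d = ((features[:, None] - cent[None]) ** 2).sum(axis=-1)
        assign = d.argmin(axis=1)
        # recompute centroids; keep the old one if a cluster went empty
        cent = np.array([features[assign == j].mean(axis=0)
                         if (assign == j).any() else cent[j]
                         for j in range(len(classes))])
    pseudo = labels.copy()
    unlabeled = labels == -1
    pseudo[unlabeled] = classes[assign[unlabeled]]
    return pseudo

def exchange_labels(feat_a, pseudo_a, feat_b, pseudo_b):
    """Where the two modalities disagree, adopt the label from the modality
    whose instance lies closer (Euclidean) to its class centroid. This
    resolution rule is an illustrative assumption; in practice, distances
    from different feature spaces would need normalization first."""
    def dist_to_centroid(feat, pseudo):
        cents = {c: feat[pseudo == c].mean(axis=0) for c in np.unique(pseudo)}
        return np.array([np.linalg.norm(x - cents[p])
                         for x, p in zip(feat, pseudo)])
    d_a = dist_to_centroid(feat_a, pseudo_a)
    d_b = dist_to_centroid(feat_b, pseudo_b)
    out_a, out_b = pseudo_a.copy(), pseudo_b.copy()
    disagree = pseudo_a != pseudo_b
    a_wins = disagree & (d_a <= d_b)
    b_wins = disagree & (d_a > d_b)
    out_b[a_wins] = pseudo_a[a_wins]
    out_a[b_wins] = pseudo_b[b_wins]
    return out_a, out_b
```

On well-separated data with a single labeled instance per class and per modality, the first step fills in the missing labels cluster-wise, and the second step reconciles any remaining disagreements between modalities.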