Data quality is critical for multimedia tasks, yet recent work has identified various types of systematic flaws in image benchmark datasets. In particular, the semantic gap gives rise to a many-to-many mapping between the information extracted from an image and its linguistic description. This unavoidable bias, in turn, degrades performance on current computer vision tasks. To address this issue, we introduce a Knowledge Representation (KR)-based methodology that provides guidelines for the labeling process, thereby indirectly introducing the intended semantics into ML models. Specifically, we propose an iterative refinement-based annotation method that optimizes data labeling by organizing objects in a classification hierarchy according to their visual properties, ensuring that they are aligned with their linguistic descriptions. Preliminary results verify the effectiveness of the proposed method.
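The abstract does not spell out the annotation procedure, so the following is only a hypothetical sketch of what "organizing objects in a classification hierarchy according to their visual properties" could look like. It assumes each class is defined by the visual properties it adds to its parent (a genus and differentia style hierarchy). An observed object receives the deepest class(es) whose inherited properties it satisfies, and classes with identical property sets are flagged as indistinguishable, since they are exactly the many-to-many cases a further refinement round would have to resolve. All names (`Node`, `full_properties`, `most_specific_labels`, `indistinguishable_classes`) and all property strings are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Node:
    """One class in a hypothetical visual classification hierarchy."""
    name: str
    parent: Optional[str]        # None for the root class
    properties: FrozenSet[str]   # visual properties introduced at this level


def full_properties(hierarchy: Dict[str, Node], name: str) -> FrozenSet[str]:
    """Union of the properties of a class and of all its ancestors."""
    props: Set[str] = set()
    current: Optional[str] = name
    while current is not None:
        node = hierarchy[current]
        props |= node.properties
        current = node.parent
    return frozenset(props)


def depth(hierarchy: Dict[str, Node], name: str) -> int:
    """Number of edges between the root and the given class."""
    d = 0
    parent = hierarchy[name].parent
    while parent is not None:
        d += 1
        parent = hierarchy[parent].parent
    return d


def most_specific_labels(hierarchy: Dict[str, Node], observed: Set[str]) -> List[str]:
    """Deepest classes whose inherited properties are all present in the observation.

    More than one result means the visual properties alone do not single out a label.
    """
    matches = [n for n in hierarchy if full_properties(hierarchy, n) <= observed]
    if not matches:
        return []
    best = max(depth(hierarchy, n) for n in matches)
    return sorted(n for n in matches if depth(hierarchy, n) == best)


def indistinguishable_classes(hierarchy: Dict[str, Node]) -> List[Tuple[str, str]]:
    """Pairs of classes with identical property sets: candidates for the next refinement round."""
    seen: Dict[FrozenSet[str], str] = {}
    clashes: List[Tuple[str, str]] = []
    for name in hierarchy:
        key = full_properties(hierarchy, name)
        if key in seen:
            clashes.append((seen[key], name))
        else:
            seen[key] = name
    return clashes


# Toy hierarchy (hypothetical): "car" and "automobile" share the same visual definition.
hierarchy = {n.name: n for n in [
    Node("object", None, frozenset()),
    Node("vehicle", "object", frozenset({"wheels"})),
    Node("bicycle", "vehicle", frozenset({"two_wheels", "pedals"})),
    Node("car", "vehicle", frozenset({"four_wheels", "enclosed_cabin"})),
    Node("automobile", "vehicle", frozenset({"four_wheels", "enclosed_cabin"})),
]}

print(most_specific_labels(hierarchy, {"wheels", "two_wheels", "pedals", "red"}))  # ['bicycle']
print(most_specific_labels(hierarchy, {"wheels", "four_wheels", "enclosed_cabin"}))  # ['automobile', 'car']
print(indistinguishable_classes(hierarchy))  # [('car', 'automobile')]
```

In this toy setting, a refinement step would either merge the flagged pair into one label or add a distinguishing visual property to one of them, then re-run the check until no clashes remain.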