Zero-shot learning is the problem of making predictions for instances of classes not seen during training. One approach to zero-shot learning is to provide auxiliary class information to the model. Prior work in this vein has largely used expensive per-instance annotation or single class-level descriptions, but per-instance descriptions are hard to scale and single class descriptions may not be rich enough. Furthermore, these works have used natural-language descriptions exclusively, simple bi-encoder models, and modality-specific or task-specific methods. These approaches have several limitations: text supervision may not always be available or optimal, and bi-encoders may only learn coarse relations between inputs and class descriptions. In this work, we present SemSup, a novel approach that uses (1) a scalable multiple description sampling method which improves performance over single descriptions, (2) alternative description formats such as JSON that are easy to generate and outperform text in certain settings, and (3) hybrid lexical-semantic similarity to leverage fine-grained information in class descriptions. We demonstrate the effectiveness of SemSup across four datasets, two modalities, and three generalization settings. For example, across text and image datasets, SemSup increases unseen class generalization accuracy by 15 points on average compared to the closest baseline.
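To make the three ingredients named in the abstract concrete, the following is a minimal toy sketch, not the SemSup implementation. Everything in it is an assumption made for illustration: the `DESCRIPTIONS` dictionary and its contents, the use of Jaccard token overlap as the lexical signal, a character-trigram count vector standing in for a learned semantic encoder, and the mixing weight `alpha`. It only shows how sampling one of several descriptions per class (text or JSON) could be combined with a score that mixes lexical and semantic similarity.

```python
import json
import math
import random
import re
from collections import Counter

# Hypothetical class descriptions: several per class, mixing text and JSON.
DESCRIPTIONS = {
    "zebra": [
        "A zebra is an African horse-like animal with black and white stripes.",
        json.dumps({"class": "zebra", "pattern": "stripes", "habitat": "savanna"}),
    ],
    "dolphin": [
        "A dolphin is a marine mammal that swims in the ocean.",
        json.dumps({"class": "dolphin", "habitat": "ocean", "body": "fins"}),
    ],
}


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (also strips JSON punctuation)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sample_descriptions(descriptions, rng):
    """Draw one description per class; a training loop could redraw at every step."""
    return {label: rng.choice(options) for label, options in descriptions.items()}


def lexical_score(query, description):
    """Assumed lexical signal: Jaccard overlap between the two token sets."""
    q, d = set(tokenize(query)), set(tokenize(description))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def embed(text, n=3):
    """Stand-in for a learned encoder: counts of character n-grams."""
    s = " ".join(tokenize(text))
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hybrid_score(query, description, alpha=0.5):
    """Weighted mix of the lexical and (stand-in) semantic similarities."""
    semantic = cosine(embed(query), embed(description))
    return alpha * lexical_score(query, description) + (1 - alpha) * semantic


def predict(query, descriptions, rng, alpha=0.5):
    """Score the query against one sampled description per class; return the best label."""
    sampled = sample_descriptions(descriptions, rng)
    return max(sampled, key=lambda label: hybrid_score(query, sampled[label], alpha))
```

A call such as `predict("an animal with stripes", DESCRIPTIONS, random.Random(0))` scores the query against whichever description was sampled for each class. In a real system the `embed` stand-in would be replaced by a trained input and description encoder, and the class set at test time would include labels never seen in training.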