Supervised learning of image classifiers distills human knowledge into a parametric model through pairs of images and corresponding labels (X,Y). We argue that this simple and widely used representation of human knowledge neglects rich auxiliary information from the annotation procedure, such as the time-series of mouse traces and clicks left after image selection. Our insight is that such annotation byproducts Z provide approximate human attention that weakly guides the model to focus on the foreground cues, reducing spurious correlations and discouraging shortcut learning. To verify this, we create ImageNet-AB and COCO-AB. They are ImageNet and COCO training sets enriched with sample-wise annotation byproducts, collected by replicating the respective original annotation tasks. We refer to the new paradigm of training models with annotation byproducts as learning using annotation byproducts (LUAB). We show that a simple multitask loss for regressing Z together with Y already improves the generalisability and robustness of the learned models. Compared to the original supervised learning, LUAB does not require extra annotation costs. ImageNet-AB and COCO-AB are at https://github.com/naver-ai/NeglectedFreeLunch.
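The multitask loss mentioned above, which regresses Z together with Y, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that Z is a single normalised 2-D click coordinate per image, that the regression term is a squared error, and that a scalar `weight` balances the two terms. The function names and the `z_mask` argument (marking which samples have a byproduct) are hypothetical and introduced only for illustration.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between logits (N, C) and integer labels (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def luab_multitask_loss(logits, labels, z_pred, z_true, z_mask, weight=1.0):
    """Classification loss on Y plus a weighted regression loss on Z.

    Assumptions (not taken from the paper):
      - z_pred, z_true: (N, 2) normalised click coordinates.
      - z_mask: (N,) with 1.0 where a byproduct exists, 0.0 otherwise.
      - the regression term is a squared error averaged over valid samples.
    """
    ce = softmax_cross_entropy(logits, labels)
    sq_err = ((z_pred - z_true) ** 2).sum(axis=1)  # per-sample squared error
    n_valid = z_mask.sum()
    reg = (sq_err * z_mask).sum() / n_valid if n_valid > 0 else 0.0
    return ce + weight * reg


if __name__ == "__main__":
    logits = np.zeros((2, 3))            # uniform predictions over 3 classes
    labels = np.array([0, 1])
    z_pred = np.array([[0.5, 0.5], [0.0, 0.0]])
    z_true = np.array([[0.5, 0.3], [1.0, 1.0]])
    z_mask = np.array([1.0, 0.0])        # second sample has no byproduct
    print(luab_multitask_loss(logits, labels, z_pred, z_true, z_mask, weight=0.5))
```

In this sketch the byproduct term only adds a regression target to the usual classification objective, so the labels Y are used exactly as in standard supervised training. In a real model, `logits` and `z_pred` would come from two heads on a shared backbone.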