Handling entirely unknown data is a challenge for any deployed classifier. Classification models are typically trained on a static, predefined dataset and are kept in the dark about the open, unassigned feature space. As a result, they struggle with out-of-distribution data during inference. Addressing this task at the class level is termed open-set recognition (OSR). However, most OSR methods are inherently limited, as they train closed-set classifiers and only adapt the downstream predictions to OSR. This work presents LORD, a framework to Leverage Open-set Recognition by exploiting unknown Data. LORD explicitly models open space during classifier training and provides a systematic evaluation for such approaches. We identify three model-agnostic training strategies that exploit background data and apply them to well-established classifiers. Using LORD's extensive evaluation protocol, we consistently demonstrate improved recognition of unknown data. The benchmarks facilitate in-depth analysis across various requirement levels. To mitigate the dependency on extensive and costly background datasets, we explore mixup as an off-the-shelf data generation technique. Our experiments highlight mixup's effectiveness as a substitute for background datasets. Lightweight constraints on mixup synthesis further improve OSR performance.
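To make the mixup idea concrete, the following is a minimal sketch of how pseudo-unknown samples might be synthesized from known-class data alone. It uses the standard mixup interpolation, x = lam * x_i + (1 - lam) * x_j with lam drawn from Beta(alpha, alpha). The abstract does not say which constraints LORD places on the synthesis, so the two used here (mixing only pairs from different classes, and keeping lam inside a hypothetical `lam_range`) are assumptions for illustration, as are the function name `mixup_background` and all parameter defaults.

```python
import numpy as np


def mixup_background(x, y, n_samples, alpha=1.0, lam_range=(0.3, 0.7), rng=None):
    """Synthesize pseudo-unknown samples by mixing pairs of known samples.

    Assumed constraints (illustrative, not taken from the paper):
      * the two mixed samples must come from different known classes;
      * the mixing weight lam must fall inside lam_range, so the result
        is not dominated by either source sample.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    if np.unique(y).size < 2:
        raise ValueError("need at least two known classes to mix across classes")
    if n_samples == 0:
        return np.empty((0,) + x.shape[1:])

    lo, hi = lam_range
    mixed = []
    while len(mixed) < n_samples:
        i, j = rng.integers(0, len(x), size=2)
        if y[i] == y[j]:
            continue  # reject same-class pairs
        lam = rng.beta(alpha, alpha)
        if not (lo <= lam <= hi):
            continue  # reject weights too close to 0 or 1
        mixed.append(lam * x[i] + (1.0 - lam) * x[j])
    return np.stack(mixed)


# Toy usage: two well-separated 2-D classes.
data_rng = np.random.default_rng(0)
x_known = np.concatenate([
    data_rng.normal(-2.0, 0.5, size=(50, 2)),
    data_rng.normal(2.0, 0.5, size=(50, 2)),
])
y_known = np.array([0] * 50 + [1] * 50)
x_background = mixup_background(x_known, y_known, n_samples=20, rng=0)
print(x_background.shape)  # (20, 2)
```

The synthesized samples could then stand in for a background dataset during training, for example by assigning them an additional class index. How each of LORD's three training strategies consumes such data is not described in the abstract, so that step is left out of the sketch.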