As the data demand of deep learning models increases, active learning (AL) becomes essential for strategically selecting samples to label, which maximizes data efficiency and reduces training costs. Real-world scenarios require AL to account for incomplete knowledge of the data. Prior work addresses the handling of out-of-distribution (OOD) data, while a separate research direction has focused on category discovery. However, a joint analysis that combines AL with both OOD data and category discovery remains unexplored. To address this gap, we propose Joint Out-of-distribution filtering and data Discovery Active learning (Joda), which addresses both challenges simultaneously by filtering out OOD data before selecting candidates for labeling. In contrast to previous methods, we deeply entangle the training procedure with filtering and selection to construct a common feature space that aligns known and novel categories while separating OOD samples. Unlike previous works, Joda is highly efficient: it completely omits auxiliary models and requires no training access to the unlabeled pool for filtering or selection. In extensive experiments on 18 configurations and 3 metrics, Joda consistently achieves the highest accuracy with the best balance between class discovery and OOD filtering compared to state-of-the-art competitor approaches.