Out-of-distribution (OOD) detection identifies inputs on which a predictor cannot make valid predictions as it does for in-distribution (ID) data, thereby increasing the reliability of open-world classification. However, it is typically hard to collect real OOD data for training a predictor capable of discerning ID and OOD patterns. This obstacle has given rise to data generation-based learning methods, which synthesize OOD data via data generators for predictor training without requiring any real OOD data. Such methods typically pre-train a generator on ID data and adopt various selection procedures to find the generated data that are likely to be OOD. However, generated data may still coincide with ID semantics, i.e., mistaken OOD generation remains, which confuses the predictor between ID and OOD data. To address this, we suggest that generated data (even with mistaken OOD generation) can be used to devise an auxiliary OOD detection task that facilitates real OOD detection. Specifically, we can ensure that learning from such an auxiliary task is beneficial if its ID and OOD parts have disjoint supports, given a well-designed training procedure for the predictor. Accordingly, we propose a data generation-based learning method named Auxiliary Task-based OOD Learning (ATOL) that can relieve mistaken OOD generation. We conduct extensive experiments under various OOD detection setups, demonstrating the effectiveness of our method against its advanced counterparts.