Robustness and reliability are paramount for machine learning and deep learning models, especially in critical real-world applications. A fundamental challenge is handling Out-of-Distribution (OOD) samples, which significantly increase the risk of misclassification and of unreliable uncertainty estimates. Our work addresses this challenge by improving the detection and handling of OOD samples in neural networks. We introduce OOD-R (Out-of-Distribution-Rectified), a curated collection of open-source datasets with reduced noise. In-Distribution (ID) noise in existing OOD datasets can lead to inaccurate evaluation of detection algorithms. OOD-R therefore applies noise filtering to refine the datasets, giving a more accurate and reliable evaluation of OOD detection algorithms. This approach not only improves overall data quality but also helps distinguish OOD from ID samples, yielding up to a 2.5% improvement in model accuracy and at least a 3.2% reduction in false positives. Furthermore, we present ActFun, a method that fine-tunes the model's response to diverse inputs, thereby improving the stability of feature extraction and minimizing specificity issues. ActFun addresses the common problem of model overconfidence in OOD detection by strategically reducing the influence of hidden units, which lets the model estimate OOD uncertainty more accurately. Applying ActFun on the OOD-R dataset leads to significant performance gains, including an 18.42% increase in the AUROC of the GradNorm method and a 16.93% decrease in the FPR95 of the Energy method. Overall, our research not only advances OOD detection methodology but also emphasizes the importance of dataset integrity for accurate algorithm evaluation.
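The abstract reports results as AUROC and FPR95 and names the Energy method without defining them. The following is a minimal sketch of how these quantities are commonly computed, assuming the standard definitions: the energy-based score is the temperature-scaled log-sum-exp of the logits, AUROC is the probability that an ID sample scores above an OOD sample, and FPR95 is the fraction of OOD samples accepted as ID at the threshold that retains 95% of ID samples. The function names and the toy scores are illustrative assumptions. This is not the authors' evaluation code, and it does not implement ActFun or the OOD-R filtering.

```python
import math


def energy_score(logits, temperature=1.0):
    """Negative free energy, T * logsumexp(logits / T); higher suggests ID."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    return temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def auroc(id_scores, ood_scores):
    """Probability that a random ID score exceeds a random OOD score (ties count half)."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted as ID at the threshold keeping `tpr` of ID samples."""
    ranked = sorted(id_scores, reverse=True)
    k = math.ceil(tpr * len(ranked))  # number of ID samples that must be accepted
    threshold = ranked[k - 1]
    accepted = sum(1 for s in ood_scores if s >= threshold)
    return accepted / len(ood_scores)


# Toy scores (hypothetical values, not results from the paper).
id_scores = [5.0, 4.0, 3.0, 2.0]
ood_scores = [1.0, 2.5]
print(auroc(id_scores, ood_scores))
print(fpr_at_tpr(id_scores, ood_scores))
```

With these definitions, a better detector has a higher AUROC and a lower FPR95, which matches the direction of the improvements the abstract reports. The pairwise AUROC loop is quadratic in the number of samples and is meant for illustration only.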