Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings

Despite significant advances in out-of-distribution (OOD) detection, existing methods still struggle to remain robust against adversarial attacks, which compromises their reliability in critical real-world applications. Previous studies have attempted to address this challenge by exposing detectors to auxiliary OOD datasets alongside adversarial training. However, the increased data complexity inherent in adversarial training, together with the myriad ways in which OOD samples can arise at test time, often prevents these approaches from establishing robust decision boundaries. To address these limitations, we propose AROS, a novel approach that leverages neural ordinary differential equations (NODEs) together with the Lyapunov stability theorem to obtain robust embeddings for OOD detection. Through a tailored loss function, we apply Lyapunov stability theory to ensure that both in-distribution (ID) and OOD data converge to stable equilibrium points of the dynamical system. This encourages any perturbed input to return to its stable equilibrium, thereby enhancing the model's robustness against adversarial perturbations. To avoid relying on additional data, we generate fake OOD embeddings by sampling from low-likelihood regions of the ID feature space, approximating the boundaries where OOD data are likely to reside. To further enhance robustness, we place an orthogonal binary layer after the stable feature space, which maximizes the separation between the equilibrium points of ID and OOD samples. We validate our method through extensive experiments on several benchmarks, demonstrating superior performance, particularly under adversarial attacks. Notably, our approach improves robust detection performance from 37.8% to 80.1% on CIFAR-10 vs. CIFAR-100 and from 29.0% to 67.0% on CIFAR-100 vs. CIFAR-10.
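Two mechanisms described above, perturbed embeddings flowing back to a stable equilibrium and fake OOD embeddings drawn from low-likelihood regions of the ID feature space, can be illustrated with a small toy sketch. This is not the AROS implementation. It assumes a linear 2-D vector field dz/dt = A(z - z*) in place of the learned NODE, checks stability with the trace/determinant criterion for 2x2 matrices (equivalent to both eigenvalues of A having negative real parts), and uses a single diagonal Gaussian fitted to the ID embeddings as a stand-in density model. The function names `is_asymptotically_stable`, `integrate`, `fit_diagonal_gaussian`, `log_likelihood`, and `sample_fake_ood` are hypothetical.

```python
import math
import random

# --- Part 1: stable equilibrium of a toy linear vector field (assumed, not AROS) ---
# dz/dt = A (z - z_star). For a real 2x2 matrix A, both eigenvalues have negative
# real parts exactly when trace(A) < 0 and det(A) > 0, so z_star is asymptotically
# stable and nearby perturbations decay back to it.

def is_asymptotically_stable(A):
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    return trace < 0 and det > 0


def integrate(A, z_star, z0, dt=0.01, steps=1000):
    """Forward-Euler integration of dz/dt = A (z - z_star) starting from z0."""
    z = list(z0)
    for _ in range(steps):
        dx = z[0] - z_star[0]
        dy = z[1] - z_star[1]
        # Compute both derivatives from the current state before updating.
        dz0 = A[0][0] * dx + A[0][1] * dy
        dz1 = A[1][0] * dx + A[1][1] * dy
        z[0] += dt * dz0
        z[1] += dt * dz1
    return z


# --- Part 2: fake OOD embeddings from low-likelihood regions (assumed density model) ---

def fit_diagonal_gaussian(embeddings):
    """Per-dimension mean and variance of the ID embeddings (small floor for stability)."""
    n, dim = len(embeddings), len(embeddings[0])
    mean = [sum(e[j] for e in embeddings) / n for j in range(dim)]
    var = [sum((e[j] - mean[j]) ** 2 for e in embeddings) / n + 1e-6 for j in range(dim)]
    return mean, var


def log_likelihood(x, mean, var):
    """Log-density of x under the diagonal Gaussian N(mean, diag(var))."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def sample_fake_ood(embeddings, num_candidates=1000, keep=50, seed=0):
    """Sample candidates from the fitted Gaussian and keep the `keep` lowest-likelihood
    ones, i.e. points in the tails near the boundary of the ID feature distribution."""
    rng = random.Random(seed)
    mean, var = fit_diagonal_gaussian(embeddings)
    candidates = [
        [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]
        for _ in range(num_candidates)
    ]
    candidates.sort(key=lambda x: log_likelihood(x, mean, var))
    return candidates[:keep]
```

For example, with `A = [[-1.0, 0.5], [-0.5, -1.0]]` (trace -2, determinant 1.25) the check passes, and integrating from a point at distance 1 from `z_star` for 1000 steps of size 0.01 brings it to within about 1e-4 of `z_star`. In the sketch, `keep / num_candidates` controls how far into the tail the fake OOD samples lie; a learned model would instead operate on high-dimensional NODE embeddings.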

翻译：尽管分布外（OOD）检测领域已取得显著进展，现有方法在面对对抗攻击时仍难以保持鲁棒性，从而影响了其在关键现实应用中的可靠性。先前研究尝试通过将检测器暴露于辅助OOD数据集并结合对抗训练来解决这一挑战。然而，对抗训练固有的数据复杂性增加，以及测试阶段OOD样本可能出现的无数种形式，往往阻碍了这些方法建立鲁棒的决策边界。为突破这些局限，我们提出AROS——一种利用神经常微分方程（NODEs）结合Lyapunov稳定性定理来获取鲁棒嵌入向量的新型OOD检测方法。通过设计定制化的损失函数，我们应用Lyapunov稳定性理论确保分布内（ID）与OOD数据在动态系统中均收敛至稳定平衡点。该方法促使任何受扰动的输入都能回归其稳定平衡状态，从而增强模型对抗对抗扰动的鲁棒性。为避免使用额外数据，我们通过从ID数据特征空间的低似然区域采样来生成模拟OOD嵌入，以此逼近OOD数据可能存在的边界区域。为进一步增强鲁棒性，我们提出在稳定特征空间后引入正交二元层，以最大化ID与OOD样本平衡点之间的分离度。我们在多个基准数据集上通过大量实验验证了所提方法的有效性，结果表明其具有卓越性能，尤其在对抗攻击场景下表现突出。值得注意的是，我们的方法在CIFAR-10 vs. CIFAR-100任务中将鲁棒检测性能从37.8%提升至80.1%，在CIFAR-100 vs. CIFAR-10任务中从29.0%提升至67.0%。