Out-of-distribution (OOD) detection in dynamic open-world environments requires a model to adapt continually to evolving data distributions while generalizing to covariate-shifted inputs and rejecting semantic-shifted OOD examples. Most existing OOD detection methods optimize only the current-step objective and do not explicitly account for how post-deployment environment changes affect future OOD behavior. In this paper, we establish a theoretical grounding for dynamic OOD detection using a reinforcement learning (RL)-guided optimizer that explicitly favors updates that reduce the semantic-OOD false positive rate over time. We develop an augmented optimizer that adds an RL-guided correction term to standard gradient descent (GD) and show that it improves both future-domain generalization and semantic-OOD rejection. We decompose the temporal error into model-change and environment-change generalization errors and develop a theoretical framework for comparing the generalization errors under the GD and RL-guided optimizers.
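One way to read the augmented update and the temporal error decomposition described above is sketched below. This is an illustration under assumed notation, not the paper's actual formulation: $\theta_t$ denotes the model parameters after step $t$, $\mathcal{L}_t$ the current-step training objective, $\mathcal{E}_t(\theta)$ the error of $\theta$ on the environment at step $t$, $\eta$ a learning rate, $\lambda$ a correction weight, and $c_t$ a hypothetical RL-guided correction direction. None of these symbols appear in the source.

```latex
% Assumed notation throughout; c_t and \lambda are hypothetical placeholders
% for the RL-guided correction term and its weight.
\begin{align}
\theta_{t+1}^{\mathrm{GD}} &= \theta_t - \eta \nabla_\theta \mathcal{L}_t(\theta_t), \\
\theta_{t+1}^{\mathrm{RL}} &= \theta_t - \eta \nabla_\theta \mathcal{L}_t(\theta_t) + \lambda c_t, \\
% Telescoping identity: add and subtract \mathcal{E}_{t+1}(\theta_t).
\mathcal{E}_{t+1}(\theta_{t+1}) - \mathcal{E}_t(\theta_t)
  &= \underbrace{\mathcal{E}_{t+1}(\theta_{t+1}) - \mathcal{E}_{t+1}(\theta_t)}_{\text{model change}}
   + \underbrace{\mathcal{E}_{t+1}(\theta_t) - \mathcal{E}_t(\theta_t)}_{\text{environment change}}.
\end{align}
```

Under these assumptions, if both optimizers start from the same $\theta_t$, the environment-change term is identical for the two because it depends only on $\theta_t$, so a comparison between GD and the RL-guided update reduces to the model-change term.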