Safe deployment of AI models requires proactive detection of failures to prevent costly errors. To this end, we study the important problem of detecting failures in deep regression models. Existing approaches rely on epistemic uncertainty estimates or inconsistency with respect to the training data to identify failures. Interestingly, we find that while uncertainties are necessary, they are insufficient to accurately characterize failure in practice. Hence, we introduce PAGER (Principled Analysis of Generalization Errors in Regressors), a framework to systematically detect and characterize failures in deep regressors. Built upon the principle of anchored training in deep models, PAGER unifies epistemic uncertainty with complementary manifold non-conformity scores to accurately organize samples into different risk regimes.