The Confidence Gate Theorem: When Should Ranked Decision Systems Abstain?

Ranked decision systems -- recommenders, ad auctions, clinical triage queues -- must decide when to intervene in ranked outputs and when to abstain. We study when confidence-based abstention monotonically improves decision quality, and when it fails. The formal conditions are simple: rank-alignment and no inversion zones. The substantive contribution is identifying why these conditions hold or fail: the distinction between structural uncertainty (missing data, e.g., cold-start) and contextual uncertainty (missing context, e.g., temporal drift). Empirically, we validate this distinction across three domains: collaborative filtering (MovieLens, 3 distribution shifts), e-commerce intent detection (RetailRocket, Criteo, Yoochoose), and clinical pathway triage (MIMIC-IV). Structural uncertainty produces near-monotonic abstention gains in all domains, whereas structurally grounded confidence signals (observation counts) fail under contextual drift, producing as many monotonicity violations as random abstention on our MovieLens temporal split. Context-aware alternatives -- ensemble disagreement and recency features -- substantially narrow the gap (reducing violations from 3 to 1--2) but do not fully restore monotonicity, suggesting that contextual uncertainty poses qualitatively different challenges. Exception labels defined from residuals degrade substantially under distribution shift (AUC drops from 0.71 to 0.61--0.62 across three splits), providing a clean negative result against the common practice of exception-based intervention. The results provide a practical deployment diagnostic: check the two formal conditions (C1 and C2) on held-out data before deploying a confidence gate, and match the confidence signal to the dominant uncertainty type.
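
The held-out check described in the abstract could be approximated along the following lines. This is a minimal sketch, not the paper's procedure: the abstract does not define C1 and C2 formally, so the code only tests their observable consequence, namely whether the mean quality of retained items rises as the gate abstains on more items. The function names `quality_by_coverage` and `count_violations`, the coverage grid, and the `tol` parameter are all hypothetical choices made for illustration.

```python
def quality_by_coverage(confidence, quality, coverages):
    """Mean quality of the retained items at each coverage level.

    Items are ranked by confidence, highest first. At coverage c the gate
    keeps the top round(c * n) items (at least one) and abstains on the rest.
    """
    order = sorted(range(len(confidence)), key=lambda i: confidence[i], reverse=True)
    curve = []
    for c in coverages:
        k = max(1, round(c * len(order)))
        kept = order[:k]
        curve.append(sum(quality[i] for i in kept) / k)
    return curve


def count_violations(curve, tol=0.0):
    """Count steps where abstaining more lowers mean retained quality.

    `curve` must be ordered from highest coverage to lowest coverage.
    `tol` (an assumed knob) ignores drops smaller than the given amount.
    """
    return sum(1 for a, b in zip(curve, curve[1:]) if b < a - tol)


# Toy held-out data (made up for illustration): per-item quality scores.
coverages = [1.0, 0.8, 0.6, 0.4, 0.2]
quality = [i / 10 for i in range(10)]

# A confidence signal aligned with quality versus one that is anti-aligned.
aligned = count_violations(quality_by_coverage(quality, quality, coverages))
inverted = count_violations(
    quality_by_coverage([-q for q in quality], quality, coverages)
)
print(aligned, inverted)
```

In this toy setup the aligned signal yields no violations and the inverted signal violates monotonicity at every step. On real held-out data, a violation count close to that of a random confidence signal would suggest the signal does not match the dominant uncertainty type.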

