Conformal prediction (CP) has become a cornerstone of distribution-free uncertainty quantification and is conventionally evaluated by its coverage and interval length. This work critically examines the sufficiency of these standard metrics. We demonstrate that interval length can be deceptively improved through a counter-intuitive approach, termed the Prejudicial Trick (PT), while coverage remains valid. Specifically, for any given test sample, PT probabilistically returns either a null interval or an interval constructed at an adjusted confidence level, thereby preserving marginal coverage. While PT can yield a deceptively shorter average interval length, it introduces a practical vulnerability: the same input can yield entirely different prediction intervals across repeated runs of the algorithm. We formally derive the conditions under which PT achieves these misleading improvements and provide extensive empirical evidence across a range of regression and classification tasks. Furthermore, we introduce a new metric, interval stability, which helps detect whether a CP method implicitly improves interval length through such PT-like techniques.
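To make the mechanism concrete, below is a minimal sketch of a PT-style wrapper around split conformal regression, not the paper's implementation. Requiring marginal coverage (1 - p)(1 - α') ≥ 1 - α for a null probability p ≤ α gives the adjusted level α' = (α - p) / (1 - p); all helper names (`pt_interval`, `p_null`, `sample_resid`) are ours, and the demo's residual distribution is chosen so its quantile function is locally flat near the target level, one regime in which PT shortens average length.

```python
# Illustrative sketch only; names and the demo setup are our assumptions,
# not the paper's code. alpha' = (alpha - p) / (1 - p) is the unique level
# with (1 - p) * (1 - alpha') = 1 - alpha, and it requires p <= alpha.
import numpy as np

def split_conformal_interval(resid_cal, y_hat, alpha):
    """Standard split conformal interval: y_hat +/- the finite-sample
    corrected (1 - alpha) quantile of calibration residual magnitudes."""
    n = len(resid_cal)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(resid_cal, level)
    return y_hat - q, y_hat + q

def pt_interval(resid_cal, y_hat, alpha, p_null, rng):
    """PT wrapper: with probability p_null return a null interval (zero
    length, zero coverage); otherwise return a conformal interval at the
    stricter adjusted level alpha' < alpha."""
    assert p_null <= alpha, "need p <= alpha so that alpha' stays in [0, 1)"
    if rng.random() < p_null:
        return None
    alpha_adj = (alpha - p_null) / (1.0 - p_null)
    return split_conformal_interval(resid_cal, y_hat, alpha_adj)

def sample_resid(n, rng):
    """Residual magnitudes with a flat quantile region around the 0.9
    level: 75% U(0,1), a 22% point mass at 1, a 3% heavy tail U(5,10)."""
    u = rng.random(n)
    return np.where(u < 0.75, rng.uniform(0.0, 1.0, n),
           np.where(u < 0.97, 1.0, rng.uniform(5.0, 10.0, n)))

rng = np.random.default_rng(0)
resid_cal, eps = sample_resid(2000, rng), sample_resid(20_000, rng)
alpha, p_null = 0.1, 0.05

lo, hi = split_conformal_interval(resid_cal, 0.0, alpha)
print("standard: coverage=%.3f length=%.2f" % ((eps <= hi).mean(), hi - lo))

cov, length = [], []
for e in eps:  # per-sample randomization, mirroring the abstract's phrasing
    itv = pt_interval(resid_cal, 0.0, alpha, p_null, rng)
    cov.append(itv is not None and itv[0] <= e <= itv[1])
    length.append(0.0 if itv is None else itv[1] - itv[0])
print("PT:       coverage=%.3f length=%.2f" % (np.mean(cov), np.mean(length)))
# Both runs report coverage >= the nominal 0.90, yet PT's average
# length is roughly a factor (1 - p_null) shorter: the random nulls
# subsidize the (here unchanged) non-null intervals.
```

Because the adjusted quantile barely moves on the flat region of the residual distribution, the null intervals shrink the average length by about a factor of (1 - p) while marginal coverage stays valid; with Gaussian-tailed residuals the adjusted quantile grows fast enough that PT would instead lengthen intervals, which is the flavor of condition the paper derives formally.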
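The abstract does not spell out how interval stability is computed; one plausible illustrative proxy, consistent with its stated purpose of flagging run-to-run disagreement on the same input, is the mean pairwise Jaccard overlap of intervals returned across repeated runs. The sketch below (our own `interval_stability` helper, reusing `pt_interval` and the variables from the sketch above) is an assumption, not the paper's definition.

```python
# Hypothetical stability proxy, not the paper's metric: average pairwise
# Jaccard overlap of intervals produced for the same input across runs.
from itertools import combinations
import numpy as np

def jaccard(a, b):
    """Jaccard overlap of two closed intervals; None denotes a null interval."""
    if a is None or b is None:
        return 0.0
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return 1.0 if union == 0.0 else inter / union

def interval_stability(make_interval, n_runs=50):
    """Mean pairwise Jaccard overlap over repeated runs on one input:
    1.0 for a deterministic CP method, strictly below 1 under PT."""
    runs = [make_interval() for _ in range(n_runs)]
    return float(np.mean([jaccard(a, b) for a, b in combinations(runs, 2)]))

# Deterministic split CP scores 1.0; PT is penalized by its random nulls
# (two runs agree only when neither draws a null, prob (1 - p_null)^2).
stab = interval_stability(lambda: pt_interval(resid_cal, 0.0, alpha, p_null, rng))
print("PT stability: %.3f" % stab)  # ~ (1 - p_null)^2 ~= 0.90
```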