Many practical machine learning systems, such as ranking and recommendation systems, consist of two concatenated stages: retrieval and ranking. Accurately assessing and managing the uncertainty in the predictions of such systems is a significant challenge. To address it, we extend the recently developed framework of conformal risk control, originally designed for single-stage problems, to the more complex two-stage setup. We first demonstrate that a straightforward application of conformal risk control, which treats each stage independently, may fail to keep the risk of each stage at its pre-specified level. We therefore propose an integrated approach that considers both stages simultaneously, devising algorithms that control the risk of each stage by jointly identifying thresholds for both stages. Our algorithm further optimizes a weighted combination of the prediction set sizes over all feasible thresholds, resulting in more effective prediction sets. Finally, we apply the proposed method to the critical task of two-stage ranked retrieval and validate its efficacy through extensive experiments on two large-scale public datasets commonly used for ranked retrieval, MSLR-WEB and MS MARCO.
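The joint threshold search described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names (`miss_rate`, `joint_thresholds`), the binary relevance labels, the miss-rate loss, the grid search, and the synthetic data are all assumptions made for illustration. Feasibility is checked with the standard conformal risk control bound for a loss bounded by 1, (n · R̂ + 1) / (n + 1) ≤ α, applied to each stage. Selecting two thresholds from a grid in this way does not by itself inherit the single-threshold guarantee, which is the gap the paper's analysis is meant to address.

```python
import numpy as np

def miss_rate(selected, relevant):
    """Per-query fraction of relevant documents missing from `selected` (loss in [0, 1])."""
    missed = (relevant & ~selected).sum(axis=1)
    n_rel = np.maximum(relevant.sum(axis=1), 1)  # avoid division by zero
    return missed / n_rel

def joint_thresholds(s1, s2, relevant, alpha1, alpha2, grid, w=0.5):
    """Hypothetical grid search over (lam1, lam2).

    Keeps pairs whose CRC-style adjusted risks are within alpha1 (retrieval)
    and alpha2 (ranking), then returns the (cost, lam1, lam2) with the
    smallest weighted average set size, or None if no pair is feasible.
    """
    n = s1.shape[0]
    best = None
    for lam1 in grid:
        retrieved = s1 >= lam1                      # stage-1 prediction set
        r1 = miss_rate(retrieved, relevant).mean()
        if (n * r1 + 1.0) / (n + 1) > alpha1:       # stage-1 risk too high
            continue
        for lam2 in grid:
            final = retrieved & (s2 >= lam2)        # stage-2 set is a subset of stage 1
            r2 = miss_rate(final, relevant).mean()
            if (n * r2 + 1.0) / (n + 1) > alpha2:   # stage-2 risk too high
                continue
            cost = (w * retrieved.sum(axis=1).mean()
                    + (1 - w) * final.sum(axis=1).mean())
            if best is None or cost < best[0]:
                best = (cost, lam1, lam2)
    return best

# Synthetic calibration data (assumed): 200 queries, 30 candidate documents each.
rng = np.random.default_rng(0)
relevant = rng.random((200, 30)) < 0.2
s1 = 0.3 * relevant + 0.7 * rng.random((200, 30))   # noisy retrieval scores in [0, 1)
s2 = 0.3 * relevant + 0.7 * rng.random((200, 30))   # noisy ranking scores in [0, 1)
grid = np.linspace(0.0, 1.0, 21)

print(joint_thresholds(s1, s2, relevant, alpha1=0.1, alpha2=0.2, grid=grid))
```

The weight `w` trades off the size of the retrieved set against the size of the final ranked set; with `w = 0.5` both contribute equally to the objective.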