Ranking models are widely used in e-commerce for relevance estimation. These models often suffer from poor interpretability and lack scale calibration, particularly when trained with typical ranking loss functions. This paper addresses the problem of post-hoc calibration of ranking models. We introduce MLPlatt: a simple yet effective ranking model calibration method that preserves the item ordering and converts ranker outputs into interpretable click-through rate (CTR) probabilities usable in downstream tasks. The method is context-aware by design and achieves good calibration metrics both globally and within strata corresponding to different values of a selected categorical field (such as user country or device), which is often important from the business perspective of an e-commerce platform. We demonstrate the superiority of MLPlatt over existing approaches on two datasets, achieving an improvement of over 10% in F-ECE (Field Expected Calibration Error) compared to other methods. Most importantly, we show that high-quality calibration can be achieved without compromising ranking quality.
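To make the notion of calibration within strata more concrete, the following is a minimal sketch of one common field-level calibration error: for each value of the categorical field, take the absolute sum of residuals between observed clicks and predicted CTR, then normalize by the total number of samples. This formulation is an assumption chosen for illustration and may differ from the exact F-ECE definition used in the paper; the function name `field_ece` and the toy data are hypothetical.

```python
import numpy as np


def field_ece(y_true, p_pred, field):
    """Illustrative field-level calibration error (assumed formulation).

    For every distinct field value, sum the residuals (click - predicted CTR)
    over the samples in that stratum, take the absolute value, add the
    per-stratum results together, and divide by the total sample count.
    """
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    field = np.asarray(field)

    total = 0.0
    for value in np.unique(field):
        mask = field == value
        total += abs(np.sum(y_true[mask] - p_pred[mask]))
    return total / len(y_true)


# Toy data (hypothetical): binary clicks, predicted CTRs, and user country.
clicks = [1, 0, 0, 1, 0, 0]
ctr = [0.5, 0.5, 0.2, 0.6, 0.2, 0.2]
country = ["PL", "PL", "PL", "DE", "DE", "DE"]

score = field_ece(clicks, ctr, country)
print(f"field-level calibration error: {score:.4f}")
```

In this toy example the predictions for "DE" sum to the observed number of clicks, so only the "PL" stratum contributes to the error. A model can look well calibrated globally while individual strata are over- or under-predicted, which is the situation a field-level metric is meant to expose.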