Standard regression methods typically optimize a single pointwise objective, such as mean squared error, which conflates the learning of ordering with the learning of scale. This coupling leaves models vulnerable to outliers and heavy-tailed noise. We propose CAIRO (Calibrate After Initial Rank Ordering), a framework that decomposes regression into two distinct stages: in the first, we learn a scoring function by minimizing a scale-invariant ranking loss; in the second, we recover the target scale via isotonic regression. We theoretically characterize a class of "Optimal-in-Rank-Order" objectives -- including variants of RankNet and Gini covariance -- and prove that they recover the ordering of the true conditional mean under mild assumptions. We further show that the subsequent monotone calibration step recovers the true regression function at the population level and guarantees that finite-sample predictions are strictly auto-calibrated. Empirically, CAIRO combines the representation learning of neural networks with the robustness of rank-based statistics: it matches state-of-the-art tree ensembles on tabular benchmarks and significantly outperforms standard regression objectives under heavy-tailed or heteroskedastic noise.
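The two-stage recipe above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes a linear scorer for concreteness, uses a RankNet-style pairwise logistic loss for stage one, and a hand-rolled Pool Adjacent Violators routine for the isotonic calibration in stage two. The helper names (`fit_linear_ranker`, `pava`, `isotonic_calibrator`) are hypothetical.

```python
import numpy as np

def pava(y):
    """Pool Adjacent Violators: least-squares isotonic fit of y in its given order."""
    means, counts = [], []
    for v in np.asarray(y, dtype=float):
        means.append(v)
        counts.append(1)
        # merge adjacent blocks while they violate monotonicity
        while len(means) > 1 and means[-2] > means[-1]:
            c = counts[-2] + counts[-1]
            m = (means[-2] * counts[-2] + means[-1] * counts[-1]) / c
            means.pop(); counts.pop()
            means[-1], counts[-1] = m, c
    return np.repeat(means, counts)

def fit_linear_ranker(X, y, lr=0.05, epochs=200):
    """Stage 1 (sketch): linear scorer trained with a RankNet-style pairwise
    logistic loss. The loss depends only on score differences, so it is
    invariant to the scale of the targets."""
    w = np.zeros(X.shape[1])
    ii, jj = np.nonzero(y[:, None] > y[None, :])   # all pairs with y_i > y_j
    for _ in range(epochs):
        margins = X[ii] @ w - X[jj] @ w
        g = -1.0 / (1.0 + np.exp(margins))          # d/d(margin) of log(1 + e^{-margin})
        w -= lr * (g[:, None] * (X[ii] - X[jj])).mean(axis=0)
    return w

def isotonic_calibrator(scores, y):
    """Stage 2: monotone map from raw scores back to the target scale."""
    order = np.argsort(scores)
    fitted = pava(y[order])                         # isotonic fit of y along the score order
    knots = scores[order]
    return lambda s: np.interp(s, knots, fitted)

# Toy demo: monotone signal corrupted by heavy-tailed (Cauchy) noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_cauchy(200)

w = fit_linear_ranker(X, y)
scores = X @ w
predict = isotonic_calibrator(scores, y)
preds = predict(scores)
```

The calibrated predictor is non-decreasing in the score by construction, which is the sense in which the second stage only restores scale without disturbing the learned ordering.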