Standard regression methods typically optimize a single pointwise objective, such as mean squared error, which conflates the learning of ordering with the learning of scale. This coupling renders models vulnerable to outliers and heavy-tailed noise. We propose CAIRO (Calibrate After Initial Rank Ordering), a framework that decouples regression into two distinct stages. In the first stage, we learn a scoring function by minimizing a scale-invariant ranking loss; in the second, we recover the target scale via isotonic regression. We theoretically characterize a class of "Optimal-in-Rank-Order" objectives -- including variants of RankNet and Gini covariance -- and prove that they recover the ordering of the true conditional mean under mild assumptions. We further show that subsequent monotone calibration guarantees recovery of the true regression function. Empirically, CAIRO combines the representation learning of neural networks with the robustness of rank-based statistics. It matches the performance of state-of-the-art tree ensembles on tabular benchmarks and significantly outperforms standard regression objectives in regimes with heavy-tailed or heteroskedastic noise.
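The two-stage pipeline described above can be sketched in a few dozen lines. The code below is a minimal illustrative toy, not the paper's implementation: it fits a one-parameter linear scorer `f(x) = w*x` by gradient descent on a pairwise logistic (RankNet-style) loss, which depends only on the ordering of the targets, and then calibrates the scores to the target scale with isotonic regression via the pool-adjacent-violators algorithm. All function names and hyperparameters here are hypothetical choices for the sketch.

```python
import math

def fit_ranker(xs, ys, lr=0.1, epochs=200):
    """Stage 1 (toy): linear scorer trained on a pairwise logistic
    ranking loss, sum over pairs with ys[i] > ys[j] of
    log(1 + exp(-(f(x_i) - f(x_j)))). Scale-invariant in y."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = 0.0
        for i in range(n):
            for j in range(n):
                if ys[i] > ys[j]:
                    d = w * (xs[i] - xs[j])
                    # d/dw of log(1 + exp(-d))
                    grad += -(xs[i] - xs[j]) / (1.0 + math.exp(d))
        w -= lr * grad / (n * n)
    return w

def isotonic(scores, ys):
    """Stage 2 (toy): pool-adjacent-violators regression of y on the
    learned scores, returning monotone calibrated predictions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    out = []  # stack of [block_mean, block_count]
    for i in order:
        out.append([ys[i], 1])
        # merge adjacent blocks until means are nondecreasing
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, c2 = out.pop()
            m1, c1 = out.pop()
            out.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    fitted = []
    for m, c in out:
        fitted.extend([m] * c)
    preds = [0.0] * len(ys)
    for pos, i in enumerate(order):
        preds[i] = fitted[pos]
    return preds

# Toy data: monotone signal with one heavy outlier in y.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 1.2, 1.9, 3.2, 100.0, 5.1]

w = fit_ranker(xs, ys)
scores = [w * x for x in xs]
preds = isotonic(scores, ys)
```

Because the stage-1 loss only compares pairs, the outlier at `x = 4` cannot distort the learned scale; the isotonic step then absorbs it by pooling it with its neighbor rather than dragging the whole fit.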