Standard regression methods typically optimize a single pointwise objective, such as mean squared error, which conflates the learning of ordering with the learning of scale. This coupling leaves models vulnerable to outliers and heavy-tailed noise. We propose CAIRO (Calibrate After Initial Rank Ordering), a framework that decomposes regression into two distinct stages: in the first, we learn a scoring function by minimizing a scale-invariant ranking loss; in the second, we recover the target scale via isotonic regression. We theoretically characterize a class of "Optimal-in-Rank-Order" objectives -- including variants of RankNet and Gini covariance -- and prove that they recover the ordering of the true conditional mean under mild assumptions. We further show that the subsequent monotone calibration step recovers the true regression function at the population level and guarantees that finite-sample predictions are strictly auto-calibrated. Empirically, CAIRO combines the representation learning of neural networks with the robustness of rank-based statistics: it matches state-of-the-art tree ensembles on tabular benchmarks and significantly outperforms standard regression objectives under heavy-tailed or heteroskedastic noise.
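The two-stage recipe above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes a linear scorer for concreteness, uses a RankNet-style pairwise logistic loss for stage one, and a hand-rolled Pool Adjacent Violators routine for the isotonic calibration in stage two. The helper names (`fit_linear_ranker`, `pava`, `isotonic_calibrator`) are hypothetical.

```python
import numpy as np

def pava(y):
    """Pool Adjacent Violators: least-squares isotonic fit of y in its given order."""
    means, counts = [], []
    for v in np.asarray(y, dtype=float):
        means.append(v)
        counts.append(1)
        # merge adjacent blocks while they violate monotonicity
        while len(means) > 1 and means[-2] > means[-1]:
            c = counts[-2] + counts[-1]
            m = (means[-2] * counts[-2] + means[-1] * counts[-1]) / c
            means.pop(); counts.pop()
            means[-1], counts[-1] = m, c
    return np.repeat(means, counts)

def fit_linear_ranker(X, y, lr=0.05, epochs=200):
    """Stage 1 (sketch): linear scorer trained with a RankNet-style pairwise
    logistic loss. The loss depends only on score differences, so it is
    invariant to the scale of the targets."""
    w = np.zeros(X.shape[1])
    ii, jj = np.nonzero(y[:, None] > y[None, :])   # all pairs with y_i > y_j
    for _ in range(epochs):
        margins = X[ii] @ w - X[jj] @ w
        g = -1.0 / (1.0 + np.exp(margins))          # d/d(margin) of log(1 + e^{-margin})
        w -= lr * (g[:, None] * (X[ii] - X[jj])).mean(axis=0)
    return w

def isotonic_calibrator(scores, y):
    """Stage 2: monotone map from raw scores back to the target scale."""
    order = np.argsort(scores)
    fitted = pava(y[order])                         # isotonic fit of y along the score order
    knots = scores[order]
    return lambda s: np.interp(s, knots, fitted)

# Toy demo: monotone signal corrupted by heavy-tailed (Cauchy) noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_cauchy(200)

w = fit_linear_ranker(X, y)
scores = X @ w
predict = isotonic_calibrator(scores, y)
preds = predict(scores)
```

The calibrated predictor is non-decreasing in the score by construction, which is the sense in which the second stage only restores scale without disturbing the learned ordering.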