Ordinal regression and ranking are challenging due to inherent ordinal dependencies that conventional methods struggle to model. We propose Ranking-Aware Reinforcement Learning (RARL), a novel RL framework that explicitly learns these relationships. At its core, RARL features a unified objective that synergistically integrates regression and Learning-to-Rank (L2R), enabling mutual improvement between the two tasks. This is driven by a ranking-aware verifiable reward that jointly assesses regression precision and ranking accuracy, facilitating direct model updates via policy optimization. To further enhance training, we introduce Response Mutation Operations (RMO), which inject controlled noise to improve exploration and prevent stagnation at saddle points. The effectiveness of RARL is validated through extensive experiments on three distinct benchmarks.
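Since the abstract describes the ranking-aware verifiable reward and RMO only at a high level, the following is a minimal sketch of one plausible reading: a reward that mixes per-item regression precision with pairwise ranking accuracy, plus a mutation step that perturbs sampled predictions. All names (`ranking_aware_reward`, `mutate_response`), the mixing weight `alpha`, and the tolerance `tol` are illustrative assumptions, not the paper's actual formulation.

```python
# Hedged sketch of a ranking-aware verifiable reward and a response
# mutation operation, assuming the policy emits one scalar prediction
# per item in a group. Names and hyperparameters are assumptions, not
# the paper's notation.
import random
from itertools import combinations

def ranking_aware_reward(preds, targets, alpha=0.5, tol=0.5):
    """Jointly score regression precision and pairwise ranking accuracy."""
    n = len(preds)
    # Regression term: fraction of predictions within a verifiable tolerance.
    reg = sum(abs(p - t) <= tol for p, t in zip(preds, targets)) / n
    # Ranking term: fraction of item pairs whose predicted order agrees
    # with the ground-truth ordinal order (pairwise ranking accuracy).
    pairs = list(combinations(range(n), 2))
    rank = sum(
        (preds[i] - preds[j]) * (targets[i] - targets[j]) > 0
        for i, j in pairs
    ) / len(pairs)
    # Unified objective: a convex combination of the two task rewards.
    return alpha * reg + (1 - alpha) * rank

def mutate_response(preds, noise_scale=0.1, rate=0.3):
    """Inject controlled Gaussian noise into a sampled response (RMO-style)."""
    return [
        p + random.gauss(0.0, noise_scale) if random.random() < rate else p
        for p in preds
    ]

# Usage: score a sampled group of predictions, then mutate it to obtain
# an extra exploratory candidate before the policy-optimization update.
preds, targets = [2.1, 4.8, 3.0], [2.0, 5.0, 3.5]
print(ranking_aware_reward(preds, targets))   # 1.0: accurate and well ordered
print(ranking_aware_reward(mutate_response(preds), targets))
```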