Normalized Discounted Cumulative Gain (NDCG) is a widely used ranking metric in information retrieval and machine learning. However, efficient and provable stochastic methods for maximizing NDCG are still lacking, especially for deep models. In this paper, we propose a principled approach to optimizing NDCG and its top-$K$ variant. First, we formulate a novel compositional optimization problem for optimizing an NDCG surrogate, and a novel bilevel compositional optimization problem for optimizing a top-$K$ NDCG surrogate. Then, we develop efficient stochastic algorithms with provable convergence guarantees for the resulting non-convex objectives. In contrast to existing NDCG optimization methods, the per-iteration complexity of our algorithms scales with the mini-batch size rather than the total number of items. To improve effectiveness for deep learning, we further propose two practical strategies: an initial warm-up and a stop-gradient operator. Experimental results on multiple datasets demonstrate that our methods outperform prior ranking approaches in terms of NDCG. To the best of our knowledge, this is the first work to propose stochastic algorithms for optimizing NDCG with provable convergence guarantees. Our proposed methods are implemented in the LibAUC library at https://libauc.org/.
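For readers unfamiliar with the metric, the following is a minimal sketch of how NDCG and its top-$K$ variant are commonly evaluated for a single query. It assumes the widely used exponential gain $2^{y} - 1$ and logarithmic discount $1/\log_2(1 + \text{rank})$; the abstract does not state which form is used, so this choice, along with the function names `dcg` and `ndcg`, is illustrative only and is not the LibAUC API. Note that the exact metric requires scoring and sorting every item of the query, and the sort is non-differentiable, which is why surrogates are optimized in practice.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of relevance labels listed in ranked order.

    Assumes gain 2^rel - 1 and discount 1 / log2(1 + rank) (an assumption,
    not taken from the abstract). If k is given, only the top-k positions count.
    """
    if k is not None:
        relevances = relevances[:k]
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(scores, relevances, k=None):
    """NDCG (top-k NDCG when k is given) of the ranking induced by scores."""
    # Rank items by predicted score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order]
    # The ideal ranking sorts items by their true relevance.
    ideal_dcg = dcg(sorted(relevances, reverse=True), k)
    # Queries with no relevant item are assigned 0 by convention here.
    return dcg(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A ranking that orders items by true relevance attains an NDCG of 1, and any ranking that places less relevant items above more relevant ones scores strictly lower.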