In information retrieval (IR), learning-to-rank (LTR) methods have traditionally been limited to discriminative machine learning approaches that model the probability that a document is relevant to a query, given some feature representation of the query-document pair. In this work, we propose an alternative denoising diffusion-based deep generative approach to LTR that instead models the full joint distribution over feature vectors and relevance labels. In the discriminative setting, an over-parameterized ranking model may find many different ways to fit the training data; we hypothesize that candidate solutions that must also explain the full data distribution under the generative setting yield more robust ranking models. With this motivation, we propose DiffusionRank, which extends TabDiff, an existing denoising diffusion-based generative model for tabular datasets, to create generative equivalents of classical discriminative pointwise and pairwise LTR objectives. Our empirical results demonstrate that DiffusionRank models significantly outperform their discriminative counterparts. Our work points to a rich space for future research on how ongoing advancements in deep generative modeling, such as diffusion, can be leveraged for learning-to-rank in IR.
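The discriminative-versus-generative distinction at the heart of the abstract can be illustrated with a toy sketch (not the paper's actual DiffusionRank or TabDiff model). A discriminative pointwise ranker models p(relevant | features) directly, e.g. via logistic regression; a generative pointwise ranker instead models the joint p(features, relevance) — here a simple class-conditional Gaussian stands in for the diffusion model — and ranks by the posterior obtained from Bayes' rule. All data and dimensions below are synthetic assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic query-document feature vectors with binary relevance labels
# (illustrative only; the paper evaluates on standard LTR benchmarks).
n, d = 400, 5
y = rng.integers(0, 2, size=n)
X = rng.normal(loc=y[:, None] * 0.8, scale=1.0, size=(n, d))

# --- Discriminative pointwise LTR: model p(y=1 | x) directly ---
# A few steps of gradient descent on the logistic (binary cross-entropy) loss.
w = np.zeros(d)
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / n
disc_scores = X @ w  # rank documents by the discriminative score

# --- Generative pointwise LTR (toy stand-in for diffusion): model p(x, y) ---
# Fit class-conditional Gaussians p(x | y) and a label prior p(y), then score
# each document by the log posterior odds from Bayes' rule.
mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
var = X.var(axis=0)  # shared diagonal covariance, for simplicity
prior1 = y.mean()

def log_gauss(x, mu):
    # Log-density of a diagonal Gaussian with mean mu and variance var.
    return -0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var), axis=-1)

gen_scores = (log_gauss(X, mu1) + np.log(prior1)
              - (log_gauss(X, mu0) + np.log(1 - prior1)))

# Both score vectors induce a ranking; compare them via pairwise accuracy
# (fraction of relevant/non-relevant pairs ordered correctly).
def pairwise_acc(scores, labels):
    pos, neg = scores[labels == 1], scores[labels == 0]
    return (pos[:, None] > neg[None, :]).mean()

print(f"discriminative pairwise acc: {pairwise_acc(disc_scores, y):.2f}")
print(f"generative     pairwise acc: {pairwise_acc(gen_scores, y):.2f}")
```

On this separable toy data both rankers do well; the paper's hypothesis is that on real LTR data, solutions constrained to explain the full joint distribution generalize more robustly than purely discriminative fits.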