The powerful generative abilities of large language models (LLMs) show potential for generating relevance labels in search applications. Previous work has found that asking directly about relevance, e.g., ``How relevant is document A to query Q?'', results in sub-optimal ranking. In contrast, the pairwise ranking prompting (PRP) approach achieves promising ranking performance by asking for pairwise comparisons, e.g., ``Is document A more relevant than document B to query Q?''. Thus, although LLMs are effective rankers, this strength is not reflected in the relevance labels they generate. In this work, we propose a post-processing method that reconciles the relevance labels generated by an LLM with its powerful ranking abilities. Our method takes both LLM-generated relevance labels and LLM-generated pairwise preferences as input. The labels are then adjusted to satisfy the pairwise preferences of the LLM while staying as close to their original values as possible. Our experimental results indicate that our approach effectively balances label accuracy and ranking performance. Our work thus shows that it is possible to combine the ranking and labeling abilities of LLMs through post-processing.
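To make the idea of adjusting labels to agree with pairwise preferences more concrete, the following is a minimal illustrative sketch, not the formulation used in this work, which the abstract does not specify. It assumes that the pairwise preferences are first aggregated into a total order by counting wins, and that "as close as possible" means least-squares distance, so the adjustment reduces to an isotonic (monotone) projection computed with the pool-adjacent-violators algorithm. All function names (`order_from_preferences`, `project_non_increasing`, `consolidate`) and the `prefers` callback are hypothetical.

```python
from itertools import combinations


def order_from_preferences(n_docs, prefers):
    """Rank documents by their number of pairwise wins (assumed aggregation rule).

    `prefers(i, j)` is a hypothetical callback returning True when the LLM
    judges document i more relevant than document j.
    """
    wins = [0] * n_docs
    for i, j in combinations(range(n_docs), 2):
        if prefers(i, j):
            wins[i] += 1
        else:
            wins[j] += 1
    # Most wins first; ties are broken by document index.
    return sorted(range(n_docs), key=lambda d: (-wins[d], d))


def project_non_increasing(values):
    """Least-squares projection onto non-increasing sequences (PAVA)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge while an earlier block has a smaller mean than the later one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def consolidate(labels, prefers):
    """Adjust labels so they never contradict the preference-induced order."""
    order = order_from_preferences(len(labels), prefers)
    fitted = project_non_increasing([labels[d] for d in order])
    adjusted = [0.0] * len(labels)
    for rank, d in enumerate(order):
        adjusted[d] = fitted[rank]
    return adjusted
```

For example, with labels `[1.0, 3.0, 0.0]` and preferences stating that document 0 beats document 1, which beats document 2, the first two labels conflict with the order and are pooled to their mean. Note that this sketch only enforces weak agreement: preferred documents receive labels that are greater than or equal to those of less preferred ones, so ties can occur.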