Relevance evaluation of a query and a passage is essential in Information Retrieval (IR). Recent studies have applied Large Language Models (LLMs) such as GPT-4 to relevance judgment tasks and reported significant improvements. However, the efficacy of LLMs is considerably influenced by the design of the prompt. The purpose of this paper is to identify which specific terms in prompts positively or negatively impact relevance evaluation with LLMs. We employed two types of prompts: those used in previous research and those generated automatically by LLMs. By comparing the performance of these prompts in both few-shot and zero-shot settings, we analyze the influence of specific terms in the prompts. Our study yields two main findings. First, prompts using the term "answer" lead to more effective relevance evaluations than those using "relevant". This indicates that a more direct approach, focusing on answering the query, tends to enhance performance. Second, the scope of relevance must be balanced appropriately. The term "relevant" can extend the scope too broadly, resulting in less precise evaluations, so an optimal balance in defining relevance is crucial for accurate assessments. The inclusion of few-shot examples helps define this balance more precisely. By providing clearer contexts for the term "relevance", few-shot examples contribute to refining relevance criteria. In conclusion, our study highlights the significance of carefully selecting terms in prompts for relevance evaluation with LLMs.
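To make the comparison concrete, the following is a minimal sketch of how an "answer"-style and a "relevant"-style prompt might be assembled in zero-shot and few-shot form. The instruction wordings, the `build_prompt` helper, and the `Query/Passage/Judgment` layout are illustrative assumptions, not the prompts evaluated in the paper.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical instruction wordings, one per term under comparison.
# These are assumptions for illustration, not the paper's actual prompts.
INSTRUCTIONS = {
    "answer": "Does the passage answer the query? Reply Yes or No.",
    "relevant": "Is the passage relevant to the query? Reply Yes or No.",
}

Example = Tuple[str, str, str]  # (query, passage, label)


def build_prompt(query: str, passage: str, term: str,
                 examples: Optional[Sequence[Example]] = None) -> str:
    """Build a zero-shot prompt, or a few-shot prompt when examples are given."""
    if term not in INSTRUCTIONS:
        raise ValueError(f"unknown term: {term!r}")
    parts = [INSTRUCTIONS[term]]
    # Few-shot setting: labeled examples come before the item to be judged.
    for ex_query, ex_passage, ex_label in examples or ():
        parts.append(
            f"Query: {ex_query}\nPassage: {ex_passage}\nJudgment: {ex_label}"
        )
    # The item to judge ends with an open "Judgment:" slot for the model.
    parts.append(f"Query: {query}\nPassage: {passage}\nJudgment:")
    return "\n\n".join(parts)


# Usage: the same query-passage pair under both terms and both settings.
zero_shot = build_prompt("what causes tides",
                         "Tides are caused by the Moon's gravity.", "answer")
few_shot = build_prompt("what causes tides",
                        "Tides are caused by the Moon's gravity.", "relevant",
                        examples=[("capital of France",
                                   "Paris is the capital of France.", "Yes")])
```

Holding the query, passage, and layout fixed while varying only the term and the presence of examples isolates the effect of the wording, which is the kind of comparison the abstract describes.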