While question-like queries are gaining popularity and search engine users increasingly adopt them, keyphrase search has traditionally been the cornerstone of web search. This query type is also prevalent in specialised search tasks such as academic or professional search, where experts rely on keyphrases to articulate their information needs. However, current dense retrieval models often fail on keyphrase-like queries, primarily because they are mostly trained on question-like ones. This paper introduces a novel model that employs the ColBERT architecture to enhance document ranking for keyphrase queries. To this end, given the lack of large keyphrase-based retrieval datasets, we first explore how Large Language Models can convert question-like queries into keyphrase format. Then, using those keyphrases, we train a keyphrase-based ColBERT ranker (ColBERTKP_QD) to improve performance on keyphrase queries. Furthermore, to reduce the cost of training the full ColBERT model, we investigate the feasibility of training only a keyphrase query encoder while keeping the document encoder weights frozen (ColBERTKP_Q). We assess the ranking performance of our proposals using both automatically generated and manually annotated keyphrases. Our results reveal the potential of the late interaction architecture in the keyphrase search scenario.
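To make "late interaction" concrete, the ColBERT-style MaxSim scoring that rankers such as ColBERTKP_QD and ColBERTKP_Q build on can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the query and document token embeddings have already been produced by some encoder; here they are replaced with toy hand-written vectors. The function names `l2_normalize` and `maxsim_score` are hypothetical. Each query token is matched to its most similar document token by cosine similarity, and these per-token maxima are summed.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def maxsim_score(query_embs: List[Vector], doc_embs: List[Vector]) -> float:
    """Late interaction: for each query token embedding, take the maximum
    cosine similarity over all document token embeddings, then sum them."""
    if not doc_embs:
        return 0.0  # no document tokens to match against
    q = [l2_normalize(v) for v in query_embs]
    d = [l2_normalize(v) for v in doc_embs]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


# Toy usage: rank two hypothetical documents for a two-token query.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {"doc_a": [[1.0, 0.0], [0.0, 1.0]], "doc_b": [[1.0, 0.0], [1.0, 0.0]]}
ranking = sorted(docs, key=lambda name: maxsim_score(query, docs[name]), reverse=True)
print(ranking)  # ['doc_a', 'doc_b']
```

The document side enters the score only through its token embeddings. So in a setup where only the query encoder is trained and the document encoder weights stay frozen, those document embeddings do not change during training.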