Query reformulation is a key mechanism for alleviating the linguistic chasm of queries in ad-hoc retrieval. Among the various solutions, query reduction removes extraneous terms from long queries and distills them into a concise expression of user intent. However, capturing hidden and diverse user intent remains challenging. This paper proposes Contextualized Query Reduction (ConQueR), which uses a pre-trained language model (PLM). Specifically, it reduces verbose queries from two different views: core term extraction and sub-query selection. The former extracts core terms from the original query at the term level, and the latter determines whether a sub-query is a suitable reduction of the original query at the sequence level. Because the two views operate at different levels of granularity and complement each other, their outputs are aggregated in an ensemble manner. We evaluate the reduction quality of ConQueR on real-world search logs collected from a commercial web search engine. It achieves gains of up to 8.45% in exact match scores over the best competing model.
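
To make the two views and their aggregation concrete, the following is a minimal sketch and not the ConQueR implementation. It assumes the term-level view has already produced a keep probability per query term and that the sequence-level view is available as a scoring function over candidate sub-queries; in the paper both would come from a PLM, whereas here they are hard-coded toy values. The names `reduce_query`, `term_level_score`, `seq_score`, and the weight `alpha` are hypothetical, and the weighted average is only one plausible way to combine the two scores. The exhaustive enumeration of sub-queries is exponential in query length and is used purely for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def candidate_subqueries(n_terms: int) -> List[Tuple[int, ...]]:
    """Enumerate index tuples for every non-empty proper sub-query (term order preserved)."""
    return [
        idx
        for k in range(1, n_terms)
        for idx in combinations(range(n_terms), k)
    ]


def term_level_score(keep_prob: Sequence[float], kept: Tuple[int, ...]) -> float:
    """Term-level view (assumed form): mean agreement between the per-term
    keep probabilities and the keep/drop decisions implied by the sub-query."""
    kept_set = set(kept)
    agreement = [p if i in kept_set else 1.0 - p for i, p in enumerate(keep_prob)]
    return sum(agreement) / len(agreement)


def reduce_query(
    terms: Sequence[str],
    keep_prob: Sequence[float],
    seq_score: Callable[[str], float],
    alpha: float = 0.5,
) -> str:
    """Return the sub-query with the highest ensemble score.

    alpha weights the term-level view; (1 - alpha) weights the sequence-level view.
    Both the weighting scheme and alpha are assumptions of this sketch.
    """
    if len(terms) != len(keep_prob):
        raise ValueError("terms and keep_prob must have the same length")
    best_text, best_score = " ".join(terms), float("-inf")
    for kept in candidate_subqueries(len(terms)):
        text = " ".join(terms[i] for i in kept)
        score = alpha * term_level_score(keep_prob, kept) + (1.0 - alpha) * seq_score(text)
        if score > best_score:
            best_text, best_score = text, score
    return best_text


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predicted reductions that are identical to the reference reduction."""
    if not references:
        return 0.0
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


# Toy inputs standing in for PLM outputs (hypothetical values).
terms = "how to fix flat bike tire please".split()
keep_prob = [0.1, 0.1, 0.9, 0.8, 0.9, 0.9, 0.05]
toy_seq_scores = {"fix flat bike tire": 0.9, "flat bike tire": 0.6}

reduced = reduce_query(terms, keep_prob, lambda q: toy_seq_scores.get(q, 0.0))
print(reduced)  # fix flat bike tire
print(exact_match([reduced], ["fix flat bike tire"]))  # 1.0
```

In this sketch the term-level view alone would already favor keeping the four high-probability terms, and the sequence-level score reinforces that choice; the ensemble matters most when the two views disagree, which is the situation the abstract describes as complementary.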