The enormous growth of research publications has made it challenging for academic search engines to return the most relevant papers for a given search query. Numerous solutions have been proposed over the years to improve the effectiveness of academic search, including query expansion and citation analysis. Query expansion techniques mitigate the mismatch between the language used in a query and the language of indexed documents. However, these techniques can introduce non-relevant information when expanding the original query. Recently, applying the contextualized language model BERT to document retrieval has proven quite successful, including for query expansion. Motivated by these issues and inspired by the success of BERT, this paper proposes a novel approach called QeBERT. QeBERT exploits BERT-based embeddings and Citation Network Analysis (CNA) in query expansion to improve scholarly search. Specifically, we use context-aware BERT embeddings and CNA to expand queries in a Pseudo-Relevance Feedback (PRF) fashion. Initial experimental results on the ACL dataset show that BERT embeddings can provide a valuable augmentation to query expansion and improve search relevance when combined with CNA.
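To make the PRF idea concrete, the following is a minimal sketch under stated assumptions, not QeBERT's actual implementation. The top-k results of an initial search are assumed relevant, and candidate expansion terms from those papers are scored by interpolating (with a hypothetical weight `alpha`) the cosine similarity between a term vector and the query vector with a citation score of the paper the term came from. The toy 2-D vectors stand in for BERT embeddings (a real system would obtain contextual embeddings from a BERT model rather than averaging static term vectors), and normalized citation in-degree is a hypothetical stand-in for CNA. All function names and data are illustrative.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def citation_scores(citations):
    """Normalized in-degree per paper (hypothetical stand-in for CNA)."""
    indeg = Counter(dst for dsts in citations.values() for dst in dsts)
    top = max(indeg.values(), default=0)
    papers = set(citations) | set(indeg)
    return {p: (indeg[p] / top if top else 0.0) for p in papers}

def mean_vector(vectors):
    """Element-wise mean; a crude stand-in for a BERT query embedding."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def expand_query(query, ranked, doc_terms, term_vecs, citations,
                 k=2, n_terms=2, alpha=0.7):
    """PRF expansion: treat the top-k results as relevant, then keep the
    n_terms candidates with the best embedding/citation score."""
    q_vec = mean_vector([term_vecs[t] for t in query if t in term_vecs])
    cite = citation_scores(citations)
    scores = {}
    for doc in ranked[:k]:  # pseudo-relevant feedback set
        for term in doc_terms[doc]:
            if term in query or term not in term_vecs:
                continue
            s = (alpha * cosine(term_vecs[term], q_vec)
                 + (1 - alpha) * cite.get(doc, 0.0))
            scores[term] = max(scores.get(term, s), s)  # best score over docs
    best = sorted(scores, key=lambda t: (-scores[t], t))[:n_terms]
    return list(query) + best

# Toy data (hypothetical): 2-D vectors instead of real BERT embeddings.
TERM_VECS = {
    "neural": [1.0, 0.0], "ranking": [1.0, 0.0], "bert": [1.0, 0.0],
    "retrieval": [3.0, 4.0], "syntax": [0.0, 1.0], "parsing": [0.0, 1.0],
}
DOC_TERMS = {"P1": ["neural", "bert", "retrieval"],
             "P2": ["ranking", "syntax"], "P3": ["parsing"]}
CITATIONS = {"P1": [], "P2": ["P1"], "P3": ["P1", "P2"]}  # citing -> cited

if __name__ == "__main__":
    print(expand_query(["neural", "ranking"], ["P1", "P2", "P3"],
                       DOC_TERMS, TERM_VECS, CITATIONS))
    # -> ['neural', 'ranking', 'bert', 'retrieval']
```

In this toy run, "bert" and "retrieval" win because they are close to the query in embedding space and come from the most-cited feedback paper, while "syntax" is dissimilar to the query and is filtered out. The linear interpolation is only one plausible way to combine the two signals.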