We argue that content-based filtering (CBF) and graph-based methods (GB) complement one another in academic search recommendations. The scientific literature can be viewed as a conversation between authors and their audience: CBF uses abstracts to infer authors' positions, and GB uses citations to infer the audience's responses. In this paper, we describe nine differences between CBF and GB, as well as synergistic opportunities for hybrid combinations. We illustrate these opportunities with two embeddings: (1) Specter, a CBF method based on BERT-like deep network encodings of abstracts, and (2) ProNE, a GB method based on spectral clustering of more than 200M papers and 2B citations from Semantic Scholar.
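As a rough illustration of what a hybrid combination could look like, the sketch below blends a content-based similarity and a graph-based similarity into one score. It is not the method described in the paper: the linear blend, the weight `alpha`, and the function names `cosine`, `hybrid_similarity`, and `rank_candidates` are all assumptions made for this example, and the vectors stand in for precomputed Specter-style and ProNE-style embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hybrid_similarity(
    content_a: Vector,
    content_b: Vector,
    graph_a: Vector,
    graph_b: Vector,
    alpha: float = 0.5,
) -> float:
    """Hypothetical blend: alpha weights the content view, 1 - alpha the graph view."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    content_sim = cosine(content_a, content_b)  # abstract-based (CBF) signal
    graph_sim = cosine(graph_a, graph_b)        # citation-based (GB) signal
    return alpha * content_sim + (1.0 - alpha) * graph_sim


def rank_candidates(
    query: Tuple[Vector, Vector],
    candidates: Dict[str, Tuple[Vector, Vector]],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank candidate papers by blended similarity to the query paper.

    Each paper is a (content_vector, graph_vector) pair.
    """
    scored = [
        (paper_id, hybrid_similarity(query[0], content, query[1], graph, alpha))
        for paper_id, (content, graph) in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Setting `alpha` to 1.0 reduces the score to the content view alone and 0.0 to the graph view alone, so the same function can be used to compare the two signals on a given query before blending them.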