Aspect-Aware Content-Based Recommendations for Mathematical Research Papers

From arXiv. Accepted for publication at the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), July 20–24, 2026, Melbourne, VIC, Australia.

Content-based research paper recommendation (CbRPR) has seen advances in computer science and biomedicine, but remains unexplored for mathematics, where paper relatedness is more a matter of concepts than of explicit textual or citation-based similarity. Mathematics papers may be connected through shared proof techniques, logical implications, or natural generalizations, yet exhibit minimal textual or citation overlap, rendering existing CbRPR methods ineffective. To address this gap, we first conduct an expert-driven study characterizing mathematical recommendations, revealing that relevance is inherently aspect-driven. Grounded in this insight, we introduce GoldRiM (small, expert-annotated) and SilverRiM (large, automatically derived), the first datasets for aspect-aware CbRPR in mathematics. Recognizing that LLM embeddings of mathematical content alone yield suboptimal representations, we propose AchGNN, an aspect-conditioned heterogeneous GNN that jointly models textual semantics, citation structure, and author lineage. Across GoldRiM and SilverRiM, AchGNN consistently outperforms prior aspect-based CbRPR methods, achieving substantial gains across all evaluated aspects. We conduct ablation studies to analyze the contributions of individual aspect supervision, authorship lineage, and graph-structural signals to AchGNN's performance. To assess domain generality, we further evaluate AchGNN on the Papers with Code dataset of machine learning publications, demonstrating that our aspect-aware approach transfers effectively beyond mathematics. We deploy our system on the MaRDI platform to provide mathematicians with recommendations, and we release the datasets and code publicly for reproducibility.
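
The idea of aspect-conditioned recommendation over a heterogeneous graph can be illustrated with a minimal toy sketch. This is not the AchGNN architecture, whose details the abstract does not give; it only shows how weighting edge types differently per aspect can change which papers rank as related. Everything in it is an assumption made for illustration: the paper identifiers, the 4-dimensional embeddings, the edge lists, the placeholder aspect names `aspect_a` and `aspect_b`, and the fixed weights (a trained model would learn them).

```python
import math

# Made-up, mutually orthogonal text embeddings for four hypothetical papers,
# mimicking related papers that share almost no textual overlap.
TEXT = {
    "p1": [1.0, 0.0, 0.0, 0.0],
    "p2": [0.0, 1.0, 0.0, 0.0],
    "p3": [0.0, 0.0, 1.0, 0.0],
    "p4": [0.0, 0.0, 0.0, 1.0],
}

# Typed, undirected edges of a tiny heterogeneous graph (hypothetical data).
EDGES = {
    "cites": [("p1", "p2")],
    "author_lineage": [("p1", "p3"), ("p3", "p4")],
}

# Placeholder aspects with hand-picked weights per edge type (assumption:
# a real model would learn aspect-specific parameters instead).
ASPECT_WEIGHTS = {
    "aspect_a": {"cites": 0.2, "author_lineage": 0.8},
    "aspect_b": {"cites": 0.8, "author_lineage": 0.2},
}


def neighbors(paper, edge_type):
    """Return the papers linked to `paper` by edges of the given type."""
    linked = []
    for a, b in EDGES[edge_type]:
        if a == paper:
            linked.append(b)
        elif b == paper:
            linked.append(a)
    return linked


def mean(vectors, dim):
    """Element-wise mean of a list of vectors; zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed(paper, aspect):
    """One round of aspect-weighted mean aggregation over typed neighbours."""
    dim = len(TEXT[paper])
    out = list(TEXT[paper])
    for edge_type, weight in ASPECT_WEIGHTS[aspect].items():
        agg = mean([TEXT[n] for n in neighbors(paper, edge_type)], dim)
        out = [o + weight * a for o, a in zip(out, agg)]
    return out


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def recommend(query, aspect, k=2):
    """Rank all other papers by similarity to `query` under the given aspect."""
    q = embed(query, aspect)
    scored = [(cosine(q, embed(p, aspect)), p) for p in TEXT if p != query]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [p for _, p in scored[:k]]
```

In this toy setup, calling `recommend("p1", "aspect_a")` ranks the lineage-connected paper `p3` first, whereas `recommend("p1", "aspect_b")` ranks the cited paper `p2` first, even though the raw text embeddings of all four papers are orthogonal.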
