Searching for mathematical results remains difficult: most existing tools retrieve entire papers, while mathematicians and theorem-proving agents often seek a specific theorem, lemma, or proposition that answers a query. While semantic search has seen rapid progress, its behavior on large, highly technical corpora such as research-level mathematical theorems remains poorly understood. In this work, we introduce and study semantic theorem retrieval at scale over a unified corpus of $9.2$ million theorem statements extracted from arXiv and seven other sources, representing the largest publicly available corpus of human-authored, research-level theorems. We represent each theorem with a short natural-language description as a retrieval representation and systematically analyze how representation context, language model choice, embedding model, and prompting strategy affect retrieval quality. On a curated evaluation set of theorem-search queries written by professional mathematicians, our approach substantially improves both theorem-level and paper-level retrieval compared to existing baselines, demonstrating that semantic theorem search is feasible and effective at web scale. The theorem search tool is available at \href{https://huggingface.co/spaces/uw-math-ai/theorem-search}{this link}, and the dataset is available at \href{https://huggingface.co/datasets/uw-math-ai/TheoremSearch}{this link}.
翻译:数学成果的检索依然面临困难:现有工具大多返回整篇论文,而数学家和定理证明智能体通常需要寻找能够回答查询的特定定理、引理或命题。尽管语义搜索技术发展迅速,但其在大型、高度专业化的语料库(如研究级数学定理)上的表现仍鲜为人知。本研究首次在大规模统一语料库上引入并系统研究了定理语义检索,该语料库包含从arXiv及其他七个来源提取的920万条定理陈述,是目前最大的公开人类撰写研究级定理语料库。我们为每个定理构建简短的自然语言描述作为检索表示,并系统分析了表示上下文、语言模型选择、嵌入模型及提示策略对检索质量的影响。在由专业数学家编写的定理搜索查询评估集上,我们的方法在定理级和论文级检索性能上均显著优于现有基线,证明语义定理搜索在网络规模下具有可行性和有效性。定理搜索工具可通过\href{https://huggingface.co/spaces/uw-math-ai/theorem-search}{此链接}访问,数据集可通过\href{https://huggingface.co/datasets/uw-math-ai/TheoremSearch}{此链接}获取。