The expanding Lean 4 ecosystem poses challenges for navigating its vast libraries. This paper introduces LeanExplore, a search engine for Lean 4 declarations. LeanExplore enables users to semantically search for statements, both formally and informally, across select Lean 4 packages (including Batteries, Init, Lean, Mathlib, PhysLean, and Std). This search capability is powered by a hybrid ranking strategy, integrating scores from a multi-source semantic embedding model (capturing conceptual meaning from formal Lean code, docstrings, AI-generated informal translations, and declaration titles), BM25+ for keyword-based lexical relevance, and a PageRank-based score reflecting declaration importance and interconnectedness. The search engine is accessible via a dedicated website (https://www.leanexplore.com/) and a Python API (https://github.com/justincasher/lean-explore). Furthermore, the database can be downloaded, allowing users to self-host the service. LeanExplore integrates easily with LLMs via the model context protocol (MCP), enabling users to chat with an AI assistant about Lean declarations or utilize the search engine for building theorem-proving agents. This work details LeanExplore's architecture, data processing, functionalities, and its potential to enhance Lean 4 workflows and AI-driven mathematical research
翻译:日益扩展的Lean 4生态系统对浏览其庞大库文件提出了挑战。本文介绍了LeanExplore——一个面向Lean 4声明的搜索引擎。LeanExplore使用户能够跨选定Lean 4包(包括Batteries、Init、Lean、Mathlib、PhysLean和Std)以语义方式搜索形式化及非形式化的陈述。该搜索功能由混合排序策略驱动,整合了多源语义嵌入模型(从形式化Lean代码、文档字符串、AI生成的非形式化翻译及声明标题中捕获概念含义)的评分、基于关键词词汇相关性的BM25+算法,以及反映声明重要性与互联性的PageRank评分。该搜索引擎可通过专用网站(https://www.leanexplore.com/)和Python API(https://github.com/justincasher/lean-explore)访问。此外,数据库支持下载,允许用户自行托管服务。LeanExplore通过模型上下文协议(MCP)可轻松与大型语言模型集成,使用户能够就Lean声明与AI助手对话,或利用该搜索引擎构建定理证明智能体。本文详细阐述了LeanExplore的架构、数据处理流程、功能特性及其在优化Lean 4工作流程和AI驱动数学研究方面的潜力。