citecheck: An MCP Server for Automated Bibliographic Verification and Repair in Scholarly Manuscripts

From arXiv: 6 pages, 1 figure. Software paper on bibliography verification and repair for scholarly manuscripts; includes an MCP server implementation and an evaluation on repository-backed tests.

Reference lists in scholarly manuscripts frequently contain errors, including incorrect identifiers, incomplete metadata, misattributed authors, and mismatches between preprint and published versions. These problems are tedious to repair manually and have become more visible in workflows that rely on large language models, which can fabricate or corrupt citations. We present citecheck, a TypeScript system and MCP server for automated bibliographic verification and repair in paper-like project folders. Given a manuscript file or workspace, citecheck selects the most likely paper artifact, extracts references from .bib, .tex, .md, .txt, or .docx, validates entries against PubMed, Crossref, arXiv, and Semantic Scholar, and returns structured correction proposals together with replacement-safety diagnostics. The current repository provides a working research prototype with multi-pass retrieval, manifestation-aware matching, policy-gated rewrite planning, and 47 passing tests covering repair behavior, malformed payload handling, transport failures, and MCP exposure. We position citecheck as infrastructure for agentic scholarly editing and as a practical guardrail against both traditional reference errors and LLM-induced citation hallucinations.
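
The combination of manifestation-aware matching and policy-gated rewrite planning can be illustrated with a minimal TypeScript sketch. It is not taken from the citecheck repository: the names `BibEntry`, `RewritePolicy`, `RepairProposal`, `manifestationOf`, and `proposeRepair` are hypothetical, the matching rule (normalized title equality) is a deliberate simplification, and the sample identifiers are placeholders. The only external fact assumed is that arXiv-issued DOIs use the `10.48550/arXiv.` prefix.

```typescript
// Hypothetical types for illustration; not citecheck's actual data model.
type Manifestation = "preprint" | "published" | "unknown";

interface BibEntry {
  title: string;
  year?: number;
  doi?: string;
  arxivId?: string;
}

interface RewritePolicy {
  // Assumed policy knob: may a preprint citation be swapped for the published version?
  allowPreprintUpgrade: boolean;
}

interface RepairProposal {
  action: "keep" | "replace" | "review";
  reason: string;
  candidate?: BibEntry;
}

// Lowercase and collapse punctuation so cosmetic differences do not block a match.
function normalizeTitle(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Classify an entry by its identifiers; arXiv-issued DOIs count as preprints.
function manifestationOf(entry: BibEntry): Manifestation {
  const doi = entry.doi?.toLowerCase();
  if (doi && !doi.startsWith("10.48550/arxiv.")) return "published";
  if (entry.arxivId || doi) return "preprint";
  return "unknown";
}

// Compare a cited entry with a retrieved record and decide whether a rewrite is safe.
function proposeRepair(cited: BibEntry, candidate: BibEntry, policy: RewritePolicy): RepairProposal {
  if (normalizeTitle(cited.title) !== normalizeTitle(candidate.title)) {
    return { action: "review", reason: "title mismatch: candidate may be a different work" };
  }
  const from = manifestationOf(cited);
  const to = manifestationOf(candidate);
  if (from === "preprint" && to === "published") {
    return policy.allowPreprintUpgrade
      ? { action: "replace", reason: "preprint has a published version", candidate }
      : { action: "review", reason: "published version found, but policy blocks automatic upgrade", candidate };
  }
  if (from !== to) {
    return { action: "review", reason: "manifestation mismatch", candidate };
  }
  const sameIds =
    (cited.doi ?? "").toLowerCase() === (candidate.doi ?? "").toLowerCase() &&
    cited.arxivId === candidate.arxivId;
  if (sameIds && cited.year === candidate.year) {
    return { action: "keep", reason: "entry agrees with retrieved record" };
  }
  return { action: "replace", reason: "metadata differs from retrieved record", candidate };
}

// Placeholder data: the title and identifiers below are made up for the example.
const cited: BibEntry = { title: "An Example Paper", year: 2024, arxivId: "0000.00000" };
const found: BibEntry = { title: "An example paper.", year: 2024, doi: "10.1234/example" };
console.log(proposeRepair(cited, found, { allowPreprintUpgrade: false }).action);
console.log(proposeRepair(cited, found, { allowPreprintUpgrade: true }).action);
```

In this sketch, a preprint citation whose published counterpart is found is only rewritten when the policy allows it; otherwise the proposal is routed to human review, which is one plausible reading of a "replacement-safety" diagnostic.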
