Citations are the bedrock of scientific authority, yet their integrity is compromised by widespread miscitations, ranging from subtle distortions to fabricated references. Systematic citation verification is currently infeasible: manual review cannot scale to modern publishing volumes, while existing automated tools are limited to abstract-only analysis or small-scale, domain-specific datasets, in part because of the "paywall barrier" to full-text access. We introduce BibAgent, a scalable, end-to-end agentic framework for automated citation verification. BibAgent integrates retrieval, reasoning, and adaptive evidence aggregation, applying distinct strategies to accessible and paywalled sources. For paywalled references, it uses a novel Evidence Committee mechanism that infers citation validity from the consensus of downstream citations. To support systematic evaluation, we contribute a five-category Miscitation Taxonomy and MisciteBench, a large-scale cross-disciplinary benchmark of 6,350 miscitation samples spanning 254 fields. Our results show that BibAgent outperforms state-of-the-art large language model (LLM) baselines in citation verification accuracy and interpretability, enabling scalable, transparent detection of citation misalignments across the scientific literature.
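The split between accessible and paywalled sources, and the idea of letting downstream citations "vote" on a paywalled reference, can be pictured as a simple routing step plus a vote aggregation. The following is a minimal sketch under our own assumptions, not BibAgent's actual implementation: the names `Reference`, `committee_verdict`, and `verify`, the vote labels, the quorum of 3, and the 0.5 majority threshold are all hypothetical, and the per-paper judgments (which in practice would come from an LLM reasoning step) are passed in as precomputed strings.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Reference:
    title: str
    # Hypothetical modeling choice: None stands for a paywalled source.
    full_text: Optional[str] = None


def committee_verdict(votes: list[str], quorum: int = 3, threshold: float = 0.5) -> str:
    """Aggregate hypothetical per-paper judgments ("consistent" / "inconsistent")
    from downstream papers that cite the same reference."""
    if len(votes) < quorum:
        # Too few downstream citations to form a committee.
        return "insufficient_evidence"
    counts = Counter(votes)
    if counts["consistent"] / len(votes) > threshold:
        return "valid"
    if counts["inconsistent"] / len(votes) > threshold:
        return "miscitation"
    # No strict majority either way.
    return "uncertain"


def verify(
    ref: Reference,
    claim: str,
    full_text_judge: Callable[[str, str], str],
    downstream_votes: list[str],
) -> str:
    """Route accessible sources to full-text checking, paywalled ones to the committee."""
    if ref.full_text is not None:
        return full_text_judge(claim, ref.full_text)
    return committee_verdict(downstream_votes)
```

For example, a paywalled reference with three downstream votes of `"consistent"` and one of `"inconsistent"` is labeled `"valid"`, while a 2-2 split falls back to `"uncertain"`. The quorum guards against drawing conclusions from a single downstream paper; how BibAgent actually weighs or filters committee members is not specified in this abstract.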