Citations are the bedrock of scientific authority, yet their integrity is compromised by widespread miscitations, ranging from nuanced distortions to fabricated references. Systematic citation verification is currently infeasible: manual review cannot scale to modern publishing volumes, and existing automated tools are restricted to abstract-only analysis or to small-scale, domain-specific datasets, in part because of the "paywall barrier" to full-text access. We introduce BibAgent, a scalable, end-to-end agentic framework for automated citation verification. BibAgent integrates retrieval, reasoning, and adaptive evidence aggregation, applying distinct strategies to accessible and paywalled sources. For paywalled references, it uses a novel Evidence Committee mechanism that infers citation validity from downstream citation consensus. To support systematic evaluation, we contribute a five-category Miscitation Taxonomy and MisciteBench, a large-scale cross-disciplinary benchmark comprising 6,350 miscitation samples spanning 254 fields. Our results show that BibAgent outperforms state-of-the-art large language model (LLM) baselines in citation verification accuracy and interpretability, providing scalable, transparent detection of citation misalignments across the scientific literature.