Scientific papers back their claims about prior work with citations. Verifying those citations at scale (that each cited paper exists, says what the citing paper claims, and is itself reliable) is structurally beyond what human review can deliver: a typical paper has dozens of citations, and a careful reviewer reads at most a handful of them end to end. AI-assisted writing widens this gap: LLMs hallucinate references and may fill in plausible details from the titles or abstracts of papers they never read, a problem that is worse for the smaller local-weights models that privacy-aware researchers must use. sciwrite-lint applies the linting paradigm from software engineering to citation verification. It runs entirely on the researcher's machine (free public databases, a single consumer GPU, and open-weights models), is fast enough to re-lint between revisions so that authors catch problems at the source while drafting, and serves journals and reviewers as an automated first pass. The pipeline checks reference existence, metadata accuracy, retraction status, and claim support, traverses one level into cited papers' bibliographies, and produces per-reference reliability scores. We evaluate on 30 unseen papers (arXiv and bioRxiv) with error injection and LLM-adjudicated false-positive analysis. The same linting workflow extends to internal consistency: numbers in text vs. tables, abstract vs. body, figure captions vs. content, and statistical results vs. their verbal interpretation, plus structural cross-references (dangling cites, orphan references). As a separate experimental contribution, we also propose SciLint Score, which combines citation-chain integrity with a contribution component operationalizing five philosophy-of-science frameworks (Popper, Lakatos, Kitcher, Laudan, Mayo).
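The structural cross-reference check mentioned above (dangling cites, orphan references) can be illustrated with a minimal sketch. This is not the sciwrite-lint implementation: the function name `lint_cross_references`, the regular expressions, and the assumption that the manuscript is LaTeX with an inline `thebibliography` environment are all hypothetical choices made for illustration.

```python
import re

# Assumption: LaTeX source using \cite-style commands (\cite, \citep, \citet, ...)
# with optional [..] arguments, and \bibitem entries for the reference list.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")
BIBITEM_RE = re.compile(r"\\bibitem(?:\[[^\]]*\])?\{([^}]*)\}")


def lint_cross_references(tex: str) -> dict[str, list[str]]:
    """Return citation keys that are cited but undefined, and defined but uncited."""
    cited: set[str] = set()
    for match in CITE_RE.finditer(tex):
        # A single \cite{a,b} may carry several comma-separated keys.
        cited.update(k.strip() for k in match.group(1).split(",") if k.strip())

    defined = {m.group(1).strip() for m in BIBITEM_RE.finditer(tex)}

    return {
        "dangling_cites": sorted(cited - defined),      # cited, no bibliography entry
        "orphan_references": sorted(defined - cited),   # listed, never cited
    }


if __name__ == "__main__":
    # Hypothetical manuscript fragment; the keys are made up for the example.
    sample = r"""
    Prior work \cite{smith2020,lee2021} and \citep[p.~3]{ghost2022} ...
    \begin{thebibliography}{9}
    \bibitem{smith2020} ...
    \bibitem{lee2021} ...
    \bibitem{unused2019} ...
    \end{thebibliography}
    """
    print(lint_cross_references(sample))
```

A check like this needs no network access or model inference, which is why it fits naturally into a fast re-lint loop between revisions; the existence, metadata, retraction, and claim-support checks described in the abstract would require database lookups and an LLM and are not sketched here.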