Science currently offers two options for quality assurance, and both are inadequate. Journal gatekeeping claims to verify both integrity and contribution but in practice measures prestige: peer review is slow and biased, and it misses fabricated citations even at top venues. Open science provides no quality assurance at all: the only filter between AI-generated text and the public record is the author's integrity. AI-assisted writing makes both worse by producing more papers, faster, than either system can absorb. We propose a third option: measure the paper itself. sciwrite-lint (pip install sciwrite-lint) is an open-source linter for scientific manuscripts that runs entirely on the researcher's machine. It uses free public databases, a single consumer GPU, and open-weights models, and it sends no manuscript to an external service. The pipeline verifies that references exist, checks their retraction status, compares their metadata against canonical records, downloads and parses the cited papers, verifies that those papers support the claims made about them, and follows one level further to check the cited papers' own bibliographies. Each reference receives a reliability score that aggregates all of these verification signals. We evaluate the pipeline on 30 unseen papers from arXiv and bioRxiv, using error injection and LLM-adjudicated false-positive analysis. As an experimental extension, we propose SciLint Score, which combines integrity verification with a contribution component. This component operationalizes five frameworks from philosophy of science (Popper, Lakatos, Kitcher, Laudan, Mayo) as computable structural properties of scientific arguments. The integrity component is the core of the tool and is evaluated in this paper; the contribution component is released as experimental code for community development.
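The per-reference reliability score is described only as an aggregate of verification signals, with no formula given. The following is a minimal sketch of one plausible aggregation. The names ReferenceSignals, WEIGHTS, and reliability_score, the signal set, and the weights are all hypothetical and are not sciwrite-lint's API. In this sketch, nonexistence and retraction act as hard vetoes. The remaining signals are combined as a weighted mean, and any signal that could not be computed is skipped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReferenceSignals:
    """Hypothetical verification signals for one reference (illustrative only)."""
    exists: bool                           # reference was found in a public database
    retracted: bool                        # reference is flagged as retracted
    metadata_match: float                  # 0..1 agreement with the canonical record
    claim_support: Optional[float] = None  # 0..1; None if full text was unavailable


# Hypothetical weights for the "soft" signals; not taken from sciwrite-lint.
WEIGHTS = {"metadata_match": 0.4, "claim_support": 0.6}


def reliability_score(s: ReferenceSignals) -> float:
    """Combine signals into a 0..1 score: hard failures veto, soft signals are averaged."""
    # Hard failures: a nonexistent or retracted reference scores zero.
    if not s.exists or s.retracted:
        return 0.0
    soft = {"metadata_match": s.metadata_match, "claim_support": s.claim_support}
    # Drop signals that could not be computed instead of treating them as zero.
    available = {name: value for name, value in soft.items() if value is not None}
    # metadata_match is always present, so total_weight is never zero.
    total_weight = sum(WEIGHTS[name] for name in available)
    return sum(WEIGHTS[name] * value for name, value in available.items()) / total_weight


if __name__ == "__main__":
    print(reliability_score(ReferenceSignals(True, False, 1.0, 0.5)))
    print(reliability_score(ReferenceSignals(True, True, 1.0, 1.0)))
```

Because missing signals are renormalized away rather than scored as zero, a reference is not penalized merely because its full text could not be downloaded. The sketch makes no claim about whether sciwrite-lint treats missing signals this way.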