Machine learning research has grown exponentially, but its communication norms have not kept pace. We argue that NeurIPS should adopt explicit, measurable writing standards. We analyze 2.8 million arXiv papers (1991-2025), 24,772 NeurIPS papers (1987-2024), and 24.5 million PubMed papers (1990-2025), applying classical readability scores, the Hohmann writing style suite (including sensational language), acronym density and reuse, an LLM-as-judge readability protocol, and citations from OpenAlex and Semantic Scholar. Four patterns emerge. First, NeurIPS abstracts score as harder to read on every classical readability metric: Flesch Reading Ease falls from about 24 in 1987 to 13 in 2024, and sensational language in NeurIPS abstracts rises by about 50 percent between 2015 and 2024. Second, acronym density in NeurIPS titles has grown from 0.33 per 100 words in 1987 to 3.21 in 2024, and about 89 percent of NeurIPS acronyms are used fewer than ten times, ten percentage points above the science-wide baseline. Third, more readable NeurIPS papers tend to receive more citations, suggesting that readability and impact are correlated and that less readable papers risk remaining fragmented. LLM-as-judge scores rate NeurIPS abstracts as roughly stable from 1987 to 2022, with early signs of improvement thereafter. This pattern disagrees with every classical readability metric and raises a design question for enforcement: is the target reader a human or an LLM? Lastly, NeurIPS volume grew roughly 50-fold between 1987 and 2024. Assuming the goal is to optimise for human readers, we propose seven standards that NeurIPS could pilot at NeurIPS 2027: an acronym budget with a venue-approved term list, a human readability threshold, stricter citation standards, standalone visual elements, a plain language summary, a pre-registered acronym glossary, and open source audit tooling.
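To make two of the metrics named above concrete, here is a minimal Python sketch of Flesch Reading Ease and of acronym density per 100 words. It is an illustration only, not the tooling used in the analysis: the syllable counter is a rough vowel-group heuristic, the rule that an acronym is any token with at least two uppercase letters is an assumption, and the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and n > 1:
        n -= 1
    return max(n, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch formula: 206.835 - 1.015 * (words/sentence) - 84.6 * (syllables/word)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def acronym_density(text: str) -> float:
    """Acronyms per 100 words; a token counts if it has two or more uppercase letters (assumed rule)."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not words:
        raise ValueError("text must contain at least one word")
    acronyms = [w for w in words if sum(c.isupper() for c in w) >= 2]
    return 100.0 * len(acronyms) / len(words)


if __name__ == "__main__":
    simple = "The cat sat on the mat."
    dense = "Heterogeneous regularization methodologies substantially ameliorate generalization."
    print(flesch_reading_ease(simple))  # higher score: easier to read
    print(flesch_reading_ease(dense))   # lower score: harder to read
    print(acronym_density("We train a CNN and an LSTM on the data"))
```

Lower Flesch scores mean harder text, so a fall from about 24 to 13 indicates abstracts that already sat in the difficult range becoming harder still. A production audit tool would likely need a dictionary-based syllable counter and a stricter acronym rule.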