The Rise of Large Language Models and the Direction and Impact of US Federal Research Funding

Federal research funding shapes the direction, diversity, and impact of the US scientific enterprise. Large language models (LLMs) are rapidly diffusing into scientific practice, holding substantial promise while raising widespread concerns. Despite growing attention to AI use in scientific writing and evaluation, little is known about how the rise of LLMs is reshaping the public funding landscape. Here, we examine LLM involvement at key stages of the federal funding pipeline by combining two complementary data sources: confidential National Science Foundation (NSF) and National Institutes of Health (NIH) proposal submissions from two large US R1 universities, including funded, unfunded, and pending proposals, and the full population of publicly released NSF and NIH awards. We find that LLM use rises sharply beginning in 2023 and exhibits a bimodal distribution, indicating a clear split between minimal and substantive use. Across both private submissions and public awards, higher LLM involvement is consistently associated with lower semantic distinctiveness, positioning projects closer to recently funded work within the same agency. The consequences of this shift are agency-dependent. LLM use is positively associated with proposal success and higher subsequent publication output at NIH, whereas no comparable associations are observed at NSF. Notably, the productivity gains at NIH are concentrated in non-hit papers rather than the most highly cited work. Together, these findings provide large-scale evidence that the rise of LLMs is reshaping how scientific ideas are positioned, selected, and translated into publicly funded research, with implications for portfolio governance, research diversity, and the long-run impact of science.
