Purpose: Despite the importance of peer review for grant funding decisions, academics are often reluctant to conduct it. This can lead to long delays between submission and the final decision, as well as to the risk of substandard reviews from busy or non-specialist scholars. At least one funder now uses Large Language Models (LLMs) to reduce the reviewing burden, but the accuracy of LLMs for scoring grant proposals needs to be assessed. Design/methodology/approach: This article compares scores from a range of medium-sized open-weights LLMs with peer review scores for a well-researched dataset, the Swedish Medical Council's post-doctoral fellowship applications from 1994. Findings: Whilst the LLM scores correlated moderately with each other (mean Spearman correlation: 0.34), they correlated only weakly, although positively and mostly statistically significantly, with the average expert scores (mean Spearman correlation: 0.22). The highest rank correlation between expert scores and LLM scores was 0.33, for Gemma 3 27b based on proposal titles and summaries without their main texts, which is about half (56%) of the correlation between reviewers. Research limitations: The small sample size, old funding call and heterogeneous evaluation criteria all undermine the robustness of the analysis. Practical implications: Although LLMs are quantitatively weaker than experts at scoring grant proposals, at least in this special case, they may have a role in application triage or tie-breaking. Originality/value: This is the first assessment of the value of LLM scores for funding proposals.
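The comparison reported above rests on Spearman rank correlations between LLM scores and average expert scores. As a minimal sketch of how such a coefficient can be computed, the following uses only the Python standard library; the function names and the six example scores are hypothetical illustrations, not the study's code or data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value ties with the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for six proposals (assumed values, not from the dataset).
expert_mean = [2.1, 3.4, 2.8, 3.9, 1.7, 3.0]
llm_score = [3.0, 3.5, 2.5, 3.5, 2.0, 4.0]

rho = spearman(expert_mean, llm_score)
print(f"Spearman correlation: {rho:.2f}")
```

Ranking with averaged ties matters here because LLM scores tend to cluster on a few values, so many proposals can share a score.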