Generative AI, and large language models (LLMs) in particular, are increasingly used to support both the assessment and the generation of scientific work. Although some studies have shown that LLMs can, to a certain extent, evaluate research according to perceived quality, the internal mechanisms that enable this capability remain poorly understood. This paper presents the first study of how LLMs encode the concept of scientific quality, using relevant monosemantic features extracted with sparse autoencoders. We derive such features under different experimental settings and assess their ability to serve as predictors in three tasks related to research quality: predicting citation count, journal SJR (SCImago Journal Rank), and journal h-index. The results indicate that LLMs encode features associated with multiple dimensions of scientific quality. In particular, we identify four recurring types of features that capture key aspects of how research quality is represented: 1) features reflecting research methodologies; 2) features related to publication type, with literature reviews typically exhibiting higher impact; 3) features associated with high-impact research fields and technologies; and 4) features corresponding to specific scientific jargon. These findings are an important step toward understanding how LLMs represent concepts related to research quality.
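The pipeline summarized in the abstract (encode hidden states with a sparse autoencoder, then test which features predict a quality signal) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `sae_encode`, `pearson`, and `rank_features` are hypothetical, the weights and hidden states are random synthetic values rather than a trained autoencoder or real LLM activations, and Pearson correlation is a simple stand-in for whatever predictive evaluation the study actually uses.

```python
import math
import random


def sae_encode(x, W_enc, b_enc, b_dec):
    """Standard sparse-autoencoder encoder: ReLU(W_enc @ (x - b_dec) + b_enc)."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    return [
        max(0.0, sum(w * c for w, c in zip(row, centered)) + b)
        for row, b in zip(W_enc, b_enc)
    ]


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def rank_features(activations, target):
    """Sort feature indices by absolute correlation with the target."""
    n_features = len(activations[0])
    scores = [
        (j, pearson([row[j] for row in activations], target))
        for j in range(n_features)
    ]
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)


# Synthetic demo (assumed sizes; not taken from the paper).
random.seed(0)
d_model, n_features, n_papers = 8, 16, 200

W_enc = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_features)]
b_enc = [0.0] * n_features
b_dec = [0.0] * d_model

# One stand-in "hidden state" per paper.
hidden = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_papers)]
activations = [sae_encode(h, W_enc, b_enc, b_dec) for h in hidden]

# Hypothetical quality signal driven by feature 3 plus a little noise,
# standing in for a target such as log citation count.
target = [row[3] + random.gauss(0, 0.05) for row in activations]

ranked = rank_features(activations, target)
print("most predictive feature:", ranked[0][0])
```

In a real setting, `hidden` would come from an LLM's activations on paper text and `W_enc`, `b_enc`, and `b_dec` from a trained sparse autoencoder; the highest-ranked features would then be inspected manually to see what they respond to, such as methodology terms or publication type.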