Recent advances in large language models (LLMs) show strong potential for generating novel research ideas, yet such ideas often fall short on feasibility and effectiveness. In this paper, we investigate whether augmenting LLMs with relevant data during ideation can improve idea quality. Our framework integrates data at two stages: (1) incorporating metadata during idea generation to guide models toward more feasible concepts, and (2) introducing an automated preliminary validation step during idea selection to assess the empirical plausibility of the hypotheses within each idea. We evaluate our approach in the social science domain, focusing on climate negotiation topics. Expert evaluation shows that metadata improves the feasibility of generated ideas by 20%, while automated validation improves the overall quality of selected ideas by 7%. Beyond assessing the quality of LLM-generated ideas, we conduct a human study to examine whether these ideas, augmented with related data and preliminary validation, can inspire researchers in their own ideation. Participants report that the LLM-generated ideas and validation are highly useful, and the ideas they propose with such support are of higher quality than those proposed without assistance. Our findings highlight the potential of data-augmented research ideation and underscore the practical value of LLM-assisted ideation in real-world academic settings.