The rapid advancement of large language models (LLMs) has demonstrated their potential to accelerate scientific discovery, particularly in automating research ideation. LLM-based systems have shown promise in generating hypotheses and research ideas. However, current approaches rely predominantly on prompting pre-trained models, which limits their ability to optimize the generated content. Moreover, they lack the capability to handle the complex interdependencies and constraints among novelty, feasibility, and effectiveness, a task that remains challenging due to the inherent trade-offs among these dimensions, such as the innovation-feasibility conflict. To address these limitations, we propose, for the first time, fine-tuning LLMs to be better idea proposers, and introduce a novel framework built on a two-stage approach that combines Supervised Fine-Tuning (SFT) and controllable Reinforcement Learning (RL). In the SFT stage, the model learns foundational patterns from pairs of research papers and their follow-up ideas. In the RL stage, multi-dimensional reward modeling, guided by fine-grained feedback, evaluates and optimizes the generated ideas along key metrics. Dimensional controllers enable dynamic adjustment of generation, while a sentence-level decoder ensures context-aware emphasis during inference. Our framework provides a balanced approach to research ideation, achieving high-quality outcomes by dynamically navigating the trade-offs among novelty, feasibility, and effectiveness.
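To make the RL-stage reward aggregation and dimensional control concrete, the following is a minimal sketch (not the authors' implementation): per-dimension scores are combined under controller weights to form a scalar reward that an RL objective could optimize. All names here (`DimensionalControl`, `score_novelty`, etc.) are hypothetical placeholders; the actual reward models are trained from fine-grained feedback as described in the method.

```python
# Minimal sketch of multi-dimensional reward aggregation with
# dimensional controllers. The three score_* functions are hypothetical
# stand-ins for trained reward models; a real system would score ideas
# with learned models rather than constants.

from dataclasses import dataclass


@dataclass
class DimensionalControl:
    """Controller weights: values > 1.0 emphasize a dimension,
    values < 1.0 de-emphasize it."""
    novelty: float = 1.0
    feasibility: float = 1.0
    effectiveness: float = 1.0


def score_novelty(idea: str) -> float:
    return 0.7  # placeholder score in [0, 1]


def score_feasibility(idea: str) -> float:
    return 0.6  # placeholder score in [0, 1]


def score_effectiveness(idea: str) -> float:
    return 0.8  # placeholder score in [0, 1]


def combined_reward(idea: str, ctrl: DimensionalControl) -> float:
    """Weighted sum over dimensions; adjusting the controller weights
    trades novelty against feasibility and effectiveness."""
    return (ctrl.novelty * score_novelty(idea)
            + ctrl.feasibility * score_feasibility(idea)
            + ctrl.effectiveness * score_effectiveness(idea))


# Example: steer generation toward more feasible, less speculative ideas.
ctrl = DimensionalControl(novelty=0.5, feasibility=1.5, effectiveness=1.0)
print(combined_reward("A follow-up idea generated from a source paper", ctrl))
```

Under this reading, the dimensional controllers act as tunable coefficients on the reward signal, so the same fine-tuned policy can be pushed toward different points on the novelty-feasibility-effectiveness trade-off surface at inference time.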