Every scientific discovery starts with an idea inspired by prior work, interdisciplinary concepts, and emerging challenges. Recent advances in large language models (LLMs) trained on scientific corpora have driven interest in AI-supported idea generation. However, generating context-aware, high-quality, and innovative ideas remains challenging. We introduce SCI-IDEA, a framework that uses LLM prompting strategies and Aha Moment detection for iterative idea refinement. SCI-IDEA extracts essential facets from research publications and assesses generated ideas on novelty, excitement, feasibility, and effectiveness. In comprehensive experiments, SCI-IDEA achieves average scores of 6.84, 6.86, 6.89, and 6.84 (on a 1-10 scale) for novelty, excitement, feasibility, and effectiveness, respectively. These evaluations employed GPT-4o, GPT-4.5, DeepSeek-32B (each under 2-shot prompting), and DeepSeek-70B (3-shot prompting), with token-level embeddings used for Aha Moment detection. Similarly, with sentence-level embeddings, it achieves scores of 6.87, 6.86, 6.83, and 6.87 using GPT-4o under 5-shot prompting, GPT-4.5 under 3-shot prompting, DeepSeek-32B under zero-shot chain-of-thought prompting, and DeepSeek-70B under 5-shot prompting. We also address ethical considerations such as intellectual credit, potential misuse, and the balance between human creativity and AI-driven ideation. Our results highlight SCI-IDEA's potential to facilitate structured and flexible exploration of context-aware scientific ideas, supporting innovation while maintaining ethical standards.