The exponential growth of knowledge and the increasing complexity of interdisciplinary research pose significant challenges for researchers, including information overload and difficulty exploring novel ideas. Large language models (LLMs) such as GPT-4 have shown great potential for assisting idea proposal, but how to use them effectively to propose reasonable ideas has not been thoroughly explored. This paper proposes a scientific paper idea proposer (SciPIP). Given a user-provided research background, SciPIP retrieves helpful papers from a literature database and leverages the capabilities of LLMs to generate more novel and feasible ideas. To this end, 1) we construct a literature retrieval database that extracts multi-dimensional information from a large number of papers for fast access. We then propose a literature retrieval method based on semantics, entities, and citation co-occurrence, which searches for relevant literature from multiple aspects based on the user-provided background. 2) After literature retrieval, we introduce dual-path idea proposal strategies: one path infers solutions from the retrieved literature, and the other generates original ideas through model brainstorming. We then combine the two to achieve a good balance between feasibility and originality. Through extensive experiments in the natural language processing (NLP) field, we demonstrate that SciPIP can retrieve citations similar to those of existing top conference papers and generate many ideas consistent with them. Additionally, we use large language models to evaluate the originality of the other ideas generated by SciPIP, further validating the effectiveness of the proposed method. The code and the database are released at https://github.com/cheerss/SciPIP.
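To make the retrieval step concrete, the following is a minimal sketch of how semantic, entity, and citation co-occurrence signals might be combined into one ranking. It is not the SciPIP implementation: the `Paper` fields, the bag-of-words similarity (a stand-in for learned embeddings), the seed-paper heuristic, the weights, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical database record; field names are illustrative only."""
    pid: str
    text: str
    entities: set
    cited_by: set = field(default_factory=set)  # ids of papers that cite this one


def bow(text):
    """Bag-of-words vector; a crude stand-in for a semantic embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(x, y):
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def retrieve(background, bg_entities, papers, top_k=2, weights=(0.5, 0.3, 0.2)):
    """Rank papers by a weighted sum of three signals (weights are assumed)."""
    w_sem, w_ent, w_cit = weights
    query = bow(background)
    # Signals 1 and 2: semantic similarity and entity overlap with the background.
    base = {
        p.pid: w_sem * cosine(query, bow(p.text)) + w_ent * jaccard(bg_entities, p.entities)
        for p in papers
    }
    # Signal 3: co-citation with the best match so far (the "seed" paper).
    seed = max(papers, key=lambda p: base[p.pid])
    score = {
        p.pid: base[p.pid] + w_cit * jaccard(p.cited_by, seed.cited_by)
        for p in papers
    }
    ranked = sorted(papers, key=lambda p: score[p.pid], reverse=True)
    return [p.pid for p in ranked[:top_k]]


# Toy data, invented for this example.
papers = [
    Paper("A", "retrieval augmented generation for language models",
          {"retrieval", "language model"}, {"x", "y"}),
    Paper("B", "graph neural networks for molecules",
          {"graph neural network"}, {"x", "y"}),
    Paper("C", "image segmentation with convolution",
          {"segmentation"}, {"z"}),
]
background = "retrieval for language models"
bg_entities = {"retrieval", "language model"}

print(retrieve(background, bg_entities, papers))
```

In this toy setup, paper B shares almost no wording or entities with the background, yet it ranks above C because it is cited by the same papers as the top match A. That is the kind of relevant-but-lexically-distant paper a citation co-occurrence signal is meant to surface.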