Scientific discovery is cumulative: every new idea must be situated within an ever-expanding landscape of existing knowledge. A critical emerging challenge is to identify conceptually relevant prior work in a rapidly growing literature and to assess how a new idea differs from existing research. Current embedding approaches typically conflate distinct conceptual aspects into a single representation and therefore cannot support fine-grained literature retrieval, while LLM-based evaluators are prone to sycophancy bias and fail to provide discriminative novelty assessments. To address these challenges, we introduce the Ideation Space, a structured representation that decomposes scientific knowledge into three distinct dimensions (research problem, methodology, and core findings), each learned through contrastive training. This representation enables principled measurement of the conceptual distance between ideas and the modeling of ideation transitions, which capture the logical connections within a proposed idea. Building on it, we propose a Hierarchical Sub-Space Retrieval framework for efficient, targeted literature retrieval and a Decomposed Novelty Assessment algorithm that identifies which aspects of an idea are novel. Extensive experiments show substantial improvements: our approach achieves a Recall@30 of 0.329 (a 16.7% improvement over baselines), our ideation transition retrieval reaches a Hit Rate@30 of 0.643, and our novelty assessment attains a correlation of 0.37 with expert judgments. Our work thus offers a promising paradigm for future research on accelerating and evaluating scientific discovery.
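To make the idea of aspect-level novelty concrete, the following is a minimal sketch and not the paper's Decomposed Novelty Assessment algorithm, whose details the abstract does not give. It assumes each idea is already embedded separately along the three dimensions, and it scores novelty per dimension as the cosine distance to the nearest prior work in that sub-space. The names `ASPECTS`, `cosine_distance`, and `aspect_novelty`, the nearest-neighbor scoring rule, and the toy two-dimensional vectors are all illustrative assumptions.

```python
import math

# Hypothetical names for the three dimensions described in the abstract.
ASPECTS = ("problem", "method", "findings")


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def aspect_novelty(idea, corpus):
    """Score each aspect by its distance to the closest prior work.

    Assumption: nearest-neighbor distance stands in for novelty here;
    the source does not specify the actual scoring rule.
    """
    return {
        aspect: min(cosine_distance(idea[aspect], paper[aspect]) for paper in corpus)
        for aspect in ASPECTS
    }


# Toy embeddings (made up for illustration, not real data).
corpus = [
    {"problem": [1.0, 0.0], "method": [1.0, 0.0], "findings": [0.0, 1.0]},
    {"problem": [0.0, 1.0], "method": [1.0, 0.0], "findings": [1.0, 0.0]},
]
idea = {"problem": [1.0, 0.0], "method": [0.0, 1.0], "findings": [1.0, 0.0]}

scores = aspect_novelty(idea, corpus)
for aspect in ASPECTS:
    print(f"{aspect}: {scores[aspect]:.2f}")
```

In this toy setup the idea shares its problem and findings with existing papers but uses a method unlike any in the corpus, so only the method dimension receives a high score. A single conflated embedding would blur that distinction into one number, which is the limitation the abstract attributes to current embedding approaches.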