Literature-Based Discovery (LBD) aims to discover new scientific knowledge by mining papers and generating hypotheses. Standard LBD is limited to predicting pairwise relations between discrete concepts (e.g., drug-disease links), and ignores critical contexts like experimental settings (e.g., a specific patient population where a drug is evaluated) and background motivations (e.g., to find drugs without specific side effects). We address these limitations with a novel formulation of contextualized-LBD (C-LBD): generating scientific hypotheses in natural language, while grounding them in a context that controls the hypothesis search space. We present a modeling framework using retrieval of ``inspirations'' from past scientific papers. Our evaluations reveal that GPT-4 tends to generate ideas with overall low technical depth and novelty, while our inspiration prompting approaches partially mitigate this issue. Our work represents a first step toward building language models that generate new ideas derived from scientific literature.