Hypothesis formulation and testing are central to empirical research. A strong hypothesis is a best guess based on existing evidence and informed by a comprehensive view of the relevant literature. However, with the exponential increase in the number of scientific articles published annually, manually aggregating and synthesizing the evidence related to a given hypothesis is a challenge. Our work explores the ability of current large language models (LLMs) to discern evidence that supports or refutes specific hypotheses based on the text of scientific abstracts. We share a novel dataset for the task of scientific hypothesis evidencing, built from community-driven annotations of studies in the social sciences. We compare the performance of LLMs to several state-of-the-art benchmarks and highlight opportunities for future research in this area. The dataset is available at https://github.com/Sai90000/ScientificHypothesisEvidencing.git