Extracting and synthesizing structured knowledge from the extensive scientific literature is crucial for advancing and disseminating scientific progress. Although many existing systems facilitate literature review and digestion, they struggle to process multimodal, varied, and inconsistent information within and across papers into structured data. We introduce SciDaSynth, a novel interactive system powered by large language models (LLMs) that enables researchers to efficiently build structured knowledge bases from scientific literature at scale. Through question-answering, the system automatically creates data tables that organize and summarize the knowledge users are interested in. Furthermore, it supports multi-level and multi-faceted exploration of the generated data tables, facilitating iterative validation, correction, and refinement. Our within-subjects study with researchers demonstrates the effectiveness and efficiency of SciDaSynth in constructing high-quality scientific knowledge bases. We further discuss design implications for human-AI interaction tools for data extraction and structuring.