The rapid growth of scientific literature has made manual extraction of structured knowledge increasingly impractical. To address this challenge, we introduce SCILIRE, a system for creating datasets from scientific literature. SCILIRE is designed around Human-AI teaming principles, centred on workflows for verifying and curating data. It supports an iterative workflow in which researchers review and correct AI outputs, and these corrections serve as a feedback signal to improve future LLM-based inference. We evaluate our design using a combination of intrinsic benchmarking and real-world case studies across multiple domains. The results demonstrate that SCILIRE improves extraction fidelity and facilitates efficient dataset creation.