Efficient and accurate information extraction from scientific papers is essential to the literature review process in the rapidly developing field of human-computer interaction (HCI) research. This paper introduces and analyzes a new information retrieval system that combines state-of-the-art Large Language Models (LLMs) with structured text analysis techniques to extract experimental data from HCI literature, focusing on key elements. We then analyze the challenges and risks of using LLMs in research. We performed a comprehensive analysis on a dataset we constructed, containing the specified information from 300 CHI 2020-2022 papers, to evaluate the performance of two LLMs, GPT-3.5 (text-davinci-003) and Llama-2-70b, each paired with structured text analysis techniques. GPT-3.5 achieves an accuracy of 58\% with a mean absolute error of 7.00, while Llama-2 achieves an accuracy of 56\% with a mean absolute error of 7.63. The system also provides a question-answering capability for working with the streamlined data. By evaluating the risks and opportunities presented by LLMs, our work contributes to the ongoing dialogue on establishing methodological validity and ethical guidelines for LLM use in HCI data work.
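The two headline metrics, accuracy and mean absolute error, can be illustrated with a minimal sketch. The abstract does not define them precisely, so this assumes that each paper contributes one numeric extracted field (for example, a participant count), that accuracy is the fraction of papers whose extracted value exactly matches the manually annotated value, and that the mean absolute error is averaged over the same papers. The function names and toy values are hypothetical and do not come from the paper's dataset.

```python
from typing import Sequence


def _check(predicted: Sequence[float], truth: Sequence[float]) -> None:
    # Both metrics compare values paper by paper, so the sequences must align.
    if len(predicted) != len(truth) or len(truth) == 0:
        raise ValueError("need two non-empty sequences of equal length")


def accuracy(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """Fraction of papers whose extracted value exactly equals the annotation (assumed definition)."""
    _check(predicted, truth)
    hits = sum(1 for p, t in zip(predicted, truth) if p == t)
    return hits / len(truth)


def mean_absolute_error(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """Average absolute difference between extracted and annotated values."""
    _check(predicted, truth)
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(truth)


# Toy, hypothetical values: annotated vs. LLM-extracted counts for four papers.
truth = [12, 24, 8, 30]
predicted = [12, 20, 8, 36]

print(accuracy(predicted, truth))             # 0.5 (2 of 4 exact matches)
print(mean_absolute_error(predicted, truth))  # 2.5 ((0 + 4 + 0 + 6) / 4)
```

Reporting both metrics matters because they capture different failure modes: accuracy only counts exact hits, while the mean absolute error also reflects how far off the misses are.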