This study investigates how effectively Large Language Models (LLMs) convert semi-structured data from PDF documents into structured formats, using the update of the Finnish Sports Clubs Database as its case. Following an action research methodology, we developed and evaluated an AI-assisted approach that uses OpenAI's GPT-4 and Anthropic's Claude 3 Opus models to process data from 72 sports federation membership reports. The system processed 65 of the 72 files without errors, a success rate of about 90%, and converted over 7,900 rows of data. Although initial development took about as long as traditional manual processing (three months), the implemented system shows potential to reduce future processing time by approximately 90%. Key challenges included handling multilingual content, processing datasets that span multiple pages, and managing extraneous information. The findings suggest that LLMs have significant potential for automating semi-structured data processing, but that the best results come from a hybrid approach combining AI automation with selective human oversight. This research contributes to the growing literature on practical LLM applications in organizational data management and offers insights into the transformation of traditional data processing workflows.