The rise of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) has rapidly increased the need for high-quality, curated information retrieval datasets. These datasets, however, are currently created with off-the-shelf annotation tools that make the annotation process complex and inefficient. To streamline this process, we developed AIANO, a specialized annotation tool. By adopting an AI-augmented annotation workflow that tightly integrates human expertise with LLM assistance, AIANO lets annotators leverage AI suggestions while retaining full control over annotation decisions. In a within-subject user study ($n = 15$), participants created question-answering datasets using both a baseline tool and AIANO. Compared to the baseline, AIANO nearly doubled annotation speed, was easier to use, and improved retrieval accuracy. These results demonstrate that AIANO's AI-augmented approach accelerates and enhances dataset creation for information retrieval tasks, advancing annotation capabilities in retrieval-intensive domains.