With their rapid advancement and strong generalization capabilities, large language models (LLMs) have been increasingly incorporated into active learning pipelines as annotators to reduce annotation costs. However, the labels generated by LLMs often fall short of the quality required for real-world applicability. To address this, we propose a novel active learning framework, Mixture of LLMs in the Loop Active Learning, which replaces human annotators with a Mixture-of-LLMs-based annotation model, aggregating the strengths of multiple LLMs to improve annotation robustness. To further mitigate the impact of noisy labels, we introduce annotation discrepancy and negative learning to identify unreliable annotations and enhance learning effectiveness. Extensive experiments demonstrate that our framework achieves performance comparable to human annotation and consistently outperforms single-LLM baselines and other LLM-ensemble-based approaches. Moreover, our framework is built on lightweight LLMs, enabling it to run entirely on local machines in real-world applications.
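The core idea — aggregating labels from several LLM annotators, measuring their disagreement, and routing high-discrepancy samples to negative learning — can be illustrated with a minimal sketch. All function names, the majority-vote aggregation rule, and the agreement threshold below are illustrative assumptions, not the paper's actual method:

```python
# Hypothetical sketch: mixture-of-LLMs annotation with discrepancy-based
# reliability splitting. Majority vote and the 0.66 threshold are assumptions.
from collections import Counter

def aggregate_annotations(llm_labels):
    """llm_labels: labels for one sample, one per LLM annotator.
    Returns the majority label and the agreement ratio
    (annotation discrepancy can be read as 1 - agreement)."""
    counts = Counter(llm_labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(llm_labels)

def split_by_reliability(samples, threshold=0.66):
    """Samples with low inter-LLM agreement are treated as noisy:
    a training loop would give them negative supervision
    ('the label is NOT y') instead of standard positive supervision."""
    reliable, unreliable = [], []
    for x, labels in samples:
        y, agreement = aggregate_annotations(labels)
        (reliable if agreement >= threshold else unreliable).append((x, y))
    return reliable, unreliable

samples = [
    ("doc A", ["pos", "pos", "pos"]),  # full agreement -> reliable
    ("doc B", ["pos", "neg", "neu"]),  # high discrepancy -> negative learning
]
reliable, unreliable = split_by_reliability(samples)
```

Here `reliable` receives standard cross-entropy training, while `unreliable` would be trained with a negative learning loss that penalizes predicting the (possibly wrong) aggregated label.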