Annotating large datasets is challenging: crowd-sourcing is often expensive and can lack quality, especially for non-trivial tasks. We propose using LLMs as few-shot learners to annotate data for a complex natural language task, in which we train a standalone model to predict usage options for products from customer reviews. We also propose a new evaluation metric for this scenario, HAMS4, which can be used to compare a set of strings against multiple reference sets. Compared to using the LLM directly for the sequence-to-sequence task, training a custom model offers individual control over energy efficiency and privacy measures. We compare this data annotation approach with traditional methods and demonstrate how LLMs can enable considerable cost savings. We find that the quality of the resulting data exceeds the level attained by third-party vendor services and that GPT-4-generated labels even reach the level of domain experts. We make the code and the generated labels publicly available.