Manually labeled data provides domain-specific and task-aligned supervision for data-driven approaches, and a critical mass of well-annotated resources is required to achieve reasonable performance in natural language processing tasks. However, manual annotation is often difficult to scale in terms of time and budget, especially when the task requires domain knowledge, the capture of subtle semantic features, and reasoning steps. In this paper, we investigate the efficacy of using large language models for automated labeling in computational stance detection. We empirically observe that while large language models show strong potential as an alternative to human annotators, their sensitivity to task-specific instructions and their intrinsic biases pose intriguing and unique challenges for machine annotation. We introduce a multi-label and multi-target sampling strategy to optimize annotation quality. Experimental results on benchmark stance detection corpora show that our method can significantly improve performance and learning efficacy.