Active learning is widely used to improve machine learning models for text and image classification when annotation resources are limited, yet it has received relatively little attention in Named Entity Recognition (NER). Data imbalance in NER hinders active learning because sequence labellers receive insufficient learning signal. To address this challenge, this paper presents a novel reweighting-based active learning strategy that assigns dynamic smoothed weights to individual tokens. The strategy is compatible with various token-level acquisition functions and contributes to the development of robust active learners. Experimental results on multiple corpora demonstrate substantial performance improvements when our reweighting strategy is incorporated into existing acquisition functions, validating its practical efficacy.
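The abstract does not give the weighting formula, so the following is only a minimal sketch of how a reweighted token-level acquisition function could look. It assumes token entropy as the base acquisition function and an additively smoothed inverse-frequency weight per predicted label, recomputed each round from the labelled pool. The function names, the smoothing constant `beta`, and the weight formula are all illustrative assumptions, not the paper's definitions.

```python
import math


def token_entropy(probs):
    """Shannon entropy of one token's predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def smoothed_class_weights(label_counts, num_classes, beta=1.0):
    """Assumed weighting: smoothed inverse label frequency.

    w_c = (N + beta * K) / (K * (n_c + beta)), where n_c is the count of
    label c in the current labelled pool, N the total count, K the number
    of classes. Rare labels (entities) get weights above 1, frequent
    labels (such as "O") get weights below 1. Recomputing this after each
    annotation round is what makes the weights dynamic in this sketch.
    """
    total = sum(label_counts.values())
    return [
        (total + beta * num_classes)
        / (num_classes * (label_counts.get(c, 0) + beta))
        for c in range(num_classes)
    ]


def reweighted_sentence_score(token_probs, weights):
    """Sum of per-token entropies, each scaled by the weight of the
    token's most probable label under the current model."""
    score = 0.0
    for probs in token_probs:
        predicted = max(range(len(probs)), key=probs.__getitem__)
        score += weights[predicted] * token_entropy(probs)
    return score


# Hypothetical two-label pool: label 0 ("O") is 9x more frequent than label 1.
weights = smoothed_class_weights({0: 90, 1: 10}, num_classes=2)
likely_entity = reweighted_sentence_score([[0.4, 0.6]], weights)
likely_outside = reweighted_sentence_score([[0.6, 0.4]], weights)
```

Both example tokens have the same entropy, so an unweighted entropy score would rank them equally. Under the assumed weights, the token predicted as the rare label scores higher, which is the kind of shift toward under-represented entity tokens that the abstract motivates. Any other token-level acquisition function could replace `token_entropy` without changing the weighting step.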