Named-entity recognition (NER) typically requires large annotated datasets, which limits its applicability across domains whose entity definitions differ. This paper addresses few-shot NER, where the goal is to transfer knowledge to a new domain with minimal supervision. Unlike previous approaches, which rely solely on the limited annotated data, we propose a weakly supervised algorithm that combines a small labeled dataset with large amounts of unlabeled data. Our method extends the k-means algorithm with label supervision, cluster-size constraints, and domain-specific discriminative subspace selection. This unified framework achieves state-of-the-art results in few-shot NER on several English datasets.
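To make the idea of extending k-means with label supervision and cluster-size constraints concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a classic seeded k-means scheme (centroids initialized from the labeled points, which then stay fixed in their class cluster) and a simple greedy size cap; the function name `seeded_kmeans` and the parameters `max_size` and `n_iter` are hypothetical. The domain-specific discriminative subspace selection is omitted, since the abstract does not describe it in enough detail to illustrate.

```python
import numpy as np

def seeded_kmeans(X, y, n_clusters, max_size=None, n_iter=20):
    """Label-seeded k-means with an optional greedy cluster-size cap.

    X: (n, d) features. y: (n,) ints, a class id in [0, n_clusters) for
    labeled points and -1 for unlabeled ones. Every class needs at least
    one labeled point. Illustrative assumption, not the paper's method.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    labeled = y >= 0
    unlabeled_idx = np.flatnonzero(~labeled)
    if max_size is None:
        max_size = len(X)  # no effective cap
    if n_clusters * max_size < len(X):
        raise ValueError("max_size is too small to place every point")

    # Label supervision, part 1: initialize each centroid from its seeds.
    centroids = np.stack([X[y == k].mean(axis=0) for k in range(n_clusters)])
    assign = y.copy()

    for _ in range(n_iter):
        # Label supervision, part 2: labeled points never change cluster.
        new_assign = y.copy()
        counts = np.bincount(y[labeled], minlength=n_clusters)

        # Squared distance from every unlabeled point to every centroid.
        diffs = X[unlabeled_idx, None, :] - centroids[None, :, :]
        dists = (diffs ** 2).sum(axis=2)

        # Size constraint (greedy): points closest to some centroid choose
        # first, and each takes the nearest cluster that still has room.
        for row in np.argsort(dists.min(axis=1)):
            for k in np.argsort(dists[row]):
                if counts[k] < max_size:
                    new_assign[unlabeled_idx[row]] = k
                    counts[k] += 1
                    break

        if np.array_equal(new_assign, assign):
            break  # assignments are stable
        assign = new_assign
        centroids = np.stack(
            [X[assign == k].mean(axis=0) for k in range(n_clusters)]
        )

    return assign, centroids

# Toy usage: two well-separated groups with one labeled point each.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
y = np.array([0, -1, -1, 1, -1, -1])
assign, centroids = seeded_kmeans(X, y, n_clusters=2)
print(assign.tolist())  # [0, 0, 0, 1, 1, 1]
```

The greedy cap is the simplest choice that keeps the sketch short. An exact treatment of size constraints would instead solve an assignment problem at each iteration, which this example does not attempt.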