Transformer-based language models have achieved significant success in various domains. However, the data-intensive nature of the transformer architecture requires large amounts of labeled data, which is difficult to obtain in low-resource scenarios (i.e., few-shot learning (FSL)). The main challenge of FSL is training robust models on a small number of samples, which frequently leads to overfitting. Here we present Mask-BERT, a simple and modular framework that helps BERT-based architectures tackle FSL. The proposed approach differs fundamentally from existing FSL strategies such as prompt tuning and meta-learning. The core idea is to selectively apply masks to text inputs to filter out irrelevant information, guiding the model to focus on discriminative tokens that influence prediction results. In addition, to make text representations from different categories more separable and those from the same category more compact, we introduce a contrastive learning loss function. Experimental results on public-domain benchmark datasets demonstrate the effectiveness of Mask-BERT.
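The two ingredients named in the abstract, selective input masking and a contrastive loss, can be illustrated with a minimal sketch. The abstract does not say how tokens are scored or which contrastive objective is used, so everything below is an assumption for illustration only: the per-token relevance `scores` are taken as given, the `keep_ratio` parameter and both function names are hypothetical, and the loss follows the common supervised contrastive form with cosine similarity and a `temperature`. It is not the Mask-BERT implementation.

```python
import math


def mask_irrelevant_tokens(tokens, scores, keep_ratio=0.5, mask_token="[MASK]"):
    """Keep the highest-scoring tokens and replace the rest with mask_token.

    `scores` are hypothetical per-token relevance scores (higher means more
    discriminative). How such scores are obtained is not specified here.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    n_keep = max(1, math.ceil(len(tokens) * keep_ratio))
    # Indices of the n_keep highest-scoring tokens.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [tok if i in keep else mask_token for i, tok in enumerate(tokens)]


def _cosine(u, v):
    # Cosine similarity; assumes both vectors are non-zero.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Assumed loss form: pull same-label embeddings together, push others apart.

    For each anchor with at least one same-label partner, average the negative
    log-probability of its positives under a softmax over all other samples.
    """
    n = len(embeddings)
    total, n_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {
            j: _cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all other samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


tokens = ["the", "movie", "was", "truly", "awful"]
scores = [0.1, 0.6, 0.05, 0.4, 0.9]  # made-up relevance scores
print(mask_irrelevant_tokens(tokens, scores))
# ['[MASK]', 'movie', '[MASK]', 'truly', 'awful']
```

In this sketch the masked sequence would be what the encoder sees, and the contrastive term would be added to the usual classification loss. A loss computed on embeddings that cluster by label comes out lower than the same embeddings paired with mismatched labels, which is the "more separable, more compact" behaviour the abstract describes.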