Exploiting All Samples in Low-Resource Sentence Classification: Early Stopping and Initialization Parameters

To improve deep-learning performance in low-resource settings, many researchers have redesigned model architectures or used additional data (e.g., external resources, unlabeled samples). However, there has been relatively little discussion of how to make good use of a small number of labeled samples, although doing so is potentially beneficial and should precede adding data or redesigning models. In this study, we assume a low-resource setting in which only a few labeled samples (i.e., 30-100 per class) are available, and we discuss how to exploit them without additional data or model redesign. We explore possible approaches along three aspects: training-validation splitting, early stopping, and weight initialization. Extensive experiments are conducted on six public sentence classification datasets. Performance on various evaluation metrics (e.g., accuracy, loss, and calibration error) varied significantly depending on which approaches were combined across the three aspects. Based on the results, we propose an integrated method that initializes the model with a weight averaging method and uses a non-validation stop method so that all samples can be used for training. This simple integrated method consistently outperforms competitive methods; e.g., its average accuracy over the six datasets was 1.8% higher than that of conventional validation-based methods. In addition, the integrated method further improves performance when applied to several state-of-the-art models that use additional data or redesign the network architecture (e.g., self-training and enhanced structural models). Our results highlight the importance of the training strategy and suggest that the integrated method can serve as a first step in low-resource settings. This study provides empirical knowledge that will be helpful when dealing with low-resource data in future efforts.
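
As a rough illustration of the two ingredients named in the abstract, the sketch below combines an averaged-weight initialization with a stop rule that needs no validation split, so every labeled sample is used for training. The abstract does not specify either procedure, so the details here are assumptions: the initialization averages the weights of a few short runs from different random starts, and the stop rule halts once the training loss falls below a preset threshold or an epoch budget runs out. The logistic-regression model, the toy data, and all names (`average_weights`, `train`, `loss_threshold`) are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np


def log_loss(X, y, w):
    """Mean binary cross-entropy of a logistic-regression model."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))


def average_weights(weight_list):
    """Element-wise mean of several equally shaped weight vectors."""
    return np.mean(np.stack(weight_list), axis=0)


def train(X, y, w_init, lr=0.5, max_epochs=500, loss_threshold=None):
    """Full-batch gradient descent on ALL samples, with no validation split.

    Assumed stop rule: halt once the training loss is at or below
    `loss_threshold`, otherwise after `max_epochs` epochs.
    Returns the weights and the number of epochs actually run.
    """
    w = w_init.copy()
    for epoch in range(max_epochs):
        if loss_threshold is not None and log_loss(X, y, w) <= loss_threshold:
            return w, epoch
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * (X.T @ (p - y)) / len(y)
    return w, max_epochs


# Toy low-resource task: 30 labeled samples per class, 2 features plus a bias.
rng = np.random.default_rng(0)
n_per_class = 30
X0 = rng.normal(-1.0, 1.0, size=(n_per_class, 2))
X1 = rng.normal(1.0, 1.0, size=(n_per_class, 2))
X = np.hstack([np.vstack([X0, X1]), np.ones((2 * n_per_class, 1))])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

# Assumed weight-averaging initialization: average five short runs
# that start from different random weights.
candidates = [
    train(X, y, rng.normal(0.0, 0.1, size=3), max_epochs=20)[0]
    for _ in range(5)
]
w_init = average_weights(candidates)

# Final training on every sample with the non-validation stop rule.
MAX_EPOCHS = 500
w_final, epochs_used = train(
    X, y, w_init, max_epochs=MAX_EPOCHS, loss_threshold=0.3
)
```

The point of the design is that no labeled sample is held out: the stop decision depends only on quantities available during training, which matters most when each class has only a few dozen examples.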
