Recent work has found that few-shot sentence classification based on pre-trained Sentence Encoders (SEs) is efficient, robust, and effective. In this work, we investigate strategies for domain specialization in the context of few-shot sentence classification with SEs. We first establish that unsupervised Domain-Adaptive Pre-Training (DAPT) of a base Pre-trained Language Model (PLM) (i.e., not an SE) improves the accuracy of few-shot sentence classification by up to 8.4 points. Combining DAPT with SEs, however, poses a dilemma. On the one hand, applying DAPT to an SE disrupts the effects of its (general-domain) Sentence Embedding Pre-Training (SEPT). On the other hand, applying general-domain SEPT on top of a domain-adapted base PLM (i.e., after DAPT) is effective but inefficient, since the computationally expensive SEPT must be repeated on the DAPT-ed PLM of every domain. As a solution, we propose AdaSent, which decouples SEPT from DAPT by training a SEPT adapter on the base PLM. The adapter can then be inserted into DAPT-ed PLMs from any domain. We demonstrate AdaSent's effectiveness in extensive experiments on 17 different few-shot sentence classification datasets. AdaSent matches or surpasses the performance of full SEPT on a DAPT-ed PLM, while substantially reducing training costs. The code for AdaSent is available.
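The decoupling idea can be illustrated with a toy sketch. This is not the AdaSent implementation: the "PLMs" are hypothetical stand-in functions, the adapter is a single residual bottleneck applied to the output states (real adapters typically sit inside each transformer layer), the weights are arbitrary numbers rather than trained values, and the domain names are placeholders. It only shows that one set of adapter weights can be reused unchanged with several domain-adapted backbones, so the expensive SEPT step runs once rather than once per domain.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Adapter = Tuple[Matrix, Matrix]  # (w_down, w_up); hypothetical representation


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def apply_adapter(h: Vector, adapter: Adapter) -> Vector:
    """Residual bottleneck: h + W_up * relu(W_down * h)."""
    w_down, w_up = adapter
    z = [max(0.0, a) for a in matvec(w_down, h)]
    return [x + d for x, d in zip(h, matvec(w_up, z))]


def sentence_embedding(plm: Callable[[str], Vector],
                       adapter: Adapter,
                       tokens: List[str]) -> Vector:
    """Adapt each token state, then mean-pool into one sentence vector."""
    states = [apply_adapter(plm(tok), adapter) for tok in tokens]
    return [sum(col) / len(states) for col in zip(*states)]


def make_toy_plm(scale: float) -> Callable[[str], Vector]:
    """Hypothetical stand-in for a DAPT-ed PLM: token -> 2-d hidden state."""
    return lambda tok: [scale * len(tok), 1.0]


# One toy backbone per domain (assumption: DAPT already ran for each of them).
dapt_plms: Dict[str, Callable[[str], Vector]] = {
    "domain_a": make_toy_plm(0.5),
    "domain_b": make_toy_plm(2.0),
}

# A single adapter, standing in for the one trained via SEPT on the base PLM.
sept_adapter: Adapter = ([[0.1, -0.2]], [[1.0], [0.5]])

# The same adapter is plugged into every domain-adapted backbone.
embeddings = {
    domain: sentence_embedding(plm, sept_adapter, ["few", "shot"])
    for domain, plm in dapt_plms.items()
}

# SEPT runs needed: one per domain for full SEPT, one in total for the adapter.
full_sept_runs = len(dapt_plms)
adapter_sept_runs = 1
```

In this sketch, adding a new domain only requires a new entry in `dapt_plms`; `sept_adapter` stays untouched, which mirrors the cost argument made above.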