AdaSent: Efficient Domain-Adapted Sentence Embeddings for Few-Shot Classification

Recent work has found that few-shot sentence classification based on pre-trained Sentence Encoders (SEs) is efficient, robust, and effective. In this work, we investigate strategies for domain specialization in the context of few-shot sentence classification with SEs. We first establish that unsupervised Domain-Adaptive Pre-Training (DAPT) of a base Pre-trained Language Model (PLM) (i.e., not an SE) improves the accuracy of few-shot sentence classification by up to 8.4 points. However, applying DAPT to SEs disrupts the effects of their (general-domain) Sentence Embedding Pre-Training (SEPT). Conversely, applying general-domain SEPT on top of a domain-adapted base PLM (i.e., after DAPT) is effective but inefficient, since the computationally expensive SEPT must be repeated on the DAPT-ed PLM of each domain. As a solution, we propose AdaSent, which decouples SEPT from DAPT by training a SEPT adapter on the base PLM. The adapter can then be inserted into DAPT-ed PLMs from any domain. We demonstrate AdaSent's effectiveness in extensive experiments on 17 different few-shot sentence classification datasets. AdaSent matches or surpasses the performance of full SEPT on a DAPT-ed PLM while substantially reducing training costs. The code for AdaSent is available.
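
The efficiency argument in the abstract can be illustrated with a small bookkeeping sketch. It is not the AdaSent implementation: no model is trained, and every name in it (`PLM`, `SeptAdapter`, `RunLog`, `dapt`, `full_sept`, `train_sept_adapter`, and the three example domains) is a hypothetical placeholder. It only counts how often each training stage would run when SEPT is repeated per domain versus when a single SEPT adapter is trained on the base PLM and reused.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RunLog:
    """Counts how many times each (hypothetical) training stage is executed."""
    counts: Dict[str, int] = field(default_factory=lambda: {"DAPT": 0, "SEPT": 0})

    def record(self, stage: str) -> None:
        self.counts[stage] += 1


@dataclass(frozen=True)
class PLM:
    """Placeholder for model weights; domain=None means the base PLM."""
    domain: Optional[str] = None
    has_sept: bool = False  # True if SEPT was applied to the full model


@dataclass(frozen=True)
class SeptAdapter:
    """Placeholder for adapter weights learned by SEPT on the base PLM."""
    name: str = "sept-adapter"


@dataclass(frozen=True)
class SentenceEncoder:
    backbone: PLM
    adapter: Optional[SeptAdapter] = None


def dapt(plm: PLM, domain: str, log: RunLog) -> PLM:
    """Stand-in for unsupervised domain-adaptive pre-training."""
    log.record("DAPT")
    return PLM(domain=domain, has_sept=plm.has_sept)


def full_sept(plm: PLM, log: RunLog) -> SentenceEncoder:
    """Stand-in for the expensive SEPT stage applied to a whole PLM."""
    log.record("SEPT")
    return SentenceEncoder(backbone=PLM(domain=plm.domain, has_sept=True))


def train_sept_adapter(base: PLM, log: RunLog) -> SeptAdapter:
    """Stand-in for SEPT that only produces adapter weights for the base PLM."""
    log.record("SEPT")
    return SeptAdapter()


def sept_per_domain(domains: List[str], log: RunLog) -> Dict[str, SentenceEncoder]:
    """Baseline: DAPT first, then a full SEPT run for every domain."""
    base = PLM()
    return {d: full_sept(dapt(base, d, log), log) for d in domains}


def adapter_reuse(domains: List[str], log: RunLog) -> Dict[str, SentenceEncoder]:
    """Decoupled variant: one SEPT adapter, inserted into each DAPT-ed PLM."""
    base = PLM()
    adapter = train_sept_adapter(base, log)  # SEPT runs once
    return {d: SentenceEncoder(backbone=dapt(base, d, log), adapter=adapter)
            for d in domains}


DOMAINS = ["medical", "legal", "finance"]  # hypothetical example domains

baseline_log, adapter_log = RunLog(), RunLog()
baseline = sept_per_domain(DOMAINS, baseline_log)
modular = adapter_reuse(DOMAINS, adapter_log)

print(baseline_log.counts)  # {'DAPT': 3, 'SEPT': 3}
print(adapter_log.counts)   # {'DAPT': 3, 'SEPT': 1}
```

In this sketch the number of SEPT runs grows with the number of domains in the baseline but stays at one when the adapter is reused, which is the cost saving the abstract describes. Whether accuracy is preserved is an empirical question that the sketch does not address.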


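Related content

Few-shot learning

Few-shot learning (FSL) addresses how to improve the performance of neural networks when only a small amount of data is available. A commonly used family of methods in FSL is meta-learning. Like ordinary neural network training, meta-learning has a training phase and a test phase, which are called meta-training and meta-testing, respectively.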
