Training or finetuning large-scale language models (LLMs) such as GPT-3 requires substantial computational resources, which has motivated recent work on parameter-efficient adaptation to downstream tasks. One practical line of research treats these models as black boxes and interacts with them only through their inference APIs. In this paper, we investigate how to optimize few-shot text classification without access to the LLM's gradients. To achieve this, we treat the black-box model as a feature extractor and train a classifier on augmented text data. The augmented data is produced by prompt-based finetuning of an auxiliary language model that has far fewer parameters than the black-box model. Through extensive experiments on eight text classification datasets, we show that our approach, dubbed BT-Classifier, significantly outperforms state-of-the-art black-box few-shot learners and performs on par with methods that rely on full-model tuning.
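The pipeline described above (augment the few-shot data, extract features from a frozen black-box model, train a small classifier on those features) can be illustrated with a minimal sketch. This is not the BT-Classifier implementation. Every name here is hypothetical: `toy_black_box_features` stands in for a call to an LLM inference API and simply returns normalized letter counts so the code runs offline, and `toy_augment` stands in for the auxiliary-model augmentation step by dropping the first word of each text. The classifier is a plain logistic regression, chosen only for brevity.

```python
import math
from typing import List, Tuple

Example = Tuple[str, int]  # (text, binary label)


def toy_black_box_features(text: str) -> List[float]:
    """Hypothetical stand-in for a black-box LLM inference API.

    A real setup would send `text` to the API and read back a representation.
    Here we return L2-normalized letter counts so the sketch runs offline.
    """
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def toy_augment(examples: List[Example]) -> List[Example]:
    """Hypothetical stand-in for augmentation with an auxiliary language model.

    This placeholder only adds a copy of each text with its first word dropped.
    """
    augmented = list(examples)
    for text, label in examples:
        words = text.split()
        if len(words) > 1:
            augmented.append((" ".join(words[1:]), label))
    return augmented


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(w: List[float], b: float, x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mean_log_loss(w: List[float], b: float,
                  X: List[List[float]], y: List[int]) -> float:
    total = 0.0
    for x, t in zip(X, y):
        # Clamp the probability to keep the logarithm finite.
        p = min(max(sigmoid(score(w, b, x)), 1e-12), 1.0 - 1e-12)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(X)


def train_classifier(X: List[List[float]], y: List[int],
                     epochs: int = 300, lr: float = 1.0):
    """Full-batch gradient descent on the classifier only.

    No gradient of the feature extractor is ever needed.
    """
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = sigmoid(score(w, b, x)) - t
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, text: str) -> int:
    return 1 if score(w, b, toy_black_box_features(text)) > 0 else 0


# Tiny illustrative few-shot set (made-up sentences, 1 = positive).
few_shot = [
    ("great movie loved it", 1),
    ("wonderful acting superb story", 1),
    ("terrible plot hated it", 0),
    ("dull and awful film", 0),
]
train_set = toy_augment(few_shot)
X = [toy_black_box_features(text) for text, _ in train_set]
y = [label for _, label in train_set]

initial_loss = mean_log_loss([0.0] * 26, 0.0, X, y)
w, b = train_classifier(X, y)
final_loss = mean_log_loss(w, b, X, y)
```

The point of the design is that only the small classifier is optimized. The feature extractor is queried as-is, which is what makes the approach compatible with API-only access.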