CrossTune: Black-Box Few-Shot Classification with Label Enhancement

Training or finetuning large-scale language models (LLMs) requires substantial computation resources, motivating recent efforts to explore parameter-efficient adaptation to downstream tasks. One approach is to treat these models as black boxes and use forward passes (Inference APIs) to interact with them. Current research focuses on adapting these black-box models to downstream tasks using gradient-free prompt optimization, but this often involves an expensive process of searching task-specific prompts. Therefore, we are motivated to study black-box language model adaptation without prompt search. Specifically, we introduce a label-enhanced cross-attention network called CrossTune, which models the semantic relatedness between the input text sequence and task-specific label descriptions. Its effectiveness is examined in the context of few-shot text classification. To improve the generalization of CrossTune, we utilize ChatGPT to generate additional training data through in-context learning. A switch mechanism is implemented to exclude low-quality ChatGPT-generated data. Through extensive experiments on seven benchmark text classification datasets, we demonstrate that our proposed approach outperforms the previous state-of-the-art gradient-free black-box tuning method by 5.7% on average. Even without using ChatGPT-augmented data, CrossTune performs better or comparably than previous black-box tuning methods, suggesting the effectiveness of our approach.

翻译：大规模语言模型的训练或微调需要大量计算资源，这促使近年来的研究探索面向下游任务的参数高效适配方法。一种方法是将这些模型视为黑盒，通过前向传播（推理API）与其交互。当前研究聚焦于使用无梯度提示优化来适配这些黑盒模型至下游任务，但这一过程通常涉及针对特定任务进行代价高昂的提示搜索。因此，我们致力于研究无需提示搜索的黑盒语言模型适配方法。具体而言，我们提出了一种名为CrossTune的标签增强交叉注意力网络，该网络能够建模输入文本序列与任务特定标签描述之间的语义关联性，并在少样本文本分类任务中验证其有效性。为提升CrossTune的泛化性能，我们利用ChatGPT通过上下文学习生成额外训练数据，并设计切换机制以筛除低质量ChatGPT生成数据。通过在七个基准文本分类数据集上的大量实验，我们证明所提方法较先前最先进的无梯度黑盒调优方法平均提升5.7%。即使未使用ChatGPT增强数据，CrossTune的性能仍优于或持平于以往黑盒调优方法，这充分验证了我们方法的有效性。

相关内容

黑盒

关注 1

在科学，计算和工程学中，黑盒是一种设备，系统或对象，可以根据其输入和输出（或传输特性）对其进行查看，而无需对其内部工作有任何了解。它的实现是“不透明的”（黑色）。几乎任何事物都可以被称为黑盒：晶体管，引擎，算法，人脑，机构或政府。为了使用典型的“黑匣子方法”来分析建模为开放系统的事物，仅考虑刺激/响应的行为，以推断（未知）盒子。该黑匣子系统的通常表示形式是在该方框中居中的数据流程图。黑盒的对立面是一个内部组件或逻辑可用于检查的系统，通常将其称为白盒（有时也称为“透明盒”或“玻璃盒”）。

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日