We propose a new method, Adversarial In-Context Learning (adv-ICL), to optimize prompts for in-context learning (ICL) by employing one LLM as a generator, another as a discriminator, and a third as a prompt modifier. As in traditional adversarial learning, adv-ICL is implemented as a two-player game between the generator and the discriminator, where the generator tries to produce output realistic enough to fool the discriminator. In each round, given an input prefixed by task instructions and several exemplars, the generator produces an output. The discriminator is then tasked with classifying the generator's input-output pair as model-generated or real data. Based on the discriminator loss, the prompt modifier proposes possible edits to the generator and discriminator prompts, and the edits that most improve the adversarial loss are selected. We show that adv-ICL results in significant improvements over state-of-the-art prompt optimization techniques for both open- and closed-source models on 11 generation and classification tasks, including summarization, arithmetic reasoning, machine translation, data-to-text generation, and the MMLU and BIG-Bench Hard benchmarks. In addition, because our method uses pre-trained models and updates only prompts rather than model parameters, it is computationally efficient, easy to extend to any LLM and task, and effective in low-resource settings.
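The round structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the three "models" are hypothetical plain Python callables rather than real LLM calls, the GAN-style objective (log D(real) + log(1 - D(generated))) is an assumption about the form of the adversarial loss, and all names (`adversarial_loss`, `adv_icl_round`, the `toy_*` stubs) are invented for illustration.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical stand-ins for the three LLM roles (not a real LLM API).
Generator = Callable[[str, str], str]             # (prompt, input) -> output
Discriminator = Callable[[str, str, str], float]  # (prompt, input, output) -> P(real)
Modifier = Callable[[str], List[str]]             # prompt -> candidate rewrites

EPS = 1e-6  # keeps probabilities away from 0 and 1 before taking logs


def adversarial_loss(gen: Generator, disc: Discriminator,
                     gen_prompt: str, disc_prompt: str,
                     data: List[Tuple[str, str]]) -> float:
    """Assumed GAN-style objective: mean of log D(real) + log(1 - D(generated))."""
    total = 0.0
    for x, y_real in data:
        y_fake = gen(gen_prompt, x)
        p_real = min(max(disc(disc_prompt, x, y_real), EPS), 1 - EPS)
        p_fake = min(max(disc(disc_prompt, x, y_fake), EPS), 1 - EPS)
        total += math.log(p_real) + math.log(1.0 - p_fake)
    return total / len(data)


def adv_icl_round(gen: Generator, disc: Discriminator, modify: Modifier,
                  gen_prompt: str, disc_prompt: str,
                  data: List[Tuple[str, str]]) -> Tuple[str, str]:
    """One round: update the discriminator prompt, then the generator prompt."""
    # Discriminator step: keep the candidate that maximizes the loss.
    disc_candidates = [disc_prompt] + modify(disc_prompt)
    disc_prompt = max(
        disc_candidates,
        key=lambda p: adversarial_loss(gen, disc, gen_prompt, p, data))
    # Generator step: keep the candidate that minimizes the loss.
    gen_candidates = [gen_prompt] + modify(gen_prompt)
    gen_prompt = min(
        gen_candidates,
        key=lambda p: adversarial_loss(gen, disc, p, disc_prompt, data))
    return gen_prompt, disc_prompt


# Toy stubs so the sketch runs without any model; the "task" is upper-casing.
def toy_gen(prompt: str, x: str) -> str:
    return x.upper() if "uppercase" in prompt.lower() else x


def toy_disc(prompt: str, x: str, y: str) -> float:
    if "check case" in prompt.lower():
        return 0.9 if y == x.upper() else 0.1
    return 0.5  # an uninformed discriminator cannot tell the pairs apart


def toy_modify(prompt: str) -> List[str]:
    return [prompt + " Answer in uppercase.", prompt + " Check case carefully."]


data = [("abc", "ABC"), ("hi", "HI")]
new_gen, new_disc = adv_icl_round(
    toy_gen, toy_disc, toy_modify,
    "Transform the input.", "Is this pair real?", data)
print(new_gen)
print(new_disc)
```

In this toy setting the discriminator first adopts the edit that lets it separate real from generated pairs, and the generator then adopts the edit that makes its outputs indistinguishable from the real ones. Only the prompt strings change; the callables standing in for the models are never modified, which mirrors the abstract's point that no model parameters are updated.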