We propose a new method, Adversarial In-Context Learning (adv-ICL), to optimize prompts for in-context learning (ICL) by employing one LLM as a generator, another as a discriminator, and a third as a prompt modifier. As in traditional adversarial learning, adv-ICL is implemented as a two-player game between the generator and the discriminator, in which the generator tries to produce output realistic enough to fool the discriminator. In each round, given an input prefixed by task instructions and several exemplars, the generator produces an output. The discriminator is then tasked with classifying the generator's input-output pair as model-generated or real data. Based on the discriminator loss, the prompt modifier proposes possible edits to the generator and discriminator prompts, and the edits that most improve the adversarial loss are selected. We show that adv-ICL yields significant improvements over state-of-the-art prompt optimization techniques for both open- and closed-source models on 11 generation and classification tasks, including summarization, arithmetic reasoning, machine translation, data-to-text generation, and the MMLU and BIG-Bench Hard benchmarks. In addition, because our method uses pre-trained models and updates only prompts rather than model parameters, it is computationally efficient, easy to extend to any LLM and task, and effective in low-resource settings.
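The round described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the three LLMs are replaced by plain Python callables (`toy_generator`, `toy_discriminator`, `toy_modifier`), the loss is a standard GAN-style objective that the abstract does not spell out, and the order of updates (discriminator first, then generator) is a choice of this sketch.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical stand-ins for the three LLM roles (not from the paper).
Generator = Callable[[str, str], str]              # (prompt, input) -> output
Discriminator = Callable[[str, str, str], float]   # (prompt, input, output) -> P(real)
Modifier = Callable[[str], List[str]]              # prompt -> candidate edited prompts
Dataset = List[Tuple[str, str]]                    # (input, real output) pairs


def adversarial_loss(gen: Generator, disc: Discriminator,
                     gen_prompt: str, disc_prompt: str, data: Dataset) -> float:
    """Assumed GAN-style objective: mean of log D(x, y) + log(1 - D(x, G(x)))."""
    eps = 1e-9  # guards against log(0)
    total = 0.0
    for x, y_real in data:
        y_fake = gen(gen_prompt, x)
        p_real = disc(disc_prompt, x, y_real)
        p_fake = disc(disc_prompt, x, y_fake)
        total += math.log(max(p_real, eps)) + math.log(max(1.0 - p_fake, eps))
    return total / len(data)


def adv_icl_round(gen: Generator, disc: Discriminator, modifier: Modifier,
                  gen_prompt: str, disc_prompt: str, data: Dataset) -> Tuple[str, str]:
    """One round: pick the prompt edits that most improve the adversarial loss."""
    # Discriminator step: keep the candidate that maximizes the loss.
    disc_candidates = [disc_prompt] + modifier(disc_prompt)
    disc_prompt = max(
        disc_candidates,
        key=lambda p: adversarial_loss(gen, disc, gen_prompt, p, data))
    # Generator step: keep the candidate that minimizes the loss.
    gen_candidates = [gen_prompt] + modifier(gen_prompt)
    gen_prompt = min(
        gen_candidates,
        key=lambda p: adversarial_loss(gen, disc, p, disc_prompt, data))
    return gen_prompt, disc_prompt


# Toy, deterministic components so the sketch runs without any model.
def toy_generator(prompt: str, x: str) -> str:
    # Produces the correct (uppercase) answer only when told to be careful.
    return x.upper() if "carefully" in prompt else x


def toy_discriminator(prompt: str, x: str, y: str) -> float:
    # Without the "strict" instruction it cannot tell real from generated.
    if "strict" not in prompt:
        return 0.5
    return 0.9 if y == x.upper() else 0.1


def toy_modifier(prompt: str) -> List[str]:
    # A real prompt modifier would be an LLM proposing paraphrases or edits.
    return [prompt + " Be strict.", prompt + " Answer carefully."]


DATA: Dataset = [("a", "A"), ("b", "B")]

gen_prompt, disc_prompt = adv_icl_round(
    toy_generator, toy_discriminator, toy_modifier,
    "Uppercase the input.", "Is the output real?", DATA)
print(gen_prompt)
print(disc_prompt)
```

Only the two prompt strings change between rounds; the callables standing in for the models are never updated, which mirrors the abstract's point that the method updates prompts rather than model parameters.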