Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers

Large language models (LLMs) have shown remarkable instruction-following capabilities and achieved impressive performance in various applications. However, the performance of an LLM depends heavily on the instructions given to it, which are typically tuned manually with substantial human effort. Recent work has used the query-efficient Bayesian optimization (BO) algorithm to automatically optimize the instructions given to black-box LLMs. However, BO usually falls short when optimizing highly sophisticated (e.g., high-dimensional) objective functions, such as the function mapping an instruction to the performance of an LLM. This is mainly due to the limited expressive power of the Gaussian process (GP), which BO uses as a surrogate to model the objective function. Meanwhile, it has been repeatedly shown that neural networks (NNs), especially pre-trained transformers, possess strong expressive power and can model highly complex functions. We therefore adopt a neural bandit algorithm, which replaces the GP in BO with an NN surrogate, to optimize instructions for black-box LLMs. More importantly, the neural bandit algorithm allows us to naturally couple the NN surrogate with the hidden representation learned by a pre-trained transformer (i.e., an open-source LLM), which significantly boosts its performance. These observations motivate us to propose our INSTruction optimization usIng Neural bandits Coupled with Transformers (INSTINCT) algorithm. We perform instruction optimization for ChatGPT and use extensive experiments to show that INSTINCT consistently outperforms baselines in different tasks, e.g., various instruction induction tasks and the task of improving zero-shot chain-of-thought instructions. Our code is available at https://github.com/xqlin98/INSTINCT.
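
To make the idea of an NN surrogate with an exploration bonus concrete, the following is a minimal sketch of a NeuralUCB-style selection loop over a fixed pool of candidate instructions. It is an illustration under stated assumptions, not the INSTINCT implementation (see the repository above for that): the random unit vectors stand in for hidden representations from an open-source LLM, the synthetic `evaluate` function stands in for the validation score obtained from the black-box LLM, and all names (`neural_ucb`, `fit`, `nu`, `lam`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(dim, hidden):
    # One-hidden-layer MLP surrogate: score(z) = w2 . relu(W1 z)
    return {"W1": rng.normal(0, 1 / np.sqrt(dim), (hidden, dim)),
            "w2": rng.normal(0, 1 / np.sqrt(hidden), hidden)}

def predict(params, z):
    return params["w2"] @ np.maximum(params["W1"] @ z, 0.0)

def grad(params, z):
    # Gradient of the prediction w.r.t. all parameters, flattened.
    pre = params["W1"] @ z
    d_W1 = np.outer(params["w2"] * (pre > 0), z)
    d_w2 = np.maximum(pre, 0.0)
    return np.concatenate([d_W1.ravel(), d_w2])

def fit(params, Z, y, steps=200, lr=0.05):
    # Plain gradient descent on mean squared error over observed pairs.
    for _ in range(steps):
        g_W1 = np.zeros_like(params["W1"])
        g_w2 = np.zeros_like(params["w2"])
        for z, target in zip(Z, y):
            pre = params["W1"] @ z
            h = np.maximum(pre, 0.0)
            err = params["w2"] @ h - target
            g_w2 += err * h
            g_W1 += err * np.outer(params["w2"] * (pre > 0), z)
        params["W1"] -= lr * g_W1 / len(y)
        params["w2"] -= lr * g_w2 / len(y)
    return params

def neural_ucb(embeddings, evaluate, rounds=15, nu=1.0, lam=1.0, hidden=16):
    n, dim = embeddings.shape
    params = init_params(dim, hidden)
    V = lam * np.eye(hidden * dim + hidden)  # regularized gradient covariance
    chosen, scores = [], []
    for _ in range(rounds):
        V_inv = np.linalg.inv(V)
        ucb = []
        for z in embeddings:
            g = grad(params, z)
            bonus = np.sqrt(max(g @ V_inv @ g, 0.0))  # uncertainty term
            ucb.append(predict(params, z) + nu * bonus)
        i = int(np.argmax(ucb))                # most promising candidate
        chosen.append(i)
        scores.append(evaluate(i))             # one black-box query
        g_i = grad(params, embeddings[i])
        V += np.outer(g_i, g_i)
        params = fit(params, embeddings[chosen], np.array(scores))
    best = chosen[int(np.argmax(scores))]
    return best, chosen, scores

# Hypothetical stand-ins: random unit vectors play the role of transformer
# hidden representations, and a synthetic function plays the role of the
# validation score returned by the black-box LLM.
emb = rng.normal(size=(40, 8))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
w_true = rng.normal(size=8)
evaluate = lambda i: float(np.tanh(emb[i] @ w_true))

best, chosen, scores = neural_ucb(emb, evaluate)
print("best candidate:", best, "score:", max(scores))
```

The surrogate here is deliberately tiny and trained with hand-written gradients so that the sketch needs only NumPy. In a realistic setting the candidate features would come from a pre-trained transformer, and each call to `evaluate` would query the black-box LLM, which is why the loop spends exactly one evaluation per round.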
