Large language models (LLMs) are instruction followers, but finding the best instruction for a given situation can be challenging, especially for black-box LLMs, on which backpropagation is unavailable. Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM, which then generates the instruction for the black-box LLM. In each iteration of the proposed method, which we call InstructZero, the open-source LLM converts a soft prompt into an instruction. The instruction is submitted to the black-box LLM for zero-shot evaluation, and the resulting performance is passed to Bayesian optimization, which proposes new soft prompts that improve the zero-shot performance. We evaluate InstructZero on different combinations of open-source LLMs and APIs, including Vicuna and ChatGPT. Our results show that InstructZero outperforms state-of-the-art (SOTA) auto-instruction methods across a variety of downstream tasks. Our code and data are publicly available at https://github.com/Lichang-Chen/InstructZero.
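The loop described above (soft prompt, then instruction, then zero-shot score, then a new soft prompt) can be illustrated with a small, self-contained sketch. It is not the authors' implementation. Several parts are assumptions for illustration only. The two LLM calls are replaced by hypothetical toy functions (`toy_generate_instruction`, `toy_zero_shot_score`) over a made-up list of candidate instructions. The soft prompt is a 2-D vector in [0, 1]^2. The Bayesian optimization step is a plain Gaussian process with an RBF kernel and an upper-confidence-bound (UCB) acquisition maximized over random candidates. That is one common choice, not necessarily the one the paper uses.

```python
import math
import random

# HYPOTHETICAL toy stand-ins: a real run would decode an instruction from the
# soft prompt with an open-source LLM (e.g. Vicuna) and score it by querying a
# black-box API (e.g. ChatGPT) zero-shot on a task's validation examples.
CANDIDATES = [
    "Answer the question.",
    "Translate the word into French.",
    "Give the antonym of the word.",
    "Write the opposite of the given word.",
]
TOY_SCORES = dict(zip(CANDIDATES, [0.10, 0.20, 0.70, 0.90]))


def toy_generate_instruction(soft_prompt):
    # Map a soft prompt in [0, 1]^d to one discrete instruction (toy decoder).
    idx = int(sum(soft_prompt) / len(soft_prompt) * len(CANDIDATES))
    return CANDIDATES[min(idx, len(CANDIDATES) - 1)]


def toy_zero_shot_score(instruction):
    # Pretend zero-shot accuracy of the black-box LLM given this instruction.
    return TOY_SCORES[instruction]


def rbf(a, b, length_scale=0.3):
    # Squared-exponential kernel between two soft-prompt vectors.
    return math.exp(-0.5 * sum((x - z) ** 2 for x, z in zip(a, b)) / length_scale ** 2)


def solve(A, b):
    # Gaussian elimination with partial pivoting; adequate for small systems.
    n = len(b)
    m = [row[:] + [v] for row, v in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gp_ucb(X, y, candidates, beta=2.0, noise=1e-3):
    # Fit a GP to the centred scores and return the candidate soft prompt with
    # the highest upper confidence bound (posterior mean + beta * posterior std).
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    y_mean = sum(y) / n
    alpha = solve(K, [v - y_mean for v in y])

    def ucb(p):
        k = [rbf(x, p) for x in X]
        mean = y_mean + sum(ki * ai for ki, ai in zip(k, alpha))
        # Prior variance rbf(p, p) is 1.0; subtract the explained part.
        var = max(1.0 - sum(ki * wi for ki, wi in zip(k, solve(K, k))), 1e-12)
        return mean + beta * math.sqrt(var)

    return max(candidates, key=ucb)


def instructzero_loop(generate_instruction, zero_shot_score, dim=2,
                      n_init=4, n_iter=10, n_candidates=100, seed=0):
    rng = random.Random(seed)
    X, y, history = [], [], []

    def evaluate(soft_prompt):
        instruction = generate_instruction(soft_prompt)  # open-source LLM step
        score = zero_shot_score(instruction)             # black-box LLM step
        X.append(soft_prompt)
        y.append(score)
        history.append((score, instruction))

    for _ in range(n_init):  # random initial soft prompts
        evaluate([rng.random() for _ in range(dim)])
    for _ in range(n_iter):  # Bayesian-optimization proposals
        cands = [[rng.random() for _ in range(dim)] for _ in range(n_candidates)]
        evaluate(gp_ucb(X, y, cands))
    return max(history)  # (best score, best instruction) seen so far
```

Calling `instructzero_loop(toy_generate_instruction, toy_zero_shot_score)` returns a `(score, instruction)` pair for the best instruction found. The optimizer only ever sees the continuous soft prompt and a scalar score. This mirrors the point of the method: the discrete instruction never has to be optimized directly, and no gradients from the black-box model are needed.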