Large language models (LLMs) are instruction followers, but finding the best instruction for a given situation can be challenging, especially for black-box LLMs on which backpropagation is unavailable. Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM, which generates the instruction for the black-box LLM. In each iteration of the proposed method, which we call InstructZero, the open-source LLM converts a soft prompt into an instruction, the instruction is submitted to the black-box LLM for zero-shot evaluation, and the resulting performance is passed to Bayesian optimization, which produces new soft prompts that improve zero-shot performance. We evaluate InstructZero on different combinations of open-source LLMs and APIs, including Vicuna and ChatGPT. Our results show that InstructZero outperforms state-of-the-art (SOTA) auto-instruction methods across a variety of downstream tasks. Our code and data are publicly available at https://github.com/Lichang-Chen/InstructZero.
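The iteration described above (soft prompt, then instruction, then zero-shot score, then Bayesian optimization) can be illustrated with a minimal, self-contained sketch. It is not the released implementation. The functions `open_source_llm` and `black_box_score` are hypothetical toy stand-ins for the open-source LLM and the black-box API. The Gaussian-process surrogate with an RBF kernel, the upper-confidence-bound acquisition, and every numeric setting are assumptions chosen only to keep the example short.

```python
import math
import random

TARGET = [0.3, -0.5]  # hidden optimum of the toy scoring function (assumption)

def open_source_llm(soft_prompt):
    """Hypothetical stand-in: turn a soft prompt vector into an instruction string."""
    return "instruction(" + ", ".join(f"{x:.3f}" for x in soft_prompt) + ")"

def black_box_score(instruction):
    """Hypothetical stand-in for zero-shot evaluation on the black-box LLM."""
    values = [float(v) for v in instruction[len("instruction("):-1].split(", ")]
    return -sum((v - t) ** 2 for v, t in zip(values, TARGET))

def rbf(a, b, length_scale=0.5):
    """RBF kernel between two soft prompts."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * length_scale ** 2))

def cholesky(m):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / low[j][j]
    return low

def solve_lower(low, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z

def solve_upper_t(low, z):
    """Back substitution: solve L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def propose(prompts, scores, rng, dim, n_candidates=200, beta=2.0, noise=1e-3):
    """Fit a GP to (soft prompt, score) pairs and return the candidate with the best UCB."""
    mean_y = sum(scores) / len(scores)
    y = [s - mean_y for s in scores]  # center scores so a zero prior mean is reasonable
    n = len(prompts)
    gram = [[rbf(prompts[i], prompts[j]) + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    low = cholesky(gram)
    alpha = solve_upper_t(low, solve_lower(low, y))
    best_candidate, best_ucb = None, -math.inf
    for _ in range(n_candidates):
        cand = [rng.uniform(-1, 1) for _ in range(dim)]
        k_star = [rbf(cand, p) for p in prompts]
        mu = sum(a * b for a, b in zip(k_star, alpha))       # posterior mean (centered)
        v = solve_lower(low, k_star)
        var = max(1.0 - sum(x * x for x in v), 0.0)          # posterior variance
        ucb = mu + beta * math.sqrt(var)
        if ucb > best_ucb:
            best_candidate, best_ucb = cand, ucb
    return best_candidate

def instruct_zero(iterations=20, dim=2, n_init=5, seed=0):
    """Toy optimization loop: soft prompt -> instruction -> score -> new soft prompt."""
    rng = random.Random(seed)
    prompts, scores = [], []
    for step in range(n_init + iterations):
        if step < n_init:   # random initial soft prompts
            prompt = [rng.uniform(-1, 1) for _ in range(dim)]
        else:               # Bayesian optimization proposes the next soft prompt
            prompt = propose(prompts, scores, rng, dim)
        instruction = open_source_llm(prompt)        # open-source LLM writes the instruction
        scores.append(black_box_score(instruction))  # black-box LLM is only queried, never differentiated
        prompts.append(prompt)
    best = max(range(len(scores)), key=scores.__getitem__)
    return open_source_llm(prompts[best]), scores[best], scores

if __name__ == "__main__":
    best_instruction, best_score, _ = instruct_zero()
    print(best_instruction, best_score)
```

In this sketch only the score crosses the boundary to the black-box model, which is why no gradient is needed; the surrogate model works entirely in the low-dimensional soft prompt space.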