Enabling large language models (LLMs) to perform tasks in a zero-shot manner has been an appealing goal owing to its labor-saving nature (i.e., it requires no task-specific annotations); zero-shot prompting approaches also enjoy better task generalizability. To improve LLMs' zero-shot performance, prior work has focused on devising more effective task instructions (e.g., "let's think step by step"). However, we argue that, for an LLM to solve individual test instances correctly in zero-shot, each instance needs more carefully designed and customized instructions. To this end, we propose PRoMPTd, an approach that rewrites the task prompt for each individual test input to be more specific, unambiguous, and complete, so as to provide better guidance to the task LLM. We evaluate PRoMPTd on eight datasets covering tasks including arithmetic, logical reasoning, and code generation, using GPT-4 as the task LLM. Notably, PRoMPTd achieves an absolute improvement of around 10% on the complex MATH dataset and around 5% on the HumanEval code generation benchmark, outperforming conventional zero-shot methods. In addition, we show that the rewritten prompt offers better interpretability of how the LLM resolves each test instance, which can potentially be leveraged as a defense mechanism against adversarial prompting. The source code and dataset are available at https://github.com/salokr/PRoMPTd
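To make the two-stage idea concrete (rewrite the prompt per test input, then let the task LLM answer the rewritten prompt), here is a minimal sketch under stated assumptions. It is not the PRoMPTd implementation. The `LLM` type, the `REWRITE_TEMPLATE` wording, and the `rewrite_prompt`/`solve` helpers are all hypothetical, and the abstract does not say whether the rewriter is the same model as the task LLM, so the sketch keeps them as separate pluggable callables.

```python
from typing import Callable

# Hypothetical interface: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]

# Hypothetical rewrite instruction; the wording actually used by PRoMPTd may differ.
REWRITE_TEMPLATE = (
    "Rewrite the following task prompt so that it is specific, unambiguous, "
    "and complete for this particular input. Return only the rewritten prompt.\n\n"
    "Task prompt:\n{task_prompt}\n\nInput:\n{test_input}"
)


def rewrite_prompt(rewriter: LLM, task_prompt: str, test_input: str) -> str:
    """Stage 1: ask the rewriter model for an instance-specific prompt."""
    request = REWRITE_TEMPLATE.format(task_prompt=task_prompt, test_input=test_input)
    return rewriter(request).strip()


def solve(rewriter: LLM, task_llm: LLM, task_prompt: str, test_input: str) -> tuple[str, str]:
    """Stage 2: answer the test input using its rewritten prompt.

    The rewritten prompt is returned alongside the answer so that it can be
    inspected, e.g. to see how the task was interpreted for this instance.
    """
    rewritten = rewrite_prompt(rewriter, task_prompt, test_input)
    answer = task_llm(rewritten)
    return rewritten, answer


# Stub models standing in for real LLM calls (purely illustrative).
def fake_rewriter(prompt: str) -> str:
    return "Compute 12 * 7 and report only the integer result."


def fake_task_llm(prompt: str) -> str:
    return "84" if "12 * 7" in prompt else "unknown"


rewritten, answer = solve(fake_rewriter, fake_task_llm,
                          "Solve the math problem.", "What is 12 times 7?")
print(rewritten)
print(answer)
```

Returning the rewritten prompt together with the answer mirrors the abstract's point that the rewritten prompt can be inspected for interpretability; in practice the stubs would be replaced by calls to real models such as GPT-4.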