Enabling large language models (LLMs) to perform tasks zero-shot has been an appealing goal because it saves labor (i.e., it requires no task-specific annotations); moreover, zero-shot prompting approaches enjoy better task generalizability. To improve LLMs' zero-shot performance, prior work has focused on devising more effective task instructions (e.g., ``let's think step by step''). However, we argue that for an LLM to solve individual test instances correctly in zero-shot, each instance needs more carefully designed and customized instructions. To this end, we propose PRoMPTd, an approach that rewrites the task prompt for each individual test input to be more specific, unambiguous, and complete, so as to provide better guidance to the task LLM. We evaluate PRoMPTd on eight datasets covering tasks including arithmetic, logical reasoning, and code generation, using GPT-4 as the task LLM. Notably, PRoMPTd achieves an absolute improvement of around 10\% on the complex MATH dataset and 5\% on the code generation task on HumanEval, outperforming conventional zero-shot methods. We also show that the rewritten prompt offers better interpretability of how the LLM resolves each test instance, which could potentially be leveraged as a defense mechanism against adversarial prompting. The source code and dataset are available at https://github.com/salokr/PRoMPTd