Large Language Models (LLMs) have demonstrated remarkable abilities across various tasks, leveraging advanced reasoning. Yet they struggle with task-oriented prompts because they lack specific prior knowledge of the task answers. The current state-of-the-art approach, PAL, uses code generation to address this issue. However, PAL depends on manually crafted prompt templates and examples and still produces inaccurate results. In this work, we present TITAN, a novel strategy designed to enhance LLMs' performance on task-oriented prompts. TITAN achieves this by generating scripts with a universal, zero-shot approach. Unlike existing methods, TITAN requires neither detailed task-specific instructions nor extensive manual effort; instead, it draws on the LLM's own analytical and code-generation capabilities in a streamlined process. TITAN employs two key techniques: (1) step-back prompting to extract the task's input specifications and (2) chain-of-thought prompting to identify the required procedural steps. This information is then used to guide the LLM's code generation. TITAN further refines the generated script through post-processing and executes it to retrieve the final answer. Our comprehensive evaluation demonstrates TITAN's effectiveness on a diverse set of tasks. On average, TITAN outperforms the state-of-the-art zero-shot approach by 7.6% and 3.9% when paired with GPT-3.5 and GPT-4, respectively. Overall, without human annotation, TITAN achieves state-of-the-art performance in 8 out of 11 cases and trails few-shot approaches, which require human intervention, by small margins in the remaining three. This work represents a significant advancement in addressing task-oriented prompts, offering a novel solution for effectively using LLMs in everyday tasks.
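The pipeline described above (step-back prompting, chain-of-thought prompting, code generation, post-processing, execution) can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the function names (`solve`, `strip_code_fences`, `run_script`), the `LLM` callable interface, and the canned `fake_llm` stand-in are all assumptions made for illustration, and post-processing is reduced here to stripping a Markdown fence from the model's reply.

```python
import contextlib
import io
import re
from typing import Callable

# Hypothetical interface: any function mapping a prompt to a completion.
LLM = Callable[[str], str]

FENCE = "`" * 3  # Markdown code-fence marker
FENCED_CODE = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)


def strip_code_fences(text: str) -> str:
    """Post-processing (assumed form): keep only the code if the reply is fenced."""
    match = FENCED_CODE.search(text)
    return (match.group(1) if match else text).strip()


def run_script(script: str) -> str:
    """Execute the script and return what it printed.

    exec() is unsafe for untrusted code; a real system would sandbox this step.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(script, {"__name__": "__main__"})
    return buffer.getvalue().strip()


def solve(task: str, llm: LLM) -> str:
    # (1) Step-back prompting: ask about the task's inputs before solving it.
    input_spec = llm(f"Step back. What inputs does this task provide?\nTask: {task}")
    # (2) Chain-of-thought prompting: ask for the procedural steps.
    steps = llm(f"Let's think step by step. List the steps to solve this.\nTask: {task}")
    # (3) Code generation conditioned on both analyses.
    raw = llm(
        "Write a Python script that prints the final answer.\n"
        f"Task: {task}\nInputs: {input_spec}\nSteps: {steps}"
    )
    # (4) Post-process the reply, then execute the script to obtain the answer.
    return run_script(strip_code_fences(raw))


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model so the sketch runs offline."""
    if prompt.startswith("Write a Python script"):
        code = "words = 'the quick brown fox'.split()\nprint(len(words))\n"
        return FENCE + "python\n" + code + FENCE
    return "(canned analysis)"


print(solve("Count the words in 'the quick brown fox'.", fake_llm))
```

Replacing `fake_llm` with a call to an actual model would leave `solve` unchanged. The point of the structure is that the answer comes from executing the script, not from the model's free-text reply.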