Prompt engineering plays a critical role in adapting large language models (LLMs) to complex reasoning and labeling tasks without extensive fine-tuning. In this paper, we propose a novel prompt optimization pipeline for frame detection in logistics texts, combining retrieval-augmented generation (RAG), few-shot prompting, chain-of-thought (CoT) reasoning, and automatic CoT synthesis (Auto-CoT) to generate highly effective task-specific prompts. Central to our approach is an LLM-based prompt optimizer agent that iteratively refines prompts using retrieved examples, performance feedback, and internal self-evaluation. Our framework is evaluated on a real-world logistics text annotation task, where reasoning accuracy and labeling efficiency are critical. Experimental results show that the optimized prompts, particularly those enhanced via Auto-CoT and RAG, improve real-world inference accuracy by up to 15% over zero-shot and static-prompt baselines. The system demonstrates consistent improvements across multiple LLMs, including GPT-4o, Qwen 2.5 (72B), and LLaMA 3.1 (70B), validating its generalizability and practical value. These findings suggest that structured prompt optimization is a viable alternative to full fine-tuning, offering a scalable path for deploying LLMs in domain-specific NLP applications such as logistics.
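To make the described pipeline concrete, the following is a minimal sketch of the iterative prompt-optimization loop: retrieve similar labeled examples (a stand-in for RAG), assemble a few-shot CoT prompt, score each candidate instruction on validation data, and keep the best variant. All names here (`retrieve_examples`, `build_prompt`, `optimize_prompt`, the toy word-overlap retriever, and the external `score_fn`) are illustrative assumptions, not the paper's actual implementation, which relies on an LLM-based optimizer agent rather than a fixed candidate list.

```python
def retrieve_examples(query, pool, k=2):
    """Toy retrieval: rank pool items by word overlap with the query.
    A real RAG setup would use embedding similarity instead."""
    def overlap(ex):
        return len(set(query.lower().split()) & set(ex["text"].lower().split()))
    return sorted(pool, key=overlap, reverse=True)[:k]

def build_prompt(instruction, examples):
    """Assemble a few-shot CoT prompt from retrieved examples."""
    shots = "\n\n".join(
        f"Text: {ex['text']}\nReasoning: {ex['cot']}\nFrame: {ex['label']}"
        for ex in examples
    )
    return f"{instruction}\n\n{shots}\n\nText: {{input}}\nReasoning:"

def optimize_prompt(instructions, pool, validation, score_fn):
    """Evaluate each candidate instruction on validation data and
    return the best-scoring prompt. In the paper's framework, new
    candidates would come from an LLM optimizer agent using feedback."""
    best_prompt, best_score = None, float("-inf")
    for instr in instructions:
        examples = retrieve_examples(validation[0]["text"], pool)
        prompt = build_prompt(instr, examples)
        score = score_fn(prompt, validation)
        if score > best_score:
            best_prompt, best_score = prompt, score
    return best_prompt, best_score
```

A usage sketch: given a small pool of annotated logistics texts with CoT rationales and a scoring function that runs the prompt against a validation set, `optimize_prompt` performs one selection round; iterating this with LLM-generated instruction rewrites approximates the optimizer-agent loop described above.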