Prompt Alchemy: Automatic Prompt Refinement for Enhancing Code Generation

Code generation has emerged as a key task for automating software development by converting high-level descriptions into executable code. Large language models (LLMs) excel at this task but depend heavily on the quality of the input prompt. Manual prompt engineering can be time-consuming and inconsistent, limiting LLM effectiveness. This paper introduces Prochemy, an innovative method for automatically refining prompts to boost code generation. Prochemy overcomes the limitations of manual prompts by automating optimization, ensuring consistency during inference, and supporting multi-agent systems. It iteratively refines prompts based on model performance, then uses the optimized final prompt for improved consistency across tasks. We tested Prochemy on natural language-based code generation and code translation tasks using three LLM series. Results indicate that Prochemy enhances existing methods, improving performance by 5.0% for GPT-3.5-Turbo and 1.9% for GPT-4o over zero-shot baselines on HumanEval. Combined with the state-of-the-art LDB, Prochemy + LDB surpasses standalone methods by 1.2-1.8%. For code translation, Prochemy boosts GPT-4o's Java-to-Python (AVATAR) performance from 74.5 to 84.1 (+12.9% relative) and Python-to-Java from 66.8 to 78.2 (+17.1% relative). Moreover, Prochemy maintains strong performance when integrated with the o1-mini model, validating its efficacy in code tasks. Designed as plug-and-play, Prochemy optimizes prompts with minimal human input, bridging the gap between simple prompts and complex frameworks.

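The abstract describes refining a prompt iteratively based on model performance and then reusing the single best prompt at inference time. The following is a minimal sketch of that general idea, not Prochemy's actual implementation: the function `refine_prompt`, the toy `toy_mutate` and `toy_score` helpers, and the `HINTS` list are all hypothetical stand-ins. In a real setting, the mutation step would presumably call an LLM to rewrite the prompt, and the score would be a measured pass rate on a small set of training tasks.

```python
import random
from typing import Callable, Tuple

# Hypothetical prompt fragments; a real system would generate rewrites with an LLM.
HINTS = ["Think step by step.", "Handle edge cases.", "Return only code."]


def toy_mutate(prompt: str, rng: random.Random) -> str:
    """Stand-in for an LLM-driven rewrite: append one randomly chosen hint."""
    return prompt + " " + rng.choice(HINTS)


def toy_score(prompt: str) -> float:
    """Stand-in for a pass rate on training tasks: fraction of hints present."""
    return sum(hint in prompt for hint in HINTS) / len(HINTS)


def refine_prompt(
    seed_prompt: str,
    mutate: Callable[[str], str],
    score: Callable[[str], float],
    rounds: int = 3,
    candidates_per_round: int = 4,
) -> Tuple[str, float]:
    """Greedy iterative refinement: keep the best-scoring prompt seen so far."""
    best_prompt, best_score = seed_prompt, score(seed_prompt)
    for _ in range(rounds):
        # Propose several variants of the current best prompt.
        pool = [mutate(best_prompt) for _ in range(candidates_per_round)]
        for candidate in pool:
            candidate_score = score(candidate)
            # Accept a candidate only if it strictly improves the score.
            if candidate_score > best_score:
                best_prompt, best_score = candidate, candidate_score
    return best_prompt, best_score


def run_demo(seed: int = 0) -> Tuple[str, float]:
    """Run the loop with the toy helpers and a seeded random generator."""
    rng = random.Random(seed)
    return refine_prompt(
        "Write a Python function for the task.",
        mutate=lambda p: toy_mutate(p, rng),
        score=toy_score,
    )
```

Calling `run_demo()` returns the refined prompt and its toy score. Because a candidate is accepted only when it strictly improves the score, the returned score can never fall below that of the seed prompt, which mirrors the plug-and-play use described above: the refinement runs once, and the final prompt is then fixed for inference.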