Large language models (LLMs) achieve strong performance across a wide range of tasks but are highly sensitive to prompt design, which motivates automatic prompt optimization. Existing methods focus predominantly on performance alone and ignore competing objectives such as inference cost or latency. Meanwhile, existing work on multi-objective prompt optimization relies on off-the-shelf NSGA-II and neglects optimization efficiency. As a remedy, we introduce MO-CAPO, a novel multi-objective prompt optimization algorithm that jointly optimizes performance and inference cost while using budget allocation to keep the optimization itself cost-efficient. We further propose a deployment-oriented cost objective that captures the full computational profile of LLM inference. We evaluate our approach on four tasks and three LLMs and compare it with an NSGA-II-based multi-objective method and with state-of-the-art single-objective prompt optimizers. Results show that MO-CAPO consistently identifies strong, robust, and diverse Pareto front approximations while remaining cost-efficient. It outperforms the NSGA-II baseline in 8 of 12 cases in terms of the noisy R2 metric and often achieves competitive performance at a considerably lower budget. The discovered solution sets span diverse performance-cost trade-offs that single-objective optimizers miss, yet their top-performing candidates remain competitive with single-objective solutions. Additionally, we conduct the first evaluation of multi-objective machine learning experiments that accounts for generalization and robustness through the noisy R2 metric and the approximation gap, enabling a more realistic assessment of solution quality. MO-CAPO lets practitioners choose from an efficiently discovered set of prompts that offer different trade-offs between performance and cost.