Prompt agents have recently emerged as a promising paradigm for automated prompt optimization, framing prompt refinement as a sequential decision-making problem over a structured prompt space. While this formulation enables the use of advanced planning algorithms, existing methods typically assume access to supervised reward signals, which are often unavailable in practice. In this work, we propose UPA, an Unsupervised Prompt Agent that performs structured search and selection without relying on supervised feedback. During search, UPA iteratively builds an evolving tree to navigate the prompt space, guided by fine-grained, order-invariant pairwise comparisons from Large Language Models (LLMs). Because these local comparisons do not inherently yield a consistent global scale, we decouple systematic prompt exploration from final selection and introduce a two-stage framework grounded in the Bradley-Terry-Luce (BTL) model. The first stage performs path-wise Bayesian aggregation of local comparisons to filter candidates under uncertainty. The second runs global tournament-style comparisons to infer latent prompt quality and identify the best candidate prompt. Experiments across multiple tasks show that UPA consistently outperforms existing prompt optimization methods, demonstrating that agent-style optimization remains highly effective even in fully unsupervised settings.
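To make the final selection stage more concrete, the following is a minimal Python sketch of tournament-style selection under the BTL model. In this model each prompt i has a latent strength π_i, and the probability that i beats j is π_i / (π_i + π_j). The sketch assumes a hypothetical `judge(a, b)` callable that returns an LLM's probability that prompt `a` is better than prompt `b`. Order invariance is approximated by querying both presentation orders and averaging. Strengths are fitted with the standard minorization-maximization (MM) updates for BTL. All names here (`judge`, `symmetric_preference`, `tournament_wins`, `fit_btl`, `select_best`) are illustrative assumptions, not UPA's actual implementation, and the first-stage path-wise Bayesian filtering is omitted.

```python
from itertools import combinations
from typing import Callable, List, Sequence

# Hypothetical LLM judge: returns P(prompt a is better than prompt b), in [0, 1].
Judge = Callable[[str, str], float]


def symmetric_preference(judge: Judge, a: str, b: str) -> float:
    """Query both orders and average, cancelling first-position bias."""
    return 0.5 * (judge(a, b) + (1.0 - judge(b, a)))


def tournament_wins(prompts: Sequence[str], judge: Judge) -> List[List[float]]:
    """Round-robin tournament; wins[i][j] is the (soft) number of times i beat j."""
    n = len(prompts)
    wins = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        p = symmetric_preference(judge, prompts[i], prompts[j])
        wins[i][j] += p
        wins[j][i] += 1.0 - p
    return wins


def fit_btl(wins: List[List[float]], iters: int = 500, tol: float = 1e-10) -> List[float]:
    """Fit BTL strengths with MM updates: pi_i <- W_i / sum_j n_ij / (pi_i + pi_j)."""
    n = len(wins)
    pi = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            w_i = sum(wins[i][j] for j in range(n) if j != i)  # total wins of i
            denom = sum(
                (wins[i][j] + wins[j][i]) / (pi[i] + pi[j])  # n_ij / (pi_i + pi_j)
                for j in range(n)
                if j != i
            )
            new.append(w_i / denom if denom > 0 else pi[i])
        total = sum(new)
        new = [x * n / total for x in new]  # fix the scale: strengths sum to n
        converged = max(abs(a - b) for a, b in zip(new, pi)) < tol
        pi = new
        if converged:
            break
    return pi


def select_best(prompts: Sequence[str], judge: Judge) -> str:
    """Return the prompt with the highest fitted latent strength."""
    pi = fit_btl(tournament_wins(prompts, judge))
    return prompts[max(range(len(prompts)), key=lambda i: pi[i])]
```

For a quick check, a toy judge derived from hidden scores `q = {"p0": 1.0, "p1": 2.0, "p2": 4.0}` via `q[a] / (q[a] + q[b])` should yield fitted strengths in roughly a 1:2:4 ratio, with `select_best` returning `"p2"`. Fitting a global model over the whole tournament, rather than simply counting wins, is what gives the local comparisons a shared scale.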