When designing compound AI systems, a common approach is to query multiple copies of the same model and aggregate the responses to produce a synthesized output. Given the homogeneity of these models, this raises the question of whether aggregation unlocks access to a greater set of outputs than querying a single model. In this work, we investigate the power and limitations of aggregation within a stylized principal-agent framework. This framework models how the system designer can partially steer each agent's output through its reward function specification, but remains constrained by the designer's prompt engineering ability and by the underlying model's capabilities. Our analysis uncovers three natural mechanisms -- feasibility expansion, support expansion, and binding set contraction -- through which aggregation expands the set of outputs that are elicitable by the system designer. We prove that any aggregation operation must implement one of these mechanisms in order to be elicitability-expanding, and that strengthened versions of these mechanisms provide necessary and sufficient conditions that fully characterize elicitability expansion. Finally, we provide an empirical illustration of our findings for LLMs deployed in a toy reference-generation task. Altogether, our results take a step towards characterizing when compound AI systems can overcome limitations in model capabilities and in prompt engineering.