Organizations increasingly deploy multiple AI systems across task domains, but selecting a small, high-performing ensemble can require costly model calls, benchmark runs, and human evaluation. We study this selection problem as a distributional variant of multiwinner voting: tasks are drawn from an unknown domain distribution, each task induces feedback over candidate experts, and a committee's value on a task is determined by its best-performing member. We analyze both binary feedback, for tasks with correct/incorrect outcomes, and pairwise feedback, for tasks where candidate outputs are compared by preference. In the binary setting, the induced objective is coverage. We give exhaustive-elicitation baselines and matching worst-case query lower bounds, and we design a failure-conditioned greedy algorithm that preserves the standard $(1-1/e)$ guarantee while obtaining instance-dependent query savings. In the pairwise setting, we study $\theta$-winning committees. We show that full-information optimization admits a PTAS but no EPTAS under Gap-ETH, and that the objective is monotone but not submodular. This motivates a weighted ordinal coverage relaxation, which is submodular and supports a failure-conditioned greedy oracle under pairwise feedback. We then convert this oracle back into $\theta$-type guarantees through finite-family auditing or a minimax wrapper. We also provide small-scale LLM experiments illustrating the predicted query savings and the role of complementarity in committee selection.
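To make the binary-feedback idea concrete, the following Python sketch illustrates one way a failure-conditioned greedy selection could work. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: it assumes noise-free correct/incorrect feedback, one fresh batch of sampled tasks per greedy round, and no particular sample-size rule. The names `failure_conditioned_greedy`, `batches`, and `is_correct` are hypothetical. The point it shows is that a candidate's marginal coverage gain can only come from tasks the current committee fails, so candidates need not be queried on tasks a committee member already solves.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def failure_conditioned_greedy(
    candidates: Sequence[Hashable],
    batches: Sequence[Sequence[Hashable]],
    is_correct: Callable[[Hashable, Hashable], bool],
) -> Tuple[List[Hashable], int]:
    """Run one greedy round per task batch (hypothetical interface).

    Returns the selected committee and the number of binary feedback
    queries (calls to is_correct) that were spent.
    """
    committee: List[Hashable] = []
    queries = 0
    for batch in batches:
        remaining = [c for c in candidates if c not in committee]
        if not remaining:
            break
        gains = {c: 0 for c in remaining}
        for task in batch:
            # Query committee members first and stop at the first success.
            covered = False
            for member in committee:
                queries += 1
                if is_correct(member, task):
                    covered = True
                    break
            if covered:
                continue  # no candidate can add coverage on this task
            # The committee fails here, so every remaining candidate is queried.
            for c in remaining:
                queries += 1
                if is_correct(c, task):
                    gains[c] += 1
        best = max(remaining, key=lambda c: gains[c])
        if gains[best] == 0:
            break  # no candidate improved coverage on this batch
        committee.append(best)
    return committee, queries


# Toy instance (made up for illustration): which tasks each expert solves.
solves = {"a": {0, 1, 2, 3}, "b": {0, 1, 2}, "c": {4, 5}}
committee, queries = failure_conditioned_greedy(
    candidates=["a", "b", "c"],
    batches=[range(6), range(6)],
    is_correct=lambda expert, task: task in solves[expert],
)
print(committee, queries)
```

On this toy instance the sketch picks the complementary pair `a` and `c` instead of the two individually strongest experts `a` and `b`, and it spends 28 queries where querying every expert on every task in both batches would spend 36. The size of the saving depends on how much of each batch the current committee already covers, which is the sense in which it is instance-dependent.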