Deploying multiple large language models (LLMs) in parallel to classify an unknown ground-truth label is common practice, yet the problem of optimally allocating queries across heterogeneous models remains poorly understood. In this paper, we formulate a robust, offline query-planning problem that minimizes total query cost subject to statewise error constraints, which guarantee reliability for every possible ground-truth label. We first establish that this problem is NP-hard via a reduction from the minimum-weight set cover problem. To overcome this intractability, we develop a surrogate that combines a union-bound decomposition of the multi-class error into pairwise comparisons with Chernoff-type concentration bounds. The resulting surrogate admits a closed-form expression that is multiplicatively separable in the query counts, and it is feasibility-preserving: every query plan that satisfies the surrogate constraints also satisfies the original error constraints. We further show that the surrogate is asymptotically tight at the optimization level: the ratio of the surrogate-optimal cost to the true optimal cost converges to one as the error tolerances shrink, with an explicit rate of $O\left(\log\log(1/\alpha_{\min}) / \log(1/\alpha_{\min})\right)$. Finally, we design an asymptotic fully polynomial-time approximation scheme (AFPTAS) that returns a surrogate-feasible query plan whose cost is within a $(1+\varepsilon)$ factor of the surrogate optimum.
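To make the shape of such a surrogate concrete, the following is a minimal sketch of one standard way to combine a union bound with a Chernoff-type bound; it is not taken from the paper and its exact surrogate may differ. It assumes that each model's response distribution under each label is known, that responses are conditionally independent given the label, and that labels are decided by maximum likelihood. The Chernoff exponent is fixed at 1/2, which gives the Bhattacharyya coefficient. All names (`resp`, `counts`, `alpha`, `costs`, and the functions) are hypothetical.

```python
import math


def bhattacharyya(p, q):
    """Bhattacharyya coefficient sum_k sqrt(p_k * q_k) of two discrete distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def surrogate_error(resp, counts, y):
    """Union-plus-Chernoff upper bound on the error when the true label is y.

    resp[m][label] is the (assumed known) response distribution of model m
    under that label; counts[m] is the number of queries sent to model m.
    """
    num_labels = len(resp[0])
    total = 0.0
    for other in range(num_labels):
        if other == y:
            continue
        # Pairwise bound: one factor per query, so the bound is a product
        # over models of (coefficient ** query count).
        pair = 1.0
        for dists, n in zip(resp, counts):
            pair *= bhattacharyya(dists[y], dists[other]) ** n
        # Union bound: sum the pairwise bounds over all competing labels.
        total += pair
    return total


def is_surrogate_feasible(resp, counts, alpha):
    """Check the bound against the tolerance alpha[y] for every label y."""
    return all(
        surrogate_error(resp, counts, y) <= alpha[y] for y in range(len(alpha))
    )


def plan_cost(costs, counts):
    """Total cost of a plan: per-query cost times query count, summed over models."""
    return sum(c * n for c, n in zip(costs, counts))
```

Because the bound only overestimates the true error under these assumptions, any plan accepted by `is_surrogate_feasible` also meets the per-label tolerances, which is the sense in which a surrogate of this kind preserves feasibility. Searching for the cheapest such plan is a separate step that this sketch does not attempt.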