Regret Bounds for Competitive Resource Allocation with Endogenous Costs

From arXiv. This is Paper 7 in a 9-paper series on Super-Alignment via Wuxing Institutional Architecture. The series explores resource competition and institutional design for human-aligned AI systems.

We study online resource allocation among N interacting modules over T rounds. Unlike standard online optimization, costs are endogenous: they depend on the full allocation vector through an interaction matrix W that encodes pairwise cooperation and competition. We analyze three paradigms: (I) uniform allocation (cost-ignorant), (II) gated allocation (cost-estimating), and (III) competitive allocation via multiplicative weights update with interaction feedback (cost-revealing). Our main results establish a strict separation under adversarial sequences with bounded variation: uniform allocation incurs Omega(T) regret, gated allocation achieves O(T^{2/3}), and competitive allocation achieves O(sqrt(T log N)). The performance gap stems from competitive allocation's ability to exploit the endogenous cost information revealed through interactions. We further show that the topology of W governs a computation-regret tradeoff. Full interaction (|E|=O(N^2)) yields the tightest bound but the highest per-step cost, whereas sparse topologies (|E|=O(N)) increase regret by at most O(sqrt(log N)) and reduce per-step cost from O(N^2) to O(N). Ring-structured topologies with both cooperative and competitive links (of which the five-element Wuxing topology is canonical) minimize the computation × regret product. These results provide the first formal regret-theoretic justification for decentralized competitive allocation in modular architectures and establish cost endogeneity as a fundamental challenge distinct from partial observability.

Keywords: online learning, regret bounds, resource allocation, endogenous costs, interaction topology, multiplicative weights, modular systems, Wuxing topology
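To make paradigm (III) concrete, here is a minimal Python sketch of a multiplicative weights update driven by endogenous costs on a five-module ring. It is an illustration under stated assumptions, not the paper's algorithm: the linear cost model c_i(x) = base_i + sum_j W[i][j] x_j, the clipping to [0, 1], the sign convention (negative entries for cooperative links, positive for competitive ones), the link weights, and the names `wuxing_ring`, `endogenous_costs`, and `mwu_allocate` are all hypothetical. The ring places a cooperative link from each module to its successor and a competitive link to the module two steps ahead, mirroring the traditional Wuxing generating and overcoming cycles; the step size eta = sqrt(ln N / T) is the standard choice for multiplicative weights.

```python
import math

def wuxing_ring(n=5, coop=-0.2, comp=0.2):
    """Hypothetical interaction matrix with 2n directed links (|E| = O(N)).

    Module i cooperates with i+1 (negative entry lowers its cost) and
    competes with i+2 (positive entry raises its cost), indices mod n.
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i + 1) % n] = coop
        W[i][(i + 2) % n] = comp
    return W

def endogenous_costs(base, W, x):
    """Assumed cost model: c_i(x) = base_i + sum_j W[i][j] * x[j], clipped to [0, 1]."""
    n = len(x)
    raw = [base[i] + sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [min(1.0, max(0.0, c)) for c in raw]

def mwu_allocate(base_seq, W):
    """Run multiplicative weights; costs are revealed only after allocating."""
    T, n = len(base_seq), len(W)
    eta = math.sqrt(math.log(n) / T)  # standard MWU step size
    weights = [1.0] * n
    history, total_cost = [], 0.0
    for base in base_seq:
        s = sum(weights)
        x = [w / s for w in weights]          # allocation on the simplex
        c = endogenous_costs(base, W, x)      # costs depend on the full vector x
        total_cost += sum(xi * ci for xi, ci in zip(x, c))
        weights = [w * math.exp(-eta * ci) for w, ci in zip(weights, c)]
        history.append(x)
    return history, total_cost

# Usage: module 0 has the lowest base cost in every round.
W = wuxing_ring()
history, total_cost = mwu_allocate([[0.2, 0.5, 0.5, 0.5, 0.5]] * 200, W)
final_allocation = history[-1]
```

In this toy run the allocation starts uniform and shifts toward module 0, because its cost stays below the others' even after the interaction terms are added. Each round touches only the nonzero entries of W in principle, which is where the O(N) versus O(N^2) per-step cost distinction in the abstract comes from; the dense double loop above is kept only for readability.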
