We study the \emph{Submodular Welfare Problem} (SWP), in which items are partitioned among agents with monotone submodular utilities so as to maximize total welfare, under \emph{bandit feedback}. Classical SWP assumes full value-oracle access, under which continuous-greedy algorithms achieve a $(1-1/e)$-approximation. We extend this to a \emph{multi-agent combinatorial bandit} framework (\textsc{MA-CMAB}), where actions are partitions of the items, feedback is full-bandit, and agents do not communicate. Unlike prior single-agent or separable multi-agent CMAB models, our setting couples agents through shared allocation constraints. We propose an explore-then-commit strategy with randomized assignments that achieves $\tilde{\mathcal{O}}(T^{2/3})$ regret against a $(1-1/e)$-approximate benchmark, the first such guarantee for the partition-based submodular welfare problem under bandit feedback.
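
For concreteness, the regret notion can be sketched as follows. The notation is assumed here for illustration and is not fixed by the text above: $n$ agents with utilities $f_1,\dots,f_n$, a partition $(S_{1,t},\dots,S_{n,t})$ played in round $t$, and $\alpha = 1-1/e$. Under these assumptions, the standard $\alpha$-regret reads
\[
  \mathcal{R}_{\alpha}(T) \;=\; \alpha\, T \max_{(S_1,\dots,S_n)} \sum_{i=1}^{n} f_i(S_i) \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} \sum_{i=1}^{n} f_i(S_{i,t})\right],
\]
where the maximum ranges over partitions of the item set. The $T^{2/3}$ rate is consistent with the usual explore-then-commit heuristic, stated here up to problem-dependent factors and not as the exact bound of this work: with $m$ exploration rounds (a hypothetical parameter),
\[
  \mathcal{R}_{\alpha}(T) \;\lesssim\; m \;+\; T\sqrt{\frac{\log T}{m}},
  \qquad
  m \asymp T^{2/3}(\log T)^{1/3}
  \;\Longrightarrow\;
  \mathcal{R}_{\alpha}(T) = \tilde{\mathcal{O}}\!\left(T^{2/3}\right).
\]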