Sustainable Multi-Agent Crowdsourcing via Physics-Informed Bandits

Crowdsourcing platforms face a four-way tension between allocation quality, workforce sustainability, operational feasibility, and strategic contractor behaviour, which we formalise as the Cold-Start, Burnout, Utilisation, and Strategic Agency Dilemma. Existing methods resolve at most two of these tensions simultaneously: greedy heuristics and multi-criteria decision making (MCDM) methods achieve Day-1 quality but cause catastrophic burnout, while bandit algorithms eliminate burnout only through operationally infeasible 100% workforce utilisation. To address this, we introduce FORGE, a physics-grounded $K+1$ multi-agent simulator in which each contractor is a rational agent that declares its own load-acceptance threshold based on its fatigue state, converting the standard passive Restless Multi-Armed Bandit (RMAB) into a genuine Stackelberg game. Operating within FORGE, we propose a Neural-Linear UCB allocator that fuses a Two-Tower embedding network with a Physics-Informed Covariance Prior derived from offline simulator interactions. The prior warm-starts both the skill-cluster geometry and the UCB exploration landscape, providing a geometry-aware belief state from episode 1 that measurably reduces cold-start regret. Over $T = 200$ cold-start episodes, the proposed method achieves the highest reward of all non-oracle methods ($\text{LRew} = 0.555 \pm 0.041$) at only 7.6% workforce utilisation, a combination no conventional baseline achieves, while remaining robust to workforce turnover of up to 50% and observation noise of up to $\sigma = 0.20$.
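
To make the warm-start idea concrete, the following is a minimal sketch of a linear UCB head whose belief starts from a supplied prior covariance instead of an uninformative identity matrix. It is a generic Bayesian linear bandit written under stated assumptions, not the allocator described above: the class name `PriorLinUCB`, the parameters `alpha` and `noise_var`, and the feature vectors (which stand in for the output of an embedding network) are all hypothetical, and `prior_cov` is a placeholder for whatever covariance an offline simulator would supply.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in M]


class PriorLinUCB:
    """Linear UCB whose posterior is initialised from a prior covariance.

    Hypothetical illustration: prior_cov is assumed symmetric positive
    semi-definite, and the prior mean of the reward weights is zero.
    """

    def __init__(self, prior_cov, alpha=1.0, noise_var=1.0):
        self.cov = [row[:] for row in prior_cov]  # posterior covariance
        self.theta = [0.0] * len(prior_cov)       # posterior mean
        self.alpha = alpha                        # exploration weight
        self.noise_var = noise_var                # assumed reward noise

    def score(self, x):
        """Upper confidence bound: predicted reward plus scaled uncertainty."""
        mean = dot(self.theta, x)
        width = math.sqrt(max(dot(x, matvec(self.cov, x)), 0.0))
        return mean + self.alpha * width

    def select(self, features):
        """Index of the candidate with the largest UCB score."""
        scores = [self.score(x) for x in features]
        return max(range(len(features)), key=scores.__getitem__)

    def update(self, x, reward):
        """Rank-one Bayesian linear regression update for one observation."""
        cx = matvec(self.cov, x)
        denom = self.noise_var + dot(x, cx)
        err = reward - dot(self.theta, x)
        self.theta = [t + c * err / denom for t, c in zip(self.theta, cx)]
        self.cov = [
            [s - ci * cj / denom for s, cj in zip(row, cx)]
            for row, ci in zip(self.cov, cx)
        ]
```

In this sketch, directions in which the prior covariance is small receive a narrow confidence width from the first episode, so exploration concentrates on directions the prior marks as uncertain. That is one way a prior can shape exploration before any online data arrives.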
