In many real-world settings, a centralized decision-maker must repeatedly allocate finite resources to a population over multiple time steps. Individuals who receive a resource derive some stochastic utility; to characterize the population-level effects of an allocation, the expected individual utilities are then aggregated using a social welfare function (SWF). We formalize this setting and present a general confidence sequence framework for SWF-based online learning and inference, valid for any monotonic, concave, and Lipschitz-continuous SWF. Our key insight is that monotonicity alone suffices to lift confidence sequences on individual utilities to anytime-valid bounds on optimal welfare. Building on this foundation, we propose SWF-UCB, an SWF-agnostic online learning algorithm that achieves near-optimal $\tilde{O}(n+\sqrt{nkT})$ regret (for $k$ resources distributed among $n$ individuals at each of $T$ time steps). We instantiate our framework on three normatively distinct SWF families (Weighted Power Mean, Kolm, and Gini) and provide a bespoke oracle algorithm for each. Experiments confirm the $\sqrt{T}$ scaling of regret and reveal rich interactions between $k$ and the SWF parameters. The framework also naturally supports inference applications such as sequential hypothesis testing, optimal stopping, and policy evaluation.
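The monotonicity insight can be illustrated with a small Python sketch. It is not the SWF-UCB algorithm or any of the oracle algorithms from this work. It assumes a simplified model in which an individual without a resource has utility 0, uses an unweighted power mean with $0 < p \le 1$ as the SWF, and finds the best allocation by brute force. The names `power_mean` and `best_welfare` and all numeric values are hypothetical.

```python
from itertools import combinations


def power_mean(utilities, p=0.5):
    """Unweighted power mean; for 0 < p <= 1 it is monotone and concave on u >= 0."""
    return (sum(u ** p for u in utilities) / len(utilities)) ** (1.0 / p)


def best_welfare(values, k, swf=power_mean):
    """Brute-force maximum welfare over all ways to give k resources to n people.

    Assumption (not from the source): a person without a resource has utility 0.
    """
    n = len(values)
    best = float("-inf")
    for chosen in combinations(range(n), k):
        utilities = [values[i] if i in chosen else 0.0 for i in range(n)]
        best = max(best, swf(utilities))
    return best


# Hypothetical true mean utilities and coordinate-wise bounds around them,
# standing in for one time step of a confidence sequence.
mu = [0.2, 0.5, 0.7, 0.4]
lower = [0.1, 0.3, 0.6, 0.2]
upper = [0.4, 0.6, 0.9, 0.7]
k = 2

# Monotonicity: per-individual bounds carry over to bounds on optimal welfare.
w_lo, w_true, w_hi = (best_welfare(v, k) for v in (lower, mu, upper))
print(w_lo <= w_true <= w_hi)
```

Because the SWF is monotone, every fixed allocation has welfare at the lower bounds no larger than welfare at the true means, which is no larger than welfare at the upper bounds. Taking the maximum over allocations preserves this ordering, so the interval `[w_lo, w_hi]` contains the optimal welfare whenever the individual intervals contain the true means.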