Nash regret has recently emerged as a principled fairness-aware performance metric for stochastic multi-armed bandits, motivated by the Nash Social Welfare objective. Although this notion has been extended to linear bandits, existing results are suboptimal in the ambient dimension $d$, a gap stemming from proof techniques that rely on restrictive concentration inequalities. In this work, we resolve this open problem by introducing new analytical tools that yield an order-optimal Nash regret bound for linear bandits. Beyond Nash regret, we initiate the study of $p$-means regret in linear bandits, a unifying framework that interpolates between fairness and utility objectives and strictly generalizes Nash regret. We propose a generic algorithmic framework, FairLinBandit, which operates as a meta-algorithm on top of any linear bandit strategy. We instantiate this framework with two bandit algorithms, Phased Elimination and Upper Confidence Bound, and prove that both achieve sublinear $p$-means regret over the entire range of $p$. Extensive experiments on linear bandit instances generated from real-world datasets demonstrate that our methods consistently outperform the existing state-of-the-art baseline.
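To make the interpolation claim concrete, the following is a sketch of the standard $p$-mean welfare definitions underlying this framework; the notation ($\mu_{a_t}$ for the mean reward of the arm pulled at round $t$, $\mu^*$ for the optimal mean) is illustrative and may differ from the paper's.

```latex
% Generalized (power) mean of the expected rewards collected over T rounds,
% for a parameter p <= 1:
\[
  \mathrm{M}_p\bigl(\mu_{a_1},\dots,\mu_{a_T}\bigr)
    = \Bigl(\tfrac{1}{T}\sum_{t=1}^{T} \mu_{a_t}^{\,p}\Bigr)^{1/p}.
\]
% Limiting cases: p = 1 gives the arithmetic mean (utilitarian welfare);
% p -> 0 gives the geometric mean, i.e. the Nash Social Welfare objective;
% p -> -infinity gives the minimum (egalitarian welfare):
\[
  \lim_{p \to 0} \mathrm{M}_p = \Bigl(\prod_{t=1}^{T} \mu_{a_t}\Bigr)^{1/T},
  \qquad
  \lim_{p \to -\infty} \mathrm{M}_p = \min_{t} \mu_{a_t}.
\]
% The p-means regret compares this welfare against the optimal mean reward,
% so p -> 0 recovers Nash regret as a special case:
\[
  \mathrm{R}_p(T) \;=\; \mu^{*} \;-\; \mathrm{M}_p\bigl(\mu_{a_1},\dots,\mu_{a_T}\bigr).
\]
```

Varying $p$ thus trades off total utility ($p = 1$) against fairness across rounds ($p \to -\infty$), with Nash regret sitting at the $p \to 0$ point.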