To enhance the efficiency and practicality of federated bandit learning, recent work has introduced incentive mechanisms that motivate clients to communicate, under which a client participates only when the incentive offered by the server exceeds its participation cost. However, existing incentive mechanisms naively assume that clients are truthful, i.e., that every client reports its true cost, so the higher the cost a participating client claims, the more the server has to pay. Such mechanisms are therefore vulnerable to strategic clients that misreport their costs to maximize their own utility. To address this issue, we propose an incentive-compatible (i.e., truthful) communication protocol, named Truth-FedBan, in which the incentive paid to each participant is independent of its self-reported cost, and reporting the true cost is the only way for a client to achieve its best utility. More importantly, Truth-FedBan still guarantees sub-linear regret and communication cost without any overhead. In other words, the core conceptual contribution of this paper is to demonstrate, for the first time, that incentive compatibility and nearly optimal regret can be achieved simultaneously in federated bandit learning. Extensive numerical studies further validate the effectiveness of our proposed solution.
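To make the difference between cost-dependent and cost-independent incentives concrete, the following Python sketch contrasts two toy payment rules for a single client. Both are hypothetical illustrations, not Truth-FedBan's actual payment rule: `pay_as_bid` pays an admitted client exactly its reported cost, as in the naive mechanisms described above, while `posted_threshold` pays a fixed server-chosen threshold whenever the report does not exceed it, so the payment does not depend on the self-reported cost. The function names, the single-client setting, and the threshold value are assumptions made only for this example.

```python
from typing import Callable, Optional

# Hypothetical toy mechanisms for illustration only; neither is Truth-FedBan's rule.
# A mechanism maps (reported_cost, threshold) to a payment, or None if not admitted.
Mechanism = Callable[[float, float], Optional[float]]


def pay_as_bid(reported_cost: float, threshold: float) -> Optional[float]:
    """Naive rule: an admitted client is paid exactly the cost it reports."""
    return reported_cost if reported_cost <= threshold else None


def posted_threshold(reported_cost: float, threshold: float) -> Optional[float]:
    """Report-independent rule: an admitted client is paid the threshold itself."""
    return threshold if reported_cost <= threshold else None


def utility(mechanism: Mechanism, reported_cost: float,
            true_cost: float, threshold: float) -> float:
    """Client utility: payment minus true cost if it participates, 0 otherwise."""
    payment = mechanism(reported_cost, threshold)
    return 0.0 if payment is None else payment - true_cost


def best_misreport_gain(mechanism: Mechanism, true_cost: float,
                        threshold: float, reports: list) -> float:
    """Extra utility the best report in `reports` earns over the truthful report."""
    truthful = utility(mechanism, true_cost, true_cost, threshold)
    best = max(utility(mechanism, r, true_cost, threshold) for r in reports)
    return best - truthful


if __name__ == "__main__":
    true_cost, threshold = 2.0, 5.0
    reports = [0.5 * k for k in range(17)]  # candidate reports 0.0, 0.5, ..., 8.0
    for name, mech in [("pay-as-bid", pay_as_bid),
                       ("posted threshold", posted_threshold)]:
        gain = best_misreport_gain(mech, true_cost, threshold, reports)
        print(f"{name}: gain from misreporting = {gain}")
```

With a true cost of 2.0 and a threshold of 5.0, a pay-as-bid client gains 3.0 by inflating its report up to the threshold, whereas under the posted threshold no report beats the truthful one. Note that this toy rule only makes truth-telling weakly dominant (every report at or below the threshold earns the same utility); the stronger property and the regret guarantees claimed for Truth-FedBan rest on the paper's actual protocol.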