We study learning-based design of fair allocation mechanisms for divisible resources, using proportional fairness (PF) as a benchmark. The learning setting departs significantly from the classic mechanism design literature in that fair mechanisms must be learned solely from data. In particular, we consider the challenging problem of learning one-shot allocation mechanisms, without the use of money, that incentivize strategic agents to report their valuations truthfully. It is well known that the mechanism that directly optimizes PF is not incentive compatible: agents can misreport their preferences to obtain larger allocations. We introduce the notion of "exploitability" of a mechanism to measure the relative gain in utility from misreporting, and make the following contributions: (i) Using techniques inspired by the differentiable convex programming literature, we design a numerically efficient approach for computing the exploitability of the PF mechanism. This enables us to quantify the gap that must be bridged to approximate PF via incentive compatible mechanisms. (ii) We modify the PF mechanism to introduce a trade-off between fairness and exploitability. By controlling this trade-off using data, we show that the proposed mechanism, ExPF-Net, closely approximates the PF mechanism while maintaining low exploitability. This mechanism, however, comes with a high computational cost. (iii) To address this cost, we propose another mechanism, ExS-Net, which is parameterized end-to-end by a neural network. ExS-Net achieves similar (slightly inferior) performance with significantly faster training and inference. (iv) Extensive numerical simulations demonstrate the robustness and efficacy of the proposed mechanisms.
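To make the notion of exploitability concrete, the following is a minimal illustrative sketch and not the method of the paper. It assumes linear utilities and strictly positive valuations, approximates the PF allocation with proportional-response dynamics (a swapped-in technique, used here in place of the differentiable convex programming approach described above), and estimates one agent's relative utility gain by random search over misreports. The function names `pf_allocation`, `utility`, and `estimate_exploitability`, the sampling range for misreports, and the exact form of the relative gain are assumptions made for illustration.

```python
import numpy as np


def pf_allocation(values, iters=500):
    """Approximate the PF allocation for linear utilities.

    Runs proportional-response dynamics with equal unit budgets.
    Assumes every entry of `values` is strictly positive.
    Returns an (agents x goods) matrix whose columns each sum to 1.
    """
    v = np.asarray(values, dtype=float)
    n, m = v.shape
    bids = np.full((n, m), 1.0 / m)  # each agent splits a unit budget evenly
    for _ in range(iters):
        prices = bids.sum(axis=0)        # price of a good = total bids on it
        x = bids / prices                # goods are split in proportion to bids
        gains = v * x                    # utility each agent draws from each good
        bids = gains / gains.sum(axis=1, keepdims=True)  # rebid in proportion to gains
    return bids / bids.sum(axis=0)


def utility(true_values, allocation, agent):
    """Linear utility of `agent` under its true valuations."""
    return float(np.asarray(true_values, dtype=float)[agent] @ allocation[agent])


def estimate_exploitability(values, agent, n_samples=100, seed=0):
    """Lower-bound the relative utility gain of `agent` from misreporting.

    Random search over misreports (hypothetical sampling range 0.05 to 1.0);
    the other agents are assumed to report truthfully.
    """
    rng = np.random.default_rng(seed)
    v = np.asarray(values, dtype=float)
    truthful = utility(v, pf_allocation(v), agent)
    best = truthful  # the truthful report is always a candidate, so the gain is >= 0
    for _ in range(n_samples):
        report = v.copy()
        report[agent] = rng.uniform(0.05, 1.0, size=v.shape[1])
        best = max(best, utility(v, pf_allocation(report), agent))
    return (best - truthful) / truthful


if __name__ == "__main__":
    vals = np.array([[0.6, 0.4], [0.3, 0.7]])
    print("PF allocation:\n", pf_allocation(vals))
    print("estimated exploitability of agent 0:", estimate_exploitability(vals, agent=0))
```

Because the search is random and the allocation is only approximately converged, the returned value is an estimate that lower-bounds the best achievable gain; a gradient-based search over misreports would be the natural refinement.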