We study resource allocation problems in which a central planner allocates resources among strategic agents with private cost functions in order to minimize a social cost, defined as an aggregate of the agents' costs. This setting poses two main challenges: (i) the agents' cost functions may be unknown to them or difficult to specify explicitly, and (ii) agents may misreport their costs strategically. To address these challenges, we propose an algorithm that combines preference-based learning with Vickrey-Clarke-Groves (VCG) payments to incentivize truthful reporting. Our algorithm selects informative preference queries via D-optimal design, estimates cost parameters through maximum likelihood, and computes VCG allocations and payments based on these estimates. In a one-shot setting, we prove that the mechanism is approximately truthful, individually rational, and efficient up to an error of $\tilde{\mathcal O}(K^{-1/2})$ for $K$ preference queries per agent. In an online setting, these guarantees hold asymptotically with sublinear regret at a rate of $\tilde{\mathcal O}(T^{2/3})$ after $T$ rounds. Finally, we validate our approach through a numerical case study on demand response in local electricity markets.
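To make the final step of the pipeline concrete, the following is a minimal sketch of a VCG allocation and Clarke-pivot payment computation from estimated costs. It is not the paper's algorithm: it assumes a hypothetical procurement-style toy problem in which each agent has a linear estimated cost per unit (`costs`, standing in for the maximum-likelihood estimates), a capacity (`caps`), and the planner must cover a fixed `demand` at minimum total cost. The function names and the sign convention (payments flow from the planner to the agents) are illustrative assumptions, and the preference-query and estimation steps are omitted.

```python
def allocate(costs, caps, demand):
    """Min-cost allocation for linear costs: fill the cheapest agents first."""
    alloc = [0.0] * len(costs)
    remaining = demand
    for i in sorted(range(len(costs)), key=lambda j: costs[j]):
        alloc[i] = min(caps[i], remaining)
        remaining -= alloc[i]
    if remaining > 1e-12:
        raise ValueError("demand exceeds total capacity")
    return alloc


def cost_of_others(costs, alloc, skip=None):
    """Total estimated cost of an allocation, optionally leaving out one agent."""
    return sum(c * x for j, (c, x) in enumerate(zip(costs, alloc)) if j != skip)


def vcg_allocation(costs, caps, demand):
    """Return (allocation, payments) under Clarke-pivot VCG payments.

    Agent i is paid the externality it removes from the others:
    (optimal cost of the others when i is absent)
    minus (cost borne by the others in the chosen allocation).
    Assumes the demand can still be covered when any single agent is removed.
    """
    alloc = allocate(costs, caps, demand)
    payments = []
    for i in range(len(costs)):
        # Remove agent i by setting its capacity to zero (hypothetical device).
        caps_without_i = caps[:i] + [0.0] + caps[i + 1:]
        alloc_without_i = allocate(costs, caps_without_i, demand)
        payments.append(
            cost_of_others(costs, alloc_without_i)
            - cost_of_others(costs, alloc, skip=i)
        )
    return alloc, payments
```

For example, calling `vcg_allocation([1.0, 2.0, 4.0], [3.0, 3.0, 5.0], 5.0)` allocates 3 units to the first agent and 2 to the second. Each allocated agent's payment is at least its own estimated cost, which is the individual-rationality property in this toy setting; when the costs are only estimates, such properties can be expected to hold only approximately, in line with the error terms stated in the abstract.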