Reward allocation, also known as the credit assignment problem, is a long-standing topic in economics, engineering, and machine learning. A central concept in credit assignment is the core: the set of stable allocations under which no agent has an incentive to deviate from the grand coalition. In this paper, we consider the problem of learning stable allocations in stochastic cooperative games, where the reward function is characterised as a random variable with an unknown distribution. Given an oracle that returns a stochastic reward for a queried coalition in each round, our goal is to learn the expected core, that is, the set of allocations that are stable in expectation. Within the class of strictly convex games, we present an algorithm named \texttt{Common-Points-Picking} that, with high probability, returns a stable allocation given a polynomial number of samples. The analysis of our algorithm involves the development of several new results in convex geometry, including an extension of the separating hyperplane theorem to multiple convex sets, which may be of independent interest.
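To make the notion of stability in expectation concrete, the following is a minimal Python sketch of a brute-force check, not the \texttt{Common-Points-Picking} algorithm. It assumes a hypothetical three-agent toy game whose expected reward for a coalition S is |S|^2 (a strictly convex game), a Gaussian-noise oracle, and illustrative names (`mean_reward`, `make_oracle`, `estimate_expected_rewards`, `is_stable`) that do not come from the paper. It queries every coalition, so its sample cost grows exponentially with the number of agents.

```python
import itertools
import random


def mean_reward(coalition):
    # Hypothetical toy game: the expected reward of coalition S is |S|^2,
    # which is strictly convex (strictly supermodular).
    return float(len(coalition) ** 2)


def make_oracle(rng, noise_std=0.1):
    # Assumed oracle: returns the expected reward plus Gaussian noise.
    def oracle(coalition):
        return mean_reward(coalition) + rng.gauss(0.0, noise_std)
    return oracle


def estimate_expected_rewards(oracle, n_agents, samples_per_coalition):
    # Average repeated oracle queries for every non-empty coalition.
    estimates = {}
    for size in range(1, n_agents + 1):
        for coalition in itertools.combinations(range(n_agents), size):
            draws = [oracle(coalition) for _ in range(samples_per_coalition)]
            estimates[coalition] = sum(draws) / len(draws)
    return estimates


def is_stable(allocation, rewards, tol):
    # Efficiency: the allocation distributes the grand coalition's reward.
    grand = tuple(range(len(allocation)))
    if abs(sum(allocation) - rewards[grand]) > tol:
        return False
    # Coalitional rationality: no coalition can earn more on its own.
    return all(
        sum(allocation[i] for i in coalition) >= value - tol
        for coalition, value in rewards.items()
    )


rng = random.Random(0)
estimates = estimate_expected_rewards(
    make_oracle(rng), n_agents=3, samples_per_coalition=2000
)
print(is_stable((3.0, 3.0, 3.0), estimates, tol=0.05))
print(is_stable((7.0, 1.0, 1.0), estimates, tol=0.05))
```

In this toy game the equal split (3, 3, 3) should be reported as stable, while (7, 1, 1) should not, because agents 1 and 2 together receive 2 but can expect about 4 on their own. The tolerance `tol` absorbs estimation noise and is chosen by hand here; it is not a bound derived in the paper.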