Coalition formation concerns the strategic collaboration of selfish agents who form coalitions based on their preferences. It is often assumed that coalitions are disjoint and preferences are fully known, which may not hold in practice. In this paper, we therefore present a new model of coalition formation with possibly overlapping coalitions under partial information, in which selfish agents may belong to multiple coalitions simultaneously and their full preferences are initially unknown. Instead, information about past interactions and the associated utility feedback is stored in a fixed offline dataset, and we aim to efficiently infer the agents' preferences from this dataset. We analyze the impact of different dataset information constraints by studying two types of utility feedback that may be stored in the dataset: agent-level and coalition-level utility feedback. For both feedback models, we identify assumptions under which the dataset covers sufficient information for an offline learning algorithm to infer preferences and use them to recover a partition that is (approximately) Nash stable, i.e., one in which no agent can improve her utility by unilaterally deviating. A further goal is to devise algorithms with low sample complexity, requiring only a small dataset to achieve a desired approximation to Nash stability. Under agent-level feedback, we provide a sample-efficient algorithm that provably obtains an approximately Nash stable partition under an assumption on the information covered by the dataset that is both necessary and sufficient. Under coalition-level feedback, in contrast, we show that sample-efficient learning is possible only under a strictly stronger assumption. Still, in several cases our algorithms' sample complexity bounds are optimal up to logarithmic factors. Finally, extensive experiments show that our algorithms converge to a small approximation error to Nash stability across diverse settings.