"How much is my data worth?" is an increasingly common question posed by organizations and individuals alike. An answer to this question could allow, for instance, fairly distributing profits among multiple data contributors and determining prospective compensation when data breaches happen. In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory. The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value. However, the Shapley value often requires exponential time to compute. To meet this challenge, we propose a repertoire of efficient algorithms for approximating the Shapley value. We also demonstrate the value of each training instance for various benchmark datasets.
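To make the cost concrete: the Shapley value of a data point is its marginal contribution to a utility function, averaged over all orderings of the dataset, so an exact computation over n points touches n! orderings. The sketch below is illustrative only and is not one of the algorithms proposed in this paper. It contrasts the exact computation with plain Monte Carlo permutation sampling, a standard baseline estimator. The names `exact_shapley`, `monte_carlo_shapley`, `toy_utility`, and `WEIGHTS` are hypothetical, and the toy utility stands in for what would in practice be a model's performance when trained on a subset.

```python
import itertools
import math
import random

def exact_shapley(n, utility):
    """Exact Shapley values: average each point's marginal contribution over all n! orderings."""
    values = [0.0] * n
    for perm in itertools.permutations(range(n)):
        coalition = set()
        prev = utility(frozenset(coalition))
        for i in perm:
            coalition.add(i)
            cur = utility(frozenset(coalition))
            values[i] += cur - prev  # marginal contribution of i in this ordering
            prev = cur
    total = math.factorial(n)
    return [v / total for v in values]

def monte_carlo_shapley(n, utility, num_permutations, seed=0):
    """Baseline estimator: average marginal contributions over randomly sampled orderings."""
    rng = random.Random(seed)
    values = [0.0] * n
    order = list(range(n))
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = set()
        prev = utility(frozenset(coalition))
        for i in order:
            coalition.add(i)
            cur = utility(frozenset(coalition))
            values[i] += cur - prev
            prev = cur
    return [v / num_permutations for v in values]

# Hypothetical toy utility (an assumption for illustration): additive per-point
# weights plus a bonus of 2.0 when points 0 and 1 are both present.
WEIGHTS = [1.0, 2.0, 3.0, 0.0]

def toy_utility(subset):
    base = sum(WEIGHTS[i] for i in subset)
    bonus = 2.0 if {0, 1} <= subset else 0.0
    return base + bonus

if __name__ == "__main__":
    print("exact:    ", exact_shapley(4, toy_utility))
    print("estimated:", monte_carlo_shapley(4, toy_utility, num_permutations=2000))
```

In this toy game the bonus is split evenly between points 0 and 1, and point 3, which never changes the utility, receives zero. The sampled estimate replaces the n! orderings with a fixed budget of utility evaluations, at the price of sampling error.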