We investigate the data distribution valuation problem, which aims to quantify the value of data distributions from their samples. This recently proposed problem is related to, but distinct from, classical data valuation and arises in a variety of applications. To address it, we develop a novel framework called Generalized Bayes Valuation, which uses generalized Bayesian inference with a loss constructed from transferability measures. The framework lets us solve, in a unified way, seemingly unrelated practical problems such as annotator evaluation and data augmentation. Using Bayesian principles, we further broaden the applicability of the framework by extending it to the continuous data stream setting. Our experimental results confirm the effectiveness and efficiency of the framework in different real-world scenarios.