Data valuation is critical in machine learning, as it helps enhance model transparency and protect data properties. Existing data valuation methods have primarily focused on discriminative models, neglecting deep generative models that have recently gained considerable attention. As with discriminative models, there is an urgent need to assess data contributions in deep generative models. However, previous data valuation approaches mainly rely on discriminative model performance metrics and require model retraining. Consequently, they cannot be applied directly or efficiently in practice to recent deep generative models such as generative adversarial networks and diffusion models. To bridge this gap, we formulate the data valuation problem in generative models from a similarity-matching perspective. Specifically, we introduce Generative Model Valuator (GMValuator), the first model-agnostic approach applicable to any generative model, designed to provide data valuation for generation tasks. We have conducted extensive experiments to demonstrate the effectiveness of the proposed method. To the best of our knowledge, GMValuator is the first work that offers a training-free, post-hoc data valuation strategy for deep generative models.
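To make the similarity-matching perspective concrete, the following is a minimal sketch and not the GMValuator algorithm itself, whose details are not given in this abstract. It assumes that samples are already represented as feature vectors, that Euclidean distance is a reasonable similarity measure, and that each generated sample splits one unit of credit among its `k` nearest training points. The function name `similarity_matching_values`, the parameter `k`, and the exponential weighting are all hypothetical choices made for illustration.

```python
import numpy as np

def similarity_matching_values(train, generated, k=2):
    """Score each training point by how closely generated samples match it.

    Illustrative only: the distance metric, k, and the weighting scheme
    are assumptions, not the method described in the paper.
    """
    train = np.asarray(train, dtype=float)
    generated = np.asarray(generated, dtype=float)
    values = np.zeros(len(train))
    for g in generated:
        # Euclidean distance from this generated sample to every training point.
        dists = np.linalg.norm(train - g, axis=1)
        # Indices of the k closest training points.
        nearest = np.argsort(dists)[:k]
        # Closer matches receive more credit (assumed weighting).
        weights = np.exp(-dists[nearest])
        values[nearest] += weights / weights.sum()
    # Average over generated samples so the scores sum to 1.
    return values / len(generated)

# Toy data: two training points near the generated samples, one far away.
train = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
generated = [[0.1, 0.1], [0.9, 0.9]]
scores = similarity_matching_values(train, generated, k=2)
```

Because the scoring uses only a trained model's outputs and the training set, no retraining is involved, which is the training-free, post-hoc property the abstract emphasizes. In practice, distances would more plausibly be computed in a learned embedding space than on raw inputs; that choice is also an assumption here.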