Model inversion and membership inference attacks aim to reconstruct or verify the data on which a model was trained. However, they are not guaranteed to find all training samples, as they do not know the size of the training set. In this paper, we introduce a new task, dataset size recovery, which aims to determine the number of samples used to train a model directly from its weights. We then propose DSiRe, a method for recovering the number of images used to fine-tune a model in the common case where fine-tuning uses LoRA. We discover that both the norm and the spectrum of the LoRA matrices are closely linked to the fine-tuning dataset size; we leverage this finding to propose a simple yet effective prediction algorithm. To evaluate dataset size recovery from LoRA weights, we develop and release a new benchmark, LoRA-WiSE, consisting of over 25,000 weight snapshots from more than 2,000 diverse LoRA fine-tuned models. On this benchmark, our best classifier predicts the number of fine-tuning images with a mean absolute error of 0.36 images, establishing the feasibility of this attack.
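The core idea, per-layer spectral features from the LoRA matrices fed to a simple classifier, can be sketched as follows. This is a hypothetical illustration, not the paper's exact implementation: the feature choice (singular values of the effective update), the per-layer 1-nearest-neighbor classifier, and the majority vote across layers are all assumptions made for the sketch.

```python
# Hypothetical sketch of dataset size recovery from LoRA weights.
# Assumptions (not from the paper's code): features are the singular
# values of the effective update B @ A, each layer votes for the size
# of its nearest labeled reference layer, and the majority vote wins.
import numpy as np

def lora_features(A, B):
    """Singular values (the spectrum) of the effective update B @ A."""
    return np.linalg.svd(B @ A, compute_uv=False)

def predict_dataset_size(query_layers, reference):
    """query_layers: list of per-layer feature vectors for one model.
    reference: list of (per-layer feature vectors, dataset_size) pairs
    from models fine-tuned with a known number of images."""
    votes = []
    for i, feat in enumerate(query_layers):
        best_size, best_dist = None, np.inf
        for ref_layers, size in reference:
            d = np.linalg.norm(feat - ref_layers[i])  # feature distance
            if d < best_dist:
                best_dist, best_size = d, size
        votes.append(best_size)  # this layer's nearest-neighbor vote
    sizes, counts = np.unique(votes, return_counts=True)
    return sizes[np.argmax(counts)]  # majority vote across layers
```

The per-layer voting makes the prediction robust to individual layers whose weights happen to be uninformative, which is one plausible way to turn the norm/spectrum correlation into a classifier.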