Open-weight models, which are ubiquitous, rarely provide access to their training data or loss function. This makes modifying such models for tasks such as pruning or unlearning, both of which are constrained by this unavailability, an active area of research. Existing techniques typically require gradients or ground-truth labels, rendering them infeasible in settings with limited computational resources. In this work, we investigate the fundamental question of identifying components that are critical to a model's predictive performance, without access to either gradients or the loss function, and with only distributional access, such as synthetic data. We theoretically demonstrate that the global error is linearly bounded by local reconstruction errors for Lipschitz-continuous networks such as CNNs and well-trained Transformers (which, contrary to existing literature, we find exhibit Lipschitz continuity). This motivates quantifying the global importance of component subsets by their local reconstructive behavior, via a metric we term Subset Fidelity. In the uncorrelated-features setting, selecting individual components by their Subset Fidelity scores is optimal, a result we leverage to propose ModHiFi, an algorithm for model modification that requires neither training data nor access to a loss function. ModHiFi-P, for structured pruning, achieves an 11\% speedup over the current state of the art on ImageNet models and competitive performance on language models. ModHiFi-U, for classwise unlearning, achieves complete unlearning on CIFAR-10 without fine-tuning and competitive performance on Swin Transformers.
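To make the claimed linear bound concrete, here is a minimal sketch of how such a bound typically arises for compositions of Lipschitz maps; the paper's exact constants and norms may differ. For a network $f = f_L \circ \cdots \circ f_1$ in which each layer $f_k$ is $K_k$-Lipschitz, replacing each layer with a local reconstruction $\hat{f}_\ell$ satisfying $\sup_x \|\hat{f}_\ell(x) - f_\ell(x)\| \le \epsilon_\ell$ yields, by a telescoping argument,
\[
\bigl\|\hat{f}(x) - f(x)\bigr\| \;\le\; \sum_{\ell=1}^{L} \Bigl(\prod_{k=\ell+1}^{L} K_k\Bigr)\,\epsilon_\ell,
\]
so the global output error is controlled by a fixed linear combination of the local reconstruction errors, which is what licenses scoring component subsets by their local reconstructive behavior.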
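As an illustration of how a Subset-Fidelity-style score can be computed with only distributional access, the following PyTorch sketch scores components of a linear layer on synthetic activations, using neither labels, gradients, nor a loss function. The normalized-energy form of the score, the zero-masking of pruned components, and the helper names (subset_fidelity, rank_components) are assumptions made for illustration, not the paper's exact definitions.
\begin{verbatim}
import torch

@torch.no_grad()
def subset_fidelity(layer: torch.nn.Linear, x: torch.Tensor,
                    keep: torch.Tensor) -> float:
    # x: (N, d_in) synthetic activations; keep: 1-D LongTensor of
    # retained input-component indices.
    y = layer(x)                     # full layer output
    x_masked = torch.zeros_like(x)
    x_masked[:, keep] = x[:, keep]   # zero out all pruned components
    y_s = layer(x_masked)            # local reconstruction from the subset
    # 1.0 means the subset reproduces the layer output exactly.
    return float(1.0 - (y - y_s).pow(2).sum() / y.pow(2).sum())

def rank_components(layer: torch.nn.Linear, x: torch.Tensor) -> list:
    # Rank components by their individual scores; per the abstract,
    # this per-component selection is optimal when features are
    # uncorrelated.
    d_in = x.shape[1]
    scores = [subset_fidelity(layer, x, torch.tensor([j]))
              for j in range(d_in)]
    return sorted(range(d_in), key=lambda j: -scores[j])
\end{verbatim}
In this setting, x would be drawn from a synthetic generator matched to the layer's input distribution, consistent with the distributional-access assumption in the abstract.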