Label corruption, in which training samples carry incorrect labels, can significantly degrade the performance of machine learning models. Such corruption often arises from non-expert labeling or adversarial attacks. Acquiring large, perfectly labeled datasets is costly, and retraining large models from scratch once a clean dataset becomes available is computationally expensive. To address this challenge, we propose Post-Training Correction, a new paradigm that adjusts model parameters after initial training to mitigate the effect of label noise, eliminating the need for retraining. We introduce Verifix, a novel Singular Value Decomposition (SVD) based algorithm that leverages a small, verified dataset to correct the model weights in a single update. Verifix uses SVD to estimate a Clean Activation Space and then projects the model's weights onto this space to suppress activations corresponding to corrupted data. We demonstrate Verifix's effectiveness on both synthetic and real-world label noise. On the CIFAR dataset with 25% synthetic label corruption, Verifix improves generalization by 7.36% on average. On naturally corrupted datasets such as WebVision1.0 and Clothing1M, we observe generalization improvements of up to 2.63%.
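The projection step described in the abstract can be illustrated with a minimal sketch. It assumes a single linear layer y = Wx and uses NumPy. The function names `clean_projector` and `project_weights`, the `energy` threshold used to pick the subspace rank, and the choice to apply the projector on the input side of the weight matrix are all illustrative assumptions, not the Verifix implementation.

```python
import numpy as np


def clean_projector(clean_acts, energy=0.99):
    """Estimate a projector onto the subspace spanned by clean activations.

    clean_acts: array of shape (n_samples, d), the layer's input activations
    collected on the small verified dataset (hypothetical input).
    energy: fraction of squared singular-value mass to retain (assumed knob).
    """
    # Rows of vt are orthonormal directions in the d-dimensional activation space.
    _, s, vt = np.linalg.svd(clean_acts, full_matrices=False)
    # Smallest k whose leading singular values reach the requested energy.
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(cumulative, energy)) + 1, len(s))
    basis = vt[:k].T          # shape (d, k)
    return basis @ basis.T    # orthogonal projector, shape (d, d)


def project_weights(weight, projector):
    """Single-shot update W' = W P for a layer computing y = W x.

    W' x equals W (P x): components of x outside the estimated clean
    subspace no longer contribute to the layer output.
    """
    return weight @ projector
```

Under these assumptions, calling `project_weights(W, clean_projector(acts))` once per layer leaves the response to inputs inside the estimated subspace unchanged and zeroes the response to directions orthogonal to it. How the subspace rank is chosen and which layers are updated are design decisions this sketch does not take from the source.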