Existing dynamic data pruning methods often fail under noisy labels because they typically rank samples by per-sample loss. Since noisy samples tend to have high loss values, such methods can mistakenly preserve them, resulting in a significant performance drop. To address this, we propose AlignPrune, a noise-robust module designed to make dynamic pruning more reliable under label noise. Specifically, AlignPrune introduces the Dynamic Alignment Score (DAS), a loss-trajectory-based criterion that identifies noisy samples more accurately and thereby improves pruning effectiveness. As a simple yet effective plug-and-play module, AlignPrune can be integrated into state-of-the-art dynamic pruning frameworks without modifying either the model architecture or the training pipeline, and it consistently outperforms them. Extensive experiments on five widely used benchmarks across various noise types and pruning ratios demonstrate the effectiveness of AlignPrune, which improves accuracy by up to 6.3% over state-of-the-art baselines. Our results offer a generalizable solution for pruning under noisy data and encourage further exploration of learning in real-world scenarios. Code is available at: https://github.com/leonqin430/AlignPrune.
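The failure mode described above can be illustrated with a toy simulation. The sketch below is not the AlignPrune implementation and does not compute DAS, whose definition is not given in this abstract. It assumes synthetic loss trajectories in which clean samples are fit quickly while mislabeled samples stay at high loss, and it compares loss-based ranking against a hypothetical trajectory filter (`keep_by_trajectory`) that serves only as a stand-in for a loss-trajectory-based criterion. All function names, parameters, and dynamics are illustrative assumptions.

```python
import math
import random


def simulate_loss_trajectories(n_samples=1000, noise_rate=0.2, n_epochs=10, seed=0):
    """Toy per-sample loss trajectories (assumed dynamics, not real training)."""
    rng = random.Random(seed)
    n_noisy = int(n_samples * noise_rate)
    is_noisy, trajectories = [], []
    for i in range(n_samples):
        noisy = i < n_noisy  # the first n_noisy samples carry wrong labels
        traj = []
        for t in range(n_epochs):
            # Assumption: clean loss decays, mislabeled loss stays high.
            base = 2.0 if noisy else 2.0 * math.exp(-0.5 * t)
            traj.append(max(0.0, base + rng.gauss(0.0, 0.05)))
        is_noisy.append(noisy)
        trajectories.append(traj)
    return is_noisy, trajectories


def keep_by_loss(trajectories, keep_ratio):
    """Loss-based ranking: keep the samples with the highest latest loss."""
    k = int(len(trajectories) * keep_ratio)
    order = sorted(range(len(trajectories)),
                   key=lambda i: trajectories[i][-1], reverse=True)
    return order[:k]


def keep_by_trajectory(trajectories, keep_ratio, min_drop=0.5):
    """Hypothetical stand-in for a trajectory-based criterion (NOT DAS).

    Only samples whose loss fell by at least `min_drop` of its initial
    value are eligible; eligible samples are then ranked by latest loss.
    """
    k = int(len(trajectories) * keep_ratio)
    eligible = [i for i, traj in enumerate(trajectories)
                if traj[-1] <= (1.0 - min_drop) * traj[0]]
    eligible.sort(key=lambda i: trajectories[i][-1], reverse=True)
    return eligible[:k]


def noisy_fraction(kept, is_noisy):
    """Fraction of kept samples that are mislabeled."""
    return sum(is_noisy[i] for i in kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    is_noisy, trajs = simulate_loss_trajectories()
    by_loss = keep_by_loss(trajs, keep_ratio=0.3)
    by_traj = keep_by_trajectory(trajs, keep_ratio=0.3)
    print("noisy fraction kept, loss ranking:      ", noisy_fraction(by_loss, is_noisy))
    print("noisy fraction kept, trajectory filter: ", noisy_fraction(by_traj, is_noisy))
```

Under these assumed dynamics, ranking by the latest loss fills most of the kept subset with mislabeled samples, whereas a criterion that looks at the whole trajectory can exclude them. Real training is less clean, since networks eventually memorize noisy labels, so this shows only the intuition and says nothing about the reported results.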