Noisy labels significantly hinder the accuracy and generalization of machine learning models, particularly when instance features are ambiguous. Traditional techniques that try to correct noisy labels directly, such as those based on transition matrices, often fail to address the inherent complexity of the problem. In this paper, we introduce EchoAlign, a paradigm shift in learning from noisy labels. Instead of correcting labels, EchoAlign treats the noisy labels ($\tilde{Y}$) as accurate and modifies the corresponding instance features ($X$) so that they align better with $\tilde{Y}$. EchoAlign has two core components. (1) EchoMod uses controllable generative models to modify instances precisely, preserving their intrinsic characteristics while aligning them with the noisy labels. (2) EchoSelect addresses the distribution shift between training and test sets that instance modification inevitably introduces: it retains a significant portion of clean original instances to mitigate the shift, using the distinct feature similarity distributions between original and modified instances as a robust criterion for accurate sample selection. Under 30% instance-dependent noise, EchoSelect retains nearly twice as many samples as the previous best method, even at 99% selection accuracy. On three datasets, EchoAlign substantially outperforms previous state-of-the-art techniques.
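The selection idea behind EchoSelect can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the feature extractor, or the decision rule, so everything below is an assumption made for illustration: the cosine similarity, the fixed `threshold`, and the hypothetical helper names `cosine_similarity` and `select_instances`. The sketch assumes that an instance whose features barely change after modification already matched its label and can be kept in its original form, while an instance that changes a lot is replaced by its modified version.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_instances(
    original_feats: Sequence[Sequence[float]],
    modified_feats: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> Tuple[List[int], List[int]]:
    """Split instance indices into (keep_original, use_modified).

    Assumption: high similarity between an instance and its modified
    version suggests the original already agreed with its label.
    """
    keep_original: List[int] = []
    use_modified: List[int] = []
    for i, (f_orig, f_mod) in enumerate(zip(original_feats, modified_feats)):
        if cosine_similarity(f_orig, f_mod) >= threshold:
            keep_original.append(i)
        else:
            use_modified.append(i)
    return keep_original, use_modified


# Toy features: instance 1 changes direction entirely after modification.
original = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
modified = [[0.9, 0.1], [1.0, 0.0], [1.0, 0.9]]
keep, replace = select_instances(original, modified)
print(keep, replace)
```

In this toy run, instances 0 and 2 stay close to their modified versions and are kept as originals, while instance 1 is replaced. A fixed threshold is the simplest possible rule; a method that exploits the shape of the similarity distributions, as the abstract describes, would likely choose the cut-off from the data instead.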