Learning from automatic annotations produced by pre-trained experts and Foundation Models now dominates data-hungry applications. However, this paradigm introduces a critical challenge: model-induced label noise. Unlike the stochastic noise assumed in classical robust learning, this noise stems from the annotator's inductive biases and manifests as systematic errors tightly coupled with local feature manifolds. Existing methods that rely on global transition matrices underfit these structural patterns, while learning instance-specific matrices remains mathematically intractable. We propose Model-Induced Noise Decoupling (MIND), a theoretically grounded framework that addresses this dilemma. We show that the high-dimensional noise manifold can be decoupled into tractable, subspace-dependent components via Latent Manifold Disentanglement. Specifically, our Latent Decoupling Estimator (LDE) dynamically projects samples into latent structural clusters with consistent error modes, facilitating noise identifiability without ground-truth anchor points. To evaluate robustness rigorously, we adopt a hierarchical protocol that moves from controlled noise on CIFAR-100 to a structural stress test on large-scale real-world 3D datasets (S3DIS, ScanNet), where error patterns are explicitly coupled with geometric manifolds. Empirically, MIND significantly outperforms state-of-the-art methods on these complex benchmarks and effectively corrects zero-shot hallucinations from Vision-Language Models (e.g., OpenSeg), highlighting its potential as a robust distillation framework for Foundation Models.
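The contrast between a single global transition matrix and cluster-dependent matrices can be illustrated with a minimal NumPy sketch. It is not the paper's LDE: the cluster assignments are taken as given, the soft-count estimator is a simple stand-in, and the names `estimate_cluster_transitions` and `forward_corrected_nll` are hypothetical. The toy data assumes two clusters with opposite, fully systematic error modes.

```python
import numpy as np


def estimate_cluster_transitions(probs, noisy_labels, clusters, n_clusters, eps=1e-8):
    """Soft-count estimate of one transition matrix per cluster (illustrative only).

    T[k, i, j] approximates P(noisy = j | clean = i, cluster = k), where
    `probs` holds (estimated) clean-class posteriors of shape (N, C).
    """
    n_classes = probs.shape[1]
    onehot = np.eye(n_classes)[noisy_labels]              # (N, C) one-hot noisy labels
    T = np.zeros((n_clusters, n_classes, n_classes))
    for k in range(n_clusters):
        mask = clusters == k
        counts = probs[mask].T @ onehot[mask] + eps        # (C, C); eps avoids empty rows
        T[k] = counts / counts.sum(axis=1, keepdims=True)  # row-normalise
    return T


def forward_corrected_nll(probs, noisy_labels, clusters, T, eps=1e-12):
    """Negative log-likelihood of noisy labels under cluster-wise forward correction."""
    # P(noisy = j | x) = sum_i P(clean = i | x) * T[cluster(x), i, j]
    noisy_probs = np.einsum("nc,ncj->nj", probs, T[clusters])
    picked = noisy_probs[np.arange(len(noisy_labels)), noisy_labels]
    return float(-np.log(picked + eps).mean())


# Toy data (assumed): cluster 0 always flips class 0 -> 1,
# cluster 1 always flips class 1 -> 0.
clean = np.array([0, 1, 0, 1, 0, 1, 0, 1])
clusters = np.array([0, 0, 0, 0, 1, 1, 1, 1])
noisy = np.array([1, 1, 1, 1, 0, 0, 0, 0])
probs = np.eye(2)[clean]  # oracle posteriors, used only for the demonstration

T_local = estimate_cluster_transitions(probs, noisy, clusters, n_clusters=2)
single = np.zeros_like(clusters)  # one "cluster" holding every sample
T_global = estimate_cluster_transitions(probs, noisy, single, n_clusters=1)

print("cluster-wise NLL:", forward_corrected_nll(probs, noisy, clusters, T_local))
print("global NLL:      ", forward_corrected_nll(probs, noisy, single, T_global))
```

In this toy setting the global matrix averages the two opposite error modes into a near-uniform matrix, while the per-cluster matrices capture each mode separately, which is the kind of underfitting the abstract attributes to global transition matrices.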