Existing test-time adaptation (TTA) studies rely heavily on static, homogeneous corruption protocols such as ImageNet-C and CIFAR-10-C/100-C, leading to inconsistent evaluation settings and robustness estimates that may be inflated relative to real-world conditions. TTA lacks a standardized evaluation infrastructure capable of modeling realistic, heterogeneous acoustic degradation. We introduce DHAuDS, a standardized benchmark suite for evaluating the robustness of TTA for audio classification under dynamic corruption severity and heterogeneous noise mixtures. Rather than proposing a new TTA algorithm, DHAuDS focuses on exposing robustness limitations that remain hidden under conventional fixed-noise evaluation protocols.