DHAuDS: A Dynamic and Heterogeneous Audio Benchmark for Test-Time Adaptation

Audio classifiers frequently face domain shift: models trained on one dataset lose accuracy on data recorded under acoustically different conditions. Previous Test-Time Adaptation (TTA) research in speech and sound analysis often evaluates models under fixed or mismatched noise settings that fail to mimic real-world variability. To overcome these limitations, this paper presents DHAuDS (Dynamic and Heterogeneous Audio Domain Shift), a benchmark designed to assess TTA approaches under more realistic and diverse acoustic shifts. DHAuDS comprises four standardized benchmarks: UrbanSound8K-C, SpeechCommandsV2-C, VocalSound-C, and ReefSet-C, each constructed with dynamic corruption severity levels and heterogeneous noise types to simulate authentic audio degradation scenarios. The framework defines 14 evaluation criteria for each benchmark (8 for UrbanSound8K-C), resulting in 50 distinct criteria (3 × 14 + 8) and 124 experiments that collectively enable fair, reproducible, cross-domain comparison of TTA algorithms. By including dynamic and mixed-domain noise settings, DHAuDS offers a consistent and publicly reproducible testbed to support ongoing studies in robust and adaptive audio modeling.
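To make "dynamic corruption severity levels and heterogeneous noise types" concrete, the following is a minimal sketch of one way such a corruption could be simulated: each segment of a clip is mixed with a randomly chosen noise type at a randomly chosen signal-to-noise ratio (SNR). This is an illustration under stated assumptions, not the DHAuDS construction pipeline; the function names (`mix_at_snr`, `corrupt_dynamic`), the per-segment schedule, and the use of SNR as the severity measure are all hypothetical.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR (dB)."""
    p_clean, p_noise = power(clean), power(noise)
    if p_clean == 0.0 or p_noise == 0.0:
        return list(clean)  # silent input: no meaningful scaling exists
    # Choose gain so that p_clean / (gain^2 * p_noise) == 10^(snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def corrupt_dynamic(clean, noise_bank, snr_levels_db, segment_len, seed=0):
    """Hypothetical dynamic, heterogeneous corruption.

    Each segment draws its own noise type (heterogeneous) and its own
    SNR level (dynamic severity). Returns the corrupted signal and the
    schedule of (start_index, noise_name, snr_db) that was applied.
    """
    rng = random.Random(seed)
    names = sorted(noise_bank)  # fixed order keeps the draw reproducible
    corrupted, schedule = [], []
    for start in range(0, len(clean), segment_len):
        segment = clean[start:start + segment_len]
        name = rng.choice(names)
        snr_db = rng.choice(snr_levels_db)
        source = noise_bank[name]
        # Loop the noise recording so it covers the whole segment.
        noise = [source[(start + i) % len(source)] for i in range(len(segment))]
        corrupted.extend(mix_at_snr(segment, noise, snr_db))
        schedule.append((start, name, snr_db))
    return corrupted, schedule


if __name__ == "__main__":
    sr = 16000
    clean = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
    noise_rng = random.Random(1)
    bank = {
        "white": [noise_rng.gauss(0.0, 1.0) for _ in range(sr)],
        "hum": [math.sin(2 * math.pi * 50 * t / sr) for t in range(sr)],
    }
    _, schedule = corrupt_dynamic(clean, bank, [0, 5, 10, 20], segment_len=4000)
    for start, name, snr_db in schedule:
        print(f"samples {start:>5}+: noise={name:<5} snr={snr_db} dB")
```

A fixed-noise evaluation corresponds to a `noise_bank` with a single entry and a single SNR level; the sketch differs only in letting both vary within a clip.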
