Sound source localization (SSL) achieves strong results in controlled settings but struggles in real-world deployment because of two imbalance challenges: intra-task imbalance arising from long-tailed direction-of-arrival (DoA) distributions, and inter-task imbalance induced by cross-task skews and overlaps. Both often lead to catastrophic forgetting, which significantly degrades localization accuracy. To mitigate these issues, we propose a unified framework with two key innovations. First, we design a GCC-PHAT-based data augmentation (GDA) method that leverages peak characteristics to alleviate intra-task distribution skews. Second, we propose an analytic dynamic imbalance rectifier (ADIR) with task-adaption regularization, which enables analytic updates that adapt to inter-task dynamics. On the SSLR benchmark, our method achieves state-of-the-art (SoTA) results of 89.0% accuracy, 5.3° mean absolute error, and a backward transfer of 1.6, demonstrating robustness to evolving imbalances without exemplar storage.
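For readers unfamiliar with the feature that GDA builds on, the following is a minimal sketch of standard GCC-PHAT (generalized cross-correlation with phase transform) between two microphone channels. It is not the GDA augmentation itself, whose details are not given here. The function name `gcc_phat`, its parameters, and the small stabilizing constant are illustrative assumptions.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` with standard GCC-PHAT.

    Hypothetical helper for illustration only; returns (delay in seconds,
    cross-correlation values over the searched lag range).
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = sig.shape[0] + ref.shape[0]
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum, whitened by its magnitude (the phase transform).
    R = SIG * np.conj(REF)
    R = R / (np.abs(R) + 1e-12)  # small constant (assumed) avoids division by zero
    cc = np.fft.irfft(R, n=n)

    # Restrict the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(fs), cc
```

The location of the correlation peak gives the time difference of arrival between the two channels, which in turn constrains the DoA. A positive delay means `sig` lags `ref`.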