In this study, we introduce Unified Microphone Conversion, a unified generative framework that enhances the resilience of sound event classification systems to device variability. Addressing the limitations of previous work, we condition the generator network on frequency response information to achieve many-to-many device mapping. This overcomes the inherent limitation of CycleGAN, which requires a separate model for each device pair. Our framework leverages CycleGAN's strength in unpaired training to simulate device characteristics in audio recordings, and significantly extends its scalability by integrating frequency-response-related information via Feature-wise Linear Modulation. Experimental results show that our method outperforms the state-of-the-art method by 2.6% and reduces variability by 0.8% in macro-average F1 score.
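The conditioning mechanism named in the abstract, Feature-wise Linear Modulation (FiLM), scales and shifts each feature channel of the generator using parameters predicted from a conditioning vector (here, frequency response information). The following is a minimal sketch of that idea; the function names, shapes, and weight layout are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Apply Feature-wise Linear Modulation (FiLM).

    features: (C, H, W) feature map from a generator layer.
    cond:     (D,) conditioning vector, e.g. a frequency
              response embedding (hypothetical input).
    The linear weights predict a per-channel scale (gamma)
    and shift (beta) from the conditioning vector.
    """
    gamma = cond @ w_gamma + b_gamma  # (C,) per-channel scale
    beta = cond @ w_beta + b_beta     # (C,) per-channel shift
    # Broadcast (C,) over the spatial dims (H, W).
    return gamma[:, None, None] * features + beta[:, None, None]

# Example: a 4-channel feature map conditioned on a 2-dim vector.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
cond = rng.standard_normal(2)
w_gamma = rng.standard_normal((2, 4))
b_gamma = np.ones(4)
w_beta = rng.standard_normal((2, 4))
b_beta = np.zeros(4)
out = film(x, cond, w_gamma, b_gamma, w_beta, b_beta)
print(out.shape)  # (4, 8, 8): shape is preserved, channels modulated
```

Because gamma and beta depend on the conditioning vector, a single generator can emulate many devices, which is what removes the one-model-per-device-pair constraint of plain CycleGAN.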