Acoustic Event Classification (AEC) is widely used in devices such as smart speakers and mobile phones for home safety and accessibility support. As AEC models run on more and more devices with diverse computational resource constraints, it has become increasingly expensive to develop models tuned for the optimal accuracy/computation trade-off under each constraint. In this paper, we introduce a Once-For-All (OFA) Neural Architecture Search (NAS) framework for AEC. Specifically, we first train a weight-sharing supernet that supports different model architectures, and then automatically search for a model that satisfies a given computational resource constraint. Our experimental results show that, with the supernet trained only once, the model found by NAS significantly outperforms both models trained individually from scratch and models trained with knowledge distillation (25.4% and 7.3% relative improvement, respectively). We also find that, for ultra-small models, the benefit of weight-sharing supernet training comes not only from the search but also from the optimization.
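The second stage described in the abstract, searching for a model under a computational resource constraint, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the search space (`DEPTHS`, `WIDTHS`), the dense-layer parameter count used as the resource measure, and `proxy_score`, which merely stands in for evaluating a sub-network with weights inherited from the trained supernet. The paper's actual search algorithm and constraint metric are not specified here, so exhaustive enumeration is used only because the toy space is small.

```python
import itertools

# Hypothetical search space: each sub-network picks a depth and one width per layer.
DEPTHS = (1, 2, 3)
WIDTHS = (8, 16, 32)
N_INPUT, N_CLASSES = 64, 10  # assumed feature and label dimensions


def param_count(widths):
    """Count weights and biases of a dense network with the given hidden widths."""
    dims = (N_INPUT, *widths, N_CLASSES)
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))


def candidate_subnets():
    """Enumerate every configuration the (assumed) supernet supports."""
    for depth in DEPTHS:
        yield from itertools.product(WIDTHS, repeat=depth)


def search(score_fn, max_params):
    """Return the best-scoring configuration within the budget, or None if none fits."""
    feasible = [c for c in candidate_subnets() if param_count(c) <= max_params]
    return max(feasible, key=score_fn, default=None)


def proxy_score(widths):
    """Placeholder for the validation accuracy of a sub-network that reuses
    supernet weights; a real system would run inference here, with no retraining."""
    return sum(w ** 0.5 for w in widths)


if __name__ == "__main__":
    for budget in (1_000, 2_000, 5_000):
        best = search(proxy_score, budget)
        print(budget, best, param_count(best))
```

The point of the sketch is the cost structure: once the supernet is trained, serving a new device budget only requires another call to `search`, not another training run.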