Recently, many industrial audio applications, such as sound monitoring and source localization, have begun exploiting smart multi-modal devices equipped with a microphone array. Regrettably, model-based methods are often difficult to employ on such devices because of their high computational complexity and the difficulty of appropriately selecting the user-determined parameters. As an alternative, one may use deep network-based methods, but these often generalize poorly and cannot generate the desired beamforming map directly. In this paper, a computationally efficient acoustic beamforming algorithm is proposed, which may be unrolled to form a model-based deep learning network for real-time imaging, here termed the DAMAS-FISTA-Net. By exploiting the natural structure of an acoustic beamformer, the proposed network inherits the physical knowledge of the acoustic system, and thus learns the underlying physical properties of the propagation. As a result, all the network parameters may be learned end-to-end using back-propagation, guided by a model-based prior. Notably, the proposed network offers excellent interpretability and can process the raw data directly. Extensive numerical experiments using both simulated and real-world data illustrate that the DAMAS-FISTA-Net performs favorably compared with alternative approaches.
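To make the idea of unrolling concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard DAMAS formulation, in which a conventional beamforming map b is modeled as b = A x, with A the array's point spread function matrix and x the nonnegative source power map, and it applies a fixed number of FISTA iterations to the resulting nonnegative least-squares problem. The names `fista_nnls`, `A`, `b`, and `step_sizes` are hypothetical, and the per-layer step sizes merely stand in for whatever parameters the actual network learns, which the abstract does not specify.

```python
import numpy as np


def fista_nnls(A, b, num_layers=200, step_sizes=None):
    """Run a fixed number of FISTA iterations on min_{x >= 0} 0.5 * ||A x - b||^2.

    Each iteration plays the role of one "layer" of an unrolled network.
    `step_sizes` holds one step size per layer (hypothetical stand-in for
    learnable parameters); by default every layer uses 1 / L.
    """
    # Lipschitz constant of the gradient: squared spectral norm of A.
    lipschitz = np.linalg.norm(A, 2) ** 2
    if step_sizes is None:
        step_sizes = np.full(num_layers, 1.0 / lipschitz)

    x = np.zeros(A.shape[1])  # current estimate of the source map
    y = x.copy()              # extrapolated (momentum) point
    t = 1.0                   # FISTA momentum scalar

    for k in range(num_layers):
        # Gradient step on the data-fit term, evaluated at y.
        grad = A.T @ (A @ y - b)
        # Projection onto the nonnegative orthant (source powers are >= 0).
        x_new = np.maximum(y - step_sizes[k] * grad, 0.0)
        # Standard FISTA momentum update.
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x


# Toy problem (synthetic, assumed): two sources on a 20-point grid.
rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(40, 20)))  # stand-in for a PSF matrix
x_true = np.zeros(20)
x_true[[3, 11]] = [1.0, 0.5]
b = A @ x_true                          # stand-in for a beamforming map
x_hat = fista_nnls(A, b)
```

In a learned variant, the loop would be truncated to a small number of layers and the per-layer quantities would be trained by back-propagation; a framework with automatic differentiation would be needed for that step, which this NumPy sketch does not attempt.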