Machine Listening, as usually formalized, attempts to perform a task that is, from our perspective, fundamentally human-performable and is in fact performed by humans. Current automated models of Machine Listening range from purely data-driven approaches to approaches that imitate human systems. In recent years, the most promising approaches have been hybrid: data-driven methods informed by models of the perceptual, cognitive, and semantic processes of the human system. The guidance provided by models of human perception and domain knowledge enables better and more generalizable Machine Listening; conversely, the lessons learned from these models may be used to verify or improve our models of human perception themselves. This paper summarizes advances in the development of such hybrid approaches, ranging from Machine Listening models informed by models of peripheral (human) auditory processes to those that employ or derive semantic information encoded in relations between sounds. The research described herein was presented in a special session on ``Synergy between human and machine approaches to sound/scene recognition and processing'' at the 2023 ICASSP meeting.