Machine Listening, as usually formalized, attempts to perform a task that humans themselves perform and that is, from our perspective, fundamentally human-performable. Current automated models of Machine Listening range from purely data-driven approaches to approaches that imitate human systems. In recent years, the most promising approaches have been hybrid: they use data-driven methods informed by models of the perceptual, cognitive, and semantic processes of the human system. The guidance provided by models of human perception and by domain knowledge enables better and more generalizable Machine Listening; conversely, the lessons learned from these models may be used to verify or improve our models of human perception themselves. This paper summarizes advances in the development of such hybrid approaches, ranging from Machine Listening models informed by models of peripheral (human) auditory processes to those that employ or derive semantic information encoded in relations between sounds. The research described herein was presented in a special session on "Synergy between human and machine approaches to sound/scene recognition and processing" at the 2023 ICASSP meeting.