Recent Large Audio Language Models (LALMs) excel at understanding but often lack transparent reasoning. To address this "black-box" limitation, we organized the Audio Reasoning Challenge at Interspeech 2026, the first shared task dedicated to evaluating Chain-of-Thought (CoT) quality in the audio domain. The challenge introduced MMAR-Rubrics, a novel instance-level protocol assessing the factuality and logic of reasoning chains. Featuring Single Model and Agent tracks, the competition attracted 156 teams from 18 countries and regions. Results show that agent systems currently lead in reasoning quality, leveraging iterative tool orchestration and cross-modal analysis. Meanwhile, single models are rapidly advancing via reinforcement learning and sophisticated data pipelines. We detail the challenge design, methodology, and a comprehensive analysis of state-of-the-art systems, providing new insights for explainable audio intelligence.