Multi-modality learning has become a crucial technique for improving the performance of machine learning applications across domains such as autonomous driving, robotics, and perception systems. While existing frameworks such as Auxiliary Modality Learning (AML) effectively utilize multiple data sources during training and enable inference with reduced modalities, they primarily operate in a single-agent context. This limitation is particularly critical in dynamic environments, such as connected autonomous vehicles (CAV), where incomplete data coverage can lead to decision-making blind spots. To address these challenges, we propose Collaborative Auxiliary Modality Learning ($\textbf{CAML}$), a novel multi-agent multi-modality framework that enables agents to collaborate and share multimodal data during training, while allowing each agent to perform inference with reduced modalities during testing. We systematically analyze the effectiveness of $\textbf{CAML}$ from the perspectives of uncertainty reduction and data coverage, providing theoretical insights into its advantages over AML. Experimental results on collaborative decision-making for CAV in accident-prone scenarios demonstrate that $\textbf{CAML}$ achieves up to a ${\bf 58.13}\%$ improvement in accident detection. Additionally, we validate $\textbf{CAML}$ on real-world aerial-ground robot data for collaborative semantic segmentation, achieving up to a ${\bf 10.61}\%$ improvement in mIoU.