AviationLMM: A Large Multimodal Foundation Model for Civil Aviation

From arXiv. Accepted by the 2025 7th International Conference on Interdisciplinary Computer Science and Engineering (ICICSE 2025), Chongqing, China; 9 pages, 1 figure, 5 tables.

Civil aviation is a cornerstone of global transportation and commerce, and ensuring its safety, efficiency and customer satisfaction is paramount. Yet conventional Artificial Intelligence (AI) solutions in aviation remain siloed and narrow, focusing on isolated tasks or single modalities. They struggle to integrate heterogeneous data such as voice communications, radar tracks, sensor streams and textual reports, which limits situational awareness, adaptability, and real-time decision support. This paper introduces the vision of AviationLMM, a Large Multimodal foundation Model for civil aviation, designed to unify these heterogeneous data streams and enable understanding, reasoning, generation and agentic applications. First, we identify the gaps between existing AI solutions and aviation requirements. Second, we describe a model architecture that ingests multimodal inputs such as air-ground voice, surveillance data, on-board telemetry, video and structured text, performs cross-modal alignment and fusion, and produces flexible outputs ranging from situation summaries and risk alerts to predictive diagnostics and multimodal incident reconstructions. To fully realize this vision, we identify key research opportunities, including data acquisition, alignment and fusion, pretraining, reasoning, trustworthiness, privacy, robustness to missing modalities, and synthetic scenario generation. By articulating the design and challenges of AviationLMM, we aim to advance progress on civil aviation foundation models and catalyze coordinated research efforts toward an integrated, trustworthy and privacy-preserving aviation AI ecosystem.
