Multimodal data, which supports comprehensive perception and recognition of the physical world, has become an essential path towards general artificial intelligence. However, multimodal large models trained on public datasets often underperform in specific industrial domains. This paper proposes a multimodal federated learning framework that enables multiple enterprises to use private domain data to collaboratively train large models for vertical domains, achieving intelligent services across scenarios. The authors discuss in depth the strategic transformation of federated learning, in terms of its intelligence foundation and objectives, in the era of large models, as well as the new challenges it faces in heterogeneous data, model aggregation, performance-cost trade-offs, data privacy, and incentive mechanisms. The paper presents a case study in which leading enterprises contribute multimodal data and expert knowledge to city safety operation management, covering the distributed deployment and efficient coordination of the federated learning platform, as well as technical innovations in data quality improvement based on large model capabilities and in efficient joint fine-tuning approaches. Preliminary experiments show that enterprises can enhance and accumulate intelligent capabilities through multimodal model federated learning, thereby jointly creating a smart city model that provides high-quality intelligent services covering energy infrastructure safety, residential community security, and urban operation management. The established federated learning cooperation ecosystem is expected to further aggregate industry, academia, and research resources, realize large models in multiple vertical domains, and promote both the large-scale industrial application of artificial intelligence and cutting-edge research on multimodal federated learning.