Many AI platforms, including traffic monitoring systems, use Federated Learning (FL) to process decentralized sensor data for learning-based applications while preserving privacy and keeping information transfer secure. Applying supervised learning to large data samples such as high-resolution images, however, requires intensive human labor to label the different parts of each sample. Multiple Instance Learning (MIL) alleviates this burden by operating on a single label assigned to a 'bag' of instances rather than on per-instance labels. In this paper, we introduce Federated Multiple-Instance Learning (FedMIL), a framework that applies federated learning to improve training performance in video-based MIL tasks such as vehicle accident detection over distributed CCTV networks. Data sources in decentralized settings are typically not Independently and Identically Distributed (non-IID), so clients must be selected carefully to represent the entire dataset collectively with as few clients as possible. To address this challenge, we propose DPPQ, a client selection framework based on the Determinantal Point Process (DPP) with a quality-based kernel that selects the clients with the most diverse datasets. In the majority of non-IID cases, DPPQ achieves better performance than both random selection and current DPP-based client selection methods, even while using less data. This is a significant advantage for deployment on edge devices with limited computational resources, and it provides a reliable solution for training AI models in massive smart sensor networks.
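To make the idea of DPP-based client selection with a quality-based kernel concrete, here is a minimal sketch in Python. It is not the DPPQ algorithm itself, whose kernel and selection procedure are not specified in this abstract. It assumes the standard quality-diversity decomposition L = diag(q) S diag(q), uses cosine similarity between per-client label histograms as a stand-in for S, and replaces DPP sampling with greedy log-determinant maximization. The names `quality_diversity_kernel`, `greedy_dpp_select`, `label_histograms`, and `quality` are hypothetical.

```python
import numpy as np


def quality_diversity_kernel(features, quality):
    """Build L = diag(q) @ S @ diag(q); S is the cosine similarity of client features.

    Assumption: this decomposition illustrates a quality-based DPP kernel in
    general; the paper's actual kernel may differ.
    """
    features = np.asarray(features, dtype=float)
    quality = np.asarray(quality, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero rows
    similarity = unit @ unit.T
    return quality[:, None] * similarity * quality[None, :]


def greedy_dpp_select(kernel, k):
    """Greedily add the client that maximizes log det of the selected submatrix."""
    n = kernel.shape[0]
    selected = []
    for _ in range(min(k, n)):
        best_item, best_logdet = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            # Skip candidates that make the submatrix singular or non-positive.
            if sign > 0 and logdet > best_logdet:
                best_item, best_logdet = i, logdet
        if best_item is None:  # no candidate keeps the determinant positive
            break
        selected.append(best_item)
    return selected


# Hypothetical data: rows are clients, columns are per-class sample counts.
label_histograms = [
    [10, 0, 0],
    [9, 1, 0],   # nearly the same distribution as client 0
    [0, 10, 0],
    [0, 0, 10],
]
quality = [1.0, 0.9, 0.8, 0.7]  # hypothetical per-client quality scores

kernel = quality_diversity_kernel(label_histograms, quality)
selected = greedy_dpp_select(kernel, k=3)
print(selected)
```

In this toy setup, client 1 has a higher quality score than clients 2 and 3 but is passed over, because its label distribution nearly duplicates client 0's and adding it would drive the determinant toward zero. This is the behavior that favors diverse clients under non-IID data.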