Mobile crowdsensing (MCS) is a distributed sensing architecture that uses existing sensors on mobile units (MUs) to perform sensing tasks. A mobile crowdsensing platform (MCSP) publishes the sensing tasks, and the MUs decide whether to participate in exchange for payment. The MCS system is dynamic: the task requirements, the MUs' availability, and their available resources change over time. The MUs aim to find an efficient task participation strategy that maximizes their income, while the MCSP focuses on maximizing the number of completed tasks. Because optimal strategies require perfect non-causal information about the MCS system, which is unavailable in realistic scenarios, the main challenge is to find an efficient task participation strategy for the MUs under incomplete information. To this end, a novel fully decentralized federated deep reinforcement learning algorithm, FDRL-PPO, is proposed. FDRL-PPO enables every MU to learn its own task participation strategy based on its experiences, available resources, and preferences, without relying on perfect non-causal information about the MCS system. To replenish their batteries, the MUs rely on energy harvesting. As a result, their available energy varies over time, leading to varying availability and fragmented learning experiences. To mitigate these challenges, the proposed approach leverages federated learning, enabling MUs to collaboratively improve their models without sharing private raw data such as their own experiences. By exchanging only learned models, MUs collectively compensate for individual limitations and find more scalable, robust, and efficient task participation strategies. Comprehensive evaluations on both synthetic and real-world datasets show that FDRL-PPO consistently outperforms benchmark algorithms in terms of task completion ratio, fairness in task completion, energy consumption, and number of conflicting proposals.
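The idea of exchanging only learned models, rather than raw experiences, can be illustrated with a minimal sketch of decentralized parameter averaging. This is not the FDRL-PPO algorithm itself, whose aggregation rule is not specified here. The function names `average_models` and `decentralized_round`, the MU identifiers, the neighbor graph, and the plain unweighted mean are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical representation: a model is a dict of named parameter vectors.
Params = Dict[str, List[float]]


def average_models(models: List[Params]) -> Params:
    """Element-wise unweighted mean of models with identical keys and shapes."""
    if not models:
        raise ValueError("need at least one model")
    averaged: Params = {}
    for name in models[0]:
        # Group the i-th entry of this parameter across all models.
        columns = zip(*(m[name] for m in models))
        averaged[name] = [sum(col) / len(models) for col in columns]
    return averaged


def decentralized_round(
    local: Dict[str, Params], neighbors: Dict[str, List[str]]
) -> Dict[str, Params]:
    """Each MU averages its own model with those of currently available neighbors.

    Only model parameters are exchanged; no raw experience leaves an MU.
    MUs missing from `local` (e.g. out of energy) are simply skipped.
    """
    return {
        mu: average_models(
            [local[mu]] + [local[n] for n in neighbors.get(mu, []) if n in local]
        )
        for mu in local
    }


# Illustrative data: three MUs with tiny two-parameter "policies".
models = {
    "mu_a": {"policy": [1.0, 2.0]},
    "mu_b": {"policy": [3.0, 4.0]},
    "mu_c": {"policy": [5.0, 6.0]},
}
links = {"mu_a": ["mu_b"], "mu_b": ["mu_a", "mu_c"], "mu_c": ["mu_b"]}
updated = decentralized_round(models, links)
```

In this sketch there is no central server: every MU aggregates only with the peers it can currently reach, which loosely mirrors the fully decentralized setting and the varying availability caused by energy harvesting.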