Vehicular Metaverses are an emerging paradigm arising from the convergence of vehicle-road cooperation, the Metaverse, and the augmented intelligence of things. Vehicular Metaverse Users (VMUs) access these services by continuously updating their Vehicular Twins (VTs), which are deployed on nearby RoadSide Units (RSUs). Because RSU coverage is limited and vehicles are constantly moving, VTs must be migrated continuously between RSUs through vehicle-road cooperation to provide VMUs with uninterrupted immersive services. However, VT migration requires sufficient bandwidth from RSUs to complete in time, which poses a resource trading problem among RSUs. In this paper, we tackle this challenge by formulating a multi-leader multi-follower game-theoretic incentive mechanism that incorporates social awareness and queueing theory to optimize VT migration. We validate the existence and uniqueness of the Stackelberg Equilibrium using backward induction, and obtain theoretical solutions for this equilibrium with the Alternating Direction Method of Multipliers (ADMM) algorithm. Moreover, because privacy-protection requirements leave participants with incomplete information, we propose a multi-agent deep reinforcement learning algorithm named MALPPO. MALPPO learns the Stackelberg Equilibrium from past experience alone, without requiring private information from other participants. Comprehensive experimental results demonstrate that our MALPPO-based incentive mechanism significantly outperforms baseline approaches, converging rapidly and achieving the highest reward.
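As a rough illustration of the backward induction mentioned above, the following sketch solves a deliberately simplified Stackelberg bandwidth-pricing game. It is not the paper's model: it assumes a single leader (one RSU selling bandwidth at a unit price) instead of multiple leaders, a hypothetical logarithmic follower utility `a * ln(1 + b) - price * b`, and made-up parameter values. The followers' best responses are derived first, then substituted into the leader's utility, which is maximized in closed form and cross-checked by a grid search.

```python
import math

# Toy single-leader Stackelberg game (an assumption for illustration only;
# the utilities and parameters below are hypothetical, not from the paper).

def follower_best_response(a, price):
    """Stage 2: a follower maximizes a*ln(1+b) - price*b over b >= 0.
    Setting the derivative a/(1+b) - price to zero gives b = a/price - 1."""
    return max(a / price - 1.0, 0.0)

def leader_utility(price, prefs, cost):
    """Stage 1 objective: profit margin times total anticipated demand."""
    demand = sum(follower_best_response(a, price) for a in prefs)
    return (price - cost) * demand

def closed_form_price(prefs, cost):
    """Leader's optimal price when every follower buys a positive amount:
    maximizing sum_i (a_i/p - 1)(p - c) gives p* = sqrt(c * sum(a) / N)."""
    return math.sqrt(cost * sum(prefs) / len(prefs))

def grid_search_price(prefs, cost, steps=20000):
    """Numerical cross-check: scan prices between cost and max preference."""
    lo, hi = cost, max(prefs)
    best_price, best_value = lo, leader_utility(lo, prefs, cost)
    for k in range(1, steps + 1):
        price = lo + (hi - lo) * k / steps
        value = leader_utility(price, prefs, cost)
        if value > best_value:
            best_price, best_value = price, value
    return best_price

prefs = [4.0, 9.0, 12.0]   # hypothetical follower preference weights a_i
cost = 1.0                 # hypothetical unit bandwidth cost of the leader

p_star = closed_form_price(prefs, cost)
p_grid = grid_search_price(prefs, cost)
demands = [follower_best_response(a, p_star) for a in prefs]
print(f"closed-form price: {p_star:.4f}, grid-search price: {p_grid:.4f}")
print("follower demands at equilibrium:", [round(b, 4) for b in demands])
```

The closed-form price is only valid when it lies below the smallest preference weight, so that every follower's demand is positive; the grid search does not rely on that condition and serves as a sanity check.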