In this paper, we address the problem of designing incentive mechanisms through which a virtual service provider (VSP) hires sensing IoT devices to sell their sensing data, which helps create and render the digital copy of the physical world in the Metaverse. Because bandwidth is limited, we propose to use semantic extraction algorithms to reduce the amount of data delivered by the sensing IoT devices. Nevertheless, mechanisms that hire sensing IoT devices to share their data with the VSP and then deliver the constructed digital twin to the Metaverse users are vulnerable to the adverse selection problem. This problem, which is caused by information asymmetry between the system entities, becomes harder to solve when the private information of the different entities is multi-dimensional. We propose a novel iterative contract design and use a new variant of multi-agent reinforcement learning (MARL) to solve the modelled multi-dimensional contract problem. To demonstrate the effectiveness of our algorithm, we conduct extensive simulations and measure several key performance metrics of the contract for the Metaverse. Our results show that the designed iterative contract incentivizes the participants to interact truthfully, which maximizes the profit of the VSP with minimal individual rationality (IR) and incentive compatibility (IC) violation rates. Furthermore, the proposed learning-based iterative contract framework requires only limited access to the private information of the participants and is, to the best of our knowledge, the first of its kind in addressing the problem of adverse selection in incentive mechanisms.
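To make the IR and IC violation rates mentioned above concrete, the following is a minimal sketch of how such rates could be computed for a given contract menu. It is not the method of this paper: the one-dimensional type (a per-unit cost), the linear utility `reward - unit_cost * effort`, and the names `utility` and `violation_rates` are all assumptions introduced for illustration, whereas the private information considered here is multi-dimensional.

```python
from typing import List, Tuple

# Hypothetical contract item: (required_effort, reward). Not taken from the paper.
Contract = Tuple[float, float]


def utility(unit_cost: float, contract: Contract) -> float:
    """Assumed linear utility of a participant whose private type is its unit cost."""
    effort, reward = contract
    return reward - unit_cost * effort


def violation_rates(
    unit_costs: List[float], menu: List[Contract], tol: float = 1e-9
) -> Tuple[float, float]:
    """Return (IR violation rate, IC violation rate).

    menu[i] is the contract designed for the participant with type unit_costs[i].
    IR is violated when a participant's own contract yields negative utility.
    IC is violated when some other contract in the menu yields strictly more
    utility than the participant's own contract.
    """
    if len(unit_costs) != len(menu) or not menu:
        raise ValueError("need one contract per participant type")

    ir_violations = 0
    ic_violations = 0
    for i, cost in enumerate(unit_costs):
        own = utility(cost, menu[i])
        if own < -tol:
            ir_violations += 1
        others = [utility(cost, c) for j, c in enumerate(menu) if j != i]
        if others and max(others) > own + tol:
            ic_violations += 1

    n = len(menu)
    return ir_violations / n, ic_violations / n


if __name__ == "__main__":
    types = [1.0, 2.0]
    print(violation_rates(types, [(4.0, 5.0), (1.0, 2.0)]))
    print(violation_rates(types, [(4.0, 3.0), (1.0, 2.0)]))
```

In the first menu, every participant weakly prefers its own contract and obtains non-negative utility, so both rates are zero. In the second menu, the low-cost participant is underpaid, so it violates both constraints and each rate is one half.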