In this paper, we establish a task-oriented cross-system design framework that minimizes the packet rate required for timely and accurate modeling of a real-world robotic arm in the Metaverse, jointly considering sensing, communication, prediction, control, and rendering. To optimize the scheduling policy and the prediction horizons, we design a Constraint Proximal Policy Optimization (C-PPO) algorithm by integrating domain knowledge from the relevant systems into an advanced reinforcement learning algorithm, Proximal Policy Optimization (PPO). Specifically, the Jacobian matrix used to analyze the motion of the robotic arm is included in the state of the C-PPO algorithm, and the Conditional Value-at-Risk (CVaR) of the state-value function, which characterizes the long-term modeling error, is adopted in the constraint. In addition, the policy is represented by a two-branch neural network whose branches determine the scheduling policy and the prediction horizons, respectively. To evaluate our algorithm, we build a prototype that includes a real-world robotic arm and its digital model in the Metaverse. The experimental results indicate that domain knowledge helps to reduce the convergence time and the required packet rate by up to 50%, and that the cross-system design framework outperforms a baseline framework in terms of the required packet rate and the tail distribution of the modeling error.
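To make the CVaR-based constraint concrete, the following is a minimal sketch of an empirical CVaR estimate, assuming a finite set of sampled modeling errors. The function name `empirical_cvar`, the confidence level `alpha`, and the sample values are illustrative assumptions; the abstract applies CVaR to the state-value function and does not specify an estimator, so this should not be read as the authors' implementation.

```python
import math


def empirical_cvar(errors, alpha=0.95):
    """Average of the worst (1 - alpha) fraction of the sampled errors.

    Hypothetical helper for illustration: larger values mean larger
    modeling error, so the tail of interest is the upper tail.
    """
    if not errors:
        raise ValueError("errors must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(errors, reverse=True)
    # Number of tail samples; rounding guards against float noise in (1 - alpha) * n.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:k]) / k


# Assumed sample errors, for illustration only.
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
tail_risk = empirical_cvar(samples, alpha=0.8)
```

A constraint of the kind described above would then compare such a tail estimate against a chosen threshold, which penalizes rare large errors more strongly than a constraint on the mean error would.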