Energy-Aware Reinforcement Learning for Robotic Manipulation of Articulated Components in Infrastructure Operation and Maintenance

With the growth of intelligent civil infrastructure and smart cities, operation and maintenance (O&M) increasingly requires safe, efficient, and energy-conscious robotic manipulation of articulated components such as access doors, service drawers, and pipeline valves. Existing robotic approaches, however, either focus primarily on grasping or target object-specific articulated manipulation, and they rarely include explicit actuation energy in multi-objective optimisation, which limits their scalability and suitability for long-term deployment in real O&M settings. To address this gap, this paper proposes an articulation-agnostic and energy-aware reinforcement learning framework for robotic manipulation in intelligent infrastructure O&M. The method combines part-guided 3D perception, weighted point sampling, and PointNet-based encoding to obtain a compact geometric representation that generalises across heterogeneous articulated objects. Manipulation is formulated as a Constrained Markov Decision Process (CMDP), in which actuation energy is explicitly modelled and regulated via a Lagrangian-based constrained Soft Actor-Critic scheme. The policy is trained end-to-end under this CMDP formulation, enabling effective articulated-object operation while satisfying a long-horizon energy budget. Experiments on representative O&M tasks demonstrate 16%-30% reductions in energy consumption, 16%-32% fewer steps to success, and consistently high success rates, indicating a scalable and sustainable solution for infrastructure O&M manipulation.
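
The Lagrangian treatment of the energy constraint mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the energy model or the multiplier update, so the per-step energy proxy (sum of |torque x joint velocity| times the control period), the projected dual-ascent rule, and all function and parameter names below are assumptions chosen for illustration. The idea is that the multiplier grows when average episode energy exceeds the budget and shrinks toward zero otherwise, while the policy is trained on the task reward minus the weighted energy cost.

```python
def step_energy(torques, joint_velocities, dt):
    """Assumed energy proxy for one control step: sum(|tau_i * qdot_i|) * dt."""
    return sum(abs(t * v) for t, v in zip(torques, joint_velocities)) * dt


def lagrangian_reward(task_reward, energy_cost, lam):
    """Penalised reward the actor-critic would see: r - lambda * c."""
    return task_reward - lam * energy_cost


def update_multiplier(lam, mean_episode_energy, energy_budget, lr=0.01):
    """Projected dual ascent: raise lambda when the budget is violated,
    lower it otherwise, and clip at zero so it stays a valid multiplier."""
    return max(0.0, lam + lr * (mean_episode_energy - energy_budget))


if __name__ == "__main__":
    lam = 0.5
    # Hypothetical two-joint step: torques in N*m, velocities in rad/s, dt in s.
    cost = step_energy([2.0, -1.0], [0.5, 2.0], dt=0.1)
    print("step energy:", cost)
    print("penalised reward:", lagrangian_reward(1.0, cost, lam))
    # Episode energy above the budget pushes lambda up.
    print("lambda after violation:", update_multiplier(lam, 12.0, 10.0, lr=0.1))
```

In a constrained Soft Actor-Critic loop, an update of this kind would typically run alongside the actor and critic updates, with the mean episode energy estimated from recent rollouts or from a separate cost critic.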
