When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making

Jun Liu, Pu Zhao, Zhenglun Kong, Xuan Shen, Peiyan Dong, Fan Yang, Lin Cui, Hao Tang, Geng Yuan, Wei Niu, Wenbin Zhang, Xue Lin, Gaowen Liu, Yanzhi Wang, Dong Huang

Embodied robotic systems increasingly rely on large language model (LLM)-based agents to support high-level reasoning, planning, and decision-making during interactions with the environment. However, invoking LLM reasoning introduces substantial computational latency and resource overhead, which can interrupt action execution and reduce system reliability. Excessive reasoning may delay actions, while insufficient reasoning often leads to incorrect decisions and task failures. This raises a fundamental question for embodied agents: when should the agent reason, and when should it act? In this work, we propose RARRL (Resource-Aware Reasoning via Reinforcement Learning), a hierarchical framework for resource-aware orchestration of embodied agents. Rather than learning low-level control policies, RARRL learns a high-level orchestration policy that operates at the agent's decision-making layer. This policy enables the agent to adaptively determine whether to invoke reasoning, which reasoning role to employ, and how much computational budget to allocate based on current observations, execution history, and remaining resources. Extensive experiments, including evaluations with empirical latency profiles derived from the ALFRED benchmark, show that RARRL consistently improves task success rates while reducing execution latency and enhancing robustness compared with fixed or heuristic reasoning strategies. These results demonstrate that adaptive reasoning control is essential for building reliable and efficient embodied robotic agents.

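
The abstract describes an orchestration policy that chooses whether to invoke reasoning, which reasoning role to use, and how much computational budget to allocate. As a rough illustration only, that decision structure could be sketched as follows. Everything here is an assumption and not the authors' implementation: the role names, the budget values, the latency-penalized reward, and the one-step epsilon-greedy value update (a deliberately simple stand-in for whatever reinforcement learning method RARRL actually uses).

```python
import random
from dataclasses import dataclass
from itertools import product
from typing import Hashable, Optional

ROLES = ("planner", "verifier")  # hypothetical reasoning roles
BUDGETS = (64, 256)              # hypothetical budgets (e.g. tokens)


@dataclass(frozen=True)
class OrchestrationAction:
    reason: bool                 # invoke LLM reasoning, or act directly
    role: Optional[str] = None   # which reasoning role, if reasoning
    budget: int = 0              # compute allocated to the call


# One "act directly" option plus every (role, budget) combination.
ACTIONS = [OrchestrationAction(False)] + [
    OrchestrationAction(True, role, budget)
    for role, budget in product(ROLES, BUDGETS)
]


def reward(success: bool, latency_s: float, latency_weight: float = 0.1) -> float:
    """Assumed reward: task success minus a weighted latency penalty."""
    return (1.0 if success else 0.0) - latency_weight * latency_s


class EpsilonGreedyOrchestrator:
    """Tabular value estimates over (state, action) pairs, one-step updates."""

    def __init__(self, epsilon: float = 0.1, lr: float = 0.5, seed: int = 0):
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)
        self.q = {}  # maps (state, action) -> estimated reward

    def select(self, state: Hashable) -> OrchestrationAction:
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state: Hashable, action: OrchestrationAction, r: float) -> None:
        # Move the estimate a fraction lr toward the observed reward.
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.lr * (r - old)
```

In this sketch a state would be any hashable summary of the current observation, execution history, and remaining resources, for example `("object_not_found", "low_budget")`. Penalizing latency in the reward is one way to express the trade-off the abstract raises between excessive reasoning and insufficient reasoning.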