We propose a method that enables large language models (LLMs) to control embodied agents through the generation of control policies that directly map continuous observation vectors to continuous action vectors. At the outset, the LLMs generate a control strategy based on a textual description of the agent, its environment, and the intended goal. This strategy is then iteratively refined through a learning process in which the LLMs are repeatedly prompted to improve the current strategy, using performance feedback and sensory-motor data collected during its evaluation. The method is validated on classic control tasks from the Gymnasium library and the inverted pendulum task from the MuJoCo library. The approach proves effective with relatively compact models such as GPT-oss:120b and Qwen2.5:72b. In most cases, it successfully identifies optimal or near-optimal solutions by integrating symbolic knowledge derived through reasoning with sub-symbolic sensory-motor data gathered as the agent interacts with its environment.
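The abstract describes a generate-evaluate-refine loop: the LLM first writes a control policy from a textual task description, the policy is run in the environment, and the LLM is then re-prompted with the resulting performance feedback and sensory-motor data. The sketch below is a minimal illustration of such a loop for Gymnasium's Pendulum-v1, assuming the LLM is asked to emit a Python function `policy(obs)` and is served locally through Ollama; the helper names (`query_llm`, `evaluate`), the prompts, and the refinement budget are illustrative assumptions, not the paper's actual implementation.

```python
import gymnasium as gym
import requests

NUM_ITERATIONS = 5       # assumed refinement budget
EPISODES_PER_EVAL = 3    # assumed number of evaluation episodes per iteration


def query_llm(prompt: str, model: str = "qwen2.5:72b") -> str:
    """Query a locally served LLM. Ollama's /api/generate endpoint is used here
    purely for illustration; the paper names the models, not the serving stack."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    text = resp.json()["response"]
    # Strip optional markdown code fences from the reply.
    return text.replace("```python", "").replace("```", "").strip()


def evaluate(policy_code: str, env_name: str = "Pendulum-v1"):
    """Run the generated policy and collect the performance feedback and
    sensory-motor data that drive the next refinement prompt."""
    namespace = {}
    exec(policy_code, namespace)          # the LLM is asked to define `policy(obs)`
    policy = namespace["policy"]

    env = gym.make(env_name)
    returns, trajectory = [], []
    for _ in range(EPISODES_PER_EVAL):
        obs, _ = env.reset()
        total, done = 0.0, False
        while not done:
            action = policy(obs)
            obs, reward, terminated, truncated, _ = env.step(action)
            trajectory.append((obs.tolist(), float(reward)))
            total += reward
            done = terminated or truncated
        returns.append(total)
    env.close()
    return sum(returns) / len(returns), trajectory


# Initial strategy generated from a textual description of agent, environment, and goal.
prompt = (
    "Write a Python function `policy(obs)` for Gymnasium's Pendulum-v1. "
    "The observation is [cos(theta), sin(theta), angular_velocity]; return the "
    "action as a list containing a single torque value in [-2, 2]. "
    "The goal is to swing the pendulum upright and keep it balanced. "
    "Return only the function definition."
)
policy_code = query_llm(prompt)

# Iterative refinement: re-prompt with performance feedback and collected data.
for _ in range(NUM_ITERATIONS):
    mean_return, trajectory = evaluate(policy_code)
    refine_prompt = (
        f"The current policy achieved a mean return of {mean_return:.1f}.\n"
        f"Sample of observed (observation, reward) pairs: {trajectory[:20]}\n"
        "Improve the policy using this data. Return only the new `policy(obs)` "
        f"function.\n\nCurrent policy:\n{policy_code}"
    )
    policy_code = query_llm(refine_prompt)
```

In this sketch the symbolic side is the reasoning the LLM performs over the task description and its own previous policy, while the sub-symbolic side enters through the sampled (observation, reward) pairs included in the refinement prompt.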

