EBT-Policy：能量模型解锁涌现的物理推理能力 (EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities)

Implicit policies parameterized by generative models, such as Diffusion Policy, have become the standard for policy learning and Vision-Language-Action (VLA) models in robotics. However, these approaches often suffer from high computational cost, exposure bias, and unstable inference dynamics, which lead to divergence under distribution shifts. Energy-Based Models (EBMs) address these issues by learning energy landscapes end-to-end and modeling equilibrium dynamics, offering improved robustness and reduced exposure bias. Yet, policies parameterized by EBMs have historically struggled to scale effectively. Recent work on Energy-Based Transformers (EBTs) demonstrates the scalability of EBMs to high-dimensional spaces, but their potential for solving core challenges in physically embodied models remains underexplored. We introduce a new energy-based architecture, EBT-Policy, that solves core issues in robotic and real-world settings. Across simulated and real-world tasks, EBT-Policy consistently outperforms diffusion-based policies, while requiring less training and inference computation. Remarkably, on some tasks it converges within just two inference steps, a 50x reduction compared to Diffusion Policy's 100. Moreover, EBT-Policy exhibits emergent capabilities not seen in prior models, such as zero-shot recovery from failed action sequences using only behavior cloning and without explicit retry training. By leveraging its scalar energy for uncertainty-aware inference and dynamic compute allocation, EBT-Policy offers a promising path toward robust, generalizable robot behavior under distribution shifts.

翻译：由生成模型参数化的隐式策略，如扩散策略（Diffusion Policy），已成为机器人学中策略学习和视觉-语言-动作（VLA）模型的标准方法。然而，这些方法通常存在计算成本高、暴露偏差以及推理动态不稳定等问题，导致在分布偏移下出现发散。基于能量的模型（EBMs）通过端到端学习能量景观并建模平衡动态，解决了这些问题，提供了更强的鲁棒性和更低的暴露偏差。然而，由EBMs参数化的策略在历史上一直难以有效扩展。近期关于基于能量的Transformer（EBTs）的研究证明了EBMs在高维空间中的可扩展性，但其在解决物理实体模型核心挑战方面的潜力仍未得到充分探索。我们提出了一种新的基于能量的架构——EBT-Policy，它解决了机器人和现实世界设置中的核心问题。在模拟和现实世界任务中，EBT-Policy始终优于基于扩散的策略，同时需要更少的训练和推理计算。值得注意的是，在某些任务中，它仅需两次推理步骤即可收敛，相比扩散策略的100步减少了50倍。此外，EBT-Policy展现出先前模型中未见的涌现能力，例如仅通过行为克隆、无需显式重试训练即可实现失败动作序列的零样本恢复。通过利用其标量能量进行不确定性感知推理和动态计算分配，EBT-Policy为在分布偏移下实现鲁棒、可泛化的机器人行为提供了一条有前景的路径。

相关内容

MoDELS

关注 44

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日