Generative models, such as flows and diffusions, have recently emerged as popular and efficacious policy parameterizations in robotics. There has been much speculation as to the factors underlying their success, ranging from capturing multi-modal action distributions to expressing more complex behaviors. In this work, we perform a comprehensive evaluation of popular generative control policies (GCPs) on common behavior cloning (BC) benchmarks. We find that GCPs do not owe their success to their ability to capture multi-modality or to express more complex observation-to-action mappings. Instead, we find that their advantage stems from iterative computation, as long as intermediate steps are supervised during training and this supervision is paired with a suitable level of stochasticity. As a validation of our findings, we show that a minimum iterative policy (MIP), a lightweight two-step regression-based policy, essentially matches the performance of flow GCPs and often outperforms distilled shortcut models. Our results suggest that the distribution-fitting component of GCPs is less salient than commonly believed, and point toward new design spaces focused solely on control performance. Project page: https://simchowitzlabpublic.github.io/much-ado-about-noising-project/
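To make the core idea concrete, here is a minimal illustrative sketch, not the paper's implementation, of a two-step regression policy in the spirit described: an intermediate action estimate is supervised against the expert action, perturbed with noise, and then refined by a second regressor. All names (`fit_ridge`, `sigma`, the synthetic data) are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ridge(X, Y, lam=1e-3):
    """Closed-form ridge regression: W = (X^T X + lam*I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Synthetic behavior-cloning data: observations -> expert actions.
n, d_obs, d_act = 512, 8, 2
obs = rng.normal(size=(n, d_obs))
W_true = rng.normal(size=(d_obs, d_act))
act = obs @ W_true + 0.05 * rng.normal(size=(n, d_act))

# Step 1: regress an intermediate action estimate from the observation,
# supervised directly on the expert action (intermediate supervision).
W1 = fit_ridge(obs, act)
a1 = obs @ W1

# Inject stochasticity into the intermediate estimate before refinement.
sigma = 0.1
a1_noisy = a1 + sigma * rng.normal(size=a1.shape)

# Step 2: a second regressor refines the noisy intermediate estimate,
# conditioned on both the observation and the perturbed estimate.
X2 = np.concatenate([obs, a1_noisy], axis=1)
W2 = fit_ridge(X2, act)
a2 = X2 @ W2

err1 = np.mean((a1 - act) ** 2)
err2 = np.mean((a2 - act) ** 2)
print(a2.shape, np.isfinite(err2))
```

The sketch uses linear ridge regressors purely so the example stays closed-form and self-contained; the two-step structure (supervised intermediate step, noise injection, conditioned refinement) is the point, not the function class.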