Combining gradient-based trajectory optimization with differentiable physics simulation is an efficient technique for solving soft-body manipulation problems. Given a well-crafted optimization objective, the solver can quickly converge to a valid trajectory. However, writing appropriate objective functions requires expert knowledge, which makes it difficult to collect a large set of naturalistic problems from non-expert users. We introduce DiffVL, a method that enables non-expert users to communicate soft-body manipulation tasks as a combination of vision and natural language, given in multiple stages, in a form that can be readily leveraged by a differentiable physics solver. We have developed GUI tools that enable non-expert users to specify 100 tasks inspired by real-life soft-body manipulations from online videos; we will make these tasks public. We leverage large language models to translate task descriptions into machine-interpretable optimization objectives. These objectives help differentiable physics solvers solve long-horizon, multistage tasks that are challenging for previous baselines.
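To make the idea of gradient-based trajectory optimization through a differentiable simulator concrete, the following is a minimal toy sketch, not DiffVL's solver or API. It assumes a 1D unit point mass in place of a soft body, a hand-written backward pass in place of an automatic differentiation framework, and a hypothetical `stage_goals` dictionary (time step to target position) standing in for the multistage objectives that a language model would produce from task descriptions. All function names and parameters here are illustrative assumptions.

```python
def simulate(controls, dt=0.1):
    """Roll out a 1D unit point mass from rest; returns positions x_0..x_T."""
    x, v = 0.0, 0.0
    xs = [x]
    for u in controls:
        v = v + dt * u  # semi-implicit Euler: update velocity first
        x = x + dt * v  # then position, using the new velocity
        xs.append(x)
    return xs


def loss_and_grad(controls, stage_goals, dt=0.1, reg=1e-4):
    """Multistage objective and its gradient with respect to the controls.

    stage_goals maps a time step index to the position the mass should
    have at that step (a toy stand-in for per-stage task objectives).
    """
    xs = simulate(controls, dt)
    loss = sum((xs[t] - g) ** 2 for t, g in stage_goals.items())
    loss += reg * sum(u * u for u in controls)

    # Backward pass: propagate dL/dx and dL/dv from the last step to the first.
    grad = [0.0] * len(controls)
    gx, gv = 0.0, 0.0
    for t in reversed(range(len(controls))):
        if t + 1 in stage_goals:
            gx += 2.0 * (xs[t + 1] - stage_goals[t + 1])
        gv = gv + dt * gx  # x_{t+1} depends on v_{t+1}
        grad[t] = dt * gv + 2.0 * reg * controls[t]  # v_{t+1} depends on u_t
    return loss, grad


def optimize(stage_goals, horizon, steps=2000, lr=0.1, dt=0.1):
    """Plain gradient descent on the control sequence.

    The step size lr is tuned for this toy problem; longer horizons
    would need a smaller value.
    """
    controls = [0.0] * horizon
    for _ in range(steps):
        _, grad = loss_and_grad(controls, stage_goals, dt)
        controls = [u - lr * g for u, g in zip(controls, grad)]
    return controls


# Two stages: reach position 1.0 at step 20, then return to 0.0 at step 40.
stage_goals = {20: 1.0, 40: 0.0}
controls = optimize(stage_goals, horizon=40)
xs = simulate(controls)
print(f"x at step 20: {xs[20]:.3f}, x at step 40: {xs[40]:.3f}")
```

In this sketch the objective is the only task-specific input: changing `stage_goals` changes the task without touching the simulator or the optimizer, which is the separation that lets a non-expert specify tasks while a solver handles the optimization.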