Food cutting is a highly practical yet underexplored application at the intersection of vision and robotic manipulation. The task remains challenging because interactions between the knife and deformable materials are highly nonlinear and often entail large deformations, frequent contact, and topological change, which in turn hinder stable and safe large-scale data collection. To address these challenges, we propose a unified framework that couples a vision-language-action (VLA) dataset with a physically realistic cutting simulator built on the material point method (MPM). Our simulator adopts MLS-MPM as its computational core, reducing numerical dissipation and energy drift while preserving rotational and shear responses even under topology-changing cuts. During cutting, forces and stress distributions are estimated from impulse exchanges between particles and the grid, enabling stable tracking of transient contact forces and energy transfer. We also provide a benchmark dataset that integrates diverse cutting trajectories, multi-view visual observations, and fine-grained language instructions, together with force--torque and tool--pose labels to provide physically consistent training signals. These components realize a learning--evaluation loop that respects the core physics of cutting and establishes a safe, reproducible, and scalable foundation for advancing VLA models in deformable object manipulation.
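The abstract's force-estimation idea (reading contact forces off the impulse exchanged between particles and the grid) can be illustrated with a minimal 1D sketch. This is not the paper's MLS-MPM implementation; all names, the linear hat-function weights, and the toy numbers are illustrative assumptions.

```python
import numpy as np

# Minimal 1D MPM-style sketch (hypothetical setup): particles carry mass and
# velocity; momentum is scattered to grid nodes with linear hat-function
# weights. The nodal force is estimated as impulse / dt, mirroring the
# abstract's particle-grid impulse-exchange force estimate.

def p2g_force_estimate(x_p, v_old_p, v_new_p, m_p, n_cells, dx, dt):
    """Scatter pre- and post-step particle momentum to the grid; their
    difference is the impulse exchanged, so nodal force ~= impulse / dt."""
    mom_old = np.zeros(n_cells + 1)
    mom_new = np.zeros(n_cells + 1)
    for x, v0, v1, m in zip(x_p, v_old_p, v_new_p, m_p):
        i = int(x / dx)          # index of the left grid node
        w_r = x / dx - i         # linear weight to the right node
        w_l = 1.0 - w_r          # linear weight to the left node
        mom_old[i] += w_l * m * v0
        mom_old[i + 1] += w_r * m * v0
        mom_new[i] += w_l * m * v1
        mom_new[i + 1] += w_r * m * v1
    impulse = mom_new - mom_old
    return impulse / dt          # per-node force estimate

# Toy example: two unit-mass particles decelerating, as when a blade
# meets material resistance during a cutting stroke.
x_p = np.array([0.25, 0.35])
m_p = np.array([1.0, 1.0])
v_old_p = np.array([1.0, 1.0])
v_new_p = np.array([0.8, 0.8])   # velocities after one time step
f = p2g_force_estimate(x_p, v_old_p, v_new_p, m_p,
                       n_cells=4, dx=0.25, dt=0.01)
print(f.sum())  # total force ~ total momentum change / dt, i.e. about -40
```

Because the linear weights sum to one at each particle, total momentum is conserved in the transfer, so summing the nodal forces recovers the net force on the material, which is the quantity tracked as a transient contact-force signal.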