Realistic visual simulation of food manipulation requires accurate material parameters, yet these are difficult to measure directly and vary across the heterogeneous regions of a single food item. We address the inverse problem of estimating material parameters from a target description of fracture behavior in a non-differentiable continuum damage mechanics simulator. Using orange peeling as a test case, we train a neural surrogate on 2,000 forward simulations and compare Covariance Matrix Adaptation Evolution Strategy (CMA-ES, a gradient-free evolutionary optimizer) with Proximal Policy Optimization (PPO, a reinforcement learning algorithm) across the original 9-dimensional parameter space and two learned 4-dimensional latent representations. Because different oranges have different material properties, a practical inverse system must handle arbitrary targets without retraining. We therefore train a goal-conditioned PPO policy that learns a general inverse mapping: given any target description of peeling behavior, the policy produces a material parameter estimate in a single forward pass (8 surrogate evaluations, approximately 10 ms). Operating in a normalizing flow latent space with a shared surrogate evaluator, the goal-conditioned policy achieves an actual recovery of 0.642 when validated through the simulator, outperforming the original parameter space by 23%. A warm-start extension that initializes CMA-ES refinement from the policy's output further improves recovery to 0.828 using 540 evaluations. These findings provide a practical framework for inverse food physics and lay the groundwork for vision-driven material identification from video observations of food manipulation.
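The warm-start idea described above, where a gradient-free search begins from the policy's estimate rather than from a random point, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_surrogate` is a hypothetical analytic stand-in for the neural surrogate, `recovery` is a hypothetical score rather than the metric used in this work, the "policy guess" is simply a perturbed copy of the true parameters, and the optimizer is a simplified elitist evolution strategy that adapts only a scalar step size, not full CMA-ES. The budget of 540 evaluations and the 9-dimensional parameter vector are borrowed from the text only as convenient defaults.

```python
import math
import random

DIM = 9  # size of the original parameter space, as stated in the text


def toy_surrogate(params):
    """Hypothetical stand-in for the neural surrogate: 9 params -> 3 features."""
    a = sum(params[0:3]) / 3.0
    b = math.tanh(sum(p * p for p in params[3:6]))
    c = math.sin(sum(params[6:9]))
    return (a, b, c)


def recovery(params, target):
    """Hypothetical score in (0, 1]; 1.0 means the target features are matched."""
    return 1.0 / (1.0 + math.dist(toy_surrogate(params), target))


def warm_start_refine(start, target, budget=540, popsize=12, sigma=0.2, seed=0):
    """Elitist evolution strategy started from a given guess.

    Simplified stand-in for CMA-ES: only a scalar step size is adapted.
    Returns (best_params, best_score, evaluations_used).
    """
    rng = random.Random(seed)
    best, best_score = list(start), recovery(start, target)
    evals = 1
    while evals + popsize <= budget:
        gen_best, gen_score = None, best_score
        for _ in range(popsize):
            # Sample a candidate around the current best estimate.
            cand = [x + sigma * rng.gauss(0.0, 1.0) for x in best]
            score = recovery(cand, target)
            evals += 1
            if score > gen_score:
                gen_best, gen_score = cand, score
        if gen_best is not None:
            best, best_score = gen_best, gen_score
            sigma *= 1.2  # widen the search after a successful generation
        else:
            sigma *= 0.7  # narrow the search after a failed generation
    return best, best_score, evals


if __name__ == "__main__":
    rng = random.Random(1)
    true_params = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    target = toy_surrogate(true_params)
    # Pretend policy output: the true parameters plus noise (an assumption).
    policy_guess = [p + rng.gauss(0.0, 0.3) for p in true_params]
    refined, refined_score, used = warm_start_refine(policy_guess, target)
    print("score before refinement:", recovery(policy_guess, target))
    print("score after refinement: ", refined_score, "using", used, "evaluations")
```

Because the search is elitist, the refined score can never fall below the score of the starting guess, which is the property that makes a warm start from a reasonable policy estimate attractive under a fixed evaluation budget.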