We introduce NIFT (Neural Interaction Field and Template), a descriptive and robust interaction representation of object manipulations that facilitates imitation learning. Given a few object manipulation demos, NIFT guides the generation of an imitated interaction for a new object instance by matching the Neural Interaction Template (NIT) extracted from the demos against the target Neural Interaction Field (NIF) defined for the new object. Specifically, the NIF is a neural field that encodes the relationship between each spatial point and a given object. The relative position is defined by a spherical distance function rather than by occupancies or signed distances, which conventional neural fields commonly adopt but which are less informative. For a given demo interaction, the corresponding NIT is defined by a set of spatial points sampled in the demo NIF, together with their associated neural features. To better capture the interaction, the points are sampled on the Interaction Bisector Surface (IBS), which consists of the points equidistant to the two interacting objects and has been used extensively for interaction representation. With both the point selection and the pointwise features designed for better interaction encoding, the NIT effectively guides feature matching in the NIFs of new object instances, so that the relative poses are optimized to realize the manipulation while imitating the demo interactions. Experiments show that NIFT outperforms state-of-the-art imitation learning methods for object manipulation and generalizes better to objects from new categories.
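To make the IBS definition concrete, the following is a minimal sketch of how points approximately equidistant to two objects could be collected when each object is given as a point cloud. It is not the sampling procedure used in NIFT: the function names `nearest_distance` and `sample_ibs_points`, the brute-force nearest-neighbor search, the rejection-sampling strategy, and the parameters `n_candidates`, `tol`, `pad`, and `seed` are all assumptions made for illustration.

```python
import numpy as np


def nearest_distance(queries, cloud):
    """Distance from each query point to its nearest point in `cloud`.

    Brute force: builds a (num_queries, num_cloud_points) distance table,
    so it is only suitable for small inputs.
    """
    diff = queries[:, None, :] - cloud[None, :, :]
    return np.linalg.norm(diff, axis=-1).min(axis=1)


def sample_ibs_points(cloud_a, cloud_b, n_candidates=20000,
                      tol=0.05, pad=1.0, seed=0):
    """Approximate IBS samples between two point clouds (hypothetical helper).

    Candidates are drawn uniformly in a padded bounding box around both
    clouds; a candidate is kept when its distances to the two clouds
    differ by less than `tol`.
    """
    rng = np.random.default_rng(seed)
    both = np.vstack([cloud_a, cloud_b])
    lo = both.min(axis=0) - pad
    hi = both.max(axis=0) + pad
    candidates = rng.uniform(lo, hi, size=(n_candidates, 3))

    d_a = nearest_distance(candidates, cloud_a)
    d_b = nearest_distance(candidates, cloud_b)

    # Points on the exact IBS satisfy d_a == d_b; we accept a thin band.
    keep = np.abs(d_a - d_b) < tol
    return candidates[keep]


if __name__ == "__main__":
    # Two single-point "objects": their exact IBS is the plane x = 0.
    obj_a = np.array([[-1.0, 0.0, 0.0]])
    obj_b = np.array([[1.0, 0.0, 0.0]])
    ibs = sample_ibs_points(obj_a, obj_b)
    print(ibs.shape, float(np.abs(ibs[:, 0]).max()))
```

In this toy setup the kept points cluster around the plane x = 0, which is the exact bisector of the two points. In the setting the abstract describes, each sampled point would additionally carry the neural feature of the demo NIF at that location; that part is not sketched here because the abstract does not specify the network.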