In daily life, humans use their hands to manipulate objects. Modeling the shape of hand-manipulated objects is essential for AI to comprehend daily tasks and to learn manipulation skills. However, previous approaches have struggled to reconstruct the precise shapes of hand-held objects, primarily owing to a lack of prior shape knowledge and insufficient training data. As illustrated, given a particular type of tool, such as a mug, humans manipulate it with only a limited number of 'effective' modes and poses, despite its infinite variations in shape and appearance. This can be attributed to the fact that humans have mastered the shape prior of the 'mug' category and can quickly establish correspondences between different mug instances and the prior, such as where the rim and handle are located. In light of this, we propose a new method, CHORD, for Category-level Hand-held Object Reconstruction via shape Deformation. CHORD deforms a categorical shape prior to reconstruct intra-class objects. To ensure accurate reconstruction, we equip CHORD with three types of awareness: appearance, shape, and interacting pose. In addition, we have constructed a new dataset, COMIC, of category-level hand-object interaction. COMIC contains a rich array of object instances, materials, hand interactions, and viewing directions. Extensive evaluation shows that CHORD outperforms state-of-the-art approaches in both quantitative and qualitative measures. Code, model, and datasets are available at https://kailinli.github.io/CHORD.