We introduce the ObjectFolder Benchmark, a suite of 10 tasks for multisensory object-centric learning, centered on object recognition, reconstruction, and manipulation with sight, sound, and touch. We also introduce the ObjectFolder Real dataset, which contains multisensory measurements for 100 real-world household objects and builds on a newly designed pipeline for collecting the 3D meshes, videos, impact sounds, and tactile readings of real-world objects. We conduct systematic benchmarking on both the 1,000 multisensory neural objects from ObjectFolder and the real multisensory data from ObjectFolder Real. Our results demonstrate the importance of multisensory perception and reveal the respective roles of vision, audio, and touch for different object-centric learning tasks. By publicly releasing our dataset and benchmark suite, we hope to catalyze and enable new research in multisensory object-centric learning in computer vision, robotics, and beyond. Project page: https://objectfolder.stanford.edu