The intimate entanglement between object affordances and human poses is of great interest to, among others, the behavioural sciences, cognitive psychology, and computer vision communities. In recent years, the latter has developed several object-centric approaches: starting from items, learning pipelines synthesize human poses and dynamics realistically, satisfying both geometrical and functional expectations. However, the inverse perspective is significantly less explored: Can we infer 3D objects and their poses from human interactions alone? Our investigation follows this direction, showing that a generic 3D human point cloud is enough to pop up an unobserved object, even when the user is only imitating a functionality (e.g., looking through binoculars) without involving a tangible counterpart. We validate our method qualitatively and quantitatively, with synthetic data and sequences acquired for the task, showing applicability for XR/VR. The code is available at https://github.com/ptrvilya/object-popup.