Convenient 4D modeling of human-object interactions is essential for numerous applications. However, monocular tracking and rendering of complex interaction scenarios remain challenging. In this paper, we propose Instant-NVR, a neural approach for instant volumetric human-object tracking and rendering using a single RGBD camera. It bridges traditional non-rigid tracking with recent instant radiance field techniques via a multi-thread tracking-rendering mechanism. In the tracking front-end, we adopt a robust human-object capture scheme to provide sufficient motion priors. We further introduce a separated instant neural representation with a novel hybrid deformation module for the interacting scene. We also provide an on-the-fly reconstruction scheme of the dynamic/static radiance fields via efficient motion-prior searching. Moreover, we introduce an online key frame selection scheme and a rendering-aware refinement strategy to significantly improve the appearance details for online novel-view synthesis. Extensive experiments demonstrate the effectiveness and efficiency of our approach for the instant generation of human-object radiance fields on the fly, notably achieving real-time photo-realistic novel view synthesis under complex human-object interactions.