Anything in Any Scene: Photorealistic Video Object Insertion

Chen Bai, Zeman Shao, Guoxiang Zhang, Di Liang, Jie Yang, Zhuorui Zhang, Yujian Guo, Chengzhang Zhong, Yiqiao Qiu, Zhendong Wang, Yichen Guan, Xiaoyin Zheng, Tao Wang, Cheng Lu

Realistic video simulation has shown significant potential across diverse applications, from virtual reality to film production, particularly in scenarios where capturing videos in real-world settings is impractical or expensive. Existing approaches to video simulation often fail to accurately model the lighting environment, represent the object geometry, or achieve high levels of photorealism. In this paper, we propose Anything in Any Scene, a novel and generic framework for realistic video simulation that seamlessly inserts any object into an existing dynamic video with a strong emphasis on physical realism. The framework encompasses three key processes: 1) integrating a realistic object into a given scene video with proper placement to ensure geometric realism; 2) estimating the sky and environmental lighting distribution and simulating realistic shadows to enhance lighting realism; 3) employing a style transfer network that refines the final video output to maximize photorealism. We experimentally demonstrate that the Anything in Any Scene framework produces simulated videos with high geometric realism, lighting realism, and photorealism. By significantly mitigating the challenges associated with video data generation, our framework offers an efficient and cost-effective solution for acquiring high-quality videos. Furthermore, its applications extend well beyond video data augmentation, showing promising potential in virtual reality, video editing, and various other video-centric applications. Project code and more high-resolution video results are available on our project website: https://anythinginanyscene.github.io.
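
To make the three-stage structure described in the abstract concrete, the following is a toy sketch only and not the authors' implementation. It assumes grayscale frames stored as nested lists and a pre-rendered 2D sprite, and it substitutes simple stand-ins for each stage: alpha compositing for object placement, a single brightness gain for lighting estimation, and value clamping in place of the style transfer network. All names (`Insertion`, `estimate_light_gain`, `place_object`, `refine`, `insert_into_video`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List

Frame = List[List[float]]  # toy grayscale frame, pixel values in [0, 1]


@dataclass
class Insertion:
    """Hypothetical stand-in for a rendered object and its placement."""
    sprite: Frame  # object intensities
    alpha: Frame   # per-pixel opacity in [0, 1], same shape as sprite
    top: int       # row of the sprite's top-left corner in the frame
    left: int      # column of the sprite's top-left corner in the frame


def estimate_light_gain(frame: Frame) -> float:
    """Lighting stand-in: mean frame brightness relative to mid-gray (0.5)."""
    pixels = [v for row in frame for v in row]
    return (sum(pixels) / len(pixels)) / 0.5


def place_object(frame: Frame, ins: Insertion, gain: float) -> Frame:
    """Placement stand-in: alpha-composite the gain-scaled sprite into a copy."""
    out = [row[:] for row in frame]  # leave the input frame untouched
    for i, (srow, arow) in enumerate(zip(ins.sprite, ins.alpha)):
        for j, (s, a) in enumerate(zip(srow, arow)):
            y, x = ins.top + i, ins.left + j
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = a * (s * gain) + (1.0 - a) * out[y][x]
    return out


def refine(frame: Frame) -> Frame:
    """Refinement stand-in: clamp to [0, 1] where a learned network would run."""
    return [[min(1.0, max(0.0, v)) for v in row] for row in frame]


def insert_into_video(frames: List[Frame], ins: Insertion) -> List[Frame]:
    """Run the three toy stages independently on every frame."""
    return [refine(place_object(f, ins, estimate_light_gain(f))) for f in frames]


if __name__ == "__main__":
    video = [[[0.25, 0.25], [0.25, 0.25]]]
    obj = Insertion(sprite=[[1.0]], alpha=[[1.0]], top=0, left=0)
    print(insert_into_video(video, obj))
```

The sketch processes frames independently and keeps the placement fixed; a real system of the kind the abstract describes would need per-frame placement in a dynamic scene, a full lighting distribution rather than one scalar, and shadow rendering, none of which is modeled here.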
