We recover the underlying 3D structure from images of cartoons and anime that depict the same scene. This is an interesting problem domain because images in creative media are often drawn without explicit geometric consistency, in service of storytelling and creative expression; they are 3D only in a qualitative sense. While humans can easily perceive the underlying 3D scene from these images, existing Structure-from-Motion (SfM) methods, which assume 3D consistency, fail catastrophically. We present Toon3D, a method for reconstructing scenes from geometrically inconsistent images. Our key insight is to deform the input images while recovering camera poses and scene geometry, effectively explaining away geometric inconsistencies so that the views become consistent. This process is guided by the structure inferred from monocular depth predictions. We curate a dataset of multi-view imagery from cartoons and anime, which we annotate with reliable sparse correspondences using our user-friendly annotation tool. Our recovered point clouds can be plugged into novel-view synthesis methods to experience cartoons from viewpoints never drawn before. We evaluate against classical and recent learning-based SfM methods and find that Toon3D obtains more reliable camera poses and scene geometry.
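To make the role of depth and sparse correspondences concrete, the following is a minimal sketch and not the Toon3D implementation. It assumes a simple pinhole camera, a rotation restricted to the y-axis, and hypothetical toy values for the intrinsics, pixels, and depths. The function names (`backproject`, `camera_to_world`, `alignment_error`) are illustrative only. The sketch lifts annotated pixels to 3D using per-pixel depth, places them in a shared world frame, and measures how well corresponding points agree.

```python
import math

def backproject(u, v, depth, focal, cx, cy):
    """Lift pixel (u, v) with a depth value to a 3D point in camera coordinates (pinhole model)."""
    x = (u - cx) / focal * depth
    y = (v - cy) / focal * depth
    return (x, y, depth)

def camera_to_world(point, yaw, translation):
    """Rotate a camera-space point about the y-axis by `yaw` radians, then translate."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return (c * x + s * z + tx, y + ty, -s * x + c * z + tz)

def alignment_error(points_a, points_b):
    """Mean squared 3D distance between corresponding points."""
    total = 0.0
    for a, b in zip(points_a, points_b):
        total += sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return total / len(points_a)

def lift(view, yaw, translation, focal=100.0, cx=50.0, cy=50.0):
    """Back-project every (u, v, depth) annotation in a view and move it to world space."""
    return [camera_to_world(backproject(u, v, d, focal, cx, cy), yaw, translation)
            for u, v, d in view]

# Hypothetical toy data: two correspondences, each given as (u, v, depth).
view_a = [(50.0, 50.0, 2.0), (70.0, 50.0, 2.0)]
view_b_consistent = [(25.0, 50.0, 2.0), (45.0, 50.0, 2.0)]  # same scene, camera shifted 0.5 in x
view_b_drawn = [(25.0, 50.0, 2.0), (55.0, 50.0, 2.0)]       # second point drawn "off-model"

world_a = lift(view_a, 0.0, (0.0, 0.0, 0.0))

# A geometrically consistent second view aligns once the pose is right.
err_wrong_pose = alignment_error(world_a, lift(view_b_consistent, 0.0, (0.0, 0.0, 0.0)))
err_right_pose = alignment_error(world_a, lift(view_b_consistent, 0.0, (0.5, 0.0, 0.0)))

# An inconsistent drawing leaves residual error even at that same pose.
err_drawn = alignment_error(world_a, lift(view_b_drawn, 0.0, (0.5, 0.0, 0.0)))
```

In this toy setup the consistent view reaches near-zero error at the correct pose, while the inconsistent drawing keeps a residual that pose alone cannot remove. That residual is the kind of discrepancy an image deformation term would need to absorb; how Toon3D parameterizes and optimizes such a deformation is not specified here.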