We introduce PointOdyssey, a large-scale synthetic dataset and data generation framework for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state of the art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, build 3D scenes to match the motion capture environments, and render camera viewpoints using trajectories mined via structure-from-motion on real videos. We create combinatorial diversity by randomizing character appearance, motion profiles, materials, lighting, 3D assets, and atmospheric effects. Our dataset currently includes 104 videos, averaging 2,000 frames each, with orders of magnitude more correspondence annotations than prior work. We show that existing methods can be trained from scratch on our dataset and outperform their published variants. Finally, we introduce modifications to the PIPs point tracking method, greatly widening its temporal receptive field, which improves its performance on PointOdyssey as well as on two real-world benchmarks. Our data and code are publicly available at: https://pointodyssey.com