Point tracking aims to follow visual points through complex motion, occlusion, and viewpoint changes, and has advanced rapidly with modern foundation models. Yet progress toward general point tracking remains constrained by limited high-quality data, as existing datasets often lack diversity and contain imperfect trajectory annotations. To address this, we introduce SynthVerse, a large-scale, diverse synthetic dataset specifically designed for point tracking. SynthVerse covers several domains and object types missing from existing synthetic datasets, including animated-film-style content, embodied manipulation, scene navigation, and articulated objects. By spanning a broader range of object categories and providing high-quality dynamic motions and interactions, SynthVerse substantially expands dataset diversity, enabling more robust training and evaluation for general point tracking. In addition, we establish a highly diverse point tracking benchmark to systematically evaluate state-of-the-art methods under broader domain shifts. Extensive experiments and analyses demonstrate that training with SynthVerse yields consistent improvements in generalization and reveal the limitations of existing trackers in diverse settings.