Accurately tracking camera intrinsics is crucial for achieving 3D understanding from 2D video. However, most 3D algorithms assume that camera intrinsics stay constant throughout a video, an assumption that often fails for real-world, in-the-wild videos. A major obstacle in this field is the lack of dynamic camera intrinsics benchmarks: existing benchmarks typically offer limited diversity in scene content and intrinsics variation, and none provide per-frame intrinsic changes for consecutive video frames. In this paper, we present Intrinsics in Flux (InFlux), a real-world benchmark that provides per-frame ground-truth intrinsics annotations for videos with dynamic intrinsics. Compared to prior benchmarks, InFlux captures a wider range of intrinsic variations and scene diversity, featuring 143K+ annotated frames from 386 high-resolution indoor and outdoor videos with dynamic camera intrinsics. To ensure accurate per-frame intrinsics, we build a comprehensive lookup table of calibration experiments and extend the Kalibr toolbox to improve its accuracy and robustness. Using our benchmark, we evaluate existing baseline methods for predicting camera intrinsics and find that most struggle to achieve accurate predictions on videos with dynamic intrinsics. For the dataset, code, videos, and submission, please visit https://influx.cs.princeton.edu/.