Markerless multi-view motion capture is often constrained by the need for precise camera calibration, which limits accessibility for non-experts and for in-the-wild captures. Existing calibration-free approaches relax this requirement but suffer from high computational cost and reduced reconstruction accuracy. We present Kineo, a fully automatic, calibration-free pipeline for markerless motion capture from videos captured by unsynchronized, uncalibrated, consumer-grade RGB cameras. Kineo leverages 2D keypoints from off-the-shelf detectors to simultaneously calibrate cameras, including Brown-Conrady distortion coefficients, and reconstruct 3D keypoints and dense scene point maps at metric scale. A confidence-driven spatio-temporal keypoint sampling strategy, combined with graph-based global optimization, ensures robust calibration at a fixed computational cost independent of sequence length. We further introduce a pairwise reprojection consensus score to quantify 3D reconstruction reliability for downstream tasks. Evaluations on EgoHumans and Human3.6M demonstrate substantial improvements over prior calibration-free methods. Compared to previous state-of-the-art approaches, Kineo reduces camera translation error by approximately 83-85%, camera angular error by 86-92%, and world mean per-joint position error (W-MPJPE) by 83-91%. Kineo is also efficient in real-world scenarios: in specific configurations it processes multi-view sequences faster than their duration (e.g., 36 min to process 1 h 20 min of footage). The full pipeline and evaluation code are openly released at https://liris-xr.github.io/kineo/ to promote reproducibility and practical adoption.
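For readers unfamiliar with the Brown-Conrady distortion coefficients mentioned in the abstract, the following is a minimal sketch of how such a model maps a 3D point in camera coordinates to a distorted pixel. It is not taken from the Kineo code base: the `Intrinsics` class, the `project` function, and the choice of three radial terms (`k1`, `k2`, `k3`) and two tangential terms (`p1`, `p2`) are illustrative assumptions that follow the commonly used form of the model, and the parameterization actually estimated by Kineo may differ.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Hypothetical pinhole intrinsics plus Brown-Conrady coefficients."""
    fx: float          # focal length in pixels (x)
    fy: float          # focal length in pixels (y)
    cx: float          # principal point (x)
    cy: float          # principal point (y)
    k1: float = 0.0    # radial distortion coefficients
    k2: float = 0.0
    k3: float = 0.0
    p1: float = 0.0    # tangential distortion coefficients
    p2: float = 0.0


def project(point_cam, cam):
    """Project a 3D point in camera coordinates to distorted pixel coordinates."""
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera")

    # Normalized image coordinates (ideal pinhole, before distortion).
    x, y = X / Z, Y / Z
    r2 = x * x + y * y

    # Radial term: 1 + k1 r^2 + k2 r^4 + k3 r^6.
    radial = 1.0 + cam.k1 * r2 + cam.k2 * r2**2 + cam.k3 * r2**3

    # Radial scaling plus tangential (decentering) terms.
    x_d = x * radial + 2.0 * cam.p1 * x * y + cam.p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + cam.p1 * (r2 + 2.0 * y * y) + 2.0 * cam.p2 * x * y

    # Map the distorted normalized coordinates to pixels.
    return cam.fx * x_d + cam.cx, cam.fy * y_d + cam.cy
```

In a calibration-free setting, a function of this kind would typically sit inside the reprojection residual that the optimizer minimizes over both camera parameters and 3D keypoints; how Kineo structures that objective is not specified in the abstract.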