Deformable scenes violate the rigidity assumption underpinning classical visual-inertial odometry (VIO), which often leads to overfitting to local non-rigid motion or to severe camera pose drift when deformation dominates visual parallax. In this paper, we introduce DefVINS, the first visual-inertial odometry pipeline designed to operate in deformable environments. Our approach decomposes the odometry state into a rigid, IMU-anchored component and a non-rigid scene warp represented by an embedded deformation graph. As a second contribution, we present VIMandala, the first benchmark containing real images and ground-truth camera poses for visual-inertial odometry in deformable scenes. In addition, we augment the synthetic Drunkard's benchmark with simulated inertial measurements to further evaluate our pipeline under controlled conditions. We also provide an observability analysis of the visual-inertial deformable odometry problem, characterizing how inertial measurements constrain camera motion and render otherwise unobservable modes identifiable in the presence of deformation. This analysis motivates the use of IMU anchoring and leads to a conditioning-based activation strategy that avoids ill-posed updates under poor excitation. Experimental results on both the synthetic Drunkard's benchmark and our real VIMandala benchmark show that DefVINS outperforms rigid visual-inertial and non-rigid visual odometry baselines. Our source code and data will be released upon acceptance.
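To make the idea of a conditioning-based activation strategy concrete, the following is a minimal sketch and not the DefVINS implementation, which the abstract does not specify. It assumes the non-rigid update is a linearized least-squares step with Jacobian `J`, and that the update is gated on the condition number of the normal matrix `J^T J`. The function name `should_activate_update` and the threshold `max_condition` are hypothetical and chosen only for illustration.

```python
import numpy as np


def should_activate_update(jacobian: np.ndarray, max_condition: float = 1e6) -> bool:
    """Decide whether a linearized update is well conditioned enough to apply.

    Hypothetical gate: the criterion and threshold are assumptions for
    illustration, not taken from the DefVINS paper.
    """
    rows, cols = jacobian.shape
    # Fewer residuals than unknowns: J^T J is singular, so skip the update.
    if rows < cols:
        return False

    # Singular values of J, sorted in descending order by NumPy.
    singular_values = np.linalg.svd(jacobian, compute_uv=False)
    s_max, s_min = singular_values[0], singular_values[-1]

    # Numerically rank-deficient Jacobian (also covers the all-zero case).
    if s_min <= np.finfo(float).eps * s_max:
        return False

    # cond(J^T J) in the 2-norm equals (s_max / s_min) squared.
    condition = (s_max / s_min) ** 2
    return bool(condition <= max_condition)
```

In this sketch, a Jacobian with a nearly unexcited direction (a very small singular value) fails the check, so the corresponding update would be skipped rather than solved as an ill-posed problem.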