This paper bridges the domain gap between synthetic and real data in visual regression (\eg 6D pose estimation) through global feature alignment and local refinement of a coarse classification over discretized anchor classes in the target space, which imposes a piece-wise target manifold regularization on domain-invariant representation learning. Specifically, our method incorporates an explicit self-supervised manifold regularization, which reveals a consistent cumulative target dependency across domains, into a self-training scheme (\eg the popular Self-Paced Self-Training) to encourage more discriminative transferable representations for regression tasks. Moreover, unified implicit neural functions are learned to estimate the relative direction and distance of targets to their nearest class bins, which refines the target classification predictions and makes the method robust to inconsistent feature scaling, to which unsupervised domain adaptation (UDA) regressors are sensitive. Experimental results on three public benchmarks for the challenging 6D pose estimation task verify the effectiveness of our method, which consistently outperforms state-of-the-art UDA methods for 6D pose estimation.
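To make the coarse-to-fine target encoding concrete, the following is a minimal sketch that assumes a 1-D target, uniform bins, and nearest-anchor assignment. The function names (`make_anchors`, `encode`, `decode`, `refine`) and the per-bin direction/distance scores are hypothetical illustrations, not the paper's implementation. In the described method these quantities come from learned networks; here they are plain numbers. The sketch shows only how a continuous target splits into an anchor class plus a signed offset, and how a coarse class prediction is refined back into a continuous value.

```python
import math
from typing import List, Tuple


def make_anchors(lo: float, hi: float, num_bins: int) -> List[float]:
    """Centers of num_bins uniform bins covering [lo, hi] (assumed discretization)."""
    width = (hi - lo) / num_bins
    return [lo + (k + 0.5) * width for k in range(num_bins)]


def encode(target: float, anchors: List[float]) -> Tuple[int, int, float]:
    """Split a continuous target into (anchor class, direction sign, distance)."""
    # Coarse part: index of the nearest anchor (the classification label).
    k = min(range(len(anchors)), key=lambda i: abs(target - anchors[i]))
    offset = target - anchors[k]
    # Fine part: which side of the anchor the target lies on, and how far.
    direction = 1 if offset >= 0 else -1
    return k, direction, abs(offset)


def decode(k: int, direction: int, distance: float, anchors: List[float]) -> float:
    """Invert encode(): anchor center plus signed offset."""
    return anchors[k] + direction * distance


def refine(class_scores: List[float], direction_scores: List[float],
           distances: List[float], anchors: List[float]) -> float:
    """Hypothetical inference step: take the top-scoring bin, then apply
    that bin's predicted direction (score sign) and distance."""
    k = max(range(len(class_scores)), key=lambda i: class_scores[i])
    direction = 1 if direction_scores[k] >= 0 else -1
    return decode(k, direction, distances[k], anchors)


if __name__ == "__main__":
    anchors = make_anchors(0.0, 1.0, 4)
    k, d, dist = encode(0.3, anchors)
    print(anchors, k, d, round(dist, 6))
    print(round(refine([0.1, 2.0, 0.3, -1.0], [0.0, -0.5, 0.0, 0.0],
                       [0.0, 0.075, 0.0, 0.0], anchors), 6))
```

Because the refinement only adds a bounded offset to a discrete anchor, the final estimate depends on feature magnitudes mainly through a classification decision rather than through a direct regression output. This is one way to read the abstract's robustness claim, stated here as an interpretation rather than as the paper's analysis.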