Augmented reality (AR) games, particularly those designed for head-mounted displays, have grown increasingly prevalent. However, most existing systems depend on pre-scanned, static environments and rely heavily on continuous tracking or marker-based solutions, which limits adaptability in dynamic physical spaces. This is particularly problematic for AR headsets and glasses, which follow the user's head movement and cannot maintain a fixed, stationary view of the scene. Moreover, continuous scene observation is neither power-efficient nor practical for wearable devices, given their limited battery and processing capabilities. A persistent challenge arises when multiple identical objects are present in the environment: standard object tracking pipelines often fail to maintain consistent identities without uninterrupted observation or external sensors. These limitations hinder fluid physical-virtual interaction, especially in dynamic or occluded scenes where continuous tracking is infeasible. To address this, we introduce a novel optimization-based framework for re-identifying identical objects in AR scenes using only a single partial egocentric observation frame captured by a headset. We formulate the problem as a label assignment task solved via integer programming, augmented with a Voronoi diagram-based pruning strategy that reduces computation time by 50% while preserving 91% accuracy in simulated experiments. We evaluate our approach in quantitative synthetic and real-world experiments, and additionally conduct three qualitative real-world experiments demonstrating its practical utility and generalizability for enabling dynamic, markerless object interaction in AR environments. Our video demo is available at https://youtu.be/RwptEfLtW1U.
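To make the label assignment formulation concrete, the following is a minimal toy sketch, not the paper's actual solver: observed object positions from a single partial frame are matched to prior labels by minimizing total squared displacement, and a simple distance-radius cutoff (`prune_radius`, a hypothetical parameter) stands in for the Voronoi diagram-based pruning that discards implausible candidate pairs before the optimization.

```python
from itertools import permutations

def assign_labels(prev_positions, observed, prune_radius=2.0):
    """Toy label assignment sketch (not the paper's integer-programming
    solver): exhaustively match each observed 2D position to a prior label,
    minimizing total squared displacement. The radius-based pruning step
    below is a stand-in for the paper's Voronoi-based pruning strategy."""
    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # Pruning: each observation keeps only nearby prior labels as candidates,
    # shrinking the search space before enumeration.
    candidates = [
        [j for j, p in enumerate(prev_positions) if d2(o, p) <= prune_radius ** 2]
        for o in observed
    ]

    best, best_cost = None, float("inf")
    # Enumerate injective assignments of observations to prior labels.
    for perm in permutations(range(len(prev_positions)), len(observed)):
        if any(j not in candidates[i] for i, j in enumerate(perm)):
            continue  # violates the pruning constraint
        cost = sum(d2(observed[i], prev_positions[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best  # best[i] = prior label assigned to observation i
```

For example, with prior positions `[(0, 0), (1, 0), (5, 5)]` and a partial observation `[(0.1, 0.1), (4.9, 5.1)]`, the sketch assigns labels `(0, 2)`. A practical system would replace the brute-force enumeration with an integer-programming or Hungarian-algorithm solver.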