Augmented reality (AR) games, particularly those designed for head-mounted displays, have grown increasingly prevalent. However, most existing systems depend on pre-scanned, static environments and rely heavily on continuous tracking or marker-based solutions, which limits adaptability in dynamic physical spaces. This is particularly problematic for AR headsets and glasses, which follow the user's head movement and cannot maintain a fixed, stationary view of the scene. Moreover, continuous scene observation is neither power-efficient nor practical for wearable devices, given their limited battery and processing capabilities. A persistent challenge arises when multiple identical objects are present in the environment: standard object tracking pipelines often fail to maintain consistent identities without uninterrupted observation or external sensors. These limitations hinder fluid physical-virtual interaction, especially in dynamic or occluded scenes where continuous tracking is infeasible. To address this, we introduce a novel optimization-based framework for re-identifying identical objects in AR scenes using only a single partial egocentric observation frame captured by a headset. We formulate the problem as a label assignment task solved via integer programming, augmented with a Voronoi diagram-based pruning strategy that reduces computation time by 50% while preserving 91% accuracy in simulated experiments. We evaluate our approach in quantitative synthetic and real-world experiments, and additionally conduct three qualitative real-world experiments demonstrating its practical utility and generalizability for enabling dynamic, markerless object interaction in AR environments. Our video demo is available at https://youtu.be/RwptEfLtW1U.
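To make the label assignment formulation concrete, the following is a minimal toy sketch, not the paper's actual solver: observed object positions from a single partial frame are matched to prior labels by minimizing total squared displacement, and a simple distance-radius cutoff (`prune_radius`, a hypothetical parameter) stands in for the Voronoi diagram-based pruning that discards implausible candidate pairs before the optimization.

```python
from itertools import permutations

def assign_labels(prev_positions, observed, prune_radius=2.0):
    """Toy label assignment sketch (not the paper's integer-programming
    solver): exhaustively match each observed 2D position to a prior label,
    minimizing total squared displacement. The radius-based pruning step
    below is a stand-in for the paper's Voronoi-based pruning strategy."""
    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # Pruning: each observation keeps only nearby prior labels as candidates,
    # shrinking the search space before enumeration.
    candidates = [
        [j for j, p in enumerate(prev_positions) if d2(o, p) <= prune_radius ** 2]
        for o in observed
    ]

    best, best_cost = None, float("inf")
    # Enumerate injective assignments of observations to prior labels.
    for perm in permutations(range(len(prev_positions)), len(observed)):
        if any(j not in candidates[i] for i, j in enumerate(perm)):
            continue  # violates the pruning constraint
        cost = sum(d2(observed[i], prev_positions[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best  # best[i] = prior label assigned to observation i
```

For example, with prior positions `[(0, 0), (1, 0), (5, 5)]` and a partial observation `[(0.1, 0.1), (4.9, 5.1)]`, the sketch assigns labels `(0, 2)`. A practical system would replace the brute-force enumeration with an integer-programming or Hungarian-algorithm solver.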