Accurate human localization is crucial for a wide range of applications, especially in the Metaverse era. Existing high-precision solutions rely on expensive, tag-dependent hardware, whereas vision-based methods offer a cheaper, tag-free alternative. However, current stereo-vision solutions are limited by their rigid perspective-transformation geometry and by error propagation through multi-stage SVD solvers; they also require multiple high-resolution cameras under strict setup constraints. To address these limitations, we propose a probabilistic approach that treats every point on the human body as an observation drawn from a distribution centered at the body's geometric center. This reframing dramatically improves sampling, increasing the number of samples per point of interest from hundreds to billions. Leveraging the Central Limit Theorem, we model the relation between the means of the world-coordinate and pixel-coordinate distributions; the resulting normality facilitates the learning process. Experimental results demonstrate localization accuracy of 95% within 0.3 m and nearly 100% within 0.5 m, achieved at a cost of only 10 USD using two 640×480-pixel web cameras.
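For intuition only, the sketch below illustrates the mean-to-mean idea described above: per-camera pixel means of the body's silhouette are, by the Central Limit Theorem, approximately Gaussian around the projection of the body's geometric center, so a simple regressor can map them to a world position. The silhouette-mask inputs, the function names (`mean_pixel`, `fit_localizer`, `localize`), and the `LinearRegression` learner are all our own illustrative assumptions; the abstract does not specify the paper's actual model.

```python
# Minimal sketch (not the authors' code): localize a body from the sample
# means of its silhouette pixels in two cameras.
import numpy as np
from sklearn.linear_model import LinearRegression  # placeholder learner

def mean_pixel(silhouette_mask: np.ndarray) -> np.ndarray:
    """Mean (u, v) over all foreground pixels of a binary mask."""
    v, u = np.nonzero(silhouette_mask)  # rows are v, columns are u
    return np.array([u.mean(), v.mean()])

def features(mask_cam1: np.ndarray, mask_cam2: np.ndarray) -> np.ndarray:
    """Concatenate the two per-camera pixel means into one feature vector."""
    return np.concatenate([mean_pixel(mask_cam1), mean_pixel(mask_cam2)])

def fit_localizer(train_masks, train_centers):
    """Fit the mean-to-mean mapping.

    train_masks: list of (mask_cam1, mask_cam2) pairs.
    train_centers: (N, 2) or (N, 3) ground-truth body centers in meters.
    """
    X = np.stack([features(m1, m2) for m1, m2 in train_masks])
    return LinearRegression().fit(X, np.asarray(train_centers))

def localize(model, mask_cam1, mask_cam2):
    """Predict the body's world position from one pair of masks."""
    return model.predict(features(mask_cam1, mask_cam2)[None, :])[0]
```

The linear model here simply exploits the approximate normality that the CLT provides; the paper's actual learner, silhouette extraction, and calibration pipeline may differ.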