In this paper, we present Tele-Aloha, a low-budget, high-authenticity bidirectional telepresence system targeting peer-to-peer communication scenarios. Compared to previous systems, Tele-Aloha uses only four sparse RGB cameras, one consumer-grade GPU, and one autostereoscopic screen to achieve high-resolution (2048×2048), real-time (30 fps), low-latency (under 150 ms), and robust remote communication. As the core of Tele-Aloha, we propose an efficient novel view synthesis algorithm for the upper body. First, we design a cascaded disparity estimator to obtain a robust geometry cue. A neural rasterizer based on Gaussian Splatting then projects latent features onto the target view and decodes them at a reduced resolution. Finally, exploiting the high-quality captured data, we apply a weighted blending mechanism to refine the decoded image to the final 2K resolution. With a state-of-the-art autostereoscopic display and low-latency iris tracking, users experience a strong sense of depth without any wearable head-mounted display. Altogether, our telepresence system demonstrates a sense of co-presence in real-life experiments, inspiring the next generation of communication.
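To illustrate the final refinement step, the sketch below shows generic per-pixel weighted blending of several source views warped to the target viewpoint. This is a minimal, hypothetical formulation: the function name, the confidence-map inputs, and the normalization are our assumptions for illustration, not the exact scheme used in Tele-Aloha.

```python
import numpy as np

def weighted_blend(views, weights, eps=1e-6):
    """Blend N warped views into one target image.

    views:   list of (H, W, 3) float arrays, each warped to the target viewpoint
    weights: list of (H, W) per-pixel confidence maps (e.g. from visibility
             or photometric consistency -- assumed inputs, not Tele-Aloha's exact cues)
    """
    w = np.stack(weights)[..., None]   # (N, H, W, 1) broadcastable weights
    v = np.stack(views)                # (N, H, W, 3) stacked source views
    # Normalized weighted average; eps guards pixels no view covers.
    return (w * v).sum(axis=0) / np.clip(w.sum(axis=0), eps, None)
```

In practice such blending preserves high-frequency detail from the captured images that a low-resolution neural decoder alone would lose, which is why it is applied as the last upsampling stage.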