We introduce the Aria Digital Twin (ADT), an egocentric dataset captured using Aria glasses with extensive object-, environment-, and human-level ground truth. This ADT release contains 200 sequences of real-world activities conducted by Aria wearers in two real indoor scenes with 398 object instances (324 stationary and 74 dynamic). Each sequence consists of: a) raw data from two monochrome camera streams, one RGB camera stream, and two IMU streams; b) complete sensor calibration; c) ground truth data including continuous 6-degree-of-freedom (6DoF) poses of the Aria devices, object 6DoF poses, 3D eye gaze vectors, 3D human poses, 2D image segmentations, and image depth maps; and d) photo-realistic synthetic renderings. To the best of our knowledge, no existing egocentric dataset offers a level of accuracy, photo-realism, and comprehensiveness comparable to ADT. By contributing ADT to the research community, we aim to set a new standard for evaluation in the egocentric machine perception domain, which includes challenging research problems such as 3D object detection and tracking, scene reconstruction and understanding, sim-to-real learning, and human pose prediction, while also inspiring new machine perception tasks for augmented reality (AR) applications. To kick-start exploration of the ADT research use cases, we evaluated several existing state-of-the-art methods on object detection, segmentation, and image translation tasks, demonstrating the usefulness of ADT as a benchmarking dataset.
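As an illustration of how device and object 6DoF poses of the kind listed in the ground truth might be combined, the following is a minimal sketch that expresses an object's pose in the device frame. The class `Pose6DoF`, the helper `object_in_device_frame`, and the `world_from_*` naming convention are all hypothetical and chosen for this example only; they are not taken from the ADT release or its tooling, and the numeric values are made up.

```python
from dataclasses import dataclass
from typing import List

Vec3 = List[float]
Mat3 = List[List[float]]


@dataclass
class Pose6DoF:
    """Hypothetical rigid transform: 3 rotational + 3 translational DoF."""
    rotation: Mat3     # 3x3 rotation matrix (assumed orthonormal)
    translation: Vec3  # 3-vector

    def apply(self, p: Vec3) -> Vec3:
        """Map a point through the transform: R * p + t."""
        R, t = self.rotation, self.translation
        return [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]

    def inverse(self) -> "Pose6DoF":
        """Inverse of a rigid transform: (R^T, -R^T * t)."""
        R, t = self.rotation, self.translation
        Rt = [[R[j][i] for j in range(3)] for i in range(3)]
        t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
        return Pose6DoF(Rt, t_inv)

    def compose(self, other: "Pose6DoF") -> "Pose6DoF":
        """Return the transform that applies `other` first, then `self`."""
        R, S = self.rotation, other.rotation
        RS = [[sum(R[i][k] * S[k][j] for k in range(3)) for j in range(3)]
              for i in range(3)]
        return Pose6DoF(RS, self.apply(other.translation))


def object_in_device_frame(world_from_device: Pose6DoF,
                           world_from_object: Pose6DoF) -> Pose6DoF:
    """Assumed convention: both inputs map local coordinates into the world frame."""
    return world_from_device.inverse().compose(world_from_object)


# Made-up example: device at (1, 0, 0), rotated 90 degrees about the z-axis;
# object at (1, 2, 0) with identity orientation.
world_from_device = Pose6DoF(
    [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [1.0, 0.0, 0.0])
world_from_object = Pose6DoF(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [1.0, 2.0, 0.0])
device_from_object = object_in_device_frame(world_from_device, world_from_object)
```

Storing the rotation as a plain 3x3 matrix keeps the sketch dependency-free; a real pipeline would more likely use a quaternion or a dedicated SE(3) type from a geometry library.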