GRADE: Generating Realistic Animated Dynamic Environments for Robotics Research

Simulation engines such as Gazebo, Unity, and Webots are widely adopted in robotics. However, each lacks one or more of full simulation control, ROS integration, realistic physics, or photorealism. Recently, synthetic data generation and realistic rendering have advanced tasks such as target tracking and human pose estimation. However, data generated for vision applications usually lacks information such as sensor measurements (e.g., IMU, LiDAR, joint states) or time continuity. On the other hand, simulations for most robotics applications are run in (semi)static environments, with specific sensor settings and low visual fidelity. In this work, we address these issues with GRADE, a fully customizable framework for generating realistic animated dynamic environments for robotics research. The data produced can be post-processed, e.g., to add noise, and easily expanded with new information using the tools that we provide. To demonstrate GRADE, we use it to generate an indoor dynamic environment dataset and then compare different SLAM algorithms on the produced sequences. In doing so, we show how current research over-relies on well-known benchmarks and fails to generalize. Furthermore, our tests with YOLO and Mask R-CNN provide evidence that our data can improve training performance and generalize to real sequences. Finally, we show GRADE's flexibility by using it for indoor active SLAM, with diverse environment sources, and in a multi-robot scenario. In doing so, we employ different control, asset placement, and simulation techniques. The code, results, implementation details, and generated data are provided as open source. The main project page is https://eliabntt.github.io/grade-rr, and the accompanying video can be found at https://youtu.be/cmywCSD-9TU.
