Developing embodied AI for intelligent surgical systems requires safe, controllable environments for continual learning and evaluation. However, safety regulations and operational constraints in operating rooms (ORs) limit agents' ability to freely perceive and interact in realistic settings. Digital twins provide high-fidelity, risk-free environments for exploration and training, but how to create dynamic digital representations of ORs that capture the relevant spatial, visual, and behavioral complexity remains an open challenge. We introduce TwinOR, a real-to-sim infrastructure for constructing photorealistic and dynamic digital twins of ORs. The system reconstructs static geometry and continuously models human and equipment motion. The static and dynamic components are fused into an immersive 3D environment that supports controllable simulation and facilitates future embodied exploration. The proposed framework reconstructs complete OR geometry with centimeter-level accuracy while preserving dynamic interactions across surgical workflows. In our experiments, TwinOR synthesizes stereo and monocular RGB streams as well as depth observations for geometry understanding and visual localization tasks. Models such as FoundationStereo and ORB-SLAM3, evaluated on TwinOR-synthesized data, achieve performance within their reported accuracy ranges on real-world indoor datasets, demonstrating that TwinOR provides sensor-level realism sufficient for emulating real-world perception and localization challenges. By establishing a perception-grounded real-to-sim pipeline, TwinOR enables the automatic construction of dynamic, photorealistic digital twins of ORs. As a safe and scalable environment for experimentation, TwinOR opens new opportunities for translating embodied intelligence from simulation to real-world clinical environments.