This paper presents a comprehensive workflow for generating and validating a synthetic dataset designed for robotic surgery instrument segmentation. A 3D reconstruction of the Da Vinci robotic arms was refined and animated in Autodesk Maya through a fully automated Python-based pipeline capable of producing photorealistic, labeled video sequences. Each scene integrates randomized motion patterns, lighting variations, and synthetic blood textures to mimic intraoperative variability while preserving pixel-accurate ground truth masks. To validate the realism and effectiveness of the generated data, several segmentation models were trained under controlled ratios of real and synthetic data. Results demonstrate that a balanced composition of real and synthetic samples significantly improves model generalization compared to training on real data only, while excessive reliance on synthetic data introduces a measurable domain shift. The proposed framework provides a reproducible and scalable tool for surgical computer vision, supporting future research in data augmentation, domain adaptation, and simulation-based pretraining for robotic-assisted surgery. Data and code are available at https://github.com/EIDOSLAB/Sintetic-dataset-DaVinci.
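The controlled mixing of real and synthetic samples described above can be sketched as a small Python helper; the function name, signature, and string labels here are illustrative assumptions, not the paper's actual API:

```python
import random

def mix_dataset(real, synthetic, synthetic_ratio, total, seed=0):
    """Compose a training list with a controlled fraction of synthetic samples.

    `synthetic_ratio` is the fraction of the final set drawn from `synthetic`;
    the remainder is drawn from `real`. A fixed seed keeps the split
    reproducible across runs (hypothetical helper, for illustration only).
    """
    rng = random.Random(seed)
    n_syn = round(total * synthetic_ratio)
    n_real = total - n_syn
    # Sample without replacement from each pool, then shuffle the union
    # so real and synthetic frames are interleaved during training.
    mixed = rng.sample(real, n_real) + rng.sample(synthetic, n_syn)
    rng.shuffle(mixed)
    return mixed

# Example: an 80-sample training set where half the frames are synthetic.
real_frames = [f"real_{i}.png" for i in range(100)]
synthetic_frames = [f"syn_{i}.png" for i in range(100)]
train_set = mix_dataset(real_frames, synthetic_frames,
                        synthetic_ratio=0.5, total=80, seed=1)
```

Sweeping `synthetic_ratio` over several values is one way to reproduce the kind of ablation the abstract reports, where a balanced mix outperforms real-only training but an excess of synthetic data induces domain shift.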