Cochlear implant (CI) procedures involve an invasive mastoidectomy to insert an electrode array into the cochlea. In this paper, we introduce a novel pipeline for generating synthetic multi-view videos from a single CI microscope image. Our approach uses a patient's pre-operative CT scan to predict the post-mastoidectomy surface with a method designed for this purpose. We manually align the surface with a selected microscope frame to obtain an accurate initial pose of the reconstructed CT mesh relative to the microscope, then perform UV projection to transfer colors from the frame onto the surface texture. Novel views of the textured surface can then be used to generate a large dataset of synthetic frames with ground-truth poses. We evaluated the quality of synthetic views rendered with PyTorch3D and PyVista and found that both rendering engines produce similarly high-quality novel-view frames, with a structural similarity index averaging about 0.86 against ground truth for each. A large dataset of novel views with known poses is critical for ongoing training of a method that automatically estimates microscope pose for 2D-to-3D registration with the pre-operative CT, a prerequisite for augmented-reality surgery. This dataset will empower various downstream tasks, such as integrating augmented reality (AR) in the operating room, tracking surgical tools, and supporting other video analysis studies.
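The color-transfer step described above can be illustrated with a minimal sketch: given an initial mesh pose relative to the microscope, each vertex is projected into the frame with a pinhole camera model and the frame's color is sampled at the projected pixel. This is an assumption-laden simplification (nearest-neighbor sampling, no occlusion handling); the function names and camera parameters here are illustrative, not the paper's actual implementation.

```python
import numpy as np

def project_vertices(verts, K, R, t):
    """Project 3D mesh vertices (N, 3) into pixel coordinates using a
    pinhole camera: u = K (R X + t), followed by perspective division.
    Returns (N, 2) pixel coordinates (u, v)."""
    cam = R @ verts.T + t.reshape(3, 1)  # camera-frame coordinates
    uvw = K @ cam
    uv = uvw[:2] / uvw[2]                # perspective divide
    return uv.T

def sample_colors(frame, uv):
    """Nearest-neighbor sample of an H x W x 3 frame at pixel coords (N, 2).
    Vertices projecting outside the frame receive zeros (unseen texture)."""
    h, w = frame.shape[:2]
    px = np.round(uv).astype(int)
    colors = np.zeros((uv.shape[0], 3), dtype=frame.dtype)
    inside = (px[:, 0] >= 0) & (px[:, 0] < w) & \
             (px[:, 1] >= 0) & (px[:, 1] < h)
    colors[inside] = frame[px[inside, 1], px[inside, 0]]
    return colors

# Toy example: one vertex on the optical axis at depth 1 projects to the
# principal point (50, 50) and picks up the color stored there.
K = np.array([[100., 0., 50.],
              [0., 100., 50.],
              [0., 0., 1.]])
verts = np.array([[0., 0., 1.]])
uv = project_vertices(verts, K, np.eye(3), np.zeros(3))
frame = np.zeros((100, 100, 3))
frame[50, 50] = [0.2, 0.4, 0.6]
tex = sample_colors(frame, uv)
```

In the full pipeline, the sampled per-vertex (or per-texel) colors form the texture that PyTorch3D or PyVista then renders from novel camera poses.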