Improved distinct bone segmentation in upper-body CT through multi-resolution networks

Purpose: Automated distinct bone segmentation from CT scans is widely used in planning and navigation workflows. U-Net variants are known to provide excellent results in supervised semantic segmentation. However, distinct bone segmentation from upper-body CT requires both a large field of view and a computationally demanding 3D architecture. This forces a trade-off: low-resolution inputs yield results that lack detail, whereas high-resolution inputs cover too little spatial context and lead to localisation errors. Methods: We propose to solve this problem with end-to-end trainable segmentation networks that combine several 3D U-Nets working at different resolutions. Our approach, which extends and generalises HookNet and MRN, captures spatial information at a lower resolution and forwards the encoded information via skip connections to the target network, which operates on smaller high-resolution inputs. We evaluated the proposed architecture against single-resolution networks and performed an ablation study on information concatenation and the number of context networks. Results: Our best network achieves a median Dice similarity coefficient (DSC) of 0.86 over all 125 segmented bone classes and reduces the confusion among similar-looking bones in different locations. These results outperform our previously published 3D U-Net baseline on this task as well as distinct bone segmentation results reported by other groups. Conclusion: The presented multi-resolution 3D U-Nets address current shortcomings in bone segmentation from upper-body CT scans by capturing a larger field of view while avoiding the cubic growth of input voxels and intermediate computations that quickly exceeds the available computational capacity in 3D. The approach thus improves the accuracy and efficiency of distinct bone segmentation from upper-body CT.
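
The input side of such a multi-resolution setup can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the function names, the zero padding at the volume border, the average-pooling downsampling, and the patch size and downsampling factor are all assumptions chosen for illustration. The sketch extracts a high-resolution target patch and a co-centred low-resolution context patch with the same number of voxels, and compares the voxel count with that of a single-resolution input covering the same field of view.

```python
import numpy as np


def crop_centered(volume, center, size):
    """Crop a cube of edge length `size` centred at `center`.

    Hypothetical helper: regions outside the volume are zero-padded.
    """
    half = size // 2
    padded = np.pad(volume, half, mode="constant")
    # Index c in the original volume sits at c + half after padding,
    # so the crop starting at c - half starts at c in padded coordinates.
    slices = tuple(slice(c, c + size) for c in center)
    return padded[slices]


def extract_pair(volume, center, patch=32, factor=2):
    """Return (target, context) patches, both of shape (patch, patch, patch).

    The context patch covers `factor` times the extent per axis and is
    downsampled by block averaging (an assumed choice of downsampling).
    """
    target = crop_centered(volume, center, patch)
    wide = crop_centered(volume, center, patch * factor)
    context = wide.reshape(
        patch, factor, patch, factor, patch, factor
    ).mean(axis=(1, 3, 5))
    return target, context


def voxel_counts(patch, factor):
    """Input voxels: single high-resolution patch with the large field of
    view versus one target patch plus one context patch."""
    single_resolution = (patch * factor) ** 3
    multi_resolution = 2 * patch ** 3
    return single_resolution, multi_resolution


if __name__ == "__main__":
    volume = np.arange(64 ** 3, dtype=np.float32).reshape(64, 64, 64)
    target, context = extract_pair(volume, (32, 32, 32), patch=16, factor=4)
    print(target.shape, context.shape)
    print(voxel_counts(16, 4))
```

With a patch edge of 16 voxels and a factor of 4, the single-resolution input would hold 64³ = 262144 voxels, whereas the target and context patches together hold 8192. How the context features are then concatenated into the target network is a separate design question that this sketch does not address.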
